Certainty threshold for a convnet in Keras

Suppose you want to fine-tune a convnet such as VGG16 in Keras. You normally replace the final dense layer with your own dense (+softmax) layer that has as many categories as you need, and freeze all the other layers (assuming they are preloaded with useful weights). Then you train the net on your dataset.
 for layer in vgg16.layers:  
     layer.trainable = False  # freeze the pretrained layers  
 classifier_output = Dense(num_categories, activation='softmax', name='predictions')(vgg16.layers[-2].output)  
 classifier = Model(inputs=vgg16.input, outputs=classifier_output)  

Now suppose you want to know not just the relative certainty of each output (which is what a softmax gives you), but something like an "absolute certainty" for each neuron. For that you need the values before the softmax activation, so you can split out the activation into its own layer:
 t = Dense(num_categories, activation=None, name='predictions')(vgg16.layers[-2].output)  
 classifier_output = Activation('softmax')(t)  
 classifier = Model(inputs=vgg16.input, outputs=classifier_output)  
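To see why the softmax outputs alone can't express absolute certainty: softmax is shift-invariant, so pre-softmax activations of very different magnitudes can produce identical probabilities. A quick NumPy check (the values here are made up for illustration):

```python
import numpy as np

def softmax(z):
    # subtract the max for numerical stability; doesn't change the result
    e = np.exp(z - np.max(z))
    return e / e.sum()

low = np.array([1.0, 2.0, 3.0])  # small pre-softmax activations
high = low + 10.0                # the same activations shifted up by 10

# the softmax outputs are identical, even though the raw
# activations differ greatly in magnitude
print(np.allclose(softmax(low), softmax(high)))  # True
```

So two inputs the net finds wildly different in absolute terms can look the same after the softmax; the pre-activation values preserve that information.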

And now suppose we want to train a single binary neuron that will fire only if one of those dense neurons' values exceeds some threshold. You don't know the threshold in advance, but you suspect that for faces belonging to any of the trained categories, its respective neuron should generate a high value, and for other faces none of them will be very high. So you generate training data where faces from the existing categories are true and all other faces are false.

A first guess would be to use a MaxPooling1D layer to collect the maximum of the previous layer, and then feed it to a single dense neuron with a sigmoid activation, so the maximum can be scaled and shifted to produce an output in [0, 1].
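In effect that single neuron computes sigmoid(w · max(z) + b), which is a soft threshold at max(z) = -b/w. A pure-NumPy sketch of the idea (the weight, bias, and logit values below are hypothetical, not learned):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# hypothetical learned parameters: a steep positive weight and a bias
# that put the decision boundary at max(z) = -b/w = 5.0
w, b = 4.0, -20.0

known_face = np.array([0.3, 8.2, 1.1])    # one logit well above threshold
unknown_face = np.array([0.5, 2.1, 1.7])  # all logits below threshold

print(sigmoid(w * known_face.max() + b) > 0.5)    # True: neuron fires
print(sigmoid(w * unknown_face.max() + b) > 0.5)  # False: stays off
```

Training just has to find a (w, b) pair that separates the two populations of maxima.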

Even simpler is to use GlobalMaxPool1D, which instead of using a sliding window just takes the max of all input values. But that layer still requires a 3D input of shape (batch_size, steps, features). The docs don't spell out what "steps" and "features" mean, but if you look at the implementation it just takes the max along dimension 1 (steps); "features" is the channel dimension.
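That "max along dimension 1" behavior is easy to verify with plain NumPy (the array shape here is just an example):

```python
import numpy as np

# a batch of 2 samples, 4 "steps", 3 "features" (channels)
x = np.arange(24, dtype=float).reshape(2, 4, 3)

# GlobalMaxPool1D collapses the steps dimension, leaving
# one max per feature: output shape (batch_size, features)
pooled = x.max(axis=1)

print(pooled.shape)  # (2, 3)
```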

Our 'predictions' layer is only 2D (batch_size, num_categories), so we need to expand it to 3D. First guess:
 global_max = GlobalMaxPool1D()(K.expand_dims(t))  
 yesno_output = Dense(1, activation='sigmoid')(global_max)  
 yesno = Model(inputs=vgg16.input, outputs=yesno_output)  

Alas, that produces a strange error:

AttributeError: 'Tensor' object has no attribute '_keras_history'

It turns out you have to wrap the backend call in a Lambda layer: every tensor in a functional model must be the output of a Keras Layer, and raw backend ops like K.expand_dims produce plain tensors without the _keras_history metadata Keras uses to trace the graph:
 expanded = Lambda(lambda x: K.expand_dims(x))(t)  
 global_max = GlobalMaxPool1D()(expanded)  
 yesno_output = Dense(1, activation='sigmoid')(global_max)  
 yesno = Model(inputs=vgg16.input, outputs=yesno_output)  

And voila, everything works!

