ML Practicum: Image Classification

Leveraging Pretrained Models

Training a convolutional neural network to perform image classification tasks typically requires an extremely large amount of training data and can be very time-consuming, taking days or even weeks to complete. But what if you could leverage existing image models trained on enormous datasets, such as those available via TensorFlow-Slim, and adapt them for use in your own classification tasks?

One common technique for leveraging pretrained models is feature extraction: retrieving the intermediate representations produced by the pretrained model and feeding them into a new model as input. For example, if you're training an image-classification model to distinguish different types of vegetables, you could feed training images of carrots, celery, and so on into a pretrained model, and then extract the features from its final convolution layer, which encode what the model has learned about the images' higher-level attributes: color, texture, shape, etc. Then, when building your new classification model, instead of starting with raw pixels, you can use these extracted features as input and add your fully connected classification layers on top. To further increase performance when using feature extraction with a pretrained model, engineers often fine-tune the weight parameters applied to the extracted features.
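To make this concrete, the sketch below illustrates feature extraction with tf.keras. The choice of MobileNetV2 as the pretrained base, the 160x160 input size, and the five-class vegetable head are illustrative assumptions, not choices prescribed by this practicum:

```python
import tensorflow as tf

# Load a convolutional base pretrained on ImageNet, dropping its original
# classification head (include_top=False) so its features can be reused.
# MobileNetV2 and the 160x160 input size are illustrative choices.
base_model = tf.keras.applications.MobileNetV2(
    input_shape=(160, 160, 3),
    include_top=False,
    weights="imagenet",
)

# Feature extraction: freeze the pretrained weights so training updates
# only the new classification layers stacked on top.
base_model.trainable = False

model = tf.keras.Sequential([
    base_model,
    tf.keras.layers.GlobalAveragePooling2D(),        # pool conv features to a vector
    tf.keras.layers.Dense(5, activation="softmax"),  # e.g., 5 vegetable classes
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_dataset, validation_data=val_dataset, epochs=10)
```

Fine-tuning then unfreezes some of the pretrained layers and continues training at a much lower learning rate, so the pretrained weights adapt to the new task without being overwritten. Continuing from the model above (the number of layers left frozen is an arbitrary example):

```python
# Fine-tuning: unfreeze the top of the convolutional base; earlier layers,
# which tend to capture generic features such as edges, stay frozen.
base_model.trainable = True
for layer in base_model.layers[:-20]:  # 20 is an illustrative cutoff
    layer.trainable = False

# Recompile (required for the trainable change to take effect) with a
# lower learning rate so the pretrained weights shift only slightly.
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
# model.fit(train_dataset, validation_data=val_dataset, epochs=10)
```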

For a more in-depth exploration of feature extraction and fine-tuning with pretrained models, see the following Exercise.