In a 2013 Kaggle competition, participants were asked to write an algorithm that decides whether the animal in an image is a dog or a cat. The task is easy for a human, but not necessarily for a machine.
Internet security often faces the opposite challenge: a test should be easy for a human to pass and hard for a machine to solve. For example, a verification code can effectively reduce spam mail and protect users' passwords from malicious cracking.
Asirra (Animal Species Image Recognition for Restricting Access) is a HIP (Human Interactive Proof) designed by Microsoft Research that works by asking users to identify photographs of cats and dogs. Asirra has access to a large pool of cat and dog photos through its partnership with Petfinder, the world's largest website devoted to finding homes for homeless pets, which has provided three million images of cats and dogs. We will use a subset of these images as the training dataset. The dataset can be downloaded from Kaggle.
We could write a neural network from scratch for this task, but it may turn out to be ineffective: accuracy may be low, and training may converge slowly or not at all. In that case, we can turn to mature models such as VGGNet, GoogLeNet, and ResNet. These networks were developed by leading deep learning laboratories after extensive trial and error, and each won or placed second in an ImageNet competition. Using them therefore gives a reasonable baseline level of performance. The barrier to entry for deep learning keeps getting lower: on one hand, current frameworks make writing a network very easy; on the other hand, these laboratories are willing to open-source their models and experimental results.
We can take an existing model and fine-tune it on another dataset. However, training a complex model on a large dataset consumes a lot of computation, and we do not always have such resources. Does that mean there is no other way? No. Transfer learning lets people without powerful hardware train complex deep learning models.
In classic supervised learning, if we are training a model for task A, we provide the data and labels of task A. We train model A on that dataset and expect it to perform well on unseen data from the same task. Given the data and labels of another task B, we do the same thing and train a separate model B.
But in some cases there is not enough data for a specific task, and classic supervised learning falls short. Transfer learning addresses this by borrowing existing data and labels from related tasks: it preserves the knowledge gained from solving those related tasks and applies it to our target task.
Consequently, we can use neural networks pretrained on ImageNet to perform transfer learning. These pretrained networks contain the information (the weights and parameters) needed to classify the 1000 classes in ImageNet, including cats and dogs.
A convolutional neural network consists of two parts: convolutional layers and classification layers. The convolutional layers mainly extract features from images, and pretrained networks do this well because they have already learned the necessary weights. For our task, binary classification of cats and dogs, we use fully connected layers as the classifier.
To summarize, we transfer the pretrained convolutional layers, update only the weights of the fully connected layers, and obtain a binary classifier for our target task.
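As a minimal sketch of that summary, the snippet below freezes a small stand-in backbone and trains only a new two-class head. The `backbone` here is a hypothetical placeholder for a pretrained network, not one of the models used later, so the example runs without downloading any weights.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical stand-in for pretrained convolutional layers (assumption:
# in practice a real pretrained model would be loaded here instead).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the transferred layers so that training leaves them untouched.
for param in backbone.parameters():
    param.requires_grad = False

# New fully connected head for the two target classes (dog, cat).
head = nn.Linear(8, 2)
model = nn.Sequential(backbone, head)

# Only the head's parameters are handed to the optimizer.
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)

frozen_before = [p.clone() for p in backbone.parameters()]
head_before = head.weight.clone()

# One training step on random data, just to show which weights move.
inputs = torch.randn(4, 3, 16, 16)
labels = torch.tensor([0, 1, 0, 1])
loss = nn.CrossEntropyLoss()(model(inputs), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = all(
    torch.equal(before, after)
    for before, after in zip(frozen_before, backbone.parameters())
)
head_changed = not torch.equal(head_before, head.weight)
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(backbone_unchanged, head_changed, trainable)
```

After the step, the backbone weights are identical to their starting values, while the head has moved. The same pattern should carry over to a real pretrained model by swapping in its layers for `backbone`.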
Finally, transfer learning is not appropriate for every scenario. As mentioned above, the tasks have to be related, so transfer learning works well on similar datasets. For example, if a pretrained network's weights were learned by classifying natural landscapes, using them for face recognition may not give good results, because the features that matter for human faces differ from those that matter for landscapes, and so do the corresponding weights.
After downloading the data from Kaggle, you will have a file called all.zip. Put it into the data directory and run these three bash commands there:
unzip all.zip
unzip train.zip
unzip test.zip
Now there should be a directory train that contains the training images, a directory test that contains the testing images, and sample_submission.csv, the sample submission file. You are ready to go.
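If bash is not available, the same extraction could be done from Python with the standard library. This is only a sketch: `extract_archives` is a hypothetical helper, and the archive names are taken from the commands above.

```python
import zipfile
from pathlib import Path


def extract_archives(data_dir, archive_names=("all.zip", "train.zip", "test.zip")):
    """Extract each archive into data_dir, in order, skipping missing ones."""
    data_dir = Path(data_dir)
    extracted = []
    for name in archive_names:
        archive = data_dir / name
        if not archive.is_file():
            # train.zip and test.zip only appear once all.zip has been extracted
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
        extracted.append(name)
    return extracted


# Usage, assuming all.zip was placed in ./data:
# extract_archives("data")
```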
import os
import operator
import cv2
from tqdm import tqdm
import h5py
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.cm as cm
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.optim as optim
from torch.autograd import Variable
import torchvision
import torchvision.models as models
import torchvision.transforms as transforms
DATA_PATH = 'data'
TRAIN_PATH = 'data/train'
TEST_PATH = 'data/test'
classes = ('dog', 'cat')
print(os.listdir(DATA_PATH))
IMG_SIZE = (224, 224)
img_classes = 2
BATCH_SIZE = 512
NB_EPOCH = 5
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
print(device)
train_img_list = []
train_label_list = []
test_img_dict = {}
img_size_dict = {}
for file in os.listdir(TRAIN_PATH):
    img = cv2.imread(os.path.join(TRAIN_PATH, file))
    train_img_list.append(img)
    img_size_dict[img.shape] = img_size_dict.get(img.shape, 0) + 1
    # Label 0 = dog, 1 = cat (same order as `classes`)
    if 'dog' in file:
        train_label_list.append(0)
    else:
        train_label_list.append(1)
for file in os.listdir(TEST_PATH):
    img = cv2.imread(os.path.join(TEST_PATH, file))
    # Test files are named <id>.jpg; use the numeric id as the key
    test_img_dict[int(file.replace('.jpg', ''))] = img
    img_size_dict[img.shape] = img_size_dict.get(img.shape, 0) + 1
print('There are {} training data and {} testing data'.format(len(train_img_list), len(test_img_dict)))
print('Among them, there are {} dogs and {} cats in training data'.format(train_label_list.count(0), train_label_list.count(1)))
fig = plt.figure('Example of a dog and a cat')
ax0 = fig.add_subplot(1, 2, 1)
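# OpenCV loads images as BGR; convert to RGB before displaying them
ax0.imshow(cv2.cvtColor(train_img_list[20010], cv2.COLOR_BGR2RGB))
ax0.axis('off')
ax1 = fig.add_subplot(1, 2, 2)
ax1.imshow(cv2.cvtColor(train_img_list[1], cv2.COLOR_BGR2RGB))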
ax1.axis('off')
plt.suptitle('Example of a Dog and a Cat')
plt.show()
print('There are {} different image sizes'.format(len(img_size_dict)))
most_10_img_size_dict = dict(sorted(img_size_dict.items(), key=operator.itemgetter(1), reverse=True)[:10])
plt.bar(range(len(most_10_img_size_dict)), list(most_10_img_size_dict.values()), align='center')
plt.xticks(range(len(most_10_img_size_dict)), list(most_10_img_size_dict.keys()), rotation='vertical')
plt.title('10 Most Common Image Sizes')
plt.show()
print('Resize images to {}'.format(IMG_SIZE))
for i, img in enumerate(train_img_list):
    train_img_list[i] = cv2.resize(train_img_list[i], IMG_SIZE)
for key in test_img_dict:
    test_img_dict[key] = cv2.resize(test_img_dict[key], IMG_SIZE)
train_img = np.array(train_img_list)
train_mean = np.mean(train_img, axis=(0, 1, 2), keepdims=True)
train_std = np.std(train_img, axis=(0, 1, 2), keepdims=True)
print('Training image mean {} and std {}'.format(train_mean, train_std))
# zero mean and unit variance
train_img = (train_img - train_mean) / train_std
for key in test_img_dict:
    test_img_dict[key] = (test_img_dict[key] - train_mean[0]) / train_std[0]
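The normalization above relies on NumPy broadcasting. The sketch below repeats it on a small random array (a stand-in for the real images, so the sizes are assumptions) to show the shapes involved and why `train_mean[0]` is the right shape for a single image.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the resized training images: 6 images of 4x4 pixels, 3 channels.
images = rng.integers(0, 256, size=(6, 4, 4, 3)).astype(np.float64)

# Reduce over images, height and width. keepdims leaves shape (1, 1, 1, 3),
# that is, one mean and one std per color channel.
mean = images.mean(axis=(0, 1, 2), keepdims=True)
std = images.std(axis=(0, 1, 2), keepdims=True)

normalized = (images - mean) / std

# For a single (H, W, 3) image, mean[0] has shape (1, 1, 3) and broadcasts
# over height and width in the same way.
single = (images[0] - mean[0]) / std[0]

print(mean.shape, mean[0].shape)
print(np.allclose(normalized.mean(axis=(0, 1, 2)), 0.0))
print(np.allclose(normalized.std(axis=(0, 1, 2)), 1.0))
print(np.allclose(single, normalized[0]))
```

Note that the test images are normalized with the training statistics, so the network sees both sets on the same scale.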
x_train, x_val, y_train, y_val = train_test_split(train_img, train_label_list, test_size = 0.1)
# NCHW format
tensor_train_img = torch.stack([torch.Tensor(i).permute(2, 0, 1) for i in x_train])
tensor_train_label = torch.stack([torch.LongTensor([i]) for i in y_train]).view(-1)
train_dataset = torch.utils.data.TensorDataset(tensor_train_img, tensor_train_label)
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=BATCH_SIZE,
                                               shuffle=True)
tensor_val_img = torch.stack([torch.Tensor(i).permute(2, 0, 1) for i in x_val])
tensor_val_label = torch.stack([torch.LongTensor([i]) for i in y_val]).view(-1)
val_dataset = torch.utils.data.TensorDataset(tensor_val_img, tensor_val_label)
val_dataloader = torch.utils.data.DataLoader(val_dataset,
                                             batch_size=BATCH_SIZE,
                                             shuffle=False)
tensor_test_img = torch.stack([torch.Tensor(test_img_dict[key]).permute(2, 0, 1) for key in test_img_dict])
test_dataset = torch.utils.data.TensorDataset(tensor_test_img)
test_dataloader = torch.utils.data.DataLoader(test_dataset,
                                              batch_size=BATCH_SIZE,
                                              shuffle=False)
net = models.resnet18(pretrained=True)
dim_in = net.fc.in_features
net.fc = nn.Linear(dim_in, img_classes)
net = net.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-3)
def test(net, dataloader):
    # Report classification accuracy on the given dataloader
    correct = 0
    total = 0
    with torch.no_grad():
        for data in dataloader:
            images, labels = data
            images, labels = images.to(device), labels.to(device)
            outputs = net(images)
            # The predicted class is the index of the largest output
            _, predicted = torch.max(outputs.data, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()
    print('Accuracy: %d %%' % (100 * correct / total))
def train(net, dataloader, val_dataloader, optimizer, criterion):
    for epoch in range(NB_EPOCH):
        running_loss = 0.0
        for i, data in enumerate(dataloader, 0):
            inputs, labels = data
            inputs, labels = inputs.to(device), labels.to(device)
            optimizer.zero_grad()
            outputs = net(inputs)
            loss = criterion(outputs, labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
            # Every 40 mini-batches, print the average loss and validate
            if i % 40 == 39:
                print('[%d, %5d] loss: %.3f' % (epoch + 1, i + 1, running_loss / 40))
                test(net, val_dataloader)
                running_loss = 0.0
    print('Finished Training')
train(net, train_dataloader, val_dataloader, optimizer, criterion)
After uploading pretrained_prediction.csv to Late Submission, the result is
Log Loss: 0.13165
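The score reported here is the binary log loss, where lower is better. As a rough reference, assuming the standard definition with y = 1 for a dog and p the predicted probability of a dog, it can be computed as below. The `binary_log_loss` helper and the clipping constant are illustrative choices, not Kaggle's exact implementation.

```python
import numpy as np


def binary_log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of binary labels under predicted probabilities."""
    y = np.asarray(y_true, dtype=np.float64)
    # Clip so that log(0) never occurs for fully confident predictions.
    p = np.clip(np.asarray(p_pred, dtype=np.float64), eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


# 1 = dog, 0 = cat; each prediction is the probability of "dog".
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.9, 0.1]
uncertain = [0.5, 0.5, 0.5, 0.5]
one_bad = [0.9, 0.1, 0.9, 0.99]  # the last prediction is confidently wrong

print(binary_log_loss(labels, confident))
print(binary_log_loss(labels, uncertain))
print(binary_log_loss(labels, one_bad))
```

In this toy example a single confident mistake raises the loss well above that of always answering 0.5, which is why the metric rewards well-calibrated probabilities rather than hard 0/1 answers.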
def predict(net, dataloader):
    predicted_list = []
    prob = nn.Softmax(dim = 1)
    with torch.no_grad():
        for data in dataloader:
            images = data[0]
            images = images.to(device)
            outputs = prob(net(images))
            # Column 0 is the probability of class 0 (dog)
            predicted_list.append(outputs[:, 0].cpu().data.numpy())
    return np.concatenate(predicted_list, axis=0)
predicted = predict(net, test_dataloader)
df = pd.read_csv(os.path.join(DATA_PATH, 'sample_submission.csv'), index_col='id')
for i, key in enumerate(test_img_dict):
    df.at[key, 'label'] = predicted[i]
df.to_csv('pretrained_prediction.csv')
net = models.resnet18(pretrained=True)
# Freeze every pretrained weight; the new fc layer created below stays trainable
for param in net.parameters():
    param.requires_grad = False
dim_in = net.fc.in_features
net.fc = nn.Linear(dim_in, img_classes)
net = net.to(device)
criterion = nn.CrossEntropyLoss()
# Only the new fully connected layer is optimized
optimizer = optim.Adam(net.fc.parameters(), lr=1e-3)
train(net, train_dataloader, val_dataloader, optimizer, criterion)
After uploading fixed_pretrained_prediction.csv to Late Submission, the result is
Log Loss: 0.11055
predicted = predict(net, test_dataloader)
df = pd.read_csv(os.path.join(DATA_PATH, 'sample_submission.csv'), index_col='id')
for i, key in enumerate(test_img_dict):
    df.at[key, 'label'] = predicted[i]
df.to_csv('fixed_pretrained_prediction.csv')
In the third section, we combine multiple pretrained models, fix the weights of their convolutional layers, and update only the weights of the last fully connected layers. Since the convolutional layers are never updated, forwarding an image through them always gives the same result, so there is no need to forward the whole dataset in every iteration. We can forward the dataset once and store the results as feature vectors.
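The idea of forwarding once and reusing the result can be illustrated with a toy NumPy example. Everything here is a stand-in: `extract_features` plays the role of the frozen convolutional layers, and a small logistic-regression head replaces the real classifier.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frozen extractor: a fixed random projection followed by ReLU.
W_frozen = rng.normal(size=(20, 8))
extractor_calls = 0


def extract_features(x):
    """Stand-in for the frozen convolutional layers; its weights never change."""
    global extractor_calls
    extractor_calls += 1
    return np.maximum(x @ W_frozen, 0.0)


# Toy dataset: 64 samples with 20 raw inputs each, and binary labels.
x = rng.normal(size=(64, 20))
y = rng.integers(0, 2, size=64)

# Forward the dataset through the frozen part once and keep the result.
cached = extract_features(x)

# Train only a small logistic-regression head on the cached features.
w = np.zeros(8)
b = 0.0
losses = []
for epoch in range(50):
    logits = cached @ w + b
    p = 1.0 / (1.0 + np.exp(-logits))
    losses.append(-np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)))
    grad = p - y
    w -= 0.05 * (cached.T @ grad) / len(y)
    b -= 0.05 * grad.mean()

print(extractor_calls)
print(losses[0] > losses[-1])
```

The extractor runs a single time no matter how many epochs the head is trained for, which is where the savings come from when the extractor is a deep network.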
feature_net is the feature-extracting network. Its model parameter accepts vgg, inceptionv3, or resnet152, representing the 19-layer VGG network, Inception V3, or the 152-layer residual network. For each pretrained network, we remove the fully connected layers, add an average pooling layer where needed, and transform the dataset into feature vectors.
classifier is the fully connected classifier network for our dataset, with Dropout to reduce overfitting.
Then we can feed the dataset through the networks, obtain the feature vectors, and store the results in h5 files.
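Once stored, the feature vectors of the different networks are placed side by side to form one long vector per image. The sketch below shows the shapes with random placeholder values; the widths (512 for VGG19, 2048 for Inception V3 and ResNet152 after global pooling) are the channel counts of those architectures' last convolutional stages.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples = 5

# Random placeholders with the feature width of each extractor.
features = {
    'vgg': rng.normal(size=(n_samples, 512)),
    'inceptionv3': rng.normal(size=(n_samples, 2048)),
    'resnet152': rng.normal(size=(n_samples, 2048)),
}

# Row i of every array must describe the same image, which is why the
# dataloaders used for feature extraction should not shuffle.
combined = np.concatenate(list(features.values()), axis=1)
dim = sum(f.shape[1] for f in features.values())

print(combined.shape, dim)
```

The summed width is the input dimension the classifier has to accept.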
BATCH_SIZE = 4
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=BATCH_SIZE,
                                               shuffle=False)
val_dataloader = torch.utils.data.DataLoader(val_dataset,
                                             batch_size=BATCH_SIZE,
                                             shuffle=False)
test_dataloader = torch.utils.data.DataLoader(test_dataset,
                                              batch_size=BATCH_SIZE,
                                              shuffle=False)
model_list = ['vgg', 'inceptionv3', 'resnet152']
feature_dim = {}
class feature_net(nn.Module):
    def __init__(self, model):
        super(feature_net, self).__init__()
        if model == 'vgg':
            vgg = models.vgg19(pretrained=True)
            # Keep everything except the fully connected classifier
            self.feature = nn.Sequential(*list(vgg.children())[:-1])
            self.feature.add_module('global average', nn.AvgPool2d(7))
        elif model == 'inceptionv3':
            inception = models.inception_v3(pretrained=True)
            self.feature = nn.Sequential(*list(inception.children())[:-1])
            # Entry '13' is the auxiliary classifier in the torchvision release
            # this code was written for; the index and the 26x26 pooling size
            # below may differ in other releases.
            self.feature._modules.pop('13')
            self.feature.add_module('global average', nn.AvgPool2d(26))
        elif model == 'resnet152':
            resnet = models.resnet152(pretrained=True)
            # ResNet already ends with global average pooling before fc
            self.feature = nn.Sequential(*list(resnet.children())[:-1])

    def forward(self, x):
        x = self.feature(x)
        # Flatten (N, C, 1, 1) to (N, C)
        x = x.view(x.size(0), -1)
        return x
class classifier(nn.Module):
    def __init__(self, dim, n_classes):
        super(classifier, self).__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, 1000),
            nn.ReLU(True),
            nn.Dropout(0.5),
            nn.Linear(1000, n_classes)
        )

    def forward(self, x):
        x = self.fc(x)
        return x
h5_list = {}
for model in model_list:
    # Build each extractor once and reuse it for all three phases.
    # eval() makes BatchNorm use its stored statistics, so the same image
    # always produces the same feature vector.
    featurenet = feature_net(model).to(device)
    featurenet.eval()
    for phase, dataloader in zip(['train', 'val', 'test'], [train_dataloader, val_dataloader, test_dataloader]):
        feature_map = torch.FloatTensor()
        label_map = torch.LongTensor()
        with torch.no_grad():
            for data in tqdm(dataloader):
                if phase != 'test':
                    img, label = data
                else:
                    img = data[0]
                out = featurenet(img.to(device))
                feature_map = torch.cat((feature_map, out.cpu()), 0)
                if phase != 'test':
                    label_map = torch.cat((label_map, label), 0)
        feature_map = feature_map.numpy()
        label_map = label_map.numpy()
        file_name = '{}_feature_{}.hd5f'.format(phase, model)
        h5_path = file_name
        phase_list = h5_list.get(phase, [])
        phase_list.append(file_name)
        h5_list[phase] = phase_list
        # Store the features (and labels, when available) for later training
        with h5py.File(h5_path, 'w') as h:
            h.create_dataset('data', data=feature_map)
            if phase != 'test':
                h.create_dataset('label', data=label_map)
        feature_dim[model] = feature_map.shape[1]
class h5Dataset(torch.utils.data.Dataset):
    def __init__(self, h5py_list, nSamples=None, train=True):
        # Dataset.value was removed in h5py 3.0; dataset[()] reads the whole array
        with h5py.File(h5py_list[0], 'r') as label_file:
            if train:
                self.label = torch.from_numpy(label_file['label'][()])
            self.nSamples = len(label_file['data'])
        temp_dataset = torch.FloatTensor()
        for file in h5py_list:
            with h5py.File(file, 'r') as h5_file:
                dataset = torch.from_numpy(h5_file['data'][()])
            # Concatenate the features of every model along the feature dimension
            temp_dataset = torch.cat((temp_dataset, dataset), 1)
        self.train = train
        self.dataset = temp_dataset

    def __len__(self):
        return self.nSamples

    def __getitem__(self, index):
        assert index < len(self), 'index range error'
        data = self.dataset[index]
        if self.train:
            label = self.label[index]
            return (data, label)
        else:
            return (data,)
After uploading feature_prediction.csv to Late Submission, the result is
Log Loss: 0.09273
BATCH_SIZE = 128
train_dataset = h5Dataset(h5_list['train'])
train_dataloader = torch.utils.data.DataLoader(train_dataset,
                                               batch_size=BATCH_SIZE,
                                               shuffle=True)
val_dataset = h5Dataset(h5_list['val'])
val_dataloader = torch.utils.data.DataLoader(val_dataset,
                                             batch_size=BATCH_SIZE,
                                             shuffle=False)
test_dataset = h5Dataset(h5_list['test'], train=False)
test_dataloader = torch.utils.data.DataLoader(test_dataset,
                                              batch_size=BATCH_SIZE,
                                              shuffle=False)
dim = 0
# The classifier input is the concatenation of all models' feature vectors
for key in feature_dim:
    dim += feature_dim[key]
net = classifier(dim, img_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(net.parameters(), lr=1e-3)
train(net, train_dataloader, val_dataloader, optimizer, criterion)
predicted = predict(net, test_dataloader)
df = pd.read_csv(os.path.join(DATA_PATH, 'sample_submission.csv'), index_col='id')
for i, key in enumerate(test_img_dict):
    df.at[key, 'label'] = predicted[i]
df.to_csv('feature_prediction.csv')
f, axes = plt.subplots(1, 4, figsize = (12, 10))
for i in range(4):
    # Undo the normalization and flip BGR (OpenCV order) to RGB for display
    axes[i].imshow((x_val[i] * train_std[0] + train_mean[0]).astype(int)[..., ::-1])
    axes[i].set_title('%5s' % classes[y_val[i]])
plt.suptitle('GroundTruth')
plt.tight_layout(rect=[0, 0.6, 1, 1])
plt.show()
dataiter = iter(val_dataloader)
# The iterator's .next() method is gone in recent PyTorch; use the built-in next()
tensors, labels = next(dataiter)
outputs = net(tensors.to(device))
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]]
                              for j in range(4)))
With transfer learning, we can fine-tune a pretrained network to improve accuracy, use its convolutional layers to perform feature extraction, and save computation.