Image Captioning with Mxnet

This code is modified from mxnet-image-caption.

What do you see in the picture below?

Surfer

Well, some of you might say "A man is surfing on a wave", some might say "Surfer in the ocean riding a large wave", and others might say "A man is riding a wave in the ocean". All of these answers say the same thing with different words and different word orders. The task of producing a caption for a given image is called image captioning. Several online services already offer this as an API, such as CaptionBot, powered by Microsoft Cognitive Services.

Image2Captioning

This code implements the paper Show and Tell: A Neural Image Caption Generator, and the model is trained on the MSCOCO dataset. Here we only use a smaller set of images, each with one caption and its precomputed feature vector ($1\times2048$).

System Versioning

  • OS: Ubuntu 16.04.6 LTS
  • Python: 3.5.2
  • CUDA: V10.0.130
  • CUDNN: 7_7.4.1.5

Python Package Versioning

In [1]:
import os
import cv2
import sys
import time
import urllib
import pickle
import logging
import numpy as np
import matplotlib
import matplotlib.pyplot as plt
from collections import namedtuple

import mxnet as mx
from mxnet.io import DataBatch


import types
def imports():
    for name, val in globals().items():
        if isinstance(val, types.ModuleType):
            yield val.__name__, val.__version__ if hasattr(val, '__version__') else 'NaN'
for name, version in list(imports()):
    print(name, version)
    
# quote the URLs so the shell does not treat '?' as a glob character
!wget -nc "https://www.dropbox.com/s/vgtvff6zt15k6m4/captiondata10k.pickle?dl=1" --output-document captiondata10k.pickle
!wget -nc "https://www.dropbox.com/s/dnsre7x5dxhxlfw/captiondataval10k.pickle?dl=1" --output-document captiondataval10k.pickle
pickle NaN
logging 0.5.1.2
builtins NaN
numpy 1.17.0
os NaN
builtins NaN
sys NaN
time NaN
urllib NaN
types NaN
matplotlib.pyplot NaN
matplotlib 3.0.3
cv2.cv2 4.1.0
mxnet 1.4.1
2019-09-07 19:48:28 (525 KB/s) - ‘captiondata10k.pickle’ saved [21548903/21548903]

2019-09-07 19:48:31 (17.7 MB/s) - ‘captiondataval10k.pickle’ saved [21553091/21553091]

Loading COCO Dataset

allwords maps each key to a pair: the image feature vector and its single caption.

The image feature vector is the output of the last layer before the fully connected layer of ResNet-50 from the MXNet Model Zoo. ResNet-50 has 16 residual blocks in total, and each block has one shortcut connection and three convolution layers, as shown below.

Residual Block
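As a quick check of the block count above, the sketch below tallies ResNet-50's layers. It assumes the standard stage layout of 3, 4, 6 and 3 bottleneck blocks and a last stage of 512 bottleneck channels expanded by a factor of 4; these numbers come from the usual ResNet-50 definition, not from this notebook.

```python
# Standard ResNet-50 stage layout (assumption: the usual v1 configuration)
blocks_per_stage = [3, 4, 6, 3]
convs_per_block = 3  # bottleneck: 1x1 reduce, 3x3, 1x1 expand

num_blocks = sum(blocks_per_stage)
conv_layers_in_blocks = num_blocks * convs_per_block
# +1 for the initial 7x7 convolution, +1 for the final fully connected layer
total_layers = conv_layers_in_blocks + 1 + 1

# The last stage has 512 bottleneck channels expanded 4x, so the pooled
# feature just before the fully connected layer has 2048 dimensions
feature_dim = 512 * 4

print(num_blocks, conv_layers_in_blocks, total_layers, feature_dim)
# 16 48 50 2048
```

The projection convolutions on some shortcut connections are conventionally not counted in the "50".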

A caption is a list of word indices. <S> marks the start of a sentence and </S> marks the end of a sentence.

allindexes maps each key to its image ID.

vocabwords is the dictionary from word to id.

vocabids is the dictionary from id to word.
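The relationship between the two dictionaries and the <S>/</S> markers can be sketched with a tiny hypothetical vocabulary. The ids below loosely mirror those visible in the output of the next cell (0 is 'a', a caption starts with 1 and ends with 3, 2), but the real mappings come from the pickle file, and encode/decode are illustrative helpers, not part of the original code.

```python
# Hypothetical toy vocabulary; the real vocabwords/vocabids come from the pickle
vocabwords = {'a': 0, '<S>': 1, '</S>': 2, '.': 3, 'man': 4, 'surfing': 5}
# vocabids is the inverse mapping: id -> word
vocabids = {idx: word for word, idx in vocabwords.items()}


def encode(tokens):
    """Wrap the tokens in <S> ... </S> and map each word to its index."""
    return [vocabwords[w] for w in ['<S>'] + tokens + ['</S>']]


def decode(indices):
    """Map word indices back to words."""
    return [vocabids[i] for i in indices]


caption = encode(['a', 'man', 'surfing', '.'])
print(caption)          # [1, 0, 4, 5, 3, 2]
print(decode(caption))  # ['<S>', 'a', 'man', 'surfing', '.', '</S>']
```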

In [2]:
with open('captiondata10k.pickle', 'rb') as f:
    [allwords, allindexes, vocabwords, vocabids, _] = pickle.load(f, encoding='latin1')

first_key = list(allwords.keys())[0]
print('feature vector: ', allwords[first_key][0])
print('caption indexes: ', allwords[first_key][1])
print('imgid: ', allindexes[first_key])
word = list(vocabwords.keys())[0]
print('word -> index: ', word, '->', vocabwords[word])
idx = list(vocabids.keys())[0]
print('index -> word: ', idx, '->', vocabids[idx])

# decode the first caption with its own key (a separate variable, so the
# vocabulary lookups above cannot change which caption is decoded)
words = [vocabids[i] for i in allwords[first_key][1]]
print('caption word: ', words)
feature vector:  [[0.23896003 0.         0.9243923  ... 0.42736265 0.5047168  1.5080156 ]]
caption indexes:  [1, 19, 1019, 7, 14, 133, 490, 81, 211, 3, 2]
imgid:  262145
word -> index:  curtained -> 4550
index -> word:  0 -> a
caption word:  ['<S>', 'people', 'shopping', 'in', 'an', 'open', 'market', 'for', 'vegetables', '.', '</S>']

LSTM cell and LSTM network

i2h is the fully connected layer from input $x_t$ to the gates.

h2h is the fully connected layer from previous hidden state $h_{t-1}$ to the gates.

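The summed i2h + h2h output is split into four parts: the input gate in_gate, the candidate memory in_transform, the forget gate forget_gate, and the output gate out_gate. Strictly speaking only three of these are gates; in_transform is the candidate content written into the cell state.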
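Written out, one step of lstmcell computes the following (dropout on $x_t$ omitted). Here $z_t^{(1)},\dots,z_t^{(4)}$ are the four equal slices of $z_t$ in the order SliceChannel returns them, so $i_t$ is in_gate, $\tilde{c}_t$ is in_transform, $f_t$ is forget_gate and $o_t$ is out_gate. Note that $h_t = o_t \odot c_t$ applies no $\tanh$ to $c_t$, unlike many LSTM implementations; this appears to match the formulation given in the Show and Tell paper.

$$
\begin{aligned}
z_t &= W_{i2h}\,x_t + b_{i2h} + W_{h2h}\,h_{t-1} + b_{h2h} \\
i_t &= \sigma\big(z_t^{(1)}\big), \quad \tilde{c}_t = \tanh\big(z_t^{(2)}\big), \quad f_t = \sigma\big(z_t^{(3)}\big), \quad o_t = \sigma\big(z_t^{(4)}\big) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot c_t
\end{aligned}
$$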

This is an illustration of an LSTM cell.

LSTM cell
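To map the cell onto plain arrays, here is a NumPy sketch of a single step in the style of lstmcell. It is illustrative only: dropout is left out, the weights are random, and the helper name lstm_step is hypothetical. It assumes weights are stored as (out, in) and applied as x @ W.T + b, which is the convention mx.sym.FullyConnected uses.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(x, h_prev, c_prev, w_i2h, b_i2h, w_h2h, b_h2h):
    """One LSTM step in the style of lstmcell, without dropout.

    Shapes: x (batch, D); h_prev, c_prev (batch, H);
    w_i2h (4H, D); w_h2h (4H, H); b_i2h, b_h2h (4H,).
    """
    gates = x @ w_i2h.T + b_i2h + h_prev @ w_h2h.T + b_h2h
    # four equal slices along the feature axis, in the same order as lstmcell
    z_in, z_cand, z_forget, z_out = np.split(gates, 4, axis=1)
    in_gate = sigmoid(z_in)
    in_transform = np.tanh(z_cand)
    forget_gate = sigmoid(z_forget)
    out_gate = sigmoid(z_out)
    next_c = forget_gate * c_prev + in_gate * in_transform
    next_h = out_gate * next_c  # no tanh on next_c, matching lstmcell
    return next_h, next_c


rs = np.random.RandomState(0)
batch, D, H = 2, 6, 3
x = rs.randn(batch, D)
h0 = np.zeros((batch, H))
c0 = np.zeros((batch, H))
w_i2h = 0.1 * rs.randn(4 * H, D)
w_h2h = 0.1 * rs.randn(4 * H, H)
b_i2h = np.zeros(4 * H)
b_h2h = np.zeros(4 * H)

h1, c1 = lstm_step(x, h0, c0, w_i2h, b_i2h, w_h2h, b_h2h)
print(h1.shape, c1.shape)  # (2, 3) (2, 3)
```

Because the four parts come from one (4H)-wide projection, a single matrix multiply per input serves all gates, which is also why lstmcell uses num_hidden * 4 in both FullyConnected layers.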

In [3]:
'''
module that defines lstm network that is used for image captioning
'''

LSTMState = namedtuple("LSTMState", ["c", "h"])
LSTMParam = namedtuple("LSTMParam", ["i2h_weight", "i2h_bias",
                                     "h2h_weight", "h2h_bias"])
LSTMModel = namedtuple("LSTMModel", ["rnn_exec", "symbol",
                                     "init_states", "last_states",
                                     "forward_state", "backward_state",
                                     "seq_data", "seq_labels", "seq_outputs",
                                     "param_blocks"])


def lstmcell(num_hidden, indata, prev_state, param, seqidx, layeridx, dropout=0.3):
    '''
    Defines an LSTM cell
    Args:
        num_hidden: number of hidden units
        indata: input data to LSTM cell
        prev_state: previous state vector
        param: parameter for this LSTM (weights and biases)
        seqidx: sequence id
        layeridx: layer index (0 - first layer, 1 - second layer). Useful for
            bi-directional LSTM
        dropout: fraction of the input that gets dropped out during training 
            time
    Returns:
        LSTMState holding the next cell state c and hidden state h
    '''
    indata = mx.sym.Dropout(data=indata, p=dropout)

    i2h = mx.sym.FullyConnected(data=indata,
                                weight=param.i2h_weight,
                                bias=param.i2h_bias,
                                num_hidden=num_hidden * 4,
                                name="t%d_l%d_i2h" % (seqidx, layeridx))
    h2h = mx.sym.FullyConnected(data=prev_state.h,
                                weight=param.h2h_weight,
                                bias=param.h2h_bias,
                                num_hidden=num_hidden * 4,
                                name="t%d_l%d_h2h" % (seqidx, layeridx))
    gates = i2h + h2h
    slice_gates = mx.sym.SliceChannel(gates, num_outputs=4,
                                      name="t%d_l%d_slice" %
                                      (seqidx, layeridx))
    in_gate = mx.sym.Activation(slice_gates[0], act_type="sigmoid")
    in_transform = mx.sym.Activation(slice_gates[1], act_type="tanh")
    forget_gate = mx.sym.Activation(slice_gates[2], act_type="sigmoid")
    out_gate = mx.sym.Activation(slice_gates[3], act_type="sigmoid")
    next_c = (forget_gate * prev_state.c) + (in_gate * in_transform)
    # h_t = o_t * c_t: no tanh is applied to the cell state here
    next_h = out_gate * next_c

    return LSTMState(c=next_c, h=next_h)

def build_lstm_network(seq_len, input_size, num_hidden, num_embed, num_label,
                       prediction=False):
    '''
    Build the LSTM network
    Args:
        seq_len: length of the sequence - number of times to unroll
        input_size: passed to mx.sym.Embedding as input_dim, i.e. the number
            of distinct word indices the embedding table can hold
        num_hidden: number of hidden units
        num_embed: output dimension for the embedding unit
        num_label: output dimension for the fully-connected unit
        prediction: True if used for prediction, False if for training
    Returns:
        tuple of (softmax output symbol, data names, label names), the form
        BucketingModule expects sym_gen to return
    '''
    embed_weight = mx.sym.Variable("embed_weight")
    cls_weight = mx.sym.Variable("cls_weight")
    cls_bias = mx.sym.Variable("cls_bias")

    # input image feature vector
    data = mx.sym.Variable('data')
    # word labels as one-hot vectors, shape (batch, seq_len, num_label)
    label = mx.sym.Variable('softmax_label')
    # word indices, shape (batch, seq_len)
    veclabel = mx.sym.Variable('veclabel')
    # veclabel = mx.sym.Reshape(veclabel, shape=(-1, seq_len)) # https://github.com/apache/incubator-mxnet/issues/7178
    
    name = 'l0'
    param = LSTMParam(i2h_weight=mx.sym.Variable(name+"_i2h_weight"),
                      i2h_bias=mx.sym.Variable(name+"_i2h_bias"),
                      h2h_weight=mx.sym.Variable(name+"_h2h_weight"),
                      h2h_bias=mx.sym.Variable(name+"_h2h_bias"))
    lstm_state = LSTMState(c=mx.sym.Variable(name+"_init_c"),
                           h=mx.sym.Variable(name+"_init_h"))
    allsm = []
    # per-time-step one-hot targets (used as the softmax labels)
    labelidx = mx.sym.SliceChannel(data=label, num_outputs=seq_len,
                                   squeeze_axis=1)
    # per-time-step word indices (used as the embedding inputs)
    labelvec = mx.sym.SliceChannel(data=veclabel, num_outputs=seq_len,
                                   squeeze_axis=1)
    output = ''
    targetlen = seq_len
    if prediction:
        # increase seq_len to generate till stop words during testing
        # it is a hack for now
        targetlen = seq_len + 10
    for seqidx in range(targetlen):
        k = seqidx
        # testing may use more than seq_len, hence reuse the last input
        # as dummy labels for softmax
        if k >= seq_len:
            k = seq_len - 1
        # first iteration use image feature as input
        if k == 0:
            hidden = data
        else:
            # if in prediction mode and not in first iteration use the
            # system output generated in previous timestep as input
            if prediction and k > 1:
                embed = mx.sym.Embedding(data=output,
                                         input_dim=input_size,
                                         weight=embed_weight,
                                         output_dim=num_embed, name='embed')
            else:
                embed = mx.sym.Embedding(data=labelvec[k-1],
                                         input_dim=input_size,
                                         weight=embed_weight,
                                         output_dim=num_embed, name='embed')
            hidden = embed
            
        next_state = lstmcell(num_hidden, indata=hidden,
                              prev_state=lstm_state,
                              param=param, seqidx=k, layeridx=0)
        hidden = next_state.h
        lstm_state = next_state
        if k == 0:
            continue
        pred = mx.sym.FullyConnected(data=hidden, num_hidden=num_label,
                                     weight=cls_weight,
                                     bias=cls_bias, name='pred')
        softmax_output = mx.sym.SoftmaxOutput(data=pred, label=labelidx[k],
                                              name='softmax')
        output = mx.sym.argmax(softmax_output, axis=1)
        allsm.append(softmax_output)

    allsm = mx.sym.Concat(*allsm, dim=1)
    softmax_output = mx.sym.reshape(allsm, shape=(-1, num_label))
    return (softmax_output,
            ['veclabel', 'l0_init_h', 'l0_init_c', 'data'],
            ['softmax_label'])
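The input/target schedule that build_lstm_network unrolls is easy to lose in the symbol code. The pure-Python sketch below mirrors its loop and lists, for each time step, what the LSTM receives and which caption position its softmax is trained against. The function name unroll_schedule and the 'word[k]' labels are hypothetical; no MXNet is needed.

```python
def unroll_schedule(seq_len, prediction=False):
    """Mirror the loop in build_lstm_network.

    Returns a list of (step, input description, target description or None).
    """
    target_len = seq_len + 10 if prediction else seq_len
    steps = []
    for seqidx in range(target_len):
        k = min(seqidx, seq_len - 1)  # steps past seq_len reuse the last label
        if k == 0:
            # step 0 feeds the image feature and produces no softmax output
            steps.append((seqidx, 'image feature', None))
            continue
        if prediction and k > 1:
            src = 'previous argmax'  # feed back the model's own prediction
        else:
            src = 'word[%d]' % (k - 1)  # teacher forcing with the true word
        steps.append((seqidx, src, 'word[%d]' % k))
    return steps


for step in unroll_schedule(4):
    print(step)
# (0, 'image feature', None)
# (1, 'word[0]', 'word[1]')
# (2, 'word[1]', 'word[2]')
# (3, 'word[2]', 'word[3]')
```

In training mode a caption of length seq_len yields seq_len - 1 softmax outputs, one per word after <S>, which is why the metric later drops the first label position.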

Bucketing in Mxnet

When training a recurrent neural network (RNN), we unroll the network in time: for a single example of length T, we unroll the network T steps. In the unrolled view, the weights are shared across time steps, and the network can be trained with ordinary backpropagation applied to the unrolled graph (backpropagation through time).
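The weight sharing in the unrolled view can be seen in a minimal NumPy sketch. It uses a plain tanh RNN as a simplification (not the LSTM used in this notebook), and the name unroll_rnn is hypothetical: the same W_x, W_h and b are applied at every one of the T steps.

```python
import numpy as np


def unroll_rnn(xs, W_x, W_h, b):
    """Unroll a simple tanh RNN over a sequence.

    xs: (T, D) inputs. The same W_x (H, D), W_h (H, H) and b (H,) are reused
    at every time step. Returns the (T, H) hidden states.
    """
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in xs:  # one unrolled step per input element
        h = np.tanh(W_x @ x_t + W_h @ h + b)  # shared weights at every step
        hs.append(h)
    return np.stack(hs)


rs = np.random.RandomState(0)
T, D, H = 5, 4, 3
xs = rs.randn(T, D)
W_x, W_h, b = rs.randn(H, D), rs.randn(H, H), np.zeros(H)
hs = unroll_rnn(xs, W_x, W_h, b)
print(hs.shape)  # (5, 3)
```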

However, sequences in a dataset have varying lengths, so in the unrolled view each example requires a different number of unrolled steps. To train on mini-batches, we would have to pad every sequence to the length of the longest example. This is wasteful because, for shorter sequences, most of the computation is done on padding.
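To put a rough number on that waste, the sketch below computes the fraction of unrolled time steps that are padding when a mini-batch is padded to its longest sequence. The caption lengths are made up and padding_fraction is a hypothetical helper.

```python
def padding_fraction(lengths):
    """Fraction of unrolled time steps spent on padding when every sequence
    in the batch is padded to the length of the longest one."""
    max_len = max(lengths)
    total_steps = max_len * len(lengths)
    real_steps = sum(lengths)
    return (total_steps - real_steps) / total_steps


# Hypothetical caption lengths in one mini-batch
lengths = [8, 10, 12, 20]
print(padding_fraction(lengths))  # 0.375
```

Here 80 steps are unrolled but only 50 carry real words, so 30 of 80 steps (37.5%) are spent on padding.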

Bucketing offers an effective solution to make minibatches out of varying-length sequences. Instead of unrolling the network to the maximum possible sequence length, we unroll multiple instances of different lengths (e.g., length 5, 10, 20, 30).

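The function default_gen_buckets generates the list of buckets from the captions in the input: it counts how many captions have each length and keeps every length that occurs at least batch_size times. The returned list contains these bucket lengths. BucketIter then places each caption in the bucket whose length equals the caption length, so no padding is needed; captions whose length is not a bucket are not used.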
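The selection rule can be reproduced in a few lines with collections.Counter. The sketch below also reports how many captions end up unused. The function name pick_buckets and the toy lengths are hypothetical.

```python
from collections import Counter


def pick_buckets(caption_lengths, batch_size):
    """Keep every caption length that occurs at least batch_size times,
    as default_gen_buckets does (empty captions are ignored).

    Returns (sorted bucket lengths, number of captions left unassigned).
    """
    counts = Counter(n for n in caption_lengths if n > 0)
    buckets = sorted(n for n, c in counts.items() if c >= batch_size)
    # captions whose length is not a bucket are never assigned to a bucket
    dropped = sum(c for n, c in counts.items() if n not in buckets)
    return buckets, dropped


lengths = [10, 10, 10, 11, 11, 12, 12, 12, 12, 25]
print(pick_buckets(lengths, batch_size=3))  # ([10, 12], 3)
```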

In [4]:
'''
module that defines bucketing data iter
'''

def default_gen_buckets(allwords, batch_size):
    '''
    Generate buckets based on data. This method generates a list of buckets
    and the length of those buckets based on the input
    Args:
        allwords: dict mapping each key to (feature vector, caption word
            indices)
        batch_size: minimum number of captions a length needs in order to
            become a bucket
    Returns:
        returns the generated buckets
    '''
    len_dict = {}
    max_len = -1
    for key in allwords:
        words = allwords[key][1]
        if len(words) == 0:
            continue
        if len(words) > max_len:
            max_len = len(words)
        if len(words) in len_dict:
            len_dict[len(words)] += 1
        else:
            len_dict[len(words)] = 1
    buckets = []
    for length, num in len_dict.items():
        if num >= batch_size:
            buckets.append(length)

    return buckets

class BucketIter(mx.io.DataIter):
    '''
    Class that defines the data iter for image captioning module
    '''
    def __init__(self, captionf, batch_size=1):
        '''
        Init function for the class
        Args:
            captionf: pickle filename that has all the captions
            batch_size: batch size for training data
        '''
        super(BucketIter, self).__init__()
        self.batch_size = batch_size

        # load datafiles, ignore the second output which is just the img ids
        # for each data element - used in case we need access to img id during
        # testing
        with open(captionf, 'rb') as f:
            [self.allwords, _, self.vocabwords, self.vocabids,
             self.unknown_id] = pickle.load(f, encoding='latin1')

        # generate buckets
        buckets = default_gen_buckets(self.allwords, batch_size)
        buckets.sort()
        self.buckets = buckets
        # use the largest bucket as the default bucket key
        self.default_bucket_key = max(buckets)

        # assign each caption to the bucket of exactly its length; captions
        # whose length is not a bucket are skipped
        self.databkt = [[] for _ in buckets]
        self.cursor = {}
        self.num_data_bkt = {}
        for idx in self.allwords:
            strs = self.allwords[idx][1]
            for i, bkt in enumerate(buckets):
                if bkt == len(strs):
                    self.databkt[i].append(idx)
                    break
        
        # initialize bucket specific parameters, the current index into 
        # the bucket and the remaining number of elements in the bucket
        for i, bkt in enumerate(buckets):
            self.cursor[i] = -1
            self.num_data_bkt[i] = len(self.databkt[i])

        # iterator variables
        self.epoch = 0
        self.bidx = np.argmax(buckets)
        self.data, self.label = self.read(self.bidx)
        self.reset()

    @property
    def bucket_key(self):
        '''
        bucket key for bucketiter module
        '''
        return self.buckets[self.bidx]

    @property
    def provide_data(self):
        """The name and shape of data provided by this iterator"""
        # res = [(k, tuple(list(self.data[k].shape[0:]))) for k in self.data]
        res = [('veclabel', tuple(list(self.data['veclabel'].shape[0:]))), 
               ('l0_init_h', tuple(list(self.data['l0_init_h'].shape[0:]))), 
               ('l0_init_c', tuple(list(self.data['l0_init_c'].shape[0:]))), 
               ('data', tuple(list(self.data['data'].shape[0:])))]
        return res

    @property
    def provide_label(self):
        """The name and shape of label provided by this iterator"""
        res = [(k, tuple(list(self.label[k].shape[0:]))) for k in self.label]
        return res

    def reset(self):
        '''
        data iter reset
        '''
        for index in self.cursor:
            self.cursor[index] = -1
        self.epoch += 1

    def next(self):
        """return one dict which contains "data" and "label" """
        if self.iter_next():
            # select one random bucket out of all the ones that has
            # > batch_size remaining samples
            rem = [i for i, _ in enumerate(self.buckets)
                   if (len(self.databkt[i])-self.cursor[i]) > self.batch_size]
            bidx = np.random.randint(0, len(rem))
            bidx = rem[bidx]
            # read the samples from the bucket
            self.data, self.label = self.read(bidx)
            # prepare as databatch to return
            res = DataBatch(provide_data=self.provide_data,
                            provide_label=self.provide_label,
                            bucket_key=self.buckets[self.bidx],
                            data=[mx.nd.array(self.data['veclabel']),
                                  mx.nd.array(self.data['l0_init_h']),
                                  mx.nd.array(self.data['l0_init_c']),
                                  mx.nd.array(self.data['data'])],
                            label=[mx.nd.array(self.label['softmax_label'])],
                            pad=0, index=None)
            return res
        else:
            raise StopIteration

    def iter_next(self):
        '''
        check if next iteration can be done
        '''
        for i, _ in enumerate(self.buckets):
            if self.cursor[i] + self.batch_size < self.num_data_bkt[i]:
                return True
        return False

    def read(self, bidx):
        '''
        read the next set of data based on bucket index
        Args:
            bidx: bucket index
        '''
        self.bidx = bidx
        data_array = []
        allimgids = []
        label = []
        labelvec = []
        index = 0
        while 1:
            self.cursor[bidx] += 1
            # obtain the feature vector
            data = self.get_data(bidx)
            # obtain the label (caption word indices)
            labels = self.get_label(bidx)
            if len(labels) == 0:
                continue
            data_array.append(data)
            # construct a one-hot vector of labels
            labela = []
            labelveca = []
            for labelidx in labels:
                labelarray = np.zeros((len(self.vocabwords)+1), dtype='int')
                labelarray[labelidx] = 1
                labela.append(labelarray)
                labelveca.append(labelidx)
            label.append(labela)
            labelvec.append(labelveca)
            index += 1
            if index > (self.batch_size-1):
                break

        darray = np.vstack(data_array)
        # this is also defined in training code - need to set the same number
        # of hidden units for the LSTM
        num_hidden = 512

        data = {}
        data['l0_init_h'] = np.zeros((darray.shape[0], num_hidden),
                                     dtype='float')
        data['l0_init_c'] = np.zeros((darray.shape[0], num_hidden),
                                     dtype='float')
        data['data'] = darray
        data['veclabel'] = np.array(labelvec)

        finallabel = {}
        finallabel['softmax_label'] = np.asarray(label)

        return (data, finallabel)

    def get_data(self, bidx):
        '''
        Returns the feature vector based on the current cursor
        and bucket index
        Args:
            bidx: bucket index
        '''
        idx = self.databkt[bidx][self.cursor[bidx]]
        return self.allwords[idx][0]

    def get_label(self, bidx):
        '''
        Returns the label vector based on the current cursor
        and bucket index
        Args:
            bidx: bucket index
        '''
        idx = self.databkt[bidx][self.cursor[bidx]]
        return self.allwords[idx][1]
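The two label arrays that read() produces are easy to mix up, so the NumPy sketch below builds them for a toy batch: veclabel holds word indices (fed to the embedding) and softmax_label holds one-hot rows of length len(vocabwords) + 1 (fed to SoftmaxOutput). The vocabulary size and captions are hypothetical.

```python
import numpy as np

vocab_size = 6  # hypothetical stand-in for len(vocabwords) + 1
# Two toy captions of the same length, as within a single bucket
captions = [[1, 4, 5, 2],
            [1, 3, 4, 2]]

# word indices, shape (batch, seq_len): what read() stores as data['veclabel']
veclabel = np.array(captions)

# one-hot rows, shape (batch, seq_len, vocab_size): label['softmax_label']
softmax_label = np.zeros(veclabel.shape + (vocab_size,), dtype='int')
for b, caption in enumerate(captions):
    for t, idx in enumerate(caption):
        softmax_label[b, t, idx] = 1

# the same one-hot array can be built by indexing an identity matrix
assert (np.eye(vocab_size, dtype='int')[veclabel] == softmax_label).all()
print(veclabel.shape, softmax_label.shape)  # (2, 4) (2, 4, 6)
```

With the shapes reported when the training iterator is loaded, (192, 16, 9955), and 8-byte integers (what dtype='int' usually maps to on 64-bit Linux), the one-hot label alone is about 192 × 16 × 9955 × 8 ≈ 245 MB per batch.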
In [5]:
'''
main code for training image captioning model
'''

DEBUG = True

def custommetric(label, pred):
    '''
    Simple metric that outputs the fraction of correct word predictions
    to the total number of words
    Args:
        label: ground truth label
        pred: predicted output
    Returns:
        accuracy metric
    '''
    # shift by one word to match prediction
    label = label[:, 1:, :]

    pred = np.reshape(pred, label.shape)
    label = np.argmax(label, axis=2)
    pred = np.argmax(pred, axis=2)
    return float(np.sum(pred == label)) / np.sum(label >= 0)
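To see how the shapes line up, the sketch below runs the metric's logic on a single toy caption. It repeats the body of custommetric so it runs on its own; the vocabulary size and word ids are made up. The label keeps all seq_len positions, while the network emits only seq_len - 1 distributions, stacked as (batch * (seq_len - 1), vocab) by the Concat and reshape in build_lstm_network, which is why the first label position is dropped.

```python
import numpy as np


def custommetric(label, pred):
    # same logic as the cell above, repeated so this sketch runs on its own
    label = label[:, 1:, :]
    pred = np.reshape(pred, label.shape)
    label = np.argmax(label, axis=2)
    pred = np.argmax(pred, axis=2)
    return float(np.sum(pred == label)) / np.sum(label >= 0)


vocab = 5
eye = np.eye(vocab)
# one caption of length 4 as one-hot rows (toy ids: <S>=1, two words, </S>=2)
label = eye[np.array([[1, 3, 4, 2]])]  # shape (1, 4, 5)
# seq_len - 1 = 3 predicted distributions; the last one is wrong
pred = eye[np.array([3, 4, 0])]        # shape (3, 5)
print(custommetric(label, pred))       # 2 of 3 words correct: 0.666...
```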
In [6]:
BATCH_SIZE = 192
NUM_HIDDEN = 512
NUM_EPOCH = 96
GENERATE_GRAPH = False
DATADIR = '.'

data_train = BucketIter(DATADIR+'/captiondata10k.pickle',
                        batch_size=BATCH_SIZE)
if DEBUG:
    print('training data loaded ....')
    print('provide data', data_train.provide_data, 'provide label', data_train.provide_label, 'default bucket key', data_train.default_bucket_key)

# look up the feature dimension by name rather than by list position
data_shapes = dict(data_train.provide_data)
INPUT_SIZE = data_shapes['data'][1]
# the image feature is the LSTM input at step 0 and shares i2h weights with
# the word embeddings, so the embedding size must equal the feature size
EMBED_SIZE = data_shapes['data'][1]

NUM_LABEL = len(data_train.vocabwords)+1
if DEBUG:
    print('input size', INPUT_SIZE, 'number label', NUM_LABEL, 'embed size', EMBED_SIZE)

data_val = BucketIter(DATADIR+'/captiondataval10k.pickle',
                      batch_size=BATCH_SIZE)
if DEBUG:
    print('validation data loaded ....')
training data loaded ....
provide data [('veclabel', (192, 16)), ('l0_init_h', (192, 512)), ('l0_init_c', (192, 512)), ('data', (192, 2048))] provide label [('softmax_label', (192, 16, 9955))] default bucket key 16
input size 2048 number label 9955 embed size 2048
validation data loaded ....
In [7]:
# the run below used GPU 3; use mx.gpu(0) or mx.cpu() to match your machine
contexts = mx.gpu(3)  # [mx.context.gpu(i) for i in range(1)]

# this is needed for the bucketing module
def sym_gen(seq_len):
    '''
    needed for bucketing module, network generated based on seq_len
    Args:
        seq_len: length of the current sequence
    Returns:
        Symbolic network
    '''
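    # NOTE: INPUT_SIZE (2048, the feature size) becomes the Embedding's
    # input_dim inside build_lstm_network, which is smaller than the
    # NUM_LABEL-word vocabulary; consider passing NUM_LABEL instead.
    # The results logged below were produced with INPUT_SIZE.
    return build_lstm_network(seq_len, INPUT_SIZE, NUM_HIDDEN, EMBED_SIZE,
                              NUM_LABEL)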

model = mx.mod.BucketingModule(sym_gen, data_train.default_bucket_key,
                               context=contexts)

model.bind(data_shapes=data_train.provide_data,
           label_shapes=data_train.provide_label)
model.init_params(initializer=mx.init.Normal(sigma=0.01))

head = '%(asctime)-15s %(message)s'
logging.basicConfig(level=logging.INFO, format=head)
In [8]:
start_time = time.time()
if not os.path.exists('models'):
    os.mkdir('models')
CHECKPOINT_NAME = 'models/imagecaption'

model.fit(data_train, data_val, num_epoch=NUM_EPOCH, optimizer='adam',
          optimizer_params={'learning_rate': 1e-3},
          eval_metric=mx.metric.CustomMetric(custommetric),
          epoch_end_callback=mx.callback.do_checkpoint(CHECKPOINT_NAME))
print('program run time', time.time() - start_time)
2019-09-07 19:48:35,546 Already bound, ignoring bind()
2019-09-07 19:48:52,736 Epoch[0] Train-custommetric=0.095028
2019-09-07 19:48:52,738 Epoch[0] Time cost=17.189
2019-09-07 19:48:52,873 Saved checkpoint to "models/imagecaption-0001.params"
2019-09-07 19:49:04,640 Epoch[0] Validation-custommetric=0.090923
2019-09-07 19:49:20,491 Epoch[1] Train-custommetric=0.100061
2019-09-07 19:49:20,494 Epoch[1] Time cost=15.853
2019-09-07 19:49:20,641 Saved checkpoint to "models/imagecaption-0002.params"
2019-09-07 19:49:31,687 Epoch[1] Validation-custommetric=0.132261
2019-09-07 19:49:46,886 Epoch[2] Train-custommetric=0.204603
2019-09-07 19:49:46,888 Epoch[2] Time cost=15.199
2019-09-07 19:49:47,042 Saved checkpoint to "models/imagecaption-0003.params"
2019-09-07 19:49:58,109 Epoch[2] Validation-custommetric=0.233189
2019-09-07 19:50:13,786 Epoch[3] Train-custommetric=0.229237
2019-09-07 19:50:13,788 Epoch[3] Time cost=15.677
2019-09-07 19:50:13,894 Saved checkpoint to "models/imagecaption-0004.params"
2019-09-07 19:50:25,476 Epoch[3] Validation-custommetric=0.232909
2019-09-07 19:50:41,643 Epoch[4] Train-custommetric=0.230587
2019-09-07 19:50:41,645 Epoch[4] Time cost=16.168
2019-09-07 19:50:41,875 Saved checkpoint to "models/imagecaption-0005.params"
2019-09-07 19:50:52,908 Epoch[4] Validation-custommetric=0.234601
2019-09-07 19:51:08,462 Epoch[5] Train-custommetric=0.258921
2019-09-07 19:51:08,464 Epoch[5] Time cost=15.554
2019-09-07 19:51:08,657 Saved checkpoint to "models/imagecaption-0006.params"
2019-09-07 19:51:20,095 Epoch[5] Validation-custommetric=0.266079
2019-09-07 19:51:35,581 Epoch[6] Train-custommetric=0.287649
2019-09-07 19:51:35,583 Epoch[6] Time cost=15.486
2019-09-07 19:51:35,669 Saved checkpoint to "models/imagecaption-0007.params"
2019-09-07 19:51:47,259 Epoch[6] Validation-custommetric=0.302300
2019-09-07 19:52:02,973 Epoch[7] Train-custommetric=0.312567
2019-09-07 19:52:02,975 Epoch[7] Time cost=15.714
2019-09-07 19:52:03,075 Saved checkpoint to "models/imagecaption-0008.params"
2019-09-07 19:52:13,864 Epoch[7] Validation-custommetric=0.313838
2019-09-07 19:52:29,513 Epoch[8] Train-custommetric=0.322670
2019-09-07 19:52:29,516 Epoch[8] Time cost=15.650
2019-09-07 19:52:29,604 Saved checkpoint to "models/imagecaption-0009.params"
2019-09-07 19:52:40,661 Epoch[8] Validation-custommetric=0.323770
2019-09-07 19:52:56,275 Epoch[9] Train-custommetric=0.335742
2019-09-07 19:52:56,277 Epoch[9] Time cost=15.614
2019-09-07 19:52:56,399 Saved checkpoint to "models/imagecaption-0010.params"
2019-09-07 19:53:07,900 Epoch[9] Validation-custommetric=0.332150
2019-09-07 19:53:23,736 Epoch[10] Train-custommetric=0.350446
2019-09-07 19:53:23,738 Epoch[10] Time cost=15.837
2019-09-07 19:53:23,854 Saved checkpoint to "models/imagecaption-0011.params"
2019-09-07 19:53:35,295 Epoch[10] Validation-custommetric=0.341101
2019-09-07 19:53:51,383 Epoch[11] Train-custommetric=0.352551
2019-09-07 19:53:51,385 Epoch[11] Time cost=16.088
2019-09-07 19:53:51,509 Saved checkpoint to "models/imagecaption-0012.params"
2019-09-07 19:54:03,082 Epoch[11] Validation-custommetric=0.345381
2019-09-07 19:54:18,882 Epoch[12] Train-custommetric=0.364300
2019-09-07 19:54:18,884 Epoch[12] Time cost=15.799
2019-09-07 19:54:19,111 Saved checkpoint to "models/imagecaption-0013.params"
2019-09-07 19:54:30,413 Epoch[12] Validation-custommetric=0.350731
2019-09-07 19:54:46,350 Epoch[13] Train-custommetric=0.373037
2019-09-07 19:54:46,351 Epoch[13] Time cost=15.937
2019-09-07 19:54:46,585 Saved checkpoint to "models/imagecaption-0014.params"
2019-09-07 19:54:57,988 Epoch[13] Validation-custommetric=0.356449
2019-09-07 19:55:14,068 Epoch[14] Train-custommetric=0.386406
2019-09-07 19:55:14,070 Epoch[14] Time cost=16.080
2019-09-07 19:55:14,154 Saved checkpoint to "models/imagecaption-0015.params"
2019-09-07 19:55:24,969 Epoch[14] Validation-custommetric=0.360338
2019-09-07 19:55:40,311 Epoch[15] Train-custommetric=0.393548
2019-09-07 19:55:40,314 Epoch[15] Time cost=15.343
2019-09-07 19:55:40,470 Saved checkpoint to "models/imagecaption-0016.params"
2019-09-07 19:55:51,205 Epoch[15] Validation-custommetric=0.366379
2019-09-07 19:56:06,235 Epoch[16] Train-custommetric=0.401331
2019-09-07 19:56:06,237 Epoch[16] Time cost=15.030
2019-09-07 19:56:06,386 Saved checkpoint to "models/imagecaption-0017.params"
2019-09-07 19:56:17,526 Epoch[16] Validation-custommetric=0.369935
2019-09-07 19:56:33,355 Epoch[17] Train-custommetric=0.417445
2019-09-07 19:56:33,357 Epoch[17] Time cost=15.830
2019-09-07 19:56:33,461 Saved checkpoint to "models/imagecaption-0018.params"
2019-09-07 19:56:44,894 Epoch[17] Validation-custommetric=0.366864
2019-09-07 19:57:00,714 Epoch[18] Train-custommetric=0.429342
2019-09-07 19:57:00,716 Epoch[18] Time cost=15.820
2019-09-07 19:57:00,823 Saved checkpoint to "models/imagecaption-0019.params"
2019-09-07 19:57:11,921 Epoch[18] Validation-custommetric=0.373586
2019-09-07 19:57:27,032 Epoch[19] Train-custommetric=0.440561
2019-09-07 19:57:27,033 Epoch[19] Time cost=15.111
2019-09-07 19:57:27,165 Saved checkpoint to "models/imagecaption-0020.params"
2019-09-07 19:57:38,219 Epoch[19] Validation-custommetric=0.376616
2019-09-07 19:57:53,211 Epoch[20] Train-custommetric=0.457049
2019-09-07 19:57:53,213 Epoch[20] Time cost=14.992
2019-09-07 19:57:53,370 Saved checkpoint to "models/imagecaption-0021.params"
2019-09-07 19:58:04,681 Epoch[20] Validation-custommetric=0.377365
2019-09-07 19:58:20,010 Epoch[21] Train-custommetric=0.474968
2019-09-07 19:58:20,012 Epoch[21] Time cost=15.329
2019-09-07 19:58:20,137 Saved checkpoint to "models/imagecaption-0022.params"
2019-09-07 19:58:31,209 Epoch[21] Validation-custommetric=0.382380
2019-09-07 19:58:46,495 Epoch[22] Train-custommetric=0.486149
2019-09-07 19:58:46,497 Epoch[22] Time cost=15.286
2019-09-07 19:58:46,634 Saved checkpoint to "models/imagecaption-0023.params"
2019-09-07 19:58:57,468 Epoch[22] Validation-custommetric=0.382378
2019-09-07 19:59:12,367 Epoch[23] Train-custommetric=0.505259
2019-09-07 19:59:12,369 Epoch[23] Time cost=14.900
2019-09-07 19:59:12,536 Saved checkpoint to "models/imagecaption-0024.params"
2019-09-07 19:59:23,405 Epoch[23] Validation-custommetric=0.376662
2019-09-07 19:59:38,200 Epoch[24] Train-custommetric=0.517577
2019-09-07 19:59:38,202 Epoch[24] Time cost=14.795
2019-09-07 19:59:38,316 Saved checkpoint to "models/imagecaption-0025.params"
2019-09-07 19:59:49,120 Epoch[24] Validation-custommetric=0.382350
2019-09-07 20:00:04,421 Epoch[25] Train-custommetric=0.531297
2019-09-07 20:00:04,423 Epoch[25] Time cost=15.302
2019-09-07 20:00:04,505 Saved checkpoint to "models/imagecaption-0026.params"
2019-09-07 20:00:15,319 Epoch[25] Validation-custommetric=0.387530
2019-09-07 20:00:30,766 Epoch[26] Train-custommetric=0.548348
2019-09-07 20:00:30,768 Epoch[26] Time cost=15.447
2019-09-07 20:00:30,845 Saved checkpoint to "models/imagecaption-0027.params"
2019-09-07 20:00:41,656 Epoch[26] Validation-custommetric=0.385637
2019-09-07 20:00:57,198 Epoch[27] Train-custommetric=0.564638
2019-09-07 20:00:57,200 Epoch[27] Time cost=15.543
2019-09-07 20:00:57,333 Saved checkpoint to "models/imagecaption-0028.params"
2019-09-07 20:01:08,392 Epoch[27] Validation-custommetric=0.383837
2019-09-07 20:01:24,132 Epoch[28] Train-custommetric=0.574772
2019-09-07 20:01:24,134 Epoch[28] Time cost=15.740
2019-09-07 20:01:24,213 Saved checkpoint to "models/imagecaption-0029.params"
2019-09-07 20:01:35,313 Epoch[28] Validation-custommetric=0.382876
2019-09-07 20:01:51,112 Epoch[29] Train-custommetric=0.595194
2019-09-07 20:01:51,113 Epoch[29] Time cost=15.798
2019-09-07 20:01:51,208 Saved checkpoint to "models/imagecaption-0030.params"
2019-09-07 20:02:02,409 Epoch[29] Validation-custommetric=0.384529
2019-09-07 20:02:17,909 Epoch[30] Train-custommetric=0.607192
2019-09-07 20:02:17,911 Epoch[30] Time cost=15.500
2019-09-07 20:02:17,991 Saved checkpoint to "models/imagecaption-0031.params"
2019-09-07 20:02:28,981 Epoch[30] Validation-custommetric=0.383481
2019-09-07 20:02:44,590 Epoch[31] Train-custommetric=0.622330
2019-09-07 20:02:44,592 Epoch[31] Time cost=15.609
2019-09-07 20:02:44,697 Saved checkpoint to "models/imagecaption-0032.params"
2019-09-07 20:02:56,030 Epoch[31] Validation-custommetric=0.384403
2019-09-07 20:03:12,138 Epoch[32] Train-custommetric=0.637195
2019-09-07 20:03:12,139 Epoch[32] Time cost=16.107
2019-09-07 20:03:12,219 Saved checkpoint to "models/imagecaption-0033.params"
2019-09-07 20:03:23,389 Epoch[32] Validation-custommetric=0.385073
2019-09-07 20:03:38,886 Epoch[33] Train-custommetric=0.650192
2019-09-07 20:03:38,889 Epoch[33] Time cost=15.498
2019-09-07 20:03:39,029 Saved checkpoint to "models/imagecaption-0034.params"
2019-09-07 20:03:50,187 Epoch[33] Validation-custommetric=0.378970
2019-09-07 20:04:05,528 Epoch[34] Train-custommetric=0.665286
2019-09-07 20:04:05,531 Epoch[34] Time cost=15.342
2019-09-07 20:04:05,649 Saved checkpoint to "models/imagecaption-0035.params"
2019-09-07 20:04:17,051 Epoch[34] Validation-custommetric=0.384219
2019-09-07 20:04:32,715 Epoch[35] Train-custommetric=0.680181
2019-09-07 20:04:32,716 Epoch[35] Time cost=15.664
2019-09-07 20:04:32,794 Saved checkpoint to "models/imagecaption-0036.params"
2019-09-07 20:04:43,872 Epoch[35] Validation-custommetric=0.385282
2019-09-07 20:04:59,651 Epoch[36] Train-custommetric=0.689246
2019-09-07 20:04:59,653 Epoch[36] Time cost=15.779
2019-09-07 20:04:59,733 Saved checkpoint to "models/imagecaption-0037.params"
2019-09-07 20:05:10,957 Epoch[36] Validation-custommetric=0.383193
2019-09-07 20:05:26,335 Epoch[37] Train-custommetric=0.703947
2019-09-07 20:05:26,337 Epoch[37] Time cost=15.377
2019-09-07 20:05:26,414 Saved checkpoint to "models/imagecaption-0038.params"
2019-09-07 20:05:37,349 Epoch[37] Validation-custommetric=0.379524
2019-09-07 20:05:53,088 Epoch[38] Train-custommetric=0.718441
2019-09-07 20:05:53,090 Epoch[38] Time cost=15.739
2019-09-07 20:05:53,191 Saved checkpoint to "models/imagecaption-0039.params"
2019-09-07 20:06:04,143 Epoch[38] Validation-custommetric=0.382489
2019-09-07 20:06:19,651 Epoch[39] Train-custommetric=0.727604
2019-09-07 20:06:19,653 Epoch[39] Time cost=15.509
2019-09-07 20:06:19,809 Saved checkpoint to "models/imagecaption-0040.params"
2019-09-07 20:06:30,921 Epoch[39] Validation-custommetric=0.382254
2019-09-07 20:06:46,263 Epoch[40] Train-custommetric=0.741024
2019-09-07 20:06:46,264 Epoch[40] Time cost=15.341
2019-09-07 20:06:46,375 Saved checkpoint to "models/imagecaption-0041.params"
2019-09-07 20:06:57,140 Epoch[40] Validation-custommetric=0.379100
2019-09-07 20:07:12,432 Epoch[41] Train-custommetric=0.750506
2019-09-07 20:07:12,433 Epoch[41] Time cost=15.292
2019-09-07 20:07:12,685 Saved checkpoint to "models/imagecaption-0042.params"
2019-09-07 20:07:23,752 Epoch[41] Validation-custommetric=0.378720
2019-09-07 20:07:39,079 Epoch[42] Train-custommetric=0.760982
2019-09-07 20:07:39,081 Epoch[42] Time cost=15.327
2019-09-07 20:07:39,232 Saved checkpoint to "models/imagecaption-0043.params"
2019-09-07 20:07:50,543 Epoch[42] Validation-custommetric=0.378371
2019-09-07 20:08:05,966 Epoch[43] Train-custommetric=0.773752
2019-09-07 20:08:05,968 Epoch[43] Time cost=15.423
2019-09-07 20:08:06,086 Saved checkpoint to "models/imagecaption-0044.params"
2019-09-07 20:08:17,386 Epoch[43] Validation-custommetric=0.381297
2019-09-07 20:08:33,131 Epoch[44] Train-custommetric=0.782632
2019-09-07 20:08:33,133 Epoch[44] Time cost=15.745
2019-09-07 20:08:33,218 Saved checkpoint to "models/imagecaption-0045.params"
2019-09-07 20:08:44,168 Epoch[44] Validation-custommetric=0.378546
2019-09-07 20:08:59,347 Epoch[45] Train-custommetric=0.792201
2019-09-07 20:08:59,348 Epoch[45] Time cost=15.178
2019-09-07 20:08:59,533 Saved checkpoint to "models/imagecaption-0046.params"
2019-09-07 20:09:10,462 Epoch[45] Validation-custommetric=0.379684
2019-09-07 20:09:25,645 Epoch[46] Train-custommetric=0.801830
2019-09-07 20:09:25,647 Epoch[46] Time cost=15.183
2019-09-07 20:09:25,761 Saved checkpoint to "models/imagecaption-0047.params"
2019-09-07 20:09:36,897 Epoch[46] Validation-custommetric=0.381090
2019-09-07 20:09:52,540 Epoch[47] Train-custommetric=0.804991
2019-09-07 20:09:52,542 Epoch[47] Time cost=15.643
2019-09-07 20:09:52,705 Saved checkpoint to "models/imagecaption-0048.params"
2019-09-07 20:10:03,926 Epoch[47] Validation-custommetric=0.375386
2019-09-07 20:10:19,224 Epoch[48] Train-custommetric=0.814888
2019-09-07 20:10:19,226 Epoch[48] Time cost=15.298
2019-09-07 20:10:19,461 Saved checkpoint to "models/imagecaption-0049.params"
2019-09-07 20:10:30,458 Epoch[48] Validation-custommetric=0.373057
2019-09-07 20:10:46,158 Epoch[49] Train-custommetric=0.819736
2019-09-07 20:10:46,159 Epoch[49] Time cost=15.699
2019-09-07 20:10:46,318 Saved checkpoint to "models/imagecaption-0050.params"
2019-09-07 20:10:57,203 Epoch[49] Validation-custommetric=0.378571
2019-09-07 20:11:12,855 Epoch[50] Train-custommetric=0.826233
2019-09-07 20:11:12,857 Epoch[50] Time cost=15.652
2019-09-07 20:11:12,996 Saved checkpoint to "models/imagecaption-0051.params"
2019-09-07 20:11:24,179 Epoch[50] Validation-custommetric=0.376114
2019-09-07 20:11:39,799 Epoch[51] Train-custommetric=0.835182
2019-09-07 20:11:39,801 Epoch[51] Time cost=15.621
2019-09-07 20:11:39,892 Saved checkpoint to "models/imagecaption-0052.params"
2019-09-07 20:11:51,080 Epoch[51] Validation-custommetric=0.371435
2019-09-07 20:12:06,632 Epoch[52] Train-custommetric=0.842510
2019-09-07 20:12:06,634 Epoch[52] Time cost=15.553
2019-09-07 20:12:06,835 Saved checkpoint to "models/imagecaption-0053.params"
2019-09-07 20:12:18,043 Epoch[52] Validation-custommetric=0.372784
2019-09-07 20:12:33,371 Epoch[53] Train-custommetric=0.848326
2019-09-07 20:12:33,372 Epoch[53] Time cost=15.327
2019-09-07 20:12:33,494 Saved checkpoint to "models/imagecaption-0054.params"
2019-09-07 20:12:44,110 Epoch[53] Validation-custommetric=0.374763
2019-09-07 20:12:59,445 Epoch[54] Train-custommetric=0.850073
2019-09-07 20:12:59,447 Epoch[54] Time cost=15.335
2019-09-07 20:12:59,597 Saved checkpoint to "models/imagecaption-0055.params"
2019-09-07 20:13:10,558 Epoch[54] Validation-custommetric=0.374695
2019-09-07 20:13:25,735 Epoch[55] Train-custommetric=0.856551
2019-09-07 20:13:25,736 Epoch[55] Time cost=15.177
2019-09-07 20:13:25,873 Saved checkpoint to "models/imagecaption-0056.params"
2019-09-07 20:13:36,684 Epoch[55] Validation-custommetric=0.376749
2019-09-07 20:13:52,010 Epoch[56] Train-custommetric=0.860259
2019-09-07 20:13:52,012 Epoch[56] Time cost=15.327
2019-09-07 20:13:52,130 Saved checkpoint to "models/imagecaption-0057.params"
2019-09-07 20:14:03,034 Epoch[56] Validation-custommetric=0.375608
2019-09-07 20:14:18,398 Epoch[57] Train-custommetric=0.862717
2019-09-07 20:14:18,400 Epoch[57] Time cost=15.363
2019-09-07 20:14:18,485 Saved checkpoint to "models/imagecaption-0058.params"
2019-09-07 20:14:29,402 Epoch[57] Validation-custommetric=0.374798
2019-09-07 20:14:45,230 Epoch[58] Train-custommetric=0.867150
2019-09-07 20:14:45,231 Epoch[58] Time cost=15.827
2019-09-07 20:14:45,390 Saved checkpoint to "models/imagecaption-0059.params"
2019-09-07 20:14:56,451 Epoch[58] Validation-custommetric=0.374385
2019-09-07 20:15:12,276 Epoch[59] Train-custommetric=0.869259
2019-09-07 20:15:12,278 Epoch[59] Time cost=15.825
2019-09-07 20:15:12,411 Saved checkpoint to "models/imagecaption-0060.params"
2019-09-07 20:15:23,589 Epoch[59] Validation-custommetric=0.372125
2019-09-07 20:15:39,022 Epoch[60] Train-custommetric=0.873282
2019-09-07 20:15:39,024 Epoch[60] Time cost=15.434
2019-09-07 20:15:39,162 Saved checkpoint to "models/imagecaption-0061.params"
2019-09-07 20:15:50,233 Epoch[60] Validation-custommetric=0.375121
2019-09-07 20:16:05,536 Epoch[61] Train-custommetric=0.876042
2019-09-07 20:16:05,538 Epoch[61] Time cost=15.303
2019-09-07 20:16:05,721 Saved checkpoint to "models/imagecaption-0062.params"
2019-09-07 20:16:16,672 Epoch[61] Validation-custommetric=0.374975
2019-09-07 20:16:32,513 Epoch[62] Train-custommetric=0.877832
2019-09-07 20:16:32,515 Epoch[62] Time cost=15.842
2019-09-07 20:16:32,635 Saved checkpoint to "models/imagecaption-0063.params"
2019-09-07 20:16:43,545 Epoch[62] Validation-custommetric=0.374516
2019-09-07 20:16:58,867 Epoch[63] Train-custommetric=0.878517
2019-09-07 20:16:58,869 Epoch[63] Time cost=15.322
2019-09-07 20:16:59,067 Saved checkpoint to "models/imagecaption-0064.params"
2019-09-07 20:17:09,999 Epoch[63] Validation-custommetric=0.371854
2019-09-07 20:17:25,759 Epoch[64] Train-custommetric=0.881772
2019-09-07 20:17:25,761 Epoch[64] Time cost=15.760
2019-09-07 20:17:25,860 Saved checkpoint to "models/imagecaption-0065.params"
2019-09-07 20:17:37,162 Epoch[64] Validation-custommetric=0.373174
2019-09-07 20:17:53,072 Epoch[65] Train-custommetric=0.883254
2019-09-07 20:17:53,074 Epoch[65] Time cost=15.909
2019-09-07 20:17:53,159 Saved checkpoint to "models/imagecaption-0066.params"
2019-09-07 20:18:04,286 Epoch[65] Validation-custommetric=0.372951
2019-09-07 20:18:19,430 Epoch[66] Train-custommetric=0.886322
2019-09-07 20:18:19,432 Epoch[66] Time cost=15.144
2019-09-07 20:18:19,665 Saved checkpoint to "models/imagecaption-0067.params"
2019-09-07 20:18:30,880 Epoch[66] Validation-custommetric=0.372463
2019-09-07 20:18:46,433 Epoch[67] Train-custommetric=0.889604
2019-09-07 20:18:46,435 Epoch[67] Time cost=15.553
2019-09-07 20:18:46,522 Saved checkpoint to "models/imagecaption-0068.params"
2019-09-07 20:18:57,734 Epoch[67] Validation-custommetric=0.372066
2019-09-07 20:19:13,166 Epoch[68] Train-custommetric=0.889882
2019-09-07 20:19:13,168 Epoch[68] Time cost=15.432
2019-09-07 20:19:13,324 Saved checkpoint to "models/imagecaption-0069.params"
2019-09-07 20:19:24,316 Epoch[68] Validation-custommetric=0.370033
2019-09-07 20:19:39,524 Epoch[69] Train-custommetric=0.891643
2019-09-07 20:19:39,526 Epoch[69] Time cost=15.208
2019-09-07 20:19:39,635 Saved checkpoint to "models/imagecaption-0070.params"
2019-09-07 20:19:50,559 Epoch[69] Validation-custommetric=0.372026
2019-09-07 20:20:06,153 Epoch[70] Train-custommetric=0.893659
2019-09-07 20:20:06,155 Epoch[70] Time cost=15.594
2019-09-07 20:20:06,288 Saved checkpoint to "models/imagecaption-0071.params"
2019-09-07 20:20:17,341 Epoch[70] Validation-custommetric=0.371216
2019-09-07 20:20:32,657 Epoch[71] Train-custommetric=0.895258
2019-09-07 20:20:32,659 Epoch[71] Time cost=15.316
2019-09-07 20:20:32,790 Saved checkpoint to "models/imagecaption-0072.params"
2019-09-07 20:20:43,631 Epoch[71] Validation-custommetric=0.370877
2019-09-07 20:20:59,098 Epoch[72] Train-custommetric=0.894700
2019-09-07 20:20:59,100 Epoch[72] Time cost=15.467
2019-09-07 20:20:59,236 Saved checkpoint to "models/imagecaption-0073.params"
2019-09-07 20:21:10,303 Epoch[72] Validation-custommetric=0.369977
2019-09-07 20:21:25,658 Epoch[73] Train-custommetric=0.898213
2019-09-07 20:21:25,660 Epoch[73] Time cost=15.355
2019-09-07 20:21:25,840 Saved checkpoint to "models/imagecaption-0074.params"
2019-09-07 20:21:36,608 Epoch[73] Validation-custommetric=0.371168
2019-09-07 20:21:51,947 Epoch[74] Train-custommetric=0.897508
2019-09-07 20:21:51,949 Epoch[74] Time cost=15.339
2019-09-07 20:21:52,168 Saved checkpoint to "models/imagecaption-0075.params"
2019-09-07 20:22:02,942 Epoch[74] Validation-custommetric=0.368863
2019-09-07 20:22:18,315 Epoch[75] Train-custommetric=0.900474
2019-09-07 20:22:18,318 Epoch[75] Time cost=15.374
2019-09-07 20:22:18,404 Saved checkpoint to "models/imagecaption-0076.params"
2019-09-07 20:22:29,397 Epoch[75] Validation-custommetric=0.370735
2019-09-07 20:22:44,714 Epoch[76] Train-custommetric=0.901295
2019-09-07 20:22:44,716 Epoch[76] Time cost=15.317
2019-09-07 20:22:44,823 Saved checkpoint to "models/imagecaption-0077.params"
2019-09-07 20:22:56,268 Epoch[76] Validation-custommetric=0.369174
2019-09-07 20:23:11,624 Epoch[77] Train-custommetric=0.901627
2019-09-07 20:23:11,626 Epoch[77] Time cost=15.356
2019-09-07 20:23:11,853 Saved checkpoint to "models/imagecaption-0078.params"
2019-09-07 20:23:23,021 Epoch[77] Validation-custommetric=0.372758
2019-09-07 20:23:38,242 Epoch[78] Train-custommetric=0.903973
2019-09-07 20:23:38,244 Epoch[78] Time cost=15.222
2019-09-07 20:23:38,403 Saved checkpoint to "models/imagecaption-0079.params"
2019-09-07 20:23:49,525 Epoch[78] Validation-custommetric=0.368674
2019-09-07 20:24:04,853 Epoch[79] Train-custommetric=0.902916
2019-09-07 20:24:04,855 Epoch[79] Time cost=15.328
2019-09-07 20:24:05,010 Saved checkpoint to "models/imagecaption-0080.params"
2019-09-07 20:24:15,989 Epoch[79] Validation-custommetric=0.369180
2019-09-07 20:24:31,604 Epoch[80] Train-custommetric=0.905322
2019-09-07 20:24:31,606 Epoch[80] Time cost=15.615
2019-09-07 20:24:31,768 Saved checkpoint to "models/imagecaption-0081.params"
2019-09-07 20:24:43,074 Epoch[80] Validation-custommetric=0.371066
2019-09-07 20:24:58,911 Epoch[81] Train-custommetric=0.903389
2019-09-07 20:24:58,913 Epoch[81] Time cost=15.837
2019-09-07 20:24:59,008 Saved checkpoint to "models/imagecaption-0082.params"
2019-09-07 20:25:10,439 Epoch[81] Validation-custommetric=0.365567
2019-09-07 20:25:26,272 Epoch[82] Train-custommetric=0.906095
2019-09-07 20:25:26,273 Epoch[82] Time cost=15.833
2019-09-07 20:25:26,434 Saved checkpoint to "models/imagecaption-0083.params"
2019-09-07 20:25:37,590 Epoch[82] Validation-custommetric=0.368598
2019-09-07 20:25:53,422 Epoch[83] Train-custommetric=0.907867
2019-09-07 20:25:53,424 Epoch[83] Time cost=15.832
2019-09-07 20:25:53,508 Saved checkpoint to "models/imagecaption-0084.params"
2019-09-07 20:26:04,709 Epoch[83] Validation-custommetric=0.368898
2019-09-07 20:26:20,330 Epoch[84] Train-custommetric=0.907167
2019-09-07 20:26:20,332 Epoch[84] Time cost=15.621
2019-09-07 20:26:20,448 Saved checkpoint to "models/imagecaption-0085.params"
2019-09-07 20:26:31,721 Epoch[84] Validation-custommetric=0.369183
2019-09-07 20:26:47,607 Epoch[85] Train-custommetric=0.908899
2019-09-07 20:26:47,609 Epoch[85] Time cost=15.886
2019-09-07 20:26:47,717 Saved checkpoint to "models/imagecaption-0086.params"
2019-09-07 20:26:58,831 Epoch[85] Validation-custommetric=0.364819
2019-09-07 20:27:14,811 Epoch[86] Train-custommetric=0.909538
2019-09-07 20:27:14,813 Epoch[86] Time cost=15.980
2019-09-07 20:27:14,905 Saved checkpoint to "models/imagecaption-0087.params"
2019-09-07 20:27:25,932 Epoch[86] Validation-custommetric=0.366772
2019-09-07 20:27:41,699 Epoch[87] Train-custommetric=0.910241
2019-09-07 20:27:41,701 Epoch[87] Time cost=15.768
2019-09-07 20:27:41,840 Saved checkpoint to "models/imagecaption-0088.params"
2019-09-07 20:27:53,030 Epoch[87] Validation-custommetric=0.366212
2019-09-07 20:28:08,433 Epoch[88] Train-custommetric=0.908940
2019-09-07 20:28:08,435 Epoch[88] Time cost=15.403
2019-09-07 20:28:08,594 Saved checkpoint to "models/imagecaption-0089.params"
2019-09-07 20:28:20,346 Epoch[88] Validation-custommetric=0.368487
2019-09-07 20:28:40,295 Epoch[89] Train-custommetric=0.909809
2019-09-07 20:28:40,297 Epoch[89] Time cost=19.947
2019-09-07 20:28:41,007 Saved checkpoint to "models/imagecaption-0090.params"
2019-09-07 20:28:54,533 Epoch[89] Validation-custommetric=0.367890
2019-09-07 20:29:14,391 Epoch[90] Train-custommetric=0.911937
2019-09-07 20:29:14,392 Epoch[90] Time cost=19.857
2019-09-07 20:29:15,287 Saved checkpoint to "models/imagecaption-0091.params"
2019-09-07 20:29:27,297 Epoch[90] Validation-custommetric=0.361660
2019-09-07 20:29:46,734 Epoch[91] Train-custommetric=0.910833
2019-09-07 20:29:46,736 Epoch[91] Time cost=19.437
2019-09-07 20:29:47,613 Saved checkpoint to "models/imagecaption-0092.params"
2019-09-07 20:30:01,546 Epoch[91] Validation-custommetric=0.363664
2019-09-07 20:30:21,616 Epoch[92] Train-custommetric=0.912488
2019-09-07 20:30:21,618 Epoch[92] Time cost=20.070
2019-09-07 20:30:22,497 Saved checkpoint to "models/imagecaption-0093.params"
2019-09-07 20:30:35,458 Epoch[92] Validation-custommetric=0.365387
2019-09-07 20:30:54,727 Epoch[93] Train-custommetric=0.913274
2019-09-07 20:30:54,731 Epoch[93] Time cost=19.269
2019-09-07 20:30:55,632 Saved checkpoint to "models/imagecaption-0094.params"
2019-09-07 20:31:09,713 Epoch[93] Validation-custommetric=0.366574
2019-09-07 20:31:29,473 Epoch[94] Train-custommetric=0.913691
2019-09-07 20:31:29,474 Epoch[94] Time cost=19.747
2019-09-07 20:31:29,608 Saved checkpoint to "models/imagecaption-0095.params"
2019-09-07 20:31:41,970 Epoch[94] Validation-custommetric=0.364529
2019-09-07 20:32:01,334 Epoch[95] Train-custommetric=0.913289
2019-09-07 20:32:01,336 Epoch[95] Time cost=19.348
2019-09-07 20:32:02,036 Saved checkpoint to "models/imagecaption-0096.params"
2019-09-07 20:32:15,758 Epoch[95] Validation-custommetric=0.366743
program run time 2620.220703601837
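In the log above, the training metric keeps climbing (about 0.43 at Epoch[18] to about 0.91 by Epoch[95]). Over the same epochs the validation metric stays between roughly 0.36 and 0.39, which suggests the model overfits well before training ends. Instead of always using the last checkpoint, one could keep the checkpoint with the best validation score. The sketch below parses log lines in the format printed above and returns the best epoch. It assumes that a higher `custommetric` is better, and the helper names are hypothetical. In this log, `Epoch[N]` is saved as checkpoint number `N+1`: Epoch[25] writes `imagecaption-0026.params`.

```python
import re

# Matches log lines such as "Epoch[25] Validation-custommetric=0.387530"
VAL_RE = re.compile(r"Epoch\[(\d+)\] Validation-custommetric=([0-9.]+)")


def best_validation_epoch(log_lines):
    """Return (epoch, score) for the highest validation score, or None.

    Assumes a higher custommetric value is better.
    """
    best = None
    for line in log_lines:
        match = VAL_RE.search(line)
        if match is None:
            continue  # skip Train, Time cost and checkpoint lines
        epoch, score = int(match.group(1)), float(match.group(2))
        if best is None or score > best[1]:
            best = (epoch, score)
    return best


def checkpoint_number(epoch):
    # In this log, Epoch[N] is written to "<prefix>-%04d.params" % (N + 1)
    return epoch + 1


log_lines = [
    '2019-09-07 20:00:04,505 Saved checkpoint to "models/imagecaption-0026.params"',
    "2019-09-07 20:00:15,319 Epoch[25] Validation-custommetric=0.387530",
    "2019-09-07 20:00:41,656 Epoch[26] Validation-custommetric=0.385637",
    "2019-09-07 20:32:15,758 Epoch[95] Validation-custommetric=0.366743",
]
epoch, score = best_validation_epoch(log_lines)
print(epoch, score, "models/imagecaption-%04d.params" % checkpoint_number(epoch))
# prints: 25 0.38753 models/imagecaption-0026.params
```

The selected checkpoint could then be loaded with `mx.model.load_checkpoint('models/imagecaption', checkpoint_number(epoch))` in place of the final epoch.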

Test the model to generate a caption

In [9]:
'''
module that generates features for images using ResNet-50
'''
import urllib.request  # `import urllib` alone does not load urllib.request

Batch = namedtuple('Batch', ['data'])


def download(url):
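    '''
    download the file at url into the working directory, skipping it if a
    file with the same name already exists
    Args:
        url: URL of the file; it is saved under its last path component
    '''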
    filename = url.split("/")[-1]
    if not os.path.exists(filename):
        urllib.request.urlretrieve(url, filename)


def get_model(prefix, epoch):
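    '''
    download the symbol file (prefix-symbol.json) and the parameter file
    (prefix-EPOCH.params, epoch zero-padded to 4 digits) of a model
    Args:
        prefix: URL prefix of the model
        epoch: epoch number of the trained model
    '''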
    download(prefix+'-symbol.json')
    download(prefix+'-%04d.params' % (epoch,))


def get_image(filename):
    '''
    return the image as a (1, 3, 224, 224) array in RGB order, after
    resizing it to 224x224 to fit the ResNet input format
    Args:
        filename: filename of the image
    '''
    img = cv2.imread(filename)  # read image in b,g,r order
    if img is None:
        raise IOError('could not read image: %s' % filename)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # change to r,g,b order
    img = cv2.resize(img, (224, 224))  # resize to 224*224 to fit model
    img = np.swapaxes(img, 0, 2)  # (height, width, channel) -> (channel, width, height)
    img = np.swapaxes(img, 1, 2)  # change to (channel, height, width)
    img = img[np.newaxis, :]  # extend to (example, channel, height, width)
    return img


class Resnet(object):
    '''
    Wraps a pretrained ResNet-50 model, truncated at its flatten layer, as an
    image feature extractor
    '''
    def __init__(self):
        '''
        Download the model from the MXNet model server and construct the
        network for prediction
        '''
        url = 'http://data.mxnet.io/models/imagenet/resnet/50-layers/resnet-50'
        get_model(url, 0)
        sym, arg_params, aux_params = mx.model.load_checkpoint('resnet-50', 0)
        all_layers = sym.get_internals()
        # stop at the flattened pooling output (a 2048-d vector) instead of the classifier
        sym3 = all_layers['flatten0_output']
        mod3 = mx.mod.Module(symbol=sym3, label_names=None, context=mx.cpu())
        mod3.bind(for_training=False, data_shapes=[('data', (1, 3, 224, 224))])
        mod3.set_params(arg_params, aux_params)
        self.mod3 = mod3

    def gen_features(self, img_path):
        '''
        generate the feature vector for an image
        Args:
            img_path: full path to the image
        Returns:
            numpy array of shape (1, 2048)
        '''
        img = get_image(img_path)
        self.mod3.forward(Batch([mx.nd.array(img)]))
        return self.mod3.get_outputs()[0].asnumpy()


def get_feature(imgfname):
    '''
    Returns the feature vector for the image. Note that this builds a new
    Resnet (and loads its weights) on every call.
    Args:
        imgfname: full path to the image
    '''
    network = Resnet()
    return network.gen_features(imgfname)
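`get_image` turns OpenCV's (height, width, channel) array into the (batch, channel, height, width) layout that the ResNet-50 `data` input is bound to, (1, 3, 224, 224) in the `Resnet` class. Its two `np.swapaxes` calls amount to a single transpose. The NumPy-only check below illustrates that equivalence on a random stand-in image rather than a real photo.

```python
import numpy as np

# Random stand-in for a resized RGB image: (height, width, channel)
rng = np.random.default_rng(0)
img_hwc = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)

# The two-step swap used in get_image
swapped = np.swapaxes(img_hwc, 0, 2)   # (channel, width, height)
swapped = np.swapaxes(swapped, 1, 2)   # (channel, height, width)

# Equivalent single transpose
transposed = np.transpose(img_hwc, (2, 0, 1))

batch = transposed[np.newaxis, :]      # (1, channel, height, width)
print(batch.shape)                     # (1, 3, 224, 224)
print(np.array_equal(swapped, transposed))  # True
```

The image is square, so the shape alone would not catch a height/width mix-up. The element-wise `np.array_equal` comparison on random pixel values does.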
In [10]:
'''
generate captions for an image
'''

%matplotlib inline
NUM_LSTM_LAYER = 1
BATCH_SIZE = 1

! wget -nc https://www.dropbox.com/s/41kfb3ezigssa9q/testimage.jpg?dl=1 --output-document testimage.jpg
imgfname = "testimage.jpg"

SEQ_LEN = 25

sym, arg_params, aux_params = \
    mx.model.load_checkpoint(CHECKPOINT_NAME, NUM_EPOCH)

NUM_HIDDEN = arg_params['l0_h2h_weight'].shape[1]
INPUT_SIZE = arg_params['l0_h2h_weight'].shape[0]
NUM_LABEL = arg_params['cls_weight'].shape[0]
sym, _, _ = build_lstm_network(SEQ_LEN, INPUT_SIZE, NUM_HIDDEN,
                               INPUT_SIZE, NUM_LABEL, prediction=True)

init_c = [('l%d_init_c' % l, (BATCH_SIZE, NUM_HIDDEN))
          for l in range(NUM_LSTM_LAYER)]
init_h = [('l%d_init_h' % l, (BATCH_SIZE, NUM_HIDDEN))
          for l in range(NUM_LSTM_LAYER)]
data_shape = [("data", (BATCH_SIZE, 2048))]
label_shape = [("veclabel",
                (BATCH_SIZE, SEQ_LEN, ))]
label_shape1 = [("softmax_label",
                 (BATCH_SIZE, SEQ_LEN, NUM_LABEL))]

f = get_feature(imgfname)
input_data = mx.nd.array(f)

veclabel = mx.nd.zeros((BATCH_SIZE, SEQ_LEN))
veclabel[0][0] = 0
input_shapes = dict(init_c+init_h+data_shape+label_shape+label_shape1)

executor = sym.simple_bind(ctx=mx.gpu(), **input_shapes)

for key in executor.arg_dict.keys():
    if key in arg_params:
        arg_params[key].copyto(executor.arg_dict[key])

state_name = []
for i in range(NUM_LSTM_LAYER):
    state_name.append("l%d_init_c" % i)
states_dict = dict(zip(state_name, executor.outputs[1:]))
input_arr = mx.nd.zeros(data_shape[0][1])

for key in states_dict.keys():
    executor.arg_dict[key][:] = 0.

input_data.copyto(executor.arg_dict["data"])
veclabel.copyto(executor.arg_dict["veclabel"])

executor.forward()

for key in states_dict.keys():
    states_dict[key].copyto(executor.arg_dict[key])

prob = executor.outputs[0].asnumpy()

img = cv2.imread(imgfname)[:,:,::-1]
plt.imshow(img)
# [_, _, _, vocab, _]  = pickle.load(open(VOCABF, 'r'))
for index in range(0, BATCH_SIZE):
    # (batch, time step, vocabulary) probabilities -> most likely word id per step
    p = np.reshape(prob, (-1, SEQ_LEN+9, len(data_train.vocabwords)+1))
    p = np.argmax(p, axis=2)[index, :]
    str1 = ''
    for i in p:
        if i == 2:  # id 2 ends the sentence
            break
        str1 = str1 + data_train.vocabids[i] + ' '
    print(str1)
--2019-09-07 20:32:16--  https://www.dropbox.com/s/41kfb3ezigssa9q/testimage.jpg?dl=1
Resolving www.dropbox.com (www.dropbox.com)... 162.125.82.1, 2620:100:6032:1::a27d:5201
Connecting to www.dropbox.com (www.dropbox.com)|162.125.82.1|:443... connected.
HTTP request sent, awaiting response... 301 Moved Permanently
Location: /s/dl/41kfb3ezigssa9q/testimage.jpg [following]
--2019-09-07 20:32:16--  https://www.dropbox.com/s/dl/41kfb3ezigssa9q/testimage.jpg
Reusing existing connection to www.dropbox.com:443.
HTTP request sent, awaiting response... 302 Found
Location: https://uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com/cd/0/get/AoFTMraInSdnB6_4Iu5UayD1ciRduP4srL9EKflu0I0iDh_xQTHM0D2B0s8RHIVBgbXSgPc_e5zu5MW5jo7OFWkWN_RZn4y0eOSdsfvKclHlpQ/file?dl=1# [following]
--2019-09-07 20:32:16--  https://uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com/cd/0/get/AoFTMraInSdnB6_4Iu5UayD1ciRduP4srL9EKflu0I0iDh_xQTHM0D2B0s8RHIVBgbXSgPc_e5zu5MW5jo7OFWkWN_RZn4y0eOSdsfvKclHlpQ/file?dl=1
Resolving uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com (uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com)... 162.125.82.6, 2620:100:6032:6::a27d:5206
Connecting to uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com (uc753a5d9ed24c2da5ace20c6c3a.dl.dropboxusercontent.com)|162.125.82.6|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 200245 (196K) [application/binary]
Saving to: ‘testimage.jpg’

testimage.jpg       100%[===================>] 195.55K   468KB/s    in 0.4s    

2019-09-07 20:32:17 (468 KB/s) - ‘testimage.jpg’ saved [200245/200245]

man in a cap is riding a surfboard in the ocean . 
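The final loop of the cell performs greedy decoding. At each time step it keeps the single most probable word id, and it stops at id 2, which this code treats as the end-of-sentence marker. The NumPy sketch below isolates that step. The toy vocabulary, probabilities, and the `greedy_decode` helper are hypothetical. A beam search, a common alternative, would instead keep several candidate sentences at each step.

```python
import numpy as np

END_ID = 2  # id the notebook's loop treats as end-of-sentence


def greedy_decode(prob, id_to_word, seq_len, end_id=END_ID):
    """Pick the argmax word at every step and stop at end_id.

    prob: per-step word probabilities for one sentence, reshaped here
          to (seq_len, vocab_size).
    id_to_word: mapping from word id to word string.
    """
    steps = np.reshape(prob, (seq_len, -1))
    words = []
    for word_id in np.argmax(steps, axis=1):
        if word_id == end_id:
            break
        words.append(id_to_word[int(word_id)])
    return " ".join(words)


# Toy vocabulary and probabilities (hypothetical)
vocab = {0: "<start>", 1: "<unk>", 2: "<end>", 3: "a", 4: "man", 5: "surfing"}
prob = np.full((5, 6), 0.01)
for step, word_id in enumerate([3, 4, 5, 2, 4]):
    prob[step, word_id] = 0.9
print(greedy_decode(prob, vocab, seq_len=5))  # a man surfing
```

Words predicted after the end marker (here "man" at the last step) are ignored, just as the notebook's `break` does.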