The GoogLeNet image classification network has two auxiliary outputs connected to some of its intermediate layers during training. As the GoogLeNet paper puts it: "By adding auxiliary classifiers connected to these intermediate layers, we would expect to encourage discrimination in the lower stages in the classifier, increase the gradient signal that gets propagated back, and provide additional regularization. [...] During training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers were weighted by 0.3)."

[Figure: one of the intermediate outputs]
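For context, here's a minimal sketch of what such an auxiliary head could look like in Keras. The layer sizes follow the paper (5x5 average pooling with stride 3, a 1x1 convolution with 128 filters, a 1024-unit fully connected layer, 70% dropout, and a softmax classifier); the x tensor and num_classes are placeholders for illustration, not code from the actual model:

from keras.layers import AveragePooling2D, Conv2D, Flatten, Dense, Dropout

def aux_head(x, num_classes, name):
    # auxiliary classifier head, roughly as described in the paper;
    # `x` is an intermediate feature map of the network
    a = AveragePooling2D(pool_size=(5, 5), strides=3)(x)
    a = Conv2D(128, (1, 1), padding='same', activation='relu')(a)
    a = Flatten()(a)
    a = Dense(1024, activation='relu')(a)
    a = Dropout(0.7)(a)
    return Dense(num_classes, activation='softmax', name=name)(a)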

Initial implementation

To reflect this structure in the model, I added both of those auxiliary outputs to the output list (as one should):

[...]
model = Model(inputs = X_input, outputs = [main, aux1, aux2])
model.compile(loss='categorical_crossentropy', 
              loss_weights={'main': 1.0, 'aux1': 0.3, 'aux2': 0.3},
              optimizer='sgd', metrics=['accuracy'])
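Note that the keys in the loss_weights dictionary refer to the names of the output layers, so main, aux1 and aux2 have to be created with name='main', name='aux1' and name='aux2' respectively. With that in place, Keras minimizes the weighted sum of the individual losses, which matches the paper's description:

total_loss = 1.0 * main_loss + 0.3 * aux1_loss + 0.3 * aux2_loss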

Then comes the part that provides the data and trains the model:

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set train/validation split

train_generator = train_datagen.flow_from_directory(
    image_dir,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical',
    subset='training')

validation_generator = train_datagen.flow_from_directory(
    image_dir,
    target_size=(image_size, image_size),
    batch_size=batch_size,
    class_mode='categorical',
    subset='validation')

model.fit_generator(
    train_generator,
    steps_per_epoch = train_samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_samples // batch_size,
    epochs = epoch_num)

However, running this code produces an error, since flow_from_directory yields a single label array per batch while the model now expects three targets:

ValueError: Error when checking model target: the list of Numpy arrays that you are passing to your model is not the size the model expected. Expected to see 3 array(s), but instead got the following list of 1 arrays: [array([[0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 0., 0., 0....

How to fix this

Fortunately, it's possible to provide a custom generator to the fit_generator method. The Keras documentation has a small example on that, but what exactly should we yield as our inputs/outputs? And how can we keep using the ImageDataGenerator that conveniently handles reading images and splitting them into train/validation sets for us?

It turns out the generator has a next() method which does exactly what you'd expect: it returns a tuple with the next batch of images and labels. The images serve as the input, and the labels (a batch of one-hot vectors) provide the ground truth for the model. Since GoogLeNet has 3 softmax layers that each output a guessed category, we need to yield the same ground truth 3 times, so that each of them has something to compare its guess against. Here is what that looks like:

train_datagen = ImageDataGenerator(rescale=1./255,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2) # set train/validation split

# custom generator
def multiple_outputs(generator, image_dir, batch_size, image_size, subset):
    gen = generator.flow_from_directory(
        image_dir,
        target_size=(image_size, image_size),
        batch_size=batch_size,
        class_mode='categorical',
        subset=subset)
    
    while True:
        gnext = gen.next()
        # yield the image batch and 3 copies of the label batch (one per output)
        yield gnext[0], [gnext[1], gnext[1], gnext[1]]
        
train_generator = multiple_outputs(
    train_datagen,
    image_dir=image_dir,
    batch_size=batch_size,
    image_size=image_size,
    subset='training')
     
validation_generator = multiple_outputs(
    train_datagen,
    image_dir=image_dir,
    batch_size=batch_size,
    image_size=image_size,
    subset='validation')

model.fit_generator(
    train_generator,
    steps_per_epoch = train_samples // batch_size,
    validation_data = validation_generator, 
    validation_steps = validation_samples // batch_size,
    epochs = epoch_num)
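As a side note, since the output layers are named, the custom generator could just as well yield a dictionary mapping output names to label arrays, which Keras also accepts and which is harder to get wrong when a model has many outputs. A sketch of the changed loop, using the same names as above:

    while True:
        gnext = gen.next()
        # same labels for every output, keyed by output layer name
        yield gnext[0], {'main': gnext[1],
                         'aux1': gnext[1],
                         'aux2': gnext[1]}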

I imagine you can use the same principle if your model requires several inputs, or inputs/outputs of different shapes, and so on.
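For instance, here is a sketch of a generator for a hypothetical model with two inputs and one output; the second input and the make_extra_features helper are made up purely for illustration:

def multiple_inputs(generator, image_dir, batch_size, image_size, subset):
    gen = generator.flow_from_directory(
        image_dir,
        target_size=(image_size, image_size),
        batch_size=batch_size,
        class_mode='categorical',
        subset=subset)

    while True:
        images, labels = gen.next()
        # a made-up second input derived from the image batch
        extra = make_extra_features(images)
        # inputs go in a list (or a dict keyed by input layer names)
        yield [images, extra], labels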

Thanks for reading.