The MNIST database is a classic machine learning problem: given thousands of 28x28 greyscale images of handwritten digits along with their corresponding labels, create a model that predicts the digits drawn as accurately as possible. In this post, I’ll run through the process that got me to 0.9971 accuracy (~top 200 submissions).

Preparing Data

First, we have to prepare the data.

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from tensorflow.keras.utils import to_categorical

    train = pd.read_csv("../input/digit-recognizer/train.csv")
    test = pd.read_csv("../input/digit-recognizer/test.csv")

    y_train = train["label"]
    X_train = train.drop(labels=["label"], axis=1)

    # Normalize pixel values from [0, 255] to [0, 1]
    X_train = X_train / 255.0
    test = test / 255.0

    # Reshape each flattened row into a 28x28 image with one channel
    X_train = X_train.values.reshape(-1, 28, 28, 1)
    test = test.values.reshape(-1, 28, 28, 1)

    # One-hot encode the labels
    y_train = to_categorical(y_train, num_classes=10)

    # Split into training and validation sets (10% held out)
    X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1)

We normalize the greyscale pixel values from [0, 255] to [0, 1], reshape the “rolled out” 784-pixel rows into 28x28x1 images (since the images are greyscale, we only have one channel), one-hot encode the labels, and split off 10% of our training data for validation.
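To make the shapes concrete, here is a minimal sketch of the same normalize, reshape, and one-hot steps on a tiny synthetic array. It assumes only NumPy; the random `pixels` and hand-picked `labels` are stand-ins for the real CSV data, and `np.eye` is used here in place of `to_categorical`.

```python
import numpy as np

# Synthetic stand-in for the CSV data: 5 flattened 28x28 images with labels.
rng = np.random.default_rng(0)
pixels = rng.integers(0, 256, size=(5, 784))  # integer values in [0, 255]
labels = np.array([3, 0, 9, 3, 7])

# Normalize to [0, 1] and restore the image shape with a single channel.
images = (pixels / 255.0).reshape(-1, 28, 28, 1)

# One-hot encode: row i of the 10x10 identity matrix encodes digit i.
one_hot = np.eye(10)[labels]

print(images.shape)   # (5, 28, 28, 1)
print(one_hot.shape)  # (5, 10)
print(one_hot[0])     # 1.0 at index 3, zeros elsewhere
```

The `-1` in `reshape` lets NumPy infer the number of images, which is why the same call works for both the training and test sets.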

Creating the Model

For the first run-through, we’ll use a standard, barebones Conv2D -> Batch Normalization -> ReLU -> MaxPool2D -> Flatten -> Dense architecture. Here’s what it looks like in Keras:

    from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Dense, Flatten, Input, MaxPool2D
    from tensorflow.keras.models import Model

    X_input = Input(shape=(28, 28, 1))

    X = Conv2D(filters=32, kernel_size=(5, 5), padding='same')(X_input)
    X = BatchNormalization(axis=3)(X)
    X = Activation('relu')(X)
    X = MaxPool2D(pool_size=(2, 2))(X)
    X = Flatten()(X)
    X = Dense(units=10, activation='softmax')(X)

    model = Model(inputs=X_input, outputs=X)
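As a sanity check on this architecture, the layer shapes and parameter counts can be worked out by hand. The sketch below is plain Python arithmetic rather than anything Keras runs; the helper names are made up for illustration, and it assumes Keras defaults (a bias on every layer, and four parameters per channel for batch normalization).

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_h * kernel_w * in_channels * filters + filters


def batch_norm_params(channels):
    """gamma, beta, moving mean, and moving variance for each channel."""
    return 4 * channels


def dense_params(inputs, units):
    """Weight matrix plus one bias per unit."""
    return inputs * units + units


# 'same' padding keeps the 28x28 spatial size; 2x2 pooling halves it.
height, width, channels = 28, 28, 32
pooled_h, pooled_w = height // 2, width // 2   # 14 x 14
flattened = pooled_h * pooled_w * channels     # 6272 values fed to Dense

total = (
    conv2d_params(5, 5, 1, 32)       # 832
    + batch_norm_params(32)          # 128
    + dense_params(flattened, 10)    # 62730
)
print(flattened, total)  # 6272 63690
```

Almost all of the parameters sit in the final dense layer, which is one reason adding more convolution and pooling stages before flattening is a natural next step.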

We’ll use RMSprop for our optimizer and categorical cross-entropy for loss. In addition, we’ll use an annealer that halves the learning rate whenever the validation loss plateaus for 3 epochs, down to a floor of 0.0001.

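    from tensorflow.keras.callbacks import ReduceLROnPlateau
    from tensorflow.keras.optimizers import RMSprop

    optimizer = RMSprop()
    reduce_lr = ReduceLROnPlateau(monitor='val_loss', patience=3, verbose=1, factor=0.5, min_lr=0.0001)
    # Pass the optimizer object created above rather than the string name
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])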
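To get a feel for how the annealer behaves, here is a simplified pure-Python simulation of the plateau rule. It is a sketch, not the Keras implementation: `simulate_reduce_lr_on_plateau` is a hypothetical helper, it ignores the callback's cooldown option, and it assumes RMSprop's default starting learning rate of 0.001 and the callback's default `min_delta` of 1e-4.

```python
def simulate_reduce_lr_on_plateau(val_losses, initial_lr=0.001, factor=0.5,
                                  patience=3, min_lr=0.0001, min_delta=1e-4):
    """Return the learning rate in effect after each epoch (simplified rule)."""
    lr = initial_lr
    best = float("inf")
    wait = 0
    history = []
    for loss in val_losses:
        if loss < best - min_delta:
            # Validation loss improved: remember it and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience and lr > min_lr:
                # Plateau detected: shrink the rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history


# Loss improves once, then stalls for three epochs before improving again.
history = simulate_reduce_lr_on_plateau([0.10, 0.08, 0.08, 0.08, 0.08, 0.07])
print(history)  # the rate halves to 0.0005 after the third stagnant epoch
```

With these settings the rate can only be halved a few times (0.001, 0.0005, 0.00025, 0.000125) before it is clamped at the 0.0001 floor.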

Finally, we can fit our model and prepare our submission. Here, I’m using 40 epochs with a batch size of 50.

    model.fit(X_train, y_train, epochs=40, batch_size=50,
              callbacks=[reduce_lr], validation_data=(X_val, y_val))
    predictions = model.predict(test)
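Note that `model.predict` returns one row of 10 class probabilities per image, so they still need to be turned into digit labels before submitting. A minimal sketch of that step follows, assuming the submission format is a CSV with `ImageId` (starting at 1) and `Label` columns; `to_submission_rows` and the fake probabilities are hypothetical stand-ins so the snippet runs without a trained model.

```python
import csv
import io

import numpy as np


def to_submission_rows(probabilities):
    """Pick the most likely digit per image and pair it with a 1-based id."""
    labels = np.argmax(probabilities, axis=1)
    return [(image_id, int(label)) for image_id, label in enumerate(labels, start=1)]


# Fake softmax output for three images whose most likely digits are 1, 9, 4.
fake_predictions = np.eye(10)[[1, 9, 4]]
rows = to_submission_rows(fake_predictions)

# Write to an in-memory buffer here; a real run would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ImageId", "Label"])
writer.writerows(rows)
print(buffer.getvalue())
```

In the actual notebook, `fake_predictions` would be replaced by `predictions`, and the buffer by a file opened for writing.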

With a pretty barebones CNN implementation, our submission ends up with an accuracy of 0.982.

Improvements

The first big addition we can make is data augmentation. By applying small random transformations to the training images, we can artificially expand our dataset and train a more accurate model. We have to be careful not to change which digit an image shows, though (e.g., a flipped 5 is no longer a 5). Here are the transformations I made:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        shear_range=0.1,
        rotation_range=15,
        zoom_range=0.1,
        width_shift_range=0.1,
        height_shift_range=0.1
    )
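To illustrate what a shift range of 0.1 means on a 28x28 image, here is a small NumPy sketch of a random translation. This is not how `ImageDataGenerator` works internally (it applies affine transforms and, by default, fills new pixels from the nearest edge); `random_shift` is a hypothetical helper that shifts by whole pixels and fills with zeros.

```python
import numpy as np


def random_shift(image, max_fraction, rng):
    """Shift a (H, W) image by up to max_fraction of its size, zero-filled."""
    height, width = image.shape
    max_dy = int(height * max_fraction)  # 2 pixels for a 28-pixel side
    max_dx = int(width * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))
    dx = int(rng.integers(-max_dx, max_dx + 1))

    shifted = np.zeros_like(image)
    # Copy the region of the source that is still in frame after the shift.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[9:19, 9:19] = 1.0  # a centered blob standing in for a digit

augmented = random_shift(digit, max_fraction=0.1, rng=rng)
print(augmented.shape)                 # (28, 28)
print(augmented.sum() == digit.sum())  # True: the blob moved but stayed in frame
```

A shift this small moves the digit without changing what it is, which is the property every transformation in the generator needs to preserve.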

Next, we can improve our model. Here, I add 3 more convolutional layers and 1 more fully connected layer, and cut down on overfitting with batch normalization and dropout, tuned mostly through trial and error. This makes my new model [[Conv2D -> Batch Normalization -> ReLU] * 2 -> MaxPool2D -> Dropout] * 2 -> Flatten -> Dense -> Batch Normalization -> Dropout -> Dense.

    from tensorflow.keras.layers import Activation, BatchNormalization, Conv2D, Dense, Dropout, Flatten, Input, MaxPool2D
    from tensorflow.keras.models import Model

    X_input = Input(shape=(28, 28, 1))

    # First block: two 5x5 convolutions with 32 filters
    X = Conv2D(filters=32, kernel_size=(5, 5), padding='same')(X_input)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Conv2D(filters=32, kernel_size=(5, 5), padding='same')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)

    X = MaxPool2D(pool_size=(2, 2))(X)
    X = Dropout(0.1)(X)

    # Second block: two 3x3 convolutions with 64 filters
    X = Conv2D(filters=64, kernel_size=(3, 3), padding='same')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)
    X = Conv2D(filters=64, kernel_size=(3, 3), padding='same')(X)
    X = BatchNormalization()(X)
    X = Activation('relu')(X)

    X = MaxPool2D(pool_size=(2, 2), strides=(2, 2))(X)
    X = Dropout(0.1)(X)

    # Classifier head
    X = Flatten()(X)
    X = Dense(units=256, activation='relu')(X)
    X = BatchNormalization()(X)
    X = Dropout(0.2)(X)
    X = Dense(units=10, activation='softmax')(X)

    model = Model(inputs=X_input, outputs=X)
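For intuition on what the `Dropout` layers do during training, here is a minimal NumPy sketch of inverted dropout: a random fraction of activations is zeroed and the survivors are scaled up by 1 / (1 - rate) so the expected value is unchanged. `inverted_dropout` is an illustrative helper, not the Keras layer, and the all-ones input is made up for the demonstration.

```python
import numpy as np


def inverted_dropout(activations, rate, rng):
    """Zero roughly `rate` of the entries and rescale the rest."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((1000, 256))  # stand-in for the 256-unit dense output

dropped = inverted_dropout(activations, rate=0.2, rng=rng)
print((dropped == 0).mean())  # close to 0.2
print(dropped.mean())         # close to 1.0
```

At inference time dropout is switched off, so predictions use every unit; the rescaling during training is what keeps the two modes on the same scale.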

With these two changes, my submission ends up at a much-improved score of 0.99714 (~top 200 submissions).

You can view my full notebook here: