Implementation of MNIST Handwritten Number Recognition

MNIST Data Set
MNIST is a simple handwritten digit recognition data set consisting of 70,000 grayscale pictures of 28 x 28 pixels (low resolution). Each picture shows a single digit from 0 to 9. The task is to classify the pictures into 10 categories according to the digit they show.
The data set is officially hosted on Professor Yann LeCun's website and consists of the following four files:
  train-images-idx3-ubyte.gz: training set images (9912422 bytes)
  train-labels-idx1-ubyte.gz: training set labels (28881 bytes)
  t10k-images-idx3-ubyte.gz: test set images (1648877 bytes)
  t10k-labels-idx1-ubyte.gz: test set labels (4542 bytes)
These are, in order, the training image set, the training label set, the test image set and the test label set. They are not ordinary text or picture files but gzip-compressed files; downloading and decompressing them yields binary files. The training set contains 60,000 pictures and the test set contains 10,000 pictures.
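The binary layout of the image files can be illustrated with a short sketch. It assumes the IDX3 layout used by the MNIST image files: a 16-byte header of four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by one unsigned byte per pixel. The function name parse_idx_images and the in-memory stand-in file are hypothetical and only serve the illustration.

```python
import gzip
import struct

def parse_idx_images(raw_bytes):
    """Parse decompressed IDX3 image bytes into (count, rows, cols, pixels)."""
    # Header: four big-endian 32-bit unsigned integers.
    magic, count, rows, cols = struct.unpack(">IIII", raw_bytes[:16])
    if magic != 2051:  # 0x00000803: unsigned-byte data with 3 dimensions
        raise ValueError("not an IDX3 image file: magic=%d" % magic)
    pixels = raw_bytes[16:]
    if len(pixels) != count * rows * cols:
        raise ValueError("payload size does not match the header")
    return count, rows, cols, pixels

# Hypothetical stand-in for a downloaded .gz file: two blank 28x28 images.
fake_file = gzip.compress(struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28))

count, rows, cols, pixels = parse_idx_images(gzip.decompress(fake_file))
print(count, rows, cols, len(pixels))
```

In practice read_data_sets() performs this parsing itself, so the sketch is only meant to show what "binary file" means here.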

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
# mnist = input_data.read_data_sets("/home/jiangziyang/MNIST_data",one_hot=True)
mnist = input_data.read_data_sets("/mnt/projects/tflearn/chapter6/MNIST_data",one_hot=True)

batch_size = 100                #Set the batch size for each round of training
learning_rate = 0.8             #learning rate
learning_rate_decay = 0.999     #Attenuation of learning rate
max_steps = 30000               #Maximum training steps

#Define the variable that stores the number of training steps. When training a neural network with TensorFlow,
#the variable that counts the training steps is usually marked as non-trainable (trainable=False).
training_step = tf.Variable(0, trainable=False)

#Define the forward propagation of the hidden layer and the output layer, using relu() as the hidden-layer activation function.
def hidden_layer(input_tensor,weights1,biases1,weights2,biases2,layer_name):
    layer1 = tf.nn.relu(tf.matmul(input_tensor,weights1)+biases1)
    return tf.matmul(layer1,weights2)+biases2

x = tf.placeholder(tf.float32,[None,784],name="x-input")   #INPUT_NODE=784
y_ = tf.placeholder(tf.float32,[None,10],name="y-output")   #OUT_PUT=10
#Generate hidden layer parameters, where weights1 contains 784x500=392000 parameters
weights1 = tf.Variable(tf.truncated_normal([784, 500], stddev=0.1))
biases1 = tf.Variable(tf.constant(0.1,shape=[500]))
#Generate output layer parameters, where weights2 contains 500x10=5000 parameters
weights2 = tf.Variable(tf.truncated_normal([500, 10], stddev=0.1))
biases2 = tf.Variable(tf.constant(0.1, shape=[10]))

#Compute y by forward propagation of the neural network. The moving averages are not used here.
y = hidden_layer(x,weights1,biases1,weights2,biases2,'y')

#Initialize the moving average class with a decay rate of 0.99.
#To make the model update faster in the early stage of training, the num_updates parameter is provided here
#and set to the current number of training steps.
averages_class = tf.train.ExponentialMovingAverage(0.99,training_step)
#Define the operation that updates the moving averages; apply() of the moving average class takes the list of variables to track.
#trainable_variables() returns the elements of the GraphKeys.TRAINABLE_VARIABLES collection of the graph,
#i.e. all variables that were not created with trainable=False.
averages_op = averages_class.apply(tf.trainable_variables())
#Compute the forward propagation again, this time with the moving averages. Remember that each moving average is only a shadow variable.
average_y = hidden_layer(x,averages_class.average(weights1),averages_class.average(biases1),
                         averages_class.average(weights2),averages_class.average(biases2),'average_y')

#The prototype of the function that computes the cross-entropy loss is sparse_softmax_cross_entropy_with_logits(_sentinel, labels, logits, name).
#It computes the same loss as softmax_cross_entropy_with_logits() and is suitable when the categories are mutually exclusive,
#that is, when a picture can belong to only one category. As of TensorFlow 1.0.0 this function can only be called with named arguments.
#The logits parameter is the forward propagation result of the network without the softmax layer; the labels parameter gives the correct answers of the training data.
cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y,labels=tf.argmax(y_, 1))
#The prototype of argmax() is argmax(input, axis, name, dimension); it returns the index of the maximum value along an axis.
#For an input such as y, a two-dimensional array of batch_size x 10 (batch_size rows, 10 columns), each row is the forward propagation result of one sample.
#The axis parameter "1" means the maximum is taken along dimension 1, that is, the index of the maximum value is selected within each row.
#The result is a one-dimensional array of length batch_size whose values are the digit assigned to each sample.
#Applied to the one-hot labels y_, as above, it yields the index of the correct digit for each sample.

regularizer = tf.contrib.layers.l2_regularizer(0.0001)       #L2 regularization function
regularization = regularizer(weights1)+regularizer(weights2) #Regularization loss of the model
loss = tf.reduce_mean(cross_entropy)+regularization          #Total loss

#Set the learning rate with exponential decay; the staircase parameter keeps its default False, i.e. the learning rate decays continuously.
learning_rate = tf.train.exponential_decay(learning_rate,training_step,mnist.train.num_examples/batch_size,
                                           learning_rate_decay)
#Use the gradient descent optimizer to minimize the sum of the cross-entropy loss and the regularization loss
train_step= tf.train.GradientDescentOptimizer(learning_rate).minimize(loss,global_step=training_step)

#Each training step must both update the network parameters through backpropagation
#and update the moving average of each parameter. control_dependencies() is used to run both operations in one step.
# The same can be done with the following line of code:
# train_op = tf.group(train_step,averages_op)
with tf.control_dependencies([train_step,averages_op]):
    train_op = tf.no_op(name="train")

#Check whether the forward propagation result of the network using the moving average model is correct.
#The prototype of equal() is equal(x, y, name). It compares two tensors element by element and returns True where they are equal and False otherwise.
correct_prediction = tf.equal(tf.argmax(average_y,1),tf.argmax(y_,1))

#The prototype of cast() is cast(x, dtype, name). Here it converts the bool values to float32.
#The mean of the float32 values is then the accuracy of the model on this set of data.
accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))

with tf.Session() as sess:
    #In earlier versions, initialize_all_variables() was commonly used to initialize all variables.
    sess.run(tf.global_variables_initializer())

    #Prepare validation data.
    validate_feed = {x:mnist.validation.images,y_:mnist.validation.labels}
    #Prepare test data.
    test_feed = {x:mnist.test.images,y_:mnist.test.labels}

    for i in range(max_steps):
        if i%1000==0:
            #Compute the accuracy of the moving average model on the validation data.
            #To print a percentage, validate_accuracy is multiplied by 100.
            validate_accuracy = sess.run(accuracy,feed_dict=validate_feed)
            print("After %d training step(s), validation accuracy "
                  "using average model is %g%%"%(i,validate_accuracy*100))

        #Fetch one batch of training data for this step and train on it.
        #The object returned by input_data.read_data_sets() provides the train.next_batch() function.
        #Its batch_size parameter sets how many samples are read from the training data as one batch.
        xs,ys = mnist.train.next_batch(batch_size=batch_size)
        sess.run(train_op,feed_dict={x:xs,y_:ys})

    #Use the test data set to measure the final accuracy of the trained network.
    #To print a percentage, test_accuracy is multiplied by 100.
    test_accuracy = sess.run(accuracy,feed_dict=test_feed)
    print("After %d training step(s), test accuracy using average"
          " model is %g%%"%(max_steps,test_accuracy*100))

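The loss and accuracy computations described in the comments of the script can be sketched in plain Python, without TensorFlow. This is only an illustration under simplifying assumptions: three classes instead of ten, a batch of two samples, and hypothetical helper names (softmax, sparse_softmax_cross_entropy, argmax) that mirror the roles of the TensorFlow functions but are not the library's implementation.

```python
import math

def softmax(logits):
    # Subtract the maximum for numerical stability before exponentiating.
    shift = max(logits)
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sparse_softmax_cross_entropy(logits, label):
    # "Sparse" means the label is a class index, not a one-hot vector.
    return -math.log(softmax(logits)[label])

def argmax(values):
    # Index of the largest value in one row.
    return max(range(len(values)), key=lambda i: values[i])

# Toy batch: raw network outputs (logits) and one-hot labels.
batch_logits = [[2.0, 0.5, 0.1], [0.2, 0.3, 3.0]]
batch_one_hot = [[1, 0, 0], [0, 1, 0]]

labels = [argmax(row) for row in batch_one_hot]        # one-hot -> class index
losses = [sparse_softmax_cross_entropy(l, t) for l, t in zip(batch_logits, labels)]
mean_loss = sum(losses) / len(losses)

predictions = [argmax(row) for row in batch_logits]
correct = [float(p == t) for p, t in zip(predictions, labels)]
accuracy = sum(correct) / len(correct)
print(labels, predictions, accuracy)
```

The first sample is classified correctly and the second is not, so the second sample contributes the larger loss and the accuracy on this toy batch is one half.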
The input_data.read_data_sets() function obtains the data set from the official website described above. The prototype of the function is as follows:

def read_data_sets(train_dir,

When calling the function, usually only the train_dir and one_hot parameters are passed in. The parameter train_dir is the directory in which the MNIST data is stored. If the MNIST data set files are not found under train_dir, the read_data_sets() function calls other functions in the same file to download them from Professor Yann LeCun's website. one_hot=True specifies that each label is encoded as a one-hot vector of length 10 rather than as a single digit index.
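As a small illustration of what one-hot encoding means for the labels, the following sketch converts a digit into a vector of length 10 and back. The helper name to_one_hot is hypothetical and is not part of the input_data module.

```python
def to_one_hot(label, num_classes=10):
    """Return a list with 1.0 at position `label` and 0.0 elsewhere."""
    vector = [0.0] * num_classes
    vector[label] = 1.0
    return vector

encoded = to_one_hot(3)
# Taking the position of the maximum recovers the original digit.
decoded = encoded.index(max(encoded))
print(encoded, decoded)
```

This is why the script applies tf.argmax(y_, 1) wherever it needs the digit index rather than the one-hot vector.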
The same file also implements several functions for processing MNIST data. In general, they parse the data from the original files into the format used for training and testing the neural network, and read_data_sets() calls these functions as well.
Because the data set is not stored locally, it is downloaded the first time the code runs, and this run can fail with an error. The cause is that PyCharm does not have write permission when TensorFlow downloads the data set from the official website to the local directory. Running the script with sudo python3 from the PyCharm terminal works normally.
The read_data_sets() function returns an object, named mnist here, which automatically divides the MNIST data into train, validation and test data sets. There are 55,000 pictures in the train set and 5,000 pictures in the validation set; together these two sets make up the training data provided by MNIST itself.
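The 55,000 / 5,000 division can be pictured with a minimal sketch. It assumes that the validation examples are simply sliced off the front of the 60,000 official training examples; the function name split_train_validation is hypothetical, and integers stand in for the pictures.

```python
def split_train_validation(samples, validation_size=5000):
    """Split a sequence into (train, validation) by slicing off the front."""
    if not 0 <= validation_size <= len(samples):
        raise ValueError("validation_size must be between 0 and len(samples)")
    validation = samples[:validation_size]
    train = samples[validation_size:]
    return train, validation

# Integers stand in for the 60,000 official training pictures.
official_training_set = list(range(60000))
train, validation = split_train_validation(official_training_set)
print(len(train), len(validation))
```

The two parts never overlap, so accuracy measured on the validation set during training is not influenced by pictures the model was trained on.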
The final operation results are as follows:
The above code uses several optimization methods, namely L2 regularization, a moving average model and learning rate decay, all of which help improve the prediction accuracy of the model on the test set.
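The arithmetic behind these three methods can be sketched in plain Python. The function names below are hypothetical; the formulas are the ones usually documented for the TensorFlow 1.x calls used in the script (continuous exponential decay, a moving average whose decay is capped by (1 + num_updates) / (10 + num_updates), and an L2 penalty of scale * sum(w^2) / 2), and they are given as an illustration rather than as the library's implementation.

```python
def exponential_decay(base_rate, global_step, decay_steps, decay_rate):
    # Continuous (non-staircase) decay: the exponent is a fraction, not an integer.
    return base_rate * decay_rate ** (global_step / decay_steps)

def ema_update(shadow, value, decay, num_updates):
    # A small num_updates lowers the effective decay, so the shadow value
    # follows the variable more quickly early in training.
    effective_decay = min(decay, (1 + num_updates) / (10 + num_updates))
    return effective_decay * shadow + (1 - effective_decay) * value

def l2_penalty(weights, scale):
    # Half the sum of squares, multiplied by the regularization scale.
    return scale * sum(w * w for w in weights) / 2

# Values from the script: 55,000 training pictures in batches of 100.
decay_steps = 55000 / 100
print(exponential_decay(0.8, 0, decay_steps, 0.999))
print(exponential_decay(0.8, decay_steps, decay_steps, 0.999))

# The same update applied early and late in training.
print(ema_update(0.0, 10.0, 0.99, num_updates=0))
print(ema_update(0.0, 10.0, 0.99, num_updates=10000))

print(l2_penalty([1.0, 2.0], 0.0001))
```

With these settings the learning rate is multiplied by 0.999 once per pass over the training set, and the moving average reacts strongly to new values at step 0 but only slightly after 10,000 steps.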


Posted on Sat, 07 Sep 2019 03:47:41 -0700 by geo__