TensorFlow + MNIST reference notes

It's a bit embarrassing. I have been working with neural networks for a while, but I never learned to use TensorFlow. I always relied on other ready-made models to get the job done.

The main reason is that when I first tried TensorFlow, I could not even get the most basic MNIST digit recognition to work. I always suspected a problem with the parameter settings. Today I had time to look into it and finally solved it.


The code is based on the one provided in "TensorFlow Chinese community | MNIST advanced", modified as follows:

import csv
import numpy as np
import os
import random
import tensorflow as tf

def LoadCSV(path):
    # Read a CSV file into a 2-D NumPy array of strings, one row per line
    ret = []
    with open(path, 'r') as f:
        reader = csv.reader(f)
        for row in reader:
            ret.append(row)
    ret = np.array(ret)
    return ret

def main():
    batch_size = 50
    max_epoches = 1000

    # Read data
    train_X_path = "./Data/MNIST_train_X.txt"
    train_Y_path = "./Data/MNIST_train_Y.txt"
    test_X_path = "./Data/MNIST_test_X.txt"
    test_Y_path = "./Data/MNIST_test_Y.txt"
    train_X = LoadCSV(train_X_path).astype(int)
    train_Y = LoadCSV(train_Y_path).astype(int)
    test_X = LoadCSV(test_X_path).astype(int)
    test_Y = LoadCSV(test_Y_path).astype(int)

    temp = np.zeros((train_X.shape[0], 10))
    for i in range(train_X.shape[0]):
        temp[i, int(train_Y[i])] = 1
    train_Y = temp
    temp = np.zeros((test_X.shape[0], 10))
    for i in range(test_X.shape[0]):
        temp[i, int(test_Y[i])] = 1
    test_Y = temp

    # modeling
    sess = tf.InteractiveSession()

    x = tf.placeholder("float", shape=[None, 784])
    y_ = tf.placeholder("float", shape=[None, 10])

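    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))

    # Variables must be initialized before the first training step
    sess.run(tf.global_variables_initializer())
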
    y = tf.nn.softmax(tf.matmul(x, W) + b)
    loss = -tf.reduce_sum(y_ * tf.log(y))
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

    # Build the accuracy ops once, outside the loop, so the graph does not grow
    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

    for i in range(max_epoches):
        # Draw a random mini-batch without replacement
        batch_index = random.sample(range(train_X.shape[0]), batch_size)
        batch_X = train_X[batch_index, :]
        batch_Y = train_Y[batch_index, :]
        train_step.run(feed_dict={x: batch_X, y_: batch_Y})
        if i % 10 == 0:
            print(accuracy.eval(feed_dict={x: test_X, y_: test_Y}))

if __name__ == "__main__":
    main()

The model code is copied directly from the tutorial. The dataset is not downloaded with the script the tutorial provides; instead it is read from CSV files that I extracted myself.
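The two label-conversion loops in main() build the one-hot rows by hand. As a minimal sketch, the same conversion can be done with NumPy indexing. The helper name one_hot and its num_classes parameter are made up for this illustration and are not part of the original code.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    # Accept labels of shape (N,) or (N, 1), as produced by a one-column CSV
    labels = np.asarray(labels, dtype=int).ravel()
    encoded = np.zeros((labels.size, num_classes))
    # Row i gets a 1 in the column given by labels[i]
    encoded[np.arange(labels.size), labels] = 1
    return encoded

example = one_hot([[3], [0], [9]])
print(example.argmax(axis=1))
```

With such a helper, train_Y = one_hot(train_Y) would replace each loop, assuming the label files hold one integer per row.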

The prediction accuracy on the test set is printed once every 10 iterations.

After running, the results are as follows

The output shows an accuracy of 0.2162 after 10 iterations, which then drops to 0.098 after 20 iterations and looks like a local optimum. Because W and b are both initialized to 0, the logits x*W + b are all 0 at the start no matter what x is, so softmax gives every class the same probability and the accuracy should be about 0.1. The 0.2162 therefore indicates that the initial model was being optimized. My guess is that the step size is too large.
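The reasoning about the zero-initialized model can be checked with a small NumPy sketch. It is not the TensorFlow graph from the post: the softmax helper and the random "images" are made up for illustration. With all-zero parameters every class gets probability 0.1, and argmax breaks the tie by returning index 0, so every sample is predicted as digit 0.

```python
import numpy as np

def softmax(logits):
    # Row-wise softmax; subtracting the row maximum keeps exp() in range
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

rng = np.random.RandomState(0)
x = rng.randint(0, 256, size=(5, 784)).astype(float)  # made-up pixel rows

W = np.zeros((784, 10))  # same initial values as in the post
b = np.zeros(10)

probs = softmax(x.dot(W) + b)
predicted = probs.argmax(axis=1)
print(probs[0])
print(predicted)
```

An accuracy near 0.1 is thus what a model that always answers 0 would score. In the standard MNIST test set 980 of the 10000 images are zeros, which would give exactly 0.098; whether the CSV files used here match that split is an assumption.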

In order to observe the specific situation, I will change the final print part as follows

        if i % 1 == 0:
            print(y.eval(feed_dict={x: batch_X}))

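eval() evaluates a tensor in the default session and returns its value as a NumPy array. tf.InteractiveSession() installs itself as the default session, which is why y.eval(...) and train_step.run(...) work here without passing a session explicitly.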

The predictions from the first iteration are not accurate, but the values themselves look normal. In the next iteration the output turns into nan:

Since the previous iteration produced normal values, the nan here is unlikely to be caused by the data type. My guess is that W changes so quickly that x*W + b becomes a negative number of very large magnitude, which makes the denominator of the softmax function extremely small. With limited floating-point precision it underflows to 0, and 0 / 0 becomes nan.
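The guess above can be reproduced outside TensorFlow. The following NumPy sketch uses made-up logits and two hypothetical helpers, naive_softmax and stable_softmax. It shows the 0 / 0 case described above, and a second route to nan that does not depend on how softmax is implemented: a loss of the form -sum(y_ * log(y)) multiplies 0 by log(0) as soon as a predicted probability reaches exactly 0.

```python
import numpy as np

def naive_softmax(logits):
    # Direct formula: exp(z_i) / sum_j exp(z_j)
    exp = np.exp(logits)
    return exp / exp.sum()

def stable_softmax(logits):
    # Same formula after subtracting the maximum, so the largest term is exp(0)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

logits = np.array([-800.0, -900.0, -2000.0])  # hypothetical, very negative logits

with np.errstate(all="ignore"):  # silence the floating-point warnings
    naive = naive_softmax(logits)    # every exp() underflows to 0, giving 0 / 0
    stable = stable_softmax(logits)  # valid probabilities, but the last one is 0
    # Loss term for a class whose target is 0 and whose predicted probability is 0
    cross_entropy_term = 0.0 * np.log(stable[2])

print(naive)
print(stable)
print(cross_entropy_term)
```

Either way, once a nan enters the loss it propagates through the gradient into W and b, and every later prediction is nan as well.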

Since this happens within a single iteration, I reduced the learning rate to 1e-7 and changed the final output to print the predicted labels directly, which are easier to inspect:

        if i % 1 == 0:
            print(np.argmax(y.eval(feed_dict={x: batch_X}), axis=1).ravel())
            print(np.argmax(batch_Y, axis=1).ravel())


The results are much better

To optimize further, one could also increase the number of iterations or use an adaptive learning rate, but that is not needed here.

I still don't know why the parameters that work for other people don't work for me.
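One possible explanation, offered as a hypothesis the post does not confirm: the tutorial's data loader scales pixel values to the range [0, 1], while LoadCSV(...).astype(int) suggests that the CSV files hold raw integer intensities, presumably 0 to 255. The NumPy sketch below uses random made-up data and a hypothetical helper, largest_logit_after_one_step, to compare the size of the logits after a single gradient step at learning rate 0.01 for both input scales.

```python
import numpy as np

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def largest_logit_after_one_step(x, y_onehot, learning_rate):
    """Run one gradient step from zero weights; return the largest |logit|."""
    W = np.zeros((x.shape[1], y_onehot.shape[1]))
    b = np.zeros(y_onehot.shape[1])
    probs = softmax(x.dot(W) + b)
    # Gradient of the summed cross-entropy -sum(y_ * log(y)) w.r.t. the logits
    grad_logits = probs - y_onehot
    W -= learning_rate * x.T.dot(grad_logits)
    b -= learning_rate * grad_logits.sum(axis=0)
    return np.abs(x.dot(W) + b).max()

rng = np.random.RandomState(0)
raw = rng.randint(0, 256, size=(50, 784)).astype(float)  # made-up batch of 50
labels = rng.randint(0, 10, size=50)
onehot = np.eye(10)[labels]

raw_scale = largest_logit_after_one_step(raw, onehot, 0.01)
unit_scale = largest_logit_after_one_step(raw / 255.0, onehot, 0.01)
print(raw_scale, unit_scale)
```

The weight update is proportional to x and the logits multiply by x again, so raw inputs inflate the logits by roughly 255 squared, about 65000. That is the same order of magnitude as the reduction from 0.01 to 1e-7 that made training work here. If the hypothesis holds, dividing train_X and test_X by 255.0 should let the original learning rate work, but this was not tested against the author's files.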

Tags: network Session

Posted on Thu, 04 Jun 2020 06:02:05 -0700 by daniel_grant