Top-down interpretation of ncnn series: loading the param model and bin files, and forward propagation

I have recently been working on converting models from TensorFlow to ncnn, and picked up a fair understanding of the ncnn framework along the way, which I share here.

For the details and steps of tensorflow2ncnn, you can refer to my GitHub:
https://github.com/hanzy88/tensorflow2ncnn
At present, CNN+FC conversion is supported, and YOLOv3 based on a full CNN / MobileNetV2 has been tested successfully; apart from some accuracy loss, the results are basically correct. The test also runs on a Jetson Nano (though the full-CNN YOLOv3 is too big to run smoothly and stutters).

Back to the point: when I first read the ncnn source code I started from Mat, and although I got through some of it, I never formed a concrete picture. So this series takes a top-down approach instead, starting from the model and weight files, to interpret ncnn.

I. Introduction to the model and weight files

After completing the ncnn model transformation, two files are usually obtained:

ncnn.param
ncnn.bin

param stores the model structure, and bin stores the weights of ops such as convolution.

The structure of param is as follows:

7767517
3 3
Input         input    0 1 data 0=4 1=4 2=1
InnerProduct  ip       1 1 data fc 0=10 1=1 2=80
Softmax       softmax  1 1 fc prob 0=0

The first line is the magic number, fixed at 7767517.
In the second line, the first number is the layer count and the second is the blob count. A blob is the data structure passed between layers, defined in blob.h and blob.cpp as follows:

class Blob
{
public:
    // empty
    Blob();

public:
#if NCNN_STRING
    // blob name
    std::string name;
#endif // NCNN_STRING
    // index of the layer that produces this blob as output
    int producer;
    // indices of the layers that consume this blob as input
    std::vector<int> consumers;
};

From the third line to the end, each line records one layer. Taking the third line as an example:
Column 1 is the op type of the layer, column 2 is the layer name, columns 3 and 4 are the numbers of input and output blobs respectively, followed by the names of the input and output blobs. The numbers after that are the constant parameters passed to the layer. In detail:

[layer type] [layer name] [input count] [output count] [input blobs] [output blobs] [layer specific params]

layer type : type name, such as Convolution, Softmax, etc
layer name : name of this layer, must be unique among all layer names
input count : count of the blobs this layer needs as input
output count : count of the blobs this layer produces as output
input blobs : name list of all the input blob names, separated by space, must be unique among input blob names of all layers
output blobs : name list of all the output blob names, separated by space, must be unique among output blob names of all layers
layer specific params : key=value pair list, separated by space

Layer parameters:

0=1 1=2.5 -23303=2,2.0,3.0

the key index should be unique within each layer line; a pair can be omitted if the default value is used

the meaning of each existing param key index can be looked up at https://github.com/Tencent/ncnn/wiki/operation-param-weight-table

integer or float key : index 0 ~ 19
integer value : int
float value : float
integer array or float array key : -23300 minus index 0 ~ 19
integer array value : [array size],int,int,...,int
float array value : [array size],float,float,...,float
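
To make these rules concrete, here is a minimal standalone sketch (my own illustration, not ncnn's actual ParamDict code) that decodes one key=value token following the rules above:

#include <cstdio>
#include <cstdlib>
#include <cstring>

int main()
{
    char token[] = "-23303=2,2.0,3.0"; // the array example from above

    int key = 0;
    char value[64] = "";
    sscanf(token, "%d=%63s", &key, value);

    if (key <= -23300)
    {
        // array param: the real index is -23300 - key,
        // and the first number in the value is the array size
        int index = -23300 - key;
        int size = atoi(strtok(value, ","));
        printf("array param, index %d, %d elements:", index, size);
        for (int i = 0; i < size; i++)
            printf(" %s", strtok(NULL, ","));
        printf("\n");
    }
    else
    {
        printf("scalar param, index %d, value %s\n", key, value);
    }
    return 0;
}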

If the official explanation is still unclear, take the parameter-writing code for the Range op that I defined in tensorflow2ncnn.cpp as an example:

else if (node.op() == "Range")
{
    const tensorflow::TensorProto& start = weights[node.input(0)];
    const tensorflow::TensorProto& limit = weights[node.input(1)];
    const tensorflow::TensorProto& delta = weights[node.input(2)];

    const int* start_data = reinterpret_cast<const int*>(start.int_val().begin());
    const int* limit_data = reinterpret_cast<const int*>(limit.int_val().begin());
    const int* delta_data = reinterpret_cast<const int*>(delta.int_val().begin());

    fprintf(pp, " 0=%d", *start_data);
    fprintf(pp, " 1=%d", *limit_data);
    fprintf(pp, " 2=%d", *delta_data);
}

Here, the indices 0, 1, 2 can be chosen freely, but you must know what each one stands for, so that during the forward pass you can read the values back by the same index. This also means that index 0 may have different meanings in different layer types.

The bin file structure is as follows:

  +---------+---------+---------+---------+---------+---------+
  | weight1 | weight2 | weight3 | weight4 | ....... | weightN |
  +---------+---------+---------+---------+---------+---------+
  ^         ^         ^         ^
  0x0      0x80      0x140     0x1C0

the model binary is the concatenation of all weight data; each weight buffer is aligned to 32 bits.

The structure of each weight buffer is as follows:

[flag] (optional)
[raw data]
[padding] (optional)

flag : unsigned int, little-endian, indicating the weight storage type, 0 => float32, 0x01306B47 => float16, otherwise => quantized int8, may be omitted if the layer implementation forced the storage type explicitly
raw data : raw weight data, little-endian, float32 data or float16 data or quantized table and indexes depending on the storage type flag
padding : padding space for 32bit alignment, may be omitted if already aligned

refer to: https://github.com/Tencent/ncnn/wiki/param-and-model-file-structure
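
As an illustration of this layout, the following sketch (my own, simplified from what ncnn's ModelBinFromStdio::load does) reads one float32 weight buffer and skips the alignment padding:

#include <cstdio>
#include <vector>

// read one weight buffer laid out as [flag][raw data][padding];
// only the float32 case (flag == 0) is handled in this sketch
bool read_weight(FILE* fp, int count, std::vector<float>& out)
{
    unsigned int flag = 0;
    if (fread(&flag, sizeof(flag), 1, fp) != 1)
        return false;

    if (flag != 0)
        return false; // float16 / int8 decoding omitted here

    out.resize(count);
    if (fread(out.data(), sizeof(float), count, fp) != (size_t)count)
        return false;

    // skip padding so the next buffer starts 32-bit aligned
    // (this matters for float16 data, whose size may not be a multiple of 4)
    long pos = ftell(fp);
    fseek(fp, (pos + 3) & ~3L, SEEK_SET);
    return true;
}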

After obtaining the model files, we construct a Net object, load the structure and the weights, feed in the input, and then extract the output of a specified layer:

ncnn::Net net;
net.load_param("ncnn.param");
net.load_model("ncnn.bin");

const int target_size = 227;
int img_w = bgr.cols;
int img_h = bgr.rows;
ncnn::Mat in = ncnn::Mat::from_pixels_resize(bgr.data, ncnn::Mat::PIXEL_BGR, bgr.cols, bgr.rows, target_size, target_size);

ncnn::Extractor ex = net.create_extractor();
ex.input("input_x", in);

ncnn::Mat out;
ex.extract("softmax", out);

(the loading of the param and bin files is implemented in net.h and net.cpp)
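
Once extract returns, the output Mat can be read directly. A small usage sketch (my own addition, assuming the "softmax" blob above holds a one-dimensional Mat of per-class scores):

// pick the class with the highest score; Mat::operator[] indexes the raw floats
int best = 0;
for (int i = 1; i < out.w; i++)
{
    if (out[i] > out[best])
        best = i;
}
fprintf(stderr, "class %d, score %f\n", best, out[best]);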

II. Loading the param file

Take Net::load_param(FILE* fp) as an example:

int Net::load_param(FILE* fp)
{
	//Read magic number
    int magic = 0;
    int nbr = fscanf(fp, "%d", &magic);
    if (nbr != 1)
    {
        LOG_HAN;
        fprintf(stderr, "issue with param file\n");
        return -1;
    }
    if (magic != 7767517)
    {
        fprintf(stderr, "param is too old, please regenerate\n");
        return -1;
    }

    // parse layer_count and blob_count
    int layer_count = 0;
    int blob_count = 0;
    nbr = fscanf(fp, "%d %d", &layer_count, &blob_count);
    if (nbr != 2 || layer_count <= 0 || blob_count <= 0)
    {
        //LOG_HAN;
		fprintf(stderr, "nbr %d, layer_count %d, blob_count %d", nbr, layer_count, blob_count);
		fprintf(stderr, "issue with param file\n");
        return -1;
    }

    layers.resize((size_t)layer_count);
    blobs.resize((size_t)blob_count);

    ParamDict pd;

    int blob_index = 0;
    for (int i=0; i<layer_count; i++)
    {
        int nscan = 0;

        char layer_type[257];  // different types correspond to different ops
        char layer_name[257];  // name of the current layer
        int bottom_count = 0;
        int top_count = 0;
        // read layer type, layer name, input count and output count
        nscan = fscanf(fp, "%256s %256s %d %d", layer_type, layer_name, &bottom_count, &top_count);
        if (nscan != 4)
        {
            continue;
        }
		// create a Layer object, dynamically bound according to the op;
		// Layer is defined in layer.h/layer.cpp, and the per-op implementations live in src/layer
        Layer* layer = create_layer(layer_type);
        if (!layer)
        {
            layer = create_custom_layer(layer_type);
        }
        if (!layer)
        {
            fprintf(stderr, "layer %s of name %s not exists or registered\n", layer_type, layer_name);
            clear();
            return -1;
        }

        layer->type = std::string(layer_type);
        layer->name = std::string(layer_name);
//         fprintf(stderr, "new layer %d %s\n", i, layer_name);
		
        layer->bottoms.resize(bottom_count);
		// resolve the input blobs and record their indices in the current layer
        for (int j=0; j<bottom_count; j++)
        {
            char bottom_name[257];
            nscan = fscanf(fp, "%256s", bottom_name);
            //fprintf(stderr, "new blob %s, %d\n", bottom_name, __LINE__);
            if (nscan != 1)
            {
                continue;
            }

            int bottom_blob_index = find_blob_index_by_name(bottom_name);
			//LOG_HAN;
            if (bottom_blob_index == -1)
            {
                Blob& blob = blobs[blob_index];

                bottom_blob_index = blob_index;

                blob.name = std::string(bottom_name);
                fprintf(stderr, "new blob %s, %d\n", bottom_name, blob_index);

                blob_index++;
            }

            Blob& blob = blobs[bottom_blob_index];

            blob.consumers.push_back(i);

            layer->bottoms[j] = bottom_blob_index;
        }
		// create the output blobs and record their indices in the current layer
        layer->tops.resize(top_count);
        for (int j=0; j<top_count; j++)
        {
            Blob& blob = blobs[blob_index];

            char blob_name[257];
            nscan = fscanf(fp, "%256s", blob_name);
            if (nscan != 1)
            {
                continue;
            }

            blob.name = std::string(blob_name);
//             fprintf(stderr, "new blob %s\n", blob_name);

            blob.producer = i;

            layer->tops[j] = blob_index;

            blob_index++;
        }

        // layer specific params, i.e. the constant parameters written at conversion time;
        // ParamDict::load_param reads all key=value pairs on this line
        int pdlr = pd.load_param(fp);
        if (pdlr != 0)
        {
            fprintf(stderr, "ParamDict load_param failed\n");
            continue;
        }
		
		// call the load_param overridden by the specific op layer,
		// which retrieves the values it needs by index
        int lr = layer->load_param(pd);
        if (lr != 0)
        {
            fprintf(stderr, "layer load_param failed\n");
            continue;
        }

        layers[i] = layer;
    }

    return 0;
}

For loading the specific parameters of each layer, take Range as an example (in src/layer/tfrange.h and tfrange.cpp):

int TFRange::load_param(const ParamDict& pd)
{
    start = pd.get(0, 0);
    limit = pd.get(1, 1);
    delta = pd.get(2, 1);
    //fprintf(stderr, "slices: %d %d %d \n", start, limit, delta);
    return 0;
}

The three values written at conversion time are retrieved here; pd.get(index, default) returns the stored value for that index, or the default if the pair was omitted from the param line.
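
In the forward pass the layer can then generate the sequence from these three values. A minimal sketch of what such a forward could look like (my own illustration, not necessarily the author's actual tfrange.cpp, and assuming a positive delta):

int TFRange::forward(const Mat& /*bottom_blob*/, Mat& top_blob, const Option& opt) const
{
    // number of elements in [start, limit) with step delta
    int count = (limit - start + delta - 1) / delta;

    top_blob.create(count, 4u, opt.blob_allocator);
    if (top_blob.empty())
        return -100;

    float* outptr = top_blob;
    for (int i = 0; i < count; i++)
        outptr[i] = (float)(start + i * delta);

    return 0;
}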

III. Loading the bin file

Take Net::load_model(FILE* fp) as an example:

int Net::load_model(FILE* fp)
{
    if (layers.empty())
    {
        fprintf(stderr, "network graph not ready\n");
        return -1;
    }

    // load file
    int ret = 0;

    ModelBinFromStdio mb(fp); // ModelBinFromStdio is defined in modelbin.h and modelbin.cpp
    // load each layer's weights in turn
    for (size_t i=0; i<layers.size(); i++)
    {
        Layer* layer = layers[i];
        
        // a null layer here means the param file had inconsistent content
        if (!layer){
            fprintf(stderr, "load_model error at layer %d, parameter file has inconsistent content.\n", (int)i);
            ret = -1;
            break;
        }
		// call the current layer's load_model; if the layer does not override it,
		// the base Layer::load_model is called and simply returns 0.
		// an overriding layer typically reads its weights like: weight_data = mb.load(weight_data_size, 0);
		// which calls Mat ModelBinFromStdio::load(int w, int type) const to fread from the file stream
        int lret = layer->load_model(mb);
        if (lret != 0)
        {
            fprintf(stderr, "layer load_model %d failed\n", (int)i);
            ret = -1;
            break;
        }

        int cret = layer->create_pipeline(opt);
        if (cret != 0)
        {
            fprintf(stderr, "layer create_pipeline %d failed\n", (int)i);
            ret = -1;
            break;
        }
    }

    fuse_network();

    return ret;
}
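
As a concrete overload, InnerProduct::load_model (lightly trimmed) shows the pattern described in the comments above: the element counts come from the layer's own params, and the second argument of mb.load selects the storage type (0 = auto-detect from the flag word, 1 = force float32):

int InnerProduct::load_model(const ModelBin& mb)
{
    // weight_data_size and num_output were read earlier in load_param
    weight_data = mb.load(weight_data_size, 0);
    if (weight_data.empty())
        return -100;

    if (bias_term)
    {
        bias_data = mb.load(num_output, 1);
        if (bias_data.empty())
            return -100;
    }

    return 0;
}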

Note that not every layer reads from the weight file, just as not every layer has layer-specific params in the param file; it depends on the specific op. For the writing side, refer to FusedBatchNorm in tensorflow2ncnn.cpp:

const tensorflow::TensorProto& scale = weights[node.input(1)];
const tensorflow::TensorProto& B = weights[node.input(2)];
const tensorflow::TensorProto& mean = weights[node.input(3)];
const tensorflow::TensorProto& var = weights[node.input(4)];

int channels = scale.tensor_shape().dim(0).size(); // data size
//fprintf(stderr, "channels: %d\n", channels);
int dtype = scale.dtype();

switch (dtype){
    case 1: //float
    {
        float* scale_tensor = (float*)malloc(sizeof(float) * channels);
        float* mean_tensor = (float*)malloc(sizeof(float) * channels);
        float* var_tensor = (float*)malloc(sizeof(float) * channels);
        float* b_tensor = (float*)malloc(sizeof(float) * channels);
        const float* scale_data = reinterpret_cast<const float*>(scale.tensor_content().c_str());
        const float* mean_data = reinterpret_cast<const float*>(mean.tensor_content().c_str());
        const float* var_data = reinterpret_cast<const float*>(var.tensor_content().c_str());
        const float* b_data = reinterpret_cast<const float*>(B.tensor_content().c_str());

        for (int i = 0; i < channels; i++){
            scale_tensor[i] = *scale_data++;
            mean_tensor[i] = *mean_data++;
            var_tensor[i] = *var_data++;
            b_tensor[i] = *b_data++;
            //fprintf(stderr, "scale_data: %f\n", *scale_data);
        }

        fwrite(scale_tensor, sizeof(float), channels, bp);
        fwrite(mean_tensor, sizeof(float), channels, bp);
        fwrite(var_tensor, sizeof(float), channels, bp);
        fwrite(b_tensor, sizeof(float), channels, bp);

        // free the temporary buffers
        free(scale_tensor);
        free(mean_tensor);
        free(var_tensor);
        free(b_tensor);
        break;
    }

After the scale, mean, variance and bias (shift factor) are written, the BatchNorm layer (batchnorm.cpp) reads them back in load_model so that forward propagation can use them:

int BatchNorm::load_model(const ModelBin& mb)
{
    slope_data = mb.load(channels, 1);
    if (slope_data.empty())
        return -100;

    mean_data = mb.load(channels, 1);
    if (mean_data.empty())
        return -100;

    var_data = mb.load(channels, 1);
    if (var_data.empty())
        return -100;

    bias_data = mb.load(channels, 1);
    if (bias_data.empty())
        return -100;

    a_data.create(channels);
    if (a_data.empty())
        return -100;
    b_data.create(channels);
    if (b_data.empty())
        return -100;

    for (int i=0; i<channels; i++)
    {
        float sqrt_var = sqrt(var_data[i] + eps);
        a_data[i] = bias_data[i] - slope_data[i] * mean_data[i] / sqrt_var;
        b_data[i] = slope_data[i] / sqrt_var;
    }

    return 0;
}
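
The loop at the end folds the batchnorm formula y = slope * (x - mean) / sqrt(var + eps) + bias into a single multiply-add per element, y = b * x + a. The forward pass then only needs an inner loop like the following sketch (the per-channel 3-D case; ncnn's actual forward_inplace also handles 1-D/2-D inputs and OpenMP parallelism):

// apply y = b * x + a per channel using the precomputed a_data / b_data;
// size = bottom_top_blob.w * bottom_top_blob.h
for (int q = 0; q < channels; q++)
{
    float* ptr = bottom_top_blob.channel(q);
    const float a = a_data[q];
    const float b = b_data[q];

    for (int i = 0; i < size; i++)
        ptr[i] = b * ptr[i] + a;
}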

IV. Forward propagation

After creating the ncnn::Extractor object, the input is fed in first:

// A relatively simple part: look up the blob index from the input name and assign the incoming data.
int Extractor::input(const char* blob_name, const Mat& in)
{
    int blob_index = net->find_blob_index_by_name(blob_name);
    if (blob_index == -1)
        return -1;

    return input(blob_index, in);
}

int Extractor::input(int blob_index, const Mat& in)
{
    if (blob_index < 0 || blob_index >= (int)blob_mats.size())
        return -1;

    blob_mats[blob_index] = in;

    return 0;
}

The next step is extracting the output. Any blob defined in the network can be extracted, and layers already computed by an earlier extract call are not recomputed; if a later extract involves layers that have not been computed yet, those still take time to compute.
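
For example, with the three-layer param file from section I, an extractor over that net could pull the intermediate blob first and the final blob afterwards, computing each layer only once (a usage sketch; the blob names come from that example):

ncnn::Mat fc_out, prob_out;
ex.extract("fc", fc_out);     // runs Input and InnerProduct
ex.extract("prob", prob_out); // only runs Softmax; fc is already cached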

Since the code is long, only excerpts are explained:

// likewise, look up the blob index from the output name, then extract by index
int Extractor::extract(const char* blob_name, Mat& feat)
{
    int blob_index = net->find_blob_index_by_name(blob_name);
    if (blob_index == -1)
        return -1;

    return extract(blob_index, feat);
}

int Extractor::extract(int blob_index, Mat& feat)
{
    if (blob_index < 0 || blob_index >= (int)blob_mats.size())
        return -1;

    int ret = 0;
    // if the blob has not been computed yet, run its producer layer (recursively)
    if (blob_mats[blob_index].dims == 0)
    {
        int layer_index = net->blobs[blob_index].producer;
        ret = net->forward_layer(layer_index, blob_mats, opt);
    }
    // return the Mat corresponding to the requested blob index
    feat = blob_mats[blob_index];
    return ret;
}

forward_layer is where the data actually propagates from layer to layer:

int Net::forward_layer(int layer_index, std::vector<Mat>& blob_mats, Option& opt) const
{
    // invoked recursively from the output layer backwards, filling blob_mats along the way
    const Layer* layer = layers[layer_index];

    // when a layer's one_blob_only attribute is true, it has a single input and a single output
    if (layer->one_blob_only)
    {
        // load bottom blob
        int bottom_blob_index = layer->bottoms[0];
        int top_blob_index = layer->tops[0];

        if (blob_mats[bottom_blob_index].dims == 0)
        {
            // recursively compute the producer of this input blob first
            int ret = forward_layer(blobs[bottom_blob_index].producer, blob_mats, opt);
            if (ret != 0)
                return ret;
        }

        Mat bottom_blob = blob_mats[bottom_blob_index];

        if (opt.lightmode)
        {
            // delete after taken in light mode
            blob_mats[bottom_blob_index].release();
            // deep copy for inplace forward if data is shared
            if (layer->support_inplace && *bottom_blob.refcount != 1)
            {
                bottom_blob = bottom_blob.clone();
            }
        }

        // support_inplace means forward writes the output into the input mat
        if (opt.lightmode && layer->support_inplace)
        {
            Mat& bottom_top_blob = bottom_blob;
            // call the forward_inplace overridden by the current layer
            int ret = layer->forward_inplace(bottom_top_blob, opt);
            if (ret != 0)
                return ret;

            // store top blob
            blob_mats[top_blob_index] = bottom_top_blob;
        }
        else
        {
            // otherwise call the overridden forward with separate input and output mats
            Mat top_blob;
            int ret = layer->forward(bottom_blob, top_blob, opt);
            if (ret != 0)
                return ret;

            // store top blob
            blob_mats[top_blob_index] = top_blob;
        }

    }
    else
    {
        // load bottom blobs
        // multi-input multi-output / multi-input single-output / single-input multi-output
        std::vector<Mat> bottom_blobs(layer->bottoms.size());
        for (size_t i=0; i<layer->bottoms.size(); i++)
        {
            int bottom_blob_index = layer->bottoms[i];

            if (blob_mats[bottom_blob_index].dims == 0)
            {
                int ret = forward_layer(blobs[bottom_blob_index].producer, blob_mats, opt);
                if (ret != 0)
                    return ret;
            }

            bottom_blobs[i] = blob_mats[bottom_blob_index];

            if (opt.lightmode)
            {
                // delete after taken in light mode
                blob_mats[bottom_blob_index].release();
                // deep copy for inplace forward if data is shared
                if (layer->support_inplace && *bottom_blobs[i].refcount != 1)
                {
                    bottom_blobs[i] = bottom_blobs[i].clone();
                }
            }
        }

        // forward
        if (opt.lightmode && layer->support_inplace)
        {
            std::vector<Mat>& bottom_top_blobs = bottom_blobs;
            int ret = layer->forward_inplace(bottom_top_blobs, opt);
            if (ret != 0)
                return ret;

            // store top blobs
            for (size_t i=0; i<layer->tops.size(); i++)
            {
                int top_blob_index = layer->tops[i];

                blob_mats[top_blob_index] = bottom_top_blobs[i];
            }
        }
        else
        {
            std::vector<Mat> top_blobs(layer->tops.size());
            int ret = layer->forward(bottom_blobs, top_blobs, opt);
            if (ret != 0)
                return ret;

            // store top blobs
            for (size_t i=0; i<layer->tops.size(); i++)
            {
                int top_blob_index = layer->tops[i];

                blob_mats[top_blob_index] = top_blobs[i];
            }
        }
    }

    return 0;
}

This is the whole process of ncnn's forward propagation based on the param and bin files.

To be continued
