KTV song recommendation-factor splitter and DeepFM

Preface

Today should be the last of the recommended algorithms, factor decomposer deepFM.FM and FFM are skipped here, because I'm going to do something else soon, so I'll end up with deepFM. First two papers by po

After reading these two papers, you can basically understand FM and DeepFM.To save everyone's time, I'll outline some basic ideas.

FM Factor Decomposer

Before FM appeared, SVM was mostly used for CTR estimation, and of course there were other models such as SVD++,PITF, FPMC, etc. However, these models are overwhelmed with sparse matrices and the parameters are very large. What problem did FM solve?

  • Better for parameter calculation of sparse matrices
  • Reduces the size of parameters that need training, and the characteristics and number of parameters are linear
  • FM can be calculated using any real data

In fact, the appearance of FM mainly resolves the cross-feature relationship between features, omitting the model of w parameter invalidation caused by sparse matrix, directly speaking the final model:

The problem of parameter invalidation caused by sparse matrix is solved by the intersection of a vector v. Then why is the size of his parameter small? Next comes the second part of the derivation:

You can see from this that the complexity of the parameter is linear O(kn).

Keras Modeling FM

Here is a simple FM model code, which is used for reference by others. I find one problem is that he repeat the last two items, which I don't know very well, so post it out that you are interested in discussing together.

import os
os.environ["CUDA_VISIBLE_DEVICES"]="-1"

import keras.backend as K
from keras import activations
from keras.engine.topology import Layer, InputSpec


class FMLayer(Layer):
    def __init__(self, output_dim,
                 factor_order,
                 activation=None,
                 **kwargs):
        if 'input_shape' not in kwargs and 'input_dim' in kwargs:
            kwargs['input_shape'] = (kwargs.pop('input_dim'),)
        super(FMLayer, self).__init__(**kwargs)

        self.output_dim = output_dim
        self.factor_order = factor_order
        self.activation = activations.get(activation)
        self.input_spec = InputSpec(ndim=2)

    def build(self, input_shape):
        assert len(input_shape) == 2
        input_dim = input_shape[1]

        self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_dim))

        self.w = self.add_weight(name='one', 
                                 shape=(input_dim, self.output_dim),
                                 initializer='glorot_uniform',
                                 trainable=True)
        self.v = self.add_weight(name='two', 
                                 shape=(input_dim, self.factor_order),
                                 initializer='glorot_uniform',
                                 trainable=True)
        self.b = self.add_weight(name='bias', 
                                 shape=(self.output_dim,),
                                 initializer='zeros',
                                 trainable=True)

        super(FMLayer, self).build(input_shape)

    def call(self, inputs, **kwargs):
        X_square = K.square(inputs)

        xv = K.square(K.dot(inputs, self.v))
        xw = K.dot(inputs, self.w)

        p = 0.5 * K.sum(xv - K.dot(X_square, K.square(self.v)), 1)
        
        rp = K.repeat_elements(K.expand_dims(p, 1), self.output_dim, axis=1)
        f = xw + rp + self.b

        output = K.reshape(f, (-1, self.output_dim))
        
        if self.activation is not None:
            output = self.activation(output)

        return output

    def compute_output_shape(self, input_shape):
        assert input_shape and len(input_shape) == 2
        return input_shape[0], self.output_dim
    
inp = Input(shape=(np.shape(x_train)[1],))
x = FMLayer(200, 100)(inp)
x = Dense(2, activation='sigmoid')(x)
model.compile(loss='binary_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])

model.fit(x_train, y_train,
          batch_size=32,
          epochs=10,
          validation_data=(x_test, y_test))

Run Results

Train on 2998 samples, validate on 750 samples
Epoch 1/10
2998/2998 [==============================] - 2s 791us/step - loss: 0.0580 - accuracy: 0.9872 - val_loss: 0.7466 - val_accuracy: 0.8127
Epoch 2/10
2998/2998 [==============================] - 2s 683us/step - loss: 0.0650 - accuracy: 0.9893 - val_loss: 0.7845 - val_accuracy: 0.8067
Epoch 3/10
2998/2998 [==============================] - 2s 674us/step - loss: 0.0803 - accuracy: 0.9915 - val_loss: 0.8730 - val_accuracy: 0.7960
Epoch 4/10
2998/2998 [==============================] - 2s 681us/step - loss: 0.0362 - accuracy: 0.9943 - val_loss: 0.8771 - val_accuracy: 0.8013
Epoch 5/10
2998/2998 [==============================] - 2s 683us/step - loss: 0.0212 - accuracy: 0.9953 - val_loss: 0.9035 - val_accuracy: 0.8007
Epoch 6/10
2998/2998 [==============================] - 2s 721us/step - loss: 0.0188 - accuracy: 0.9965 - val_loss: 0.9295 - val_accuracy: 0.7993
Epoch 7/10
2998/2998 [==============================] - 2s 719us/step - loss: 0.0168 - accuracy: 0.9972 - val_loss: 0.9597 - val_accuracy: 0.8007
Epoch 8/10
2998/2998 [==============================] - 2s 693us/step - loss: 0.0150 - accuracy: 0.9973 - val_loss: 0.9851 - val_accuracy: 0.7993
Epoch 9/10
2998/2998 [==============================] - 2s 677us/step - loss: 0.0137 - accuracy: 0.9972 - val_loss: 1.0114 - val_accuracy: 0.7987
Epoch 10/10
2998/2998 [==============================] - 2s 684us/step - loss: 0.0126 - accuracy: 0.9977 - val_loss: 1.0361 - val_accuracy: 0.8000

FFM algorithm

The above FM algorithm can also be seen as having only one set of features, but there may be several sets of features in real life, such as the examples in the paper:

This includes users, movies, other movies rated by users, time information and so on, so it is not enough to cross a set of features, may involve the cross of different groups of features.So FFM came into being.Not detailed here, but deepFM.

DeepFM algorithm

Like DeepFM, I don't go into details, I don't know what I'm looking at, I'll just go straight to the point.

Advantages of DeepFM

  • Combining FM with DNN, combining high-order feature modeling DNN with low-order feature modeling FM
  • DeepFM lower-order and higher-order parts share the same characteristics to make calculations more efficient
  • DeepFM works best in CTR prediction

DeepFM Network Architecture

FM Partial Network Structure

FM part is to combine one item with two items

# One time item
fm_w_1 = Activation('linear')(K.dot(inputs, self.fm_w))
    
# Quadratic term
dot_latent = {}
dot_cat = []
for i in range(1, self.num_fields):
    print(self.num_fields)
    dot_latent[i] = {}
    for f in self.field_combinations[i]:
        print(len(self.field_combinations))
        dot_latent[i][f] = Dot(axes=-1, normalize=False)([latent[i], latent[f]])
        dot_cat.append(dot_latent[i][f])
print(dot_cat)
fm_w_2 = Concatenate()(dot_cat)
# Merge's first and second terms get FM
fm_w = Concatenate()([fm_w_1, fm_w_2])

DNN Partial Network Structure

DNN part is simpler

# Add two hidden layers
deep1 = Activation('relu')(K.bias_add(K.dot(ConcatLatent,self.h1_w),self.h1_b))
deep2 = Activation('relu')(K.bias_add(K.dot(deep1, self.h2_w), self.h2_b))

Overall Network Structure

Embeding is done here, then data is provided to DNN and FM, code as follows:

# Different fields Combinations
for i in range(1, self.num_fields + 1):
    sparse[i] = Lambda(lambda x: x[:, self.id_start[i]:self.id_stop[i]],
                               output_shape=((self.len_field[i],)))(inputTensor)
    latent[i] = Activation('linear')(K.bias_add(K.dot(sparse[i],self.embed_w[i]),self.embed_b[i]))
    
# merge different field s
ConcatLatent = Concatenate()(list(latent.values()))

DeepFM Code

The overall code is as follows:

from keras.layers import Dense, Concatenate, Lambda, Add, Dot, Activation
from keras.engine.topology import Layer
from keras import backend as K

class DeepFMLayer(Layer):
    def __init__(self, embeddin_size, field_len_group=None, **kwargs):
        self.output_dim = 1
        self.embedding_size = 10
        self.input_spec = InputSpec(ndim=2)
        
        self.field_count = len(field_len_group)
        self.num_fields = len(field_len_group)
        self.field_lengths = field_len_group
        self.embed_w = {}
        self.embed_b = {}
        self.h1 = 10
        self.h2 = 10
        
        def start_stop_indices(field_lengths, num_fields):
            len_field = {}
            id_start = {}
            id_stop = {}
            len_input = 0

            for i in range(1, num_fields + 1):
                len_field[i] = field_lengths[i - 1]
                id_start[i] = len_input
                len_input += len_field[i]
                id_stop[i] = len_input

            return len_field, len_input, id_start, id_stop

        self.len_field, self.len_input, self.id_start, self.id_stop = \
            start_stop_indices(self.field_lengths,self.num_fields)
        
        def Field_Combos(num_fields):
            field_list = list(range(1, num_fields))
            combo = {}
            combo_count = 0
            for idx, field in enumerate(field_list):
                sub_list = list(range(field + 1, num_fields + 1))
                combo_count += len(sub_list)
                combo[field] = sub_list

            return combo, combo_count

        self.field_combinations, self.combo_count = Field_Combos(self.num_fields)
        print(field_len_group)
        print(self.field_combinations)
        print(self.num_fields)
        super(DeepFMLayer, self).__init__(**kwargs)
        
    def build(self, input_shape):
        assert len(input_shape) == 2
        total_embed_size = 0
        
        self.input_spec = InputSpec(dtype=K.floatx(), shape=(None, input_shape[1]))
        
        
        for i in range(1, self.num_fields + 1):
            input_dim = self.len_field[i]
            _name = "embed_W" + str(i)
            self.embed_w[i] = self.add_weight(shape=(input_dim, self.embedding_size), initializer='glorot_uniform', name=_name)
            _name = "embed_b" + str(i)
            self.embed_b[i] = self.add_weight(shape=(self.embedding_size,), initializer='zeros', name=_name)
            total_embed_size += self.embedding_size

        
        self.fm_w = self.add_weight(name='fm_w', shape=(input_shape[1], 1), initializer='glorot_uniform', trainable=True)
        
        
        self.h1_w = self.add_weight(shape=(total_embed_size, self.h1), initializer='glorot_uniform', name='h1_w')
        self.h1_b = self.add_weight(shape=(self.h1,), initializer='zeros', name='h1_b')

        self.h2_w = self.add_weight(shape=(self.h1, self.h2), initializer='glorot_uniform', name='h2_W')
        self.h2_b = self.add_weight(shape=(self.h2,), initializer='zeros', name='h2_b')
        self.w = self.add_weight(name='w', shape=(self.h2, 1), initializer='glorot_uniform', trainable=True)
        self.b = self.add_weight(name='b', shape=(1,), initializer='zeros', trainable=True)

        super(DeepFMLayer, self).build(input_shape)

    def call(self, inputs):
        latent = {}
        sparse = {}
        # Different fields Combinations
        for i in range(1, self.num_fields + 1):
            sparse[i] = Lambda(lambda x: x[:, self.id_start[i]:self.id_stop[i]],
                                       output_shape=((self.len_field[i],)))(inputTensor)
            latent[i] = Activation('linear')(K.bias_add(K.dot(sparse[i],self.embed_w[i]),self.embed_b[i]))
        
        # merge different field s
        ConcatLatent = Concatenate()(list(latent.values()))
        
        # Add two hidden layers
        deep1 = Activation('relu')(K.bias_add(K.dot(ConcatLatent,self.h1_w),self.h1_b))
        deep2 = Activation('relu')(K.bias_add(K.dot(deep1, self.h2_w), self.h2_b))
        
        # One time item
        fm_w_1 = Activation('linear')(K.dot(inputs, self.fm_w))
        
        # Quadratic term
        dot_latent = {}
        dot_cat = []
        for i in range(1, self.num_fields):
            print(self.num_fields)
            dot_latent[i] = {}
            for f in self.field_combinations[i]:
                print(len(self.field_combinations))
                dot_latent[i][f] = Dot(axes=-1, normalize=False)([latent[i], latent[f]])
                dot_cat.append(dot_latent[i][f])
        print(dot_cat)
        fm_w_2 = Concatenate()(dot_cat)
        # Merge's first and second terms get FM
        fm_w = Concatenate()([fm_w_1, fm_w_2])
        
        fm = Lambda(lambda x: K.sum(x, axis=1, keepdims=True))(fm_w)
        
        deep_wx = Activation('linear')(K.bias_add(K.dot(deep2, self.w),self.b))
        print('build finish')
        return Add()([fm, deep_wx])
        
    def compute_output_shape(self, input_shape):
            return (input_shape[0], self.output_dim)


inputTensor = Input(shape=(np.shape(x_train)[1],))
deepFM_out = DeepFMLayer(10, {0:10, 1:10, 2:np.shape(x_train)[1]-20})(inputTensor)
out = Dense(2, activation="sigmoid", trainable=True)(deepFM_out)
model = Model(inputTensor, out)

model.compile(loss='binary_crossentropy',
                      optimizer='adam',
                      metrics=['accuracy'])

model.fit(x_train, y_train,
                  batch_size=32,
                  epochs=10,
                  validation_data=(x_test, y_test))

Run result:

Train on 2998 samples, validate on 750 samples
Epoch 1/10
2998/2998 [==============================] - 1s 425us/step - loss: 0.6403 - accuracy: 0.6344 - val_loss: 0.5578 - val_accuracy: 0.7533
Epoch 2/10
2998/2998 [==============================] - 1s 226us/step - loss: 0.4451 - accuracy: 0.8721 - val_loss: 0.4727 - val_accuracy: 0.8240
Epoch 3/10
2998/2998 [==============================] - 1s 215us/step - loss: 0.2469 - accuracy: 0.9445 - val_loss: 0.5188 - val_accuracy: 0.8360
Epoch 4/10
2998/2998 [==============================] - 1s 200us/step - loss: 0.1319 - accuracy: 0.9678 - val_loss: 0.6488 - val_accuracy: 0.8233
Epoch 5/10
2998/2998 [==============================] - 1s 211us/step - loss: 0.0693 - accuracy: 0.9843 - val_loss: 0.7755 - val_accuracy: 0.8247
Epoch 6/10
2998/2998 [==============================] - 1s 225us/step - loss: 0.0392 - accuracy: 0.9932 - val_loss: 0.9234 - val_accuracy: 0.8187
Epoch 7/10
2998/2998 [==============================] - 1s 204us/step - loss: 0.0224 - accuracy: 0.9967 - val_loss: 1.0437 - val_accuracy: 0.8200
Epoch 8/10
2998/2998 [==============================] - 1s 190us/step - loss: 0.0163 - accuracy: 0.9972 - val_loss: 1.1618 - val_accuracy: 0.8173
Epoch 9/10
2998/2998 [==============================] - 1s 190us/step - loss: 0.0106 - accuracy: 0.9980 - val_loss: 1.2746 - val_accuracy: 0.8147
Epoch 10/10
2998/2998 [==============================] - 1s 213us/step - loss: 0.0083 - accuracy: 0.9987 - val_loss: 1.3395 - val_accuracy: 0.8167
Model: "model_19"

conclusion

From the above results, it seems that pure FM and DeepFM have almost the same results on my dataset, even though they are not very different from LR, probably because my dataset is smaller.If you have any questions, we can discuss them together. I hope you can have a good look at the paper.These are some of my humble opinions, and I hope to get some inspiration from them.

Tags: Lambda network

Posted on Wed, 18 Mar 2020 22:08:40 -0700 by Pazuzu156