Machine Learning Feature Engineering 3 - Featuretools

Featuretools introduction

Featuretools is a framework for automated feature engineering. It is good at turning related tables into a feature matrix for machine learning. Feature construction operations fall into two categories: transformations, which act on the columns of a single table, and aggregations, which summarize the rows of a child table for each row of its parent. The following example shows how to use Featuretools.
Example code:
https://github.com/scottlinlin/auto_feature_demo.git
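To make the two categories concrete before any Featuretools call, here is a minimal plain-Python sketch. It does not use Featuretools at all; the rows are invented, and the `loan_amount` column is an assumption made for illustration (only `client_id`, `joined`, and `loan_id` appear in the code later in this article).

```python
from datetime import date
from statistics import mean

# Hypothetical rows, made up for illustration only.
clients = [
    {"client_id": 1, "joined": date(2015, 3, 14)},
    {"client_id": 2, "joined": date(2018, 11, 2)},
]
loans = [
    {"loan_id": 10, "client_id": 1, "loan_amount": 1000},
    {"loan_id": 11, "client_id": 1, "loan_amount": 3000},
    {"loan_id": 12, "client_id": 2, "loan_amount": 500},
]

# Transformation: uses columns of one table and yields one value per row.
join_month = {c["client_id"]: c["joined"].month for c in clients}

# Aggregation: summarizes many child rows (loans) for each parent row (client).
mean_loan_amount = {
    c["client_id"]: mean(
        loan["loan_amount"] for loan in loans if loan["client_id"] == c["client_id"]
    )
    for c in clients
}
```

A transformation never needs a second table, while an aggregation only makes sense once the parent-child link between two tables is known. That is why the steps below define relationships before generating features.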

Install

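pip install featuretools

The code in this article uses the pre-1.0 Featuretools API (entity_from_dataframe, variable_types, target_entity). Featuretools 1.0 replaced these calls, so install a release older than 1.0 to run the examples unchanged.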

Quick start

1. Import featuretools and pandas

import pandas as pd
import featuretools as ft

2. Load data

# Load the three tables, parsing the date columns as datetimes
clients = pd.read_csv('data/clients.csv', parse_dates = ['joined'])
loans = pd.read_csv('data/loans.csv', parse_dates = ['loan_start', 'loan_end'])
payments = pd.read_csv('data/payments.csv', parse_dates = ['payment_date'])

Output:


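The CSV files are not reproduced in this article. As a rough stand-in for what `parse_dates` does for the loans table, the sketch below parses a tiny inline CSV with the standard library only. The two rows and the helper name `read_loans` are invented; only the column names come from the code in this article.

```python
import csv
import io
from datetime import datetime

# Invented sample standing in for data/loans.csv (values are illustrative).
LOANS_CSV = """client_id,loan_id,loan_start,loan_end,repaid
1,10,2016-01-05,2017-01-05,1
2,12,2018-06-20,2019-12-20,0
"""

DATE_COLUMNS = ("loan_start", "loan_end")
INT_COLUMNS = ("client_id", "loan_id", "repaid")


def read_loans(text):
    """Read loan rows, converting ids to int and date columns to datetime."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in DATE_COLUMNS:
            row[col] = datetime.strptime(row[col], "%Y-%m-%d")
        for col in INT_COLUMNS:
            row[col] = int(row[col])
        rows.append(row)
    return rows


loans = read_loans(LOANS_CSV)

# With real datetimes, date arithmetic works directly on the parsed values.
durations = [(row["loan_end"] - row["loan_start"]).days for row in loans]
```

Parsing the dates up front matters because the later steps use `joined`, `loan_start`, and `payment_date` as time indexes, which only works on datetime values and not on plain strings.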

3. Create the entity set and its entities

# Create an empty entity set
es = ft.EntitySet(id = 'clients')

# Add the clients entity; client_id identifies each row
es = es.entity_from_dataframe(entity_id = 'clients', dataframe = clients,
                              index = 'client_id', time_index = 'joined')

# Add the loans entity; treat the repaid flag as categorical
es = es.entity_from_dataframe(entity_id = 'loans', dataframe = loans,
                              variable_types = {'repaid': ft.variable_types.Categorical},
                              index = 'loan_id',
                              time_index = 'loan_start')

# Add the payments entity; it has no unique id column,
# so make_index = True creates payment_id
es = es.entity_from_dataframe(entity_id = 'payments',
                              dataframe = payments,
                              variable_types = {'missed': ft.variable_types.Categorical},
                              make_index = True,
                              index = 'payment_id',
                              time_index = 'payment_date')

# Print the entity set
es

Output:


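Each entity needs an index whose values are unique per row, which is why payments gets a generated `payment_id`. The plain-Python sketch below illustrates that idea only; it is not how Featuretools implements `make_index`. The helper names and sample rows are hypothetical, and the 0-based numbering is a choice made here.

```python
def has_unique_index(rows, index):
    """True when every row has a distinct value in the index column."""
    values = [row[index] for row in rows]
    return len(values) == len(set(values))


def make_index(rows, index):
    """Return copies of the rows with a new 0-based integer id column added."""
    return [{index: i, **row} for i, row in enumerate(rows)]


# Invented payment rows: loan_id repeats, so it cannot serve as the index.
payments = [
    {"loan_id": 10, "payment_date": "2016-02-05", "missed": 0},
    {"loan_id": 10, "payment_date": "2016-03-05", "missed": 1},
    {"loan_id": 11, "payment_date": "2016-02-11", "missed": 0},
]

loan_id_is_unique = has_unique_index(payments, "loan_id")
indexed_payments = make_index(payments, "payment_id")
payment_id_is_unique = has_unique_index(indexed_payments, "payment_id")
```

Running a check like `has_unique_index` on your own tables before building the entity set is a cheap way to find out which ones need a generated index.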

4. Add the relationships between entities

# Relate clients (parent) to loans (child) through client_id
r_client_previous = ft.Relationship(es['clients']['client_id'],
                                    es['loans']['client_id'])
es = es.add_relationship(r_client_previous)

# Relate loans (parent) to payments (child) through loan_id
r_payments = ft.Relationship(es['loans']['loan_id'],
                             es['payments']['loan_id'])
es = es.add_relationship(r_payments)

# Print the entity set
es

Output:


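A relationship is a one-to-many link: each parent row (a client) can have many child rows (loans), matched on a shared key. As a hedged sketch of what that link implies, the plain-Python helpers below group child rows under their parent key and list child keys that have no parent. The helper names and rows are invented, and nothing here calls Featuretools.

```python
from collections import defaultdict


def group_children(child_rows, child_key):
    """Map each parent key to the list of child rows that point at it."""
    groups = defaultdict(list)
    for row in child_rows:
        groups[row[child_key]].append(row)
    return dict(groups)


def orphan_keys(parent_rows, parent_key, child_rows, child_key):
    """Return child key values that match no parent row, sorted."""
    parent_ids = {row[parent_key] for row in parent_rows}
    child_ids = {row[child_key] for row in child_rows}
    return sorted(child_ids - parent_ids)


# Invented rows; client 3 is deliberately missing from the parent table.
clients = [{"client_id": 1}, {"client_id": 2}]
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
    {"loan_id": 13, "client_id": 3},
]

loans_by_client = group_children(loans, "client_id")
orphans = orphan_keys(clients, "client_id", loans, "client_id")
```

The grouping is what makes aggregation possible in the next step: every aggregate feature on a client is computed from that client's group of loans.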

5. Generate new features with deep feature synthesis

# Run deep feature synthesis (dfs) with the default primitives,
# building one row of features per client
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients')
features.head()

Output:


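The "deep" in deep feature synthesis refers to stacking primitives across relationships, which produces features with names along the lines of MEAN(loans.SUM(payments.payment_amount)). The sketch below computes one such depth-2 feature by hand in plain Python. The `payment_amount` column and all values are assumptions for illustration, and this is not how Featuretools computes it internally.

```python
from collections import defaultdict
from statistics import mean

# Invented rows for illustration.
loans = [
    {"loan_id": 10, "client_id": 1},
    {"loan_id": 11, "client_id": 1},
    {"loan_id": 12, "client_id": 2},
]
payments = [
    {"loan_id": 10, "payment_amount": 100},
    {"loan_id": 10, "payment_amount": 300},
    {"loan_id": 11, "payment_amount": 200},
    {"loan_id": 12, "payment_amount": 50},
]

# Depth 1: aggregate payments up to their parent loan (SUM per loan).
sum_payments_per_loan = defaultdict(int)
for payment in payments:
    sum_payments_per_loan[payment["loan_id"]] += payment["payment_amount"]

# Depth 2: aggregate the loan-level feature up to the parent client (MEAN per client).
sums_by_client = defaultdict(list)
for loan in loans:
    sums_by_client[loan["client_id"]].append(sum_payments_per_loan[loan["loan_id"]])

mean_of_sums = {client_id: mean(sums) for client_id, sums in sums_by_client.items()}
```

Writing even one of these by hand takes two grouping passes; `ft.dfs` generates many such combinations automatically from the entity set and its relationships.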

6. Generate features from chosen aggregation and transformation primitives

# Run dfs again, this time naming the aggregation (agg) and transformation (trans) primitives to use.
# Primitive names differ between featuretools releases (later ones use
# 'subtract_numeric' and 'divide_numeric', for example); ft.list_primitives()
# shows the names your installed version accepts.
features, feature_names = ft.dfs(entityset = es, target_entity = 'clients',
                                 agg_primitives = ['mean', 'max', 'percent_true', 'last'],
                                 trans_primitives = ['years', 'month', 'subtract', 'divide'])
features.head()

Output:

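Two of the aggregation primitives named above are less self-explanatory than mean and max. As a rough plain-Python sketch of the ideas behind `percent_true` and `last`: the first is the fraction of true values in a group, the second is the value from the most recent row. The function signatures, the 0.0 result for an empty group, and the sample rows are choices made here, not Featuretools' definitions.

```python
from datetime import date


def percent_true(values):
    """Fraction of truthy values; 0.0 for an empty group (a choice made here)."""
    values = list(values)
    if not values:
        return 0.0
    return sum(bool(v) for v in values) / len(values)


def last(rows, value_column, time_column):
    """Value of value_column taken from the row with the latest time_column."""
    return max(rows, key=lambda row: row[time_column])[value_column]


# Invented payments for a single loan, deliberately out of date order.
payments = [
    {"payment_date": date(2016, 3, 5), "missed": 1},
    {"payment_date": date(2016, 2, 5), "missed": 0},
    {"payment_date": date(2016, 5, 5), "missed": 1},
    {"payment_date": date(2016, 4, 5), "missed": 0},
]

missed_rate = percent_true(p["missed"] for p in payments)
last_missed = last(payments, "missed", "payment_date")
```

Restricting `agg_primitives` and `trans_primitives` to a short list like this keeps the feature matrix small and makes each generated column easier to interpret.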

Posted on Thu, 13 Feb 2020 10:54:29 -0800 by Rik Peters