Python clustering algorithm solution (REST interface / MPP database / JSON data / downloading pictures and data)

1. Scenario
I have always worked in Java. For a project I needed to wrap some classical algorithms into our platform, so I have been learning Python while looking for classical algorithm code online. This post covers the classical K-means clustering algorithm. It does not explain the algorithm's principle and stays at the code level: the REST interface, connecting to the MPP database, returning JSON data, and downloading pictures and data.

2. Solution
2.1 Project approach

(1) The Python algorithms are deployed on a separate server and expose a REST interface for the Java platform to call; the two sides interact over HTTP + JSON.

(2) The data comes from the MPP database Greenplum.

(3) The returned data has three parts: the address of the generated cluster picture, the address of the complete clustered data (a CSV file), and 200 rows of JSON preview data for the front end.
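
The HTTP + JSON exchange described above can be sketched from the caller's side. This is only a minimal sketch, not the platform's actual client: the host `algo-server`, the query, the column names and the download prefix are placeholders, while the parameter names and the route are taken from the code later in this post. The example only builds the request; sending it requires a running server.

```python
import json
import urllib.request

# Hypothetical base address of the Python algorithm server (an assumption).
BASE_URL = "http://algo-server:5000"

def build_kmeans_request(sql, xlines, ylines, groupnum, times, name, url):
    """Build (but do not send) the POST request for the k-means endpoint."""
    payload = {
        "sql": sql,            # query run against Greenplum
        "xlines": xlines,      # column plotted on the x axis
        "ylines": ylines,      # column plotted on the y axis
        "groupnum": groupnum,  # number of clusters
        "times": times,        # maximum number of iterations
        "name": name,          # base file name for the .jpg and .csv
        "url": url,            # prefix used to build the download links
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + "/getKmeansInfoByLaowang",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_kmeans_request(
    sql="select sal, comm from emp",  # placeholder query
    xlines="sal", ylines="comm",
    groupnum=3, times=500,
    name="kmeans_demo",
    url=BASE_URL + "/download",       # placeholder download prefix
)
print(req.get_method(), req.full_url)
print(json.loads(req.data)["groupnum"])
```

The Java platform would send the equivalent JSON body. The response is either a JSON string with `image_url`, `details_url` and `data`, or a plain string starting with `exception:`.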

2.2 restapi class
There are two classes. The first is the restapi class, which wraps the REST interface. It is shared: the other classical algorithms each have a corresponding method here.

Complete code:

# -*- coding: utf-8 -*-

from flask import Flask, request, send_from_directory
from k_means import exec
import logging

app = Flask(__name__)

# 1. Directory on the server where the generated files are stored; change as needed
dirpath = r'E:\ruanjianlaowang'  # raw string, otherwise '\r' is read as a carriage return

# 2. Connectivity test
@app.route('/')  # the route path is missing in the source; '/' is an assumption
def index():
    return "Hello, World!"

# 3. k-means algorithm
@app.route('/getKmeansInfoByLaowang', methods=['POST'])
def getKmeansInfoByLaowang():
    try:
        result = exec(request.get_json(), dirpath)
    except Exception as e:
        # The original caught IndexError, KeyError and ValueError separately
        # with identical handling; Exception covers all of them
        return 'exception:' + str(e)
    return result

# 4. File download (pictures and csv)
@app.route('/download/<filename>')  # the route path is missing in the source; this one is an assumption
def getImages(filename):
    return send_from_directory(dirpath, filename, as_attachment=True)

# 5. Start-up
if __name__ == '__main__':
    # The host value is empty in the source; set it to the address to bind
    app.run(host="", port=5000, debug=True)

Code description:

The REST service is provided by the third-party Flask library.

(1) Set the directory on the server where the data is stored

(2) Connectivity test

(3) k-means algorithm endpoint

(4) File download (pictures and csv)

(5) Start-up
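
Since each algorithm endpoint in this class would repeat the same try/except pattern, the handling could be factored into a decorator. The following is a sketch of one possible refactoring, not part of the original code: the names `exception_as_text` and `fake_algorithm` are made up for illustration, and it keeps the original convention of returning a plain string prefixed with `exception:`.

```python
import functools

def exception_as_text(func):
    """Return 'exception:<message>' instead of raising, as the endpoint above does."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as e:  # also covers IndexError, KeyError and ValueError
            return 'exception:' + str(e)
    return wrapper

@exception_as_text
def fake_algorithm(params):
    # Stand-in for a call such as exec(request.get_json(), dirpath).
    return 'clusters=' + str(int(params['groupnum']))

ok = fake_algorithm({'groupnum': '3'})
bad = fake_algorithm({'groupnum': 'three'})  # int() raises ValueError
missing = fake_algorithm({})                 # the lookup raises KeyError
print(ok)
print(bad)
print(missing)
```

With Flask, the `@app.route(...)` line would go above `@exception_as_text`, so that the route registers the wrapped function; `functools.wraps` keeps the function name that Flask uses as the endpoint name.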

2.3 k-means algorithm class
Complete code:

import json
import os

import matplotlib.pyplot as plt
import pandas as pd

import dbgp  # the author's own Greenplum helper module (not shown in this post)


# Entry point called by the REST layer
def exec(params, dirpath):
    # 1. Read the parameters
    sql = params.get("sql")
    xlines = params.get("xlines")
    ylines = params.get("ylines")
    xlinesname = params.get("xlinesname")
    ylinesname = params.get("ylinesname")
    grouplinesname = params.get("grouplinesname")

    times = int(params.get("times"))
    groupnum = int(params.get("groupnum"))
    url = params.get("url")
    name = params.get("name")

    # 2. Check for empty parameters
    flag = checkparam(sql, xlines, ylines, times, groupnum)
    if flag:
        return flag

    # 3. Fetch the data from the database
    try:
        data = dbgp.queryGp(sql)
    except Exception:
        # The original caught IndexError, KeyError, ValueError and Exception
        # separately and returned the SQL text in every case
        return sql

    if data.empty:
        return "exception:This data set has no data. Please confirm and try again."

    # 4. Call the KMeans clustering algorithm from the third-party sklearn library
    # data_zs = 1.0 * (data - data.mean()) / data.std()  # standardization, not applied here
    from sklearn.cluster import KMeans
    # n_jobs=4 from the original is dropped: scikit-learn 1.0 removed that parameter
    model = KMeans(n_clusters=groupnum, max_iter=times)
    model.fit(data)  # start clustering

    return export(model, data, data, url, dirpath, name, grouplinesname, xlines, ylines, xlinesname, ylinesname)


# 5. Build and export the detail file (CSV)
def export(model, data, data_zs, url, dirpath, name, grouplinesname, xlines, ylines, xlinesname, ylinesname):
    # Detailed output: the raw data plus the cluster of each sample
    detail_data = data.copy()  # DataFrame.append was removed in pandas 2.0
    if grouplinesname:
        detail_data.columns = grouplinesname.split(',')

    r_detail_new = pd.concat([detail_data, pd.Series(model.labels_, index=detail_data.index)], axis=1)
    r_detail_new.columns = list(detail_data.columns) + ['Cluster Category']  # rename the headers
    outputfile = os.path.join(dirpath, name + '.csv')  # the original concatenated without a separator
    r_detail_new.to_csv(outputfile, encoding='utf_8_sig')  # save the result

    # Summary: cluster centers plus the number of samples per cluster
    r1 = pd.Series(model.labels_).value_counts()
    r2 = pd.DataFrame(model.cluster_centers_)
    r = pd.concat([r2, r1], axis=1)  # horizontal concat (axis=0 would be vertical)
    r.columns = list(data.columns) + ['Number of categories']  # rename the headers

    return generateimage(r, data_zs, url, dirpath, name, model, xlines, ylines, xlinesname, ylinesname)


# 6. Generate the picture and return the JSON
def generateimage(r, data_zs, url, dirpath, name, model, xlines, ylines, xlinesname, ylinesname):
    image = os.path.join(dirpath, name + '.jpg')

    # 6.1 Font settings so that Chinese text renders
    plt.rcParams['font.sans-serif'] = ['simhei']
    plt.rcParams['font.family'] = 'sans-serif'
    plt.rcParams['axes.unicode_minus'] = False

    # 6.2 Draw and save the picture
    labels = model.labels_
    centers = model.cluster_centers_
    data_zs['label'] = labels
    data_zs['label'] = data_zs['label'].astype(int)
    # Marker/color combinations, one per cluster
    markers = ['o', 's', '+', 'x', '^', 'v', '<', '>']
    colors = ['b', 'c', 'g', 'k', 'm', 'r', 'y']
    symbols = []
    for m in markers:
        for c in colors:
            symbols.append((m, c))
    # Scatter plot and centroid for each cluster
    for i in range(0, len(centers)):
        df_i = data_zs.loc[data_zs['label'] == i]
        symbol = symbols[i]
        center = centers[i]

        x = df_i[xlines].values.tolist()
        y = df_i[ylines].values.tolist()

        plt.scatter(x, y, marker=symbol[0], color=symbol[1], s=10)
        # The centroid is taken from the first two feature columns
        plt.scatter(center[0], center[1], marker='*', color=symbol[1], s=50)

    plt.savefig(image, dpi=150)
    plt.close()  # release the figure so that later requests start from a clean plot

    # 6.3 Return the JSON for front-end display
    result = {}
    result['image_url'] = url + '/' + name + '.jpg'
    result['details_url'] = url + '/' + name + '.csv'
    # First 200 rows as a preview; a DataFrame is not JSON-serializable, so go through to_json()
    result['data'] = json.loads(r[:200].to_json())
    return json.dumps(result, ensure_ascii=False)


def checkparam(sql, xlines, ylines, times, groupnum):
    if sql is None or sql.strip() == '':
        return "The data set or clustered data columns must not be empty"
    if xlines is None or xlines.strip() == '':
        return "The X axis must not be empty"
    if ylines is None or ylines.strip() == '':
        return "The Y axis must not be empty"
    # times is the iteration limit and groupnum the cluster count;
    # the two messages were swapped in the original
    if times is None or times <= 0:
        return "The number of iterations must not be empty or less than or equal to 0"
    if groupnum is None or groupnum <= 0:
        return "The number of clusters must not be empty or less than or equal to 0"
    return None

Code description:

(1) Read the parameters;

(2) Check whether any parameter is empty;

(3) Fetch the data from the database;

(4) Call the KMeans clustering algorithm from the third-party sklearn library;

(5) Generate and export the detail file (CSV);

(6) Generate the picture and return the JSON:

(6.1) Chinese font handling;

(6.2) Draw and save the picture;

(6.3) Return the JSON data for front-end display.
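
The exec function leaves the standardization step commented out and clusters the raw columns. Because K-means relies on Euclidean distance, a column with a large range (such as a salary) can dominate one with a small range. The sketch below shows z-score standardization in plain Python, using the sample standard deviation as pandas' `std()` does by default. The column values are invented for illustration and the helper name `zscore` is not from the original code.

```python
from statistics import mean, stdev

def zscore(values):
    """Standardize a column: subtract the mean, divide by the sample std."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1), like pandas' default
    if s == 0:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - m) / s for v in values]

# Invented sample columns on very different scales.
sal = [800.0, 1600.0, 2450.0, 3000.0, 5000.0]
deptno = [10.0, 20.0, 20.0, 30.0, 30.0]

sal_z = zscore(sal)
deptno_z = zscore(deptno)
print([round(v, 3) for v in sal_z])
print([round(v, 3) for v in deptno_z])
```

With pandas the same step is `(data - data.mean()) / data.std()`. One option would be to fit the model on the standardized frame while still exporting the raw `data`, which is what the separate `data` and `data_zs` parameters of export appear to allow for.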

2.4 Implementation Effect
2.4.1 json returns
{"image_url":"","details_url":"","data":{"empno":{"0":7747.2,"1":7699.625,"2":7839.0},"mgr":{"0":7729.8,"1":7745.25,"2":7566.0},"sal":{"0":2855.0,"1":1218.75,"2":5000.0},"comm":{"0":29.5110766,"1":117.383964625,"2":31.281453},"deptno":{"0":20.0,"1":25.0,"2":10.0},"Number of categories":{"0":5,"1":8,"2":1}}}
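
The `data` field is column-oriented: each column name maps to a dictionary keyed by cluster index. A caller that wants one record per cluster can pivot it. The sketch below assumes the response shape shown in this section; the shortened sample string and the helper name `rows_by_cluster` are illustrative only.

```python
import json

def rows_by_cluster(response_text):
    """Pivot {"column": {"cluster": value}} into {"cluster": {"column": value}}."""
    data = json.loads(response_text)["data"]
    rows = {}
    for column, per_cluster in data.items():
        for cluster, value in per_cluster.items():
            rows.setdefault(cluster, {})[column] = value
    return rows

# Shortened sample in the same shape as the response above.
sample = (
    '{"image_url": "", "details_url": "", "data": {'
    '"sal": {"0": 2855.0, "1": 1218.75, "2": 5000.0}, '
    '"Number of categories": {"0": 5, "1": 8, "2": 1}}}'
)

rows = rows_by_cluster(sample)
for cluster in sorted(rows):
    print(cluster, rows[cluster])
```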
2.4.2 Return Pictures

2.4.3 Returned data

For reference, the current project environment is an 8-core, 16 GB virtual machine; with 300,000 rows of data it runs well.

I'm "Lao Wang of Software". If you found this useful, follow me for timely updates. You are welcome to leave a comment in the discussion area or message me on the public account of the same name.

Tags: Python SQL JSON REST Database

Posted on Tue, 27 Aug 2019 21:33:34 -0700 by eskimowned