JS Reverse Engineering: Decoding the Crawler Parameters of Baidu Translate

JS Reverse Engineering - Decoding the sign Parameter for a Baidu Translate Crawler

One: Analyze the Request Parameters

https://fanyi.baidu.com/

Open Chrome DevTools, type any text into the translation box, and look at the captured network requests.

<img src="pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_17-10-16.png"/>

  • Request method: POST

    • Form data parameters:

<img src="pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_17-15-01.png"/>

  • Two values in the form data change between requests:

    • query: the text we want to translate
    • sign: an encrypted string

  • After many tests, the value of token turns out to be constant, so we can reuse it directly:

    token: e1ee8dc91e3ad02a0a5b5851ede622bd

We can now build a first version of the request. The code is as follows:

import requests
import json
import jsonpath
import execjs

url = "https://fanyi.baidu.com/v2transapi"
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:66.0) Gecko/20100101 Firefox/66.0",
    "Cookie": "REALTIME_TRANS_SWITCH=1; FANYI_WORD_SWITCH=1; HISTORY_SWITCH=1; SOUND_SPD_SWITCH=1; SOUND_PREFER_SWITCH=1; APPGUIDE_8_0_0=1; BAIDUID=79357D8CD8AA6632236EC4004F8ED7EF:FG=1; BIDUPSID=79357D8CD8AA6632236EC4004F8ED7EF; PSTM=1568295162; BDUSS=c4Q2tKZDF3blRwa2lPZTBtZ2l3WEFrR2t4dmFlS0xOU3AwSDA0LXh3d0hjS2hkRUFBQUFBJCQAAAAAAAAAAAEAAAAsLQLCv9Ww125pY2UwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAfjgF0H44Bddm; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; delPer=0; PSINO=1; BDRCVFR[feWj1Vr5u3D]=I67x6TjHwwYf0; H_PS_PSSID=1438_21127_18560_29522_29720_29568_29221_22159; locale=zh; Hm_lvt_64ecd82404c51e03dc91cb9e8c025574=1568728202,1568728208,1569819081,1569834485; Hm_lpvt_64ecd82404c51e03dc91cb9e8c025574=1569834485; __yjsv5_shitong=1.0_7_81f3894b80952667326f8bc7678a7b15ea68_300_1569834485000_43.243.148.219_3029b54a; yjs_js_security_passport=1c8e7350a4a786aba3434633ccbccb8e365beeb8_1569834491_js; to_lang_often=%5B%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%2C%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%5D; from_lang_often=%5B%7B%22value%22%3A%22en%22%2C%22text%22%3A%22%u82F1%u8BED%22%7D%2C%7B%22value%22%3A%22zh%22%2C%22text%22%3A%22%u4E2D%u6587%22%7D%5D"
}
data = {
    "from": "en",
    "to": "zh",
    "query": "",  # the text we want to translate
    "transtype": "translang",
    "simple_means_flag": "3",
    "sign": "",  # changes with the query; we have to run JS code to compute it
    "token": "e1ee8dc91e3ad02a0a5b5851ede622bd"  # constant across requests
}

Two: Analyze the sign Parameter (JS Reverse Engineering)

Search globally for sign

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_17-35-04.png"/>

Set a breakpoint and step through the code in debug mode.

We find that the sign parameter is generated by the y() function.

Hover the mouse over y and a tooltip shows where the function is defined. Click it to jump to the definition.

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_17-38-58.png"/>

The y function we see after the jump is:

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_17-43-04.png"/>

Copy this code into a new JS file, handle_baidu_translata.js. Note that in the source the function is defined under the name e, which is the name we call later.

function e(r) {
        var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
        if (null === o) {
            var t = r.length;
            t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
        } else {
            for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
                "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
                C !== h - 1 && f.push(o[C]);
            var g = f.length;
            g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
        }
        var u = void 0
          , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
        u = null !== i ? i : (i = window[l] || "") || "";
        for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
            var A = r.charCodeAt(v);
            128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
            S[c++] = A >> 18 | 240,
            S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
            S[c++] = A >> 6 & 63 | 128),
            S[c++] = 63 & A | 128)
        }
        for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
            p += S[b],
            p = n(p, F);
        return p = n(p, D),
        p ^= s,
        0 > p && (p = (2147483647 & p) + 2147483648),
        p %= 1e6,
        p.toString() + "." + (p ^ m)
    }

The file can then be run with any JavaScript runtime, or from Python by using the PyExecJS module to execute the JS code.

Execute the JS with PyExecJS

import execjs

query = 'Hello spider'
with open('handle_baidu_translata.js', 'r', encoding='utf-8') as f:
    ctx = execjs.compile(f.read())

sign = ctx.call('e', query)
print(sign)

Running it raises an exception:

execjs._exceptions.ProgramError: ReferenceError: i is not defined

<img src=" pycapux74.bkt.clouddn.com/blog/img20190522093757502 (1).png"/>

We go back to the browser and keep debugging with breakpoints to see how the value of i is produced.

At line 6819 we find u = null !== i ? i : (i = window[l] || "") || "";. The value of i is read from the browser's window object (l evaluates to the string "gtk"), so it cannot be produced when the JS runs outside a browser. After many attempts, however, we found that the value of i does not change, which makes things easy: we just declare i ourselves.

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_18-08-06.png"/>
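The property name looked up on window is itself hidden behind chained String.fromCharCode calls, and so are the two strings F and D used further down in the function. As a small sketch, the same character codes can be decoded in Python to see what they spell. The helper name decode is ours, not part of the site's code.

```python
def decode(*codes):
    """Join character codes the way chained String.fromCharCode(...) calls do."""
    return "".join(chr(c) for c in codes)


# Character codes copied from the JS function e(r)
l = decode(103, 116, 107)
F = decode(43, 45, 97) + decode(94, 43, 54)
D = decode(43, 45, 51) + decode(94, 43, 98) + decode(43, 45, 102)

print(l)  # gtk
print(F)  # +-a^+6
print(D)  # +-3^+b+-f
```

So window[l] is window["gtk"], and F and D are the small operation strings that get passed to n().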

We add it as the first line of the JS file:

var i = "320305.131321201"

Run it again. The result:

<img src=" pycapux74.bkt.clouddn.com/blog/img20190522095508991.png"/>

This time the error says that n is not defined. We repeat the same technique: locate n in the browser, see that it is a function, step into it, extract the whole function, and paste it in front of our JS code.

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_18-12-51.png"/>

We run it again, and this time the sign value is generated.

<img src=" pycapux74.bkt.clouddn.com/blog/imgSnipaste_2019-09-30_18-14-22.png"/>

Three: Write the Project Code

1. Prepare the JS code

handle_baidu_translata.js

var i = "320305.131321201"
function n(r, o) {
    for (var t = 0; t < o.length - 2; t += 3) {
        var a = o.charAt(t + 2);
        a = a >= "a" ? a.charCodeAt(0) - 87 : Number(a),
            a = "+" === o.charAt(t + 1) ? r >>> a : r << a,
            r = "+" === o.charAt(t) ? r + a & 4294967295 : r ^ a
    }
    return r
}
function e(r) {
     var o = r.match(/[\uD800-\uDBFF][\uDC00-\uDFFF]/g);
     if (null === o) {
         var t = r.length;
         t > 30 && (r = "" + r.substr(0, 10) + r.substr(Math.floor(t / 2) - 5, 10) + r.substr(-10, 10))
     } else {
         for (var e = r.split(/[\uD800-\uDBFF][\uDC00-\uDFFF]/), C = 0, h = e.length, f = []; h > C; C++)
             "" !== e[C] && f.push.apply(f, a(e[C].split(""))),
             C !== h - 1 && f.push(o[C]);
         var g = f.length;
         g > 30 && (r = f.slice(0, 10).join("") + f.slice(Math.floor(g / 2) - 5, Math.floor(g / 2) + 5).join("") + f.slice(-10).join(""))
     }
     var u = void 0
         , l = "" + String.fromCharCode(103) + String.fromCharCode(116) + String.fromCharCode(107);
     u = null !== i ? i : (i = window[l] || "") || "";
     for (var d = u.split("."), m = Number(d[0]) || 0, s = Number(d[1]) || 0, S = [], c = 0, v = 0; v < r.length; v++) {
         var A = r.charCodeAt(v);
         128 > A ? S[c++] = A : (2048 > A ? S[c++] = A >> 6 | 192 : (55296 === (64512 & A) && v + 1 < r.length && 56320 === (64512 & r.charCodeAt(v + 1)) ? (A = 65536 + ((1023 & A) << 10) + (1023 & r.charCodeAt(++v)),
             S[c++] = A >> 18 | 240,
             S[c++] = A >> 12 & 63 | 128) : S[c++] = A >> 12 | 224,
             S[c++] = A >> 6 & 63 | 128),
             S[c++] = 63 & A | 128)
     }
     for (var p = m, F = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(97) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(54)), D = "" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(51) + ("" + String.fromCharCode(94) + String.fromCharCode(43) + String.fromCharCode(98)) + ("" + String.fromCharCode(43) + String.fromCharCode(45) + String.fromCharCode(102)), b = 0; b < S.length; b++)
         p += S[b],
             p = n(p, F);
     return p = n(p, D),
         p ^= s,
     0 > p && (p = (2147483647 & p) + 2147483648),
         p %= 1e6,
     p.toString() + "." + (p ^ m)
 }

2. Python code that executes the JS

#!/usr/bin/env python 
# -*- coding: utf-8 -*- 

import execjs

query = 'Hello spider'
with open('handle_baidu_translata.js', 'r', encoding='utf-8') as f:
    ctx = execjs.compile(f.read())

sign = ctx.call('e', query)
print(sign)
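If installing a JS runtime for PyExecJS is inconvenient, the same arithmetic can be ported to pure Python. The following is a sketch under our own reading of the JS, not the author's method: the names GTK, _to_int32 and sign are ours, JS 32-bit operators are emulated by masking, and only text without surrogate pairs (no emoji and the like) is handled, because that branch of the JS calls a helper a() that is not part of the extracted code.

```python
GTK = "320305.131321201"  # same constant as var i in handle_baidu_translata.js


def _to_int32(x):
    """Wrap a Python int to signed 32 bits, as JS bitwise operators do."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x >= 0x80000000 else x


def n(r, o):
    """Port of the JS helper n(r, o): o encodes one operation per 3 characters."""
    for t in range(0, len(o) - 2, 3):
        c = o[t + 2]
        a = ord(c) - 87 if c >= "a" else int(c)  # shift amount
        if o[t + 1] == "+":
            a = (r & 0xFFFFFFFF) >> a            # JS: r >>> a
        else:
            a = _to_int32(r << a)                # JS: r << a
        r = _to_int32(r + a) if o[t] == "+" else _to_int32(r ^ a)
    return r


def sign(query, gtk=GTK):
    """Port of the JS function e(r) for text inside the Basic Multilingual Plane."""
    if any(ord(ch) > 0xFFFF for ch in query):
        raise ValueError("this sketch does not handle surrogate-pair characters")
    t = len(query)
    if t > 30:
        # Keep the first 10, the middle 10 and the last 10 characters
        mid = t // 2 - 5
        query = query[:10] + query[mid:mid + 10] + query[-10:]
    m, s = (int(part) for part in gtk.split("."))
    p = m
    for byte in query.encode("utf-8"):  # the JS loop builds the same UTF-8 bytes
        p = n(p + byte, "+-a^+6")
    p = n(p, "+-3^+b+-f")
    p = _to_int32(p ^ s)
    if p < 0:
        p = (p & 0x7FFFFFFF) + 0x80000000
    p %= 1000000
    return f"{p}.{p ^ m}"


print(sign("hello"))  # 54706.276099
```

The design choice here is to keep every intermediate value in the signed 32-bit range, since that is what the JS bitwise operators produce. If the site changes the gtk value, pass the new one as the second argument.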

3. Complete the crawler and parse the data

The response is converted directly to JSON with response.json().

<img src=" pycapux74.bkt.clouddn.com/blog/img2019052210373311.png"/>

The dst field holds the translation result.
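As a minimal sketch of the parsing step, the helper below walks the decoded JSON and collects every value stored under a given key, so it does not depend on the exact nesting of the response. The function name find_key and the sample payload are hypothetical and only illustrate the idea. The real response is the one shown in the screenshot above.

```python
def find_key(node, key):
    """Collect every value stored under `key` anywhere in nested dicts and lists."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_key(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_key(item, key))
    return found


# Hypothetical payload, shaped for illustration only
sample = {"trans_result": {"data": [{"src": "Hello spider", "dst": "<translated text>"}]}}
print(find_key(sample, "dst"))
```

In the crawler, the argument would be the result of response.json() after posting the data dictionary with query and sign filled in.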

Four: Summary

This concludes the JS reverse-engineering walkthrough for the translation site. The analysis process and the steps should be quite clear. If this helped you, please give it a like.

Project source address: https://github.com/Alva-Sian/...


Posted on Tue, 08 Oct 2019 20:42:45 -0700 by icm