Personal profile
- About the author: Hello, I'm Daniel
- Personal homepage: Hall owner a Niu
- Support me: like, favorite, leave a message
- Series column: python web crawler
- Maxim: So far, all of life has been written with failure, but that doesn't prevent me from moving forward!
preface
Here it comes! As a programmer, I can't stand running into English sentences I can't translate, so I had to write a script for it!
Baidu translation version (simple)
analysis
Open Baidu Translate, press F12, and switch to the All tab of the Network panel. As you type the text you want to translate, a request named sug appears in the list. That is our interface url, and its parameter is kw.
code
import requests

post_url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'
}
word = input('Please enter what you want to translate, which can be in various languages:')
data = {
    'kw': word
}
response = requests.post(url=post_url, data=data, headers=headers)
dic_obj = response.json()  # Convert json data into a dictionary
print(dic_obj['data'][0]['v'])
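The last line of the script assumes the reply always contains at least one suggestion. Here is a slightly more defensive way to read it, as a minimal sketch only. The helper name first_suggestion is hypothetical, and the response shape (a 'data' list whose items carry the translation under 'v') is inferred purely from the indexing used in the script, not from any Baidu documentation.

```python
def first_suggestion(dic_obj):
    """Return the 'v' field of the first suggestion, or None if there is none."""
    # Assumed shape, inferred from the script above: {'data': [{'v': ...}, ...]}
    items = dic_obj.get('data') or []
    if not items:
        return None
    return items[0].get('v')


# Hand-made sample dictionaries (not real Baidu output) to show the behaviour
print(first_suggestion({'data': [{'v': 'sample translation'}]}))
print(first_suggestion({'data': []}))
```

With this helper, an input that yields no suggestions gives None instead of raising an IndexError.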
result
Youdao translation version (difficult)
Analysis (js reverse)
Press F12 to open the developer tools and, under XHR in the Network panel (where ajax requests are listed), find the following interface.
Then let's look at the parameters:
Comparing the two figures shows that i is the sentence we want to translate. The parameters underlined in green change from request to request, so they are the ones we need to handle. lts is a 13-digit timestamp. salt is one digit longer than lts and shares its first 13 digits, so it should be a salted timestamp (appending extra digits or characters to a value before hashing it is called salting). Both parameters can be generated directly in python. To avoid unnecessary trouble, and for readers who are not sure how, we can instead locate the js statements that produce them and execute that js from python.
The sign here is 32 characters long, so it should be generated by some hashing or encryption algorithm. The most common ones are md5 and rsa, and md5 digests are exactly 32 hexadecimal characters long. Let's do a global search in the js to reverse it:
The search turns up our old friend md5, together with the way each parameter is generated. In the figure, r in the js is the timestamp, i in the js is the salted timestamp, and sign is the md5 of the string in parentheses. We still need to work out where e comes from, which we can find with breakpoint debugging.
We can see that e is the text we want to translate. Now all the parameters are clear. In fact, we could get sign by calling md5 from the hashlib module in python, but here we take the slightly harder route to practice js reverse. I extracted the js for the md5 process and put the file on a network disk. You can download it yourself and use it in the code.
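For readers who prefer to generate these two parameters directly in python, here is a minimal sketch. It only follows the observation above (a 13-digit millisecond timestamp, plus the same value with one extra digit appended). The function name make_lts_and_salt is hypothetical.

```python
import random
import time


def make_lts_and_salt():
    """Return (lts, salt): a 13-digit millisecond timestamp and the same
    string with one random digit (0-9) appended."""
    lts = str(int(time.time() * 1000))      # milliseconds since the epoch
    salt = lts + str(random.randint(0, 9))  # one extra digit, as observed
    return lts, salt


lts, salt = make_lts_and_salt()
print(lts, salt)
```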
Link: https://pan.baidu.com/s/1aV1tEo35Oyw4TUExhJoXUA
Extraction code: waan
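For comparison, the hashlib route mentioned above could look roughly like this. It is a sketch under assumptions, not the site's actual recipe: the concatenation order (client, text, salt, then a constant string) and the function name make_sign are assumed, and the constant shown is only a placeholder. The real constant has to be read from the js you found in the global search.

```python
import hashlib


def make_sign(client, text, salt, secret):
    """md5 of the concatenated pieces, as a 32-character hex string.

    The order of the pieces is an assumption; check it against the js.
    """
    raw = client + text + salt + secret
    return hashlib.md5(raw.encode('utf-8')).hexdigest()


# 'PLACEHOLDER_SECRET' and the salt value are made-up illustration values
sign = make_sign('fanyideskweb', 'hello', '16266621991925', 'PLACEHOLDER_SECRET')
print(len(sign))  # 32
```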
At the same time, to get past the anti-crawling checks, we should send Cookie and Referer headers as well, not just User-Agent.
code
import json

import execjs  # Module for executing js statements
import jsonpath
import requests


class Youdao():
    def __init__(self, msg):
        # url
        self.url = 'https://fanyi.youdao.com/translate_o?smartresult=dict&smartresult=rule'
        # headers
        self.headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36',
            'Cookie': 'OUTFOX_SEARCH_USER_ID=-1032338096@10.169.0.102; OUTFOX_SEARCH_USER_ID_NCOO=39238000.072458096; JSESSIONID=aaak-QLUNaabh_wFWK8Qx; ___rl__test__cookies=1626662199192',
            'Referer': 'https://fanyi.youdao.com/'
        }
        self.msg = msg
        self.Formdata = None

    def js_Formdata(self):
        # Timestamp
        r = execjs.eval('"" + (new Date).getTime()')
        # Salted timestamp
        i = r + str(execjs.eval('parseInt(10 * Math.random(), 10)'))
        with open('./youdao.js', 'r', encoding='utf-8') as f:
            ctx = execjs.compile(f.read())
        # Call the getsign function in youdao.js, passing in the text
        # to be translated and the salted timestamp.
        sign = ctx.call('getsign', self.msg, i)
        self.Formdata = {
            'i': self.msg,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': i,
            'sign': sign,
            'lts': r,
            'bv': 'f46e446c6db49492797b7d03ea1e82da',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',
        }

    def response(self):
        resp = requests.post(url=self.url, data=self.Formdata, headers=self.headers).text
        data = json.loads(resp)  # Convert json data into a dictionary
        # Use jsonpath to extract data
        if "translateResult" in data:
            k = jsonpath.jsonpath(data, '$..translateResult')[0][0][0]['tgt']
            print(k)
        if "smartResult" in data:
            print("Other translation:")
            lst = jsonpath.jsonpath(data, '$..entries')[0]
            for k in lst[1:]:
                k = k.replace("\r\n", "")
                print(k)

    def main(self):
        # Build the form data
        self.js_Formdata()
        # print(self.Formdata)
        # Send request and get response
        self.response()


if __name__ == '__main__':
    msg = input('Please enter the word or sentence you want to translate:')
    youdao = Youdao(msg)
    youdao.main()
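The extraction step in response() can also be written with plain dictionary access, which makes it easy to try without sending a request. This is a sketch only. The function name extract_translation is hypothetical, and the sample dictionary is hand-made: its shape is inferred from the indexing in the class above (translateResult[0][0]['tgt'], and an 'entries' list under smartResult whose first item is skipped), not captured from the site.

```python
def extract_translation(data):
    """Return (main, extras) from a parsed reply dictionary."""
    main = None
    result = data.get('translateResult')
    if result:
        main = result[0][0]['tgt']
    extras = []
    smart = data.get('smartResult')
    if smart:
        # The first entry is skipped, as in the class above
        entries = smart.get('entries', [])[1:]
        extras = [entry.replace('\r\n', '') for entry in entries]
    return main, extras


# Hand-made sample (assumed shape, not real output)
sample = {
    'translateResult': [[{'tgt': 'sample target'}]],
    'smartResult': {'entries': ['', 'n. sample one\r\n', 'v. sample two\r\n']},
}
print(extract_translation(sample))
```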
result
epilogue
If you think the blogger writes well, please like, favorite, and comment!!!