1 line of code climb CSDN hot list, Python ha beer style writing

Eraser, a funny senior Internet bug

Project background

Group Friends: sister wipe, how many lines of code can CSDN hot list data climb at least?
Sister wipe: it's estimated to be 10.
Group Friends: oh baby, show me your code!

This is how the project needs to climb the CSDN hot list with the least number of lines of code.

The import module can not be included in the number of code lines.

When you see the end, please type your feelings in the comment area

Reptile analysis

Target to climb: https://blog.csdn.net/rank/list

Data interface, request to obtain the top 100 data of the hot list twice.

https://blog.csdn.net/phoenix/web/blog/hotRank?page=0&pageSize=50
https://blog.csdn.net/phoenix/web/blog/hotRank?page=1&pageSize=50

Data return format: JSON

{
  "code": 200,
  "message": "success",
  "data": [
    # Actual data
  ]
}

After analysis, start coding. The complete set of code can directly use the requests library.

Hot list reptile

For such simple code, first write a basic crawler to obtain the data, and then optimize it.

import requests
import json

for i in range(2):
    headers = {
        "user-agent": "Baiduspider"
    }
    res = requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50", headers=headers)
    data = res.json()
    if data["code"] == 200:
        data = data["data"]

    with open(f"{i}.json", "w+", encoding="utf-8") as f:
        f.write(json.dumps(data))

When running the code, two json files will be generated in the code directory. There are 50 pieces of data in each file, that is, all the data of the hot list.

The above codes total 12 lines. Next, start with the inner volume to shorten the number of code lines.

Declaration of abbreviated variables

import requests
import json
for i in range(2):
    json_data = requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50",headers={"user-agent": "Baiduspider"}).json()
    if json_data and json_data["code"] == 200:
        with open(f"{i}.json", "w+", encoding="utf-8") as f:
            f.write(json.dumps(json_data))

Make a simple arrangement and reduce it from 12 lines to 7 lines, making a slight progress.

Then combine the json and requests at the beginning of the code into one line, and reduce the code to 6 lines.

Add generator code

Replace the loop part with a generator and simplify the code again. In this step, there is only one line of code.

import requests, json
for i, data in enumerate([requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50",headers={"user-agent": "Baiduspider"}).json() for i in range(2)]):
    with open(f"{i}.json", "w+", encoding="utf-8") as f:
        f.write(json.dumps(data))

Please ignore the automatic line folding code. There are 4 lines left at present.

Continue to optimize and remove the line break and json module.

Line breaks are also removed

import requests
for i, data in enumerate([requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50", headers={"user-agent": "Baiduspider"}).text for i in range(2)]):
    with open(f"{i}.json", "w+", encoding="utf-8") as f: f.write(data)

After sorting according to the above code, there are only three lines of code left at this time.

Is this the limit? Can't we write code that normal people can't understand?

1 line of code final version

Use the simplest knowledge to achieve the most exciting effect. In order to minimize the number of lines of code, I wrote the following version.

import requests
with open("file.json", "a+", encoding="utf-8") as f: [f.write(my_str + "\n") for my_str in [
    requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50",
                 headers={"user-agent": "Baiduspider"}).text for i in range(2)]]

You are right. Except for module import, there is only one line of code.

After this line of code is expanded, it looks like the following, a long line, and it catches the hot list 100 data.

with open("file.json", "a+", encoding="utf-8") as f: [f.write(my_str + "\n") for my_str in [requests.get(f"https://blog.csdn.net/phoenix/web/blog/hotRank?page={i}&pageSize=50",headers={"user-agent": "Baiduspider"}).text for i in range(2)]]

In the end, you must have a lot of question marks. Please type the word you want to say in the comment area. Maybe you can win the prize.

Lucky draw (2 copies are sent in total at present)

As long as the number of comments exceeds 50
Select a lucky reader at random
Reward 39.9 yuan, 100 cases of crawler column, 10% off, one coupon, only 3.99 yuan

Today is the 159th / 200th day of continuous writing.
Ask for praise, comment and collection.

Keywords: Python Java Big Data JSON crawler

Added by J@ystick_FI on Mon, 07 Feb 2022 10:09:55 +0200