Preface
The entertainment world has been lively lately. First there was the "domineering CEO in love with the mistress" scandal, with the original partner publicly going after an internet celebrity; then the "sunshine boy" Show Luo had his reputation shredded by Zhou Yangqing's exposé. Love in the entertainment circle is hard for ordinary people to fathom. As the crosstalk performers Jia Xuming and Zhang Kang put it, celebrity romances are forever breaking up and getting back together, which makes them endless conversation material for the rest of us: people outside the besieged city want in, and the people inside really know how to play.
Between the various attempts at whitewashing and the rumors flying everywhere, what do the melon-eating (gossip-watching) netizens actually think?
Letting the data speak is the whole point of being a data worker. The analysis breaks down into three steps:
- Data acquisition
- Data preprocessing
- Data visualization and data analysis
The following are the specific steps and code implementation:
Data acquisition
Data acquisition address:
'http://ent.163.com/20/0423/09/FASTLQ7I00038FO9.html'
Before crawling the comment data, open the browser developer tools (F12) and inspect the comment requests. There are 172 pages in total; the offset starts at 0 and grows by 30 for each additional page, so the data can be fetched with plain GET requests.
Core code:
```python
import json

import pandas as pd
import requests
from pandas import json_normalize

headers = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Safari/537.36'}

# Comment API address: the offset is filled in page by page
url = "http://comment.api.163.com/api/v1/products/a2869674571f77b5a0867c3d71db5856/threads/FASTLQ7I00038FO9/comments/newList?ibc=newspc&limit=30&showLevelThreshold=72&headLimit=1&tailLimit=2&offset={}"

# Crawl page by page until an empty page comes back
df = pd.DataFrame(None)
i = 0
while True:
    ret = requests.get(url.format(str(i * 30)), headers=headers)
    result = json.loads(ret.text)
    t = result['comments'].values()
    s = json_normalize(t)
    i += 1
    if len(s) == 0:
        print("End of crawling")
        break
    else:
        df = df.append(s)
        print("Page {} crawled".format(i))

df.to_csv('data.csv')
```
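One caveat: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0. On a recent pandas, the same loop can collect each page in a list and concatenate once at the end. A minimal sketch of that variant, assuming the same `url` and `headers` as above:

```python
import json

import pandas as pd
import requests

# Same endpoint as above; collect pages in a list instead of appending
pages = []
i = 0
while True:
    ret = requests.get(url.format(str(i * 30)), headers=headers)
    comments = json.loads(ret.text)['comments'].values()
    page = pd.json_normalize(list(comments))
    i += 1
    if len(page) == 0:
        break
    pages.append(page)

df = pd.concat(pages, ignore_index=True)  # one concat beats repeated appends
df.to_csv('data.csv')
```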
Data display
Data preprocessing
Data preprocessing is an essential step before visualization. It covers reading the data, de-duplicating comments, converting data formats, and so on.
```python
import pandas as pd

# Read the data
df = pd.read_csv('data.csv')

# De-duplicate comments by their id
df = df.drop_duplicates('commentId').reset_index(drop=True)

# Format conversion: keep the "date plus hour" prefix of the timestamp
df['new_time'] = df.apply(lambda x: x['createTime'].split(':', 1)[0], axis=1)
```
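The string split above keeps only the "date plus hour" prefix of `createTime`. If real timestamps are more convenient, here is a small alternative sketch, assuming `createTime` is in a format pandas can parse (e.g. `2020-04-23 10:15:32`):

```python
# Alternative sketch: parse createTime into a real datetime
# (assumes a format pandas can parse, e.g. "2020-04-23 10:15:32")
df['createTime'] = pd.to_datetime(df['createTime'])
df['hour'] = df['createTime'].dt.floor('H')  # e.g. 2020-04-23 10:00:00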
Data analysis and visualization
1. Event attention index
Zhou Yangqing's statement went out on Weibo, Toutiao and other platforms at 9:00 sharp on April 23. The precise timing alone suggests the breakup announcement had been prepared for some time rather than posted on impulse. As for the content, the statement opens with a teasing tone before the knife draws blood, and the writing has clearly been polished. Judging by the attention index, comment volume peaked at 10:00 on the 23rd and then gradually tailed off.
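The article shows only the chart for this part. Here is a minimal sketch of how the hourly counts could be built from the `new_time` column created during preprocessing (the series name, title, and `attention_index.html` output name are my own):

```python
from pyecharts import options as opts
from pyecharts.charts import Line

# Count comments per hour using the new_time column from preprocessing
counts = df.groupby('new_time').size().sort_index()

line = (
    Line()
    .add_xaxis(list(counts.index))
    .add_yaxis("Comments per hour", [int(v) for v in counts.values])
    .set_global_opts(title_opts=opts.TitleOpts(title="Event attention index"))
)
line.render('attention_index.html')
```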
2. Analysis of netizens' comments
From the word cloud we can see that many netizens felt indignant on Zhou Yangqing's behalf: she followed him for nine years and it ended like this. Through the screen you can feel the public's anger at Luo Zhixiang's character, mixed, of course, with the chatter of onlookers here for the gossip. Artists are public figures, and their every word and deed has a deep influence on society, especially on the young. Character is not just what the camera shows but how one behaves in life. "Drawn in by looks, impressed by talent, loyal because of character" is how the public praises Hu Ge, and it is also what ordinary people expect of performers. This is exactly where Luo Zhixiang needs to mend his ways.
Core code
```python
import os

import jieba.analyse
from pyecharts import options as opts
from pyecharts.charts import Page, WordCloud
from pyecharts.globals import SymbolType, ThemeType


def get_comment_word(df):
    # A set stores the stop words and removes duplicates
    stop_words = set()

    # Load the stop words
    cwd = os.getcwd()
    stop_words_path = cwd + '/stop_words.txt'
    with open(stop_words_path, 'r', encoding="ISO-8859-1") as sw:
        for line in sw.readlines():
            stop_words.add(line.strip())

    # Merge all comment texts into one string
    df_comment_all = df['content'].str.cat()

    # Extract keywords with the TF-IDF algorithm
    word_num = jieba.analyse.extract_tags(df_comment_all, topK=300, withWeight=True, allowPOS=())

    # One more filtering step: drop the stop words
    word_num_selected = []
    for i in word_num:
        if i[0] not in stop_words:
            word_num_selected.append(i)

    return word_num_selected
```
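`get_comment_word` only returns the (word, weight) pairs; the rendering step is not shown in the article. A minimal sketch of rendering them with the `WordCloud` class already imported above (the shape, size range, title, and output filename are my own choices):

```python
# Sketch of a rendering step: turn the (word, weight) pairs into a word cloud
word_num_selected = get_comment_word(df)

wc = (
    WordCloud()
    .add("", word_num_selected, word_size_range=[15, 80], shape=SymbolType.DIAMOND)
    .set_global_opts(title_opts=opts.TitleOpts(title="Comment word cloud"))
)
wc.render('word_cloud.html')
```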
3. Where do the comment-loving melon eaters come from?
From the chart above, netizens in Guangzhou, Shenzhen and Shanghai rank as the top three by number of comments. The people who like to comment are mostly in first-tier and quasi-first-tier cities: partly because those cities have large populations, and partly because information circulates faster there.
Core code
```python
from pyecharts import options as opts
from pyecharts.charts import Bar
from pyecharts.render import make_snapshot
from snapshot_selenium import snapshot as driver

# Count comments per user location
df = df.groupby(['user.location']).agg({'Serial number': 'count'}).reset_index()
df.rename(columns={'place': 'user.location'}, inplace=True)

# Drop joke or unusable locations
df = df[~df['user.location'].isin(['Shanghai', 'China', 'From Mars', 'Mars'])]
df = df.sort_values(['Serial number'], axis=0, ascending=False)
df_gb_top = df[:15]


def bar_chart() -> Bar:
    c = (
        Bar()
        .add_xaxis(list(df_gb_top['user.location']))
        .add_yaxis("Top 15 regions by comment count", list(df_gb_top['Serial number']))
        .reversal_axis()
        .set_series_opts(label_opts=opts.LabelOpts(position="right"))
        .set_global_opts(title_opts=opts.TitleOpts(title="Ranking List"))
    )
    return c
```
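The imports above already pull in `make_snapshot` and the selenium snapshot engine, but the call that actually renders the chart is not shown. A likely final step (the output filename is my own):

```python
# Render the chart to HTML, then capture a PNG through selenium
make_snapshot(driver, bar_chart().render(), "comment_regions.png")
```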