Data analysis practical project - eggshell apartment complaint analysis

Abstract: The epidemic this year has accelerated the failures of long-term rental apartment operators, and negative news about Eggshell Apartment, where I rent, has been spreading frequently. When my contract expired in early October, I did not dare renew the lease and moved out as the contract provides. The deposit is refunded to the app first and then has to be withdrawn, and the app says that a withdrawal arrives within 14 working days. As of 2020-11-10, when this post was written (lease terminated and issue reported on 2020-11-07), about a month has passed, the money has still not arrived, and customer service cannot be reached. As a last resort, I filed complaints on Black Cat and 12315. While doing so, I found that there were more than 25,000 complaints, so I crawled the complaints about Eggshell Apartment from Black Cat and analyzed them. The result is this complete practical data analysis project, from data acquisition to a simple analysis of the data.

If you run into the same problem, you can file complaints as well, through Black Cat Complaints or 12315.

1, Data capture

import time

import pandas as pd
import requests
from fake_useragent import UserAgent  # Generates random User-Agent strings

# Suppress the certificate warnings raised because requests are sent with verify=False
requests.packages.urllib3.disable_warnings()


# Request data by merchant uid; the response format is fairly regular, which makes it easy to process
def request_data_uid(req_s, couid, page, total_page):
    params = {
        'couid': couid,   # Merchant ID
        'type': '1',
        'page_size': 10,  # 10 entries per page
        'page': page,     # Page number
        # 'callback': 'jQuery11',
    }
    print(f"Crawling page {page} of {total_page}, {total_page - page} pages remaining")
    url = 'https://tousu.sina.com.cn/api/company/received_complaints'

    # Send a random User-Agent header
    header = {'user-agent': UserAgent().random}
    res = req_s.get(url, headers=header, params=params, verify=False)
    # res = requests.get(url, params=params, verify=False)
    info_list = res.json()['result']['data']['complaints']
    result = []
    for info in info_list:
        _data = info['main']

        # Date of complaint
        timestamp = float(_data['timestamp'])
        date = time.strftime("%Y-%m-%d", time.localtime(timestamp))

        # sn: complaint number; title: complaint problem; appeal: complaint appeal; summary: problem description
        data = [date, _data['sn'], _data['title'], _data['appeal'], _data['summary']]
        result.append(data)

    pd_result = pd.DataFrame(result, columns=["Date of complaint", "Complaint number", "Complaint problem", "Complaint appeal", "detailed description"])
    return pd_result


# Request data by keyword; the response format is messier
# Purple Wutong has no merchant uid, so its complaints can only be retrieved by keyword.
# Companies that do have a uid, such as Eggshell Apartment, can also be queried by keyword.

def request_data_keywords(req_s, keyword, page, total_page):
    params = {
        'keywords': keyword,  # Search keywords
        'type': '1',
        'page_size': 10,      # 10 entries per page
        'page': page,         # Page number
        # 'callback': 'jQuery11',
    }
    print(f"Crawling page {page} of {total_page}, {total_page - page} pages remaining")
    # url = 'https://tousu.sina.com.cn/api/company/received_complaints'
    url = 'https://tousu.sina.com.cn/api/index/s?'

    # Send a random User-Agent header
    header = {'user-agent': UserAgent().random}
    res = req_s.get(url, headers=header, params=params, verify=False)
    # res = requests.get(url, params=params, verify=False)
    info_list = res.json()['result']['data']['lists']
    result = []
    for info in info_list:
        _data = info['main']

        # Date of complaint
        timestamp = float(_data['timestamp'])
        date = time.strftime("%Y-%m-%d", time.localtime(timestamp))

        # sn: complaint number; title: complaint problem; appeal: complaint appeal; summary: problem description
        data = [date, _data['sn'], _data['title'], _data['appeal'], _data['summary']]
        result.append(data)

    pd_result = pd.DataFrame(result, columns=["Date of complaint", "Complaint number", "Complaint problem", "Complaint appeal", "detailed description"])
    return pd_result


# Create one session and reuse it for all requests
req_s = requests.Session()

# Eggshell apartment
pages = []
total_page = 2507
for page in range(1, total_page + 1):
    pages.append(request_data_uid(req_s, '5350527288', page, total_page))
# DataFrame.append was removed in pandas 2.0, so the pages are combined with pd.concat
result = pd.concat(pages, ignore_index=True)
result['Object of complaint'] = "Eggshell apartment"
result.to_csv("Eggshell apartment complaint data.csv", index=False)

# Keyword search for purple Wutong
# Eggshell apartment is the brand name; the registered company name is purple Wutong Asset Management Co., Ltd.
pages = []
total_page = 56
for page in range(1, total_page + 1):
    pages.append(request_data_keywords(req_s, 'Wutong', page, total_page))
result = pd.concat(pages, ignore_index=True)
result['Object of complaint'] = "Wutong"
result.to_csv("Complaint data of purple Wutong.csv", index=False)
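
The parsing step shared by both request functions can be tried out without touching the network. The following is a minimal sketch, assuming the response shape that the code above reads (`result`, then `data`, then a list whose items each carry a `main` object). The helper `parse_complaints` and the sample payload are hypothetical, and dates are rendered in UTC here only to make the output reproducible, whereas the code above uses local time.

```python
from datetime import datetime, timezone

COLUMNS = ["Date of complaint", "Complaint number", "Complaint problem",
           "Complaint appeal", "detailed description"]


def parse_complaints(payload, list_key="complaints"):
    """Turn one decoded JSON response into a list of rows (hypothetical helper)."""
    rows = []
    for info in payload["result"]["data"][list_key]:
        main = info["main"]
        # The timestamp is seconds since the epoch, possibly given as a string
        day = datetime.fromtimestamp(float(main["timestamp"]), tz=timezone.utc)
        rows.append([day.strftime("%Y-%m-%d"), main["sn"], main["title"],
                     main["appeal"], main["summary"]])
    return rows


# Made-up payload that mimics the structure read by the code above
sample = {"result": {"data": {"complaints": [
    {"main": {"timestamp": "1604664000", "sn": "0001",
              "title": "Deposit not refunded", "appeal": "Refund",
              "summary": "No refund one month after check-out"}},
]}}}

rows = parse_complaints(sample)
print(dict(zip(COLUMNS, rows[0])))
```

The rows can be handed to `pd.DataFrame(rows, columns=COLUMNS)`; for the keyword endpoint, `list_key="lists"` would match the key read by `request_data_keywords`.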

2, Data cleaning and plotting

import os
import re

import pandas as pd


# Clean the complaint data that was crawled by keyword
data_path = os.path.join('data', 'Complaint data of purple Wutong.csv')
data = pd.read_csv(data_path)
# Keep only Chinese characters and digits in the complaint titles
pattern = r'[^\u4e00-\u9fa5\d]'
data['Complaint problem'] = data['Complaint problem'].apply(lambda x: re.sub(pattern, '', x))
data.to_csv(data_path, index=False, encoding="utf_8_sig")
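
To see what the pattern does, here is a small sketch on a made-up title. Every character that is neither a CJK ideograph in the range U+4E00 to U+9FA5 nor a digit is removed, so Latin letters, spaces, punctuation, and markup all disappear. The sample string is hypothetical.

```python
import re

# Same pattern as above: anything that is not a common CJK ideograph or a digit
pattern = r'[^\u4e00-\u9fa5\d]'

sample = '<em>蛋壳公寓</em> 押金不退!! 2020 refund'
cleaned = re.sub(pattern, '', sample)
print(cleaned)  # 蛋壳公寓押金不退2020
```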


# Merge all CSV files in the data directory
frames = []
for wj in os.listdir('data'):
    data_path = os.path.join('data', wj)
    frames.append(pd.read_csv(data_path))
# DataFrame.append was removed in pandas 2.0, so the files are combined with pd.concat
result = pd.concat(frames, ignore_index=True)
result.to_csv("data/Combined eggshell complaint data.csv", index=False, encoding="utf_8_sig")
# Read data
data = pd.read_csv("data/Combined eggshell complaint data.csv")

# Keep data up to yesterday so that every day in the data set is complete
data = data[data['Date of complaint'] <= '2020-11-09']
print(f"As of 2020-11-09, Black Cat had received {len(data)} complaints related to Eggshell Apartment")
# Count complaints per day
_data = data.groupby('Date of complaint').count().reset_index()[['Date of complaint', 'Complaint number']]
_data.rename(columns={"Complaint number": "Number of complaints"}, inplace=True)


# Total complaints up to 2020-01-30
num1 = _data[_data['Date of complaint'] <= '2020-01-30']['Number of complaints'].sum()
data0 = pd.DataFrame([['2020-01-30 and earlier', num1]], columns=['Date of complaint', 'Number of complaints'])
# Daily complaints from 2020-02-01 to 2020-02-21
data1 = _data[(_data['Date of complaint'] >= '2020-02-01') & (_data['Date of complaint'] <= '2020-02-21')]

# Total complaints from 2020-02-21 to 2020-11-05
# Note: 2020-02-21 is included both here and in data1 above
num2 = _data[(_data['Date of complaint'] >= '2020-02-21') & (_data['Date of complaint'] <= '2020-11-05')]['Number of complaints'].sum()

# 2020-11-06 is printed separately; 2020-11-07 to 2020-11-09 are kept as daily values
# (the data was only collected up to 2020-11-09)
num_1106 = _data[_data['Date of complaint'] == '2020-11-06'].iloc[0, 1]
print(f"Complaints on 2020-11-06: {num_1106}")

data2 = _data[(_data['Date of complaint'] > '2020-11-06') & (_data['Date of complaint'] <= '2020-11-09')]


data3 = pd.DataFrame([['2020-02-21 ~ 2020-11-05', num2]], columns=['Date of complaint', 'Number of complaints'])
new_data = pd.concat([data0, data1, data3, data2])
# Configure plotting parameters
import matplotlib.pyplot as plt
%matplotlib inline
# Apply the style first so that it does not override the settings below
plt.style.use("ggplot")
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['font.size'] = 18
plt.rcParams['figure.figsize'] = (12, 8)
new_data.set_index('Date of complaint').plot(kind='bar')  # Excludes the 24093 complaints of 2020-11-06
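
The period bucketing above can also be expressed as a single labelling function. This is a minimal sketch on made-up daily counts using only the standard library. The helper `period_label` and the numbers are hypothetical. The boundaries follow the code above, except that each day lands in exactly one bucket and the 2020-11-06 spike is not treated specially.

```python
from collections import Counter


def period_label(day):
    """Map an ISO date string to a chart bucket (hypothetical helper)."""
    # ISO dates (YYYY-MM-DD) sort chronologically when compared as strings
    if day <= "2020-01-30":
        return "2020-01-30 and earlier"
    if day <= "2020-02-21":
        return day  # days up to 2020-02-21 are plotted individually
    if day <= "2020-11-05":
        return "2020-02-22 ~ 2020-11-05"
    return day  # the most recent days are plotted individually


# Made-up (date, number of complaints) pairs
daily = [("2019-12-01", 1), ("2020-01-15", 2), ("2020-02-10", 30),
         ("2020-03-01", 4), ("2020-10-20", 6), ("2020-11-07", 250)]

buckets = Counter()
for day, count in daily:
    buckets[period_label(day)] += count

for label, total in buckets.items():
    print(label, total)
```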

Before January 30, 2020, the number of complaints was normal: occasionally one or two. In February, the number rose sharply because of the epidemic, possibly owing to cleaning services that could not be provided, disputes over epidemic-related rent subsidies, and the anxiety that negative news, such as failures of long-term rental operators and talk of Eggshell's bankruptcy, caused among tenants.

From February 21, 2020 to November 5, 2020, the number of complaints was normal, slightly higher than before January 30, 2020, and still within the range acceptable for normal operation.

On November 6, 2020, more than 24,000 complaints were suddenly added. Because this outlier would distort the chart, it is reported separately and excluded from the plot. I checked the news to see whether anything major had happened, and indeed it had: according to a 36Kr report, on November 6, 2020 an affiliated company of Eggshell Apartment was named as a judgment debtor, with an enforcement amount of more than 5.19 million yuan.
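
One simple way to flag such a day automatically is to compare each daily count with the median of the series. The sketch below is an illustration only, not the method used above, where the spike was spotted by looking at the data. The helper `find_spikes`, the factor of 100, and the sample counts are all hypothetical.

```python
from statistics import median


def find_spikes(daily_counts, factor=100):
    """Return the days whose count exceeds factor times the median count."""
    typical = median(daily_counts.values())
    return {day: n for day, n in daily_counts.items() if n > factor * typical}


# Made-up daily counts with one extreme day
daily_counts = {"2020-11-03": 8, "2020-11-04": 12, "2020-11-05": 10,
                "2020-11-06": 24000, "2020-11-07": 250}

print(find_spikes(daily_counts))
```

The median is used instead of the mean because a single extreme day barely moves it, so the threshold stays tied to the typical daily volume.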

Since then, on November 7, 8, and 9, complaints about Eggshell on Black Cat have increased by 200 to 300 every day. The official statement dismissing Eggshell's bankruptcy as a rumor looks like nonsense. Perhaps it is not a rumor at all, and the fear that Eggshell will repeat the story of ofo, with users queuing up to get their money back, may not be groundless.

And this is only the complaint data from Black Cat. How many more users have no channel for complaining and simply write it off as bad luck?

Next, let's look at what users mainly complain about and what they mainly ask for.

3, Word cloud generation

import jieba  # Word segmentation module
import re
import collections
import PIL.Image as img  # pip install Pillow
from wordcloud import WordCloud

# Merge the detailed descriptions of all complaints, then segment them
all_word = ''.join(data['detailed description'].dropna().astype(str))

# jieba word segmentation
result = list(jieba.cut(all_word))

# Word cloud of complaint details
wordcloud = WordCloud(
    width=800, height=600, background_color='white',
    font_path='C:\\Windows\\Fonts\\msyh.ttc',  # A font with Chinese glyphs is needed to render Chinese text
    max_font_size=500, min_font_size=20
).generate(' '.join(result))
image = wordcloud.to_image()
# image.show()  # Display the generated picture
wordcloud.to_file('Eggshell apartment complaint details.png')  # Save the picture locally


# Merge the complaint titles, then segment them
all_word = ''.join(data['Complaint problem'].dropna().astype(str))

# jieba word segmentation
result = list(jieba.cut(all_word))

# Word cloud of complaint titles
wordcloud = WordCloud(
    width=800, height=600, background_color='white',
    font_path='C:\\Windows\\Fonts\\msyh.ttc',  # A font with Chinese glyphs is needed to render Chinese text
    max_font_size=500, min_font_size=20
).generate(' '.join(result))
image = wordcloud.to_image()
# image.show()  # Display the generated picture
wordcloud.to_file('Eggshell apartment complaints.png')  # Save the picture locally

# Merge the complaint appeals, then segment them
all_word = ''.join(data['Complaint appeal'].dropna().astype(str))

# jieba word segmentation
result = list(jieba.cut(all_word))

# Word cloud of complaint appeals
wordcloud = WordCloud(
    width=800, height=600, background_color='white',
    font_path='C:\\Windows\\Fonts\\msyh.ttc',  # A font with Chinese glyphs is needed to render Chinese text
    max_font_size=500, min_font_size=20
).generate(' '.join(result))
image = wordcloud.to_image()
# image.show()  # Display the generated picture
# Use a separate file name so that the complaint title word cloud is not overwritten
wordcloud.to_file('Eggshell apartment complaint appeals.png')  # Save the picture locally
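
A word cloud shows relative weight only visually; counting the tokens gives the same ranking as numbers. The sketch below is a minimal illustration using `collections.Counter` on an already segmented token list, which in the code above would be the `result` list produced by `jieba.cut`. The token list and the `top_words` helper are made up, and dropping tokens shorter than two characters is an assumption about what counts as noise.

```python
from collections import Counter


def top_words(tokens, n=3, min_len=2):
    """Count tokens and return the n most frequent ones (hypothetical helper)."""
    # Drop whitespace and very short tokens, which are mostly punctuation or particles
    kept = [t.strip() for t in tokens if len(t.strip()) >= min_len]
    return Counter(kept).most_common(n)


# Made-up tokens standing in for the output of jieba.cut
tokens = ["押金", "不", "退", "押金", "客服", "，", "提现", "押金", "客服", " "]
print(top_words(tokens, n=2))
```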

Word cloud of Eggshell Apartment complaint details

The complaint details show that the main problems are cash withdrawal (probably the same problem I have, that is, withdrawing the deposit) and promotional cash back (a certain amount is supposed to be returned every month; apart from the first two months, mine was not paid on time, and since customer service could not be reached I stopped paying much attention to it). Other frequent issues are unreachable customer service, cleaning problems, and so on. If the company had faced these problems directly, there might not be so many complaints. The most unbearable part is that, apart from the first call to the official 400 hotline, which got through easily, there was basically no human customer service afterwards, and the whole process was brushed off by an automated voice.

Word cloud of Eggshell Apartment complaint appeals

The main appeals are that users strongly demand that Eggshell Apartment be penalized accordingly, and that they ask for refunds and compensation.

Word cloud of Eggshell Apartment complaint titles

The complaint problem is the title of the complaint, and the same picture appears here: the main problems are cash withdrawal and promotional cash back, along with some cleaning problems.

Added by jmicozzi on Fri, 07 Jan 2022 03:21:06 +0200