Using Python, we find that more than half of the bras sold are B cup, even though A cup is said to be versatile and high-grade

Recently, I keep hearing that an A cup is versatile and looks high-grade!

Today, let's look at how one bra brand on Jingdong Mall (JD) sells across its different sizes, to see which size is currently the mainstream!

Contents

1. Demand sorting

2. Data acquisition

3. Statistical display

3.1. cup distribution

3.2. color distribution

4. That's it

1. Demand sorting

This article is fairly simple. From JD, we pick the bra product with the most comments, collect the comment count for each of its sizes, and then compute the share of each size.

Since JD does not show a sales volume (or how many people paid), we use the number of comments as the basis for comparison. We won't go into detail here about how the comment counts are obtained.

By selecting bras aimed at young people on Jingdong and sorting by number of comments, we get a list of the top products. Since the first two come in a single size, and the third is a bra laundry bag (which has no sizes either), we chose the fourth product.

Looking for target brands

Then we click through to the details page of the fourth product and find that it comes in 7 colors and 10 sizes, which is quite a lot.

To collect the comment data for each variant, we first need the product ID of every variant. We pressed F12 to open the developer tools, searched for one of the product IDs in the Elements panel, and found the place where all the product IDs are stored, shown below (it can be extracted with a regular expression):

color&size

Now that we can get all the product IDs, we can call the comment interface with each product ID to fetch the comment data for that product. Let's start coding!

2. Data acquisition

In the data collection part, we first use a regular expression to obtain all the product IDs, then fetch the comment data for each product ID, and with that we have the data we need.

Get all product IDs

import ast
import re

import pandas as pd
import requests

headers = {
    # "Accept-Encoding": "Gzip",  # Use gzip to compress and transfer data for faster access
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.97 Safari/537.36",
    # "Cookie": cookie,
    "Referer": "https://item.jd.com/"
    }

# Details page of the chosen product
url = r'https://item.jd.com/100003749316.html'
r = requests.get(url, headers=headers, timeout=6)

# Strip all whitespace so the colorSize array sits on a single line
text = re.sub(r'\s', '', r.text)
# literal_eval parses the array like eval did, without executing page content
colorSize = ast.literal_eval(re.findall(r'colorSize:(\[.*?\])', text)[0])
df = pd.DataFrame(colorSize)
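
The extraction step can also be tried offline. The sketch below runs the same regular expression against a small hand-written snippet that imitates the page source. The snippet, its SKU values and the function name `extract_color_size` are made up for illustration, and the sketch assumes the `colorSize` field is valid JSON, which may not hold for the live page.

```python
import json
import re

# Hypothetical fragment imitating the product page; not real JD data.
SAMPLE_PAGE = """
var pageConfig = {
    product: {
        skuid: 1001,
        colorSize: [{"skuId": 1001, "size": "34/75B", "colour": "white"},
                    {"skuId": 1002, "size": "36/80C", "colour": "black"}],
        warestatus: 1
    }
};
"""

def extract_color_size(page_text):
    """Return the colorSize entries as a list of dicts, or [] if absent."""
    compact = re.sub(r"\s", "", page_text)  # same whitespace stripping as the article
    match = re.search(r"colorSize:(\[.*?\])", compact)
    if match is None:
        return []
    return json.loads(match.group(1))

entries = extract_color_size(SAMPLE_PAGE)
sku_ids = [entry["skuId"] for entry in entries]
print(sku_ids)
```

One thing to keep in mind: stripping all whitespace also removes spaces inside string values, so a label such as "Light skin" would come back as "Lightskin" unless the pattern is applied to the unmodified text.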

Get the comment data for each product ID

import json  # needed by json.loads below

# Get the total number of comments for one product
def get_comment(productId, proxies=None):
    # time.sleep(0.5)
    url = 'https://club.jd.com/comment/skuProductPageComments.action?'
    params = {
            'callback': 'fetchJSON_comment98',
            'productId': productId,
            'score': 0,
            'sortType': 6,
            'page': 0,
            'pageSize': 10,
            'isShadowSku': 0,
            'fold': 1,
            }
    r = requests.get(url, headers=headers, params=params,
                     proxies=proxies,
                     timeout=6)
    # The response is JSONP: strip the callback wrapper, then parse the JSON inside
    comment_data = re.findall(r'fetchJSON_comment98\((.*)\)', r.text)[0]
    comment_data = json.loads(comment_data)
    comment_summary = comment_data['productCommentSummary']

    # Total comments = sum of the 1-star to 5-star counts
    return sum(comment_summary[f'score{i}Count'] for i in range(1, 6))

# The author's get_proxies() helper is not shown in this article;
# proxies=None makes requests connect directly
proxies = None
records = []
for productId in df.skuId:
    records.append({
        "skuId": productId,
        "commentCount": get_comment(productId, proxies),
    })
# DataFrame.append was removed in pandas 2.0, so build the frame from a list of rows
df_commentCount = pd.DataFrame(records, columns=['skuId', 'commentCount'])

df = df.merge(df_commentCount, how='left')
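
The parsing half of `get_comment` can be checked without touching the network. The following is a minimal sketch that unwraps a JSONP body and sums the star counts. The sample response is invented (only the field names follow the code above), and `parse_comment_count` is a hypothetical helper name, not part of any library.

```python
import json
import re

# Hypothetical response body; the field names follow the article's code,
# the numbers are invented.
SAMPLE_RESPONSE = (
    'fetchJSON_comment98({"productCommentSummary": '
    '{"score1Count": 2, "score2Count": 1, "score3Count": 4, '
    '"score4Count": 10, "score5Count": 83}});'
)

def parse_comment_count(body, callback="fetchJSON_comment98"):
    """Strip the JSONP wrapper and sum the 1-star to 5-star counts."""
    match = re.search(re.escape(callback) + r"\((.*)\)", body, flags=re.S)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    summary = json.loads(match.group(1))["productCommentSummary"]
    return sum(summary[f"score{i}Count"] for i in range(1, 6))

total = parse_comment_count(SAMPLE_RESPONSE)
print(total)
```

Raising an error when the wrapper is missing makes a blocked or empty response visible immediately, instead of surfacing later as an `IndexError` on `[0]`.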

3. Statistical display

First, let's split the cup letter (A, B or C) out of the size into its own column

df['cup'] = df['size'].str[-1]
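
Taking the last character works as long as every size label ends in a cup letter. A slightly more defensive variant, sketched here in plain Python with made-up labels, only accepts letters that follow a band number and returns `None` otherwise. The helper name `extract_cup` and the sample labels are assumptions for illustration.

```python
import re

# Two or three digits followed by the cup letters at the end of the label
CUP_PATTERN = re.compile(r"(\d{2,3})([A-Z]+)$")

def extract_cup(size_label):
    """Return the trailing cup letters of a size label, or None if there are none."""
    match = CUP_PATTERN.search(size_label.strip().upper())
    return match.group(2) if match else None

labels = ["32/70A", "34/75B", "36/80c ", "free size"]
cups = [extract_cup(label) for label in labels]
print(cups)
```

With `str[-1]`, the label "free size" would yield "e" and end up as its own cup group; here it is left empty and can be dropped before grouping.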

Now for some simple statistics.

Let's first look at an overview of the data

>>> df.info()
    
<class 'pandas.core.frame.DataFrame'>
Int64Index: 64 entries, 0 to 63
Data columns (total 5 columns):
 #   Column        Non-Null Count  Dtype 
---  ------        --------------  ----- 
 0   size          64 non-null     object
 1   skuId         64 non-null     object
 2   colour        64 non-null     object
 3   commentCount  64 non-null     object
 4   cup           64 non-null     object
dtypes: object(5)
memory usage: 3.0+ KB

3.1. cup distribution

Note that the data we collected covers only three cup sizes: A, B and C.

cupNum = df.groupby('cup')['commentCount'].sum().to_frame('quantity')
cupNum

cup	quantity
A	6049
B	11618
C	4076

import matplotlib.pyplot as plt
from matplotlib import font_manager as fm

plt.rcParams['font.sans-serif'] = ['Microsoft YaHei']
plt.rcParams['axes.unicode_minus'] = False

labels = cupNum.index
sizes = cupNum['quantity']
explode = (0, 0.1, 0) 

fig1, ax1 = plt.subplots(figsize=(6,5))
patches, texts, autotexts = ax1.pie(sizes, explode=explode, labels=labels, autopct='%1.1f%%',
                                    shadow=True, startangle=90)
ax1.axis('equal') 

# Reset font size
proptease = fm.FontProperties()
proptease.set_size('large')
plt.setp(autotexts, fontproperties=proptease)
plt.setp(texts, fontproperties=proptease)
ax1.set_title('cup distribution')
plt.show()

cup distribution

We can see that B cup accounts for 53.4% of the comments, followed by A cup at 27.8%, with C cup making up the remaining 18.7%.
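
The percentages can be reproduced directly from the comment counts in the cup table, as in this short check:

```python
cup_counts = {"A": 6049, "B": 11618, "C": 4076}  # comment counts from the cup table

total = sum(cup_counts.values())
shares = {cup: round(100 * n / total, 1) for cup, n in cup_counts.items()}

print(total)   # 21743
print(shares)  # {'A': 27.8, 'B': 53.4, 'C': 18.7}
```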

3.2. color distribution

colorNum = df.groupby('colour')['commentCount'].sum().to_frame('quantity')
colorNum

colour	quantity
Light skin	3627
Light blue grey	3058
Light silver grey	3837
white	1439
Lotus root powder	8286
Wine red	1429
black	67

We can see that lotus root powder (a pale pink shade) has by far the most comments, followed by light silver grey, light skin and light blue grey.

color distribution

The lotus root powder color alone accounts for 38.1% of the comments. This is what it looks like:

Lotus root powder color: from Jingdong
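
The article does not show the code behind the color chart, so the following is only a sketch of one way to rank the colors and plot them. The bar chart and the helper names `ranked_shares` and `plot_shares` are choices made here, not necessarily what was used for the original figure.

```python
color_counts = {
    "Light skin": 3627,
    "Light blue grey": 3058,
    "Light silver grey": 3837,
    "white": 1439,
    "Lotus root powder": 8286,
    "Wine red": 1429,
    "black": 67,
}

def ranked_shares(counts):
    """Return (label, count, percent) tuples, largest count first."""
    total = sum(counts.values())
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return [(label, n, round(100 * n / total, 1)) for label, n in ranked]

def plot_shares(rows):
    """Draw a horizontal bar chart; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    labels = [label for label, _, _ in rows]
    values = [n for _, n, _ in rows]
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.barh(labels[::-1], values[::-1])  # reversed so the largest bar is on top
    ax.set_xlabel("comment count")
    ax.set_title("color distribution")
    fig.tight_layout()
    plt.show()

rows = ranked_shares(color_counts)
for label, n, pct in rows:
    print(f"{label:<18}{n:>6}{pct:>7}%")
# plot_shares(rows)  # uncomment to draw the chart
```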

4. That's it

The size we see most often is 34/75B: 34 is the imperial (inch-based) band size, 75 is the underbust circumference in centimetres (so 34 and 75 here describe the same band size), and B is the cup.
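
A size label in this notation can be split mechanically into its three parts. The sketch below does that and also checks the two band numbers against each other. Only the 34/75 pair comes from this article; the rule that every 2 inches corresponds to 5 cm is an assumed sizing convention, and both function names are hypothetical.

```python
import re

def parse_size(label):
    """Split a label such as '34/75B' into (imperial_band, metric_band, cup)."""
    match = re.fullmatch(r"(\d{2})/(\d{2,3})([A-Z]+)", label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size label: {label!r}")
    imperial, metric, cup = match.groups()
    return int(imperial), int(metric), cup

def metric_band(imperial_band):
    """Assumed convention: 34 maps to 75, and every 2 inches adds 5 cm."""
    return 75 + (imperial_band - 34) // 2 * 5

imperial, metric, cup = parse_size("34/75B")
print(imperial, metric, cup)
print(metric_band(imperial) == metric)
```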

For cup and bust comparison table, refer to:

That's all for this time. The sample is small and the analysis is rough, so treat it as entertainment only!

Keywords: Python

Added by sam_h on Tue, 18 Jan 2022 11:15:28 +0200
