Python data analysis: second-hand house sales data from a real estate website

1, Data analysis workflow

1. Clarify the purpose and approach of the analysis / formulate hypotheses

2. Data collection

3. Data processing / cleaning

4. Data analysis / hypothesis validation

5. Data presentation / visualization

6. Report writing

2, Analysis purpose

  1. Requirement 01: unit price of second-hand houses per square meter (total price, monthly average price)
  2. Requirement 02: total building area in each region, sorted in descending order
  3. Requirement 03: analyzed by week, is the number of second-hand housing transactions in Beijing rising, declining, or basically unchanged?
  4. Requirement 04: analyzed by week, what is the trend of the average unit price of second-hand housing transactions?
  5. Requirement 05: average listing cycle by large area / community / region

3, Data collection

The data from the real estate website has already been saved to a CSV file.

4, Data processing

4.1 Importing the Python data analysis libraries

import numpy as np  # General numerical operations
import pandas as pd  # Data analysis: loading, feature extraction, cleaning, transformation, etc.
import matplotlib as mpl  # Data visualization
import matplotlib.pyplot as plt  # Quick and convenient 2D plotting

4.2 Enabling Chinese text in plots

mpl.rcParams["font.family"] = "SimHei"  # Set the font
mpl.rcParams["axes.unicode_minus"] = False  # Display the minus sign correctly
plt.rcParams["font.sans-serif"] = ["SimHei"]  # Display Chinese labels correctly

# Render matplotlib figures inline in the notebook instead of in a pop-up window
%matplotlib inline

4.3 Reading the data

lianjia = pd.read_csv("XXXXXXXX.csv", encoding="utf-8", sep="\t")  # Read the (tab-separated) csv file
pd.set_option("display.max_colwidth", 60)  # Show at most 60 characters per field
pd.set_option("display.max_columns", 50)   # Show at most 50 columns per DataFrame
lianjia.head(3)  # View the first three rows

Result:

Columns: Transaction price (10000 yuan), Closing time, Residential area, Apartment layout, Built-up area, Listing price (10000 yuan), Transaction period (days), Price adjustment (Times), Take a look (Times), Attention (person), Browse (Times), Chain number, Trading Right, Listing time, Housing use, Housing life, Ownership of premises, Apartment layout, Floor, Structure of apartment layout, Inner area(㎡), Building type, House facing, Built in, Decoration situation, Building structure, Heating mode, Scale of ladder households, Property right years, Equipped with elevator, xx1, xx2
 0 Daxing, NaN, NaN, NaN, ... (NaN in the remaining columns)
 1 297, 2019-10-29, transaction: Room 3, hall 1, green cloud villa, 89.95, 300, 608, 1.0, 4.0, 43.0, 6424, 1.01E+11, commercial house, 2018 / 3 / 1, ordinary residence, two years in total, non shared, room 3, hall 1, kitchen 1, bathroom 1, middle floor (nine floors in total), no data at present, 73.63, Tower North and south, 2014, hardbound steel concrete structure, central heating, one ladder, two households, 70, NaN Nan NaN
 2 366, 2019-10-29, transaction: sanyangli, 2 rooms, 1 hall, 89.79, 368, 31, 0.0, 3.0, 4.0, 118, 1.01E+11, commercial housing, 2019 / 9 / 29, ordinary residence, two years in total, non shared, 2 rooms, 1 hall, 1 kitchen, 1 bathroom, middle floor (6 floors in total), flat floor, 80.44, plank building, South-North 2009, other steel concrete structure, central heating, one ladder, two households, 70, no NaN Nan NaN

View the overall structure of data

lianjia.info() #View the overall structure of data

Result:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 38393 entries, 0 to 38392
Data columns (total 32 columns):
Transaction price (10000 yuan)    38393 non-null object
Closing time                      38379 non-null object
Residential area                  38379 non-null object
Apartment layout                  38379 non-null object
Built-up area                     38379 non-null object
Listing price (10000 yuan)        38379 non-null object
Transaction period (days)         38379 non-null object
Price adjustment (Times)          38379 non-null float64
Take a look (Times)               38379 non-null float64
Attention (person)                38379 non-null float64
Browse (Times)                    38379 non-null object
Chain number                      38379 non-null object
Trading Right                     38379 non-null object
Listing time                      38379 non-null object
Housing use                       38379 non-null object
Housing life                      38379 non-null object
Ownership of premises             38379 non-null object
Apartment layout                  38379 non-null object
Floor                             38379 non-null object
Structure of apartment layout     38379 non-null object
Inner area(㎡)                     38379 non-null object
Building type                     38379 non-null object
House facing                      38379 non-null object
Built in                          38379 non-null object
Decoration situation              38379 non-null object
Building structure                38379 non-null object
Heating mode                      38379 non-null object
Scale of ladder households        38379 non-null object
Property right years              38379 non-null object
Equipped with elevator            38379 non-null object
xx1                               3022 non-null object
xx2                               3022 non-null object
dtypes: float64(3), object(29)
memory usage: 9.4+ MB

4.4 data preprocessing

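4.4.1 Processing the region field

The transaction price field has more non-null values than any other field. Checking the original table shows that the region name is also stored in this field, on a separate row that precedes the listings of each region. A new "Large area" field is therefore added below.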

lianjia["Large area"] = lianjia["Transaction price (10000 yuan)"]    # Copy the values of "Transaction price (10000 yuan)" into the new region field
lianjia[["Large area", "Transaction price (10000 yuan)"]].head(10)  # View the newly added field
	Large area	Transaction price (10000 yuan)
0	Daxing	Daxing
1	297	297
2	366	366
3	226	226
4	548	548
5	245	245
6	254	254
7	193	193
8	280	280
9	347	347

Replace the numbers in the region field with the region name

# Remove the "-" symbol, then replace every value that contains digits with NaN
lianjia["Large area"] = lianjia["Large area"].str.replace("-", "", regex=False).replace(r"\d+", np.nan, regex=True)
# Forward-fill the missing values with the preceding region name
lianjia["Large area"] = lianjia["Large area"].ffill()
# Drop region header rows such as "Daxing NaN NaN ...", which have fewer than 20 non-null values
lianjia.dropna(axis=0, inplace=True, thresh=20)
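
The same three steps can be tried on a small made-up frame. This is only a sketch: the column names `price`, `a`, `b`, `region` and all values are hypothetical, and the regex is anchored (`^\d+$`) for clarity, which differs slightly from the pattern used above.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: rows 0 and 3 are region header rows
toy = pd.DataFrame({
    "price": ["Daxing", "297", "366", "Haidian", "548"],
    "a": [np.nan, 1.0, 2.0, np.nan, 3.0],
    "b": [np.nan, "x", "y", np.nan, "z"],
})

# Copy the price column, blank out purely numeric entries, then forward-fill
toy["region"] = toy["price"].replace(r"^\d+$", np.nan, regex=True).ffill()

# Header rows have fewer than 3 non-null values, so thresh=3 removes them
toy = toy.dropna(axis=0, thresh=3)
print(toy)
```

Each remaining listing row now carries the name of the region header that preceded it.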

# Move region field to first column
lianjia_daqu = lianjia["Large area"]
lianjia.drop("Large area", axis=1, inplace=True)
lianjia.insert(0, "Large area", lianjia_daqu)
display(lianjia[["Large area", "Closing time", "Residential area", "Apartment layout", "Built-up area"]].sample(10))  # View the result
        Large area	Closing time	Residential area	Apartment layout	Built-up area
6654	Fangshan	2019-08 Deal	Biguiyuan community zone 2	1room0office	58
15175	Mentougou	2019-09 Deal	Xinqiao Road community	    2room1office	51.47
29127	Westlife	2019-05-12 Deal	Dragon claw and Sophora Hutong	    2room1office	67.36
25210	Tongzhou	2019-06 Deal	Haitangwan phase I	    2room2office	91.07
1797	Daxing	2019-07 Deal	Liyuan C area	        3room2office	141.48
33204	Changping	2019.09.03	    New Dragon City	        2room1office	100.2
36054	Dongcheng	1905/7/11	    Qianmen East Street	    3room1office	69.79
5760	Chaoyang	2019-09-04 Deal	Fuli City D area	    2room1office	85.04
12756	Haidian	2019-09-30 Deal	Ding Hui Bei Li	        3room1office	82.44
32146	Yizhuang Development Zone	2017-03-07 Deal	Rong Jing Lido	    1room0office	41.44

4.4.2 Processing xx1 and xx2

The earlier info() listing shows that most values in these two fields are missing.

display(lianjia["xx1"].unique())   # View the distinct values of xx1
display(lianjia["xx2"].unique())   # View the distinct values of xx2
array([nan, '70', '40', '50', 'Unknown'], dtype=object)  
array([nan, 'Yes', 'nothing', 'No data'], dtype=object)
# Delete the two columns xx1 and xx2 (pass only columns=; combining it with axis= raises an error)
lianjia.drop(columns=["xx1", "xx2"], inplace=True)

After these steps, lianjia.info() shows that no field has missing values:

<class 'pandas.core.frame.DataFrame'>
Int64Index: 35357 entries, 1 to 35369
Data columns (total 31 columns):
Large area                        35357 non-null object
Transaction price (10000 yuan)    35357 non-null object
Closing time                      35357 non-null object
Residential area                  35357 non-null object
Apartment layout                  35357 non-null object
Built-up area                     35357 non-null object
Listing price (10000 yuan)        35357 non-null object
Transaction period (days)         35357 non-null object
Price adjustment (Times)          35357 non-null float64
Take a look (Times)               35357 non-null float64
Attention (person)                35357 non-null float64
Browse (Times)                    35357 non-null object
Chain number                      35357 non-null object
Trading Right                     35357 non-null object
Listing time                      35357 non-null object
Housing use                       35357 non-null object
Housing life                      35357 non-null object
Ownership of premises             35357 non-null object
Apartment layout                  35357 non-null object
Floor                             35357 non-null object
Structure of apartment layout     35357 non-null object
Inner area(㎡)                     35357 non-null object
Building type                     35357 non-null object
House facing                      35357 non-null object
Built in                          35357 non-null object
Decoration situation              35357 non-null object
Building structure                35357 non-null object
Heating mode                      35357 non-null object
Scale of ladder households        35357 non-null object
Property right years              35357 non-null object
Equipped with elevator            35357 non-null object
dtypes: float64(3), object(28)
memory usage: 8.6+ MB

4.4.3 processing time

#View all fields related to date
display(lianjia[["Closing time", "Transaction period (days)", "Listing time"]].sample(10))
display(lianjia[["Closing time", "Transaction period (days)", "Listing time"]].dtypes)

# First, remove the " Deal" suffix from the "Closing time" field
lianjia["Closing time"] = lianjia["Closing time"].str.replace(" Deal", "", regex=False)
# Convert both date fields to a unified datetime format
lianjia["Closing time"] = pd.to_datetime(lianjia["Closing time"])
lianjia["Listing time"] = pd.to_datetime(lianjia["Listing time"])
# Compute the transaction cycle and convert it to days
lianjia["Transaction cycle(new)"] = lianjia["Closing time"] - lianjia["Listing time"]
lianjia["Transaction period (days)"] = lianjia["Transaction cycle(new)"].dt.days
# Extract the year and the ISO week number of the closing time
lianjia["Transaction time (year)"] = lianjia["Closing time"].dt.year
lianjia["Closing time (week)"] = lianjia["Closing time"].dt.isocalendar().week  # Series.dt.week was removed in pandas 2.0
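
The closing-time strings come in several formats ("2019-10-29 Deal", "2019-08 Deal", "2019.09.03"). The sketch below shows one way to parse such a mix. The three strings are made up in the style of the sample above, and parsing one value at a time is an assumption made here so that the example does not depend on how a particular pandas version infers a single format for the whole column.

```python
import pandas as pd

# Hypothetical closing-time strings in the formats seen in the sample above
raw = pd.Series(["2019-10-29 Deal", "2019-08 Deal", "2019.09.03"])

# Strip the literal " Deal" suffix (regex=False treats the pattern as plain text)
cleaned = raw.str.replace(" Deal", "", regex=False)

# Parse one value at a time so that each string may use its own format
closing = pd.Series([pd.to_datetime(s) for s in cleaned], index=cleaned.index)

years = closing.dt.year
# ISO week number; Series.dt.isocalendar() requires pandas 1.1 or later
weeks = closing.dt.isocalendar().week.astype(int)

print(pd.DataFrame({"closing": closing, "year": years, "week": weeks}))
```

Note that a month-only string such as "2019-08" is parsed as the first day of that month, which matters for any later analysis by week.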

4.4.4 processing other fields

lianjia[["Large area", "Residential area", "Apartment layout", "Built-up area"]].loc[[30922, 32852, 8784, 31629]]
        Large area                 Residential area               Apartment layout  Built-up area
30922   Yizhuang Development Zone  Lincoln Park Phase II, Area C                    --
32852   Changping                  first smart Club               parking space     6.99 house structure
8784    Fangshan                   palace garden                  #NAME?            --
31629   Yizhuang Development Zone  new Hainan Island              #NAME?            --
# Delete rows whose apartment layout is "parking space": parking spaces are not analyzed, and their building area is not in a standard format
lianjia.drop(lianjia[lianjia["Apartment layout"] == "parking space"].index, inplace=True)
# Delete rows whose apartment layout is "#NAME?" or whose building area is "--"
lianjia.drop(lianjia[(lianjia["Apartment layout"] == "#NAME?") | (lianjia["Built-up area"] == "--")].index, inplace=True)
# Strip Chinese characters and whitespace from the building area, then convert it to floating point
lianjia["Built-up area"] = lianjia["Built-up area"].str.replace(r"[\s\u4e00-\u9fa5]", "", regex=True)
lianjia["Built-up area"] = lianjia["Built-up area"].astype(np.float32)
lianjia[["Transaction price (10000 yuan)", "Listing price (10000 yuan)", "Price adjustment (Times)", "Take a look (Times)", "Attention (person)", "Browse (Times)"]].sample(10)
        Transaction price (10000 yuan)  Listing price (10000 yuan)  Price adjustment (Times)  Take a look (Times)  Attention (person)  Browse (Times)
10648   290       290       0.0   2.0    8.0     288
30125   363       390       0.0   6.0    14.0    3680
14244   648       680       0.0   0.0    0.0     No data
21619   209-214   226       1.0   2.0    3.0     155
27061   327-334   355       1.0   6.0    132.0   1423
4598    499       480       0.0   27.0   42.0    924
35070   805       850       1.0   24.0   51.0    13856
5253    437       437       1.0   46.0   82.0    962
1369    293-324   No data   0.0   0.0    0.0     No data
12051   586       600       1.0   84.0   55.0    5399
# Some transaction prices are ranges such as 293-324; use the average of the two figures
# The function splits the value on "-": a single number is returned unchanged, and for two numbers their average is returned
def handle(value):
    values2 = str(value).split("-")
    if len(values2) == 1:
        return value
    else:
        result = (float(values2[0]) + float(values2[1])) / 2
        return str(result)

lianjia["Transaction price (10000 yuan)"] = lianjia["Transaction price (10000 yuan)"].map(handle)    # Apply the function to every value
lianjia["Transaction price (10000 yuan)"] = lianjia["Transaction price (10000 yuan)"].astype(np.float32)
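
As a quick check of the range-averaging idea, here is a minimal standalone sketch. The helper name `range_to_mean` and the sample strings are hypothetical; unlike `handle`, it returns a float directly instead of a string.

```python
import pandas as pd

def range_to_mean(value):
    """Return the midpoint of a 'low-high' string; convert a single value to float."""
    parts = str(value).split("-")
    if len(parts) == 1:
        return float(value)
    return (float(parts[0]) + float(parts[1])) / 2

# Hypothetical prices in the same style as the sample above
prices = pd.Series(["290", "209-214", "293-324"])
cleaned = prices.map(range_to_mean)
print(cleaned.tolist())  # [290.0, 211.5, 308.5]
```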
# Replace the "No data" placeholder with "0"
lianjia["Listing price (10000 yuan)"] = lianjia["Listing price (10000 yuan)"].str.replace("No data", "0")
lianjia["Browse (Times)"] = lianjia["Browse (Times)"].str.replace("No data", "0")
# Type conversion
lianjia["Listing price (10000 yuan)"] = lianjia["Listing price (10000 yuan)"].astype(np.float32)
lianjia["Transaction period (days)"] = lianjia["Transaction period (days)"].astype(np.float32)
lianjia["Browse (Times)"] = lianjia["Browse (Times)"].astype(np.float32)
lianjia[[ "Chain number", "Trading Right", "Housing use", "Housing life"]].sample(10)
        Chain number	Trading Right	Housing use	Housing life
30379	1.01E+11	Commercial housing	Ordinary residence	 No data
32736	1.01E+11	Commercial housing	Ordinary residence	 No data
4078	1.01E+11	Commercial housing	Ordinary residence	 Five years
9624	1.01E+11	Commercial housing	Ordinary residence	 Five years
7003	1.01E+11	Commercial housing	Ordinary residence	 Five years
31937	1.01E+11	Commercial housing	apartment	 No data
13796	1.01E+11	Commercial housing	Ordinary residence	 No data
5560	1.01E+11	Purchased public housing	Ordinary residence	 Five years
11346	1.01E+11	Commercial housing	Ordinary residence	 Five years
598 	1.01E+11	Commercial housing	Ordinary residence	 Five years
lianjia[[ "Ownership of premises", "Apartment layout", "Floor", "Structure of apartment layout", "Inner area(㎡)"]].sample(10)
        Ownership of premises	Apartment layout	Floor	Structure of apartment layout	Inner area(㎡)
29047	Non co ownership	2room2office1kitchen1Wei	Low floor(common7layer)	Flat layer	No data
17853	Share	2room1office1kitchen1Wei	Top floor(common6layer)	    Flat layer	No data
29596	No data	2room1office1kitchen2Wei	Middle floor(common19layer)	Flat layer	No data
14038	Non co ownership	2room1office1kitchen1Wei	Tall building(common18layer)	No data	Temporarily numerous
27348	Non co ownership	1room0office1kitchen1Wei	Middle floor(common26layer)	No data	18.53
31512	Non co ownership	2room1office1kitchen1Wei	Middle floor(common6layer)	Flat layer	77.18
25027	Non co ownership	1room0office1kitchen1Wei	Low floor(common7layer)	No data	No data
687	    Non co ownership	1room1office1kitchen1Wei	Middle floor(common15layer)	Flat layer	46.22
3567	Share	3room1office1kitchen1Wei	Low floor(common6layer)	Flat layer	No data
15897	Non co ownership	1room1office1kitchen1Wei	Bottom(common6layer)	    Flat layer	No data

# Prepare the inner area so that it can be filled from homes with the same apartment layout
cols = ["Ownership of premises", "Apartment layout", "Floor", "Structure of apartment layout", "Inner area(㎡)"]
# Placeholder values that do not hold a usable area (the same values that are replaced with NaN below)
is_placeholder = lianjia["Inner area(㎡)"].str.contains(r"No data|Temporarily numerous|\d+\s+.*")
temp_df1 = lianjia.loc[~is_placeholder, cols]
temp_df2 = lianjia.loc[is_placeholder, cols].copy()
temp_df2["Inner area(㎡)"] = temp_df2["Inner area(㎡)"].replace("No data", np.nan)
temp_df2["Inner area(㎡)"] = temp_df2["Inner area(㎡)"].replace("Temporarily numerous", np.nan)
temp_df2["Inner area(㎡)"] = temp_df2["Inner area(㎡)"].replace(r"\d+\s+.*", np.nan, regex=True)
lianjia_new5 = pd.concat((temp_df1, temp_df2))
lianjia[cols] = lianjia_new5[cols]
lianjia[cols].tail(10)
        Ownership of premises	Apartment layout	Floor	Structure of apartment layout	Inner area(㎡)
35360	No data	2room1office1kitchen1Wei	Middle floor(common6layer)	Flat layer	NaN
35361	Non co ownership	2room2office1kitchen1Wei	Bottom(common6layer)	    Flat layer	NaN
35362	Non co ownership	3room1office1kitchen2Wei	Tall building(common21layer)	Flat layer	117.11
35363	No data	1room1office1kitchen1Wei	Bottom(common5layer)	    Flat layer	NaN
35364	Non co ownership	3room1office1kitchen1Wei	Tall building(common6layer)	Flat layer	84.11
35365	No data	3room1office1kitchen1Wei	Tall building(common6layer)	Flat layer	NaN
35366	Non co ownership	1room0office0kitchen1Wei	Low floor(common28layer)	Flat layer	NaN
35367	No data	1room0office0kitchen1Wei	Low floor(common28layer)	Flat layer	NaN
35368	Non co ownership	1room2office1kitchen1Wei	Tall building(common6layer)	Flat layer	NaN
35369	Non co ownership	2room2office1kitchen2Wei	Tall building(common6layer)	Flat layer	NaN
# Type conversion
lianjia["Inner area(㎡)"] = lianjia["Inner area(㎡)"].astype(np.float32)
# Use the average area of each type of house to replace
lianjia["Inner area(㎡)"] = lianjia["Inner area(㎡)"].fillna(
lianjia.groupby("Apartment layout")["Inner area(㎡)"].transform("mean"))
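
The group-mean fill can be illustrated on a tiny made-up frame. This is a sketch only: the column names `layout` and `inner_area` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two layouts, each with one missing inner area,
# plus a third layout whose only row is missing
toy = pd.DataFrame({
    "layout": ["2room1hall", "2room1hall", "2room1hall", "1room0hall", "1room0hall", "5room2hall"],
    "inner_area": [60.0, 70.0, np.nan, 40.0, np.nan, np.nan],
})

# transform("mean") returns the per-layout mean aligned to the original rows
group_mean = toy.groupby("layout")["inner_area"].transform("mean")

# Only the missing entries are replaced; known areas are left untouched
toy["inner_area"] = toy["inner_area"].fillna(group_mean)
print(toy)
```

A layout whose rows are all missing has no mean to fall back on, so its rows stay NaN. This may be why a small number of inner-area values can remain missing after this step.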
lianjia[["Building type", "House facing", "Built in", "Decoration situation", "Building structure"]].sample(10)
        Building type	House facing	Built in	Decoration situation	Building structure
 28721 plank building north south 1960 simple installation mixed structure
 21679 plank building north south 2009 other steel concrete structure
 20617 board building east west 2004 simple installation mixed structure
 30381 plate building north south 2001 simple installation mixed structure
 32116 slab building south north 2013 simple steel concrete structure
 2539 slab building north south 2003 hardcover mixed structure
 15401 board building north south 1980 hardbound mixed structure
 21363 plate building south south 2012 simple steel concrete structure
 13273 plank building south north 1996 other mixed structure
 5942 plank building south northwest 2007 other steel concrete structure
lianjia[["Heating mode", "Scale of ladder households", "Property right years", "Equipped with elevator"]].sample(10)
        Heating mode	Scale of ladder households	Property right years	Equipped with elevator
 22469 central heating, one ladder, four households, 70 no
 29064 central heating, two ladders, seven households, 70 households
 23215 central heating, one ladder, three households, 70 no
 26982 central heating, one ladder, three households, 70 no
 3056 central heating, one ladder, three households, 70 households
 25895 central heating, one ladder, two households, 70 no
 33206 self heating, one ladder, four households, 70 no
 31214 self heating, two ladders, three households and 70 households
 7638 central heating, one ladder, nine households, 70 no
 23420 central heating, one ladder, three households, 70 no
# Fill with 70 where the property right period is unknown
lianjia["Property right years"] = lianjia["Property right years"].str.replace("unknown", "70")
lianjia["Property right years"] = lianjia["Property right years"].astype(np.int32)
# Delete the "Transaction cycle(new)" field
lianjia.drop("Transaction cycle(new)", axis=1, inplace=True)

4.4.5 check the processed data

lianjia.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 34520 entries, 1 to 35369
Data columns (total 34 columns):
Large area                        34520 non-null object
Transaction price (10000 yuan)    34520 non-null float32
Closing time                      34520 non-null datetime64[ns]
Residential area                  34520 non-null object
Apartment layout                  34520 non-null object
Built-up area                     34520 non-null float32
Listing price (10000 yuan)        34520 non-null float32
Transaction period (days)         34520 non-null float32
Price adjustment (Times)          34520 non-null float64
Take a look (Times)               34520 non-null float64
Attention (person)                34520 non-null float64
Browse (Times)                    34520 non-null float32
Chain number                      34520 non-null object
Trading Right                     34520 non-null object
Listing time                      34520 non-null datetime64[ns]
Housing use                       34520 non-null object
Housing life                      34520 non-null object
Ownership of premises             34520 non-null object
Apartment layout                  34520 non-null object
Floor                             34520 non-null object
Structure of apartment layout     34520 non-null object
Inner area(㎡)                     34475 non-null float32
Building type                     34520 non-null object
House facing                      34520 non-null object
Built in                          34520 non-null object
Decoration situation              34520 non-null object
Building structure                34520 non-null object
Heating mode                      34520 non-null object
Scale of ladder households        34520 non-null object
Property right years              34520 non-null int32
Equipped with elevator            34520 non-null object
Transaction cycle(new)            34520 non-null timedelta64[ns]
Transaction time (year)           34520 non-null int64
Closing time (week)               34520 non-null int64
dtypes: datetime64[ns](2), float32(6), float64(3), int32(1), int64(2), object(19), timedelta64[ns](1)
memory usage: 8.3+ MB
lianjia.sample(3)
Regional transaction price (10000) transaction time, residential building area, listing price (10000), transaction cycle (day), price adjustment (time), watch (time), pay attention (person), browse (sub chain family number, transaction right, listing time, house purpose, house age, house type, floor, household structure, suite area (M2), building type towards completion Decoration situation in s, building structure, heating mode, elevator household proportion, ownership period, elevator equipped transaction cycle (New) transaction time (year) transaction time (week)
17501 Mentougou 160.0 2017-12-20 room 1, room 2, Shuangyu Road community 53.779999 180.0 151.0 1.0 85.0 144.0 8739.0 1.01E+11 commercial housing 2017-07-22 ordinary residence two years old non shared room 2 room 1 hall 1 kitchen 1 bathroom ground floor (5 floors in total) flat floor 38.1500002 plank building south 1980 simple installation mixed structure central heating one ladder three households 70 no 151 days 2017 51
 1675 Daxing 203.0 2019-07-11 Kangtai Garden Room 1 hall 60.099998 203.0 492.0 1.0 62.0 192.0 10405.0 1.01E+11 commercial housing 2018-03-06 ordinary residence no data temporarily non shared room 1 hall 1 kitchen 1 bathroom ground floor (18 floors in total) flat floor 46.389999 board building north south 2009 simple steel concrete structure central heating one ladder two households 70 492 days 2019 28
 14601 Haidian 938.5 2019-08-01 today's home 4 rooms 1 hall 179.509995 1380.0 66.0 1.0 21.0 17.0 842.0 1.01E+11 commercial housing 2019-05-27 ordinary residence no data temporarily non shared 4 rooms 1 hall 1 kitchen 3 bathroom middle floor (9 floors in total) 156.351532 plank building south 2000 hardbound steel concrete structure central heating one ladder six households 70 66 days 2019 31

5, Analyze requirements

5.1 unit price of second-hand house per square meter

# Total building area
lianjia["Built-up area"].sum()
2961666.2
# Total transaction amount
lianjia["Transaction price (10000 yuan)"].sum()
15145877.0
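# Unit price per square meter = total transaction amount / total building area
result = lianjia["Transaction price (10000 yuan)"].sum() / lianjia["Built-up area"].sum()
display(str(result) + " ten thousand")
'5.1139717 ten thousand'

The overall unit price is therefore about 5.11 ten thousand yuan, i.e. roughly 51,140 yuan per square meter.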
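
Dividing the total price by the total area gives an area-weighted average, which is generally not the same as averaging each home's own unit price. A minimal sketch with two made-up homes (the column names `price_10k` and `area_m2` are hypothetical):

```python
import pandas as pd

# Hypothetical toy data: prices in units of 10,000 yuan, areas in square meters
toy = pd.DataFrame({
    "price_10k": [300.0, 900.0],
    "area_m2": [100.0, 150.0],
})

# Area-weighted unit price: total price / total area = 1200 / 250
overall = toy["price_10k"].sum() / toy["area_m2"].sum()

# Simple mean of per-home unit prices: (3.0 + 6.0) / 2
per_home_mean = (toy["price_10k"] / toy["area_m2"]).mean()

print(overall, per_home_mean)
```

The two figures differ (4.8 versus 4.5 here) because larger homes carry more weight in the first calculation, which is the one used in this section.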

5.2 Total building area in each region, in descending order

# Select the column with double brackets so that the dict-style agg works on a DataFrame group
result_df = lianjia.groupby("Large area")[["Built-up area"]].agg({"Built-up area": "sum"})
result_df = result_df.sort_values("Built-up area", ascending=False)
display(result_df)
                                            Built-up area
Large area
Yizhuang Development Zone                   300565.843750
Changping                                   289278.125000
Shunyi                                      286023.375000
Fangshan                                    259237.218750
Daxing                                      253338.468750
Tongzhou                                    251888.953125
Chaoyang                                    247918.687500
Haidian                                     238134.578125
Mentougou                                   230544.812500
Fengtai                                     229984.734375
Xicheng                                     202130.656250
Shijingshan                                 165606.250000
Others (Pinggu, Miyun, Huairou, Yanqing)    7014.509766

5.3 Weekly change in the number of second-hand housing transactions in Beijing

# Count the transactions in each (year, week) group
result_df = lianjia.groupby(["Transaction time (year)","Closing time (week)"]).size()
display(result_df.loc[2019].head(60))
Closing time (week)
1      459
2      110
3      144
4      157
5      283
6        1
7       46
8      104
9      620
10     190
11     176
12     151
13     171
14     690
15     270
16     286
17     292
18    1252
19     365
20     445
21     439
22    1642
23     418
24     470
25     486
26     651
27    1625
28     602
29     700
30     913
31    2432
32     660
33     733
34     769
35    2596
36     956
37    1007
38     994
39    1229
40    1541
41     861
42     988
43    1157
44     508
dtype: int64
year = 2019
mpl.rcParams["font.size"] = 12
plt.figure(figsize=(12,6))
plt.bar(result_df.loc[year].index, result_df.loc[year].values)
plt.xticks(result_df.loc[year].index)
plt.yticks(np.linspace(0, 2750, 20))
font = {"family":"Kaiti",
       "style":"oblique",
        "weight":"normal",
        "color":"green",
        "size": 20
       }
plt.xlabel("week", fontdict=font)
plt.ylabel("Transaction number", fontdict=font)
plt.grid(axis="y", color="g", ls=":", lw=1)
plt.title(str(year) + " Volume of second-hand housing transactions in Beijing", fontdict=font, color= "r")

Trading volume spikes roughly every four weeks (for example in weeks 18, 22, 27, 31 and 35). The spikes may fall at the beginning or the end of a month, which needs further exploration.
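
One possibility worth checking is an artifact of date parsing. Some closing times in the earlier sample are recorded only to the month (for example "2019-08 Deal"), and such values are parsed as the first day of that month, which would pile transactions into the week containing the 1st. The sketch below shows how this could be measured. The raw strings are made up, and whether this explains the spikes in the real data is an assumption that would need to be verified on the original "Closing time" column before conversion.

```python
import pandas as pd

# Hypothetical raw strings in the style of the "Closing time" samples (suffix already removed)
raw = pd.Series(["2019-08", "2019-08", "2019-08-15", "2019-09", "2019-09-04"])

# A month-only string contains exactly one "-" separator
month_only = raw.str.count("-") == 1

# Parse one value at a time; month-only values become the 1st of the month
parsed = pd.Series([pd.to_datetime(s) for s in raw], index=raw.index)

# Share of transactions that end up dated on the first day of a month
first_of_month_share = (parsed.dt.day == 1).mean()

print(month_only.sum(), first_of_month_share)
```

If the share of month-only values is large in the real data, counting by month, or excluding those rows from the weekly chart, would give a fairer picture of the trend.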

5.4 trend of unit average price of second-hand house transaction every week

result_df = lianjia.groupby(["Transaction time (year)","Closing time (week)"])[["Transaction price (10000 yuan)", "Built-up area"]].agg({"Transaction price (10000 yuan)":"sum", "Built-up area":"sum"})
result_df["Unit average price"] = result_df["Transaction price (10000 yuan)"] / result_df["Built-up area"]
display(result_df.loc[2019].head(60))
year = 2019
mpl.rcParams["font.size"] = 12
plt.figure(figsize=(12,6))
plt.plot(result_df.loc[year].index, result_df.loc[year]["Unit average price"])


The weekly average unit price rose at first and has remained stable since week 20.

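# Average transaction period per (year, week); double brackets keep a DataFrame so the dict-style agg works
result_df2 = lianjia.groupby(["Transaction time (year)","Closing time (week)"])[["Transaction period (days)"]].agg(
{"Transaction period (days)":"mean"})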
display(result_df2.loc[2019].head(60))
year = 2019

mpl.rcParams["font.size"] = 12

plt.figure(figsize=(12,6))

plt.bar(result_df2.loc[year].index, result_df2.loc[year]["Transaction period (days)"])

plt.xticks(result_df2.loc[year].index)
plt.yticks(np.linspace(0, 225, 10))
font = {"family":"Kaiti",
       "style":"oblique",
        "weight":"normal",
        "color":"green",
        "size": 20
       }
plt.xlabel("week", fontdict=font)
plt.ylabel("Transaction period (days)", fontdict=font)
plt.grid(axis="y", color="g", ls=":", lw=1)
plt.title(str(year) + " Sales cycle of second-hand houses in Beijing", fontdict=font, color= "r")


The long transaction cycles in weeks 6, 18, 22, 27, 31, 35 and 40 need further analysis.
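
The average listing cycle by region or community, listed in the analysis purpose, is not worked through above. A minimal sketch of how it might be computed is shown here on made-up data. It assumes the cleaned column names used in this article, and the community names and values are hypothetical.

```python
import pandas as pd

# Hypothetical toy data using the cleaned column names from this article
toy = pd.DataFrame({
    "Large area": ["Daxing", "Daxing", "Haidian", "Haidian"],
    "Residential area": ["A", "B", "C", "C"],
    "Transaction period (days)": [100.0, 200.0, 30.0, 50.0],
})

# Average transaction period per region, longest first
by_region = (
    toy.groupby("Large area")["Transaction period (days)"]
    .mean()
    .sort_values(ascending=False)
)

# Average transaction period per community within each region
by_community = toy.groupby(["Large area", "Residential area"])["Transaction period (days)"].mean()

print(by_region)
print(by_community)
```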


Keywords: Python encoding

Added by darktimesrpg on Tue, 03 Mar 2020 07:20:30 +0200