Python series - Pandas - time-frequency resampling with pandas DataFrame.resample

Official documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.resample.html

Objective

This article covers what resample does, how its parameters are configured, and which arguments it is commonly used with.

It is organized as follows:

1. What effect can resample achieve

2. What are the parameters of resample

3. Common resample classification examples

1. What effect can resample achieve

resample groups time-series data into bins of a chosen frequency (daily, weekly, monthly, and so on) and then aggregates each bin, for example with max, min, sum, or mean.

The official documentation lists three related methods:

groupby

Group by mapping, function, label, or list of labels.

Series.resample

Resample a Series.

DataFrame.resample

Resample a DataFrame
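
As a minimal sketch of the aggregations mentioned above, the snippet below resamples a small Series and then a DataFrame. The values 0 to 13 are an assumption chosen here so the results are easy to check by hand; they are not the article's data.

```python
import pandas as pd

# Two weeks of daily data with values 0..13 (illustrative values only).
rng = pd.date_range("2012-01-01", periods=14, freq="D")
s = pd.Series(range(14), index=rng)

# Series.resample: one row per 7-day bin, one column per aggregation.
stats = s.resample("7D").agg(["min", "max", "sum", "mean"])
print(stats)

# DataFrame.resample applies the aggregation to every column.
df = s.to_frame(name="num")
weekly_max = df.resample("7D").max()
print(weekly_max)
```

Passing a list to agg is convenient when several statistics are needed for the same bins; a single statistic can be requested directly with .sum(), .mean(), and so on.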

Example

Suppose I have a batch of data with 2 columns, one holding a daily time series and one holding numbers, stored in a DataFrame. The details are as follows.

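import numpy as np
import pandas as pd

# 100 consecutive days starting on January 1, 2012
rng = pd.date_range("1/1/2012", periods=100, freq="D")
# Random integers in [0, 500), one per day
ts = pd.Series(np.random.randint(0, 500, len(rng)), index=rng)
# Turn the Series into a DataFrame with columns 'index' (dates) and 'num'
ts = ts.reset_index(name='num')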

Note: the generated data runs from January 1, 2012 to April 9, 2012.

Requirement

Compute the sum of the data at different time granularities, such as weekly and monthly.

Solution

First, set the date column as the index so that resample can use it. By default, resample needs a datetime-like index (DatetimeIndex, PeriodIndex, or TimedeltaIndex); otherwise the on or level parameter must point to a datetime-like column or index level.

ts = ts.set_index('index')
ts.head()
-----------------------
            num
index          
2012-01-01  104
2012-01-02  249
2012-01-03  177
2012-01-04  262
2012-01-05  318
-------------------------
ts.index
-------------------------
DatetimeIndex(['2012-01-01', '2012-01-02', '2012-01-03', '2012-01-04',
               '2012-01-05', '2012-01-06', '2012-01-07', '2012-01-08',
               '2012-01-09', '2012-01-10', '2012-01-11', '2012-01-12',
               '2012-01-13', '2012-01-14', '2012-01-15', '2012-01-16',
               '2012-01-17', '2012-01-18', '2012-01-19', '2012-01-20',
               '2012-01-21', '2012-01-22', '2012-01-23', '2012-01-24',
               '2012-01-25', '2012-01-26', '2012-01-27', '2012-01-28',
               '2012-01-29', '2012-01-30', '2012-01-31', '2012-02-01',
               '2012-02-02', '2012-02-03', '2012-02-04', '2012-02-05',
               '2012-02-06', '2012-02-07', '2012-02-08', '2012-02-09',
               '2012-02-10', '2012-02-11', '2012-02-12', '2012-02-13',
               '2012-02-14', '2012-02-15', '2012-02-16', '2012-02-17',
               '2012-02-18', '2012-02-19', '2012-02-20', '2012-02-21',
               '2012-02-22', '2012-02-23', '2012-02-24', '2012-02-25',
               '2012-02-26', '2012-02-27', '2012-02-28', '2012-02-29',
               '2012-03-01', '2012-03-02', '2012-03-03', '2012-03-04',
               '2012-03-05', '2012-03-06', '2012-03-07', '2012-03-08',
               '2012-03-09', '2012-03-10', '2012-03-11', '2012-03-12',
               '2012-03-13', '2012-03-14', '2012-03-15', '2012-03-16',
               '2012-03-17', '2012-03-18', '2012-03-19', '2012-03-20',
               '2012-03-21', '2012-03-22', '2012-03-23', '2012-03-24',
               '2012-03-25', '2012-03-26', '2012-03-27', '2012-03-28',
               '2012-03-29', '2012-03-30', '2012-03-31', '2012-04-01',
               '2012-04-02', '2012-04-03', '2012-04-04', '2012-04-05',
               '2012-04-06', '2012-04-07', '2012-04-08', '2012-04-09'],
              dtype='datetime64[ns]', name='index', freq=None)
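
If changing the index is not desirable, the on parameter listed in the signature in section 2.1 offers an alternative. The sketch below assumes deterministic values 0 to 99 instead of the article's random numbers, and checks that resampling on the column gives the same 7-day sums as resampling after set_index.

```python
import numpy as np
import pandas as pd

rng = pd.date_range("2012-01-01", periods=100, freq="D")
# Same layout as the article's DataFrame: columns 'index' (dates) and 'num'.
df = pd.Series(np.arange(100), index=rng).reset_index(name="num")

# Resample directly on the datetime column, leaving the index untouched.
by_column = df.resample("7D", on="index").sum()

# Equivalent route used in the article: make the dates the index first.
by_index = df.set_index("index").resample("7D").sum()

print(by_column.head())
```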
  • weekly

    # Sum by week (consecutive 7-day bins starting at the first date)
    ts.resample('7D').sum()
    --------------------------
                num
    index           
    2012-01-01  1817
    2012-01-08  2460
    2012-01-15  2070
    2012-01-22  2104
    2012-01-29  1812
    2012-02-05  2008
    2012-02-12  1949
    2012-02-19  2092
    2012-02-26  2527
    2012-03-04  1934
    2012-03-11  1856
    2012-03-18  1546
    2012-03-25  1206
    2012-04-01  1865
    2012-04-08   441
    
  • monthly

    # Sum by month
    ts.resample('M').sum()
    --------------------------
                 num
    index           
    2012-01-31  9552
    2012-02-29  8233
    2012-03-31  7596
    2012-04-30  2306
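
One detail worth checking is that '7D' and the calendar-week alias 'W' do not bin the data the same way. The sketch below is an illustration under an assumption made here: every day has the value 1, so each sum is simply the number of days in the bin.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", periods=100, freq="D")
ts = pd.DataFrame({"num": 1}, index=rng)  # value 1 per day, so sum == day count

# '7D': fixed 7-day windows counted from the first timestamp, labeled by their start.
fixed = ts.resample("7D").sum()

# 'W': calendar weeks ending on Sunday, labeled by that Sunday.
weekly = ts.resample("W").sum()

print(fixed.head(3))
print(weekly.head(3))
```

Because January 1, 2012 is a Sunday, the first 'W' bin holds only that single day, while the first '7D' bin holds January 1 through January 7. Which behavior is wanted depends on whether "weekly" means calendar weeks or rolling 7-day blocks.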
    

2. What are the parameters of resample

2.1 Parameters of the resample function
Signature:
ts.resample(
    rule,
    axis=0,
    closed: Union[str, NoneType] = None,
    label: Union[str, NoneType] = None,
    convention: str = 'start',
    kind: Union[str, NoneType] = None,
    loffset=None,
    base: int = 0,
    on=None,
    level=None,
)
Docstring:
Resample time-series data.
parameter           explanation
freq                The resampling frequency, such as 'M', '5min', or Second(15). This is the rule argument in the signature above.
how='mean'          The function name or array function used to produce the aggregated values, such as 'mean', 'ohlc', or np.max. The default is 'mean'; other common values are 'first', 'last', 'median', 'max', and 'min'.
axis=0              The axis to resample along. The default, 0, resamples along the rows (the index); axis=1 resamples along the columns.
fill_method=None    How to fill values when upsampling, such as 'ffill' or 'bfill'.
closed=None         When downsampling, which side of each bin interval is closed, 'right' or 'left'. The default is 'left' for most frequencies and 'right' for 'M', 'A', 'Q', 'BM', 'BA', 'BQ', and 'W'.
label=None          Which bin edge is used to label the aggregated value when downsampling, for example whether the 9:30-9:35 bin is labeled 9:30 or 9:35. The default follows the same rule as closed.
loffset=None        A time adjustment applied to the bin labels, such as '-1s' or Second(-1), which moves each label 1 second earlier.
limit=None          The maximum number of periods to fill when filling forward or backward.
kind=None           Aggregate to periods ('period') or timestamps ('timestamp'). By default the index type of the input time series is kept.
convention='start'  The convention ('start' or 'end') used when converting from a low frequency to a high frequency. The default is 'start', as shown in the signature above.

Note: how, fill_method, and limit come from older pandas versions and do not appear in the signature above. With that signature, call the aggregation or fill method on the resampled object instead, for example ts.resample('M').mean() or ts.resample('D').ffill().
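
The effect of closed and label is easier to see on a small example. The sketch below is an illustration with assumed data: one observation per minute from 09:30 to 09:40 on an arbitrary date, each with the value 1, so every sum is the number of observations in the bin.

```python
import pandas as pd

# One tick per minute, 09:30 through 09:40 inclusive (11 observations).
idx = pd.date_range("2022-01-03 09:30", periods=11, freq="min")
s = pd.Series(1, index=idx)

# Default for '5min': bins are closed and labeled on the left, e.g. [09:30, 09:35).
left = s.resample("5min").sum()

# Closed and labeled on the right, e.g. (09:30, 09:35] is labeled 09:35.
right = s.resample("5min", closed="right", label="right").sum()

print(left)
print(right)
```

With the left-closed default, the 09:35 observation falls into the bin labeled 09:35; with closed='right', the 09:30 observation ends up alone in the bin labeled 09:30.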

To understand a parameter in detail, try it in code and observe the result. In a Jupyter notebook, pressing Shift+Tab on the function also shows the examples from its docstring. No further detail is given here.

2.2 freq resampling frequencies

The freq argument accepts many frequency aliases. Here are some common ones collected from the web.

alias           offset type         explanation
D               Day                 Every calendar day
B               BusinessDay         Every business day (Monday to Friday)
H               Hour                Every hour
T / min         Minute              Every minute
S               Second              Every second
M               MonthEnd            Last calendar day of each month
BM              BusinessMonthEnd    Last business day of each month
Q-JAN, Q-FEB    QuarterEnd          For a year ending in the specified month, the last calendar day of the last month of each quarter
A-JAN, A-FEB    YearEnd             Last calendar day of the specified month of each year
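
The offset types in the table can also be passed to resample as objects instead of alias strings. This can be a safer choice across pandas versions, since some alias spellings have changed over time (for example, pandas 2.2 deprecates 'M' in favor of 'ME'). The sketch below assumes a daily series where every value is 1, so each sum equals the number of days in the bin.

```python
import pandas as pd

rng = pd.date_range("2012-01-01", periods=100, freq="D")
ts = pd.Series(1, index=rng)  # value 1 per day, so sum == day count

# MonthEnd offset object: same bins as the month-end alias in the table.
monthly = ts.resample(pd.offsets.MonthEnd()).sum()

# QuarterEnd with startingMonth=1: quarters ending in January, April, July, October
# (the Q-JAN row of the table).
quarterly = ts.resample(pd.offsets.QuarterEnd(startingMonth=1)).sum()

print(monthly)
print(quarterly)
```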

3. Common resample classification examples

The official examples are the best reference. Only a few of the more useful patterns are listed briefly here.

Ten-year bins labeled with January 1

.resample('10AS')

Ten-year bins labeled with December 31

.resample('10A')
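
As a rough sketch of these two patterns, the snippet below uses the YearBegin and YearEnd offset objects with a multiple of 10, which are assumed here to be the object equivalents of the '10AS' and '10A' aliases. The yearly series of ones starting in 2000 is made up for illustration.

```python
import pandas as pd

# One observation on January 1 of each year, 2000 through 2024 (illustrative data).
years = pd.date_range("2000-01-01", periods=25, freq=pd.offsets.YearBegin())
s = pd.Series(1, index=years)

# Ten-year bins labeled with the January 1 that opens each bin.
decade_start = s.resample(pd.offsets.YearBegin(10)).sum()

# Ten-year bins labeled with the December 31 that closes each bin.
decade_end = s.resample(pd.offsets.YearEnd(10)).sum()

print(decade_start)
print(decade_end)
```

The bins are laid out relative to the first timestamp in the data, so they line up with calendar decades only when the data starts at a decade boundary, as it does in this made-up series.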

Resampling on one level of a MultiIndex with the level parameter

>>> days = pd.date_range('1/1/2000', periods=4, freq='D')
>>> d2 = {'price': [10, 11, 9, 13, 14, 18, 17, 19],
...       'volume': [50, 60, 40, 100, 50, 100, 40, 50]}
>>> df2 = pd.DataFrame(d2,
...                    index=pd.MultiIndex.from_product([days,
...                                                     ['morning',
...                                                      'afternoon']]
...                                                     ))
>>> df2
                      price  volume
2000-01-01 morning       10      50
           afternoon     11      60
2000-01-02 morning        9      40
           afternoon     13     100
2000-01-03 morning       14      50
           afternoon     18     100
2000-01-04 morning       17      40
           afternoon     19      50
>>> df2.resample('D', level=0).sum()
            price  volume
2000-01-01     21     110
2000-01-02     22     140
2000-01-03     32     150
2000-01-04     36      90
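
When the columns need different aggregations, agg accepts a dictionary that maps column names to functions. The sketch below rebuilds the same df2 and, as an illustrative choice made here, takes the daily mean of price and the daily sum of volume.

```python
import pandas as pd

days = pd.date_range("1/1/2000", periods=4, freq="D")
d2 = {"price": [10, 11, 9, 13, 14, 18, 17, 19],
      "volume": [50, 60, 40, 100, 50, 100, 40, 50]}
df2 = pd.DataFrame(
    d2,
    index=pd.MultiIndex.from_product([days, ["morning", "afternoon"]]),
)

# Resample on the datetime level, with a different aggregation per column.
daily = df2.resample("D", level=0).agg({"price": "mean", "volume": "sum"})
print(daily)
```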

Keywords: Python Data Analysis pandas

Added by PHPcadet on Thu, 17 Feb 2022 13:40:53 +0200