Ensemble learning: XGBoost

Rimeng Society

AI: Keras / PyTorch / MXNet / TensorFlow / PaddlePaddle deep learning in practice (updated irregularly)

Ensemble learning: Bagging, random forest, Boosting, GBDT

Ensemble learning: XGBoost

Ensemble learning: LightGBM (I)

Ensemble learning: LightGBM (II)

5.1 Principle of the XGBoost algorithm

XGBoost (eXtreme Gradient Boosting) stands for extreme gradient boosted trees. XGBoost is the trump card among ensemble learning methods: in Kaggle data mining competitions, a large share of the winning solutions use XGBoost.

XGBoost performs very well on most regression and classification problems. This section introduces the algorithmic principles of XGBoost in detail.

1. Constructing the optimal model

As we have seen before, the general way to construct the optimal model is to minimize the loss function over the training data.

We use the letter L to denote the loss, and minimize it over the training data:

min_{f∈F} 1/N Σ_{i=1}^{N} L(y_i, f(x_i))    (1.1)

where F is the hypothesis space.

The hypothesis space is the set of all candidate functions that could possibly achieve the goal, given the known attributes and their possible values.

Equation (1.1) is called empirical risk minimization. The model it trains tends to have high complexity, and when the training data is small the model is prone to overfitting.

Therefore, in order to limit the complexity of the model, the following formula is often used instead:

min_{f∈F} 1/N Σ_{i=1}^{N} L(y_i, f(x_i)) + λ J(f)    (2.1)

where J(f) is the complexity of the model and λ ≥ 0 balances the two terms.

Equation (2.1) is called structural risk minimization. A model obtained by structural risk minimization usually predicts well both on the training data and on unseen test data.

Applications:

  • Decision tree generation and pruning correspond to empirical risk minimization and structural risk minimization, respectively;
  • The tree generation in XGBoost is itself the result of structural risk minimization, which is described in detail below.

2. Derivation of the XGBoost objective function

2.1 Determining the objective function

The objective function, i.e. the loss function, is what we minimize to construct the optimal model.

As discussed above, a regularization term representing model complexity should be added to the loss function. The model learned by XGBoost consists of multiple CART trees, so the objective function of the model is:
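In the notation of the XGBoost paper, with K trees f_k and prediction ŷ_i for sample i, this objective reads:

Obj = Σ_{i=1}^{n} l(y_i, ŷ_i) + Σ_{k=1}^{K} Ω(f_k)

where l is the training loss and Ω(f_k) is the complexity of the k-th tree (defined in Section 2.3).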

2.2 Introduction to CART trees

A CART (Classification And Regression Tree) is a binary tree in which each leaf holds a real-valued score rather than a class label; XGBoost sums these leaf scores over all trees to form its prediction.

2.3 Definition of tree complexity

2.3.1 Defining the complexity of each tree

The XGBoost model consists of multiple CART trees, and the complexity of each tree is defined as follows:
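Following the XGBoost paper, for a tree f with T leaves and leaf scores w_1, …, w_T:

Ω(f) = γT + ½ λ Σ_{j=1}^{T} w_j²

i.e. the complexity grows with the number of leaves (weighted by γ) and with the L2 norm of the leaf scores (weighted by λ).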

2.3.2 Example of tree complexity

Suppose we want to predict how much each member of a family likes video games. Considering that young people are more likely to like video games than older people, and that men like them more than women, we first separate children from adults by age, then split by gender, and assign each person a score for how much they like video games, as shown in the figure below:

In this way two trees, tree1 and tree2, are trained. As with GBDT, the final prediction is the sum of the predictions of the two trees, so:

  • The little boy's predicted score is the sum of the scores of the leaves he falls into in the two trees: 2 + 0.9 = 2.9.
  • Grandpa's predicted score is computed the same way: -1 + (-0.9) = -1.9.

See the following figure for details:
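Written as code, the additive prediction looks like this; the two hand-written functions below merely stand in for tree1 and tree2 (their split rules are illustrative, not something produced by XGBoost):

# Each "tree" maps a person to the score of the leaf they fall into;
# the ensemble prediction is simply the sum of the leaf scores.
def tree1(person):
    if person["age"] < 15:
        return 2.0 if person["sex"] == "male" else 0.1
    return -1.0

def tree2(person):
    return 0.9 if person["uses_computer_daily"] else -0.9

def predict(person):
    return tree1(person) + tree2(person)

little_boy = {"age": 8, "sex": "male", "uses_computer_daily": True}
grandpa = {"age": 70, "sex": "male", "uses_computer_daily": False}
print(predict(little_boy))  # 2 + 0.9 = 2.9
print(predict(grandpa))     # -1 + (-0.9) = -1.9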

2.4 Derivation of the objective function
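The full derivation follows the XGBoost paper; the key steps, in its standard notation, are:

Obj^(t) = Σ_i l(y_i, ŷ_i^(t-1) + f_t(x_i)) + Ω(f_t)
        ≈ Σ_i [ g_i f_t(x_i) + ½ h_i f_t(x_i)² ] + Ω(f_t)

where g_i and h_i are the first and second derivatives of the loss with respect to the previous round's prediction (a second-order Taylor expansion, constants dropped). Grouping the samples by the leaf j they fall into, and writing G_j = Σ g_i and H_j = Σ h_i over the samples in leaf j:

Obj^(t) = Σ_{j=1}^{T} [ G_j w_j + ½ (H_j + λ) w_j² ] + γT

Minimizing over each leaf score gives the optimal leaf weights and the score of a fixed tree structure:

w_j* = -G_j / (H_j + λ)
Obj* = -½ Σ_{j=1}^{T} G_j² / (H_j + λ) + γT

This Obj* is the scoring function used in Section 3 to evaluate and compare tree structures.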

3. Regression tree construction in XGBoost

3.1 Computing split nodes

In the actual training process, when the t-th tree is built, XGBoost splits the tree nodes greedily:

Starting from a tree of depth 0:

  • Try to split every leaf node in the tree;

  • After each split, the original leaf becomes an internal node with left and right child leaves, and the samples in the original leaf are distributed to the two children according to the node's splitting rule;

  • After splitting out a new node, we need to check whether the split brings a gain to the loss function. The gain is defined as follows:
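In the notation of Section 2.4, the gain of splitting one leaf into a left child L and a right child R is:

Gain = ½ [ G_L² / (H_L + λ) + G_R² / (H_R + λ) - (G_L + G_R)² / (H_L + H_R + λ) ] - γ

where G and H are the sums of the first- and second-order gradients of the samples in each child, λ is the L2 regularization coefficient, and γ is the cost of adding one more leaf.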

If the gain is greater than 0, i.e. the objective function decreases after splitting into two leaf nodes, then we consider keeping this split.

So when does the splitting stop?

3.2 Conditions for stopping the splitting

Case 1: the scoring function derived in the previous section measures how good a tree structure is, so it can be used to select the best split point. First, enumerate all candidate split points of the sample features; then evaluate each candidate split using the gain defined above and keep the best one, stopping when no split yields a positive gain.
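A minimal Python sketch of this exact greedy split search over a single feature (the function and variable names are illustrative, not part of the XGBoost API):

import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    # x: one feature column; g, h: first/second-order gradients of the samples in the leaf
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()
    parent_score = G * G / (H + lam)
    best_gain, best_threshold = 0.0, None
    G_L = H_L = 0.0
    for i in range(len(x) - 1):
        G_L += g[i]
        H_L += h[i]
        if x[i] == x[i + 1]:          # cannot split between identical feature values
            continue
        G_R, H_R = G - G_L, H - H_L
        gain = 0.5 * (G_L**2 / (H_L + lam)
                      + G_R**2 / (H_R + lam)
                      - parent_score) - gamma
        if gain > best_gain:          # keep the split only if the objective decreases
            best_gain = gain
            best_threshold = (x[i] + x[i + 1]) / 2
    return best_gain, best_threshold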

4. Differences between XGBoost and GBDT

  • Difference 1:
    • XGBoost takes the complexity of the tree into account while generating each CART tree;
    • GBDT does not consider tree complexity while generating the tree; it only deals with it in the pruning step.
  • Difference 2:
    • XGBoost fits the loss of the previous round with a second-order Taylor expansion, while GBDT uses only a first-order expansion. XGBoost is therefore more accurate and needs fewer iterations to reach the same training effect.
  • Difference 3:
    • Both XGBoost and GBDT improve the model iteratively, but XGBoost can use multiple threads when searching for the best split point, which greatly improves running speed.

5 Summary

5.2 Introduction to the XGBoost API

1. XGBoost installation

Official website link: https://xgboost.readthedocs.io/en/latest/

pip3 install xgboost

2. Introduction to xgboost parameters

Although XGBoost has a reputation as the secret weapon of Kaggle competitions, we still have to pass appropriate values to its parameters if we want to train a good model.

xgboost exposes many parameters, which fall into three groups: general parameters, booster parameters and learning task parameters.

  • General parameters: control the overall behaviour at a macro level;
  • Booster parameters: depend on the chosen booster type and control each individual booster (tree or linear);
  • Learning task parameters: control the training objective and how performance is measured.
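To see where each group goes in practice, here is a minimal sketch using the native training API on a tiny synthetic data set (the parameter values are illustrative, not recommendations):

import numpy as np
import xgboost as xgb

# A tiny synthetic binary classification problem, only to show the parameter groups.
X = np.random.rand(100, 4)
y = (X[:, 0] > 0.5).astype(int)
dtrain = xgb.DMatrix(X, label=y)

params = {
    # general parameters
    "booster": "gbtree",
    "nthread": 4,
    # booster (tree) parameters
    "eta": 0.3,
    "max_depth": 6,
    # learning task parameters
    "objective": "binary:logistic",
    "eval_metric": "logloss",
}
model = xgb.train(params, dtrain, num_boost_round=10)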

2.1 general parameters

  1. booster [default = gbtree]

    • Decides which booster to use: gbtree, gblinear or dart.
    • gbtree and dart use tree-based models (dart additionally applies dropout), while gblinear uses linear functions.
  2. silent [default = 0]

    • Set to 0 to print run-time information; set to 1 for silent mode (nothing is printed).
  3. nthread [default = maximum number of threads available]

    • Number of threads used to run xgboost in parallel; it should be <= the number of CPU cores of the system. If it is not set, the algorithm detects the number of cores automatically and uses all of them.

The following two parameters do not need to be set; just use the defaults.

  1. num_pbuffer [set automatically by xgboost, no user setting required]

    • Size of the prediction buffer, usually set to the number of training instances. The buffer stores the prediction results of the last boosting step.
  2. num_feature [set automatically by xgboost, no user setting required]

    • Feature dimension used in boosting, set to the maximum dimension of the features.

2.2 booster parameters

2.2.1 Parameters for Tree Booster

  1. eta [default = 0.3, alias: learning_rate]

    • Step size shrinkage used in each update, to prevent overfitting.

    • After each boosting step we obtain the weights of the new features, and eta shrinks these weights to make the boosting process more conservative and robust.

    • Range: [0, 1]
  2. gamma [default = 0, alias: min_split_loss]

    • A node is split only if the split decreases the value of the loss function.
    • gamma specifies the minimum loss reduction required for a split. The larger this value is, the more conservative the algorithm. Its effect is closely tied to the loss function, so it needs to be tuned.

    • Range: [0, ∞]

  3. max_depth [default = 6]

    • Maximum depth of a tree, also used to avoid overfitting: the larger max_depth is, the more specific and local patterns the model will learn. Setting it to 0 means no limit.
    • Range: [0, ∞]
  4. min_child_weight [default = 1]

    • Minimum sum of sample weights (hessian) required in a leaf node.
    • When its value is large, the model avoids learning overly local, special patterns; however, if the value is too high it leads to underfitting. This parameter should be tuned with CV.
    • Range: [0, ∞]
  5. subsample [default = 1]

    • Controls the fraction of the training samples randomly drawn for each tree.
    • Reducing this value makes the algorithm more conservative and avoids overfitting, but setting it too small can cause underfitting.

    • Typical values: 0.5-1; 0.5 means each tree is grown on a random half of the data, which helps prevent overfitting.

    • Range: (0, 1]
  6. colsample_bytree [default = 1]

    • Controls the fraction of columns (each column is a feature) randomly sampled for each tree.
    • Typical values: 0.5-1
    • Range: (0, 1]
  7. colsample_bylevel [default = 1]

    • Controls the fraction of columns sampled for each split at each level of the tree.
    • Personally I rarely use this parameter, because subsample and colsample_bytree can do the same job; but if you are interested you can explore it further.
    • Range: (0, 1]
  8. lambda [default = 1, alias: reg_lambda]

    • L2 regularization term on the weights (analogous to ridge regression).
    • It controls the regularization part of XGBoost. Although most data scientists rarely use it, it can be quite useful for reducing overfitting.
  9. alpha [default = 0, alias: reg_alpha]

    • L1 regularization term on the weights (analogous to lasso regression). It can be used with very high-dimensional data to make the algorithm faster.
  10. scale_pos_weight [default = 1]

    • When the classes are very unbalanced, setting this parameter to a positive value helps the algorithm converge faster. A typical value is the ratio of the number of negative samples to the number of positive samples.
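For reference, here is a sketch of the same tree-booster parameters passed through the scikit-learn style wrapper (the values are only illustrative):

from xgboost import XGBClassifier

clf = XGBClassifier(
    learning_rate=0.1,      # eta
    gamma=0.1,              # min_split_loss
    max_depth=5,
    min_child_weight=1,
    subsample=0.8,
    colsample_bytree=0.8,
    reg_lambda=1.0,         # lambda
    reg_alpha=0.0,          # alpha
    scale_pos_weight=1.0,
)
# clf.fit(x_train, y_train) is then used exactly as in the case study below.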

2.2.2 Parameters for Linear Booster

The linear booster is rarely used.

  1. lambda [default = 0, alias: reg_lambda]

    • L2 regularization penalty coefficient. Increasing this value makes the model more conservative.
  2. alpha [default = 0, alias: reg_alpha]

    • L1 regularization penalty coefficient. Increasing this value makes the model more conservative.
  3. lambda_bias [default = 0, alias: reg_lambda_bias]

    • L2 regularization on the bias term (there is no L1 regularization on the bias because it is not important).

2.3 task parameters

  1. objective [default = reg:linear]

    1. "reg:linear" – linear regression
    2. "reg:logistic" – logistic regression
    3. "binary:logistic" – binary logistic regression, outputting probabilities
    4. "multi:softmax" – multiclass classification using the softmax objective, returning the predicted class (not probabilities). In this case you also need to set the additional parameter num_class (the number of classes).
    5. "multi:softprob" – same as multi:softmax, but returns the probability of each data point belonging to each class.
  2. eval_metric [default chosen according to the objective]

    The options are as follows:

    1. "rmse": root mean squared error
    2. "mae": mean absolute error
    3. "logloss": negative log-likelihood
    4. "error": binary classification error rate.
      • Computed as the number of wrong predictions divided by the total number of predictions. A predicted value greater than 0.5 is counted as positive, everything else as negative.
    5. "error@t": a different classification threshold can be set through 't'.
    6. "merror": multiclass classification error rate, computed as (wrong cases) / (all cases).
    7. "mlogloss": multiclass log loss
    8. "auc": area under the curve
  3. seed [default = 0]

    • Random number seed.
    • Set it to reproduce random results and when tuning parameters.
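For example, a minimal multiclass parameter dictionary (values illustrative) combines the objective with num_class, an evaluation metric and a seed:

params = {
    "objective": "multi:softmax",  # returns predicted classes; use "multi:softprob" for probabilities
    "num_class": 3,
    "eval_metric": "mlogloss",
    "seed": 0,
}
# booster = xgb.train(params, dtrain, num_boost_round=50), with dtrain built as in the earlier sketch.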

5.3 XGBoost case study

1 case background

This case uses the same data set as the earlier decision tree case.

The sinking of the Titanic is one of the most notorious shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 of the 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships. One of the reasons for the high death toll is that there were not enough lifeboats for the passengers and crew. Although there was some luck involved in surviving, some groups of people were more likely to survive than others, such as women, children and the upper class. In this case we ask you to analyse which kinds of people were likely to survive. In particular, we ask you to use machine learning tools to predict which passengers survived the tragedy.

Case: https://www.kaggle.com/c/titanic/overview

The features available in the data set include the ticket class, survival, name, age, port of embarkation, home destination (home.dest), room, ticket, boat and sex.

Data: http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt

From an initial look at the data:

  • 1. pclass is the passenger class (1st, 2nd, 3rd) and serves as a proxy for socio-economic status.
  • 2. The age column has missing values.

2 step analysis

  • 1. Obtain the data
  • 2. Basic data processing
    • 2.1 Determine the feature values and the target value
    • 2.2 Handle missing values
    • 2.3 Split the data set
  • 3. Feature engineering (dictionary feature extraction)
  • 4. Machine learning (xgboost)
  • 5. Model evaluation

3 code implementation

  • Import required modules
import pandas as pd
import numpy as np
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import train_test_split
  • 1. Obtain data
# 1. Get data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")
  • 2. Basic data processing

    • 2.1 Determine the feature values and the target value
    x = titan[["pclass", "age", "sex"]]
    y = titan["survived"]
    
    • 2.2 Missing value handling
    # The age column has missing values; the categorical features are handled later by dictionary feature extraction
    x = x.copy()
    x["age"] = x["age"].fillna(x["age"].mean())
    
    • 2.3 Data set division
    x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)
    
  • 3. Feature Engineering (dictionary feature extraction)

If categorical values appear among the features, one-hot encoding is required (DictVectorizer).

x.to_dict(orient="records") converts the DataFrame rows into a list of dictionaries.

# Convert x to dictionary records with x.to_dict(orient="records")
# e.g. [{"pclass": "1st", "age": 29.00, "sex": "female"}, ...]

transfer = DictVectorizer(sparse=False)

x_train = transfer.fit_transform(x_train.to_dict(orient="records"))
x_test = transfer.transform(x_test.to_dict(orient="records"))  # use transform so train and test share the same encoding
  • 4. XGBoost model training and model evaluation
# Initial model training
from xgboost import XGBClassifier
xg = XGBClassifier()

xg.fit(x_train, y_train)

xg.score(x_test, y_test)
# Tune the model with respect to max_depth
depth_range = range(1, 11)  # max_depth should be a positive integer
score = []
for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    s = xg.score(x_test, y_test)
    print(s)
    score.append(s)
# Result visualization
import matplotlib.pyplot as plt

plt.plot(depth_range, score)

plt.show()

In [1]:

# 1. get data
# 2. Basic data processing
#2.1 determination of characteristic value and target value
#2.2 missing value handling
#2.3 data set division
# 3. Feature Engineering (dictionary feature extraction)
# 4. Machine learning (xgboost)
# 5. Model evaluation

In [2]:

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction import DictVectorizer
from sklearn.tree import DecisionTreeClassifier, export_graphviz

In [3]:

# 1. get data
titan = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic.txt")

In [4]:

titan

Out[4]:

row.names | pclass | survived | name | age | embarked | home.dest | room | ticket | boat | sex
1 | 1st | 1 | Allen, Miss Elisabeth Walton | 29.0000 | Southampton | St Louis, MO | B-5 | 24160 L221 | 2 | female
2 | 1st | 0 | Allison, Miss Helen Loraine | 2.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female
3 | 1st | 0 | Allison, Mr Hudson Joshua Creighton | 30.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | (135) | male
4 | 1st | 0 | Allison, Mrs Hudson J.C. (Bessie Waldo Daniels) | 25.0000 | Southampton | Montreal, PQ / Chesterville, ON | C26 | NaN | NaN | female
5 | 1st | 1 | Allison, Master Hudson Trevor | 0.9167 | Southampton | Montreal, PQ / Chesterville, ON | C22 | NaN | 11 | male
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ...
1309 | 3rd | 0 | Zakarian, Mr Artun | NaN | NaN | NaN | NaN | NaN | NaN | male
1310 | 3rd | 0 | Zakarian, Mr Maprieder | NaN | NaN | NaN | NaN | NaN | NaN | male
1311 | 3rd | 0 | Zenn, Mr Philip | NaN | NaN | NaN | NaN | NaN | NaN | male
1312 | 3rd | 0 | Zievens, Rene | NaN | NaN | NaN | NaN | NaN | NaN | female
1313 | 3rd | 0 | Zimmerman, Leo | NaN | NaN | NaN | NaN | NaN | NaN | male

1313 rows × 11 columns

In [5]:

titan.describe()

Out[5]:

      | row.names   | survived    | age
count | 1313.000000 | 1313.000000 | 633.000000
mean  | 657.000000  | 0.341965    | 31.194181
std   | 379.174762  | 0.474549    | 14.747525
min   | 1.000000    | 0.000000    | 0.166700
25%   | 329.000000  | 0.000000    | 21.000000
50%   | 657.000000  | 0.000000    | 30.000000
75%   | 985.000000  | 1.000000    | 41.000000
max   | 1313.000000 | 1.000000    | 71.000000

In [6]:

# 2. Basic data processing
#2.1 determination of characteristic value and target value
x = titan[["pclass", "age", "sex"]]
y = titan["survived"]

In [7]:

x.head()

Out[7]:

  | pclass | age     | sex
0 | 1st    | 29.0000 | female
1 | 1st    | 2.0000  | female
2 | 1st    | 30.0000 | male
3 | 1st    | 25.0000 | female
4 | 1st    | 0.9167  | male

In [8]:

y.head()

Out[8]:

0    1
1    0
2    0
3    0
4    1
Name: survived, dtype: int64

In [9]:

#2.2 missing value handling
x = x.copy()  # work on a copy to avoid pandas' SettingWithCopy warning
x["age"] = x["age"].fillna(value=titan["age"].mean())

In [10]:

x.head()

Out[10]:

  | pclass | age     | sex
0 | 1st    | 29.0000 | female
1 | 1st    | 2.0000  | female
2 | 1st    | 30.0000 | male
3 | 1st    | 25.0000 | female
4 | 1st    | 0.9167  | male

In [11]:

#2.3 data set division
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22, test_size=0.2)

In [12]:

# 3. Feature Engineering (dictionary feature extraction)

In [13]:

x_train.head()

Out[13]:

     | pclass | age       | sex
649  | 3rd    | 45.000000 | female
1078 | 3rd    | 31.194181 | male
59   | 1st    | 31.194181 | female
201  | 1st    | 18.000000 | male
61   | 1st    | 31.194181 | female

In [14]:

x_train = x_train.to_dict(orient="records")
x_test = x_test.to_dict(orient="records")

In [15]:

x_train

Out[15]:

[{'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 13.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 10.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 43.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 51.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 4.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 12.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 69.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 14.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '1st', 'age': 53.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 57.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 22.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 6.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 41.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 34.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 39.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 39.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 67.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 11.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 59.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 44.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 65.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 37.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 52.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 48.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 2.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 27.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 0.9167, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 14.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 61.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.1667, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 20.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 62.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 70.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 51.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 33.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 59.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 3.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 28.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 15.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 40.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 8.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 63.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 43.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 1.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 38.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 4.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 57.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 40.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 47.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 37.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 5.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 21.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 41.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 28.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 35.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 50.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 50.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 52.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 11.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 40.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 49.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 9.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 35.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 32.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 32.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 45.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 26.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 24.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 64.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 46.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 29.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 34.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 0.8333, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 60.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 44.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 71.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 13.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 58.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 4.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 33.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 33.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 48.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 28.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 71.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 47.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 21.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 24.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 18.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 54.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 17.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 6.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 45.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 36.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 55.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 26.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '1st', 'age': 65.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 27.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 22.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '2nd', 'age': 7.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 30.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 39.0, 'sex': 'female'},
 {'pclass': '1st', 'age': 19.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 19.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 20.0, 'sex': 'male'},
 {'pclass': '1st', 'age': 56.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 38.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'female'},
 {'pclass': '2nd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 42.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 23.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 25.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '2nd', 'age': 16.0, 'sex': 'male'},
 {'pclass': '2nd', 'age': 42.0, 'sex': 'male'},
 {'pclass': '3rd', 'age': 2.0, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'female'},
 {'pclass': '3rd', 'age': 31.19418104265403, 'sex': 'male'},
 {'pclass': '1st', 'age': 36.0, 'sex': 'male'},
 ...]

In [16]:

transfer = DictVectorizer()

x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
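Note that the vectorizer is fitted only on the training records and the test records are only transformed, so both splits share exactly the same one-hot columns. A minimal sketch added here for illustration (the toy records are hypothetical, in the same format as the Titanic dictionaries above; not part of the original notebook):

from sklearn.feature_extraction import DictVectorizer

toy_train = [{'pclass': '1st', 'age': 29.0, 'sex': 'female'},
             {'pclass': '3rd', 'age': 31.19, 'sex': 'male'}]
toy_test = [{'pclass': '2nd', 'age': 40.0, 'sex': 'male'}]

dv = DictVectorizer(sparse=False)
a = dv.fit_transform(toy_train)   # learn the feature mapping on the training split
b = dv.transform(toy_test)        # reuse the same mapping; do not refit on the test split

print(dv.feature_names_)          # identical column order for both splits
print(a.shape, b.shape)           # (2, 5) and (1, 5) for these toy records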

In [21]:

# 4. XGBoost model training
# 4.1 Preliminary model training
from xgboost import XGBClassifier

xg = XGBClassifier()

xg.fit(x_train, y_train)

Out[21]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='binary:logistic', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [22]:

xg.score(x_test, y_test)

Out[22]:

0.7832699619771863

In [23]:

# 4.2 Tuning max_depth

depth_range  = range(10)
score = []

for i in depth_range:
    xg = XGBClassifier(eta=1, gamma=0, max_depth=i)
    xg.fit(x_train, y_train)
    
    s = xg.score(x_test, y_test)
    
    print(s)
    score.append(s)

0.6311787072243346
0.7908745247148289
0.7870722433460076
0.7832699619771863
0.7870722433460076
0.7908745247148289
0.7908745247148289
0.7946768060836502
0.7908745247148289
0.7946768060836502

In [25]:

# 4.3 Visualization of tuning results
import matplotlib.pyplot as plt

plt.plot(depth_range, score)

plt.show()
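For completeness, the best depth on this split can also be read off numerically rather than from the plot. A small sketch added here for illustration, using the depth_range and score lists from the loop above:

import numpy as np

# index of the highest accuracy observed in the sweep above
best_depth = list(depth_range)[int(np.argmax(score))]
print("Best max_depth on this test split: {} (accuracy {:.4f})".format(best_depth, max(score)))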

5.4 otto case introduction -- Otto Group Product Classification Challenge [xgboost implementation]

1 background introduction

The Otto Group is one of the largest e-commerce companies in the world, with subsidiaries in more than 20 countries. The company sells millions of products worldwide every day, so classifying these products consistently is very important for analyzing their performance.

In practice, however, the staff found that many identical products had been classified differently. This case asks you to classify the Otto Group's products correctly, with the highest classification accuracy possible.

Link: https://www.kaggle.com/c/otto-group-product-classification-challenge/overview

2 thinking analysis

  • 1. Data acquisition

  • 2. Basic data processing

    • 2.1 take a subset of the data
    • 2.2 convert the label values into numbers
    • 2.3 split the data (using StratifiedShuffleSplit)
    • 2.4 data standardization
    • 2.5 PCA dimensionality reduction
  • 3. Model training

    • 3.1 basic model training
    • 3.2 model tuning
      • 3.2.1 Tuning parameters:
        • n_estimators,
        • max_depth,
        • min_child_weight,
        • subsample,
        • colsample_bytree,
        • eta
      • 3.2.2 determine the final optimal parameters

3 code implementation

  • 2. Basic data processing

    • 2.1 take a subset of the data

    • 2.2 convert the label values into numbers

    • 2.3 split the data (using StratifiedShuffleSplit)

      # Use StratifiedShuffleSplit to split the dataset
      from sklearn.model_selection import StratifiedShuffleSplit
      
      sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
      for train_index, test_index in sss.split(X_resampled.values, y_resampled):
          print(len(train_index))
          print(len(test_index))
      
          x_train = X_resampled.values[train_index]
          x_val = X_resampled.values[test_index]
      
          y_train = y_resampled[train_index]
          y_val = y_resampled[test_index]
      
      # Graphical visualization of segmented data
      import seaborn as sns
      
      sns.countplot(y_val)
      
      plt.show()
      
    • 2.4 data standardization

      from sklearn.preprocessing import StandardScaler
      
      scaler = StandardScaler()
      scaler.fit(x_train)
      
      x_train_scaled = scaler.transform(x_train)
      x_val_scaled = scaler.transform(x_val)
      
    • 2.5 data pca dimensionality reduction

      print(x_train_scaled.shape)
      # (13888, 93)
      
      from sklearn.decomposition import PCA
      
      pca = PCA(n_components=0.9)
      x_train_pca = pca.fit_transform(x_train_scaled)
      x_val_pca = pca.transform(x_val_scaled)
      
      print(x_train_pca.shape, x_val_pca.shape)
      (13888, 65) (3473, 65)
      

      From the output above, it can be seen that only 65 principal components are needed to retain 90% of the information (variance) in the original features

      # Dimensionality reduction data visualization
      plt.plot(np.cumsum(pca.explained_variance_ratio_))
      
      plt.xlabel("Number of elements")
      plt.ylabel("Percentage of expressible information")
      
      plt.show()

  • 3. Model training

    • 3.1 basic model training

      from xgboost import XGBClassifier
      
      xgb = XGBClassifier()
      xgb.fit(x_train_pca, y_train)
      
      # Output predicted probabilities (rather than class labels), which are needed to compute the logloss value
      y_pre_proba = xgb.predict_proba(x_val_pca)
      
      # logloss for model evaluation
      from sklearn.metrics import log_loss
      log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)
      
      xgb.get_params
      
  • 3.2 model tuning

    • 3.2.1 Tuning parameters:

      • n_estimators,

        scores_ne = []
        n_estimators = [100,200,400,450,500,550,600,700]
        
        for nes in n_estimators:
            print("n_estimators:", nes)
            xgb = XGBClassifier(max_depth=3, 
                                learning_rate=0.1, 
                                n_estimators=nes, 
                                objective="multi:softprob", 
                                n_jobs=-1, 
                                nthread=4, 
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                seed=42)
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_ne.append(score)
            print("Of test data logloss Value is:{}".format(score))
        
        # Data change visualization
        plt.plot(n_estimators, scores_ne, "o-")
        
        plt.ylabel("log_loss")
        plt.xlabel("n_estimators")
        print("n_estimators The optimal value is:{}".format(n_estimators[np.argmin(scores_ne)]))
        

      • max_depth,

        scores_md = []
        max_depths = [1,3,5,6,7]
        
        for md in max_depths:  # modify
            xgb = XGBClassifier(max_depth=md, # modify
                                learning_rate=0.1, 
                                n_estimators=n_estimators[np.argmin(scores_ne)],   # modify 
                                objective="multi:softprob", 
                                n_jobs=-1, 
                                nthread=4, 
                                min_child_weight=1, 
                                subsample=1, 
                                colsample_bytree=1,
                                seed=42)
        
            xgb.fit(x_train_pca, y_train)
            y_pre = xgb.predict_proba(x_val_pca)
            score = log_loss(y_val, y_pre)
            scores_md.append(score)  # modify
            print("Of test data logloss Value is:{}".format(log_loss(y_val, y_pre)))
        
        # Data change visualization
        plt.plot(max_depths, scores_md, "o-")  # modify
        
        plt.ylabel("log_loss")
        plt.xlabel("max_depths")  # modify
        print("max_depths The optimal value is:{}".format(max_depths[np.argmin(scores_md)]))  # modify
        
      • min_child_weight,

        • tune it in the same way as above
      • subsample,

      • colsample_bytree,

      • eta

    • 3.2.2 determine the final optimal parameters

      xgb = XGBClassifier(learning_rate =0.1, 
                          n_estimators=550, 
                          max_depth=3, 
                          min_child_weight=3, 
                          subsample=0.7, 
                          colsample_bytree=0.7, 
                          nthread=4, 
                          seed=42, 
                          objective='multi:softprob')
      xgb.fit(x_train_scaled, y_train)
      
      y_pre = xgb.predict_proba(x_val_scaled)
      
      print("Of test data logloss Value is : {}".format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))
      

In [1]:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

Data acquisition

In [2]:

data = pd.read_csv("./data/otto/train.csv")

In [3]:

data.head()

Out[3]:

[data.head() output: the first five rows, with columns id, feat_1 … feat_93 and target; the feature values are small non-negative integers and all five rows have target Class_1]

5 rows × 95 columns

In [4]:

data.shape

Out[4]:

(61878, 95)

In [5]:

data.describe()

Out[5]:

[data.describe() output: count, mean, std, min, 25%, 50%, 75% and max for id and feat_1 … feat_93; every column has a count of 61878, and most features have a median of 0 with small positive means]

8 rows × 94 columns

In [6]:

#Graphical visualization, viewing data distribution
import seaborn as sns

sns.countplot(data.target)

plt.show()

As can be seen from the figure above, the class distribution is imbalanced, so it needs to be dealt with in the subsequent processing
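To make the imbalance concrete, the class counts can also be inspected numerically. A small sketch added for illustration (not in the original notebook), using the DataFrame loaded above:

# Quantify the class imbalance numerically
counts = data["target"].value_counts()
print(counts)                   # absolute number of samples per class
print(counts / counts.sum())    # relative class frequencies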

Basic data processing

The data has been desensitized and no special processing is required

Intercept some data

In [7]:

new1_data = data[:10000]
new1_data.shape

Out[7]:

(10000, 95)

In [8]:

#Graphical visualization, viewing data distribution
import seaborn as sns

sns.countplot(new1_data.target)

plt.show()

Simply taking the first 10,000 rows in this way is not feasible (it distorts the class distribution), so random undersampling is used instead to obtain a balanced sample of the data

In [9]:

#Obtain the data by random undersampling
#First, separate the feature values and the label values

y = data["target"]
x = data.drop(["id", "target"], axis=1)

In [10]:

x.head()

Out[10]:

[x.head() output: the first five rows of the feature matrix with columns feat_1 … feat_93, mostly zeros with a few small counts]

5 rows × 93 columns

In [11]:

y.head()

Out[11]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [12]:

#Obtain balanced data by undersampling
from imblearn.under_sampling import RandomUnderSampler

rus = RandomUnderSampler(random_state=0)

X_resampled, y_resampled = rus.fit_resample(x, y)

In [13]:

x.shape, y.shape

Out[13]:

((61878, 93), (61878,))

In [14]:

X_resampled.shape, y_resampled.shape

Out[14]:

((17361, 93), (17361,))

In [15]:

#Graphical visualization, viewing data distribution
import seaborn as sns

sns.countplot(y_resampled)

plt.show()

Convert label values to numbers

In [16]:

y_resampled.head()

Out[16]:

0    Class_1
1    Class_1
2    Class_1
3    Class_1
4    Class_1
Name: target, dtype: object

In [17]:

from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
y_resampled = le.fit_transform(y_resampled)
 

In [18]:

y_resampled

Out[18]:

array([0, 0, 0, ..., 8, 8, 8])
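The encoder keeps the mapping between the original class names and the integer codes, which is useful when converting predictions back to labels later. A short usage example, added here for illustration:

print(le.classes_)                    # the nine labels Class_1 ... Class_9, in code order 0..8
print(le.inverse_transform([0, 8]))   # map integer codes back to the original class names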

Split data

In [19]:

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(X_resampled, y_resampled, test_size=0.2)

In [20]:

x_train.shape, y_train.shape

Out[20]:

((13888, 93), (13888,))

In [21]:

x_test.shape, y_test.shape

Out[21]:

((3473, 93), (3473,))

In [22]:

# 1. Data acquisition

# 2. Basic data processing

# 2.1 take a subset of the data
# 2.2 convert the label values into numbers
# 2.3 split the data (using StratifiedShuffleSplit)
# 2.4 data standardization
# 2.5 PCA dimensionality reduction

# 3. model training
# 3.1 basic model training
# 3.2 model tuning
        # 3.2.1 tuning parameters:
            # n_estimators,
            # max_depth,
            # min_child_weight,
            # subsample,
            # colsample_bytree,
            # eta
        # 3.2.2 determine the final optimal parameters
    

In [23]:

#Graphic visualization
import seaborn as sns

sns.countplot(y_test)
plt.show()

In [28]:

#Split the data with StratifiedShuffleSplit

from sklearn.model_selection import StratifiedShuffleSplit

sss = StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)

for train_index, test_index in sss.split(X_resampled.values, y_resampled):
    print(len(train_index))
    print(len(test_index))
    
    x_train = X_resampled.values[train_index]
    x_val = X_resampled.values[test_index]
    
    y_train = y_resampled[train_index]
    y_val = y_resampled[test_index]

13888
3473

In [29]:

print(x_train.shape, x_val.shape)

(13888, 93) (3473, 93)

In [30]:

#Graphic visualization
import seaborn as sns

sns.countplot(y_val)
plt.show()

Data standardization

In [31]:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(x_train)

x_train_scaled = scaler.transform(x_train)
x_val_scaled = scaler.transform(x_val)

Data PCA dimensionality reduction

In [33]:

x_train_scaled.shape

Out[33]:

(13888, 93)

In [34]:

from sklearn.decomposition import PCA

pca = PCA(n_components=0.9)

x_train_pca = pca.fit_transform(x_train_scaled)
x_val_pca = pca.transform(x_val_scaled)

In [35]:

print(x_train_pca.shape, x_val_pca.shape)

(13888, 65) (3473, 65)

In [37]:

#Visualize how much information is retained as the number of components increases
plt.plot(np.cumsum(pca.explained_variance_ratio_))

plt.xlabel("number of elements")
plt.ylabel("percentage of expressed information")

plt.show()

model training

Basic model training

In [38]:

from xgboost import XGBClassifier

xgb = XGBClassifier()
xgb.fit(x_train_pca, y_train)

Out[38]:

XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)

In [39]:

#Output predicted probabilities (not class labels), which are required for the logloss evaluation
y_pre_proba = xgb.predict_proba(x_val_pca)

In [40]:

y_pre_proba

Out[40]:

array([[0.4893983 , 0.00375719, 0.00225278, ..., 0.06179977, 0.17131925,
        0.03980364],
       [0.14336601, 0.01110009, 0.01018962, ..., 0.00691424, 0.02062171,
        0.7525783 ],
       [0.00834821, 0.14602502, 0.65013766, ..., 0.01385602, 0.00602207,
        0.00240582],
       ...,
       [0.09568001, 0.00293341, 0.00582061, ..., 0.1031019 , 0.7587154 ,
        0.02730099],
       [0.40236628, 0.12317444, 0.03567632, ..., 0.18818544, 0.13276173,
        0.07105519],
       [0.00473167, 0.01536749, 0.02546864, ..., 0.00882399, 0.88531935,
        0.00384397]], dtype=float32)

In [42]:

#logloss assessment
from sklearn.metrics import log_loss

log_loss(y_val, y_pre_proba, eps=1e-15, normalize=True)

Out[42]:

0.7845457684689274

In [43]:

xgb.get_params

Out[43]:

<bound method XGBModel.get_params of XGBClassifier(base_score=0.5, booster='gbtree', colsample_bylevel=1,
              colsample_bynode=1, colsample_bytree=1, gamma=0,
              learning_rate=0.1, max_delta_step=0, max_depth=3,
              min_child_weight=1, missing=None, n_estimators=100, n_jobs=1,
              nthread=None, objective='multi:softprob', random_state=0,
              reg_alpha=0, reg_lambda=1, scale_pos_weight=1, seed=None,
              silent=None, subsample=1, verbosity=1)>

Model tuning

Determine the optimal n_estimators

In [44]:

scores_ne = []
n_estimators = [100, 200, 300, 400, 500, 550, 600, 700]

In [49]:

for nes in n_estimators:
    print("n_estimators:", nes)
    xgb = XGBClassifier(max_depth=3,
                        learning_rate=0.1, 
                        n_estimators=nes, 
                        objective="multi:softprob", 
                        n_jobs=-1, 
                        nthread=4, 
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_ne.append(score)
    print("The logloss value of this test is: {}".format(score))

n_estimators: 100
The logloss value of this test is: 0.7845457684689274
n_estimators: 200
The logloss value of this test is: 0.7163659085830947
n_estimators: 300
The logloss value of this test is: 0.6933389946023942
n_estimators: 400
The logloss value of this test is: 0.68119252278615
n_estimators: 500
The logloss value of this test is: 0.67700775120196
n_estimators: 550
The logloss value of this test is: 0.6756911007299885
n_estimators: 600
The logloss value of this test is: 0.6757532660164814
n_estimators: 700
The logloss value of this test is: 0.6778721089881976

In [50]:

#Graphically display the corresponding logloss value
plt.plot(n_estimators, scores_ne, "o-")

plt.xlabel("n_estimators")
plt.ylabel("log_loss")
plt.show()

print("the optimal n_estimators value is: {}". format(n_estimators[np.argmin(scores_ne)]))

Optimal n_estimators Value is:550

Determine the optimal max_depth

In [63]:

scores_md = []
max_depths = [1,3,5,6,7]

In [64]:

for md in max_depths:
    print("max_depth:", md)
    xgb = XGBClassifier(max_depth=md,
                        learning_rate=0.1, 
                        n_estimators=n_estimators[np.argmin(scores_ne)], 
                        objective="multi:softprob", 
                        n_jobs=-1, 
                        nthread=4, 
                        min_child_weight=1,
                        subsample=1,
                        colsample_bytree=1,
                        seed=42)
    
    xgb.fit(x_train_pca, y_train)
    y_pre = xgb.predict_proba(x_val_pca)
    score = log_loss(y_val, y_pre)
    scores_md.append(score)
    print("The logloss value of this test is: {}".format(score))

max_depth: 1
The logloss value of this test is: 0.8186777106711784
max_depth: 3
The logloss value of this test is: 0.6756911007299885
max_depth: 5
The logloss value of this test is: 0.730323661087053
max_depth: 6
The logloss value of this test is: 0.7693314501840949
max_depth: 7
The logloss value of this test is: 0.7889236364892144

In [67]:

#Graphically display the corresponding logloss value
plt.plot(max_depths, scores_md, "o-")

plt.xlabel("max_depths")
plt.ylabel("log_loss")
plt.show()

print("the optimal max_depths value is: {}". format(max_depths[np.argmin(scores_md)]))

Optimal max_depths Value is:3

Tune the remaining parameters in the same way as above (a sketch of a reusable sweep follows below):

min_child_weight,

subsample,

colsample_bytree,

eta
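These sweeps all follow the same loop pattern as the n_estimators and max_depth searches above, so they can be wrapped in one helper. This is a minimal sketch under the assumptions that x_train_pca, x_val_pca, y_train, y_val and the optima found above (550 trees, depth 3) are still in scope; the helper name sweep_param and the candidate value lists are illustrative, not from the original notebook.

import numpy as np
from xgboost import XGBClassifier
from sklearn.metrics import log_loss

def sweep_param(name, values, fixed_params):
    # Refit the model for each candidate value of one parameter and
    # return the value with the lowest validation logloss.
    scores = []
    for v in values:
        params = dict(fixed_params)
        params[name] = v
        model = XGBClassifier(**params)
        model.fit(x_train_pca, y_train)
        scores.append(log_loss(y_val, model.predict_proba(x_val_pca)))
        print("{} = {} -> logloss {:.4f}".format(name, v, scores[-1]))
    return values[int(np.argmin(scores))]

# settings carried over from the sweeps above (550 trees, depth 3)
base_params = dict(n_estimators=550, max_depth=3, learning_rate=0.1,
                   objective="multi:softprob", n_jobs=-1, seed=42)

best_mcw = sweep_param("min_child_weight", [1, 3, 5, 7], base_params)
# subsample, colsample_bytree and learning_rate (eta) can be swept the same way.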

In [69]:

xgb = XGBClassifier(learning_rate =0.1, 
                    n_estimators=550, 
                    max_depth=3, 
                    min_child_weight=3, 
                    subsample=0.7, 
                    colsample_bytree=0.7, 
                    nthread=4, 
                    seed=42, 
                    objective='multi:softprob')

xgb.fit(x_train_scaled, y_train)

y_pre = xgb.predict_proba(x_val_scaled)

print("the log_loss value of the test data is: {}". format(log_loss(y_val, y_pre, eps=1e-15, normalize=True)))

Of test data log_loss Value is : 0.5944022517380477
