Temporal Action Localization for Table Tennis with PaddlePaddle: An Open-Source Solution

Project address

https://aistudio.baidu.com/aistudio/projectdetail/3389378?contributionType=1

You can fork the project and run it with one click.

Introduction to the competition task

In many large-scale video analysis scenarios, quickly locating and identifying human actions in lengthy, untrimmed videos has become a topic of great interest. Current human action detection solutions struggle on large-scale video sets, and efficiently processing large-scale video data remains a challenging task in computer vision. The core problem has two parts: first, the complexity of action recognition algorithms is still high; second, there is a lack of methods that produce fewer video proposals (focusing more on the proposals of the short-term actions themselves).

The video action proposals referred to here are candidate video clips that contain specific actions. To suit large-scale video analysis tasks, temporal action proposals should satisfy the following two requirements as far as possible:
(1) higher processing efficiency, for example, a mechanism that makes the encoding and scoring of temporal video segments more efficient;
(2) stronger discriminative performance, for example, accurately locating the time interval in which an action occurs.

This competition aims to inspire more developers and researchers to pay attention to and participate in research on video action localization, and to create action localization models with better performance.

Data set introduction

The dataset for this competition contains feature data extracted from standard single-camera HD broadcast footage of international table tennis competitions (World Cup, World Championships, Asian Championships, and Olympic Games) and domestic competitions (National Games and the Table Tennis Super League) from the 2019-2021 seasons, comprising 912 video feature files. Each video is 0-6 minutes long, and the feature dimension is 2048, saved in pkl format. In the feature data, we annotated the swing actions of the athletes facing the camera during rallies; a single action typically lasts 0-2 seconds. The training data consists of 729 annotated videos, the A-test data of 91 videos, and the B-test data of 92 videos. The training labels are given in json format.
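For reference, the training label file has roughly the following structure. The field names are those read by the preprocessing script below; the values here are made up for illustration:

{
    "fps": 25,
    "gts": [
        {
            "url": "xxx.mp4",
            "total_frames": 7500,
            "actions": [
                {"start_id": 10.2, "end_id": 11.5, "label_ids": [3]}
            ]
        }
    ]
}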

Data set preprocessing

This solution adopts the BMN model from PaddleVideo. BMN, developed by Baidu, was the winning solution of the 2019 ActivityNet challenge and provides an efficient approach to proposal generation in the video action localization problem; it was first open-sourced in PaddlePaddle. The model introduces a boundary-matching (BM) mechanism to evaluate proposal confidence: all possible proposals are arranged into a two-dimensional BM confidence map according to the start position and length of each proposal, and the value at each point in the map is the confidence score of the corresponding proposal. The network consists of three modules: the base module serves as the backbone and processes the input feature sequence, the TEM module predicts the probability that each temporal position is the start or end of an action, and the PEM module generates the BM confidence map.
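To make the BM confidence map concrete, here is a minimal NumPy sketch of how a (dscale, tscale) map can be read out as scored proposals. The values tscale=100 and dscale=100 match the config used later; the variable names are illustrative, not PaddleVideo's API:

import numpy as np

tscale, dscale = 100, 100                  # temporal positions / candidate durations
bm_map = np.random.rand(dscale, tscale)    # stand-in for the model's PEM output

proposals = []
for d in range(dscale):        # row index = proposal duration (in temporal units)
    for t in range(tscale):    # column index = proposal start position
        if t + d + 1 <= tscale:            # discard proposals running past the clip
            proposals.append((t, t + d + 1, bm_map[d, t]))

proposals.sort(key=lambda p: -p[2])        # highest-confidence proposals first
print(proposals[0])                        # (start, end, score); NMS follows in practice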

The competition data contains 912 video features extracted with ppTSM, saved in pkl format, with the file name corresponding to the video name. After loading a pkl file, a single video's features take the form of a (num_of_frames, 2048) array. Since num_of_frames is not fixed and is relatively large, the pkl files cannot be used directly for training. Moreover, because each table tennis action is very short, the data is segmented here so that the model can identify actions better.
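A quick way to inspect one feature file once the data is decompressed below (the file name here is a placeholder; the key 'image_feature' matches the scripts that follow):

import pickle

feat = pickle.load(open('/home/aistudio/data/Features_competition_train/xxx.pkl', 'rb'))
print(feat['image_feature'].shape)  # (num_of_frames, 2048); num_of_frames varies per video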

  1. Unzip the dataset first
    Execute the following commands to decompress the dataset. After decompressing, delete the archives to keep the project space under 100G; otherwise, the project will be terminated.
%cd /home/aistudio/data/
!tar xf data122998/Features_competition_train.tar.gz
!tar xf data123004/Features_competition_test_A.tar.gz
!cp data122998/label_cls14_train.json .
!rm -rf data12*
/home/aistudio/data
  2. After decompressing the data, split the label file. Execute the following script to split the annotation file.
import json
import random

import numpy as np

random.seed(0)  # seeded in case you later want to randomly carve out a validation subset
source_path = "/home/aistudio/data/label_cls14_train.json"

annos = json.load(open(source_path))
fps = annos['fps']
annos = annos['gts']
new_annos = {}
max_frames = 0

for anno in annos:
    # Track the longest video (in frames) for reference.
    if anno['total_frames'] > max_frames:
        max_frames = anno['total_frames']
    # Split each video into 90 clips of 100 frames (4 seconds at 25 fps).
    for i in range(9000 // 100):
        subset = 'training'
        clip_start = i * 4          # clip boundaries in seconds
        clip_end = (i + 1) * 4
        video_name = anno['url'].split('.')[0] + f"_{i}"
        new_annos[video_name] = {
            'duration_second': 100 / fps,
            'subset': subset,
            'duration_frame': 100,
            'annotations': [],
            'feature_frame': -1
        }
        actions = anno['actions']
        for act in actions:
            start_id = act['start_id']   # action boundaries in seconds
            end_id = act['end_id']
            new_start_id = -1
            new_end_id = -1
            if start_id > clip_start and end_id < clip_end:
                # Action lies entirely inside the clip.
                new_start_id = start_id - clip_start
                new_end_id = end_id - clip_start
            elif start_id < clip_start < end_id < clip_end:
                # Action starts before the clip; truncate its head.
                new_start_id = 0
                new_end_id = end_id - clip_start
            elif clip_start < start_id < clip_end < end_id:
                # Action runs past the clip; truncate its tail.
                new_start_id = start_id - clip_start
                new_end_id = 4
            elif start_id < clip_start < clip_end < end_id:
                # Action spans the whole clip.
                new_start_id = 0
                new_end_id = 4
            else:
                continue

            new_annos[video_name]['annotations'].append({
                'segment': [round(new_start_id, 2), round(new_end_id, 2)],
                'label': str(act['label_ids'][0])
            })
        # Drop clips that contain no action at all.
        if len(new_annos[video_name]['annotations']) == 0:
            new_annos.pop(video_name)


json.dump(new_annos, open('new_label_cls14_train.json', 'w+'))
print(len(list(new_annos.keys())))
12597

After execution, a new annotation file new_label_cls14_train.json is generated in the data directory.
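Each entry in the new annotation file describes one 4-second clip, built exactly as in the script above (the key and values here are illustrative):

{
    "xxx_12": {
        "duration_second": 4.0,
        "subset": "training",
        "duration_frame": 100,
        "feature_frame": -1,
        "annotations": [
            {"segment": [1.25, 2.8], "label": "3"}
        ]
    }
}

Next, let's split the training set and test set data.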

  3. Execute the following script to split the training set features.
import os
import os.path as osp
import glob
import pickle
import paddle

import numpy as np

file_list = glob.glob("/home/aistudio/data/Features_competition_train/*.pkl")

max_frames = 9000  # 6 minutes at 25 fps

npy_path = "/home/aistudio/data/Features_competition_train/npy/"
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    # (num_of_frames, 2048) feature matrix.
    tensor = paddle.to_tensor(video_feat['image_feature'])
    # Zero-pad every video to max_frames along the time axis.
    pad_num = max_frames - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    tensor = paddle.transpose(tensor, [1, 0])   # (2048, T)
    tensor = paddle.unsqueeze(tensor, axis=0)   # (1, 2048, T)
    tensor = pad1d(tensor)                      # (1, 2048, 9000)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])   # (9000, 2048)

    # Split into 90 clips of 100 frames each and save them separately.
    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())


W0107 21:28:29.299958   141 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 21:28:29.305644   141 device_context.cc:465] device: 0, cuDNN Version: 7.6.
!rm /home/aistudio/data/Features_competition_train/*.pkl

After execution, the numpy data for training is generated in the data/Features_competition_train/npy directory.

  4. Execute the following script to split the test set A features in the same way.

import os
import os.path as osp
import glob
import pickle
import paddle

import numpy as np

file_list = glob.glob("/home/aistudio/data/Features_competition_test_A/*.pkl")

max_frames = 9000

npy_path = "/home/aistudio/data/Features_competition_test_A/npy/"
if not osp.exists(npy_path):
    os.makedirs(npy_path)

for f in file_list:
    video_feat = pickle.load(open(f, 'rb'))
    tensor = paddle.to_tensor(video_feat['image_feature'])
    # Same padding and splitting as for the training features.
    pad_num = max_frames - tensor.shape[0]
    pad1d = paddle.nn.Pad1D([0, pad_num])
    tensor = paddle.transpose(tensor, [1, 0])
    tensor = paddle.unsqueeze(tensor, axis=0)
    tensor = pad1d(tensor)
    tensor = paddle.squeeze(tensor, axis=0)
    tensor = paddle.transpose(tensor, [1, 0])

    sps = paddle.split(tensor, num_or_sections=90, axis=0)
    for i, s in enumerate(sps):
        file_name = osp.join(npy_path, f.split('/')[-1].split('.')[0] + f"_{i}.npy")
        np.save(file_name, s.detach().numpy())
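As a sanity check, each generated npy file should hold one (100, 2048) clip; a quick way to verify (the file name is a placeholder):

import numpy as np

clip = np.load('/home/aistudio/data/Features_competition_test_A/npy/xxx_0.npy')
print(clip.shape)  # expected: (100, 2048)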

Training model

After the dataset is segmented, you can start training the model with the following commands. First, install the dependencies of PaddleVideo.

%cd /home/aistudio/PaddleVideo/
!pip install -r requirements.txt

Start training the model.

%cd /home/aistudio/PaddleVideo/
!python main.py -c configs/localization/bmn.yaml
/home/aistudio/PaddleVideo
[01/07 21:42:50] DALI is not installed, you can improve performance if use DALI
[01/07 21:42:50] DATASET :
[01/07 21:42:50]     batch_size : 16
[01/07 21:42:50]     num_workers : 8
[01/07 21:42:50]     test :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : validation
[01/07 21:42:50]         test_mode : True
[01/07 21:42:50]     test_batch_size : 1
[01/07 21:42:50]     train :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : train
[01/07 21:42:50]     valid :
[01/07 21:42:50]         file_path : /home/aistudio/data/new_label_cls14_train.json
[01/07 21:42:50]         format : BMNDataset
[01/07 21:42:50]         subset : validation
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] INFERENCE :
[01/07 21:42:50]     dscale : 100
[01/07 21:42:50]     feat_dim : 2048
[01/07 21:42:50]     name : BMN_Inference_helper
[01/07 21:42:50]     result_path : data/bmn/BMN_INFERENCE_results
[01/07 21:42:50]     tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] METRIC :
[01/07 21:42:50]     dscale : 100
[01/07 21:42:50]     file_path : data/bmn_data/activitynet_1.3_annotations.json
[01/07 21:42:50]     ground_truth_filename : data/bmn_data/activity_net_1_3_new.json
[01/07 21:42:50]     name : BMNMetric
[01/07 21:42:50]     output_path : data/bmn/BMN_Test_output
[01/07 21:42:50]     result_path : data/bmn/BMN_Test_results
[01/07 21:42:50]     subset : validation
[01/07 21:42:50]     tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] MODEL :
[01/07 21:42:50]     backbone :
[01/07 21:42:50]         dscale : 100
[01/07 21:42:50]         feat_dim : 2048
[01/07 21:42:50]         name : BMN
[01/07 21:42:50]         num_sample : 32
[01/07 21:42:50]         num_sample_perbin : 3
[01/07 21:42:50]         prop_boundary_ratio : 0.5
[01/07 21:42:50]         tscale : 100
[01/07 21:42:50]     framework : BMNLocalizer
[01/07 21:42:50]     loss :
[01/07 21:42:50]         dscale : 100
[01/07 21:42:50]         name : BMNLoss
[01/07 21:42:50]         tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] OPTIMIZER :
[01/07 21:42:50]     learning_rate :
[01/07 21:42:50]         boundaries : [39000]
[01/07 21:42:50]         iter_step : True
[01/07 21:42:50]         name : CustomPiecewiseDecay
[01/07 21:42:50]         values : [0.001, 0.0001]
[01/07 21:42:50]     name : Adam
[01/07 21:42:50]     weight_decay :
[01/07 21:42:50]         name : L2
[01/07 21:42:50]         value : 0.0001
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] PIPELINE :
[01/07 21:42:50]     test :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]     train :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]     valid :
[01/07 21:42:50]         load_feat :
[01/07 21:42:50]             feat_path : /home/aistudio/data/Features_competition_train/npy
[01/07 21:42:50]             name : LoadFeat
[01/07 21:42:50]         transform :
[01/07 21:42:50]             GetMatchMap :
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50]             GetVideoLabel :
[01/07 21:42:50]                 dscale : 100
[01/07 21:42:50]                 tscale : 100
[01/07 21:42:50] ------------------------------------------------------------
[01/07 21:42:50] epochs : 100
[01/07 21:42:50] log_level : INFO
[01/07 21:42:50] model_name : BMN
[01/07 21:42:50] resume_from :
W0107 21:42:50.985046  3073 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 21:42:50.990319  3073 device_context.cc:465] device: 0, cuDNN Version: 7.6.
[01/07 21:42:57] train subset video numbers: 12597
[01/07 21:43:03] epoch:[  1/100] train step:0   loss: 2.58411 lr: 0.001000 batch_cost: 5.10899 sec, reader_cost: 2.53376 sec, ips: 3.13173 instance/sec.
[01/07 21:43:27] epoch:[  1/100] train step:10  loss: 2.23687 lr: 0.001000 batch_cost: 2.50506 sec, reader_cost: 0.00167 sec, ips: 6.38707 instance/sec.
[01/07 21:43:53] epoch:[  1/100] train step:20  loss: 2.30660 lr: 0.001000 batch_cost: 2.50880 sec, reader_cost: 0.00028 sec, ips: 6.37755 instance/sec.
[01/07 21:44:18] epoch:[  1/100] train step:30  loss: 2.01538 lr: 0.001000 batch_cost: 2.52706 sec, reader_cost: 0.00153 sec, ips: 6.33146 instance/sec.
[01/07 21:44:43] epoch:[  1/100] train step:40  loss: 2.03807 lr: 0.001000 batch_cost: 2.52628 sec, reader_cost: 0.00032 sec, ips: 6.33342 instance/sec.
[01/07 21:45:08] epoch:[  1/100] train step:50  loss: 1.40200 lr: 0.001000 batch_cost: 2.54893 sec, reader_cost: 0.00157 sec, ips: 6.27714 instance/sec.
^C
[01/07 21:45:13] main proc 3139 exit, kill process group 3073
[01/07 21:45:13] main proc 3138 exit, kill process group 3073
[01/07 21:45:13] main proc 3140 exit, kill process group 3073
[01/07 21:45:13] main proc 3141 exit, kill process group 3073
[01/07 21:45:13] main proc 3135 exit, kill process group 3073
[01/07 21:45:13] main proc 3142 exit, kill process group 3073
[01/07 21:45:13] main proc 3137 exit, kill process group 3073
[01/07 21:45:13] main proc 3136 exit, kill process group 3073

To keep the demonstration short, training is stopped after one epoch and the model is exported. In practice, you can train for more epochs to improve the model's accuracy.

Model export

Export the trained model for inference by executing the following script.

%cd /home/aistudio/PaddleVideo/
!python tools/export_model.py -c configs/localization/bmn.yaml -p output/BMN/BMN_epoch_00001.pdparams -o inference/BMN
/home/aistudio/PaddleVideo
Building model(BMN)...
W0107 23:10:26.288929  9431 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.1, Runtime API Version: 10.1
W0107 23:10:26.295006  9431 device_context.cc:465] device: 0, cuDNN Version: 7.6.
Loading params from (output/BMN/BMN_epoch_00001.pdparams)...
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/layers/utils.py:77: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
  return (isinstance(seq, collections.Sequence) and
model (BMN) has been already saved in (inference/BMN).

Inference prediction

Use the exported model for inference by executing the following command.

%cd /home/aistudio/PaddleVideo/
!python tools/predict.py --input_file /home/aistudio/data/Features_competition_test_A/npy \
 --config configs/localization/bmn.yaml \
 --model_file inference/BMN/BMN.pdmodel \
 --params_file inference/BMN/BMN.pdiparams \
 --use_gpu=True \
 --use_tensorrt=False

The json files output by the above program are the per-segment prediction results; they need to be merged into a single file. Execute the following script:

import os
import json
import glob

json_path = "/home/aistudio/data/Features_competition_test_A/npy"
json_files = glob.glob(os.path.join(json_path, '*_*.json'))

submit_dic = {"version": None,
              "results": {},
              "external_data": {}
              }
results = submit_dic['results']
for json_file in json_files:
    j = json.load(open(json_file, 'r'))
    old_video_name = list(j.keys())[0]
    video_name = list(j.keys())[0].split('/')[-1].split('.')[0]
    # Split off the trailing clip index; rsplit keeps any underscores
    # in the original video name intact.
    video_name, video_no = video_name.rsplit('_', 1)
    # Each clip is 4 seconds long, so the clip index gives the time offset.
    start_id = int(video_no) * 4
    if len(j[old_video_name]) == 0:
        continue
    for top in j[old_video_name]:
        # Shift the segment back into the original video's timeline.
        new_seg = {'score': round(top['score'], 2),
                   'segment': [round(top['segment'][0] + start_id, 2),
                               round(top['segment'][1] + start_id, 2)]}
        if video_name in results.keys():
            results[video_name].append(new_seg)
        else:
            results[video_name] = [new_seg]

json.dump(submit_dic, open('/home/aistudio/submission.json', 'w', encoding='utf-8'))

Finally, a submission.json file is generated in the user directory. Compress it, download it, and submit it.

%cd /home/aistudio/
!zip submission.zip submission.json
/home/aistudio
updating: submission.json (deflated 91%)

Training for only one epoch scored 38 points. The score of this solution is not high; it simply offers an idea and a pipeline, from data preprocessing onward, that runs end to end. You can try to propose a better data processing scheme and obtain better results.

Optimization ideas

  1. Increase the number of training epochs.
  2. Adjust the learning rate strategy, for example with warmup and cosine annealing.
  3. In my view, the most important part is data preprocessing. This solution simply cuts the video every 4 seconds, which is actually unreasonable: a single action may be split across two files. You can refer to the segmentation method in FootballAction to further optimize the training data; see the sketch after this list.
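As a starting point for idea 3, here is a sketch of overlapping windows (the window and stride values are assumptions to be tuned): with a 4-second window and a 2-second stride, any action shorter than 2 seconds falls entirely inside at least one window, so it is never split across two files.

def overlapping_windows(total_seconds, win=4.0, stride=2.0):
    # Yield (start, end) windows with 50% overlap so that a short action
    # cut at one window boundary still appears whole in a neighbouring window.
    t = 0.0
    while t < total_seconds:
        yield (t, min(t + win, total_seconds))
        t += stride

print(list(overlapping_windows(10)))
# [(0.0, 4.0), (2.0, 6.0), (4.0, 8.0), (6.0, 10.0), (8.0, 10.0)]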

Finally, I wish you all good results.

Welcome to follow my official account: AI Research Institute.
To get the latest competition baselines, reply with the competition name or website in the background, and I will do my best to provide you with a Baseline.
