[Excellent AI project on GitHub] Real-time human video matting at 4K 60 FPS

Project address:

https://github.com/PeterL1n/RobustVideoMatting

Article:

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!

Robust Video Matting (RVM)

This is the official GitHub repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is designed for robust human video matting. Unlike existing neural networks, which process each frame as an independent image, RVM uses a recurrent neural network so that it keeps a temporal memory while processing a video stream. RVM can perform real-time HD matting on any video: 4K at 76 FPS and HD at 104 FPS on an Nvidia GTX 1080 Ti. This research project comes from ByteDance.

Showreel

Watch the showreel video (YouTube, Bilibili) to see what the model can do.

All footage in the video is available for download and can be used to test the model: Google Drive

Demo

Webpage: watch your webcam feed being matted in the browser and visualize the recurrent states inside the model.
Colab: convert your own videos with our model.

Download

The MobileNetV3 model is recommended for most use cases. The ResNet50 model is much larger and gives only slightly better results. Our models are available for many frameworks. For details, see the inference documentation.

Framework	Download	Notes
PyTorch	rvm_mobilenetv3.pth rvm_resnet50.pth	Official PyTorch model weights. Doc
TorchHub	No manual download required.	The easiest way to use the model in your PyTorch project. Doc
TorchScript	rvm_mobilenetv3_fp32.torchscript rvm_mobilenetv3_fp16.torchscript rvm_resnet50_fp32.torchscript rvm_resnet50_fp16.torchscript	For inference on mobile devices, consider exporting an int8 quantized model yourself. Doc
ONNX	rvm_mobilenetv3_fp32.onnx rvm_mobilenetv3_fp16.onnx rvm_resnet50_fp32.onnx rvm_resnet50_fp16.onnx	Tested on the CPU and CUDA backends of ONNX Runtime. The provided models use opset 12. Doc, Exporter
TensorFlow	rvm_mobilenetv3_tf.zip rvm_resnet50_tf.zip	TensorFlow 2 SavedModel format. Doc
TensorFlow.js	rvm_mobilenetv3_tfjs_int8.zip	Runs the model in the web browser. Demo, Starter Code
CoreML	rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel rvm_mobilenetv3_1280x720_s0.375_int8.mlmodel rvm_mobilenetv3_1920x1080_s0.25_fp16.mlmodel rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel	CoreML can only export fixed resolutions; you can export other resolutions yourself. Supports iOS 13+. The s in the file name denotes the downsample ratio. Doc, Exporter
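
The CoreML file names in the table pack the input resolution, the downsample ratio and the precision into one string. As a rough illustration, the sketch below decodes them with a regular expression. The pattern is only inferred from the four names listed above, and describe_coreml_model is a hypothetical helper, not part of the repository.

```python
import re

# Pattern inferred from the four CoreML file names in the table; it is an
# assumption about the naming scheme, not a documented format.
COREML_NAME = re.compile(
    r"rvm_(?P<backbone>[a-z0-9]+)_(?P<width>\d+)x(?P<height>\d+)"
    r"_s(?P<ratio>[0-9.]+)_(?P<precision>fp16|int8)\.mlmodel"
)

def describe_coreml_model(filename):
    """Return the settings encoded in a CoreML file name (hypothetical helper)."""
    match = COREML_NAME.fullmatch(filename)
    if match is None:
        raise ValueError(f"unrecognized CoreML model name: {filename}")
    width, height = int(match["width"]), int(match["height"])
    ratio = float(match["ratio"])
    return {
        "backbone": match["backbone"],
        "input_size": (width, height),
        "downsample_ratio": ratio,
        # Size of the downsampled frame: input size multiplied by the ratio.
        "downsampled_size": (round(width * ratio), round(height * ratio)),
        "precision": match["precision"],
    }

for name in (
    "rvm_mobilenetv3_1280x720_s0.375_fp16.mlmodel",
    "rvm_mobilenetv3_1920x1080_s0.25_int8.mlmodel",
):
    print(describe_coreml_model(name))
```

Under this reading, both listed resolutions work out to the same 480x270 downsampled frame (1280x720 at 0.375 and 1920x1080 at 0.25), which suggests the ratio was chosen per resolution rather than fixed.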

All models are available on Google Drive or Baidu Pan (password: gym7).

PyTorch example

1. Install the Python dependencies:

pip install -r requirements_inference.txt

2. Load the model:

import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda()  # Or "resnet50"
model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))

3. If you only need to convert videos, we provide a simple API:

from inference import convert_video

convert_video(
    model,                           # The model; it can be on any device (cpu or cuda)
    input_source='input.mp4',        # A video file or an image sequence folder
    output_type='video',             # Either "video" or "png_sequence"
    output_composition='com.mp4',    # File path for video output; folder path for a PNG sequence
    output_alpha="pha.mp4",          # [Optional] output the raw alpha prediction
    output_foreground="fgr.mp4",     # [Optional] output the raw foreground prediction
    output_video_mbps=4,             # Bit rate of the output video in Mbps; not needed for a PNG sequence
    downsample_ratio=None,           # Tune per video, or pass None to choose it automatically
    seq_chunk=12,                    # Number of frames processed at once for better parallelism
)
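
To give a feel for what downsample_ratio=None might choose, here is a minimal sketch of one plausible rule: scale the frame so that its longer side is about 512 px, and never upsample. The 512 px target and the function below are assumptions made for illustration; check inference.py in the repository for the rule convert_video actually applies.

```python
def auto_downsample_ratio(height, width, target_long_side=512):
    """Pick a ratio that scales the longer side down to about target_long_side px.

    The 512 px target is an assumption for illustration, not a documented value.
    The ratio is capped at 1.0 so small frames are never upsampled.
    """
    return min(target_long_side / max(height, width), 1.0)

resolutions = {"480p": (480, 854), "HD": (1080, 1920), "4K": (2160, 3840)}
for label, (h, w) in resolutions.items():
    print(f"{label}: downsample_ratio = {auto_downsample_ratio(h, w):.3f}")
```

For HD and 4K this rule gives roughly 0.27 and 0.13, close to the 0.25 and 0.125 used for the speed measurements further down.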

4. Or write your own inference logic:

import torch
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

# `model` is the network loaded in step 2.
reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background
rec = [None] * 4                                       # Initial recurrent states
downsample_ratio = 0.25                                # Downsample ratio; adjust it for your video

with torch.no_grad():
    for src in DataLoader(reader):                     # Input tensor: RGB, values in the range 0 to 1
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Pass the previous frame's recurrent states to the next frame
        com = fgr * pha + bgr * (1 - pha)              # Composite the foreground onto the green background
        writer.write(com)                              # Write the output frame
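
The loop above relies on two small idioms: star-unpacking, which hands the recurrent states returned for one frame back in for the next, and the alpha blend fgr * pha + bgr * (1 - pha). The sketch below replays both with plain floats so it runs without PyTorch. toy_model is a hypothetical stand-in with the same call shape as the network, and its arithmetic is made up purely for illustration.

```python
def toy_model(src, r1, r2, r3, r4, downsample_ratio):
    """Hypothetical stand-in for the network: same call shape, scalar 'frames'.

    Returns (fgr, pha, r1, r2, r3, r4) like the call in the loop above.
    The numbers it produces are invented and say nothing about RVM itself.
    """
    count = 0 if r1 is None else r1          # r1 counts the frames seen so far
    fgr = src                                # pretend the foreground equals the input
    pha = min(1.0, 0.5 + 0.25 * count)       # pretend alpha depends on the memory
    return fgr, pha, count + 1, r2, r3, r4

bgr = 1.0                                    # background value for a single channel
rec = [None] * 4                             # no memory before the first frame
outputs = []

for src in [0.2, 0.2, 0.2]:                  # three identical one-pixel frames
    # The first two results get names; the remaining four become the new rec list.
    fgr, pha, *rec = toy_model(src, *rec, 0.25)
    com = fgr * pha + bgr * (1 - pha)        # pha=1 keeps fgr, pha=0 keeps bgr
    outputs.append(round(com, 3))

print(outputs)  # [0.6, 0.4, 0.2]
print(rec)      # [3, None, None, None]
```

Even though the three input frames are identical, the outputs differ because the state carried in rec changes from frame to frame. This is also why rec should be reset to [None] * 4 before starting on an unrelated video.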

5. The model and the converter API can also be loaded quickly through TorchHub:

# Load the model
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3") # or "resnet50"

# Load the converter API
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")

The inference documentation explains the downsample_ratio parameter, the API usage, and advanced usage.

Training and evaluation

Please refer to the training documentation (in English).

Speed

Speed is measured with inference_speed_test.py and is given for reference.

GPU	dType	HD (1920x1080)	4K (3840x2160)
RTX 3090	FP16	172 FPS	154 FPS
RTX 2060 Super	FP16	134 FPS	108 FPS
GTX 1080 Ti	FP32	104 FPS	74 FPS

Note 1: HD uses downsample_ratio=0.25 and 4K uses downsample_ratio=0.125. All tests use batch size 1 and frame chunk 1.
Note 2: GPUs before the Turing architecture do not support FP16 inference, so the GTX 1080 Ti uses FP32.
Note 3: We only measure tensor throughput. The provided video conversion script is much slower because it does not use hardware video encoding/decoding and does not run tensor transfers on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, see PyNvCodec.
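
To relate the table to the 60 FPS figure in the title, the sketch below converts each throughput into a per-frame time and compares it with the 16.7 ms available per frame at 60 FPS. The FPS values are copied from the table. The 60 FPS target and the idea of "headroom" are assumptions made for this illustration, and since the table measures tensor throughput only, a full pipeline with decoding and encoding will take longer per frame.

```python
# FPS figures copied from the speed table: (HD, 4K) tensor throughput.
measured_fps = {
    "RTX 3090 (FP16)": (172, 154),
    "RTX 2060 Super (FP16)": (134, 108),
    "GTX 1080 Ti (FP32)": (104, 74),
}

TARGET_FPS = 60                      # playback rate used here as the real-time bar
budget_ms = 1000 / TARGET_FPS        # time available per frame at that rate

def frame_time_ms(fps):
    """Convert a throughput in frames per second to milliseconds per frame."""
    return 1000 / fps

for gpu, (hd_fps, uhd_fps) in measured_fps.items():
    hd_ms = frame_time_ms(hd_fps)
    uhd_ms = frame_time_ms(uhd_fps)
    headroom_ms = budget_ms - uhd_ms  # time left for decoding, transfer and encoding
    print(f"{gpu}: HD {hd_ms:.1f} ms, 4K {uhd_ms:.1f} ms, "
          f"4K headroom {headroom_ms:.1f} ms at {TARGET_FPS} FPS")
```

Every card in the table stays under the 60 FPS budget at 4K for the network alone, but on the GTX 1080 Ti only about 3 ms per frame is left for everything else.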

Keywords: GitHub, AI

Added by cybercog on Tue, 18 Jan 2022 15:47:28 +0200
