[Excellent GitHub AI project] Real-time human video matting for 4K/60 FPS video

Project address: https://github.com/PeterL1n/RobustVideoMatting

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!


Robust Video Matting (RVM)

Official GitHub repository for the paper Robust High-Resolution Video Matting with Temporal Guidance. RVM is designed for robust human video matting. Unlike existing methods that process each frame as an independent image, RVM uses a recurrent neural network so that it keeps temporal memory while processing a video stream. RVM can perform real-time HD matting on arbitrary video, reaching 4K at 76 FPS and HD at 104 FPS on an Nvidia GTX 1080 Ti. This research project comes from ByteDance.

Demo video

Watch the demo video (YouTube, Bilibili) to see the model's capabilities.

All materials in the video are available for download and can be used to test the model: Google Drive


  • Webpage: try webcam matting in the browser and visualize the recurrent memory inside the model.
  • Colab: convert your own videos with the model.


The MobileNetV3 model is recommended for general use. The ResNet50 model is much larger, with only a slight improvement in quality. The model is available for many frameworks; see the inference documentation for details.

  • PyTorch: official model weights. (file)
  • TorchHub: no manual download required; the easiest way to use the model in your PyTorch project.
  • TorchScript: if you need inference on mobile, consider exporting an int8 quantized model yourself. (file)
  • ONNX: tested on the CPU and CUDA backends of ONNX Runtime; the provided models use opset 12. A minimal inference sketch follows the download note below. (file, export)
  • TensorFlow: TensorFlow 2 SavedModel format. (file)
  • TensorFlow.js: rvm_mobilenetv3_tfjs_int8.zip runs the model in a web page. (demo, model code)
  • CoreML: only fixed resolutions can be exported; other resolutions can be exported yourself. Requires iOS 13+. "s" denotes the downsample ratio. (file, export)

All models are available on Google Drive or Baidu Pan (password: gym7).
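
To illustrate the ONNX route, here is a minimal single-frame inference sketch with ONNX Runtime. The file name, the input names (src, r1i to r4i, downsample_ratio), and the zero-initialized recurrent states are assumptions based on the list above rather than the official export, so treat it as a sketch:

import numpy as np
import onnxruntime as ort

sess = ort.InferenceSession('rvm_mobilenetv3_fp32.onnx')   # Assumed file name

rec = [np.zeros([1, 1, 1, 1], dtype=np.float32)] * 4       # Initial recurrent states
downsample_ratio = np.array([0.25], dtype=np.float32)      # 0.25 suits HD input

src = np.random.rand(1, 3, 1080, 1920).astype(np.float32)  # One RGB frame, NCHW, 0~1

fgr, pha, *rec = sess.run(None, {
    'src': src,
    'r1i': rec[0], 'r2i': rec[1], 'r3i': rec[2], 'r4i': rec[3],
    'downsample_ratio': downsample_ratio,
})
# fgr is the foreground, pha the alpha matte; feed rec back in for the next frame.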

PyTorch example

  1. Install the Python libraries:
pip install -r requirements_inference.txt
  2. Load the model:
import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda()  # Or "resnet50"
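
If you downloaded the official weights, they load in the standard PyTorch way (the .pth file name below is assumed to match the downloaded MobileNetV3 checkpoint):

model.load_state_dict(torch.load('rvm_mobilenetv3.pth'))  # Assumed checkpoint file name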
  3. If you only need to do video matting, we provide a simple conversion API:
from inference import convert_video

convert_video(
    model,                           # The model, can be on any device (cpu or cuda)
    input_source='input.mp4',        # A video file, or a folder of an image sequence
    output_type='video',             # Choose "video" or "png_sequence"
    output_composition='com.mp4',    # File path if exporting video; folder path if exporting a PNG sequence
    output_alpha="pha.mp4",          # [Optional] Output the alpha prediction
    output_foreground="fgr.mp4",     # [Optional] Output the foreground prediction
    output_video_mbps=4,             # Video bitrate, needed only when exporting video
    downsample_ratio=None,           # Downsample ratio; adjust per video, or None for automatic
    seq_chunk=12,                    # Number of frames processed in parallel
)
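
A note on seq_chunk: it feeds several frames through the network per call for better parallelism, at the cost of more memory; the recurrent memory is still carried across chunks.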
  4. Or write your own inference logic:
from torch.utils.data import DataLoader
from torchvision.transforms import ToTensor
from inference_utils import VideoReader, VideoWriter

reader = VideoReader('input.mp4', transform=ToTensor())
writer = VideoWriter('output.mp4', frame_rate=30)

bgr = torch.tensor([.47, 1, .6]).view(3, 1, 1).cuda()  # Green background
rec = [None] * 4                                       # Initial recurrent states
downsample_ratio = 0.25                                # Downsample ratio; adjust based on your video

with torch.no_grad():
    for src in DataLoader(reader):                     # RGB input tensor, normalized to 0~1
        fgr, pha, *rec = model(src.cuda(), *rec, downsample_ratio)  # Carry the previous frame's memory into the next
        com = fgr * pha + bgr * (1 - pha)              # Composite the foreground onto the green background
        writer.write(com)                              # Write the output frame
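
Note how the recurrent states in rec are fed back into the model on every iteration; this is the temporal memory described at the top of the page. Resetting rec to [None] * 4 clears that memory, which is what you want when starting an unrelated clip.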
  5. The model and the conversion API can also be loaded quickly through TorchHub.
# Load the model
model = torch.hub.load("PeterL1n/RobustVideoMatting", "mobilenetv3") # Or "resnet50"

# Conversion API
convert_video = torch.hub.load("PeterL1n/RobustVideoMatting", "converter")
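
For example, the two handles can be combined to convert a clip end to end (a minimal sketch; 'input.mp4' and 'com.mp4' are placeholder paths):

model = model.eval().cuda()
convert_video(model, input_source='input.mp4', output_composition='com.mp4')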

The inference documentation explains the downsample_ratio parameter, API usage, and advanced use cases.
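
As a rough guide to downsample_ratio, the low-resolution pass should see the frame at around 512 px on the longer side. A small illustrative helper (not quoted from the project) that encodes this rule of thumb:

def auto_downsample_ratio(h, w):
    """Pick a ratio that downsamples the longer side to about 512 px, capped at 1."""
    return min(512 / max(h, w), 1)

# 1920x1080 -> 512/1920 = 0.27, close to the recommended 0.25 for HD
# 3840x2160 -> 512/3840 = 0.13, close to the recommended 0.125 for 4K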

Training and evaluation

Please refer to the training documentation (English).


Speed

Speed is measured with inference_speed_test.py for reference.

GPU            | dType | HD (1920x1080) | 4K (3840x2160)
RTX 3090       | FP16  | 172 FPS        | 154 FPS
RTX 2060 Super | FP16  | 134 FPS        | 108 FPS
GTX 1080 Ti    | FP32  | 104 FPS        | 74 FPS
  • Note 1: HD uses downsample_ratio=0.25 and 4K uses downsample_ratio=0.125. All tests use batch size 1 and frame chunk 1.
  • Note 2: GPUs older than the Turing architecture do not support FP16 inference, so the GTX 1080 Ti uses FP32.
  • Note 3: We only measure tensor throughput. The provided video conversion script is much slower, because it does not use hardware video encoding/decoding and does not overlap tensor transfer on parallel threads. If you are interested in implementing hardware video encoding/decoding in Python, refer to PyNvCodec.
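
To make "tensor throughput" concrete, a minimal benchmark sketch might look like the following (assumptions: random input instead of a real video, FP16 on a CUDA device, HD resolution with downsample_ratio=0.25; this is not the project's inference_speed_test.py):

import time
import torch
from model import MattingNetwork

model = MattingNetwork('mobilenetv3').eval().cuda().half()
src = torch.randn(1, 3, 1080, 1920, device='cuda', dtype=torch.half)  # Fake HD frame
rec = [None] * 4

with torch.no_grad():
    for _ in range(10):                                # Warm-up iterations
        fgr, pha, *rec = model(src, *rec, 0.25)
    torch.cuda.synchronize()
    start, n = time.time(), 100
    for _ in range(n):
        fgr, pha, *rec = model(src, *rec, 0.25)
    torch.cuda.synchronize()                           # Wait for all kernels to finish
    print(f'{n / (time.time() - start):.1f} FPS')      # Frames per second of pure tensor compute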

Keywords: github AI

Added by cybercog on Tue, 18 Jan 2022 15:47:28 +0200