How to produce 10,000 traditional Chinese medicine popular-science short videos a month: a Python programming AI tutorial

You read that right. I don't know whether other programming languages and tools can turn out 10,000 popular-science short videos a month, but Python's MoviePy module can.

In 2019, when I was still in school, my supervisor asked me to make a popular-science short video about traditional Chinese medicine in Premiere (PR). I really couldn't get the hang of it. Later I crawled the herb data from a traditional Chinese medicine website and found that every entry followed a fixed format. Since the work was repetitive, it should be possible to mass-produce the videos with a Python script.

Let's look at a finished example first. Producing it takes about 5 minutes. The idea generalizes: any popular-science video with a similar structure can be mass-produced the same way. If you are interested, read on for the implementation.

Video: "YYSD: the right way to produce 10,000 traditional Chinese medicine short videos a month with Python"
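As a rough sanity check on the headline number, the sketch below works out how many videos one render process can finish in a month at about 5 minutes each, the figure quoted above. The 30-day month and the assumption that rendering runs back to back around the clock are illustrative choices, not measured values.

```python
import math

MINUTES_PER_VIDEO = 5      # "about 5 minutes" per video, as stated above
DAYS_PER_MONTH = 30        # assumption: a 30-day month
TARGET_VIDEOS = 10_000     # the monthly goal in the title


def videos_per_month(minutes_per_video, days=DAYS_PER_MONTH):
    """Videos one worker can finish if it renders back to back, 24 hours a day."""
    return (days * 24 * 60) // minutes_per_video


def workers_needed(target, minutes_per_video, days=DAYS_PER_MONTH):
    """Smallest number of parallel render processes (or machines) that reach the target."""
    return math.ceil(target / videos_per_month(minutes_per_video, days))


if __name__ == "__main__":
    print(videos_per_month(MINUTES_PER_VIDEO))               # 8640
    print(workers_needed(TARGET_VIDEOS, MINUTES_PER_VIDEO))  # 2
```

Under these assumptions a single serial process falls a little short of 10,000, so the target needs either two processes running in parallel or a per-video time of roughly 4.3 minutes.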

Software, hardware and skill requirements

CPU: an Intel Core i7-8750 or better; otherwise the whole production run will be very slow
Python 3.6 or later
MoviePy does not currently support GPU acceleration, so the graphics card does not matter
The pyecharts module, for drawing the maps and rendering the images used in the video
Routine data cleaning and processing skills; without them the video content cannot be prepared
The ability to make materials, pictures and videos in PPT
Basic crawler-writing skills; the Scrapy framework is not needed
Working knowledge of the MoviePy module; if you lack it, see the introductions and how-tos in my column
One or more mobile phone numbers, to apply for Baidu AI's free API
Patience, because many places need fine tuning
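Before starting, the software side of the list above can be checked with a few lines of standard-library Python. This is a minimal sketch: the module list is my reading of the requirements, and "aip" (the import name of Baidu's baidu-aip SDK) is an assumption, since the API can also be called over plain HTTP.

```python
import importlib.util
import sys

# Import names of the third-party packages mentioned in the list above.
# "aip" belongs to the baidu-aip SDK; whether the project uses that SDK
# is an assumption.
REQUIRED_MODULES = ["moviepy", "pyecharts", "pandas", "aip"]


def check_environment(modules=REQUIRED_MODULES, min_version=(3, 6)):
    """Return a dict mapping each requirement to True (satisfied) or False."""
    report = {"python>=%d.%d" % min_version: sys.version_info >= min_version}
    for name in modules:
        # find_spec looks a top-level module up without importing it
        report[name] = importlib.util.find_spec(name) is not None
    return report


if __name__ == "__main__":
    for requirement, ok in check_environment().items():
        print(("OK      " if ok else "MISSING ") + requirement)
```

Anything reported as MISSING can usually be installed with pip before running the project.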

Data acquisition

The video script naturally needs data. The data for this video is public and can be found on the drug intelligence data website.

Here is the data. Capturing it is simple: a Python crawler script does the job, and the whole crawl takes about an hour. The resulting table is very long; it turns out there are more than 13,000 kinds of traditional Chinese medicinal materials in China, which was a real eye-opener.

The part in red is the province and city where each medicinal material is grown, together with the URL of the page the data came from, both extracted in the Excel sheet. Cells that were blank in the original data are filled in automatically after the crawl; they will be used later.
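Because every page follows the same fixed format, turning one page into one table row can be sketched with the standard library alone. The markup below (label in `<th>`, value in `<td>`), the field names and the sample values are all hypothetical; the real site's structure has to be inspected first. Missing or blank fields get the same "No data" placeholder that the main script uses later.

```python
from html.parser import HTMLParser

# Hypothetical field names; the real columns come from the crawled site.
FIELDS = ["name", "origin", "effect"]
PLACEHOLDER = "No data"


class FieldTableParser(HTMLParser):
    """Collect <th>label</th><td>value</td> pairs from one detail page."""

    def __init__(self):
        super().__init__()
        self.record = {}
        self._tag = None      # tag currently being read: "th", "td" or None
        self._label = None    # text of the most recent <th>

    def handle_starttag(self, tag, attrs):
        if tag in ("th", "td"):
            self._tag = tag

    def handle_endtag(self, tag):
        if tag in ("th", "td"):
            self._tag = None

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._tag == "th":
            self._label = text
        elif self._tag == "td" and self._label is not None:
            self.record[self._label] = text


def parse_page(html, fields=FIELDS, placeholder=PLACEHOLDER):
    """Return one row as a dict, filling absent or blank fields with the placeholder."""
    parser = FieldTableParser()
    parser.feed(html)
    parser.close()
    return {field: parser.record.get(field, placeholder) for field in fields}


if __name__ == "__main__":
    # Made-up sample page, for illustration only
    sample = ("<table><tr><th>name</th><td>Ginseng</td></tr>"
              "<tr><th>origin</th><td></td></tr></table>")
    print(parse_page(sample))
```

Rows produced this way can be collected into a list and written to Excel, which matches the one-row-per-medicine table the rest of the project reads.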

Basic material preparation

Video cover: a fixed cover template. This can be made in PPT and is very simple. Save the finished image as a JPG.

Watermark: making one with Picture Factory is also fairly simple. Remember to save it in PNG format.

Other general-purpose pictures, such as prescription and decoction pictures, which are much the same from one video to the next.

Put these picture materials in the material_jpg\base directory.

Video template: this can also be made in PPT and exported as a looping video, which the code can cut later. Cut the template into two pieces: the transition and the part segment. If you need an ending clip, make one yourself.
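Cutting the template into its two pieces can be sketched with MoviePy as follows. The function names, the file paths in the usage note and the idea that the transition sits at the start of the template are assumptions for illustration. The import and the sub-clip method differ between MoviePy 1.x (`moviepy.editor`, `subclip`) and 2.x (`moviepy`, `subclipped`), so the sketch tries both.

```python
def split_points(total_seconds, transition_seconds):
    """Return (start, end) second ranges for the transition and the part segment."""
    if not 0 < transition_seconds < total_seconds:
        raise ValueError("transition must be shorter than the whole template")
    return (0, transition_seconds), (transition_seconds, total_seconds)


def cut_template(src, transition_out, part_out, transition_seconds):
    """Write the two segments of the template video to separate MP4 files."""
    try:
        from moviepy import VideoFileClip          # MoviePy 2.x
    except ImportError:
        from moviepy.editor import VideoFileClip   # MoviePy 1.x

    clip = VideoFileClip(src)
    try:
        # The method is named subclipped in 2.x and subclip in 1.x
        cut = getattr(clip, "subclipped", None) or clip.subclip
        ranges = split_points(clip.duration, transition_seconds)
        for (start, end), out_path in zip(ranges, (transition_out, part_out)):
            cut(start, end).write_videofile(out_path)
    finally:
        clip.close()
```

A call might look like `cut_template("material_base/template.mp4", "material_base/transition.mp4", "material_base/part.mp4", 2)`, where the three file names and the 2-second cut point are placeholders to adapt to your own template.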

Text-to-speech API: all you need is a mobile phone number to apply for the Baidu AI interface service. If cost is no object, you can use iFLYTEK instead. The choice is yours, and the code only needs small adjustments to fit whichever interface you pick. I won't walk through the API documentation here; read it carefully. If you don't have the patience for that, there is no point reading on.
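A minimal sketch of the caption-to-MP3 step, assuming Baidu's official Python SDK (`pip install baidu-aip`) is used; the project may well call the HTTP interface directly instead. With that SDK, `AipSpeech.synthesis` returns the audio bytes on success and a dict describing the error on failure, which is what `save_speech` checks. The function names and the stand-in synthesizer in the demo are hypothetical.

```python
import os
import tempfile


def save_speech(text, mp3_path, synthesize):
    """Turn one caption into one MP3 file.

    synthesize(text) must return MP3 bytes on success or a dict on failure.
    """
    result = synthesize(text)
    if isinstance(result, dict):
        raise RuntimeError("Text to speech failed: %r" % (result,))
    with open(mp3_path, "wb") as handle:
        handle.write(result)
    return mp3_path


def baidu_synthesizer(app_id, api_key, secret_key):
    """Build a synthesize() callable on top of the baidu-aip SDK."""
    from aip import AipSpeech  # imported lazily so the demo below runs without the SDK

    client = AipSpeech(app_id, api_key, secret_key)
    # "zh" = Chinese, 1 = client type, "vol" = volume
    return lambda text: client.synthesis(text, "zh", 1, {"vol": 5})


if __name__ == "__main__":
    # Offline demo with a stand-in synthesizer, so no API key is needed
    fake = lambda text: b"fake-mp3-bytes"
    target = os.path.join(tempfile.mkdtemp(), "caption_01.mp3")
    print(save_speech("caption text", target, fake))
```

Passing the synthesizer in as a callable keeps the file-writing logic independent of the vendor, so switching to iFLYTEK only means supplying a different callable.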

Process and code

Understanding the production workflow first makes the code much easier to follow.

Let's look at the overall project directory first.

Main script of the project

TCM_MakeMovie.py

# coding:utf-8
__author__ = 'Mr.Data Yang'
__explain__ = """Directory layout:
data: Excel table holding the content for each video, plus the AI matting log file
font: font files
material_base: base video material in MP4 format, including the opening, middle and ending clips
material_everypart: silent per-part clips and the cover, generated for each topic
material_jpg: watermark, cover and rule-generated pictures used in the video
material_mp3: MP3 files generated by Baidu AI
material_result: final videos; to regenerate the same topic, delete the existing file first
material_video: per-part clips and cover with synthesized speech added, plus the combined result
Backup script: scripts kept from debugging this project

Instructions:
1. In material_jpg, create a folder named after the topic (pic_name)
2. Collect a picture of the topic from the Internet and rename it to pic_name.jpg
3. Run the script and wait for the result in material_result
4. At line 423 of the code, pictures from the fuyong and zhongzhi folders under material_jpg/base are
   switched at random to vary the content; you can make these materials yourself
5. To change the watermark, replace logo.png under material_jpg/base
6. To change the cover, edit cover.pptx under material_jpg/base and export it to replace base.jpg
"""

# Load the third-party installation package used
import pandas as pd

# Load custom py method
from configs import *  # Engineering configuration data
from ready_work import *  # Preparation for project start-up
from process_images import *  # Processing image data required by the project
from process_audio import *  # Process audio data required by the project
from process_video import *  # Processing video data required by the project

if __name__ == '__main__':
    # Read the base Excel data
    df = pd.read_excel("data/data.xlsx")
    # Choose which Excel entry (num) to produce; numbering starts at 2
    num = 3
    # Convert it to a row index
    row = num - 1
    # Keep only the row to be produced and reset the index
    df = df.loc[row:row, :]
    df.reset_index(drop=True, inplace=True)
    # Extract the name of the medicinal material this video is about
    pic_name = df["Medicinal material name"][0].replace("?", "")
    print("Content of this production:", pic_name)

    # Load the configuration:
    # font_path is the path of the font to use;
    # column_en2cn_dict and column_cn2en_dict map column names between English and Chinese
    font_path, column_en2cn_dict, column_cn2en_dict = config()
    # Rename the columns to English, which is easier to work with
    df = df.rename(columns=column_cn2en_dict)
    # Drop the fields that are not needed
    df = df.drop(columns=["_id", "url"])
    # Strip the useless fragments found in the data (labels first, then spaces);
    # add more replace calls as you discover more fragments
    df.loc[0, "QYFB"] = (
        df.loc[0, "QYFB"]
        .replace("ecological environment", "")
        .replace("resource distribution ", "")
        .replace(" ", "")
    )
    # Fill in the cells that have no data
    df = df.fillna("No data")
    # Create the working directories for this video, using pic_name as the folder name:
    # 1. material_everypart: material for each part of this video
    # 2. material_video: each part with its audio added, plus the combined result
    # 3. material_jpg: picture material used in this video
    # 4. material_mp3: audio material used in this video
    MakeMaterialDir(pic_name)
    # Empty out old output on every run so stale material does not cause errors:
    # 1. material_everypart
    # 2. material_video
    # 3. material_mp3
    CleanFiles(pic_name)
    # Fetch a picture from Baidu Encyclopedia if the picture needs replacing
    # (crawling Wikipedia as well is a possible later addition)
    RequestGetImage(pic_name)
    # Remove the background from the fetched picture automatically (matting)
    CutoutJPG(pic_name)
    # Composite the cut-out onto the base background picture to make the cover
    CompositeCoverJPG(pic_name)
    # Audio processing:
    # call the API to generate the audio for each caption and save it under material_mp3
    ChangeWordsToMp3(df)
    time_name_dict = Mp3Info(df)
    # Body parts 1-6
    for make_part in (FirstPart, SecondPart, ThirdPart,
                      FourthPart, FifthPart, SixthPart):
        try:
            make_part(pic_name, df, time_name_dict, column_en2cn_dict)
        except Exception as exc:
            # Skip a part that fails so the remaining parts are still produced,
            # but report the error instead of hiding it
            print("Skipped", make_part.__name__, "because of:", exc)
    # Build the cover MP4 file
    MakeCoverMp4(pic_name)
    # Stitch the clips together and add background music
    StitchingVideo(pic_name)
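The script above produces one video, for the row selected by num. To work through the whole table, the same steps have to run once per row without one bad entry stopping the rest. The following is a minimal sketch of such a driver; `run_batch` and the `make_one` callable (a hypothetical wrapper around the body of the main block) are not part of the original project.

```python
def run_batch(names, make_one):
    """Call make_one(name) for every name and return (finished, failed).

    finished is a list of names that completed; failed maps each name
    that raised to a text description of the error.
    """
    finished, failed = [], {}
    for name in names:
        try:
            make_one(name)
            finished.append(name)
        except Exception as exc:
            # Keep going so one bad entry does not stop the whole batch
            failed[name] = repr(exc)
    return finished, failed


if __name__ == "__main__":
    def demo_make_one(name):
        # Stand-in for the real per-video pipeline
        if not name:
            raise ValueError("empty name")

    done, errors = run_batch(["first", "", "third"], demo_make_one)
    print(done)
    print(errors)
```

Recording the failures instead of discarding them makes it possible to fix the source data for those entries and rerun only them.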

Several important py files.