CMeKG code interpretation

The core of this article is a walkthrough and interpretation of the Python code of CMeKG, offered for discussion, reference, and shared learning.

CMeKG (Chinese Medical Knowledge Graph) is a Chinese medical knowledge graph developed through human-machine collaboration, built from large-scale medical text data using natural language processing and text-mining techniques.

Project source: The Natural Language Processing Laboratory of the Institute of Computational Linguistics at Peking University, the Natural Language Processing Laboratory of Zhengzhou University, and the Intelligent Medical Research Group of the Artificial Intelligence Research Center at Pengcheng Laboratory jointly released version 2.0 of the Chinese medical knowledge graph CMeKG. You are welcome to try it and offer your feedback. The construction of CMeKG draws on authoritative international medical standards such as ICD, ATC, SNOMED, and MeSH, as well as large-scale, multi-source medical text data. Announcement: http://www5.zzu.edu.cn/nlp/info/1018/1785.htm

Project demo:

CMeKG Chinese medical knowledge graph: http://cmekg.pcl.ac.cn/

Project source code:

https://github.com/king-yyf/CMeKG_tools

medical_re.py

The first class is the config class:

from transformers import BertTokenizer

class config:
    batch_size = 32       # number of samples per training batch
    max_seq_len = 256     # maximum input sequence length, in tokens
    num_p = 23            # number of predicate (relation) types
    learning_rate = 1e-5
    EPOCH = 2

    # File paths; adjust these to match the file locations on your own machine.
    PATH_SCHEMA = "/Users/yangyf/workplace/model/medical_re/predicate.json"
    PATH_TRAIN = '/Users/yangyf/workplace/model/medical_re/train_data.json'
    PATH_BERT = "/Users/yangyf/workplace/model/medical_re/"
    PATH_MODEL = "/Users/yangyf/workplace/model/medical_re/model_re.pkl"
    PATH_SAVE = '/content/model_re.pkl'
    tokenizer = BertTokenizer.from_pretrained("/Users/yangyf/workplace/model/medical_re/" + 'vocab.txt')

    # Mappings between relation ids and predicate names, filled in later.
    id2predicate = {}
    predicate2id = {}

This class defines some basic hyperparameters: the batch size is 32, the maximum sequence length is 256 tokens, there are 23 relation types, and so on. It also defines several file paths, which need to be adjusted to match the file locations on your own machine.
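As a quick illustration, here is a minimal sketch of how these settings would typically be used to encode one sentence with the tokenizer. The sample sentence is a hypothetical example, not part of the original code, and it assumes the vocab.txt path in config is valid on your machine:

# Minimal usage sketch; requires torch for return_tensors="pt".
text = "高血压患者应低盐饮食"   # hypothetical example sentence
encoded = config.tokenizer(
    text,
    max_length=config.max_seq_len,  # truncate/pad to the configured length
    truncation=True,
    padding="max_length",
    return_tensors="pt",
)
print(encoded["input_ids"].shape)  # torch.Size([1, 256])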

tokenizer = BertTokenizer.from_pretrained(...) calls a method from the transformers library. The workflow of this method is as follows: it first inspects the argument passed to from_pretrained. If the argument is a key in PRETRAINED_MODEL_ARCHIVE_MAP (i.e., the name of a known pre-trained model), the corresponding archive is found in the cache or downloaded. Otherwise, the argument is treated as a path, and the required files are looked for under that path: a config file and a bin file for a model, or the vocabulary file vocab.txt for a tokenizer (which is why the config class above appends 'vocab.txt' to the model directory). A usage sketch is given after the docstring below.

PRETRAINED_MODEL_ARCHIVE_MAP = {
    'bert-base-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz",
    'bert-large-uncased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-large-uncased.tar.gz",
    'bert-base-cased': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-cased.tar.gz",
    'bert-base-multilingual': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-multilingual.tar.gz",
    'bert-base-chinese': "https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-chinese.tar.gz",
}
"""
    Params:
    pretrained_model_name: either:
    - a str with the name of a pre-trained model to load selected in the list of:
    . `bert-base-uncased`
    . `bert-large-uncased`
    . `bert-base-cased`
    . `bert-base-multilingual`
    . `bert-base-chinese`
    - a path or url to a pretrained model archive containing:
    . `bert_config.json` a configuration file for the model
    . `pytorch_model.bin` a PyTorch dump of a BertForPreTraining instance
    *inputs, **kwargs: additional input for the specific Bert class
    (ex: num_labels for BertForSequenceClassification)
"""
