reference resources:
- CRF++: Yet Another CRF toolkit
- CRF++ usage: Chinese translation
- Model Download
- Model installation and training
train
Mode 1:
% crf_learn template_file train_file model_file
Here template_file and train_file are input files that need to be prepared in advance (a minimal illustration of both follows); crf_learn generates the trained model and writes it to model_file.
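For orientation, here is a minimal, illustrative sketch of the two input files (the tokens and tags are made up; any consistent layout works as long as every line has the same number of whitespace-separated columns and the answer tag comes last). In train_file, sentences are separated by blank lines; in template_file, U-prefixed lines are unigram feature templates in which %x[row,col] refers to a column relative to the current token, a bare B line adds bigram features over adjacent output tags, and lines starting with # are comments.
train_file (token, POS, answer tag):
Confidence NN B
in IN O
the DT B
pound NN I
. . O

He PRP B
reckons VBZ O
...
template_file:
# Unigram
U00:%x[-2,0]
U01:%x[-1,0]
U02:%x[0,0]
U03:%x[1,0]
U04:%x[2,0]
U05:%x[0,1]

# Bigram
B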
The training output results are as follows:
CRF++: Yet Another CRF Tool Kit
Copyright(C) 2005 Taku Kudo, All rights reserved.

reading training data: 100.. 200.. 300.. 400.. 500.. 600.. 700.. 800.. Done! 1.94 s

Number of sentences: 823
Number of features:  1075862
Number of thread(s): 1
Freq:                1
eta:                 0.00010
C:                   1.00000
shrinking size:      20
Algorithm:           CRF

iter=0 terr=0.99103 serr=1.00000 obj=54318.36623 diff=1.00000
iter=1 terr=0.35260 serr=0.98177 obj=44996.53537 diff=0.17161
iter=2 terr=0.35260 serr=0.98177 obj=21032.70195 diff=0.53257
iter=3 terr=0.23879 serr=0.94532 obj=13642.32067 diff=0.35138
iter=4 terr=0.15324 serr=0.88700 obj=8985.70071 diff=0.34134
iter=5 terr=0.11605 serr=0.80680 obj=7118.89846 diff=0.20775
iter=6 terr=0.09305 serr=0.72175 obj=5531.31015 diff=0.22301
iter=7 terr=0.08132 serr=0.68408 obj=4618.24644 diff=0.16507
iter=8 terr=0.06228 serr=0.59174 obj=3742.93171 diff=0.18953
- iter: number of iterations
- terr: error rate of tags (# of error tags / # of all tags)
- serr: error rate of sentences (# of error sentences/# of all sentences)
- obj: current object value, i.e. the value of the regularized objective being minimized (under L2 it includes a ||w||^2 penalty term). When this value converges to a fixed point, CRF++ stops the iteration
- diff: relative difference from the previous object value, i.e. (4618.24644-3742.93171) / 4618.24644 = 0.18953
There are four main parameters that control the training conditions (a combined invocation is sketched after this list):
- -a CRF-L2 or CRF-L1: change the regularization algorithm. The default is L2. Generally speaking, L2 performs slightly better than L1, while the number of non-zero features under L1 is drastically smaller than under L2
- -c float: this option changes the hyperparameter of the CRF. With a larger C value, the CRF tends to overfit the given training corpus. This parameter trades off overfitting against underfitting and significantly affects the results. You can find an optimal value using held-out data or a more general model selection method such as cross-validation
- -f NUM: this parameter sets the cut-off threshold for features. CRF++ only uses features that occur no fewer than NUM times in the given training data. The default value is 1. When you apply CRF++ to large data sets, the number of unique features can reach several million, and this option is useful in such cases
- -p NUM: NUM is the number of threads. If your PC has multiple CPUs, you can speed up training by using multithreading.
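Putting these options together, a hypothetical invocation that switches to L1 regularization, raises the feature cut-off, and trains on four threads could look like this (the flag values are illustrative, not recommendations):
% crf_learn -a CRF-L1 -f 2 -c 1.5 -p 4 template_file train_file model_file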
Mode 2:
% crf_learn -f 3 -c 1.5 template_file train_file model_file
Starting from version 0.45, CRF++ supports single-best MIRA training. MIRA training is used when the -a MIRA option is set.
% crf_learn -a MIRA template train.data model
CRF++: Yet Another CRF Tool Kit
Copyright(C) 2005 Taku Kudo, All rights reserved.

reading training data: 100.. 200.. 300.. 400.. 500.. 600.. 700.. 800.. Done! 1.92 s

Number of sentences: 823
Number of features:  1075862
Number of thread(s): 1
Freq:                1
eta:                 0.00010
C:                   1.00000
shrinking size:      20
Algorithm:           MIRA

iter=0 terr=0.11381 serr=0.74605 act=823 uact=0 obj=24.13498 kkt=28.00000
iter=1 terr=0.04710 serr=0.49818 act=823 uact=0 obj=35.42289 kkt=7.60929
iter=2 terr=0.02352 serr=0.30741 act=823 uact=0 obj=41.86775 kkt=5.74464
iter=3 terr=0.01836 serr=0.25881 act=823 uact=0 obj=47.29565 kkt=6.64895
iter=4 terr=0.01106 serr=0.17011 act=823 uact=0 obj=50.68792 kkt=3.81902
iter=5 terr=0.00610 serr=0.10085 act=823 uact=0 obj=52.58096 kkt=3.98915
Parameters:
- act: the number of active examples in the working set
- uact: the number of active examples whose dual parameters reach the upper bound of the soft margin C. A uact of 0 indicates that the given training data is linearly separable
- kkt: the maximum KKT violation value. When it reaches 0.0, MIRA training ends
There are several parameters that control MIRA training conditions (an example invocation follows this list):
- -c float: change the soft margin parameter, which is analogous to the soft margin parameter C in support vector machines. The definition is essentially the same as that of the -c option in CRF training. With a larger C value, MIRA tends to overfit the given training corpus
- -f NUM: same as in CRF training
- -H NUM: change the shrinking size. When a training sentence is not used to update the parameter vector NUM times, we can consider that the instance no longer contributes to training, and MIRA tries to remove such instances. This process is called "shrinking". With a smaller NUM, shrinking occurs at an earlier stage, which drastically reduces training time. However, too small a NUM is not recommended: after training finishes, MIRA goes through all training examples again to check whether all KKT conditions are really satisfied, and too small a NUM increases the chance of such rechecks.
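For example, a hypothetical MIRA run with a smaller shrinking size and a tighter soft margin could be invoked as follows (the values 10 and 0.5 are purely illustrative):
% crf_learn -a MIRA -H 10 -c 0.5 template train.data model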
test
% crf_test -m model_file test_files ...
Output:
% crf_test -m model test.data
Rockwell        NNP     B       B
International   NNP     I       I
Corp.           NNP     I       I
's              POS     B       B
Tulsa           NNP     I       I
unit            NN      I       I
..
The last column gives the (estimated) tag. If the third column holds the true answer tag, you can evaluate the accuracy simply by comparing the third and fourth columns, as in the sketch below.
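As a quick way to do that comparison, the following one-liner is a minimal sketch that assumes the default four-column output shown above (token, POS, gold tag, predicted tag) and counts the token lines where the last two columns agree; for chunk-level precision and recall you would use an evaluation script such as conlleval instead:
% crf_test -m model test.data | awk 'NF>0 {total++; if ($(NF-1) == $NF) correct++} END {printf "token accuracy: %.4f\n", correct/total}'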
Level of detail:
- The -v option sets the verbose level. The default value is 0. By increasing the level, you can get extra information from CRF++
- level 1:
You can also obtain marginal probabilities for each tag (a kind of confidence measure for each output tag) and the conditional probability of the whole output (a confidence measure for the entire output sequence; the line beginning with # in the examples below). A small filtering example based on these values is sketched at the end of this section.
% crf_test -v1 -m model test.data | head
# 0.478113
Rockwell        NNP     B       B/0.992465
International   NNP     I       I/0.979089
Corp.           NNP     I       I/0.954883
's              POS     B       B/0.986396
Tulsa           NNP     I       I/0.991966
...
- level 2:
% crf_test -v2 -m model test.data
# 0.478113
Rockwell        NNP     B       B/0.992465      B/0.992465      I/0.00144946    O/0.00608594
International   NNP     I       I/0.979089      B/0.0105273     I/0.979089      O/0.0103833
Corp.           NNP     I       I/0.954883      B/0.00477976    I/0.954883      O/0.040337
's              POS     B       B/0.986396      B/0.986396      I/0.00655976    O/0.00704426
Tulsa           NNP     I       I/0.991966      B/0.00787494    I/0.991966      O/0.00015949
unit            NN      I       I/0.996169      B/0.00283111    I/0.996169      O/0.000999975
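One practical use of the -v1 marginals is flagging low-confidence predictions for manual review. The sketch below assumes the -v1 format shown earlier, where each token line ends in TAG/probability: it splits lines on '/' and prints those whose final field (the marginal probability of the chosen tag) falls below 0.9 (an arbitrary threshold):
% crf_test -v1 -m model test.data | awk -F'/' 'NF>1 && $NF+0 < 0.9 {print}'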