This article continues from the previous part with an example of training a model from data stored in multiple directories.
4. Multi-directory data training
This example builds on the earlier batch training and repeated-iteration training examples. Instead of keeping all data files in a single directory, it stores them in multiple directories, which supports distributed data collection. For example, data can be grouped by date, with each date in its own directory, or grouped by provider, with each person's data in its own directory. Note: The directories that hold the data must also be named with numbers so that they can be processed uniformly.
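The numbered layout can be illustrated with a small path-building helper. This is only a sketch: BuildDataPath is a hypothetical name that is not part of the SDK, and the "root/directory/file.txt" pattern is taken from the paths used in the code later in this article. It uses the C99 function snprintf so that an overly long path is reported instead of overflowing the buffer.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the SDK): writes "<root>/<dir>/<file>.txt"
   into pBuf for 1-based directory and file numbers.
   Returns the path length, or -1 if the path does not fit in the buffer. */
int BuildDataPath(char* pBuf, size_t nBufSize, const char* pRoot,
                  int nFoldIndex, int nFileIndex)
{
    int nLen = snprintf(pBuf, nBufSize, "%s/%d/%d.txt",
                        pRoot, nFoldIndex, nFileIndex);
    if (nLen < 0 || (size_t)nLen >= nBufSize)
    {
        return -1; /* encoding error or truncated path */
    }
    return nLen;
}
```

For example, calling BuildDataPath(buf, sizeof buf, "D:/Data", 2, 17) produces the path D:/Data/2/17.txt.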
(1) Create the project and import the SDK
Create the project and import the SDK as described in the previous article.
(2) Coding
The code is still divided into four parts: initializing the neural network, reading the training data, training the network model, and saving the model. Data reading and model training sit inside a three-level loop. The inner loop reads every data file in one directory and trains on it, the middle loop steps through the directories, and the outer loop repeats the whole pass until no data causes the model to grow any further. The overall code is as follows:
int main(void)
{
    FILE* pFile = NULL;
    char strFoldName[256] = { 0 };
    char strFileName[256] = { 0 };
    int nPieces = 0;              //Number of moves in the training data (one game)
    int narrSteps[1000] = { 0 };  //Moves from first to last: X coordinate, Y coordinate, player (black is -1, white is 1)
    int nFileIndexStart = 1;      //Number corresponding to the name of the first data file
    int nFoldIndexStart = 1;      //Number corresponding to the name of the first directory
    int* parrSteps = NULL;
    bool bAgain = false;          //Indicates whether another iteration is needed
    int nInterCount = 0;          //Number of iterations

    InitWithoutModelFile(15, 15, 5);
    do
    {
        nInterCount++;
        bAgain = false;
        nFoldIndexStart = 1;
        bool bNext = true;        //Whether to continue with the next directory
        sprintf(strFoldName, "D:/Data/%d", nFoldIndexStart++);
        while (bNext)
        {
            bNext = false;
            nFileIndexStart = 1;
            sprintf(strFileName, "%s/%d.txt", strFoldName, nFileIndexStart++);
            while ((pFile = fopen(strFileName, "r")) != NULL)
            {
                bNext = true;
                printf("%d: %s\n", nInterCount, strFileName);
                nPieces = 0;
                parrSteps = narrSteps;
                //Stop reading before narrSteps (3 ints per move) would overflow
                while (nPieces < 1000 / 3 &&
                       fscanf(pFile, "%d %d %d", parrSteps, parrSteps + 1, parrSteps + 2) == 3)
                {
                    parrSteps += 3;
                    nPieces++;
                }
                fclose(pFile);
                //Always call TrainNetwork(); "bAgain || TrainNetwork(...)" would
                //short-circuit and skip training once bAgain is already true
                if (TrainNetwork(narrSteps, nPieces, 15, 15))
                {
                    bAgain = true;
                }
                sprintf(strFileName, "%s/%d.txt", strFoldName, nFileIndexStart++);
            }
            sprintf(strFoldName, "D:/Data/%d", nFoldIndexStart++);
        }
    } while (bAgain);
    SaveModel("D:/Model/model.mod");
    return 0;
}
First, the SDK function InitWithoutModelFile(15, 15, 5) is called to initialize the neural network. This starts from an empty model for Gobang (five in a row) on a 15x15 board. To continue training an existing model, call InitFromModelFile() instead. To train a six-in-a-row model, change the third parameter from 5 to 6.
Next, the inner while loop opens the data files one by one, reads each file, and passes the data to TrainNetwork() to train the neural network model. A second while loop wraps the first and steps through the directories in order. Its control variable bNext stays false when the current directory does not exist or is empty, which stops the directory traversal and returns control to the outer loop. A do loop wraps both while loops and repeats the pass over the entire batch of data until the data no longer causes the model to grow. Because traversal stops at the first missing number, directory names and file names must be numbered consecutively starting from 1.
Note: At the start of each iteration of the do loop, nFoldIndexStart (the number in the directory name) is reset to 1, the first directory, and bAgain is reset to false so that the result of the previous round does not carry over. At the start of each iteration of the middle loop, nFileIndexStart (the number in the file name) is reset to 1, the first file, and bNext is reset to false so that the loop exits when a missing or empty directory is encountered.
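The data-reading step described above can be sketched separately from the file handling. In the sketch below, ParseSteps is a hypothetical helper that is not part of the SDK. It assumes the file format implied by the fscanf call in the example: whitespace-separated triples of X coordinate, Y coordinate, and player. It reads from a text buffer rather than a file so that it can be tried without any data files, and it takes an explicit limit so the step array cannot overflow.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the SDK): parses "x y player" triples
   from a text buffer into pSteps (3 ints per move).
   Stops at the first incomplete triple or after nMaxPieces moves.
   Returns the number of complete moves parsed. */
int ParseSteps(const char* pText, int* pSteps, int nMaxPieces)
{
    int nPieces = 0;
    int nUsed = 0; /* characters consumed by the last sscanf call */

    while (nPieces < nMaxPieces &&
           sscanf(pText, "%d %d %d%n",
                  pSteps, pSteps + 1, pSteps + 2, &nUsed) == 3)
    {
        pText += nUsed;  /* advance past the triple just read */
        pSteps += 3;
        nPieces++;
    }
    return nPieces;
}
```

The returned count corresponds to nPieces in the example, and the filled array corresponds to narrSteps, so the pair could be passed to TrainNetwork() in the same way.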
Finally, once training is complete, the SDK function SaveModel() is called to save the trained model.
Overall, the code for this example differs little from the code for repeated-iteration training. The only change is an additional loop, placed between the two existing levels, that steps through the directories. If you only want a single training pass over the data files in several directories, remove the outermost do loop.
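The control flow of the three loops, and the effect of dropping the outer one, can be tried without the SDK or any data files. The sketch below is a simulation under stated assumptions: FileExists stands in for fopen, StubTrain stands in for TrainNetwork() and simply reports growth during the first few passes, and the directory contents are a hard-coded table. All of these names are hypothetical.

```c
/* Simulated data store: directory d (1-based) holds g_narrFileCount[d - 1] files. */
const int g_narrFileCount[] = { 3, 2 };
const int g_nFoldCount = 2;
int g_nTrainCalls = 0; /* how many times the stub trainer was called */

/* Stand-in for fopen(): does file nFile exist in directory nFold? */
int FileExists(int nFold, int nFile)
{
    return nFold >= 1 && nFold <= g_nFoldCount &&
           nFile >= 1 && nFile <= g_narrFileCount[nFold - 1];
}

/* Stand-in for TrainNetwork(): reports growth only in the first nGrowPasses passes. */
int StubTrain(int nPass, int nGrowPasses)
{
    g_nTrainCalls++;
    return nPass <= nGrowPasses;
}

/* Runs the traversal and returns the number of passes made.
   bIterate == 0 corresponds to removing the outer do loop (one pass only). */
int RunTraining(int bIterate, int nGrowPasses)
{
    int nPass = 0;
    int bAgain;
    int nFold;
    int nFile;

    do
    {
        nPass++;
        bAgain = 0;
        /* Middle loop: a directory without file 1 ends the traversal. */
        for (nFold = 1; FileExists(nFold, 1); nFold++)
        {
            /* Inner loop: the first missing file number ends the directory. */
            for (nFile = 1; FileExists(nFold, nFile); nFile++)
            {
                if (StubTrain(nPass, nGrowPasses))
                {
                    bAgain = 1;
                }
            }
        }
    } while (bIterate && bAgain);
    return nPass;
}
```

With the table above, RunTraining(0, 5) makes one pass and trains on all five files once. RunTraining(1, 2) makes three passes, because the third pass is the first in which no file reports growth.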
(3) Testing
The coding for the multi-directory data training example is now complete. We split the 211 training data files used previously across several directories, with 50 files in each directory, as shown in the following figure:
After running the program, you can find the newly trained model file model.mod in the Model directory on drive D. You can test this model in software such as Neural Network Gobang (Home Edition), introduced in the previous article.
This example still takes a long time to run. The entire batch of data was iterated 155 times, as shown in the following figure:
(4) Follow-up
This concludes the multi-directory data training example. Watch for the following articles, which cover other examples.
The complete code for this example is as follows:
#include <stdio.h>
#include "Inter.h"
#pragma comment(lib, "AIWZQDll.lib")

int main(void)
{
    FILE* pFile = NULL;
    char strFoldName[256] = { 0 };
    char strFileName[256] = { 0 };
    int nPieces = 0;              //Number of moves in the training data (one game)
    int narrSteps[1000] = { 0 };  //Moves from first to last: X coordinate, Y coordinate, player (black is -1, white is 1)
    int nFileIndexStart = 1;      //Number corresponding to the name of the first data file
    int nFoldIndexStart = 1;      //Number corresponding to the name of the first directory
    int* parrSteps = NULL;
    bool bAgain = false;          //Indicates whether another iteration is needed
    int nInterCount = 0;          //Number of iterations

    InitWithoutModelFile(15, 15, 5);
    do
    {
        nInterCount++;
        bAgain = false;
        nFoldIndexStart = 1;
        bool bNext = true;        //Whether to continue with the next directory
        sprintf(strFoldName, "D:/Data/%d", nFoldIndexStart++);
        while (bNext)
        {
            bNext = false;
            nFileIndexStart = 1;
            sprintf(strFileName, "%s/%d.txt", strFoldName, nFileIndexStart++);
            while ((pFile = fopen(strFileName, "r")) != NULL)
            {
                bNext = true;
                printf("%d: %s\n", nInterCount, strFileName);
                nPieces = 0;
                parrSteps = narrSteps;
                //Stop reading before narrSteps (3 ints per move) would overflow
                while (nPieces < 1000 / 3 &&
                       fscanf(pFile, "%d %d %d", parrSteps, parrSteps + 1, parrSteps + 2) == 3)
                {
                    parrSteps += 3;
                    nPieces++;
                }
                fclose(pFile);
                //Always call TrainNetwork(); "bAgain || TrainNetwork(...)" would
                //short-circuit and skip training once bAgain is already true
                if (TrainNetwork(narrSteps, nPieces, 15, 15))
                {
                    bAgain = true;
                }
                sprintf(strFileName, "%s/%d.txt", strFoldName, nFileIndexStart++);
            }
            sprintf(strFoldName, "D:/Data/%d", nFoldIndexStart++);
        }
    } while (bAgain);
    SaveModel("D:/Model/model.mod");
    return 0;
}