Spring Batch -- file reading and writing

In the top-down structure of Spring Batch, Job and Step are framework-level concepts: for the most part they only expose configuration options to developers. ItemReader, ItemProcessor and ItemWriter, on the other hand, belong to the business level and expose the business entry interfaces. Even so, file reading and writing involve many functions that are common and consistent across applications, and Spring Batch provides consistent implementation classes for them.

Flat structure file

A flat structure file (also known as a matrix structure file, hereinafter simply "file") is the most common file type. It usually stores one record per line, with the field data divided in some way. The main difference from standard formats (XML, JSON, etc.) is that it has no structural description scheme (XSD, JSON Schema) and no structural segmentation specification. Therefore, the field segmentation method must be configured before such files can be read or written.

There are usually two ways to split the field data of a file: delimiters or fixed field lengths. The former typically uses a symbol such as a comma (,) to separate the fields; in the latter, each column of field data has a fixed width. For file reading, the framework provides FieldSet, which maps the information in the file structure to an object: it binds the data in the file to the fields of a class (field here is the usual Java concept; see Java reflection if this is unfamiliar).
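
As a quick illustration, a FieldSet can be constructed directly with the framework's DefaultFieldSet implementation and read either by field name or by index. This is only a sketch; the token values are made up:

FieldSet fieldSet = new DefaultFieldSet(
        new String[] { "US1SCAK0006", "7", "TMAX", "320" },   // raw token values (illustrative)
        new String[] { "siteId", "month", "type", "value" }); // field names bound to the tokens

String siteId = fieldSet.readString("siteId"); // read a value by name
int value = fieldSet.readInt(3);               // read a value by positional index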

Reading data

Spring Batch provides the FlatFileItemReader class for file reading; it covers the basic functions of reading and converting data in files. FlatFileItemReader relies on two main functional interfaces: Resource and LineMapper. Resource is used to obtain the external file (see Spring Core resource management for details). For example:

Resource resource = new FileSystemResource("resources/trades.csv"); 

In complex production environments, files are usually managed by a centralized or process-based framework (such as an EAI). They therefore often have to be fetched from other locations, for example by FTP. How files are transferred is beyond the scope of the Spring Batch framework; you can look at the Spring Integration project in the Spring ecosystem.

The following are the properties of FlatFileItemReader; each property has a corresponding setter method.

  • comments (String[]): comment prefixes in the file, used to filter out comment lines
  • encoding (String): the file encoding; the default is Charset.defaultCharset()
  • lineMapper (LineMapper): converts a line of text into an object
  • linesToSkip (int): the number of lines to skip at the start of the file, used to skip field-description header lines
  • recordSeparatorPolicy (RecordSeparatorPolicy): determines where each record ends
  • resource (Resource): the location of the external resource file
  • skippedLinesCallback (LineCallbackHandler): when linesToSkip is configured, called once per skipped line with that line's content
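
As a minimal sketch (not the complete configuration used later in this article), these properties are set as follows; the file name and the callback body are illustrative:

FlatFileItemReader<WeatherEntity> reader = new FlatFileItemReader<>();
reader.setResource(new FileSystemResource("resources/trades.csv")); // locate the external file
reader.setEncoding("UTF-8"); // file encoding
reader.setComments(new String[] { "#" }); // treat lines starting with "#" as comments
reader.setLinesToSkip(1); // skip the header line
reader.setSkippedLinesCallback(line -> System.out.println("skipped: " + line)); // called for each skipped line
// a lineMapper must also be set before the reader can be used; see below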

Each of these properties supports part of the file-parsing process. The structures involved are described below.

LineMapper

This interface is used to convert strings into objects:

public interface LineMapper<T> {
    T mapLine(String line, int lineNumber) throws Exception;
}

The basic processing logic is that the aggregating class (FlatFileItemReader) passes a line of text and its line number to LineMapper::mapLine, and the method returns the mapped object.
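
As an illustration, a hand-written LineMapper might split the line itself and build the object directly. This is only a sketch: WeatherEntity is the entity used later in this article, and the naive comma split stands in for a real tokenizer:

public class WeatherLineMapper implements LineMapper<WeatherEntity> {
    @Override
    public WeatherEntity mapLine(String line, int lineNumber) throws Exception {
        String[] tokens = line.split(","); // naive comma split, for illustration only
        WeatherEntity entity = new WeatherEntity();
        entity.setSiteId(tokens[0]);
        // ...map the remaining tokens to the other fields
        return entity;
    }
}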

LineTokenizer

The function of this interface is to convert one line of data into a FieldSet structure. In Spring Batch, the mapping from flat files to Java entities is driven by the FieldSet, so reading and writing files requires converting between strings and FieldSets:

public interface LineTokenizer {
    FieldSet tokenize(String line);
}

In other words: pass in one line of text and get back a FieldSet.

The framework provides three implementation classes for LineTokenizer:

  • DelimitedLineTokenizer: converts a line into a FieldSet using delimiters, most commonly the comma. The class provides configuration options and parsing for delimiters.

  • FixedLengthTokenizer: builds the FieldSet according to fixed field lengths; the width of each field must be defined for the record.

  • PatternMatchingCompositeLineTokenizer: uses a pattern-matching mechanism to decide dynamically which LineTokenizer to apply to a given line; see the sketch after this list.
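
A minimal sketch of the composite tokenizer, assuming the beginning of each line identifies its record type; the patterns and delimiters are illustrative:

PatternMatchingCompositeLineTokenizer tokenizer = new PatternMatchingCompositeLineTokenizer();
Map<String, LineTokenizer> tokenizers = new HashMap<>();
tokenizers.put("LINEA*", new DelimitedLineTokenizer());    // lines starting with "LINEA" are comma-delimited
tokenizers.put("LINEB*", new DelimitedLineTokenizer(";")); // lines starting with "LINEB" are semicolon-delimited
tokenizer.setTokenizers(tokenizers);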

FieldSetMapper

This interface is used to convert a FieldSet into an object:

public interface FieldSetMapper<T> {
    T mapFieldSet(FieldSet fieldSet) throws BindException;
}

FieldSetMapper is usually used together with LineTokenizer: String -> FieldSet -> Object.

DefaultLineMapper

DefaultLineMapper is the framework's implementation of LineMapper and completes the mapping from file lines to Java entities:

public class DefaultLineMapper<T> implements LineMapper<T>, InitializingBean {
	private LineTokenizer tokenizer;
	private FieldSetMapper<T> fieldSetMapper;
	public T mapLine(String line, int lineNumber) throws Exception {
		return fieldSetMapper.mapFieldSet(tokenizer.tokenize(line));
	}
	public void setLineTokenizer(LineTokenizer tokenizer) {
		this.tokenizer = tokenizer;
	}
	public void setFieldSetMapper(FieldSetMapper<T> fieldSetMapper) {
		this.fieldSetMapper = fieldSetMapper;
	}
}

When a file is parsed, the data is processed line by line:

  1. A line of text is passed in.
  2. The LineTokenizer parses the string into a FieldSet structure.
  3. The FieldSetMapper then maps the FieldSet into a Java entity object and returns it to the caller.

DefaultLineMapper is the default implementation class provided by the framework. It looks very simple, but through the composite pattern it can be extended with many capabilities.

Automatic data mapping

During this conversion, if the names attribute of the FieldSet is bound to the fields of the target class, the data conversion can be done directly via reflection. The framework provides BeanWrapperFieldSetMapper for this purpose.

DefaultLineMapper<WeatherEntity> lineMapper = new DefaultLineMapper<>(); //Create LineMapper
 
DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(); //Create LineTokenizer
tokenizer.setNames(new String[] { "siteId", "month", "type", "value", "ext" }); // Set field names
 
BeanWrapperFieldSetMapper<WeatherEntity> wrapperMapper 
	= new BeanWrapperFieldSetMapper<>(); //Create FieldSetMapper
wrapperMapper.setTargetType(WeatherEntity.class); // Set the target type; the entity's field names must match the names configured on the tokenizer
 
// Combining lineMapper
lineMapper.setLineTokenizer(tokenizer);
lineMapper.setFieldSetMapper(wrapperMapper);
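
Reflection-based mapping only requires the target class to expose getters and setters matching the configured names. A sketch of what the WeatherEntity might look like; the field types are assumptions, the real class lives in the sample repository:

public class WeatherEntity {
    private String siteId;
    private String month;
    private String type;
    private String value;
    private String ext;

    public String getSiteId() { return siteId; }
    public void setSiteId(String siteId) { this.siteId = siteId; }
    // ...getters and setters for month, type, value and ext follow the same pattern
}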

File read summary

The interfaces and implementations above are all introduced around the properties of FlatFileItemReader. Although there is a lot of material, it boils down to the following points:

  • First, locate the file. Spring Batch provides Resource-based ways of doing this.

  • Second, convert each line of the file into an object. This is the job of LineMapper.

  • The framework provides DefaultLineMapper as the default LineMapper implementation. A LineTokenizer and a FieldSetMapper are composed inside DefaultLineMapper: the former converts the string into a FieldSet, the latter converts the FieldSet into the target object.

  • LineTokenizer has three implementation classes available, and FieldSetMapper has the default implementation class BeanWrapperFieldSetMapper.

File read executable source code

The executable source code is in the items sub-project at the following address:

You need to configure the database connection before running; see README.md in the source repository.

The main logic of file reading is in the org.chenkui.spring.batch.sample.items.FlatFileReader class:

public class FlatFileReader {
    // Field names for the FieldSet. Once names are set, data can be fetched by name as well as by positional index
    public final static String[] Tokenizer = new String[] { "siteId", "month", "type", "value", "ext" };
    private boolean userWrapper = false;
 
    @Bean
    //Define a FieldSetMapper for FieldSet -> WeatherEntity
    public FieldSetMapper<WeatherEntity> fieldSetMapper() {
        return new FieldSetMapper<WeatherEntity>() {
            @Override
            public WeatherEntity mapFieldSet(FieldSet fieldSet) throws BindException {
                if (null == fieldSet) {
                    return null; // If fieldSet does not exist, skip the processing of this row
                } else {
                    WeatherEntity observe = new WeatherEntity();
                    observe.setSiteId(fieldSet.readRawString("siteId"));
                    // ...the remaining setters are omitted here
                    return observe;
                }
            }
        };
    }
 
    @Bean
    // Configure Reader
    public ItemReader<WeatherEntity> flatFileReader(
                           @Qualifier("fieldSetMapper") FieldSetMapper<WeatherEntity> fieldSetMapper) {
        FlatFileItemReader<WeatherEntity> reader = new FlatFileItemReader<>();
        reader.setResource(new FileSystemResource("src/main/resources/data.csv")); // Read resource file
        DefaultLineMapper<WeatherEntity> lineMapper = new DefaultLineMapper<>(); // Initialize the LineMapper implementation class
        DelimitedLineTokenizer tokenizer = new DelimitedLineTokenizer(); // Create LineTokenizer interface implementation
 
        tokenizer.setNames(Tokenizer); // Set the name of each field. If it is not set, you need to use the index to obtain the value
        lineMapper.setLineTokenizer(tokenizer); // Set tokenizer tool
 
        if (userWrapper) { //Use BeanWrapperFieldSetMapper to convert directly using reflection
            BeanWrapperFieldSetMapper<WeatherEntity> wrapperMapper = new BeanWrapperFieldSetMapper<>();
            wrapperMapper.setTargetType(WeatherEntity.class);
            fieldSetMapper = wrapperMapper;
        }
 
        lineMapper.setFieldSetMapper(fieldSetMapper);
        reader.setLineMapper(lineMapper);
        reader.setLinesToSkip(1); // Skip the first line (the field-name header row)
        reader.open(new ExecutionContext());
        return reader;
    }
}

Read file by field length

In addition to delimiters, some files allocate a fixed width to each field's data. Following the procedure described above, only the LineTokenizer needs to change; the framework provides the FixedLengthTokenizer class:

@Bean
public FixedLengthTokenizer fixedLengthTokenizer() {
    FixedLengthTokenizer tokenizer = new FixedLengthTokenizer();
 
    tokenizer.setNames("ISIN", "Quantity", "Price", "Customer");
    //Range sets the column span (start and end positions) of each field
    tokenizer.setColumns(new Range(1, 12),
                        new Range(13, 15),
                        new Range(16, 20),
                        new Range(21, 29));
    return tokenizer;
}
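
Tokenizing a single fixed-width record with the ranges above might look like this; the record content is made up for illustration:

FieldSet fieldSet = fixedLengthTokenizer().tokenize("UK21341EAH4513305.25customer1");
String isin = fieldSet.readString("ISIN");   // "UK21341EAH45" (columns 1-12)
int quantity = fieldSet.readInt("Quantity"); // 133 (columns 13-15)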

Write flat structure file

Writing data to a file is the reverse of reading: converting an object to a string.

LineAggregator

Corresponding to LineMapper is LineAggregator, whose function is to convert entities into strings:

public interface LineAggregator<T> {
    public String aggregate(T item);
}

PassThroughLineAggregator

The framework provides a very simple implementation class for the LineAggregator interface, PassThroughLineAggregator, whose entire implementation is to call the object's toString method:

public class PassThroughLineAggregator<T> implements LineAggregator<T> {
    public String aggregate(T item) {
        return item.toString();
    }
}

DelimitedLineAggregator

Another implementation of LineAggregator is DelimitedLineAggregator. Unlike PassThroughLineAggregator, which simply relies on toString, DelimitedLineAggregator requires a conversion interface, FieldExtractor:

DelimitedLineAggregator<CustomerCredit> lineAggregator = new DelimitedLineAggregator<>();
lineAggregator.setDelimiter(",");
lineAggregator.setFieldExtractor(fieldExtractor);

FieldExtractor

FieldExtractor converts an entity class into a flat structure (an array of field values). It can be compared with LineTokenizer: the former turns an entity into an array of values, the latter turns a String into a FieldSet.

public interface FieldExtractor<T> {
    Object[] extract(T item);
}

For the FieldExtractor interface, the framework provides the reflection-based implementation class BeanWrapperFieldExtractor, which turns an entity object into an array of its field values:

BeanWrapperFieldExtractor<CustomerCredit> fieldExtractor = new BeanWrapperFieldExtractor<>();
fieldExtractor.setNames(new String[] {"field1", "field2"});

The setNames method specifies the list of fields to extract.

Output file processing

The logic of reading a file is simple: open the file and read the data, throwing an exception if the file does not exist. Writing a file cannot be handled so bluntly. When a JobInstance is created, the most intuitive behavior would be to throw an exception if a file with the same name already exists, and otherwise create the file and write the data. But that causes an obvious problem with restarts: when a batch fails partway through and has to be restarted, processing does not start over from the beginning, yet the file already exists and must continue to be written. To support this, FlatFileItemWriter by default deletes an existing file when a new JobInstance runs, and appends to the end of the file when an existing JobInstance is restarted. The shouldDeleteIfExists, appendAllowed and shouldDeleteIfEmpty properties control this behavior.
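
A minimal sketch of these switches; the values shown match the default behavior described above, but verify the defaults for the Spring Batch version in use:

FlatFileItemWriter<MaxTemperatureEntiry> fileWriter = new FlatFileItemWriter<>();
fileWriter.setShouldDeleteIfExists(true); // delete an existing file when a new JobInstance starts
fileWriter.setAppendAllowed(false);       // true appends to an existing file instead of replacing it
fileWriter.setShouldDeleteIfEmpty(false); // true deletes the output file if nothing was written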

File write executable source code

The main file-writing code is in org.chenkui.spring.batch.sample.items.FlatFileWriter:

public class FlatFileWriter {
 
    private boolean useBuilder = true;
 
    @Bean
    public ItemWriter<MaxTemperatureEntiry> flatFileWriter() {
        BeanWrapperFieldExtractor<MaxTemperatureEntiry> fieldExtractor = new BeanWrapperFieldExtractor<>();
        fieldExtractor.setNames(new String[] { "siteId", "date", "temperature" }); //Set mapping field
        fieldExtractor.afterPropertiesSet(); //Parameter check
 
        DelimitedLineAggregator<MaxTemperatureEntiry> lineAggregator = new DelimitedLineAggregator<>();
        lineAggregator.setDelimiter(","); //Set output separator
        lineAggregator.setFieldExtractor(fieldExtractor); //Setting up the FieldExtractor processor
 
        FlatFileItemWriter<MaxTemperatureEntiry> fileWriter = new FlatFileItemWriter<>();
        fileWriter.setLineAggregator(lineAggregator);
        fileWriter.setResource(new FileSystemResource("src/main/resources/out-data.csv")); //Set output file location
        fileWriter.setName("outpufData");
 
        if (useBuilder) { // Alternatively, create the writer with FlatFileItemWriterBuilder
            fileWriter = new FlatFileItemWriterBuilder<MaxTemperatureEntiry>().name("outpufData")
                .resource(new FileSystemResource("src/main/resources/out-data.csv")).lineAggregator(lineAggregator)
                .build();
        }
        return fileWriter;
    }
}

The writing process is exactly the opposite of the reading process: the FieldExtractor first converts the object into an array of field values, and the DelimitedLineAggregator then joins the values into a delimited string.

Code description

  • The test data in the code comes from the data analysis example project bi-process-example and is NOAA's 2015 global weather monitoring data. It has been heavily trimmed so it can be stored alongside the source code; the original data runs to millions of rows. If needed, download it as follows:

    	curl -O ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/by_year/2015.csv.gz # data file
    	curl -O ftp://ftp.ncdc.noaa.gov/pub/data/ghcn/daily/ghcnd-stations.txt # file structure and type description
  • The code implements the whole process of reading, processing and writing files. During processing, only the maximum temperature records (Type=TMAX) are kept; all other records are filtered out.

  • The code for this case runs via the org.chenkui.spring.batch.sample.flatfile.FlatFileItemApplication::main method in Command Runner mode (for a description of the run modes, see Item concept and usage code: command-line mode and Java embedded mode).

This article is reproduced from: https://my.oschina.net/chkui/blog/3071788
