A document conversion project based on LibreOffice, with no framework dependency; plug and play
Project source code: github/workable-converter
- 1. Technology stack
- 2. Function
- 3. Use
- 4. To-do items
- 5. Notes
- 6. Reference Links
1. Technology stack
- LibreOffice:v6.2.3
- jodconverter:4.2.2
- PDFBox:2.0.12
- cglib dynamic proxy + lazy factory pattern + strategy pattern + decorator pattern
- qtools-property for configuration file management (the configuration may live in any one of three named files: application.yml, bootstrap.yml, or workable-converter.yml)
2. Function
- Supports conversion of doc, docx, html, ppt, png, pdf, and other file types
- Supports several conversion modes: by file path, by byte input/output stream, by Base64, etc.
- Does not rely on any third-party framework; plug and play. Configuration is read from any one of application.yml, bootstrap.yml, or workable-converter.yml.
3. Use
3.1 Install and configure LibreOffice 6.2.3
On CentOS, follow this article directly: "CentOS 7 installs LibreOffice 6.2.3"
Download links for Windows and Mac are also available in that article
When the installation is complete, note the home directory of your LibreOffice installation; you will need it later
Default directories:
- CentOS: /opt/libreoffice6.2/
- Mac: /Applications/LibreOffice.app/Contents/
- Windows: C:\Program Files\LibreOffice\
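If the same project is deployed on several platforms, the default directories above can be selected from the operating system name. The following is only a sketch: the class `LibreOfficeHome` is hypothetical and not part of workable-converter, and it assumes that any OS other than Windows or Mac uses the CentOS layout, which may not hold for other Linux distributions.

```java
import java.util.Locale;

final class LibreOfficeHome {
    /** Maps an OS name (as returned by System.getProperty("os.name")) to a default directory listed above. */
    static String defaultFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return "/Applications/LibreOffice.app/Contents/";
        }
        if (os.contains("windows")) {
            return "C:\\Program Files\\LibreOffice\\";
        }
        // Assumption: everything else is treated like the CentOS default.
        return "/opt/libreoffice6.2/";
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(System.getProperty("os.name")));
    }
}
```

The returned value is what you would put in libreofficePath if LibreOffice was installed to its default location.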
3.2 Add the dependency
- Maven
    <dependency>
        <groupId>com.liumapp.workable.converter</groupId>
        <artifactId>workable-converter</artifactId>
        <version>v1.2.0</version>
    </dependency>
- Gradle
compile group: 'com.liumapp.workable.converter', name: 'workable-converter', version: 'v1.2.0'
3.3 Edit the configuration file
In the resources directory of the project, create a YML configuration file named application.yml, bootstrap.yml, or workable-converter.yml.
Add the following configuration:
    com:
      liumapp:
        workable-converter:
          libreofficePath: "/Applications/LibreOffice.app/Contents"
The value of libreofficePath is the installation directory of LibreOffice 6.2.3.
The complete list of configuration items is as follows:
Parameter name | Description | Default value |
---|---|---|
libreofficePath | LibreOffice installation directory | (String) No default; required |
libreofficePort | LibreOffice listening port | (int) 2002 |
tmpPath | Temporary storage directory | (String) "./data/" |
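The defaults in the table can be read as the following resolution rules. This is a sketch of the documented behaviour only, not the library's actual loading code: the class `ConverterSettings` and its `from` method are hypothetical, and the settings are assumed to arrive as a flat string map.

```java
import java.util.HashMap;
import java.util.Map;

final class ConverterSettings {
    final String libreofficePath;
    final int libreofficePort;
    final String tmpPath;

    private ConverterSettings(String libreofficePath, int libreofficePort, String tmpPath) {
        this.libreofficePath = libreofficePath;
        this.libreofficePort = libreofficePort;
        this.tmpPath = tmpPath;
    }

    /** Applies the documented defaults and rejects a missing libreofficePath. */
    static ConverterSettings from(Map<String, String> raw) {
        String path = raw.get("libreofficePath");
        if (path == null || path.trim().isEmpty()) {
            throw new IllegalArgumentException("libreofficePath has no default and must be set");
        }
        int port = Integer.parseInt(raw.getOrDefault("libreofficePort", "2002"));
        String tmp = raw.getOrDefault("tmpPath", "./data/");
        return new ConverterSettings(path, port, tmp);
    }

    public static void main(String[] args) {
        Map<String, String> raw = new HashMap<String, String>();
        raw.put("libreofficePath", "/Applications/LibreOffice.app/Contents");
        ConverterSettings settings = from(raw);
        System.out.println(settings.libreofficePort + " " + settings.tmpPath);
    }
}
```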
3.4 Execute conversion
3.4.1 Conversion by File Path
Taking DOC to PDF as an example:
    // Instantiation also initializes the configuration; a decorator validates the configuration items.
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    // test.doc is the file to convert; result1.pdf is where the converted result is stored.
    pattern.fileToFile("./data/test.doc", "./data/pdf/result1.pdf");
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    // Strategy pattern: once a new conversion strategy is implemented, switch to it here.
    // Image conversion is expected to be handled by a separate strategy.
    converter.setConverterType(CommonConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());
To convert HTML to PDF instead, replace these lines from the code above:
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
with:
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.HTML);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
Other formats work the same way.
3.4.2 Conversion by Input-Output Stream
Taking DOC to PDF as an example:
    // You can also choose not to use the proxy.
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.streamToStream(new FileInputStream("./data/test.doc"), new FileOutputStream("./data/pdf/result1_2.pdf"));
    // Note: conversion by stream requires both prefixes to be set.
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    converter.setConverterType(CommonConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());
This is the same as the previous example except that pattern.streamToStream() sets the input and output streams. The source file data is read from the input stream, and the converted result is written directly to the output stream.
To switch the conversion format, set different prefixes as in the previous example.
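The stream example opens its FileInputStream and FileOutputStream inline, and this document does not say whether the library closes them. A cautious caller can own the streams with try-with-resources, as sketched below. The `fakeConvert` method is a hypothetical stand-in that only copies bytes; it is where the real conversion call would go, and in-memory streams are used so the sketch runs without LibreOffice.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class StreamHandling {
    /** Hypothetical stand-in for the conversion step; it only copies bytes. */
    static void fakeConvert(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }

    /** Opens both streams in try-with-resources so they are closed even if the conversion fails. */
    static byte[] run(byte[] source) {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            fakeConvert(in, out);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] result = run("sample".getBytes(StandardCharsets.UTF_8));
        System.out.println(result.length + " bytes written");
    }
}
```

With real files, the same try block would wrap the FileInputStream and FileOutputStream that are passed to pattern.streamToStream().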
3.4.3 Conversion by Base64
Again taking DOC to PDF as an example:
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test.doc")));
    // Note: conversion by Base64 requires both prefixes to be set.
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    converter.setConverterType(CommonConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());
    String destBase64 = pattern.getBase64Result();
Here the conversion input is a Base64 string. First, set the Base64 value of the source file with pattern.base64ToBase64().
The return value of the conversion is still a boolean; the Base64 value of the converted result is obtained with pattern.getBase64Result().
To switch the conversion format, set different prefixes as in the previous example.
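The Base64 strings passed in and returned here can also be produced and consumed with the JDK alone. The sketch below assumes the library uses standard (not URL-safe) Base64, which this document does not confirm; the class `Base64RoundTrip` is hypothetical and is not the library's Base64FileTool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

final class Base64RoundTrip {
    /** Encodes raw file bytes as a standard Base64 string. */
    static String encode(byte[] fileBytes) {
        return Base64.getEncoder().encodeToString(fileBytes);
    }

    /** Decodes a standard Base64 string back into raw bytes. */
    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        byte[] original = "hello".getBytes(StandardCharsets.UTF_8);
        String encoded = encode(original);
        System.out.println(encoded); // prints aGVsbG8=
        System.out.println(Arrays.equals(original, decode(encoded))); // prints true
    }
}
```

In practice the bytes would come from the source file (for example via java.nio.file.Files.readAllBytes), and the decoded result would be written to the destination file.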
3.5 Image processing
At present, the only supported image operation is PDF to PNG (a 20-page PDF file is converted to 20 PNG images). This feature is built on PDFBox 2.0.12.
3.5.1 Processing by file path
pattern.fileToFiles() takes two parameters: the first is the path of the PDF file to convert, and the second is the directory where the converted images are stored.
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.fileToFiles("./data/test5.pdf", "./data/");
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
    // PdfBoxConverterManager only supports PDF to PNG.
    converter.setConverterType(PdfBoxConverterManager.getInstance());
    assertEquals(true, converter.convert(pattern.getParameter()));
    assertEquals(true, FileTool.isFileExists("./data/test5_0.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_1.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_2.png"));
    assertEquals(true, FileTool.isFileExists("./data/test5_3.png"));
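The assertions in this example suggest that each page is written as the PDF's base name plus an underscore and a zero-based page index. The sketch below reproduces that naming so callers can predict the output files; the rule is inferred from the example only and is not a documented contract, and the class `PageImageNames` is hypothetical.

```java
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

final class PageImageNames {
    /** Builds the expected PNG paths for a PDF with the given number of pages. */
    static List<String> forPdf(String pdfPath, String outputDir, int pageCount) {
        String fileName = Paths.get(pdfPath).getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String dir = outputDir.endsWith("/") ? outputDir : outputDir + "/";
        List<String> names = new ArrayList<String>();
        // Assumption: page indexes start at 0, as in test5_0.png.
        for (int page = 0; page < pageCount; page++) {
            names.add(dir + base + "_" + page + ".png");
        }
        return names;
    }

    public static void main(String[] args) {
        for (String name : forPdf("./data/test5.pdf", "./data/", 4)) {
            System.out.println(name);
        }
    }
}
```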
3.5.2 Processing by Base64
The parameter of pattern.base64ToBase64() is the Base64 value of the PDF file to convert.
After the conversion, the Base64 values of the converted images are available through List<String> resultBase64 = pattern.getBase64Results().
    WorkableConverter converter = new WorkableConverter();
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
    // PdfBoxConverterManager only supports PDF to PNG.
    converter.setConverterType(PdfBoxConverterManager.getInstance());
    boolean result = converter.convert(pattern.getParameter());
    List<String> resultBase64 = pattern.getBase64Results();
    assertEquals(true, result);
    assertEquals(4, resultBase64.size());
3.6 Adding a watermark
Watermark Converter
Notes on watermarking:
- Make sure that both the source file and the output file have the PDF suffix.
- Watermark parameters are set on a new WaterMarkRequire object.
- setWaterMarkPage(int page) specifies which page receives the watermark; 0 means all pages.
- The watermark itself is a PDF file, and only one page is needed: the content of its first page is added to the source file as the watermark.
For example, to use text with a transparency of 0.3 as the watermark, use a tool such as Word to draw text with a transparency of 0.3 (a PNG image with transparency also works) and save it as watermark.pdf.
Then pass in the file's Base64 value with waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf"))),
or its bytes with waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf"))).
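The page rule for setWaterMarkPage can be pictured as the selection below. This is only an illustration of "0 means all pages": it assumes that non-zero values are 1-based page numbers, which this document does not state, and the class `WatermarkPages` is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class WatermarkPages {
    /** Returns the page numbers to watermark; 0 selects every page. */
    static List<Integer> select(int waterMarkPage, int totalPages) {
        if (waterMarkPage < 0 || waterMarkPage > totalPages) {
            throw new IllegalArgumentException("page out of range: " + waterMarkPage);
        }
        List<Integer> pages = new ArrayList<Integer>();
        if (waterMarkPage == 0) {
            // 0 means all pages.
            for (int page = 1; page <= totalPages; page++) {
                pages.add(page);
            }
        } else {
            // Assumption: a non-zero value names a single 1-based page.
            pages.add(waterMarkPage);
        }
        return pages;
    }

    public static void main(String[] args) {
        System.out.println(select(0, 3)); // prints [1, 2, 3]
        System.out.println(select(2, 3)); // prints [2]
    }
}
```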
There are three ways to use it:
3.6.1 Add a watermark by file path
    WorkableConverter converter = new WorkableConverter();
    // Select the watermark conversion strategy.
    converter.setConverterType(WaterMarkConverterManager.getInstance());
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    // Parameters needed to create the watermark.
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire();
    // Page to watermark; 0 means all pages.
    waterMarkRequire.setWaterMarkPage(0);
    waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    // The watermarked file is saved in ./data/ as test5_with_mark01.pdf.
    pattern.fileToFile("./data/test5.pdf", "./data/test5_with_mark01.pdf");
    boolean result = converter.convert(pattern.getParameter());
    assertEquals(true, result);
3.6.2 Add a watermark by stream
    WorkableConverter converter = new WorkableConverter();
    converter.setConverterType(WaterMarkConverterManager.getInstance());
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire();
    // 0 means all pages.
    waterMarkRequire.setWaterMarkPage(0);
    waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.streamToStream(new FileInputStream("./data/test5.pdf"), new FileOutputStream("./data/test5_with_mark02.pdf"));
    boolean result = converter.convert(pattern.getParameter());
    assertEquals(true, result);
3.6.3 Add a watermark by Base64
    WorkableConverter converter = new WorkableConverter();
    converter.setConverterType(WaterMarkConverterManager.getInstance());
    ConvertPattern pattern = ConvertPatternManager.getInstance();
    WaterMarkRequire waterMarkRequire = new WaterMarkRequire();
    // 0 means all pages.
    waterMarkRequire.setWaterMarkPage(0);
    waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));
    pattern.setWaterMarkRequire(waterMarkRequire);
    pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
    pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));
    boolean result = converter.convert(pattern.getParameter());
    String base64Result = pattern.getBase64Result();
    Base64FileTool.saveBase64File(base64Result, "./data/test5_with_mark03.pdf");
    assertEquals(true, result);
4. To-do items
- Conversion of doc, docx, and html to PDF has been tested in each conversion mode. Unit tests for other types have not been written yet and will be added later.
- At present, only yml configuration is supported; support for other configuration types (xml, properties, etc.) will be added later.
- Markdown is very popular at present, so converting Markdown to PDF (markdown -> HTML -> pdf) is under consideration.
5. Notes
- Because LibreOffice is required, running in containers such as Docker is not recommended (LibreOffice does not currently have a stable Docker release image).
- If the converted output is garbled or the conversion takes too long, check whether Chinese fonts are installed on the server.
- After the project starts, the first conversion task takes a long time because it has to connect to LibreOffice and perform other setup. The second and later tasks stabilize at under 0.5 seconds (the exact time varies with machine configuration).