Tool Library: Document Operations Based on LibreOffice

A document conversion project based on LibreOffice, with no framework dependency; plug and play.

Project source code: github/workable-converter

1. Technology stack

  • LibreOffice:v6.2.3
  • jodconverter:4.2.2
  • PDFBox:2.0.12
  • cglib dynamic proxy + lazy factory pattern + strategy pattern + decorator pattern
  • qtools-property for managing the configuration file (any one of application.yml, bootstrap.yml, or workable-converter.yml may be used)

2. Function

  • Supports conversion of doc, docx, html, ppt, png, pdf, and other file types
  • Supports several conversion modes: by file path, by byte input/output stream, by Base64, and so on
  • Does not rely on any third-party framework (plug and play) and supports three configuration file names: application.yml, bootstrap.yml, and workable-converter.yml

3. Use

3.1 Install and Configure LibreOffice 6.2.3

For CentOS, refer directly to this article: CentOS 7 installs LibreOffice 6.2.3

Download links for Windows and Mac are also available in that article.

When the installation is complete, note the home directory of your LibreOffice installation; you will need it later.

Default directories:

  • CentOS: /opt/libreoffice6.2/
  • Mac: /Applications/LibreOffice.app/Contents/
  • Windows: C:\Program Files\LibreOffice\
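The default directories above can be selected by operating system. The following is a minimal sketch, not part of workable-converter: the class name LibreOfficeHome and the method defaultFor are hypothetical, and treating every non-Mac, non-Windows system as a CentOS-style install is an assumption.

```java
import java.util.Locale;

/** Hypothetical helper: picks one of the default LibreOffice directories listed above. */
final class LibreOfficeHome {

    // Default directories copied from the list above.
    static final String CENTOS = "/opt/libreoffice6.2/";
    static final String MAC = "/Applications/LibreOffice.app/Contents/";
    static final String WINDOWS = "C:\\Program Files\\LibreOffice\\";

    /** Maps an os.name value such as "Mac OS X", "Windows 10" or "Linux" to a default directory. */
    static String defaultFor(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac")) {
            return MAC;
        }
        if (os.contains("win")) {
            return WINDOWS;
        }
        // Assumption: anything else is treated like the CentOS install.
        return CENTOS;
    }

    public static void main(String[] args) {
        System.out.println(defaultFor(System.getProperty("os.name")));
    }
}
```

If LibreOffice was installed somewhere else, use the actual directory instead of these defaults.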

3.2 Add the Dependency

  • Maven
<dependency>
  <groupId>com.liumapp.workable.converter</groupId>
  <artifactId>workable-converter</artifactId>
  <version>v1.2.0</version>
</dependency>
  • Gradle
compile group: 'com.liumapp.workable.converter', name: 'workable-converter', version: 'v1.2.0'

3.3 Edit the Configuration File

In the project's resources directory, create a YAML configuration file named application.yml, bootstrap.yml, or workable-converter.yml.

Add the following configuration:

com:
  liumapp:
    workable-converter:
      libreofficePath: "/Applications/LibreOffice.app/Contents"

The value of libreofficePath is the installation directory of LibreOffice 6.2.3.

The complete list of configuration items is as follows

Parameter name    Description                                   Default value
libreofficePath   LibreOffice installation directory (String)   None; this item is required
libreofficePort   LibreOffice listening port (int)              2002
tmpPath           Temporary storage directory (String)          "./data/"
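The defaults in the table can be expressed as a small validation step. This is a sketch under stated assumptions: ConverterSettings is a hypothetical class, not the library's own configuration type, and it reads from a plain Map rather than from the YAML file.

```java
import java.util.Map;

/** Hypothetical holder for the three settings in the table above. */
final class ConverterSettings {
    final String libreofficePath;
    final int libreofficePort;
    final String tmpPath;

    private ConverterSettings(String libreofficePath, int libreofficePort, String tmpPath) {
        this.libreofficePath = libreofficePath;
        this.libreofficePort = libreofficePort;
        this.tmpPath = tmpPath;
    }

    /** Builds settings from raw key/value pairs, applying the documented defaults. */
    static ConverterSettings from(Map<String, String> raw) {
        String path = raw.get("libreofficePath");
        if (path == null || path.trim().isEmpty()) {
            // libreofficePath has no default, so a missing value is an error.
            throw new IllegalArgumentException("libreofficePath is required");
        }
        int port = Integer.parseInt(raw.getOrDefault("libreofficePort", "2002"));
        String tmp = raw.getOrDefault("tmpPath", "./data/");
        return new ConverterSettings(path, port, tmp);
    }
}
```

With only libreofficePath supplied, the port falls back to 2002 and the temporary directory to "./data/".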

3.4 Execute conversion

3.4.1 Conversion by File Path

Take doc to PDF as an example:

// Instantiation also initializes the configuration items; their validation is added by a decorator.
WorkableConverter converter = new WorkableConverter();

ConvertPattern pattern = ConvertPatternManager.getInstance();
// The first argument is the path of the file to convert; the second is where the result is stored.
pattern.fileToFile("./data/test.doc", "./data/pdf/result1.pdf");
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);

// Strategy pattern: once a new conversion strategy is implemented, swap it in here. Image conversion may later be handled by a new strategy.
converter.setConverterType(CommonConverterManager.getInstance());
boolean result = converter.convert(pattern.getParameter());

To convert HTML to PDF, change these lines of the code above

pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);

to

pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.HTML);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);

Other formats work the same way.
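The setConverterType call in the example follows the strategy pattern: the caller holds one interchangeable strategy object and delegates to it. The sketch below illustrates the pattern only. Every name in it (ConversionStrategy, OfficeToPdfStrategy, PdfToPngStrategy, SketchConverter) is hypothetical, and the stand-in strategies merely check formats instead of converting anything.

```java
/** Hypothetical strategy interface; the library's real interfaces are not shown in this article. */
interface ConversionStrategy {
    /** Returns true when this strategy can handle the requested conversion. */
    boolean convert(String srcFormat, String destFormat);
}

/** Stand-in for an office-document strategy: accepts any source when the target is pdf. */
final class OfficeToPdfStrategy implements ConversionStrategy {
    @Override
    public boolean convert(String srcFormat, String destFormat) {
        return "pdf".equals(destFormat);
    }
}

/** Stand-in for an image strategy: accepts only pdf to png. */
final class PdfToPngStrategy implements ConversionStrategy {
    @Override
    public boolean convert(String srcFormat, String destFormat) {
        return "pdf".equals(srcFormat) && "png".equals(destFormat);
    }
}

/** The caller holds one strategy at a time and delegates every conversion to it. */
final class SketchConverter {
    private ConversionStrategy strategy;

    void setStrategy(ConversionStrategy strategy) {
        this.strategy = strategy;
    }

    boolean convert(String srcFormat, String destFormat) {
        if (strategy == null) {
            throw new IllegalStateException("No conversion strategy set");
        }
        return strategy.convert(srcFormat, destFormat);
    }
}
```

Swapping the strategy changes the behavior without touching the calling code, which is why a new conversion type only requires a different argument to setConverterType.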

3.4.2 Conversion by Input-Output Stream

Take doc to pdf as an example

// You can also choose not to use the proxy.
WorkableConverter converter = new WorkableConverter();
ConvertPattern pattern = ConvertPatternManager.getInstance();
pattern.streamToStream(new FileInputStream("./data/test.doc"), new FileOutputStream("./data/pdf/result1_2.pdf"));
// Note: conversion by stream requires both prefixes to be set.
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
converter.setConverterType(CommonConverterManager.getInstance());
boolean result = converter.convert(pattern.getParameter());

This is the same as the previous example except that the input and output streams are set with pattern.streamToStream(). The source data is read from the input stream, and the converted result is written directly to the output stream.

To switch the conversion format, set different prefixes as in the previous example.
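The article does not say whether the library closes the streams it is given, so it may be safer for the caller to manage them. The sketch below shows one way to do that with try-with-resources. StreamPair and its Job interface are hypothetical names, and the copy method is only a stand-in for a real conversion call.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper that opens both streams, runs a job, and always closes them. */
final class StreamPair {

    /** Any stream-to-stream operation, for example a wrapper around a conversion call. */
    interface Job {
        boolean run(InputStream in, OutputStream out) throws IOException;
    }

    static boolean with(Path src, Path dest, Job job) throws IOException {
        // try-with-resources closes both streams even if the job throws.
        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dest)) {
            return job.run(in, out);
        }
    }

    /** Stand-in job for illustration only: copies bytes without converting anything. */
    static boolean copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
        return true;
    }
}
```

A call such as StreamPair.with(src, dest, StreamPair::copy) runs the stand-in job; a real job would set the streams on the pattern and call the converter inside the lambda.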

3.4.3 Conversion by File Base64

Still take doc to pdf as an example

WorkableConverter converter = new WorkableConverter();
ConvertPattern pattern = ConvertPatternManager.getInstance();
pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test.doc")));
// Note: conversion by Base64 requires both prefixes to be set.
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.DOC);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
converter.setConverterType(CommonConverterManager.getInstance());
boolean result = converter.convert(pattern.getParameter());
String destBase64 = pattern.getBase64Result();

To convert from Base64, first set the Base64 value of the source file with pattern.base64ToBase64().

The return value of convert() is still a boolean; the Base64 value of the converted file is obtained with pattern.getBase64Result().

To switch the conversion format, set different prefixes as in the previous example.
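For readers who want to produce or consume the Base64 values without the library's Base64FileTool, the JDK's java.util.Base64 is sufficient. The sketch below is an assumption about what such a helper does, not the library's implementation; Base64Sketch and its method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

/** Hypothetical JDK-only helper for moving file content in and out of Base64 strings. */
final class Base64Sketch {

    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    static byte[] decode(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    /** Reads the whole file into memory and returns its Base64 form. */
    static String encodeFile(Path file) {
        try {
            return encode(Files.readAllBytes(file));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Decodes a Base64 string, such as a conversion result, and writes it to a file. */
    static void decodeToFile(String base64, Path target) {
        try {
            Files.write(target, decode(base64));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Both file methods load the entire file into memory, so this approach suits small and medium documents better than very large ones.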

3.5 Image Processing

At present, the only image operation supported is PDF to PNG (a 20-page PDF file is converted into 20 PNG images). This feature is implemented with PDFBox 2.0.12.

3.5.1 Processing by File Path

The first parameter of pattern.fileToFiles() is the path of the PDF file to convert, and the second is the directory where the converted images are stored.

WorkableConverter converter = new WorkableConverter();
ConvertPattern pattern = ConvertPatternManager.getInstance();
pattern.fileToFiles("./data/test5.pdf", "./data/");
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
converter.setConverterType(PdfBoxConverterManager.getInstance()); // PdfBoxConverterManager only supports PDF to PNG
assertEquals(true, converter.convert(pattern.getParameter()));
assertEquals(true, FileTool.isFileExists("./data/test5_0.png"));
assertEquals(true, FileTool.isFileExists("./data/test5_1.png"));
assertEquals(true, FileTool.isFileExists("./data/test5_2.png"));
assertEquals(true, FileTool.isFileExists("./data/test5_3.png"));
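The assertions above suggest a naming scheme for the generated images: the PDF's base name, an underscore, a zero-based page index, and the .png suffix. The sketch below reproduces that scheme as inferred from the example; PageImageNames is a hypothetical helper, and the library may build its file names differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that predicts the PNG file names inferred from the example above. */
final class PageImageNames {

    static List<String> forPdf(String pdfPath, String destDir, int pageCount) {
        // Strip the directory and the extension to get the base name, e.g. "test5".
        String fileName = pdfPath.substring(pdfPath.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        String base = dot > 0 ? fileName.substring(0, dot) : fileName;
        String dir = destDir.endsWith("/") ? destDir : destDir + "/";

        List<String> names = new ArrayList<>();
        for (int page = 0; page < pageCount; page++) {
            // One image per page, numbered from 0.
            names.add(dir + base + "_" + page + ".png");
        }
        return names;
    }
}
```

For the four-page test5.pdf in the example, this yields ./data/test5_0.png through ./data/test5_3.png, matching the files the assertions check.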

3.5.2 Processing by File Base64

The parameter of pattern.base64ToBase64() is the Base64 value of the PDF file to convert.

After the conversion, get the list of Base64 values of the converted images with List<String> resultBase64 = pattern.getBase64Results().

WorkableConverter converter = new WorkableConverter();
ConvertPattern pattern = ConvertPatternManager.getInstance();
pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PNG);
converter.setConverterType(PdfBoxConverterManager.getInstance()); // PdfBoxConverterManager only supports PDF to PNG
boolean result = converter.convert(pattern.getParameter());
List<String> resultBase64 = pattern.getBase64Results();
assertEquals(true, result);
assertEquals(4, resultBase64.size());

3.6 Adding a Watermark

Watermark Converter

Notes on watermarking:

  • Make sure that both the input file and the output file have the PDF suffix.
  • Watermark parameters are set on a new WaterMarkRequire object.
  • setWaterMarkPage(int page) specifies which page receives the watermark; 0 means all pages.
  • The watermark itself is a PDF file that needs only one page. The content of its first page is added to the source file as the watermark.

    For example, to add text with a transparency of 0.3 as a watermark, use a tool such as Word to draw text with a transparency of 0.3 (a PNG image with transparency also works) and save it as a watermark.pdf file.

    Then pass in the Base64 value of the file with waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf"))),

    or its bytes with waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf"))).
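The rule for setWaterMarkPage can be sketched as a page-selection function. This is an illustration under assumptions: WatermarkPages is a hypothetical helper, and the article does not state whether non-zero page numbers are 1-based, so the sketch assumes they are.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper showing which pages a given waterMarkPage value would select. */
final class WatermarkPages {

    /** Returns the selected page numbers (assumed 1-based); 0 selects every page. */
    static List<Integer> select(int waterMarkPage, int pageCount) {
        if (waterMarkPage < 0 || waterMarkPage > pageCount) {
            throw new IllegalArgumentException("page out of range: " + waterMarkPage);
        }
        List<Integer> pages = new ArrayList<>();
        if (waterMarkPage == 0) {
            // 0 means all pages, as described above.
            for (int page = 1; page <= pageCount; page++) {
                pages.add(page);
            }
        } else {
            pages.add(waterMarkPage);
        }
        return pages;
    }
}
```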

There are three ways to use it:

3.6.1 Adding a Watermark by File Path

WorkableConverter converter = new WorkableConverter();
converter.setConverterType(WaterMarkConverterManager.getInstance()); // Select the watermark conversion strategy

ConvertPattern pattern = ConvertPatternManager.getInstance();
WaterMarkRequire waterMarkRequire = new WaterMarkRequire(); // Holds the parameters needed to add a watermark

// Specify which page receives the watermark; 0 adds it to all pages
waterMarkRequire.setWaterMarkPage(0);
waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));

pattern.setWaterMarkRequire(waterMarkRequire);
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.fileToFile("./data/test5.pdf", "./data/test5_with_mark01.pdf"); // The watermarked file is saved in ./data/ as test5_with_mark01.pdf

boolean result = converter.convert(pattern.getParameter());
assertEquals(true, result);

3.6.2 Adding a Watermark by Stream

WorkableConverter converter = new WorkableConverter();
converter.setConverterType(WaterMarkConverterManager.getInstance());

ConvertPattern pattern = ConvertPatternManager.getInstance();
WaterMarkRequire waterMarkRequire = new WaterMarkRequire();

waterMarkRequire.setWaterMarkPage(0); // 0 means all pages
waterMarkRequire.setWaterMarkPDFBytes(FileUtils.readFileToByteArray(new File("./data/watermark.pdf")));

pattern.setWaterMarkRequire(waterMarkRequire);
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.streamToStream(new FileInputStream("./data/test5.pdf"), new FileOutputStream("./data/test5_with_mark02.pdf"));

boolean result = converter.convert(pattern.getParameter());
assertEquals(true, result);

3.6.3 Adding a Watermark by Base64

WorkableConverter converter = new WorkableConverter();
converter.setConverterType(WaterMarkConverterManager.getInstance());

ConvertPattern pattern = ConvertPatternManager.getInstance();
WaterMarkRequire waterMarkRequire = new WaterMarkRequire();

waterMarkRequire.setWaterMarkPage(0); // 0 means all pages
waterMarkRequire.setWaterMarkPDFBase64(Base64FileTool.FileToBase64(new File("./data/watermark.pdf")));

pattern.setWaterMarkRequire(waterMarkRequire);
pattern.setSrcFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.setDestFilePrefix(DefaultDocumentFormatRegistry.PDF);
pattern.base64ToBase64(Base64FileTool.FileToBase64(new File("./data/test5.pdf")));

boolean result = converter.convert(pattern.getParameter());
String base64Result = pattern.getBase64Result();
Base64FileTool.saveBase64File(base64Result, "./data/test5_with_mark03.pdf");
assertEquals(true, result);

4. To-do items

  • Conversion of doc, docx, and html to PDF has been tested in every conversion mode. Unit tests for other file types have not been written yet and will be considered later.
  • At present, only YAML configuration is supported; support for other configuration types (xml, properties, etc.) will be added later.
  • Markdown is currently very popular, so Markdown to PDF conversion (Markdown -> HTML -> PDF) is under consideration.

5. Notes

  • Because LibreOffice is required, running in containers such as Docker is not recommended (LibreOffice does not currently have a stable Docker release image).
  • If the converted output contains garbled characters or the conversion takes too long, check whether Chinese fonts are installed on the server.
  • After the project starts, the first conversion task takes a long time because of connecting to LibreOffice and other setup. The second and later tasks settle within about 0.5 seconds (the exact time varies with machine configuration).
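To observe the slow first conversion described in the last note, the same task can be timed over several runs. The helper below is a minimal sketch with hypothetical names (ConversionTimer, timeRuns); the task passed in would wrap a real call such as converter.convert(pattern.getParameter()).

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper that times repeated runs of the same task. */
final class ConversionTimer {

    /** Runs the task the given number of times and returns each elapsed time in milliseconds. */
    static long[] timeRuns(BooleanSupplier task, int runs) {
        long[] millis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.getAsBoolean(); // the boolean result is ignored; only the duration matters here
            millis[i] = (System.nanoTime() - start) / 1_000_000;
        }
        return millis;
    }
}
```

Running one throwaway conversion at application startup is a common way to move the first-call cost away from the first real request.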

Added by BlooPanthr on Mon, 05 Aug 2019 09:19:41 +0300