[Advanced] PDF generation (with watermarks), PDF preview (paginated), PDF printing: a one-stop solution for the whole stack

Preface

Every front-end developer runs into PDF-related requirements sooner or later, but most articles found online cover only a single feature, and it is hard to find a complete solution that fits your own needs. Drawing on my own work experience, I have put together a complete technical solution covering front-end PDF generation, PDF preview, and PDF printing. If you find it useful, bookmark this article for future reference.

Demo code for this article: https://github.com/Alansad/pdfArticle

PDF generation

1. Comparing the approaches

Generally speaking, there are two ways to generate a PDF: on the client or on the server. I recommend generating it on the server.

On the client, the PDF is usually generated from a canvas:

1. Use the html2canvas library to convert the HTML into a canvas object
2. Use the canvas.toDataURL method to convert the canvas into an image
3. Use the jsPDF library to convert the image into a PDF
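
As a rough sketch, the three steps above might look like the following in the browser. It assumes html2canvas and jsPDF have been loaded through script tags (so that `html2canvas` and `window.jspdf` are available as globals). The function names `fitToWidth` and `exportElementAsPdf` are made up for this illustration.

```javascript
// Scale an image to a target page width while keeping its aspect ratio.
// Pure helper with a hypothetical name.
function fitToWidth(imgWidth, imgHeight, pageWidth) {
  const ratio = pageWidth / imgWidth
  return {width: pageWidth, height: imgHeight * ratio}
}

// Browser-only sketch: assumes the html2canvas and jsPDF scripts are loaded.
async function exportElementAsPdf(element, fileName) {
  // 1. Render the DOM element into a canvas
  const canvas = await html2canvas(element)
  // 2. Convert the canvas into a JPEG data URL
  const imgData = canvas.toDataURL('image/jpeg', 1.0)
  // 3. Place the image on an A4 page (210mm wide) and save the pdf
  const {jsPDF} = window.jspdf
  const doc = new jsPDF({unit: 'mm', format: 'a4'})
  const size = fitToWidth(canvas.width, canvas.height, 210)
  doc.addImage(imgData, 'JPEG', 0, 0, size.width, size.height)
  doc.save(fileName)
}
```

Because the page content ends up as a bitmap, text in the resulting PDF cannot be selected, and this sketch does not split content that is taller than one page.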

Although this approach looks simple, it has two serious drawbacks:

1. The generated PDF is blurry
2. The client cannot store the PDF long term

Therefore, I recommend the second approach, generating the PDF on the server:


1. Generate an HTML string
2. Open the HTML in a headless browser
3. Generate the PDF by capturing the rendered page

Some server-side libraries hide the headless browser steps (opening the page and capturing it) inside a black box, so developers never see them. However, whether you use Java, Node.js, Python, or another language, the approach is generally the same. PDFs generated this way are sharp and reproduce the page with high fidelity.
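
To make the three steps concrete, here is a minimal sketch that uses Puppeteer as the headless browser. Puppeteer is used here only for illustration; the example later in this article uses html-pdf instead. The function name `renderPdf` is hypothetical, and the puppeteer package is assumed to be installed.

```javascript
// Hypothetical helper: render an HTML string to pdf bytes with Puppeteer.
async function renderPdf(html) {
  // Required lazily so that this file still loads where puppeteer is absent
  const puppeteer = require('puppeteer')
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    // Wait until network activity settles so that images have loaded
    await page.setContent(html, {waitUntil: 'networkidle0'})
    // Export the rendered page as an A4 pdf, including background colors
    return await page.pdf({format: 'A4', printBackground: true})
  } finally {
    // Always close the browser, even if rendering fails
    await browser.close()
  }
}
```

Launching a browser for every request is the simplest option but also the slowest; a long-running service would more likely keep one browser instance open and create a new page per request.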

2. Implementation

Let me walk through a concrete case to explain the technical details of this approach.

Requirement: provide an endpoint that generates a PDF, renders different PDFs according to the request parameters, and returns the PDF file as a URL.

Analysis: to meet this requirement, we first need an HTML template, then fill in the HTML from the request parameters, convert the HTML into a PDF with a headless browser, store the PDF on a file service, and finally return the URL to the front end.

Code implementation:

The example code below uses plain Node.js to make it easier to follow:

1. First, we prepare the HTML string template:

// HTML template; the heading text comes from params.title
const getHtml = (params) => {
  const {
    title = ' ',
  } = params
  return (`
<!DOCTYPE html>
<html lang="zh-CN">
<head>
  <meta http-equiv="content-type" content="text/html;charset=utf-8">
  <title>demo</title>
</head>
<body>
<div class="wrapper">
  <h1 style="color:red">${title}</h1>
  <div>
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
    <img src="https://gimg2.baidu.com/image_search/src=http%3A%2F%2Fn1-q.mafengwo.net%2Fs6%2FM00%2FFC%2FCC%2FwKgB4lNzI2yAK4tdAAELj6RBVtE37.jpeg%3FimageMogr2%252Fthumbnail%252F%21310x207r%252Fgravity%252FCenter%252Fcrop%252F%21310x207%252Fquality%252F90&refer=http%3A%2F%2Fn1-q.mafengwo.net&app=2002&size=f9999,10000&q=a80&n=0&g=0n&fmt=jpeg?sec=1627634331&t=0efacd9a64806ffc74c5cdfa8f7f261f" alt="">
    <img src="https://img1.baidu.com/it/u=1361135963,570304265&fm=26&fmt=auto&gp=0.jpg" alt="">
  </div>
</div>
</body>
</html>
  `)
}

2. Next, we use the npm package html-pdf to convert the HTML into a PDF (html-pdf documentation: https://www.npmjs.com/package/html-pdf)

const pdf = require('html-pdf')

// Parameters for generating pdf
const optionDefault = {
 'format': 'A4',
 'header': {
   'height': '10mm',
   'contents': '',
 }
}

// Convert html to pdf
const exportPdf = (html, options = optionDefault) => {
 return new Promise((resolve, reject) => {
   pdf.create(html, options).toBuffer((err, res) => {
     if (err) {
       reject(err)
     } else {
       resolve(res)
     }
   })
 })
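}

// Export both helpers. This assumes getHtml lives in the same file
// (utils/htmlToPdf.js), as the require call in the server code suggests.
module.exports = {getHtml, exportPdf}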

3. Finally, we start an HTTP server and write an endpoint that returns the PDF:

const http = require('http')
const url = require('url')
const querystring = require("querystring")
const {getHtml, exportPdf} = require('./utils/htmlToPdf')
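http.createServer(async (request, response) => {
  const {query, pathname} = url.parse(request.url)
  const {title} = querystring.parse(query)
  if (pathname !== '/') {
    // Answer unknown paths instead of leaving the request hanging
    response.writeHead(404)
    response.end()
    return
  }
  try {
    // Generate the pdf first so that a failure can still be reported as an error status
    const html = getHtml({title})
    const pdf = await exportPdf(html)
    response.writeHead(200, {
      'Content-Type': 'application/pdf',
      'Access-Control-Allow-Origin': '*'
    })
    response.end(pdf)
  } catch (err) {
    response.writeHead(500)
    response.end()
  }
}).listen(8888)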

4. The example above returns the PDF directly as a buffer. If you need to upload it to a storage service (Alibaba Cloud's storage service, for example), you can use pdf.create(html, options).toStream to obtain the PDF as a stream and then upload it with a POST request.

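// Convert html to a pdf stream, following the same pattern as exportPdf above
// (exportPdfStream is an illustrative name)
const exportPdfStream = (html, options = optionDefault) => {
  return new Promise((resolve, reject) => {
    pdf.create(html, options).toStream((err, res) => {
      if (err) {
        reject(err)
      } else {
        resolve(res)
      }
    })
  })
}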
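
The upload step itself could be sketched as follows. This is only an illustration over plain HTTP: the name `uploadStream` and the `uploadUrl` parameter are hypothetical, and a real storage service such as Alibaba Cloud's normally requires signed requests or its own SDK, which this sketch does not cover.

```javascript
const http = require('http')

// Hypothetical helper: POST a readable stream to an upload endpoint.
const uploadStream = (stream, uploadUrl) => {
  return new Promise((resolve, reject) => {
    const req = http.request(uploadUrl, {
      method: 'POST',
      headers: {'Content-Type': 'application/pdf'}
    }, (res) => {
      // Collect the response body so that the caller can read the returned URL
      let body = ''
      res.setEncoding('utf8')
      res.on('data', (chunk) => { body += chunk })
      res.on('end', () => resolve({status: res.statusCode, body}))
    })
    req.on('error', reject)
    stream.on('error', reject)
    // pipe() ends the request once the stream has been fully read
    stream.pipe(req)
  })
}
```

Streaming avoids holding the whole PDF in memory, which matters for large documents or many concurrent requests.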
    

Once you understand how this approach works, adding a watermark to the PDF becomes very simple:

Because this approach works by capturing the HTML page, we only need to add the watermark to the HTML page. Many watermark implementations are available online; just add a script to the HTML that draws the watermark.
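
As one possible sketch of this idea, the helper below injects a semi-transparent layer of repeated text into the HTML string before it is converted. It uses static markup rather than a script, the name `addWatermark` and its parameters are made up for this example, and how the CSS renders depends on the headless browser behind your PDF library, so treat the styles as a starting point.

```javascript
// Hypothetical helper: inject a repeated text watermark into an html string.
const addWatermark = (html, text, count = 12) => {
  // Escape characters that would otherwise be parsed as markup
  const safeText = String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
  const item = `<span style="display:inline-block;width:33%;margin:40px 0;text-align:center;-webkit-transform:rotate(-30deg);transform:rotate(-30deg)">${safeText}</span>`
  const layer = `<div style="position:fixed;top:0;left:0;width:100%;height:100%;opacity:0.15;font-size:20px;z-index:9999;pointer-events:none">${item.repeat(count)}</div>`
  // Insert the layer just before </body>, or append it if there is no body close tag
  return html.includes('</body>')
    ? html.replace('</body>', () => `${layer}</body>`)
    : html + layer
}
```

With the template from this article, it could be called as `addWatermark(getHtml({title}), 'Confidential')` before the result is passed to `exportPdf`.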

Beyond the features above, there are two caveats:

1. Because the approach relies on a headless browser, PDF generation speed depends directly on how fast the browser loads the HTML. If generation takes too long, consider having clients fetch the PDF asynchronously. In addition, under high concurrency or with very large HTML pages, watch the service's memory usage. It is best to separate PDF generation from the business code and deploy it on different servers.
2. When the browser renders Chinese text in the HTML, it relies on Chinese fonts. If Chinese text does not show up in the PDF, install Chinese fonts on the system. If you would rather not install fonts on the system, you can specify a Chinese font yourself:

@font-face {
  font-family: pdfZh;
  src: url("http://localhost:3000/pdf_zh.ttf");
}
body {
  font-family: pdfZh;
}
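
For the first point above, asynchronous retrieval could be sketched as a small job table: the server returns a job id right away, and the client polls for the result. Everything here is an assumption for illustration, including the names `createJob` and `getJob` and the in-memory `Map`, which would not survive a restart or work across several servers.

```javascript
const crypto = require('crypto')

// In-memory job table; a real service would likely use a database or a queue
const jobs = new Map()

// Start a job and return its id immediately, without waiting for the pdf.
// `generate` is any function that returns a value or a Promise,
// for example one that renders a pdf, uploads it, and resolves with its url.
const createJob = (generate) => {
  const id = crypto.randomUUID()
  jobs.set(id, {status: 'pending', result: null, error: null})
  Promise.resolve()
    .then(generate)
    .then((result) => jobs.set(id, {status: 'done', result, error: null}))
    .catch((err) => jobs.set(id, {status: 'failed', result: null, error: String(err)}))
  return id
}

// Look up the current state of a job; clients poll this until it is done
const getJob = (id) => jobs.get(id) || null
```

The HTTP layer would then expose one endpoint that calls `createJob` and returns the id, and another that returns `getJob(id)`.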

PDF preview

PDF preview works by rendering the PDF onto a canvas. The most popular library for this is PDF.js, which I will introduce using an official example. The result lets you page through the PDF.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Title</title>
  <!--<script src="./pdf.js"></script>-->
  <script src="//mozilla.github.io/pdf.js/build/pdf.js"></script>
  <style>
    #the-canvas {
      border: 1px solid black;
      direction: ltr;
    }
  </style>
</head>
<body>
<div>
  <button id="prev">previous page</button>
  <button id="next">next page</button>
  &nbsp; &nbsp;
  <span>Page: <span id="page_num"></span> / <span id="page_count"></span></span>
</div>
<canvas id="the-canvas" style="width: 100%; height: auto"></canvas>
</body>
<script>
  // If absolute URL from the remote server is provided, configure the CORS
  // header on that server.
  // const url = 'http://localhost:8888/?title=123'
  var url = 'https://raw.githubusercontent.com/mozilla/pdf.js/ba2edeae/web/compressed.tracemonkey-pldi-09.pdf';

  // Loaded via <script> tag, create shortcut to access PDF.js exports.
  const pdfjsLib = window['pdfjs-dist/build/pdf'];

  // The workerSrc property shall be specified.
  pdfjsLib.GlobalWorkerOptions.workerSrc = '//mozilla.github.io/pdf.js/build/pdf.worker.js';

  var pdfDoc = null,
    pageNum = 1,
    pageRendering = false,
    pageNumPending = null,
    scale = 3,
    canvas = document.getElementById('the-canvas'),
    ctx = canvas.getContext('2d');
  /**
   * Get page info from document, resize canvas accordingly, and render page.
   * @param num Page number.
   */
  function renderPage(num) {
    pageRendering = true;
    // Using promise to fetch the page
    pdfDoc.getPage(num).then(function(page) {
      var viewport = page.getViewport({scale: scale});
      canvas.height = viewport.height;
      canvas.width = viewport.width;

      // Render PDF page into canvas context
      var renderContext = {
        canvasContext: ctx,
        viewport: viewport
      };
      var renderTask = page.render(renderContext);

      // Wait for rendering to finish
      renderTask.promise.then(function() {
        pageRendering = false;
        if (pageNumPending !== null) {
          // New page rendering is pending
          renderPage(pageNumPending);
          pageNumPending = null;
        }
      });
    });

    // Update page counters
    document.getElementById('page_num').textContent = num;
  }
  /**
   * If another page rendering is in progress, waits until the rendering is
   * finished. Otherwise, executes rendering immediately.
   */
  function queueRenderPage(num) {
    if (pageRendering) {
      pageNumPending = num;
    } else {
      renderPage(num);
    }
  }

  /**
   * Displays previous page.
   */
  function onPrevPage() {
    if (pageNum <= 1) {
      return;
    }
    pageNum--;
    queueRenderPage(pageNum);
  }
  document.getElementById('prev').addEventListener('click', onPrevPage);

  /**
   * Displays next page.
   */
  function onNextPage() {
    if (pageNum >= pdfDoc.numPages) {
      return;
    }
    pageNum++;
    queueRenderPage(pageNum);
  }
  document.getElementById('next').addEventListener('click', onNextPage);

  /**
   * Asynchronously downloads PDF.
   */
  pdfjsLib.getDocument(url).promise.then(function(pdfDoc_) {
    pdfDoc = pdfDoc_;
    document.getElementById('page_count').textContent = pdfDoc.numPages;

    // Initial/first page rendering
    renderPage(pageNum);
  });
</script>
</html>

The PDF.js documentation is at https://github.com/mozilla/pdf.js. Note that in theory, the larger the scale value, the sharper the display. However, if it is set too high, rendering may freeze.
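
One way to balance sharpness against rendering cost is to derive the scale from the available width and the device pixel ratio, with an upper bound. The helper below is a sketch under that assumption; the name `pickScale` and the default cap of 3 are illustrative and not part of PDF.js.

```javascript
// Hypothetical helper: choose a render scale for a pdf page.
// pageWidth: page width at scale 1, e.g. page.getViewport({scale: 1}).width
// containerWidth: CSS width available for the canvas
// pixelRatio: e.g. window.devicePixelRatio
// maxScale: upper bound that keeps the canvas from growing too large
function pickScale(pageWidth, containerWidth, pixelRatio = 1, maxScale = 3) {
  // Scale at which the page exactly fills the container
  const fitScale = containerWidth / pageWidth
  // Multiply by the pixel ratio for sharp output on high-density screens
  return Math.min(fitScale * pixelRatio, maxScale)
}
```

In the example above, the result would replace the fixed `scale = 3` passed to `page.getViewport`.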

PDF printing

In the browser, we are not allowed to connect to a printer directly and print files, because that would pose serious security risks. A common requirement is instead to open the browser's print dialog so that users can adjust the settings and print by themselves.

The code below automatically opens the print dialog for a PDF.

The PDF is converted to an object URL here so that cross-origin problems are handled in one uniform way: the object URL belongs to the page's own origin, so the page is allowed to call print() on the iframe.

The code is as follows:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Title</title>
</head>
<body>
<iframe id="frame-result" style="height: 100vh;width: 100vw;"></iframe>
</body>
<script>
  const downloadRes = async () => {
    const response = await fetch('http://localhost:8888/?title=123')
    // Convert the response body into a blob and expose it through an object URL
    const blob = await response.blob()
    const iframeEle = document.querySelector('#frame-result')
    if (iframeEle) {
      // Register the handler before setting src so that the load event is not missed
      iframeEle.onload = () => {
        iframeEle.contentWindow.print()
      }
      iframeEle.src = URL.createObjectURL(new Blob([blob], {type: 'application/pdf'}))
    }
  }
  downloadRes()
</script>
</html>

If you need to open the print dialog without displaying the PDF, hide the iframe:

<iframe id="frame-result" style="display: none"></iframe>


If you need to open a new window to print, you can use the following code:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>Title</title>
</head>
<body>
</body>
<script>
  const downloadRes = async () => {
    const response = await fetch('http://localhost:8888/?title=123')
    // Convert the response body into a blob and open its object URL in a new window
    const blob = await response.blob()
    const newWindow = window.open(URL.createObjectURL(new Blob([blob], {type: 'application/pdf'})))
    // window.open returns null if the browser blocks the popup
    if (newWindow) {
      newWindow.onload = () => {
        newWindow.print()
      }
    }
  }
  downloadRes()
</script>
</html>

Summary

This article presented a complete solution covering PDF generation, PDF preview, and PDF printing. Feedback and corrections are welcome. Readers interested in file streams and headless browsers can also follow me; I will cover their practical use at work in detail in later articles.

Keywords: Javascript Front-end Vue html

Added by taldos on Tue, 01 Mar 2022 04:29:02 +0200