Using Markdown on the front end and optimizing the a tag

Recently I added Markdown syntax support to my own project, mainly relying on the markedjs project. The project is hosted on GitHub at: https://github.com/markedjs/marked/

Installation of the project

After downloading the project, run the following npm command in the root directory to install its dependencies

$ npm install

After installation, the directory structure of the final project is as follows

Let's take a look at the package.json file in the root directory, part of which is shown below. JSON has its own syntax format; see a JSON tutorial for reference.

"scripts": {
    "test": "jasmine --config=jasmine.json",
    "test:all": "npm test && npm run test:lint",
    "test:unit": "npm test -- test/unit/**/*-spec.js",
    "test:specs": "npm test -- test/specs/**/*-spec.js",
    "test:lint": "eslint bin/marked .",
    "test:redos": "node test/vuln-regex.js",
    "test:update": "node test/update-specs.js",
    "rules": "node test/rules.js",
    "bench": "npm run rollup && node test/bench.js",
    "lint": "eslint --fix bin/marked .",
    "build:reset": "git checkout upstream/master lib/marked.js lib/marked.esm.js marked.min.js",
    "build": "npm run rollup && npm run minify",
    "build:docs": "node build-docs.js",
    "rollup": "npm run rollup:umd && npm run rollup:esm",
    "rollup:umd": "rollup -c rollup.config.js",
    "rollup:esm": "rollup -c rollup.config.esm.js",
    "minify": "uglifyjs lib/marked.js -cm  --comments /Copyright/ -o marked.min.js",
    "minifyMessage": "uglifyjs ext/onmpwmessage.js -cm  --comments /Copyright/ -o ext/onmpwmessage.min.js",
    "preversion": "npm run build && (git diff --quiet || git commit -am build)"
  }

Execute the following command

$ npm run build

After the command finishes, a marked.min.js file is generated.

Finally, copy the marked.min.js file into our own project, and then it is ready to use.

Parsing Markdown content with markedjs

Include the marked.min.js file in the page

<script type="text/javascript" src="/js/marked.min.js"></script>

The next step is to parse the content. First, initialize the marked object

marked.setOptions({
    renderer: new marked.Renderer(),
    gfm: true,
    tables: true,
    breaks: false,
    pedantic: false,
    sanitize: false,
    smartLists: true,
    smartypants: false,
    highlight: function (code,lang) {
        //Use the highlight plug-in to parse the code part in the document
        return hljs.highlightAuto(code,[lang]).value;
    }
});
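One detail of the highlight callback above: when a code block has no language, lang is undefined and the subset passed to highlightAuto is [undefined]. A minimal sketch of a more defensive callback follows, assuming the highlight.js functions getLanguage and highlightAuto; makeHighlight is a hypothetical helper name, and hljs is passed in so that the sketch runs without highlight.js installed.

```javascript
// Hypothetical helper: builds a callback for marked.setOptions({ highlight }).
function makeHighlight(hljs) {
  return function highlight(code, lang) {
    // Restrict detection to `lang` only when highlight.js knows that language;
    // otherwise pass no subset and let highlightAuto choose freely.
    const subset = lang && hljs.getLanguage(lang) ? [lang] : undefined;
    return hljs.highlightAuto(code, subset).value;
  };
}
```

It would be passed as highlight: makeHighlight(hljs) in the options object.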

Then call the marked function to parse.

let originText = "[Trace memory guest](https://www.jiyik.com)";
let newText = marked(originText);
console.log(newText);

In practice, we can fetch the Markdown content from the backend via ajax, parse it into HTML with marked, and insert the resulting HTML into the appropriate place on the page.
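The fetch-then-parse flow can be sketched roughly as follows. This is only an illustration: loadAndRender is a hypothetical helper, and the fetch function, the parser and the container element are injected so that the sketch has no dependencies.

```javascript
// Fetches Markdown text, parses it, and writes the HTML into a container.
// fetchText, parse and container are supplied by the caller (assumptions).
async function loadAndRender(url, { fetchText, parse, container }) {
  const markdown = await fetchText(url);
  container.innerHTML = parse(markdown);
  return container.innerHTML;
}

// In a browser it could be wired up like this (URL and element id are made up):
// loadAndRender('/api/article/1', {
//   fetchText: (u) => fetch(u).then((res) => res.text()),
//   parse: marked,
//   container: document.getElementById('content')
// });
```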

How I use Markdown in my project

In my project, I do not convert Markdown on the front end. Instead, after the content has been written in Markdown syntax in the editor, it is converted into HTML by markedjs and stored in the database. What the front end retrieves is the already parsed content, which can be displayed on the page directly.

Optimization of markedJs

Now to the main topic. markedJs is fairly mature, and in my experience its feature set is quite comprehensive. One drawback, probably inherited from the default Markdown syntax, is that an a tag can only open in the current page; there is no syntax for opening a link in a new window. That is to say, the following syntax

[Trace memory guest](https://www.jiyik.com "here is the title")

can only be converted into

<a href="https://www.jiyik.com" title="here is the title">Trace memory guest</a>

If I want the a tag to open in a new window, there is no syntax for it. But you can't abandon markedJs just because of an a tag. Since the project is open source, I decided to see whether I could add this attribute myself.

I have used a total of three methods to add the target attribute

Brute-force addition

This was my first idea. In projects, Markdown syntax is usually used for article content, and links inside an article generally open in a new window. So the attribute target="_blank" is simply added to every parsed a tag.

Following this idea, I went straight to the source code. The simplest approach is to search the whole project for <a, find the place where the a tag is constructed, and add target="_blank" right after it.

Around line 140 of the src/Renderer.js file in the project

let out = '<a href="' + escape(href) + '"';

Add target attribute directly

let out = '<a href="' + escape(href) + '" target="_blank"';

Then execute the command in the root directory

$ npm run build

The generated marked.min.js is then used in the project. From now on, every a tag has the target="_blank" attribute.

The attribute is there, but on reflection this is no better than before: previously no link opened a new window, now every link does. Not being able to control it is the painful part. It would be perfect if this attribute could be controlled in some way.

Using ! to control whether the attribute is added

To control the target attribute, some symbol is needed inside the []() syntax to mark it. The Markdown syntax for the img tag is ![](). Borrowing from the img syntax, I put the exclamation mark inside the brackets, [!], to control the target attribute.

The effects to be achieved are as follows

[Trace memory guest](https://www.jiyik.com) // parsed as <a href="https://www.jiyik.com">Trace memory guest</a>

[!Trace memory guest](https://www.jiyik.com) // parsed as <a href="https://www.jiyik.com" target="_blank">Trace memory guest</a>

Achieving this is different from the first method: simply searching for <a is of no use. Here I opened the marked project in WebStorm and used its debugger to trace the code.

First, configure markedJs in WebStorm so that it can run, by creating a new Node.js run configuration.

Once that is created, you can set a breakpoint in the code and use WebStorm's debugger to trace it.

Of course, it is better not to put the breakpoint in the project's entry file. That makes tracing very painful, because when the call stack is deep it is easy to get lost while stepping through.

Read the source code first and set the breakpoint where you think the code relates to parsing the a tag. After reading the source, I set a breakpoint in the link() method of the src/Tokenizer.js file (on line 474).

Tracing from there, I eventually arrived at the outputLink() function in src/Tokenizer.js, which is implemented as follows:

function outputLink(cap, link, raw) {
  const href = link.href;
  const title = link.title ? escape(link.title) : null;
  const text = cap[1].replace(/\\([\[\]])/g, '$1');

  if (cap[0].charAt(0) !== '!') {
    return {
      type: 'link',
      raw,
      href,
      title,
      text,
    };
  } else {
    return {
      type: 'image',
      raw,
      href,
      title,
      text: escape(text)
    };
  }
}

The text variable in the code holds the text between the brackets, for example "Trace memory guest" in [Trace memory guest](https://www.jiyik.com). If we add an exclamation mark, [!Trace memory guest], the value of text becomes "!Trace memory guest". So we can check text: if its first character is an exclamation mark, set target to "_blank"; otherwise leave target empty. Then add target to the returned object. The modified code is as follows

function outputLink(cap, link, raw) {
  const href = link.href;
  const title = link.title ? escape(link.title) : null;
  const text = cap[1].replace(/\\([\[\]])/g, '$1');

  if (cap[0].charAt(0) !== '!') {
    let a_text = text;
    let target = "";
    if(a_text.charAt(0) === '!') {
      target = "_blank";
      a_text = a_text.substring(1); // Strip the leading ! from the text
    }
    return {
      type: 'link',
      raw,
      href,
      title,
      text:a_text,
      target
    };
  } else {
    return {
      type: 'image',
      raw,
      href,
      title,
      text: escape(text)
    };
  }
}
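The marker check on its own can be tried outside marked. The sketch below isolates just that step; splitTargetMarker is a hypothetical name used for illustration and is not part of marked.

```javascript
// Splits an optional leading "!" off the link text and maps it to a target.
function splitTargetMarker(text) {
  if (text.charAt(0) === '!') {
    return { text: text.substring(1), target: '_blank' };
  }
  return { text, target: '' };
}

console.log(splitTargetMarker('!Trace memory guest'));
console.log(splitTargetMarker('Trace memory guest'));
```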

Continuing to trace the code, we arrive at the place that was patched by brute force in the first method: the renderer's link() method. No brute force this time, because now we have a choice. We add a parameter, target, to the link method.

link(href, title, text, target) {
    href = cleanUrl(this.options.sanitize, this.options.baseUrl, href);
    if (href === null) {
      return text;
    }
    let out = '<a href="' + escape(href) + '"';
    // Truthy check: link tokens created elsewhere (e.g. autolinks) carry no
    // target at all, so it may be undefined rather than ""
    if (target) {
      out += ' target="' + escape(target) + '"';
    }
    if (title) {
      out += ' title="' + title + '"';
    }
    out += '>' + text + '</a>';
    return out;
  }
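The effect of the extra parameter can be checked with a small stand-alone sketch. It does not use marked: escapeHtml is a simplified stand-in for marked's escape helper, renderLink is a hypothetical name, and unlike the method above it escapes the title itself. A truthy check on target means that a missing or empty value adds no attribute.

```javascript
// Simplified stand-in for marked's escape helper (assumption).
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// Builds an a tag; target and title are optional.
function renderLink(href, title, text, target) {
  let out = '<a href="' + escapeHtml(href) + '"';
  if (target) {
    out += ' target="' + escapeHtml(target) + '"';
  }
  if (title) {
    out += ' title="' + escapeHtml(title) + '"';
  }
  return out + '>' + text + '</a>';
}

console.log(renderLink('https://www.jiyik.com', null, 'Trace memory guest', '_blank'));
console.log(renderLink('https://www.jiyik.com', null, 'Trace memory guest'));
```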

Next we find the place where the link method is called: line 219 of the src/Parser.js file

Pass the target parameter where the link method is called

case 'link': {
      out += renderer.link(token.href, token.title, this.parseInline(token.tokens, renderer),token.target);
      break;
    }

At this point all the code changes are done. The next step is to build the project, generate the marked.min.js file, and use it in my project.

I used this for a while and found no problems. But I always felt the approach was not clean enough; not in terms of syntax, but in terms of code. After text has been matched, its first character has to be checked and then the string has to be sliced, which should cost a little efficiency (in practice it makes no difference, but we should strive for excellence, shouldn't we? Please allow me to show off a little). So let's keep optimizing the code, which brings us to the ultimate method.

The ultimate move: modifying the rules

Since we don't want to work on the text afterwards, we need to change the matching rule itself. Again, use WebStorm breakpoint debugging. It turns out that the matching rules for all inline syntax are as follows

const inline = {
  escape: /^\\([!"#$%&'()*+,\-./:;<=>?@\[\]\\^_`{|}~])/,
  autolink: /^<(scheme:[^\s\x00-\x1f<>]*|email)>/,
  url: noopTest,
  tag: '^comment'
    + '|^</[a-zA-Z][\\w:-]*\\s*>' // self-closing tag
    + '|^<[a-zA-Z][\\w-]*(?:attribute)*?\\s*/?>' // open tag
    + '|^<\\?[\\s\\S]*?\\?>' // processing instruction, e.g. <?php ?>
    + '|^<![a-zA-Z]+\\s[\\s\\S]*?>' // declaration, e.g. <!DOCTYPE html>
    + '|^<!\\[CDATA\\[[\\s\\S]*?\\]\\]>', // CDATA section
  link: /^!?\[(label)\]\(\s*(href)(?:\s+(title))?\s*\)/,
  reflink: /^!?\[(label)\]\[(?!\s*\])((?:\\[\[\]]?|[^\[\]\\])+)\]/,
  nolink: /^!?\[(?!\s*\])((?:\[[^\[\]]*\]|\\[\[\]]|[^\[\]])*)\](?:\[\])?/,
  reflinkSearch: 'reflink|nolink(?!\\()',
  emStrong: {
    lDelim: /^(?:\*+(?:([punct_])|[^\s*]))|^_+(?:([punct*])|([^\s_]))/,
    //        (1) and (2) can only be a Right Delimiter. (3) and (4) can only be Left.  (5) and (6) can be either Left or Right.
    //        () Skip other delimiter (1) #***                (2) a***#, a***                   (3) #***a, ***a                 (4) ***#              (5) #***#                 (6) a***a
    rDelimAst: /\_\_[^_]*?\*[^_]*?\_\_|[punct_](\*+)(?=[\s]|$)|[^punct*_\s](\*+)(?=[punct_\s]|$)|[punct_\s](\*+)(?=[^punct*_\s])|[\s](\*+)(?=[punct_])|[punct_](\*+)(?=[punct_])|[^punct*_\s](\*+)(?=[^punct*_\s])/,
    rDelimUnd: /\*\*[^*]*?\_[^*]*?\*\*|[punct*](\_+)(?=[\s]|$)|[^punct*_\s](\_+)(?=[punct*\s]|$)|[punct*\s](\_+)(?=[^punct*_\s])|[\s](\_+)(?=[punct*])|[punct*](\_+)(?=[punct*])/ // ^- Not allowed for _
  },
  code: /^(`+)([^`]|[^`][\s\S]*?[^`])\1(?!`)/,
  br: /^( {2,}|\\)\n(?!\s*$)/,
  del: noopTest,
  text: /^(`+|[^`])(?:(?= {2,}\n)|[\s\S]*?(?:(?=[\\<!\[`*_]|\b_|$)|[^ ](?= {2,}\n)))/,
  punctuation: /^([\spunctuation])/
};

Here we only care about the link rules

link: /^!?\[(label)\]\(\s*(href)(?:\s+(title))?\s*\)/

It turns out that our lovely target was never taken into account in the first place.

Since it isn't there, let's add it ourselves. The rule is modified as follows

link: /^!?\[(target)(label)\]\(\s*(href)(?:\s+(title))?\s*\)/,

This is not enough. target, label, href and title are only placeholders that describe what belongs at each position; as written, this regex cannot match anything. There must be something else further down. So I kept searching and finally found the following code

inline._label = /(?:\[(?:\\.|[^\[\]\\])*\]|\\.|`[^`]*`|[^\[\]\\`])*?/;
inline._href = /<(?:\\.|[^\n<>\\])+>|[^\s\x00-\x1f]*/;
inline._title = /"(?:\\"?|[^"\\])*"|'(?:\\'?|[^'\\])*'|\((?:\\\)?|[^)\\])*\)/;

inline.link = edit(inline.link)
  .replace('label', inline._label)
  .replace('href', inline._href)
  .replace('title', inline._title)
  .getRegex();

Aha, that's it. To keep such a long regex readable, the rule uses placeholders to describe each part, and the program then substitutes the real patterns itself. Quite considerate; credit where it's due. That makes things easy: we add the target placeholder in the same way, together with a regex that matches our exclamation mark.

inline._target = /!?/;

inline.link = edit(inline.link)
  .replace('target',inline._target)
  .replace('label', inline._label)
  .replace('href', inline._href)
  .replace('title', inline._title)
  .getRegex();
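The placeholder substitution can be tried in isolation. The sketch below is not marked's code: edit is a simplified re-implementation of the idea, and the three sub-patterns are deliberately reduced versions (no titles, no escapes) whose names are made up for this example.

```javascript
// Simplified stand-in for marked's edit() helper: swaps named placeholders
// in a regex source for the source of other regexes.
function edit(regex) {
  let source = regex.source;
  const builder = {
    replace(name, value) {
      // A replacer function keeps "$" in value.source from being interpreted
      source = source.replace(name, () => value.source);
      return builder;
    },
    getRegex() {
      return new RegExp(source);
    }
  };
  return builder;
}

// Deliberately simplified sub-patterns (not marked's real ones)
const simpleTarget = /!?/;
const simpleLabel = /[^\[\]]*?/;
const simpleHref = /[^\s()]*/;

const simpleLink = edit(/^!?\[(target)(label)\]\((href)\)/)
  .replace('target', simpleTarget)
  .replace('label', simpleLabel)
  .replace('href', simpleHref)
  .getRegex();

const withMarker = simpleLink.exec('[!Trace memory guest](https://www.jiyik.com)');
const withoutMarker = simpleLink.exec('[Trace memory guest](https://www.jiyik.com)');
console.log(withMarker[1], '|', withMarker[2], '|', withMarker[3]);
console.log(JSON.stringify(withoutMarker[1]), '|', withoutMarker[2]);
```

With the marker present, group 1 holds the exclamation mark and group 2 the label; without it, group 1 is an empty string and the label is still in group 2.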

Because I want to capture the match, I put parentheses around the target placeholder in the rule above. This is regular expression knowledge, so regular expressions really do matter: without them there would be no ultimate move, and we would have stopped at the second method. Perhaps that is some motivation to learn regular expressions.

Next, we modify the outputLink() function from the second method once more

function outputLink(cap, link, raw) {
  const href = link.href;
  const title = link.title ? escape(link.title) : null;
  const text = cap[2].replace(/\\([\[\]])/g, '$1');
  const target = cap[1] === '!' ? '_blank' : '';

  if (cap[0].charAt(0) !== '!') {
    return {
      type: 'link',
      raw,
      href,
      title,
      text,
      target
    };
  } else {
    return {
      type: 'image',
      raw,
      href,
      title,
      text: escape(text)
    };
  }
}

It looks simpler now. However, changing only this is not enough: because we added an extra capture group to the regex, the group indexes previously used for text, href and title must each be increased by 1. Where else needs to change? Looking further, there is another link method, different from the renderer's link method with parameters. This one is in the src/Tokenizer.js file.

link(src) {
    const cap = this.rules.inline.link.exec(src);
    if (cap) {
      const trimmedUrl = cap[3].trim(); // Originally: cap[2].trim()
      if (!this.options.pedantic && /^</.test(trimmedUrl)) {
        // commonmark requires matching angle brackets
        if (!(/>$/.test(trimmedUrl))) {
          return;
        }

        // ending angle bracket cannot be escaped
        const rtrimSlash = rtrim(trimmedUrl.slice(0, -1), '\\');
        if ((trimmedUrl.length - rtrimSlash.length) % 2 === 0) {
          return;
        }
      } else {
        // find closing parenthesis
        // Originally: const lastParenIndex = findClosingBracket(cap[2], '()');
        const lastParenIndex = findClosingBracket(cap[3], '()');
        if (lastParenIndex > -1) {
          const start = cap[0].indexOf('!') === 0 ? 5 : 4;
          // The bracket content is now target (cap[1]) plus label (cap[2]);
          // originally: start + cap[1].length + lastParenIndex
          const linkLen = start + cap[1].length + cap[2].length + lastParenIndex;

          // Originally: cap[2] = cap[2].substring(0, lastParenIndex);
          cap[3] = cap[3].substring(0, lastParenIndex);
          cap[0] = cap[0].substring(0, linkLen).trim();
          cap[4] = ''; // Originally: cap[3] = '';
        }
        }
      }
      let href = cap[3]; // Originally let href = cap[2];
      let title = '';
      if (this.options.pedantic) {
        // split pedantic href and title
        const link = /^([^'"]*[^\s])\s+(['"])(.*)\2/.exec(href);

        if (link) {
          href = link[1];
          title = link[3];
        }
      } else {
        // Originally: title = cap[3] ? cap[3].slice(1, -1) : '';
        title = cap[4] ? cap[4].slice(1, -1) : '';
      }

      href = href.trim();
      if (/^</.test(href)) {
        if (this.options.pedantic && !(/>$/.test(trimmedUrl))) {
          // pedantic allows starting angle bracket without ending angle bracket
          href = href.slice(1);
        } else {
          href = href.slice(1, -1);
        }
      }
      return outputLink(cap, {
        href: href ? href.replace(this.rules.inline._escapes, '$1') : href,
        title: title ? title.replace(this.rules.inline._escapes, '$1') : title
      }, cap[0]);
    }
  }
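Shifting every numeric index by hand is easy to get wrong. One way to avoid the renumbering, sketched here outside marked, is to use named capture groups, so that adding a group does not move the others. This is only an illustration: namedLink uses simplified sub-patterns (double-quoted titles only), parseLink is a hypothetical function, and how such groups would fit into marked's placeholder-based rules is not covered.

```javascript
// Simplified link pattern with named groups (not marked's real rule)
const namedLink = /^!?\[(?<target>!?)(?<label>[^\[\]]*?)\]\(\s*(?<href>[^\s()]*)(?:\s+"(?<title>[^"]*)")?\s*\)/;

// Returns the parts of an inline link, or null when src is not a link.
function parseLink(src) {
  const cap = namedLink.exec(src);
  if (!cap) {
    return null;
  }
  const { target, label, href, title } = cap.groups;
  return {
    text: label,
    href,
    title: title === undefined ? null : title,
    target: target === '!' ? '_blank' : ''
  };
}

console.log(parseLink('[!Trace memory guest](https://www.jiyik.com "here is the title")'));
```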

The other code changed in the second method (the renderer's link() method and the call in the parser) stays as it is.

That's the end of the ultimate move. Run the build command to generate the marked.min.js file and you're done.

Added by markbm on Sun, 27 Feb 2022 04:42:25 +0200