In an intrusion detection service, the alarm details for file-content alerts need to show the hit content together with the lines above and below it.
The first implementation
Let's first look at an implementation:
/**
 * Created by zhangli on 19-12-18.
 * Highlight text tool class
 */
public class HighLightUtils {

    private static final Integer LINE_NUM = 10;
    private static final int MAX_REGEX_NUM = 10;

    /**
     * @param content  Text content
     * @param keywords Keyword list
     * @return Highlighted content fragments
     */
    public static List<MatchedContent> highlight(String content, List<String> keywords) {
        if (StringUtils.isEmpty(content) || CollectionUtils.isEmpty(keywords)) {
            return Collections.emptyList();
        }
        List<MatchedContent> partContentList = Lists.newArrayList();
        for (String keyword : keywords) {
            if (!content.contains(keyword)) {
                continue;
            }
            partContentList.addAll(highlight(content, escapeRegexSpecialWord(keyword)));
        }
        return partContentList;
    }

    /**
     * @param content Text content
     * @param regex   Regular expression
     * @return Highlighted content fragments
     */
    public static List<MatchedContent> highlight(String content, String regex) {
        return highlight(content, regex, MAX_REGEX_NUM, LINE_NUM);
    }

    public static List<MatchedContent> highlight(String content, String regex, int maxMatchNum, int lineNum) {
        if (StringUtils.isEmpty(content) || StringUtils.isEmpty(regex)) {
            return Collections.emptyList();
        }
        content = content.replaceAll("\\r\\n", "\n");
        Pattern pattern = Pattern.compile(regex);
        Matcher m = pattern.matcher(content);
        List<MatchedContent> partContentList = Lists.newArrayList();
        int maxNum = maxMatchNum;
        while (m.find()) {
            RegexMatchPoint regexMatchPoint = new RegexMatchPoint(m.start(), m.end());
            partContentList.add(getPartContentMap(content, regexMatchPoint, lineNum));
            if (--maxNum == 0) {
                break;
            }
        }
        return partContentList;
    }

    /**
     * Get the highlighted content and its starting line for one regex match
     */
    private static MatchedContent getPartContentMap(String content, RegexMatchPoint m, int lineNum) {
        // Line numbers in the file where the match starts and ends
        int startMatchLine = content.substring(0, m.getStart()).split("\\n").length;
        int endMatchLine = content.substring(0, m.getEnd()).split("\\n").length;
        // Highlight the match within the file content
        String highlightContent = highlightOneRegexContent(content, m);
        // Extract 20 lines in total around the match (if the match spans more than 10 lines,
        // the extract starts at the line where the match starts)
        String partContent = getPartContent(highlightContent, startMatchLine, endMatchLine);
        // Line number of the first line of the extracted content
        int startLine = endMatchLine - lineNum + 1;
        // If the match spans more than 10 lines, start from the first matched line instead of the fixed 10 lines
        if (startMatchLine < startLine) {
            startLine = startMatchLine;
        }
        return MatchedContent.builder()
                .startLine(startLine < 1 ? 1 : startLine)
                .partContent(partContent)
                .build();
    }

    /**
     * Get the content before and after the highlighted lines
     */
    private static String getPartContent(String content, Integer startMatchLine, Integer endMatchLine) {
        int start = StringUtils.ordinalIndexOf(content, "\n", endMatchLine - LINE_NUM);
        if (endMatchLine - startMatchLine > LINE_NUM) {
            start = StringUtils.ordinalIndexOf(content, "\n", startMatchLine - 1);
        }
        start = start < 0 ? 0 : start + 1;
        int end = StringUtils.ordinalIndexOf(content, "\n", endMatchLine + LINE_NUM);
        end = end < 0 ? content.length() : end;
        return content.substring(start, end);
    }

    /**
     * Highlight a single match
     */
    private static String highlightOneRegexContent(String content, RegexMatchPoint point) {
        int start = 0;
        StringBuffer highlightContentSb = new StringBuffer();
        highlightContentSb.append(content.substring(start, point.getStart())).append(CommonValues.HIGH_LIGHT_START)
                .append(content.substring(point.getStart(), point.getEnd())).append(CommonValues.HIGH_LIGHT_END)
                .append(content.substring(point.getEnd()));
        return highlightContentSb.toString();
    }

    private static String escapeRegexSpecialWord(String keyword) {
        // Compare by content; the reference comparison (keyword != "") does not test for an empty string
        if (!keyword.isEmpty()) {
            String[] fbsArr = { "\\", "$", "(", ")", "*", "+", ".", "[", "]", "?", "^", "{", "}", "|" };
            for (String key : fbsArr) {
                if (keyword.contains(key)) {
                    keyword = keyword.replace(key, "\\" + key);
                }
            }
        }
        return keyword;
    }

    @Setter
    @Getter
    @ToString
    public static class RegexMatchPoint implements Comparable<RegexMatchPoint> {

        private Integer start;
        private Integer end;

        public RegexMatchPoint(Integer start, Integer end) {
            this.start = start;
            this.end = end;
        }

        // Sort by start position, then by end position
        @Override
        public int compareTo(RegexMatchPoint o) {
            if (start.compareTo(o.getStart()) == 0) {
                return end.compareTo(o.getEnd());
            } else {
                return start.compareTo(o.getStart());
            }
        }

        public RegexMatchPoint copy() {
            return new RegexMatchPoint(start, end);
        }
    }
}
This implementation is a reasonable starting point: it works, and it is a good basis for improvement.
So, what's the problem?
- It highlights one match at a time. When several matches sit on the same line or on adjacent lines, each produces its own fragment;
- It keeps no record of line numbers. Everything related to lines is recomputed through split("\n") and substring(start, end);
- As a result, merging overlapping hit content is difficult.
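To make the second point concrete, here is a minimal sketch of why deriving line numbers from `substring` and `split` is fragile. `lineBySplit` mirrors the expression `content.substring(0, m.getStart()).split("\\n").length` from the code above; the class and method names (`LineNumberPitfall`, `lineBySplit`, `lineByCounting`) are hypothetical and exist only for this illustration. Java's `String.split` drops trailing empty strings, so a match that begins exactly at the start of a line is attributed to the previous line.

```java
class LineNumberPitfall {

    /** Line number as the first implementation computes it: count the pieces of the prefix. */
    static int lineBySplit(String content, int offset) {
        return content.substring(0, offset).split("\\n").length;
    }

    /** 1-based line number obtained by counting the newlines that precede the offset. */
    static int lineByCounting(String content, int offset) {
        int line = 1;
        for (int i = 0; i < offset; i++) {
            if (content.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    public static void main(String[] args) {
        String content = "first\nsecond hit\nthird";
        int offset = content.indexOf("second"); // the match starts at the beginning of line 2
        System.out.println(lineBySplit(content, offset));    // prints 1
        System.out.println(lineByCounting(content, offset)); // prints 2
    }
}
```

Both approaches also rescan the prefix of the file for every match, which is another reason to record line numbers once, up front.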
Separation of construction and use
What does separating construction from use mean? During construction, extract all the information that will be needed; during use, process that information, rather than "building while using". A compiler is a useful analogy: it typically analyzes the source first and generates code from the result, instead of generating code while it is still analyzing.
An implementation that builds while using couples construction and processing together, so any later change becomes harder.
So, to separate construction from use, what do we need to obtain first? Two things: the line number, start position, and end position of every hit, and all lines of the file together with their line numbers. Once this necessary information has been identified and extracted, a clear algorithm follows naturally.
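As a rough illustration of the construction phase, the sketch below scans each line once and records every hit as a (line number, start, end) triple, without doing any highlighting yet. `MatchExtraction`, `MatchPoint`, and `extract` are hypothetical names chosen for this example, and line numbers are assumed to be 0-based.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchExtraction {

    /** Hypothetical holder for one hit: 0-based line number plus offsets within that line. */
    static final class MatchPoint {
        final int lineNo;
        final int start;
        final int end;

        MatchPoint(int lineNo, int start, int end) {
            this.lineNo = lineNo;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "(" + lineNo + ", " + start + ", " + end + ")";
        }
    }

    /** Construction phase: record where the pattern matches, line by line. */
    static List<MatchPoint> extract(List<String> lines, Pattern pattern) {
        List<MatchPoint> points = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = pattern.matcher(lines.get(i));
            while (m.find()) {
                points.add(new MatchPoint(i, m.start(), m.end()));
            }
        }
        return points;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList("foo bar", "nothing here", "bar bar");
        System.out.println(extract(lines, Pattern.compile("bar")));
        // prints [(0, 4, 7), (2, 0, 3), (2, 4, 7)]
    }
}
```

Everything the later steps need (which lines were hit, and where) is now available as plain data, so highlighting and context selection can be changed independently.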
A brief introduction to the algorithm of merging and displaying highlighted content
Step 1: get all lines together with their line numbers: [line number, line content];
Step 2: find the line number and the start and end offsets of every string that matches the regular expression: regexMatchPoints = (lineNo, start, end);
Step 3: group regexMatchPoints by line, because merging several separately highlighted matches within one line afterwards is very troublesome;
Step 4: sort the matched line numbers so that the final result is displayed in line order;
Step 5: generate the highlighted content line by line: [line number, highlighted line content];
Step 6: for each matched line number, calculate the start and end line numbers of the context to display. A matched line that already falls within an earlier range is skipped (merged into that range);
Step 7: fetch the line content for every pair of start and end line numbers.
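Step 6 is the least obvious of these, so here is a small worked sketch of it. It is a simplification under stated assumptions, not the exact arithmetic of the `merge` method in the implementation: it assumes 0-based line numbers, a symmetric context of `context` lines on each side, and half-open [start, end) windows. `ContextWindows` and `windows` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ContextWindows {

    /**
     * Returns half-open [start, end) windows of 0-based line numbers.
     * The input must be sorted ascending; a matched line that already lies
     * inside the most recent window does not open a new one.
     */
    static List<int[]> windows(List<Integer> sortedMatchLines, int context, int totalLines) {
        List<int[]> result = new ArrayList<>();
        for (int line : sortedMatchLines) {
            // Because the input is sorted, checking the last window is enough.
            if (!result.isEmpty() && line < result.get(result.size() - 1)[1]) {
                continue;
            }
            int start = Math.max(0, line - context);
            int end = Math.min(totalLines, line + context + 1);
            result.add(new int[] {start, end});
        }
        return result;
    }

    public static void main(String[] args) {
        // Hits on lines 0, 1 and 7 of a 10-line file, with 2 lines of context.
        for (int[] w : windows(Arrays.asList(0, 1, 7), 2, 10)) {
            System.out.println(w[0] + ".." + w[1]);
        }
        // prints 0..3 and then 5..10
    }
}
```

Line 1 is absorbed by the window opened for line 0, which is the "skipped (merged)" case of step 6; line 7 is outside that window and opens its own.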
Implementation code
/**
 * Highlight text display tool class
 * Created by qinshu on 2021/12/31
 */
public class HighLightUtil {

    private static final Logger LOG = LogUtils.getLogger(HighLightUtil.class);

    /** Number of context lines shown around a highlighted line */
    private static final Integer HIGHLIGHT_LINE_NUM = 5;

    /** Maximum number of fragments to return */
    private static final int MAX_REGEX_NUM = 10;

    /**
     * @param content Text content
     * @param regex   Regular expression
     * @return Highlighted content fragments
     */
    public static List<MatchedFileContent> highlight(String content, String regex) {
        return highlight(content, regex, MAX_REGEX_NUM, HIGHLIGHT_LINE_NUM);
    }

    /**
     * @param base64Content Text content (base64 encoded text)
     * @param regex         Regular expression
     * @return Highlighted content fragments
     */
    public static List<MatchedFileContent> highlightBase64(String base64Content, String regex) {
        if (StringUtils.isEmpty(base64Content)) {
            return Collections.emptyList();
        }
        return highlight(Base64Utils.decodeContent(base64Content), regex);
    }

    public static List<MatchedFileContent> highlight(String content, String regex, int maxMatchNum, int highlightLineNum) {
        if (StringUtils.isEmpty(content) || StringUtils.isEmpty(regex)) {
            return Collections.emptyList();
        }
        content = content.replaceAll("\\r\\n", "\n");
        List<String> allLines = Arrays.asList(content.split("\n"));
        Pattern pattern = Pattern.compile(regex);
        List<RegexMatchPoint> regexMatchPoints = findAllRegexMatches(allLines, pattern);

        // Group the matches by line number before highlighting. Several matches in one line must be
        // shown within that single line, and merging them after highlighting separately is troublesome
        Map<Integer, List<RegexMatchPoint>> regexMatchPointMap =
                regexMatchPoints.stream().collect(Collectors.groupingBy(RegexMatchPoint::getLineNo));

        // highLightLineMap: [line number, highlighted line]
        Map<Integer, String> highLightLineMap = new HashMap<>();
        regexMatchPointMap.forEach((lineNo, matchPointsOfLine) -> {
            highLightLineMap.put(lineNo, highlightOneLineContent(allLines.get(lineNo), matchPointsOfLine));
        });

        List<MatchedFileContent> partContentList = merge(highLightLineMap, allLines, highlightLineNum);
        return partContentList.subList(0, Math.min(partContentList.size(), maxMatchNum));
    }

    private static List<MatchedFileContent> merge(Map<Integer, String> highLightLineMap, List<String> allLines, int highlightLineNum) {
        // Sort by line number
        List<Integer> highLightLineNos = Lists.newArrayList(highLightLineMap.keySet());
        Collections.sort(highLightLineNos);

        // Calculate the line ranges to be displayed
        List<MatchedFileLine> matchedFileLines = Lists.newArrayList();
        for (Integer highLineNo : highLightLineNos) {
            if (!exist(matchedFileLines, highLineNo)) {
                int startLine = highLineNo - highlightLineNum;
                int endLine = 0;
                if (startLine < 0) {
                    startLine = 0;
                    endLine = highLineNo + highlightLineNum;
                } else {
                    startLine = highLineNo - highlightLineNum + 1;
                    endLine = highLineNo + highlightLineNum;
                }
                matchedFileLines.add(new MatchedFileLine(startLine, endLine));
            }
        }

        return matchedFileLines.stream()
                .map(fileLine -> getMatchedFileContent(highLightLineMap, allLines, fileLine))
                .collect(Collectors.toList());
    }

    /**
     * Gets the content of the line with the given line number (highlighted if it has matches)
     */
    private static String getLine(Map<Integer, String> highLightLineMap, List<String> allLines, Integer lineNo) {
        String highLightLine = highLightLineMap.get(lineNo);
        return highLightLine != null ? highLightLine : allLines.get(lineNo);
    }

    private static boolean exist(List<MatchedFileLine> matchedFileLines, Integer lineNo) {
        return matchedFileLines.stream().anyMatch(fileLine -> exist(fileLine, lineNo));
    }

    private static boolean exist(MatchedFileLine matchedFileLine, Integer lineNo) {
        return lineNo >= matchedFileLine.getStartLine() && lineNo < matchedFileLine.getEndLine();
    }

    /**
     * Get the content between the given start and end line numbers
     * @param highLightLineMap Highlighted lines, keyed by line number
     * @param allLines         All lines of the file
     * @param fileLine         Start and end line numbers of the context around the match
     * @return The context of the match and its starting line number
     */
    private static MatchedFileContent getMatchedFileContent(Map<Integer, String> highLightLineMap, List<String> allLines, MatchedFileLine fileLine) {
        StringBuilder partContentBuilder = new StringBuilder();
        for (int i = fileLine.getStartLine(); i < fileLine.getEndLine() && i < allLines.size(); i++) {
            partContentBuilder.append(getLine(highLightLineMap, allLines, i)).append("\n");
        }
        return new MatchedFileContent(fileLine.getStartLine() + 1, partContentBuilder.toString());
    }

    /**
     * Get all regex match points
     * @param allLines All lines of the file content
     * @param pattern  Compiled regular expression
     * @return The positions of all strings that match the regular expression
     */
    private static List<RegexMatchPoint> findAllRegexMatches(List<String> allLines, Pattern pattern) {
        // Collect all match points first; line numbers start from 0
        List<RegexMatchPoint> regexMatchPoints = Lists.newArrayList();
        for (int i = 0; i < allLines.size(); i++) {
            String line = allLines.get(i);
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                RegexMatchPoint regexMatchPoint = new RegexMatchPoint(i, m.start(), m.end());
                regexMatchPoints.add(regexMatchPoint);
            }
        }
        return regexMatchPoints;
    }

    /**
     * Highlight text content
     */
    public static String highlightContent(String content, List<String> match) {
        if (CollectionUtils.isEmpty(match)) {
            return content;
        }
        try {
            for (String matchContent : match) {
                String highlightContent = String.format("%s%s%s", CommonValues.HIGH_LIGHT_START, matchContent, CommonValues.HIGH_LIGHT_END);
                content = content.replaceAll(ExprUtils.escapeExprSpecialWord(matchContent), highlightContent);
            }
        } catch (Exception e) {
            LOG.error("highlight content error, content:{}, match:{}", content, match);
        }
        return content;
    }

    /**
     * Highlight all matches within one line
     */
    public static String highlightOneLineContent(String content, List<RegexMatchPoint> points) {
        int start = 0;
        int lastMatchEnd = 0;
        StringBuilder sb = new StringBuilder();
        for (RegexMatchPoint point : points) {
            sb.append(content, start, point.getStart()).append(CommonValues.HIGH_LIGHT_START)
                    .append(content, point.getStart(), point.getEnd()).append(CommonValues.HIGH_LIGHT_END);
            start = point.getEnd();
            lastMatchEnd = point.getEnd();
        }
        sb.append(content.substring(lastMatchEnd));
        return sb.toString();
    }

    // Comparable is not implemented here: this version never sorts match points
    @Setter
    @Getter
    @ToString
    public static class RegexMatchPoint {

        private Integer lineNo;
        private Integer start;
        private Integer end;

        public RegexMatchPoint(Integer lineNo, Integer start, Integer end) {
            this.lineNo = lineNo;
            this.start = start;
            this.end = end;
        }

        public RegexMatchPoint copy() {
            return new RegexMatchPoint(lineNo, start, end);
        }
    }

    @Setter
    @Getter
    public static class MatchedFileLine {

        private Integer startLine;
        private Integer endLine;

        public MatchedFileLine(Integer startLine, Integer endLine) {
            this.startLine = startLine;
            this.endLine = endLine;
        }
    }
}
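The per-line highlighting step can also be tried in isolation. The sketch below is a self-contained approximation of it: it takes the line and a compiled pattern directly instead of a list of precomputed match points, and it hard-codes the marker strings. The marker values are an assumption inferred from the expected string in the self test, and `LineHighlighter` and its `highlight` method are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineHighlighter {

    // Assumed marker values, inferred from the expected output in the self test.
    static final String START = "<qthighlight--";
    static final String END = "--qthighlight>";

    /** Wraps every match in one line with the markers, in a single left-to-right pass. */
    static String highlight(String line, Pattern pattern) {
        StringBuilder sb = new StringBuilder();
        Matcher m = pattern.matcher(line);
        int copied = 0; // everything before this offset has already been appended
        while (m.find()) {
            sb.append(line, copied, m.start())
              .append(START)
              .append(line, m.start(), m.end())
              .append(END);
            copied = m.end();
        }
        // Append the remainder of the line after the last match.
        return sb.append(line, copied, line.length()).toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("0meina2 1meina2", Pattern.compile("meina2")));
        // prints 0<qthighlight--meina2--qthighlight> 1<qthighlight--meina2--qthighlight>
    }
}
```

Because all matches of a line are handled in one pass, the offsets never shift under the code's feet, which is what makes grouping by line simpler than merging separately highlighted copies.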
Self test
/**
 * Highlight
 * Created by qinshu on 2021/12/31
 */
public class HighlightUtilTest {

    String content = "dependencies {\n" +
            " testCompile group: 'junit', name: 'junit'\n" +
            "\n" +
            " compile project(\":detect-lib\")\n" +
            " compile project(\":connect-cli\")\n" +
            " compile project(\":wisteria-client\")\n" +
            " compile project(\":upload-cli\")\n" +
            " compile project(\":scan-client\")\n" +
            " compile(\"com.qt.qt-common:config-loader\")\n" +
            " compile project(\":switches-lib\")\n" +
            " compile project(\":bizevent-lib\")\n" +
            " compile project(\":user-client\")\n" +
            " compile project(\":notif-client\")\n" +
            " compile project(\":detect-client\")\n" +
            " compile project(\":job-cli\")\n" +
            " compile('com.qt.qt-common:redis-lib')\n" +
            " compile('com.qt.qt-common:rabbitmq-lib')\n" +
            " compile('com.qt.qt-common:encrypt-property-lib')\n" +
            " compile project(\":leader-latch-lib\")\n" +
            " compile(\"com.qt.qt-common:eventflow-lib:1.0.0-SNAPSHOT\")\n" +
            " compile(\"com.qt.qt-common:intrusion-detect-lib:1.0.1\")\n" +
            " compile('com.qt.qt-common:mysql-lib')\n" +
            " compile('com.qt.qt-common:rule-crypto')\n" +
            " compile project(\":rule-lib\")\n" +
            " compile project(\":api-auth-lib\")\n" +
            "\n" +
            " // Spring Cloud\n" +
            " // Configuration center\n" +
            " compile ('org.springframework.cloud:spring-cloud-starter-zookeeper-config')\n" +
            " // Service discovery\n" +
            " compile ('org.springframework.cloud:spring-cloud-starter-zookeeper-discovery')\n" +
            " compile ('com.netflix.hystrix:hystrix-javanica')\n" +
            "\n" +
            " // Spring Boot\n" +
            " compile('org.springframework.boot:spring-boot-starter-web')\n" +
            " compile('org.springframework.boot:spring-boot-starter-aop')\n" +
            " compile('org.springframework.boot:spring-boot-starter-data-redis')\n" +
            "\n" +
            " // Spring\n" +
            " compile('org.springframework:spring-orm')\n" +
            " compile('org.springframework:spring-jdbc')\n" +
            " compile('org.springframework:spring-aop')\n" +
            "\n" +
            " // mongodb\n" +
            " compile('org.springframework.data:spring-data-mongodb:1.10.23.RELEASE')\n" +
            "\n" +
            " // Mysql\n" +
            " runtime('mysql:mysql-connector-java')\n" +
            " compile('com.zaxxer:HikariCP')\n" +
            " compile('org.mybatis.spring.boot:mybatis-spring-boot-starter')\n" +
            " compile('com.github.pagehelper:pagehelper-spring-boot-starter')\n" +
            "\n" +
            " //redisson\n" +
            " compile('io.projectreactor:reactor-core:3.2.8.RELEASE')\n" +
            "\n" +
            " // Jackson\n" +
            " compile('com.fasterxml.jackson.core:jackson-core')\n" +
            " compile('com.fasterxml.jackson.core:jackson-annotations')\n" +
            " compile('com.fasterxml.jackson.core:jackson-databind')\n" +
            " compile('org.codehaus.jackson:jackson-core-asl')\n" +
            "\n" +
            " compile('joda-time:joda-time')\n" +
            " compile('commons-io:commons-io:2.5')\n" +
            " compile('org.apache.commons:commons-lang3:3.5')\n" +
            " compile('org.apache.commons:commons-collections4:4.1')\n" +
            " compile('cglib:cglib:3.2.5')\n" +
            " compile('net.java.dev.jna:jna:5.8.0')\n" +
            " compile('org.apache.calcite:calcite-core:1.26.0')\n" +
            "\n" +
            " // Test\n" +
            " testCompile('org.mockito:mockito-core:2.13.0')\n" +
            " testCompile('org.springframework:spring-test')\n" +
            " testCompile('org.springframework.boot:spring-boot-starter-test')\n" +
            "\n" +
            " // string-similarity\n" +
            " compile('info.debatty:java-string-similarity:0.24')\n" +
            "\n" +
            " compile('com.jayway.jsonpath:json-path')\n" +
            "\n" +
            " compile('com.qt.qt-common:cron-lib:1.0.0')\n" +
            "\n" +
            "}";

    @Test
    public void testHighlight() {
        String regex = "org\\.apache";
        List<MatchedFileContent> matched = HighLightUtil.highlight(content, regex);
        Assert.assertTrue(matched.size() > 0);
    }

    @Test
    public void testHighlightBase64() {
        String content = "MG1laW5hMiAxbWVpbmEyCjBtZWluYTIgMW1laW5hMgo=";
        String regex = "meina2";
        List<MatchedFileContent> matchedFileContents = HighLightUtil.highlightBase64(content, regex);
        Assert.assertEquals(1, matchedFileContents.size());
        Assert.assertEquals("[MatchedFileContent(startLine=1, partContent=0<qthighlight--meina2--qthighlight> 1<qthighlight--meina2--qthighlight>\n" +
                "0<qthighlight--meina2--qthighlight> 1<qthighlight--meina2--qthighlight>\n" +
                ")]", matchedFileContents.toString());
    }

    @Test
    public void testHighLight2() {
        // Every line matches: the hits collapse into two context fragments
        String content = "customdir2 1\n" +
                "customdir2 2\n" +
                "customdir2 3\n" +
                "customdir2 4\n" +
                "customdir2 5\n" +
                "customdir2 6\n" +
                "customdir2 7 customdir2 7 customdir2 7 customdir2 7\n" +
                "customdir2 8 customdir2 8 customdir2 8 customdir2 8";
        String regex = "customdir2";
        List<MatchedFileContent> matchedFileContents = HighLightUtil.highlight(content, regex);
        Assert.assertEquals(2, matchedFileContents.size());
    }

    @Test
    public void testHighLight3() {
        // Only the first two lines and the last line match
        String content = "customdir2 1\n" +
                "customdir2 2\n" +
                "customdir3 3\n" +
                "customdir5 4\n" +
                "customdir6 5\n" +
                "customdir9 6\n" +
                "customdir10 7 customdir8 7 customdird 7 customdiro 7\n" +
                "customdir2 8 customdir2 8 customdir2 8 customdir2 8";
        String regex = "customdir2";
        List<MatchedFileContent> matchedFileContents = HighLightUtil.highlight(content, regex);
        Assert.assertEquals(2, matchedFileContents.size());
    }
}
Summary
This article explains how to use the idea of "separation of construction and use" to refactor and improve the algorithm that highlights hit text content. Separating construction from use means extracting the necessary information during construction and building the required functions on top of that information during use, rather than coupling the two. When construction and use are coupled, later requirement changes become more troublesome to make.