Preface
Security vulnerabilities in JSON serialization frameworks have long been a running joke among programmers. fastjson in particular has had a steady stream of reported vulnerabilities over the past two years thanks to targeted research. Finding a vulnerability is one thing; the real pain is the security team's emails urging every online application to upgrade the dependency, and I suspect many readers have suffered through this and considered replacing fastjson with another serialization framework. That swap is not risk-free: we recently replaced fastjson with gson in a project, and it caused a production incident. I'm sharing the experience here so that others don't step into the same pit. The moral up front: of ten thousand rules, safety comes first; an undisciplined upgrade ends in tears in production.
Problem description
The online logic was very simple: serialize objects to JSON with fastjson, then send the string in an HTTP request. It had worked fine for a long time. After fastjson was replaced with gson, it triggered an online OOM. Memory dump analysis showed that a 400 MB+ message was being sent; since the HTTP client did not check the payload size, it attempted the transfer anyway, which made the whole online service unavailable.
Problem analysis
Why did JSON serialization work fine with fastjson but blow up immediately after the switch to gson? Analyzing the objects in the memory dump showed that the values of many fields were duplicated. Combined with the characteristics of our business data, the problem was located at once: gson has a serious weakness when serializing duplicate (shared) object references.
A simple example illustrates the problem. To simulate the production data, add the same object reference several times to a List<Foo>:
Foo foo = new Foo();
Bar bar = new Bar();
List<Foo> foos = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    foos.add(foo);
}
bar.setFoos(foos);

Gson gson = new Gson();
String gsonStr = gson.toJson(bar);
System.out.println(gsonStr);

String fastjsonStr = JSON.toJSONString(bar);
System.out.println(fastjsonStr);
Observe the print results:
gson:
{"foos":[{"a":"aaaaa"},{"a":"aaaaa"},{"a":"aaaaa"}]}
fastjson:
{"foos":[{"a":"aaaaa"},{"$ref":"$.foos[0]"},{"$ref":"$.foos[0]"}]}
As you can see, gson serializes each occurrence of a duplicate object in full, while fastjson serializes only the first occurrence and replaces every later one with a $ref reference marker.
When a single duplicate object is large and the number of repetitions is high, these two strategies diverge dramatically. Let's compare them in exactly that scenario.
Compression ratio test
- Serialized object: contains a large number of properties, to simulate real business data.
- Number of repetitions: 200, i.e. the List holds 200 references to the same object, to simulate the complex production object graph and amplify the difference.
- Serialization methods: gson, fastjson, Java, Hessian2. Java and Hessian2 are introduced as a control group so we can see how each framework behaves in this special scenario.
- Main observation: the serialized byte size of each method, since it determines the network transfer size. Secondary observation: whether the List elements are still the same object after deserialization.
public class Main {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Foo foo = new Foo();
        Bar bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);

        // gson
        Gson gson = new Gson();
        String gsonStr = gson.toJson(bar);
        System.out.println(gsonStr.length());
        Bar gsonBar = gson.fromJson(gsonStr, Bar.class);
        System.out.println(gsonBar.getFoos().get(0) == gsonBar.getFoos().get(1));

        // fastjson
        String fastjsonStr = JSON.toJSONString(bar);
        System.out.println(fastjsonStr.length());
        Bar fastjsonBar = JSON.parseObject(fastjsonStr, Bar.class);
        System.out.println(fastjsonBar.getFoos().get(0) == fastjsonBar.getFoos().get(1));

        // java
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        System.out.println(byteArrayOutputStream.toByteArray().length);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
        System.out.println(javaBar.getFoos().get(0) == javaBar.getFoos().get(1));

        // hessian2
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        System.out.println(hessian2Baos.toByteArray().length);
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
        System.out.println(hessian2Bar.getFoos().get(0) == hessian2Bar.getFoos().get(1));
    }
}
Output results:
gson:     62810 false
fastjson: 4503  true
Java:     1540  true
Hessian2: 686   true
Analysis: when a single serialized object is large, representing repeats as references shrinks the output considerably. gson does not apply this optimization, so its output balloons. Even Java serialization, which is rarely praised, does far better, and Hessian2 is outright two orders of magnitude smaller than gson. After deserialization, gson also fails to restore the shared reference (the elements are no longer the same object), which all the other frameworks manage.
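The size gap comes from back-references. A minimal, stdlib-only sketch (the `Item` class and its `payload` field are hypothetical stand-ins for the article's Foo) shows how Java's ObjectOutputStream writes a repeated object once and then emits only a small handle for each further occurrence:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class BackRefDemo {
    // Hypothetical stand-in for the article's Foo: Serializable, with some bulk.
    static class Item implements Serializable {
        long[] payload = new long[16]; // simulates "a large number of properties"
    }

    // Serialize any object with the JDK stream and return its byte size.
    static int serializedSize(Object o) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(o);
        }
        return baos.size();
    }

    public static void main(String[] args) throws IOException {
        Item shared = new Item();
        List<Item> sharedList = new ArrayList<>();
        List<Item> distinctList = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            sharedList.add(shared);       // 200 references to one object
            distinctList.add(new Item()); // 200 separate objects
        }
        // After the first element, the stream emits only a small
        // back-reference handle per repeated occurrence of `shared`.
        System.out.println("shared:   " + serializedSize(sharedList) + " bytes");
        System.out.println("distinct: " + serializedSize(distinctList) + " bytes");
    }
}
```

Running it prints two sizes: the shared list stays close to the size of a single element plus 199 small handles, while the distinct list pays the full instance data for every element. Hessian2 applies the same idea to its own wire format; gson has no equivalent.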
Throughput test
Besides the serialized size, we also care about throughput. A JMH benchmark measures the throughput of each serialization method accurately:
@BenchmarkMode({Mode.Throughput})
@State(Scope.Benchmark)
public class MicroBenchmark {

    private Bar bar;

    @Setup
    public void prepare() {
        Foo foo = new Foo();
        bar = new Bar(); // assign the field, not a local, or every benchmark serializes null
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);
    }

    Gson gson = new Gson();

    @Benchmark
    public void gson() {
        String gsonStr = gson.toJson(bar);
        gson.fromJson(gsonStr, Bar.class);
    }

    @Benchmark
    public void fastjson() {
        String fastjsonStr = JSON.toJSONString(bar);
        JSON.parseObject(fastjsonStr, Bar.class);
    }

    @Benchmark
    public void java() throws Exception {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
    }

    @Benchmark
    public void hessian2() throws Exception {
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(MicroBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}
Throughput report:
Benchmark                 Mode  Cnt        Score         Error  Units
MicroBenchmark.fastjson  thrpt   25  6724809.416 ± 1542197.448  ops/s
MicroBenchmark.gson      thrpt   25  1508825.440 ±  194148.657  ops/s
MicroBenchmark.hessian2  thrpt   25   758643.567 ±  239754.709  ops/s
MicroBenchmark.java      thrpt   25   734624.615 ±   66892.728  ops/s
A little unexpected that fastjson takes the lead, isn't it? Here the text-based serializers' throughput is an order of magnitude higher than the binary ones: millions versus hundreds of thousands of operations per second.
Overall test conclusion
- The $ref reference tags produced by fastjson can still be fed to gson for deserialization, but the author found no gson configuration that emits references during serialization
- fastjson, Hessian2, and Java serialization all support circular references; gson does not
- fastjson can turn off the detection of circular and duplicate references with DisableCircularReferenceDetect
- Objects that shared a reference before gson serialization are no longer the same object after a round trip, which can inflate the number of objects in memory; fastjson, Java, and Hessian2 do not have this problem because they record reference tags
- In the author's test case, Hessian2 achieves a very strong compression ratio, making it well suited to serializing large messages for network transfer
- In the author's test case, fastjson delivers very high throughput, living up to the "fast" in its name, and suits throughput-critical scenarios
- Choosing a serializer also means weighing support for circular references, duplicate-reference optimization, enums, collections, arrays, subclasses, polymorphism, inner classes, generics, readability, compatibility when fields are added or removed, and so on. Overall, the author recommends Hessian2 and fastjson
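The circular-reference point above can be checked with plain JDK serialization, which records back-references and therefore survives a genuine cycle. A minimal sketch, assuming a hypothetical self-referencing Node class (not from the article's code):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CycleDemo {
    // Hypothetical type used to model a circular object graph.
    static class Node implements Serializable {
        String name;
        Node next;
        Node(String name) { this.name = name; }
    }

    // Serialize and deserialize with the JDK streams.
    static Node roundTrip(Node n) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(n);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(baos.toByteArray()))) {
            return (Node) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Node a = new Node("a");
        a.next = a; // a genuine cycle: the node references itself
        Node copy = roundTrip(a);
        // The cycle is reconstructed: the deserialized node's next is itself.
        System.out.println(copy.next == copy); // true
    }
}
```

fastjson and Hessian2 handle the same graph via their reference tags; gson, lacking reference tracking, recurses into the cycle during toJson and fails instead.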
Summary
As is well known, fastjson resorts to some fairly hacky logic in the name of speed, which is also the source of many of its vulnerabilities. But engineering is a matter of trade-offs: if a perfect framework existed, its competitors would have died out long ago. The author has not studied every serialization framework in depth; you may say Jackson is better, and I can only answer that the framework that solves the problems in your scenario is the right framework.
Finally, before replacing a serialization framework, be careful to understand the characteristics of the replacement: the new framework may not cover problems the original one had already solved.