Analysis of a production incident caused by replacing fastjson with gson

Preface

Security vulnerabilities in JSON serialization frameworks have long been a running joke among programmers, and fastjson in particular has had a steady stream of vulnerabilities reported over the past two years thanks to targeted research. A vulnerability being found is not the end of the world, but each time, the security team emails urging every online application to upgrade the dependency, which is exhausting. I believe many of you have suffered through this too and have considered swapping fastjson for another serialization framework. That is easier said than done: we recently replaced fastjson with gson in one project, and it caused a production incident. I am sharing the experience so that others do not step into the same pit. A word of warning: of ten million rules, safety comes first; upgrade carelessly, and you will shed two lines of tears in production.

Problem description

The logic in production was very simple: serialize objects with fastjson, then send the resulting string in an HTTP request. It had always worked fine. After fastjson was replaced with gson, it triggered an online OOM. Memory dump analysis revealed that a message of over 400 MB was being sent; because the HTTP client did not check the payload size, it attempted the transmission anyway, which made the whole online service unavailable.
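Independent of the serialization issue, the immediate trigger was sending the payload with no size check at all. A minimal guard at the sending boundary might look like the following sketch; `MAX_BODY_BYTES` and `checkedBody` are hypothetical names, and the threshold should be chosen for your own transport:

```java
import java.nio.charset.StandardCharsets;

public class PayloadGuard {
    // Hypothetical limit; pick a value appropriate for your transport.
    static final int MAX_BODY_BYTES = 10 * 1024 * 1024; // 10 MB

    /** Rejects oversized payloads before they ever reach the HTTP client. */
    static byte[] checkedBody(String json) {
        byte[] body = json.getBytes(StandardCharsets.UTF_8);
        if (body.length > MAX_BODY_BYTES) {
            throw new IllegalArgumentException(
                "payload too large: " + body.length + " bytes > " + MAX_BODY_BYTES);
        }
        return body;
    }

    public static void main(String[] args) {
        // Small payloads pass through unchanged.
        byte[] ok = checkedBody("{\"a\":\"aaaaa\"}");
        System.out.println(ok.length);
    }
}
```

With a check like this, the 400 MB message would have failed fast with a clear error instead of taking the service down.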

Problem analysis

Why did JSON serialization work fine with fastjson, yet blow up immediately after switching to gson? Analyzing the dumped in-memory data showed that the values of many fields were duplicated. Combined with the characteristics of our business data, the problem was located at once: gson has a serious weakness when serializing duplicate (shared) objects.

A simple example illustrates the problem. Simulating the shape of the production data, we add the same reference to a List<Foo> several times:

Foo foo = new Foo();            // Foo holds a single field a = "aaaaa"
Bar bar = new Bar();            // Bar wraps a List<Foo>
List<Foo> foos = new ArrayList<>();
for (int i = 0; i < 3; i++) {
    foos.add(foo);              // the same Foo instance, three times
}
bar.setFoos(foos);

Gson gson = new Gson();
String gsonStr = gson.toJson(bar);
System.out.println(gsonStr);

String fastjsonStr = JSON.toJSONString(bar);
System.out.println(fastjsonStr); 

Observe the print results:

gson:

{"foos":[{"a":"aaaaa"},{"a":"aaaaa"},{"a":"aaaaa"}]} 

fastjson:

{"foos":[{"a":"aaaaa"},{"$ref":"$.foos[0]"},{"$ref":"$.foos[0]"}]} 

As you can see, gson serializes every occurrence of the duplicated object in full, while fastjson serializes only the first occurrence and replaces each subsequent one with a $ref reference marker.

When the duplicated object occurs many times and each single object is large after serialization, these two strategies lead to a qualitative difference. Let us compare them in exactly this kind of scenario.

Compression ratio test

  • Serialized object: contains a large number of properties, to simulate real business data.

  • Number of repetitions: 200. That is, the List holds 200 entries pointing at the same reference, simulating the complex object structure in production and amplifying the difference.

  • Serialization methods: gson, fastjson, Java native, and Hessian2. The latter two are included as a control group so we can see how each framework performs in this particular scenario.

  • Primary observation: the byte size each serialization method produces, since it determines the size of the network transmission. Secondary observation: whether, after deserialization, the entries of the List are still the same object.

public class Main {

    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Foo foo = new Foo();
        Bar bar = new Bar();
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);

        // gson
        Gson gson = new Gson();
        String gsonStr = gson.toJson(bar);
        System.out.println(gsonStr.length());
        Bar gsonBar = gson.fromJson(gsonStr, Bar.class);
        System.out.println(gsonBar.getFoos().get(0) == gsonBar.getFoos().get(1));

        // fastjson
        String fastjsonStr = JSON.toJSONString(bar);
        System.out.println(fastjsonStr.length());
        Bar fastjsonBar = JSON.parseObject(fastjsonStr, Bar.class);
        System.out.println(fastjsonBar.getFoos().get(0) == fastjsonBar.getFoos().get(1));

        // java native serialization
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();
        System.out.println(byteArrayOutputStream.toByteArray().length);
        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
        System.out.println(javaBar.getFoos().get(0) == javaBar.getFoos().get(1));

        // hessian2
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();
        System.out.println(hessian2Baos.toByteArray().length);
        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
        System.out.println(hessian2Bar.getFoos().get(0) == hessian2Bar.getFoos().get(1));
    }

} 

Output results:

gson:
62810
false

fastjson:
4503
true

Java:
1540
true

Hessian2:
686
true 

Analysis: because a single object is large after serialization, representing the repeats as references shrinks the output considerably. gson does not apply this optimization, so its output balloons. Even Java native serialization, which nobody has ever favored, beats it comfortably, and Hessian2 is remarkable, coming in two orders of magnitude smaller than gson. On deserialization, gson also cannot restore the shared references, which every other framework here manages.
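If you must stay on gson, one mitigation for the lost sharing is to re-intern equal elements after deserialization, so a list of 200 equal copies collapses back to one canonical instance. This is a workaround sketch of my own, not something from gson itself, and it assumes the element type implements equals/hashCode:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class Intern {
    /** Replaces equal elements with a single canonical instance. */
    static <T> List<T> internAll(List<T> items) {
        Map<T, T> canonical = new HashMap<>();
        List<T> out = new ArrayList<>(items.size());
        for (T item : items) {
            // First occurrence becomes the canonical instance;
            // later equal occurrences are mapped back to it.
            out.add(canonical.computeIfAbsent(item, k -> k));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = List.of(new String("aaaaa"), new String("aaaaa"));
        List<String> out = internAll(in);
        System.out.println(out.get(0) == out.get(1)); // true: one shared instance
    }
}
```

This only restores identity, not the serialized-size savings, so it addresses the memory-expansion symptom rather than the 400 MB payload itself.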

Throughput test

Besides the size of the serialized output, we also care about throughput. The throughput of each serialization method can be measured accurately with a JMH benchmark.

@BenchmarkMode({Mode.Throughput})
@State(Scope.Benchmark)
public class MicroBenchmark {

    private Bar bar;

    @Setup
    public void prepare() {
        Foo foo = new Foo();
        bar = new Bar(); // assign the field; a local variable here would leave the field null
        List<Foo> foos = new ArrayList<>();
        for (int i = 0; i < 200; i++) {
            foos.add(foo);
        }
        bar.setFoos(foos);
    }

    Gson gson = new Gson();

    @Benchmark
    public void gson(){
        String gsonStr = gson.toJson(bar);
        gson.fromJson(gsonStr, Bar.class);
    }

    @Benchmark
    public void fastjson(){
        String fastjsonStr = JSON.toJSONString(bar);
        JSON.parseObject(fastjsonStr, Bar.class);
    }

    @Benchmark
    public void java() throws Exception {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(byteArrayOutputStream);
        oos.writeObject(bar);
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(byteArrayOutputStream.toByteArray()));
        Bar javaBar = (Bar) ois.readObject();
        ois.close();
    }

    @Benchmark
    public void hessian2() throws Exception {
        ByteArrayOutputStream hessian2Baos = new ByteArrayOutputStream();
        Hessian2Output hessian2Output = new Hessian2Output(hessian2Baos);
        hessian2Output.writeObject(bar);
        hessian2Output.close();


        ByteArrayInputStream hessian2Bais = new ByteArrayInputStream(hessian2Baos.toByteArray());
        Hessian2Input hessian2Input = new Hessian2Input(hessian2Bais);
        Bar hessian2Bar = (Bar) hessian2Input.readObject();
        hessian2Input.close();
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
            .include(MicroBenchmark.class.getSimpleName())
            .build();

        new Runner(opt).run();
    }

} 

Throughput report:

Benchmark                 Mode  Cnt        Score         Error  Units
MicroBenchmark.fastjson  thrpt   25  6724809.416 ± 1542197.448  ops/s
MicroBenchmark.gson      thrpt   25  1508825.440 ±  194148.657  ops/s
MicroBenchmark.hessian2  thrpt   25   758643.567 ±  239754.709  ops/s
MicroBenchmark.java      thrpt   25   734624.615 ±   66892.728  ops/s 

A little unexpected, isn't it? fastjson takes the clear lead. And the text-based serializers here are an order of magnitude faster than the binary ones: millions versus hundreds of thousands of operations per second.

Overall test conclusions

  • A string produced by fastjson containing $ref markers can still be deserialized correctly by gson, but I could not find any configuration that makes gson emit references when serializing
  • fastjson, Hessian2, and Java native serialization all support parsing circular references; gson does not
  • fastjson can disable detection of circular and duplicate references via DisableCircularReferenceDetect
  • Objects that shared one reference before gson serialization become distinct objects after a serialize/deserialize round trip, which can inflate the number of objects in memory; fastjson, Java, and Hessian2 do not have this problem because they record reference markers
  • In my test case, Hessian2 achieves a very strong compression ratio and is well suited to serializing large messages for network transmission
  • In my test case, fastjson delivers very high throughput, living up to the "fast" in its name, and suits scenarios that demand throughput
  • Choosing a serializer also means considering support for circular references, shared-object optimization, enums, collections, arrays, subclasses, polymorphism, inner classes, generics, readability, compatibility when fields are added or removed, and so on. All things considered, I recommend Hessian2 and fastjson
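The fastjson switch mentioned above can be applied per call. A small sketch of what it does with our duplicate-reference data; note that with genuinely circular references, disabling detection would recurse without bound:

```java
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.serializer.SerializerFeature;

import java.util.ArrayList;
import java.util.List;

public class DisableRefDemo {
    public static class Foo { public String a = "aaaaa"; }

    public static void main(String[] args) {
        Foo foo = new Foo();
        List<Foo> foos = new ArrayList<>();
        foos.add(foo);
        foos.add(foo); // the same reference twice

        // Default: the second occurrence becomes a $ref marker.
        System.out.println(JSON.toJSONString(foos));
        // With detection disabled: every occurrence is serialized in full,
        // matching gson's output shape (and its size inflation).
        System.out.println(JSON.toJSONString(foos, SerializerFeature.DisableCircularReferenceDetect));
    }
}
```

So the feature is useful when the receiver cannot parse $ref markers, at the cost of reintroducing exactly the duplication problem this article describes.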

Summary

As is well known, fastjson resorts to some fairly hacky tricks in the name of speed, which is also why it has so many vulnerabilities. But I think all engineering is a trade-off; if a perfect framework existed, its competitors would have disappeared long ago. I have not studied every serialization framework in depth; you may say Jackson is better, and I can only say that the framework that solves the problems in your scenario is the right one.

Finally, when you replace a serialization framework, be careful to understand the characteristics of the replacement. Problems the original framework solved may not be covered well by the new one.


Added by lispwriter on Sat, 15 Jan 2022 14:25:49 +0200