1 Introduction
In daily development, ArrayList and HashSet are very common collection classes in Java.
- ArrayList is the most commonly used implementation class of List interface.
- HashSet is the implementation that holds the unique element Set.
This paper mainly makes a simple discussion on the common method contains(), mainly the performance comparison, and uses JMH(ava Microbenchmark Harness) to test and compare.
2. See JMH test results first
We use a Micro Benchmark Framework developed by the big bulls who have developed Java compilers in OpenJDK/Oracle to test. The following is a brief demonstration of the use process.
2.1 Maven import dependency
To import JMH related dependencies, you can go to the official website to view the latest version:
<dependencies> <dependency> <groupId>org.openjdk.jmh</groupId> <artifactId>jmh-core</artifactId> <version>${openjdk.jmh.version}</version> </dependency> <dependency> <groupId>org.openjdk.jmh</groupId> <artifactId>jmh-generator-annprocess</artifactId> <version>${openjdk.jmh.version}</version> </dependency> </dependencies> <properties> <openjdk.jmh.version>1.19</openjdk.jmh.version> </properties>
2.2 create test related classes
2.2.1 class of collection storage object
Because we want to test the methods of the collection class, we create a class to represent the objects stored in the collection. As follows:
@Data @AllArgsConstructor(staticName = "of") public class Student { private Long id; private String name; }
2.2.2 JMH test
Next, we will write the test performance comparison class, the code is as follows:
@BenchmarkMode(Mode.AverageTime) @OutputTimeUnit(TimeUnit.NANOSECONDS) public class ContainsPerformanceTest { @State(Scope.Thread) public static class MyState { private Set<Student> studentSet = new HashSet<>(); private List<Student> studentList = new ArrayList<>(); private Student targetStudent = Student.of(99L, "Larry"); @Setup(Level.Trial) public void prepare() { long MAX_COUNT = 10000; for (long i = 0; i < MAX_COUNT; i++) { studentSet.add(Student.of(i, "MQ")); studentList.add(Student.of(i, "MQ")); } studentList.add(targetStudent); studentSet.add(targetStudent); } } @Benchmark public boolean arrayList(MyState state) { return state.studentList.contains(state.targetStudent); } @Benchmark public boolean hashSet(MyState state) { return state.studentSet.contains(state.targetStudent); } public static void main(String[] args) throws Exception { Options options = new OptionsBuilder() .include(ContainsPerformanceTest.class.getSimpleName()) .threads(6) .forks(1) .warmupIterations(3) .measurementIterations(6) .shouldFailOnError(true) .shouldDoGC(true) .build(); new Runner(options).run(); } }
Test class notes:
- @Benchmark mode: indicates the mode used for benchmarking; AverageTime indicates the average time of test calls.
- @OutputTimeUnit: the unit of measurement time for the test; NANOSECONDS means NANOSECONDS.
- @State: accept a Scope parameter to represent the shared Scope of the state; Scope.Thread means that each thread is exclusive.
- @Setup: execute before Benchmark, similar to JUnit's @ BeforeAll.
- @Benchmark: the object for benchmark, similar to JUnit's @ Test.
Test class startup parameter Options Description:
- include: the class name of the benchmark;
- Threads: the number of test threads in each process;
- Fork: number of processes. If it is 3, JMH will fork out 3 processes to test.
- warmupIterations: number of iterations to warm up,
- Measurement iterations: the number of iterations actually measured.
2.3 test results
After setting the parameters, you can run the test. The test results are as follows:
# Benchmark: ContainsPerformanceTest.arrayList # Run progress: 0.00% complete, ETA 00:00:18 # Fork: 1 of 1 # Warmup Iteration 1: 42530.408 ±(99.9%) 2723.999 ns/op # Warmup Iteration 2: 17841.988 ±(99.9%) 1882.026 ns/op # Warmup Iteration 3: 18561.513 ±(99.9%) 2021.506 ns/op Iteration 1: 18499.568 ±(99.9%) 2126.172 ns/op Iteration 2: 18975.407 ±(99.9%) 2004.509 ns/op Iteration 3: 19386.851 ±(99.9%) 2248.536 ns/op Iteration 4: 19279.722 ±(99.9%) 2102.846 ns/op Iteration 5: 19796.495 ±(99.9%) 1974.987 ns/op Iteration 6: 21363.962 ±(99.9%) 2175.961 ns/op Result "ContainsPerformanceTest.arrayList": 19550.334 ±(99.9%) 2771.595 ns/op [Average] (min, avg, max) = (18499.568, 19550.334, 21363.962), stdev = 988.377 CI (99.9%): [16778.739, 22321.929] (assumes normal distribution) # Benchmark: ContainsPerformanceTest.hashSet # Run progress: 50.00% complete, ETA 00:00:16 # Fork: 1 of 1 # Warmup Iteration 1: 10.662 ±(99.9%) 0.209 ns/op # Warmup Iteration 2: 11.177 ±(99.9%) 1.077 ns/op # Warmup Iteration 3: 9.467 ±(99.9%) 1.462 ns/op Iteration 1: 9.540 ±(99.9%) 0.535 ns/op Iteration 2: 9.388 ±(99.9%) 0.365 ns/op Iteration 3: 10.604 ±(99.9%) 1.008 ns/op Iteration 4: 9.361 ±(99.9%) 0.154 ns/op Iteration 5: 9.366 ±(99.9%) 0.458 ns/op Iteration 6: 9.274 ±(99.9%) 0.237 ns/op Result "ContainsPerformanceTest.hashSet": 9.589 ±(99.9%) 1.415 ns/op [Average] (min, avg, max) = (9.274, 9.589, 10.604), stdev = 0.505 CI (99.9%): [8.174, 11.004] (assumes normal distribution) # Run complete. Total time: 00:00:32 Benchmark Mode Cnt Score Error Units ContainsPerformanceTest.arrayList avgt 6 19550.334 ± 2771.595 ns/op ContainsPerformanceTest.hashSet avgt 6 9.589 ± 1.415 ns/op
After testing, it is found that the time-consuming difference between the two is great. The ArrayList is about 20K nanoseconds, while the HashSet is about 10 nanoseconds. They are not in the same order of magnitude at all.
3 source code analysis
Through the test, we know that there are great differences between the two. Let's take a look at the source code analysis.
3.1 contains() of ArrayList
The bottom layer of ArrayList uses array as data storage. When an Object is given to judge whether it exists, it needs to traverse the array and compare with each element.
public boolean contains(Object o) { return indexOf(o) >= 0; } public int indexOf(Object o) { if (o == null) { for (int i = 0; i < size; i++) if (elementData[i]==null) return i; } else { for (int i = 0; i < size; i++) if (o.equals(elementData[i])) return i; } return -1; }
It can be found from the source code that the contains() method is judged by calling indexOf(), which is to traverse the array until the element equal to the input parameter is found. Because, the time complexity of the contains() method of ArrayList is O(n), that is to say, time depends on the length and is proportional.
3.2 contents() of HashSet
The bottom layer of HashSet is implemented by HashMap, and the bottom structure of HashMap is array + linked list. After JDK 8, it is changed to array + linked list + red black tree.
The relevant codes of HashMap are as follows:
public boolean containsKey(Object key) { return getNode(hash(key), key) != null; } final Node<K,V> getNode(int hash, Object key) { Node<K,V>[] tab; Node<K,V> first, e; int n; K k; if ((tab = table) != null && (n = tab.length) > 0 && (first = tab[(n - 1) & hash]) != null) { if (first.hash == hash && // always check first node ((k = first.key) == key || (key != null && key.equals(k)))) return first; if ((e = first.next) != null) { if (first instanceof TreeNode) return ((TreeNode<K,V>)first).getTreeNode(hash, key); do { if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k)))) return e; } while ((e = e.next) != null); } } return null; }
First, find it by getting the Hash value. If the Hash value is equal and the object is equal, find it. Generally speaking, in the case that the implementation of hashCode() method is OK, there are less Hash conflicts. Therefore, in most cases, the time complexity of contains() is O(1), and the number of elements does not affect its speed. In case of Hash conflict, when the length of the linked list is less than 8, the time complexity is O(n); when the length of the linked list is greater than 8, the time complexity is O(logn).
Generally, we think that the time complexity of HashSet/HashMap search is O(1).
4 Summary
Through JMH test, we find that the performance of contains() method of ArrayList and HashSet is quite different. According to the source code analysis, the time complexity of ArrayList is O(n), while that of HashSet is O(1).
Welcome to the public account "pumpkin slow talk", will continue to update for you...