List object deduplication, both of whole objects and by object attribute. The original author did a great job, so I am recording it here.

Eight methods of List collection object deduplication and attribute deduplication - Fage talk Java - blog Garden

1, Outline of this article

In this article I describe eight methods for deduplicating the elements of a List. In practice, by applying and combining them flexibly you are not limited to eight; there could just as well be 18.

  • Four methods for deduplicating whole element objects
  • Four methods for deduplicating by object attribute

To support the tests below, let's first initialize some data.

import java.util.ArrayList;
import java.util.List;
import org.junit.jupiter.api.BeforeEach;

public class ListRmDuplicate {
  private List<String> list;
  private List<Player> playerList;

  @BeforeEach
  public void setup() {
    list = new ArrayList<>();
    list.add("kobe");
    list.add("james");
    list.add("curry");
    list.add("zimug");
    list.add("zimug");

    playerList = new ArrayList<>();
    playerList.add(new Player("kobe", "10000"));  // Long live Kobe
    playerList.add(new Player("james", "32"));
    playerList.add(new Player("curry", "30"));
    playerList.add(new Player("zimug", "27"));    // Note that the name is repeated here
    playerList.add(new Player("zimug", "18"));    // Note that the name and age are repeated here
    playerList.add(new Player("zimug", "18"));    // Note that the name and age are repeated here
  }
}

Player is an ordinary Java object with two member variables, name and age. It provides a constructor that takes both values, toString, equals and hashCode methods, and getters and setters.
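The article does not show the source of Player, so the following is only a sketch of what such a class might look like, based on the description above. The field types are an assumption: age is a String here because the setup code passes values such as "32", and the toString format is chosen to match the console output shown later.

```java
import java.util.Objects;

// Hypothetical sketch of Player; the original article does not include its source.
class Player {
    private String name;
    private String age; // assumed to be a String, since setup() passes "32", "30", ...

    public Player(String name, String age) {
        this.name = name;
        this.age = age;
    }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getAge() { return age; }
    public void setAge(String age) { this.age = age; }

    // Two players are equal when both name and age are equal.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Player)) return false;
        Player other = (Player) o;
        return Objects.equals(name, other.name) && Objects.equals(age, other.age);
    }

    // Must be consistent with equals so that hash-based collections work.
    @Override
    public int hashCode() {
        return Objects.hash(name, age);
    }

    @Override
    public String toString() {
        return "Player{name='" + name + "', age='" + age + "'}";
    }
}
```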

2, Overall deduplication of collection elements

The following four methods deduplicate a List<String>, treating each element object as a whole. If your List holds objects of your own class, that class needs to implement equals and hashCode. The deduplication code is then the same as for List<String>.
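To see why equals and hashCode matter, here is a small sketch that puts two objects with the same content into a HashSet. The classes RawName and ValueName are hypothetical and exist only for this illustration: the first inherits identity-based equality from Object, the second compares by value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Objects;

class EqualsHashCodeDemo {
    // Hypothetical class that does NOT override equals/hashCode (identity semantics).
    static final class RawName {
        final String value;
        RawName(String value) { this.value = value; }
    }

    // Hypothetical class that compares by value.
    static final class ValueName {
        final String value;
        ValueName(String value) { this.value = value; }

        @Override
        public boolean equals(Object o) {
            return o instanceof ValueName && Objects.equals(value, ((ValueName) o).value);
        }

        @Override
        public int hashCode() { return Objects.hashCode(value); }
    }

    // Number of elements left after putting two same-content RawName objects into a HashSet.
    static int distinctRawCount() {
        return new HashSet<>(Arrays.asList(new RawName("zimug"), new RawName("zimug"))).size();
    }

    // The same experiment with ValueName.
    static int distinctValueCount() {
        return new HashSet<>(Arrays.asList(new ValueName("zimug"), new ValueName("zimug"))).size();
    }

    public static void main(String[] args) {
        System.out.println(distinctRawCount());   // 2: the set treats them as different objects
        System.out.println(distinctValueCount()); // 1: the duplicate is removed
    }
}
```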

The first method

This is the approach most people think of first. Put the List data into a Set; because a Set does not hold duplicates, converting the Set back to a List gives the deduplicated result. This method changes the original order of the List elements: HashSet is unordered, and TreeSet sorts its elements, which is not the original order either.

@Test
void testRemove1() {
  /*Set<String> set = new HashSet<>(list);
  List<String> newList = new ArrayList<>(set);*/

  // Deduplicate and sort (strings sort alphabetically; objects sort according to their Comparable implementation)
  //List<String> newList = new ArrayList<>(new TreeSet<>(list));

  // Abbreviated form
  List<String> newList = new ArrayList<>(new HashSet<>(list));

  System.out.println("Collection after de duplication: " + newList);
}

The console print results are as follows:

Collection after de duplication: [kobe, james, zimug, curry]
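As a side note that is not part of the original eight methods: if the Set approach is attractive but the original order matters, java.util.LinkedHashSet keeps insertion order. The sketch below compares it with TreeSet on the same sample data; the class and method names are made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.TreeSet;

class SetOrderDemo {
    // TreeSet removes duplicates and sorts by natural order.
    static List<String> dedupeSorted(List<String> input) {
        return new ArrayList<>(new TreeSet<>(input));
    }

    // LinkedHashSet removes duplicates and keeps first-seen order.
    static List<String> dedupeKeepOrder(List<String> input) {
        return new ArrayList<>(new LinkedHashSet<>(input));
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("kobe", "james", "curry", "zimug", "zimug");
        System.out.println(dedupeSorted(list));    // [curry, james, kobe, zimug]
        System.out.println(dedupeKeepOrder(list)); // [kobe, james, curry, zimug]
    }
}
```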

The second method

This one is simple to use. Call stream() to convert the collection into a stream, apply distinct() to remove duplicates, and finally collect the stream back into a List.

@Test
void testRemove2() {
  List<String> newList = list.stream().distinct().collect(Collectors.toList());
  System.out.println("Collection after de duplication: " + newList);
}

The console print results are as follows:

Collection after de duplication: [kobe, james, curry, zimug]

The third method
This method relies on Set.add(T), which returns false if the element already exists in the set. Use the return value to decide whether an element is a duplicate; if it is not, add it to newList, which ends up holding the deduplicated result.

// newList keeps the original order of list; set is only used to detect duplicates
@Test
void testRemove3() {
  Set<String> set = new HashSet<>();
  List<String> newList = new ArrayList<>();
  for (String str : list) {
    if (set.add(str)) {  // add returns false if str is already in the set
      newList.add(str);
    }
  }
  System.out.println("Collection after de duplication: " + newList);
}

The console print result is consistent with the second method.
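The same idea can be wrapped in a small generic helper so that it works for any element type with proper equals and hashCode. This is only a sketch; the class name and the method name dedupe are invented for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddReturnValueDemo {
    // Returns a new list with duplicates removed, keeping first-seen order.
    static <T> List<T> dedupe(List<T> input) {
        Set<T> seen = new HashSet<>();
        List<T> result = new ArrayList<>();
        for (T item : input) {
            // Set.add returns true only when the item was not in the set before.
            if (seen.add(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList("kobe", "james", "curry", "zimug", "zimug");
        System.out.println(dedupe(list)); // [kobe, james, curry, zimug]
    }
}
```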

The fourth method
This method departs from the idea of using a Set. Instead it calls newList.contains(T) to check whether an element is already present before adding it to the new List; if it is, the element is skipped, which achieves deduplication. Note that contains scans the list element by element, so this approach gets slower as the list grows.

// newList keeps the original order of list; no Set is needed
@Test
void testRemove4() {
  List<String> newList = new ArrayList<>();
  for (String cd : list) {
    if (!newList.contains(cd)) {  // explicitly check whether the element is already present
      newList.add(cd);
    }
  }
  System.out.println("Collection after de duplication: " + newList);
}

The console print result is consistent with the second method.

3, Deduplication by attributes of the collection element objects

In real work, deduplicating whole element objects is comparatively rare; more often we need to deduplicate by certain attributes of the element objects.
At this point, please go back to the playerList initialization data constructed above and pay special attention to the duplicate elements and duplicate member variables.

The first method
Supply a Comparator to the TreeSet. To deduplicate by the name attribute of Player, compare names in the Comparator. There are two ways to write the Comparator:

  • Lambda expression: (o1, o2) -> o1.getName().compareTo(o2.getName())
  • Method reference: Comparator.comparing(Player::getName)
@Test
void testRemove5() {
  //Set<Player> playerSet = new TreeSet<>((o1, o2) -> o1.getName().compareTo(o2.getName()));
  Set<Player> playerSet = new TreeSet<>(Comparator.comparing(Player::getName));
  playerSet.addAll(playerList);

  /*new ArrayList<>(playerSet).forEach(player->{
    System.out.println(player.toString());
  });*/

  // Print the results after deduplication
  new ArrayList<>(playerSet).forEach(System.out::println);
}

The output is as follows. The three zimug entries share the same name, so two of them are removed as duplicates. However, because a TreeSet is used, the elements of the list are re-sorted.

Player{name='curry', age='30'}
Player{name='james', age='32'}
Player{name='kobe', age='10000'}
Player{name='zimug', age='27'}
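The output keeps the zimug entry with age 27, which was the first one added. That follows from how TreeSet.add behaves: when the comparator reports an element as equal to one already in the set, the set is left unchanged. The sketch below checks this in isolation; the nested class P is a hypothetical stand-in for Player so that the example is self-contained.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class TreeSetFirstWinsDemo {
    // Hypothetical stand-in for Player with just the two fields.
    static final class P {
        final String name;
        final String age;
        P(String name, String age) { this.name = name; this.age = age; }
        @Override public String toString() { return name + ":" + age; }
    }

    // Deduplicates by name using a TreeSet; the result is sorted by name.
    static List<String> dedupeByName(List<P> input) {
        Set<P> set = new TreeSet<>(Comparator.comparing((P p) -> p.name));
        set.addAll(input); // later elements with an already-seen name are ignored
        List<String> out = new ArrayList<>();
        for (P p : set) {
            out.add(p.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<P> players = Arrays.asList(
                new P("zimug", "27"), new P("kobe", "10000"), new P("zimug", "18"));
        System.out.println(dedupeByName(players)); // [kobe:10000, zimug:27]
    }
}
```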

The second method
Many articles on the Internet use this method to show off, but in my opinion it is what a Chinese idiom calls "taking off your pants to fart": an unnecessary extra step. Since everyone mentions it, leaving it out would make it look as if I did not know it, so here it is. Why do I call it "taking off your pants to fart"?

  • First, use stream() to convert the List into a stream
  • Then use collect and toCollection to convert the stream back into a collection
  • The rest is the same as the first method

Aren't the first two steps superfluous? Just take a look. It has little practical value, but as an example for learning how to use Stream it is worthwhile.

@Test
void testRemove6() {
  List<Player> newList = playerList.stream().collect(Collectors
          .collectingAndThen(
                  Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Player::getName))),
                  ArrayList::new));
  newList.forEach(System.out::println);
}

The console printout is the same as the first method.

The third method

This is the method I recommend. At first glance it seems to need more code, but it is actually fairly simple.

Some people translate Predicate as "assertion". The English word can be read as a noun (a grammatical predicate) or as a verb (to assert). A grammatical predicate qualifies the subject: in "a bird that likes singing", the part "that likes singing" limits the scope of the subject. Since what we do here is filter and limit the range of the subject, I think the grammatical sense fits better. Either way, use whichever reading you find reasonable and easy to remember.

  • First, we define a Predicate for filtering, created by distinctByKey. Elements for which the Predicate returns true are retained; elements for which it returns false are filtered out.
  • Our requirement is to filter out duplicate elements. The deduplication logic relies on Map.putIfAbsent, which adds a key-value pair: if the map has no value for the key, the pair is added and null is returned; if a value already exists, the map is left unchanged and the existing value is returned.
  • If putIfAbsent returns null, the data was added successfully (not a duplicate), so the predicate is true and the element is kept. If putIfAbsent returns an existing value (value == null is false), the element is a duplicate and the distinctByKey predicate filters it out.

Although this method seems to add code, the distinctByKey predicate method only needs to be defined once and can be reused indefinitely.

@Test
void testRemove7() {
  List<Player> newList = new ArrayList<>();
  playerList.stream().filter(distinctByKey(p -> p.getName()))  // filter keeps elements for which the predicate is true
          .forEach(newList::add);

  newList.forEach(System.out::println);
}

static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
  Map<Object, Boolean> seen = new ConcurrentHashMap<>();
  // putIfAbsent adds a key-value pair. If the map has no value for the key, the pair is added and null is returned.
  // If a value already exists, the map is unchanged and the existing value is returned.
  // A null return therefore means the key was newly added (not a duplicate): null == null is true.
  return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

The output is as follows. The three zimug entries share the same name, so two of them are removed as duplicates, and the original order of the List is not disturbed.

Player{name='kobe', age='10000'}
Player{name='james', age='32'}
Player{name='curry', age='30'}
Player{name='zimug', age='27'}
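The putIfAbsent behaviour that distinctByKey depends on can be checked on its own. The following is a minimal sketch; the class name and the helper firstTime are invented for the illustration and mirror the body of the predicate.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class PutIfAbsentDemo {
    // Mirrors the predicate body: true only the first time a key is seen.
    static boolean firstTime(Map<Object, Boolean> seen, Object key) {
        return seen.putIfAbsent(key, Boolean.TRUE) == null;
    }

    public static void main(String[] args) {
        Map<Object, Boolean> seen = new ConcurrentHashMap<>();
        System.out.println(firstTime(seen, "zimug")); // true: key was absent, null returned
        System.out.println(firstTime(seen, "zimug")); // false: existing value TRUE returned
        System.out.println(firstTime(seen, "kobe"));  // true: a different key
    }
}
```

One thing to keep in mind: ConcurrentHashMap does not accept null keys, so a key extractor that can return null (for example a Player whose name is unset) would make the predicate throw a NullPointerException.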

The fourth method
The fourth method is not really a new method. The examples above all deduplicate by a single object attribute. To deduplicate by several attributes, we need to adapt the three methods above.
I only modify one of them here. The principle for the others is the same: combine the attributes to compare and compare them as a single String.

@Test
void testRemove8() {
  Set<Player> playerSet = new TreeSet<>(Comparator.comparing(o -> (o.getName() + "" + o.getAge())));
  playerSet.addAll(playerList);
  new ArrayList<>(playerSet).forEach(System.out::println);
}
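One caveat with joining attributes into a single String without a separator is that different attribute pairs can produce the same text, for example name "a1" with age "2" and name "a" with age "12". As a possible alternative that is not in the original article, Comparator.thenComparing compares the attributes one after another. The sketch below contrasts the two; the nested class P is a hypothetical stand-in for Player.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class MultiKeyDemo {
    // Hypothetical stand-in for Player with just the two fields.
    static final class P {
        final String name;
        final String age;
        P(String name, String age) { this.name = name; this.age = age; }
    }

    // Deduplicates by the concatenation name + age, as in testRemove8.
    static int countByConcat(List<P> input) {
        Set<P> set = new TreeSet<>(Comparator.comparing((P p) -> p.name + "" + p.age));
        set.addAll(input);
        return set.size();
    }

    // Deduplicates by name first, then by age, without building a combined String.
    static int countByThenComparing(List<P> input) {
        Comparator<P> byNameThenAge =
                Comparator.comparing((P p) -> p.name).thenComparing((P p) -> p.age);
        Set<P> set = new TreeSet<>(byNameThenAge);
        set.addAll(input);
        return set.size();
    }

    public static void main(String[] args) {
        // Both elements concatenate to "a12" although they differ in name and age.
        List<P> input = Arrays.asList(new P("a1", "2"), new P("a", "12"));
        System.out.println(countByConcat(input));        // 1: wrongly treated as duplicates
        System.out.println(countByThenComparing(input)); // 2: kept as distinct
    }
}
```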

Keywords: Java data structure list

Added by ronthu on Wed, 16 Feb 2022 09:35:30 +0200