Six skills to improve the vector performance of C++11

Vector is like the Swiss Army knife of C++ STL container. Bjarne Stoutsoup has a saying – "generally, if you need a container, use vector". Ordinary people like us regard this sentence as truth and just need to do the same. However, like other tools, vector is just a tool. It can improve efficiency and reduce efficiency.

In this article, we can see six ways to optimize the use of vector. We will see effective methods and ineffective methods in the most common development tasks using vector, measure the performance improvement brought by the effective use of vector, and try to understand why such performance improvement can be achieved.

Construction and method of performance test:

  • All tests are run in my Surface Book. This notebook has a Core i7 processor with a main frequency of 2.6Ghz, 8 GB memory, Windows 10 operating system installed and compiled and run with VS2015 C + + compiler.
  • We will use Stopwatch. This tool was created by Kjell in https://github.com/KjellKod/Stopwatch You can find it.
  • We will run each test 100 times, and then calculate the average running time as a basis. The code to run the test is in here . You can download it freely to evaluate the performance of vector in your own system. The code snippet provided there reflects only one loop, which makes the event simple.
  • We store the data of TestStruct structure in the vector and use FillVector() to fill the vector. They are defined as follows.
// Test struct to be inserted/removed from vector
struct BigTestStruct
{
  int iValue = 1;
  float fValue;
  long lValue;
  double dValue;
  char cNameArr[10];
  int iValArr[100];
};
// Helper function to populate the test vectors
void FillVector(vector<BigTestStruct>& testVector)
{
  for (int i = 0; i < 10000; i++)
  {
    BigTestStruct bt;
    testVector.push_back(bt);
  }
}

Start the introduction of optimizing vector usage in C++ 11 immediately.

#1 allocate enough space in advance to avoid unnecessary reallocation and replication cycles

Programmers like to use vector because they only need to add elements to the container without worrying about the size of the container in advance. However, if you start with a vector with a capacity of 0, adding elements to it will cost a lot of running performance. If you know how many elements a vector needs to store, you should allocate enough space for them in advance.

Here is a simple example. Add 10000 test structure instances to the vector - first test without pre allocated space, and then test with pre allocated space.

vector<BigTestStruct> testVector1;
vector<BigTestStruct> testVector2;
sw.Restart();
FillVector(testVector1);
cout << "Time to Fill Vector Without Reservation:" << sw.ElapsedUs() << endl;
sw.Restart();
testVector2.reserve(10000);
FillVector(testVector2);
cout << "Time to Fill Vector With Reservation:" << sw.ElapsedUs() << endl;

In my computer, 5145 microseconds (us) were used in the case of no pre allocated space, while 1279 microseconds were used in the case of pre allocated space, and the performance was improved by 75.14%!!!

This situation is well explained in Scott Meyers' book called Effective STL-Article 50 experience in effective use of STL:

"For vector and string, when more space is needed, it will do something equivalent to realloc. This realloc like operation has four steps:

1. Create a new memory block whose capacity is several times of the current capacity of the container. In most implementations, the capacity improvement factor of vector and string is between 1.5 and 2.

2. Copy the elements from the memory originally occupied by the container to the newly allocated memory.

3. Release the objects in the original memory.

4. Release the original memory.

With all these operations: allocate, recycle, copy and release, you shouldn't be surprised if these steps (for performance) are extremely expensive. Of course, you certainly don't want to do this frequently. If this doesn't impress you, think about all the iterators, pointers and references in vector and string will fail every time you perform these steps. This means that a simple insert operation may cause updates to other data structures that use iterators, pointers or references in the current vector or string. * * "

#2. Using shrink_to_fit() releases the memory occupied by the vector, and – clear() or erase() does not.

Contrary to what you might think, deleting elements from a vector using erase() or clear() does not release the memory allocated to the vector. A simple experiment can prove this. We add 100 elements to a vector, and then call clear() and erase() on the vector. Then we can let the capacity() function tell us how many elements can be stored in the memory allocated for this container.

FillVector(testVector1);
size_t capacity = testVector1.capacity();
cout << "Capacity Before Erasing Elements:" << capacity << endl;
  
testVector1.erase(testVector1.begin(), testVector1.begin() + 3); //
capacity = testVector1.capacity();
cout << "Capacity After Erasing 3 elements Elements:" << capacity << endl;
testVector1.clear();
capacity = testVector1.capacity();
cout << "Capacity After clearing all emements:" << capacity << endl;
testVector1.shrink_to_fit();
capacity = testVector1.capacity();
cout << "Capacity After shrinking the Vector:" << capacity << endl;

Here is the output:

Capacity Before Erasing Elements:12138
Capacity After Erasing 3 elements Elements:12138
Capacity After clearing all emements:12138
Capacity After shrinking the Vector:0

As you can see from the above output, erase() or clear() will not reduce the memory occupied by the vector. If you reach a certain point in the code and no longer need a vector, use STD:: vector:: shrink_ to_ The fit () method frees up the memory it occupies.

Note that shrink_to_fit() may not be fully supported by all compiler vendors. In this case, you can use "Swap idiom" to empty the vector. The code is as follows:

container( c ). swap( c ); // Shrink to fit idiom, used to empty storage space

container().swap( c ); // Idiom for emptying all content and storage space

If you are interested in this, please check it out“ C++ Coding Standards: 101 Rules, Guidelines, and Best Practices ”Article # 82 of the book, which contains details on the idiomatic usage of swap.

#3 when filling or copying to vector, assignment should be used instead of insert() or push_back().

There are often three ways to take elements from one vector to fill another vector - assign the old vector to the new vector, use iterator based std::vector::insert() or use loop based * * std::vector::push_back(). ** These methods are shown below:

vector<BigTestStruct> sourceVector, destinationVector;
FillVector(sourceVector);
// Assign sourceVector to destination vector
sw.Restart();
destinationVector = sourceVector;
cout << "Assigning Vector :" << sw.ElapsedUs() << endl;
//Using std::vector::insert()
vector<BigTestStruct> sourceVector1, destinationVector1;
FillVector(sourceVector1);
sw.Restart();
destinationVector1.insert(destinationVector1.end(),
  sourceVector1.begin(),
  sourceVector1.end());
cout << "Using insert() :" << sw.ElapsedUs() << endl;

This is their performance:

Assignment: 589.54 us

insert(): 1321.27 us

push_back(): 5354.70 us

We can see that the vector assignment is 55.38% faster than insert(), and is faster than push()_ Back () is 89% faster.

Why is that?

Assignment is very efficient because it knows how big the vector to be copied is, and then only needs to manage the cache inside the vector at one time through memory.

Therefore, to efficiently populate a vector, you should first try to use assignment, then consider the iterator based insert(), and finally consider push_back. Of course, if you need to copy elements from other types of containers into the vector, the assignment method is not feasible. In this case, we have to consider the iterator based insert().

#4 avoid using the std::vector::at() function when traversing the std::vector element.

There are three ways to traverse a vector:

  1. Using Iterators
  2. Using the std::vector::at() member function
  3. Use subscript – [] operator

Each usage is shown below:

//Using an iterator
vector<BigTestStruct> testVectorSum;
FillVector(testVectorSum);
sw.Restart();
int sum = 0;
for (auto it = testVectorSum.begin(); it != testVectorSum.end(); ++it)
{
  sum = sum + it->iValue;
}
cout << "Using Iterator:" << sw.ElapsedUs() << endl;
  
//Using the at() member function
sw.Restart();
sum = 0;
for (unsigned i = 0; i < testVectorSum.size(); ++i)
{
  sum = sum + testVectorSum.at(i).iValue;
}
cout << "Using at() :" << sw.ElapsedUs() << endl;
  
// Using the subscript notation
sw.Restart();
sum = 0;
for (unsigned i = 0; i < testVectorSum.size(); ++i)
{
  sum = sum + testVectorSum[i].iValue;
}
cout << "Using subscripting:" << sw.ElapsedUs() << endl;

The output is:

Using Iterator:0
Using at() :3.73
Using subscripting:0

Obviously, accessing the vector element with the std::vector::at() function is the slowest.

#5 try to avoid inserting elements in front of the vector

The complexity of any insertion operation in the front of the vector is O(n). Inserting data at the front is inefficient because each element item in the vector must make room for a new item to be copied. If you insert multiple times in front of the vector, you may need to reassess your overall architecture.

Make an interesting attempt. The following is a comparison between inserting in front of std::vector and inserting in front of std::list:

vector<BigTestStruct> sourceVector3, pushFrontTestVector;
FillVector(sourceVector3);
list<BigTestStruct> pushFrontTestList;
//Push 100k elements in front of the new vector -- this is horrible code !!! 
sw.Restart();
for (unsigned i = 1; i < sourceVector3.size(); ++i)
{
  pushFrontTestVector.insert(pushFrontTestVector.begin(), sourceVector3[i]);
}
cout << "Pushing in front of Vector :" << sw.ElapsedUs() << endl;
// push in front of a list
sw.Restart();
for (unsigned i = 0; i < sourceVector3.size(); ++i)
{
  pushFrontTestList.push_front(sourceVector3[i]);
}
cout << "Pushing in front of list :" << sw.ElapsedUs() << endl;

If I run this test 10 and use a vector containing 100 elements, the output is as follows:

Average of Pushing in front of Vector :11999.4
Average of Pushing in front of list :20.36

**The insertion operation in the front part of the list is about 58836% faster than that in the front part of the vector** Don't be surprised, because the complexity of the algorithm for inserting elements in front of the list is O(1). Obviously, the more elements the vector contains, the worse the result of this performance test will be.

#6 use empty when inserting elements into a vector_ Back () instead of push_back().

Almost everyone who catches up with the trend of C++11 clearly agrees with the method of "placement" to insert elements into STL containers. In theory, "resettlement" is more efficient. However, all practices show that sometimes the performance difference is even negligible.

Consider the following code:

vector<BigTestStruct> sourceVector4, pushBackTestVector, emplaceBackTestVector;
FillVector(sourceVector4);
//Test push back performance
sw.Restart();
for (unsigned i = 0; i < sourceVector4.size(); ++i)
{
  pushBackTestVector.push_back(sourceVector4[i]);
}
cout << "Using push_back :" << sw.ElapsedUs() << endl;
//Test emplace_back()
sw.Restart();
for (unsigned i = 0; i < sourceVector4.size(); ++i)
{
  emplaceBackTestVector.emplace_back(sourceVector4[i]);
}
cout << "Using emplace_back :" << sw.ElapsedUs() << endl;

If you run it 100 times, you will get the following output:

Average Using push_back :5431.58
Average Using emplace_back :5254.64

It is clear that the "placement" function performs better than the insertion function - but only 177 microseconds. In all cases, they are roughly equivalent.

The Emplacement function may be faster only if:

  1. The value to be added is constructed in the vector, not assigned.
  2. The type of parameter passed is different from that saved in the vector. For example, if a vector contains STD:: string, but we pass a string value to the vector.

Even if neither of the above conditions is true, as shown in this example, you should not be taken lightly by using the replacement during insertion.
For more details on replacement vs. insertion, please check Scott Meyer's“ Effective Modern C++: 42 Specific Ways to Improve Your Use of C++11 and C++14 "Entry 42 in the book.

epilogue

Like any third-party statistics, you should not blindly rely on the results and suggestions provided here. You may encounter many uncertainties when testing on different operating systems, processor architectures and compiler settings. Therefore, you need to make your own measurement based on the actual data.

Keywords: C++ Back-end

Added by tommy445 on Sun, 30 Jan 2022 11:13:38 +0200