Saturday, May 30, 2009

Move Construtors

For simplicity sake, we assume the name space 'std' is used here. Given the following:

vector<vector<string> > v;

We now have an expandable array of string arrays. Now I don't want to poke inside the possible different implementations the string class could have here. We assume the worst, that string is implemented using a single C array of characters and leave it at that.

However, also for the sake of this discussion, we assume that the vector template class capacity is always at least bigger than its size. So we can assert the following:

assert(v.size() <= v.capacity());

Where the capacity holds numerous empty slots at the end of the array for future allocations. So, if we have the following:

vector<string> v1, v2;

v.reserve(32);
v.push_back(v1);
v.push_back(v2);

We are can safely state that the insertion of v2 inside of v did not cause v1 to be re-created, copied and destroyed.

The problem comes from the following:

v.push_back(v1);
// v.push_back(v2..v(n - 1) where n == capacity == size
assert(v.capacity() == v.size());

The next time we insert into this vector, the capacity will be expanded, most implementations will double the size to avoid any extra copying that might be caused by being over-conservative.

This means we have the following algorithm going on inside v the next time we insert:

1. Save the capacity to a local variable and double it, we call this n
2. Allocate a new array of vector of size n.
3. Copy construct a copy of every element found in the original array
4. Destroy every element found in the original array
5. Delete the original array
6. Capacity is now equal to n
7. The new array replaces the old one
8. The new element is inserted at the position "size" and size is incremented by 1

The problem here are steps #3, #4 and #5. Vector simply can't move the pointers found in the array because ISO C++98 dictates that vector (and all standard C++ containers) be safe for storing objects, i.e., data types that have a virtual pointer table. In fact, in this case, vector can't even hold pointers to its elements because the standard also dictates that vector, like all other standard based containers, is value based for maximum flexibility.

So the problem is that all these vectors found inside 'v' are copied element by element, string by string and then destroyed.

Now, considering that capacity is usually expanded by 2 every time this limit is reached, this shouldn't be too bad right? Well, unfortunately no, without any compiler optimizations, returning the value from a function or method is sufficient to trigger a copy. So simply put:

v = do_something(v);

where do_something is defined to be:

vector<vector<string> > do_something(vector<vector<string> > v) {
return v;
}

is sufficient to trigger 2 copies, one by the value argument, and one by returning the value. Suffice it to say this is not very efficient especially considering how widespread the standard containers have now become in the world of C++.

The new ISO C++0x standard introduces a new feature to address this shortcoming, a move constructor. In the previous example, our vector template class defined a copy constructor using the following syntax in order to perform copies:

template<typename T>
vector<T> (const vector<T>& other) {
// perform the copy
}

However, we can now complement this constructor with a move constructor following this new syntax:

template<typename T>
vector<T> (vector<T>&& other) {
// OK, let's steal some memory, so for example
m_rawArray = other.m_rawArray;
other.m_rawArray = 0;
m_size = other.m_size;
other.m_size = 0;
}

Here, the move constructor only gets called when a copy is performed and the previous copy is also known to be at the end of its life. So that means it's OK for us to "steal" memory that belonged to our forebearer.

This is a simple feature that will address the main performance concern of the standard C++ based containers. Even if end users are unfamiliar with this new feature, they will still reap the benefits of it the next time they compile since most standard C++ library implementations (and the Boost libraries) will be enhanced to use move constructors. By using a move constructor here, vector can now avoid all the expensive copy operations it needed to perform in the previously showcased algorithm.

Monday, May 25, 2009

C++0x Feature Focus

Starting a new series of blog posts related to language changes found in the upcoming C++0X standard. These will focus on core language changes and not on any libraries. You will need gcc from the head branch or the Visual Studio 2010 beta 1 to compile any of the upcoming samples. Even then, some of the samples will not build even with these compilers since some features have yet to be implemented.

I want to highlight the following new features:
  • Move Contructors
  • Perfect Forwarding
  • Variadic Templates
  • Closures
Closures, a.k.a lambdas, are already implemented today in the libraries but the ISO committee decided it was time to put this in the core to have a unique syntax across all systems.