Readable C++

A few code beauty enhancing hints and tricks for beginners and experts.

C++ is often seen as a language that is inherently ugly and unreadable. It is true that it is very easy to write bad C++ code and sometimes rather hard to write good C++ code, for almost any definition of "good" and "bad". This article describes a few techniques and practices that should make things easier. As usual, a lot of real and self-proclaimed experts will disagree with my opinions and doubt my experiences. This is entirely expected. Feel free to disagree.

Man versus Machine?

Almost all languages have the purpose of facilitating communication. When you use a programming language you normally do not use it in a vacuum - even if it feels that way sometimes (we've all had these "choice assignments") - you are communicating with both the machine and with fellow programmers (including yourself a few hours or a year later). There are at least three groups "listening" to your programming:

  1. The Machine
    • you are telling the compiler to create an executable program or a library
    • you are negotiating with the Operating System and with system libraries to get resources (files, various bits of data, space on the display, ...) and actions (changing the color of some pixel, manipulating resources, ...)
  2. The Users
    • (philosophically) the program is communicating with users on your behalf - it shows exactly the behavior you gave it
    • you are providing functionality to users by making the program take their input and handling it in a specific way
    • you are restricting users by not giving them some functionality and/or by telling them that they lack some privilege
  3. Fellow Programmers
    • ...including yourself a few hours to a few years later...
    • ...including co-workers who work in other modules of the program, QA people, maintenance staff, etc.
    • you give them code they need to interpret whenever there is a problem or whenever some change is required
    • you give them interpretations of your own code whenever you comment it (hopefully you do comment it)
    • you give them new vocabulary by implementing interfaces, structures, methods, ...
    • you restrict that same vocabulary by making narrow definitions of what it can do - sometimes this is good, sometimes they'll hate you for it (sometimes both at the same time)
    • ...you receive the same kind of communication from co-workers or yourself a few hours/years ago!

Above are just a few examples of the communication going on while you program. In the following article I'll concentrate on making communication with group three easier with a few hints thrown in for reducing the misunderstandings the compiler may have reading your code.

So why should you care? Because you are a big part of group three! If you fail to tell yourself what you were thinking when you wrote some piece of code you'll be forcing yourself to write it again and again...

C++-98, -03 and -11

The C++ language is standardised. C++-98 (sometimes known as C++-03) is the standard supported by most C++ compilers - the ones not supporting this standard have become increasingly rare. After all they had a few years to support it - it was released in 1998 and revised in 2003 (so I'll use C++-98 and C++-03 interchangeably in this article). C++-11 is the newest standard for C++ - originally planned to be C++-07, then -09, -10 and finally released in 2011. The newest versions of GCC, LLVM, and MSVC support it at least to some degree.

I'll mark the code examples below that only work with C++-11.

I will not waste much space on praising the most visible new features of C++-11 (e.g. lambdas and variadic templates) - you won't need them nearly as often as you might think and others have written about them in great detail already.

Featuritis and Legacy

C++ is a very rich language with dozens of features in hundreds of combinations. Each of those features exists for a specific purpose - classes exist to encapsulate data with the code that manipulates it, templates exist to enable type safe programming of generic algorithms.

Features should not be abused - e.g. just because you know how to solve a linear equation using nothing but templates it does not mean that you should actually do it when there are easier ways of accomplishing your task. On the other hand it is certainly possible to construct a sorting algorithm with some labyrinthine mix of classes and helper classes, but usually a ten-line template will be much easier to write and use.

C++ also has (almost) full backwards compatibility to ANSI C. Including all the "preprocessor magic" that earns you the "wizard most wise" status among ANSI C programmers. And including the ability to program in a purely imperative style without any hint of using an object oriented language. All this exists so that C++ code can easily interface with C libraries and be exported to interfaces of other languages (which usually interface with C).

This is not to say that there are no valid use cases for C features in C++ outside interfacing with C code. But you should always check whether it is the most effective method of doing something.

Avoiding #define

There are three very frequent uses of #define: defining a constant, re-use of code snippets, and adaptive code. Each of those has equivalents in pure C++.

Defining constants is the easiest candidate for replacement - you can use the const keyword to declare a constant value:

const int myconstant = 42;

The biggest advantage of constants is that they are type safe and that they can be used for complex class instances.

An often heard argument against constants is that they use up more memory than defines. In fact the memory use of the above is exactly the same, even when the code is not optimized (it may create more entries in the debugger symbol table though). Both a #define and a constant create an instance in the memory of each compilation unit that uses them. This is reasonable for simple types like the above, which are faster when instantiated and optimized in place. For more complex classes you may want to optimize for memory though:

//header file (myclass.h):
// only declare the class and name of the instance, no initialization
extern const MyClass myconstant;

//implementation file (myclass.cpp):
// do the initialization
const MyClass myconstant(42);

When used like this the memory is only reserved once and the constant only initialized once. The downside is that the compiler cannot optimize access to the instance, but the potential for optimization is quite low with access to most classes. You do however save the initialization of the instances that do not need to be created.

Another use of defines is to create shared code snippets - most of these can more effectively be implemented by inline functions:

//C-style code:
#define MYCODE c += a + b;

//C++-style code:
inline void mycode(int&c, int a, int b)
{
  c += a + b;
}

The advantage of the inline method is that it is type safe and the compiler may generate an actual instance of the function if a pointer to the function is needed. If you do not want to generate an exported symbol you may add the static keyword to the inline ... declaration.

#define macros are sometimes used to create algorithms that are type agnostic. In those cases templates can be used as a type safe alternative:

//C-style macro (the outer parentheses keep the result intact inside larger expressions):
#define LESSTHAN(x,y) ((x)<(y))

//C++-style algorithm
template <class T> 
bool lessThan(const T&x, const T&y)
{
  return x < y ;
}
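
To illustrate why the type safe variants are worth the effort, here is a minimal sketch of one classic macro pitfall: macro arguments are pasted in as text, so an argument with side effects may be evaluated more than once. The names SQUARE_MACRO, square, nextValue and compareMacroAndTemplate are made up for this illustration.

```cpp
#include <cassert>

// Hypothetical macro: the argument text is pasted in twice.
#define SQUARE_MACRO(x) ((x) * (x))

// Type safe replacement: the argument is evaluated exactly once.
template <class T>
inline T square(const T &x)
{
  return x * x;
}

// Hypothetical helper that counts how often it gets called.
inline int nextValue(int &calls)
{
  ++calls;
  return 3;
}

// Runs both variants and reports how many calls each one caused.
inline void compareMacroAndTemplate(int &macroCalls, int &templateCalls)
{
  macroCalls = 0;
  templateCalls = 0;
  int viaMacro = SQUARE_MACRO(nextValue(macroCalls));
  int viaTemplate = square(nextValue(templateCalls));
  // both compute the same result...
  assert(viaMacro == 9 && viaTemplate == 9);
  // ...but the caller can see that the macro called nextValue twice
}
```

After the call the macro variant has invoked nextValue twice, the template variant only once.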

The Small Things

C++ has several "small" features that make life easier and often less surprising. Unfortunately they are often not well known.

Explicit Constructors

Constructors with a single argument are automatically used for implicit conversions. That is not always wanted.

//my class
class C{ public:
  //default constructor (no argument)
  C(){}
  //initialize with some parameter
  C(int size){/* ... */}
};

//build a C with size 4
C c(4);
//oops. we never intended to convert int to C
c = 42;

This can be avoided by declaring those constructors as explicit:

//my class
class C{ public:
  //default constructor (no argument)
  C(){}
  //initialize with some parameter
  explicit C(int size){/* ... */}
};

//build a C with size 4
C c(4);
//error, no implicit cast:
c = 42;

Hint: for most Qt derived classes you will define a constructor that takes QObject*parent as its only (often optional) argument: always define this constructor as explicit - the argument is a parent, not something to convert from.
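
The difference can also be checked at compile time. The following sketch assumes two hypothetical classes, LooseBuffer and StrictBuffer, that differ only in the explicit keyword; std::is_convertible from <type_traits> is a C++-11 facility.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical class with a converting constructor.
class LooseBuffer {
public:
  LooseBuffer(int size) : m_size(size) {}
  int size() const { return m_size; }
private:
  int m_size;
};

// The same class, but the constructor is explicit.
class StrictBuffer {
public:
  explicit StrictBuffer(int size) : m_size(size) {}
  int size() const { return m_size; }
private:
  int m_size;
};

inline int looseSize(const LooseBuffer &b) { return b.size(); }
inline int strictSize(const StrictBuffer &b) { return b.size(); }

// int silently converts to LooseBuffer, but not to StrictBuffer
static_assert(std::is_convertible<int, LooseBuffer>::value,
              "implicit conversion is available");
static_assert(!std::is_convertible<int, StrictBuffer>::value,
              "implicit conversion is blocked by explicit");

inline void explicitExample()
{
  assert(looseSize(42) == 42);                // compiles, maybe by accident
  assert(strictSize(StrictBuffer(42)) == 42); // conversion must be spelled out
  // strictSize(42);                          // would not compile
}
```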

Const, Const Everywhere and not a Bit to Change!

The const keyword is normally used to declare a classic constant - some value that is re-used in several places - as explained above in the section about replacing #define. But const can also be used more literally - making a value read-only. This can be used to avoid a very common error when comparing values:

int value=getSomeValue();
//this is how a comparison should work:
if(value==1)doSomething();

//this is a typo - oops!
//it actually assigns 2 to value and evaluates to true
if(value=2)doSomethingElse();

Using const this mistake can be avoided:

const int value=getSomeValue();
//this is how a comparison should work:
if(value==1)doSomething();

//this is a typo - and generates a compiler error!
if(value=2)doSomethingElse();

Use const as often as possible - this ensures that you do not accidentally overwrite read-only values when you really want to compare them.
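
The same idea applies to parameters and local variables. A small sketch with plain integers (countZeros and countZerosExample are hypothetical helpers, not part of any library):

```cpp
#include <cassert>
#include <vector>

// The const reference promises that the list is only read.
inline int countZeros(const std::vector<int> &values)
{
  int count = 0;
  for (std::vector<int>::size_type i = 0; i < values.size(); ++i) {
    const int value = values[i];  // read-only copy of the element
    if (value == 0)               // the typo "value = 0" would be rejected
      ++count;
  }
  return count;
}

inline void countZerosExample()
{
  std::vector<int> values;
  values.push_back(0);
  values.push_back(5);
  values.push_back(0);
  assert(countZeros(values) == 2);
}
```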

Default and Delete Constructors/Operators

C++ creates some constructors and the copy assignment operator by itself, but it does not always do so - for example it does not create the default constructor if any other constructor is declared. If you need the missing ones you have to implement them yourself with C++-98.

With C++-11 you can explicitly ask the compiler to create or not create these constructors:

class C{
public:
  //this would normally stop the compiler from creating C()
  C(int);
  //force automatic creation of the default constructor
  C()=default;
  //explicitly delete the copy constructor
  C(const C&)=delete;
  
  //force creation of the assignment operator
  C& operator=(const C&)=default;
};
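
Whether the compiler really provides or withholds these functions can be verified with type traits. A sketch, assuming a hypothetical class Handle that mirrors the class C above:

```cpp
#include <cassert>
#include <type_traits>

class Handle {
public:
  // this would normally suppress the default constructor
  explicit Handle(int id) : m_id(id) {}
  // bring the default constructor back
  Handle() = default;
  // forbid copy construction
  Handle(const Handle &) = delete;
  // keep copy assignment
  Handle &operator=(const Handle &) = default;

  int id() const { return m_id; }
private:
  int m_id = 0;  // C++-11 in-class initializer, used by Handle()
};

static_assert(std::is_default_constructible<Handle>::value,
              "Handle() was requested with =default");
static_assert(!std::is_copy_constructible<Handle>::value,
              "Handle(const Handle&) was deleted");
static_assert(std::is_copy_assignable<Handle>::value,
              "operator= was requested with =default");

inline void handleExample()
{
  Handle a;       // default constructed, id is 0
  Handle b(7);
  a = b;          // assignment is allowed
  assert(a.id() == 7);
  // Handle c(b); // would not compile: copy constructor is deleted
}
```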

Override

How often does it happen that you want to override a virtual function and then hunt for hours for the reason it is never called, just to find out you mistyped its name? Unless you have never attempted to override virtual functions the answer is probably "very often".

C++-11 allows you to explicitly state that you want to override a function:

class Base {
public:
  virtual void myfunc();
};

class Derived:public Base {
public:
  //ok, override Base::myfunc()
  virtual void myfunc() override ;
  
  //oops, a typo, will cause a compiler error
  virtual void mytypo() override ;
  
  //also error: wrong argument type
  virtual void myfunc(int) override ;
  
  //ok, no override statement
  virtual void mynewfunc();
};

The override keyword tells the compiler to check that a virtual method with the exact same signature exists in one of the parent classes - if it does not exist, the compiler stops with an error.
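
A complete, compilable variant of this idea might look like the following sketch. Shape, Circle, describe and overrideExample are names invented for the example.

```cpp
#include <cassert>
#include <string>

class Shape {
public:
  virtual ~Shape() {}
  virtual std::string name() const { return "shape"; }
};

class Circle : public Shape {
public:
  // checked by the compiler: must match Shape::name() const exactly
  std::string name() const override { return "circle"; }

  // would not compile: without "const" this is a different function
  // std::string name() override { return "circle"; }
};

// calls through the base class, so the override is actually exercised
inline std::string describe(const Shape &s)
{
  return s.name();
}

inline void overrideExample()
{
  Circle c;
  Shape s;
  assert(describe(c) == "circle");
  assert(describe(s) == "shape");
}
```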

nullptr

At random intervals you'll find some code like x=0; and you'll find yourself wondering whether x is a number (int, double?), a boolean, or a pointer.

C++-98 and earlier versions already solve the boolean problem by providing the bool type and its two possible values true and false. Integers and pointers automatically convert to bool to allow backwards compatibility.

The remaining issue is that the same constant is used for integer zero and the null pointer. So for example the call foo(0); could mean foo(int) or foo(char*). Another problem is that the constant 0 cannot simply be replaced by the traditional ANSI-C value NULL (an alias for (void*)0), because pointer types should not be converted automatically into each other (hence void* cannot be converted automatically into another pointer type).

C++-11 solves that problem by introducing the type nullptr_t and its sole value nullptr. The value nullptr represents the null pointer - replacing the ANSI-C NULL and the C++ constant 0 for pointers. The new nullptr value automatically casts to a null pointer of any pointer type and to boolean false, but not to integer zero. So it can be used as a truth statement (if(nullptr)...), but a call like foo(nullptr) will not accidentally call foo(int).
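
A short sketch of the overload behaviour described above; whichOverload is a hypothetical function that only reports which variant was chosen:

```cpp
#include <cassert>
#include <string>

inline std::string whichOverload(int) { return "int"; }
inline std::string whichOverload(const char *) { return "pointer"; }

inline void nullptrExample()
{
  // the literal 0 is an int first and foremost
  assert(whichOverload(0) == "int");
  // nullptr never converts to int, so the pointer variant is chosen
  assert(whichOverload(nullptr) == "pointer");
}
```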

Range-Based For-Loops

//get some list temporary and iterate through it
const QStringList templist=somecallreturninglist();
for(int i=0;i<templist.size();i++)
  dosomething(templist[i]);

The above construct is quite cumbersome - all you want to do is iterate through a temporary list of values, instead we get a not so temporary list variable and an additional loop index variable. Especially C++ beginners tend to stumble over whether to start counting with 0 or 1 and whether to count to the full size() or size()-1 - those mistakes are called "off-by-one" errors.

C++-11 gives us range-based loops for this. These allow you to keep the list private, save a few calls (specifically to the size() method) and work more intuitively with lists:

QStringList somecallreturninglist();

//get some list temporary and iterate through it
for( const QString&temp : somecallreturninglist() )
  dosomething(temp);

Range-based loops work with standard C-type arrays (at least as long as the compiler knows their size) and with any collection class that follows the standard pattern of providing begin() and end() methods that return a kind of iterator that can be incremented, compared and de-referenced. All Qt collection classes fit this description, so do the collections in the C++ Standard Template Library.
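
The same loop style works without Qt. A minimal sketch using a C-type array and std::vector; makeValues, sumOfTemporary and sumOfArray are hypothetical helpers:

```cpp
#include <cassert>
#include <vector>

// Hypothetical function returning a temporary list.
inline std::vector<int> makeValues()
{
  return {1, 2, 3};
}

// Iterate directly over the temporary, no named list and no index needed.
inline int sumOfTemporary()
{
  int sum = 0;
  for (const int &v : makeValues())
    sum += v;
  return sum;
}

// C-type arrays work as long as the compiler knows their size.
inline int sumOfArray()
{
  const int values[] = {10, 20, 30};
  int sum = 0;
  for (int v : values)
    sum += v;
  return sum;
}

inline void rangeForExample()
{
  assert(sumOfTemporary() == 6);
  assert(sumOfArray() == 60);
}
```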

Auto-Typing

Often it is tricky to determine the exact type of a temporary instance, or you do not care much about it - you just want the content. Every expression has a result type; if it is assigned to a variable of a different type or handed to a function expecting a different type, the compiler tries to convert the value automatically. For example:

double foo(){return 1.0;}
int bar(int x){return x*2;}

char y = bar( foo() );

Above, the expression foo() returns a floating point number (double). When this result is handed to bar(int) it is automatically converted to an integer. When this (typically 32bit) integer is assigned to a char (8bit) it is converted again to the new type.

C++-11 provides automatic type deduction. We can tell the compiler to use the exact same type for a variable that the expression returns:

auto z = foo();

The variable z will have the same type that is returned by foo() - i.e. it will be of the type double. We can even extend the declaration with a const declaration or specify it as a reference. With this we can make the range-based loop above even more convenient:

QStringList somecallreturninglist();

for( const auto&temp : somecallreturninglist() )
  dosomething(temp);

The result of the somecallreturninglist() is a QStringList - the range based for loop uses the begin() and end() functions to get iterators, the result of de-referencing those iterators is a QString - declaring the loop variable as const auto& makes it effectively a const QString&.

We can go further and declare a variable or parameter to have the same type as some expression:

//define a function
double foo(){return 1.02;}
//declare a type foo_t to have the return type of foo()
typedef decltype(foo()) foo_t;

//use it to define another function
foo_t bar(foo_t x){return x*2;}

Or with more complex expressions:

//define a variable to have the right type for part of a string
//(that type may be a reference, so initialize it right away):
decltype(mystring[5]) mychar = mystring[5];
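
If in doubt about what auto or decltype deduced, a static_assert can document it. A sketch, assuming a hypothetical function half() that returns double; autoExample is likewise a made-up name:

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Hypothetical function with a floating point result.
inline double half(int x)
{
  return x / 2.0;
}

inline void autoExample()
{
  auto z = half(3);  // z takes the return type of half(): double
  static_assert(std::is_same<decltype(z), double>::value,
                "auto keeps the type of the expression");

  const auto &r = z; // auto can be combined with const and &
  static_assert(std::is_same<decltype(r), const double &>::value,
                "r is a const reference to double");

  std::vector<int> values(3, 7);
  auto it = values.begin(); // saves spelling out the iterator type
  static_assert(
      std::is_same<decltype(it), std::vector<int>::iterator>::value,
      "it is a plain vector iterator");

  assert(z == 1.5);
  assert(r == 1.5);
  assert(*it == 7);
}
```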

Initially...

Another C++-11 feature is direct member variable assignment in the class declaration:

class C {
  private:
    //fallback value for direct initialization
    int myint=42;
  public:
    //implicitly assign myint=42
    C(){}
    //explicitly assign myint=i
    C(int i):myint(i){}
};

Unless a variable member is explicitly initialized in a constructor the direct assignment is used by the compiler as a fallback. This way you can ensure variable members are initialized even if this is not done explicitly in all of the constructors.
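
A compact sketch of the fallback behaviour, using a hypothetical Counter class with two members:

```cpp
#include <cassert>

class Counter {
public:
  // initializes nothing itself: both fallback values are used
  Counter() {}
  // overrides the fallback of m_value only
  explicit Counter(int start) : m_value(start) {}

  int value() const { return m_value; }
  int step() const { return m_step; }
private:
  int m_value = 0; // fallback value
  int m_step = 1;  // fallback value
};

inline void counterExample()
{
  Counter a;
  assert(a.value() == 0 && a.step() == 1);
  Counter b(5);
  assert(b.value() == 5 && b.step() == 1);
}
```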

Enumerating the Stars...

Booleans and integers are often abused to select between equally acceptable states - in the more "readable" version of this, constants are used to distinguish between the choices. This is done especially often by former C programmers (it was kind of correct there). But you should ask yourself some simple questions:

Integers: computers, and by extension computer programmers, are very comfortable with simple whole numbers - after all that's the easiest thing one can interpret into a group of bits, and bits are the only thing computers truly understand. The danger is to overuse them. Integers are very good for expressing whole numbers: simple ordered values that can be added, subtracted, multiplied, divided, and subjected to a bunch of other mathematical operations. They are an odd choice when they are simply used to enumerate values that are not perfectly ordered. For example state machines are much easier to understand if the states have actual names (like "Start", "ParseInputs", "Calculate", "OutputStuff", "Done") instead of numbers (0, 1, 2, 7, ...). Likewise levels, flags and hints all become more readable when they have names instead of a simple number - it may be obvious when you write a logger class that an "Error" is "higher" than a "Warning", but a few days later it will be hard to remember whether "worse" was a lower or higher number. Use enums whenever the value is not primarily meant for calculations or switching. Use bool when it is a simple switch (see next paragraph).

Boolean: can I describe that choice in terms of switching one specific feature on/off? Or is it more correctly described by "I choose between equally valid alternatives A or B"? Even if it is a switch: might the sources become more readable if I have a symbolic name for it? (See for example the QString::split method.) Only after deciding that it is definitely an on/off switch and that the code is perfectly readable and clear using a boolean should you actually use a boolean. An example would be the "enabled" property of QWidget, which you normally access either as a property or with the isEnabled() and setEnabled(bool) methods. If you were to set it in the constructor or as an incidental parameter to some other method, it would be better to use an enum, since that is more readable.

Here are some simple examples:

Use Integers when you care about numerical values, you can count it, or your primary intent is to calculate with the value:

//counting:
int numberOfPieces();
void addRowsToTable(int numRows);

//numeric value:
int getMouseXCoordinate();
void setCursor(int row, int column);

//calculate:
int convertMouseLocalToGlobalX(int localX);
qint64 fileAgeInSecondsSinceEpoch(QString filename);

Use booleans for simple switches, when the context in which it is used already makes it clear what it is used for:

void setEnabled(bool);
bool isEnabled();
bool fileExists(QString filename);

Do not use bool when the context is ambiguous:

MyFile myfile("hello", true); //what does "true" mean?
myfile.startProcess("myproc", false); //what does "false" mean?

Instead use enums for those cases:

class MyFile {
public:
  //the enums must be public to be usable by callers
  enum CreateMode { NonExclusiveCreate, ExclusiveCreate };
  enum RunMode { RunInBackground, RunInForeground };
  //....
};

MyFile myfile("hello", MyFile::ExclusiveCreate);
myfile.startProcess("myproc", MyFile::RunInBackground);

Also use enums in all other cases:

enum ParserState {
  StartState = 0,
  ParseInputsState,
  CalculateState,
  DebugState,
  OutputStuffState,
  DoneState
};

enum LogLevel {
  DebugLevel = 0,
  InfoLevel = 1,
  WarningLevel = 10,
  ErrorLevel = 20
};

As you can see above: it is quite possible to assign numeric constants to enum values (make sure they stay identical for a very long time, otherwise you have to recompile a lot). You can even use them internally for comparison, as long as the users do not have to bother:

void logSomeText(QString text, LogLevel level)
{
  // implicitly use the integer value of the enum
  if(level <= DebugLevel && doNotDebug)return;
  //...
}

//...
logSomeText("hello world!", InfoLevel);

One property of enums that is sometimes irritating is that values of several enum types share the same namespace if those types are defined in the same namespace (that's why I called them "DebugState" and "DebugLevel" instead of simply "Debug" above). Another one is that enums are automatically converted to integer or boolean, even when this was probably not intended (int x=InfoLevel+3;). C++-11 solves these problems by introducing enum class. The logging example above could now look a bit cleaner:

enum class LogLevel {
  Debug = 0, //no need to append "Level"
  Info = 1,
  Warning = 10,
  Error = 20
};

void logSomeText(QString text, LogLevel level)
{
  // use the type name to disambiguate values
  if( level == LogLevel::Error) runForTheHills();
  // explicitly cast to integer
  if((int)level <= (int)LogLevel::Debug && doNotDebug)return;
  //...
}

//...
logSomeText("hello world!", LogLevel::Info);

We now use the enum type name as namespace for its values, so we do not need to append or prefix additional information to disambiguate values. And we have to explicitly cast values to int if we want to use the numeric value - which makes it abundantly clear that something "hacky" is happening.

As a matter of good form I recommend you use the explicit cast-to-int even if you do not have to - this makes the source easier to comprehend for others.
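
If the numeric value is needed in several places, the cast can be wrapped in one small helper. The following is only a sketch; severity, isAtLeast and enumExample are hypothetical names, and std::underlying_type comes from the C++-11 header <type_traits>:

```cpp
#include <cassert>
#include <type_traits>

enum class LogLevel {
  Debug = 0,
  Info = 1,
  Warning = 10,
  Error = 20
};

// enum class values do not convert to int on their own
static_assert(!std::is_convertible<LogLevel, int>::value,
              "no implicit conversion to int");

// Hypothetical helper: the one place where the numeric value is exposed.
inline int severity(LogLevel level)
{
  return static_cast<std::underlying_type<LogLevel>::type>(level);
}

// Hypothetical helper built on top of it.
inline bool isAtLeast(LogLevel level, LogLevel threshold)
{
  return severity(level) >= severity(threshold);
}

inline void enumExample()
{
  assert(severity(LogLevel::Warning) == 10);
  assert(isAtLeast(LogLevel::Error, LogLevel::Warning));
  assert(!isAtLeast(LogLevel::Debug, LogLevel::Info));
}
```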

Lambdas

C++-11 introduces lambda expressions. Lambdas are small inline functions that can be stored and used later. The most common use case is to hand them to outside algorithms to perform some customizable task - for example the comparison operator to a sorting algorithm.

Like templates, lambdas make the code more readable when used in the right place for a task that is well-suited for them, and just like templates they make the code horribly complex and unreadable when used too often or in the wrong circumstances.

Lambdas have a syntax that is similar to a function declaration with an odd looking function name:

//define the lambda
auto mylambda = [](int x){ qDebug()<<x; } ;

//use the lambda
mylambda(42);
mylambda(6*9);

The brackets [] tell the compiler that this is a lambda expression, or "anonymous function". The parameters in parentheses, in this example (int x), work exactly like parameters to functions. The body of the lambda is also written just like for a normal function. The difference is that a lambda can be placed in the middle of an expression and be assigned to a variable or function parameter.

If you want to specify the return type of the lambda you have to use a rather unfamiliar syntax:

//define the lambda
auto mylambda = [](int x)->int{ qDebug()<<x; return x*2; } ;

//use the lambda
qDebug() << mylambda(42);

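In strict C++-11 you have to specify the return type whenever the body of the lambda consists of more than a single return statement (as in the example above); for a body like { return x*2; } the compiler is able to figure it out on its own and you can leave it out. Later standards relax this rule.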

Another convenience is that if the lambda takes no parameters and the compiler can determine the return type on its own, then you can leave out the return type specification (->int) as well as the empty parentheses:

auto mylambda = []{return 42;} ;

A lambda like this can encapsulate simple formulas and anonymous functions, but the lambda concept is more powerful than this. A lambda can use variables from its environment to change its behavior and/or the environment:

//some environment
int env=42;

//non-referencing lambda, which will not compile:
auto nonref = []{ qDebug() << env; } ; //error: no access to "env"

//copying lambda: it sees a copy of "env"
//("mutable" is needed to be allowed to change that copy)
auto copy = [=]() mutable {
  qDebug() << "copy lambda sees env as" << env;
  env = 5; //this has no effect outside
  qDebug() << "copy lambda now sees env as" << env;
} ;

//referencing lambda: it has direct access to "env"
auto ref = [&]{
  qDebug() << "reference lambda sees env as" << env;
  env = 5; //this changes everything!
} ;

//now call them:
copy(); // "copy lambda sees env as 42"
        // "copy lambda now sees env as 5"
qDebug() << env; // 42
ref(); // "reference lambda sees env as 42"
qDebug() << env; // 5
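
The same behaviour without Qt, as a self-checking sketch. It assumes env is a local variable; captureExample is a made-up name. The explicit return types are there because strict C++-11 only deduces them for bodies consisting of a single return statement.

```cpp
#include <cassert>

// Returns the final value of the local variable "env".
inline int captureExample()
{
  int env = 42;

  // copy capture: "mutable" allows changing the lambda's private copy
  auto copy = [=]() mutable -> int {
    env = 5;
    return env;
  };

  // reference capture: works directly on the original variable
  auto ref = [&]() -> int {
    env += 1;
    return env;
  };

  assert(copy() == 5);
  assert(env == 42); // the original is untouched
  assert(ref() == 43);
  assert(env == 43); // the original has changed
  return env;
}
```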

Use case 1: Algorithms.
Often algorithms need to be parametrized - for example qSort can sort lists of values that do not have a comparison operator if we hand it our own implementation of "LessThan", and the sorting order can be changed by handing it a different operator.

//a complex type with no LessThan operator:
struct MyStruct {
  int x,y;
  //...
};

//define an "operator"
static bool MyStructLessThan(const MyStruct&a, const MyStruct&b)
{
  //sort the absolutes of x in ascending order
  return abs(a.x) < abs(b.x);
}

//define another "operator"
static bool MyStructGreaterThan(const MyStruct&a, const MyStruct&b)
{
  //sort the absolutes of x in descending order
  return abs(a.x) > abs(b.x);
}

//sort a list...
QList<MyStruct> mylist = getListFromSomewhere();
qSort(mylist.begin(), mylist.end(), &MyStructLessThan );

//sort it in reverse order...
qSort(mylist.begin(), mylist.end(), &MyStructGreaterThan ); 

Using lambda expressions these can be simplified by defining the operators inline:

//sort a list...
QList<MyStruct> mylist = getListFromSomewhere();
qSort(mylist.begin(), mylist.end(), [](const MyStruct&a, const MyStruct&b){return abs(a.x)<abs(b.x);} );

//sort it in reverse order...
qSort(mylist.begin(), mylist.end(), [](const MyStruct&a, const MyStruct&b){return abs(a.x)>abs(b.x);} ); 

Using templates it is possible to define new algorithms that are adaptable by lambdas:

#include <functional>
#include <QList>

//an algorithm to extract elements from a list:
template<class T>
QList<T> myselect(const QList<T>&list , std::function<bool(const T&)> selector)
{
  QList<T>ret;
  for(const T&t:list)
    if(selector(t))
      ret.append(t);
  return ret;
}

//define a list
QList<My> my = getListFromSomewhere();
//use the new algorithm to extract all elements with an x less than 8
QList<My>result = myselect<My>(my,[](const My&m){return m.x<8;} ); 

The std::function template comes from C++-11 and allows you to wrap function pointers, pointers to object methods and lambdas equally - so all use cases of code snippets that parametrize other code are covered by it in a type safe manner.
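
For readers without Qt at hand, the same algorithm can be sketched with std::vector. selectIf and selectIfExample are hypothetical names; the element type is given explicitly in the call, because the compiler cannot deduce T from a lambda that is passed where a std::function is expected.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Copies all elements accepted by "selector" into a new list.
template <class T>
std::vector<T> selectIf(const std::vector<T> &list,
                        const std::function<bool(const T &)> &selector)
{
  std::vector<T> ret;
  for (const T &t : list)
    if (selector(t))
      ret.push_back(t);
  return ret;
}

inline void selectIfExample()
{
  const std::vector<int> values = {1, 9, 4, 12};
  // extract all values less than 8
  const std::vector<int> small =
      selectIf<int>(values, [](const int &x) { return x < 8; });
  assert(small.size() == 2);
  assert(small[0] == 1 && small[1] == 4);
}
```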

Use case 2: Simple Reactions to Signals.
If you are using Qt5 together with a C++-11 capable compiler you can connect method pointers or even lambdas to signals. This can be very useful if you do not want to go to the trouble of subclassing just to (for example) display a small dialog.

As an example let's devise a very simple dialog that checks an entered value before it closes. With Qt4 and/or without lambdas it looks like this:

class MyDialog:public QDialog
{
  Q_OBJECT
  QLineEdit *line;
  public:
    MyDialog()
    {
       QVBoxLayout*vl;
       setLayout(vl=new QVBoxLayout);
       vl->addWidget(line = new QLineEdit);
       QPushButton*p;
       vl->addWidget(p=new QPushButton("Try"));
       connect(p,SIGNAL(clicked()), this,SLOT(tryit()));
    }
  private slots:
    void tryit()
    {
       //only accept it if the line contains "hello"
       if(line->text().contains("hello"))
         accept();
    }
};

//use it
void myDialogFunction()
{
  MyDialog d;
  if(d.exec()==QDialog::Accepted)
    doSomething();
}

Using lambdas this entire dialog can be defined inline:

void myDialogFunction()
{
  //no need to derive a subclass
  QDialog d;
  //this used to be the constructor
  QVBoxLayout*vl;
  d.setLayout(vl=new QVBoxLayout);
  QLineEdit *line;
  vl->addWidget(line = new QLineEdit);
  QPushButton*p;
  vl->addWidget(p=new QPushButton("Try"));
  //rolling the connect and tryit() method into one
  //(outside of a QObject subclass, connect has to be qualified)
  QObject::connect(p,&QPushButton::clicked, [&]{
    if(line->text().contains("hello"))
      d.accept();
  } );
  //original myDialogFunction code
  if(d.exec()==QDialog::Accepted)
    doSomething();
}

A Fair Warning: When NOT to Use Lambdas.
As you can see above it can be a bit confusing to read lambdas. So they should be kept as short as possible, especially when they are used directly as parameters in a function or method call. As a general guideline a lambda should contain exactly one semicolon, two in extreme cases - if you need more it is not a good candidate for a lambda, a method or function would be a better candidate and more readable.

You should also be careful about the life time of a lambda. A lambda can reference the stack frame it was created on - if the reference to the lambda lives longer than that stack frame you will end up with "very interesting" crashes when the lambda is called after its referenced objects have been deleted. The examples above are all non-critical: algorithms execute and forget the lambda before they return, so the referenced stack frame is guaranteed to exist during the execution of the algorithm; the dialog example above is non-critical because the function blocks on d.exec() and the reference to the lambda (i.e. the connection from the button's signal to the lambda) is automatically removed at the same time as the lambda's original stack frame. If the dialog lived on after the function has finished, you could not use lambdas with references to that stack frame in a connect statement.
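
When a lambda has to outlive the function that created it, capturing by value avoids the dangling references described above. A minimal sketch, with makeAdder and adderExample as hypothetical names:

```cpp
#include <cassert>
#include <functional>

// Hypothetical factory: the returned lambda outlives this stack frame,
// so "step" is captured by value. Capturing it with [&] would leave the
// lambda with a reference to a variable that no longer exists.
inline std::function<int(int)> makeAdder(int step)
{
  return [step](int x) { return x + step; };
}

inline void adderExample()
{
  std::function<int(int)> addFive = makeAdder(5);
  // makeAdder has returned long ago, the copy of "step" is still valid
  assert(addFive(37) == 42);
}
```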

