Category Archives: Programming

Accu 2014 Conference Notes

I had the chance to go to ACCU 2014 the other week (full conference schedule is here) and I have to say it was one of the best conferences I've had the pleasure to attend. And while it did confirm my view that C++ is getting to the point of saturation and ridiculous excess (C++11 was needed, and as a result so was C++14, but C++17… just use a more suitable language if these are the features you need), the talks I went to were, on the whole, great.

So I thought I may as well post up the notes I made from each talk – and while they might be a bit of a brain dump from the time, if there's something that sparks your interest, I'm sure the full presentations will be posted up at some point soon.

 

Git Archaeology
Charles Bailey, Bloomberg LP

Interesting talk looking at some of the more esoteric ways of presenting and searching for information within an existing Git repository. Unfortunately, time was short so the speaker had to rush through the last part, which was, for me, the most relevant and interesting part.

Talk notes

 

Designing C++ Headers Best Practices
Alan Griffiths

For most experienced C++ developers the content of this talk is probably quite basic, as you’d have a large portion of this covered already through sheer trial and error over the years. But clarifying some good practices and interesting side effects was enjoyable.

Talk notes

 

Version Control – Patterns and Practices
Chris Oldwood, chrisoldwood.blogspot.com

High-level overview of the various patterns we use when using version control (focused mostly on distributed version control), some useful examples and some interesting discussion about trust issues…

Talk notes

 

Performance Choices
Dietmar Kuhl, Bloomberg LP

Proof that if you want to know about optimisation and performance, stick to asking people in the games industry.

I’m not posting these notes.

 

Crafting More Effective Technical Presentations
Dirk Haun, www.themobilepresenter.com

Really good presentation on how to craft good presentations – some interesting discussion of the make-up of the human brain, why certain techniques work and why the vast majority of technical talks (or just talks in general, to be honest) do what they do.

Talk notes

 

The Evolution of Good Code
Arjan van Leeuwen, Opera Software

Great talk – not telling us what good code is, but examining a few in-vogue books from the last decade to see where they sit on various contentious topics. Note that when the notes say “no-one argued for/against” it’s just referencing the books being discussed!

Talk notes

 

Software Quality Dashboard for Agile Teams
Alexander Bogush

Brilliant talk about the various metrics Alexander uses to measure the quality of their code base. If you’re sick of agile, don’t let the title put you off, this is valuable reading regardless of your development methodology.

Talk notes

 

Automated Test Hell (or There and Back Again)
Wojciech Seiga, JIRA product manager

Another great talk, this time discussing how the JIRA team turned a (very) legacy project with a (very) large and fragile test structure into something much more suitable for quick iteration and testing. When someone says some of their tests took 8 days to complete, you have to wonder how they didn’t just throw it all away!

Talk notes

 

Why Agile Doesn’t Scale – and what you can do about it
Dan North, http://dannorth.net

Interesting talk, arguing that Agile is simply not designed to, nor was it ever imagined to, be a scalable development methodology (where scale is defined as bigger teams and wider problems). It excellently covered why and how agile adoption fails and how this can be avoided to the point where agile principles can be used on much larger and less flexible teams.

Talk notes

 

Biggest Mistakes in C++11
Nicolai M Josuttis, IT-communications.com

Entertaining talk in which Nicolai, a member of the C++ Standard Committee library working group, covers various features of C++11 that simply didn’t work or shouldn’t have been included in the standard. When it gets to the point that Standard Committee members in the audience are arguing about how something should or does work, you know they’ve taken it too far.

Talk notes

 

Everything You Ever Wanted to Know About Move Semantics (and then some…)
Howard Hinnant, Ripple Labs

Detailed talk on move semantics, which are never as complicated as they (and even the speaker) are made out to be. There are some dumb issues, which seem to be increasing as the C++ standards committee don’t seem to grasp the scale of the changes being made, but nevertheless it was a good overview of what is a key improvement to C++.

Talk notes

GLFW – Compiling Source and Creating an Xcode Project

GLFW is a great library – easy to use, with great documentation covering its API design. But if you’re not on Windows, where they provide pre-built libraries, going from a source download to a compiling Xcode project is _not_ easy.

Since brew (as of writing) only provides GLFW 2, you need to build GLFW 3 yourself.

There is barely any information on it at all; it assumes you know exactly what you’ve downloaded (the source and some CMake files, by the way) and that you’ll be able to figure it all out (there is some additional information if you root around the GitHub forums, but it’s not exactly easy to find).

So I’m posting up how I got GLFW compiling and running in Xcode here, in case anyone else starts banging their head against a brick wall.

First off, download the source followed by CMake (I downloaded the 64/32-bit .dmg for 2.8.11.2).

Now, to get it to compile you need to set up a few options. You can do this in the CMake GUI, but I found it much easier to do it manually. Browse to GLFW/Src and open config.h.in.

This is a settings file for the build you’re about to perform. There are a lot of settings in here you can play around with (they are explained in the README.md file in the root GLFW folder), but I enabled the following (a rough sketch of the edited lines follows the list):

  • _GLFW_COCOA
  • _GLFW_NSGL
  • _GLFW_NO_DLOAD_WINMM (this is a Windows-only define, but I enabled it, so I’m adding it here anyway)
  • _GLFW_USE_OPENGL
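For reference, here's a rough sketch of what the relevant lines might look like once edited. I'm assuming the file exposes these options as simple preprocessor definitions, so do check it against the copy in your own download:

// Hedged sketch of the edited section of config.h.in
// (only the options I enabled are shown)
#define _GLFW_COCOA            // Use Cocoa for window creation
#define _GLFW_NSGL             // Use NSGL for the OpenGL context
#define _GLFW_NO_DLOAD_WINMM   // Windows only, ignored on OSX
#define _GLFW_USE_OPENGL       // Target desktop OpenGL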

 

Save the file and then you’re ready to compile.

Open the Terminal and browse to the root of the GLFW folder and enter the following commands

cmake .
make install

The cmake command will set up the project ready to be built. It’ll take the config file we modified earlier, create a valid config.h file and carry out any other set-up needed as well. Calling make builds the project, and the install target installs the library in its default location.

Now you’re ready to add GLFW to your Xcode project. To keep it simple I’ll step through adding it to a new project, once you can do that adding it to an existing one is pretty easy.

  • Open Xcode and create a new OSX Command Line Tool project
  • In the project settings, add the path to the GLFW include folder to the ‘Header Search Paths’ option and the path to the libglfw3.a lib to the ‘Library Search Paths’ option (both of these paths are shown in the make install output)
  • Add libglfw3.a to the Link build phase
  • You also need to add the following Frameworks to the build phase to allow the project to compile
    • IOKit
    • Cocoa
    • OpenGL

 

You can now copy/paste the sample code provided on the GLFW website into the existing main.c file, hit compile and you have a running GLFW project.

Note that the sample code doesn’t call glClear before glfwSwapBuffers, so what you get in the open window is undefined.
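If you'd rather not go hunting, here's a minimal sketch along the lines of the quick-start sample with the glClear added (treat it as a rough starting point rather than a copy of the official example):

#include <GLFW/glfw3.h>

int main(void)
{
    // Initialise the library
    if (!glfwInit())
        return -1;

    // Create a windowed mode window and its OpenGL context
    GLFWwindow* window = glfwCreateWindow(640, 480, "Hello GLFW", NULL, NULL);
    if (!window)
    {
        glfwTerminate();
        return -1;
    }

    glfwMakeContextCurrent(window);

    while (!glfwWindowShouldClose(window))
    {
        // Clear the back buffer so the window contents are defined
        glClear(GL_COLOR_BUFFER_BIT);

        glfwSwapBuffers(window);
        glfwPollEvents();
    }

    glfwTerminate();
    return 0;
}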

Ruby, Jenkins and Mac OS


I’ve been using Jenkins as my CI server for a while and, though user permissions have always been a bit of an issue (I’ll explain why in another blog post covering my Unity build process), running command line tools has never been too much of a problem once it got going.

At least not until I tried to run Ruby 1.9.3 via RVM on our Mac Jenkins server.

I’d developed the Ruby scripts outside Jenkins so I knew they worked, but when I came to run them through Jenkins (using nothing more than the ‘Execute Shell’ build step) it started behaving strangely. Running the script caused the step to fail instantly, claiming it couldn’t find any of my installed gems.

A quick ‘gem query --local’ on the command line showed that all the gems I needed were there.

As an experiment I added a build step that installed the Trollop gem (a fantastic gem, you should check it out!) to see if that made any difference, in effect forcing the install in the service that would run the Ruby script. I was surprised when the install worked, but it installed it for Ruby 1.8 rather than Ruby 1.9.

Adding ‘ruby --version’ as a build step showed that for some reason the Jenkins server was using Ruby 1.8.7 rather than 1.9.3.

It turns out that RVM is not available when run through a non-interactive shell script, which is a bit inconvenient when you need it to run as part of an automated process.

Searching around, I came across this Stack Overflow answer suggesting I make changes to my .bash_profile, but those changes were already present, meaning I had little luck in finding a solution.

Other suggestions involved rather convoluted steps to just get the thing working, something I neither had the time for nor thought should be required.

Luckily Jenkins provides an RVM Plugin which allows the build step to run inside an RVM-managed environment, meaning it will respect the RVM settings I’ve set outside of Jenkins…

Now running ‘rvm list’ via a build script shows that we have multiple versions of Ruby available, with 1.9.3 set as the default.

And all of a sudden my Ruby scripts work like a charm!

I’ll just refactor that…

I originally posted this to #AltDevBlogADay on Saturday November 12, 2011.


Refactor: Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

 

I pretty much wish that word had never been invented. The word, despite the above definition (taken from Martin Fowler’s Refactoring Home Page), seems to have lost nearly all meaning when used in day-to-day programming conversations.

To further describe the original definition of refactoring:

Its heart is a series of small behavior preserving transformations. Each transformation does little, but a sequence of transformations can produce a significant restructuring.

 

But when someone comes to me and says “I’ll just refactor that…” I can no longer assume to know what they’ll be doing. Based on how people now use the term, it could be any one of a number of things, including (in order of violence):

  • Making small changes to make the system more understandable, simpler and easier to use without affecting its perceived behaviour
  • Optimising elements of the system, which should hopefully(?) not affect the external behaviour
  • Making wide-ranging changes to a system which will affect elements of its behaviour
  • Deleting the whole thing and starting again without even looking at what we have now

It seems like there is a reluctance to use the words ‘re-write’, ‘modify’, ‘break’ or (in some cases) ‘trash’ when discussing intentions towards an existing system. As a result, what was a very descriptive and clear term has lost all meaning and become something that can no longer be used to clearly describe a useful and often necessary development process.

Do I wish the word hadn’t been invented? Maybe I just wish it hadn’t taken on an image that ‘refactoring’ is somehow cooler or more elite than simply saying what you mean when describing what you’ll be doing.
 

A Different Allocator Model

I originally posted this to #AltDevBlogADay on Saturday 30th July 2011.

Quite a few years back I started developing a custom STL implementation which has eventually been adopted and used throughout our company. One of the most important things to get right when developing any kind of generic container library is the way memory is allocated. It needs to be lightweight, highly flexible and above all easy to understand, so people are willing to experiment with it.

But STL allocators have a bad reputation, and for good reason. They are complex, hard to understand and have some interesting behaviour that seems designed to confuse. As a result, I needed to look at how we were going to provide a custom allocation system that was both easy to use and simple to understand, without restricting what people could do with it.

A Bit of History
A while back Bitsquid published a post entitled Custom Memory Allocation in C++. This excellent post covered how the BitSquid engine used an allocator model to perform memory allocation throughout their entire engine.

But the FTL requires a slightly different approach so I won’t be treading over already covered ground here.

FTL allocators are well hidden inside containers; the only control you have is specifying the allocator type, which means its use, and how and when objects are allocated and created, is completely fixed, with only the allocator itself being able to affect the outcome. Because of this the allocator behaviour needs to be as customisable as possible without requiring any changes to the container itself.

When the FTL was originally started, it was quite small scale and only used by a couple of developers, so allocators were not a priority. The flexibility wasn’t needed, so in-container malloc and free allowed us to concentrate on container creation, but obviously this wasn’t going to last.

The following just describes the various approaches we took, why we dismissed them and what we eventually ended up with.

The Initial Approach
Allocators should have minimal overhead. What we didn’t want to do was increase the size of every container dramatically when we rolled out customisable allocators. As a result, we initially used an approach taken by a couple of vendors and defined the allocator specification using only static members – removing the need for an internal allocator object.

// Completely made up code showing the use of a static based allocator
template <typename T, typename Alloc>
class vector
{
  void  push_back(void)
  {
    void* new_memory = Alloc::allocate( sizeof(T) );
    T* new_object = new(new_memory) T;
  }
};

I knew this would limit the flexibility of the allocators, but it had minimal overhead (especially using default template parameters) and wouldn’t affect those containers already being used. And besides, programmers are a creative bunch and I wanted to see what people did with this before resigning it to the scrap heap.

But while programmers were able to work around the main limitation of not having allocator state per container, they were having to jump through hoops to get the behaviour they wanted. That made it less likely that other programmers would feel confident enough to write their own allocators, and their ability to really customise their behaviour was too limited.

So we ended up adding allocator state on a per container basis, making it much easier for developers to do what they wanted though it did add at least 4 bytes per container. But I felt that flexibility and simplicity were much more important.

// Completely made up code showing the use of an instanced allocator
template <typename T, typename Alloc>
class vector
{
  Alloc m_allocator;

  void  push_back(void)
  {
    void* new_memory = m_allocator.allocate( sizeof(T) );
    T* new_object = new(new_memory) T;
  }
};

Allocator Specification
The allocator specification is complicated. While I’m sure there are good reasons for some of the decisions, I wanted ours to be much simpler. So, removing the type information (since our allocators work on raw memory and nothing else), removing the rebind(!) and removing exception handling (which we don’t use), we ended up with the following.

typedef ftl::pair<void*, size_t>  allocated_memory;

class Alloc
{
public:
  allocated_memory   allocate(const size_t alloc_size);
  void               deallocate(void* ptr);
};

And for the basic allocator, that’s it, it doesn’t require anything else to work, but it does have one interesting aspect.

typedef ftl::pair<void*, size_t>  allocated_memory;

As we can have all sorts of allocator types, what an allocator returns might not be exactly what you asked for. If it can’t allocate all the memory you requested, that’s fine – it simply returns allocated_memory(nullptr, 0) – but it can also return more than was requested (for example, fixed-sized block allocators will do this). This return type simply allows the allocator to return not only the allocated memory, but also how much was allocated, which allows calling objects to take advantage of this additional memory if possible.

Most of the time this isn’t queried and only what was asked for is given, but it offers an additional level of information which might allow better memory usage in some containers.
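As an illustration (this is a hypothetical allocator, not code from the FTL), a fixed-sized block allocator written against this specification might look something like the following, with the block size and the use of malloc/free purely placeholders:

#include <cstdlib>  // malloc/free, purely for the sake of the example

// Hypothetical fixed block allocator - every request is served with a
// whole block and the real size is reported back via allocated_memory
class fixed_block_allocator
{
public:
  static const size_t BlockSize = 64;

  allocated_memory allocate(const size_t alloc_size)
  {
    // Requests larger than a block can't be satisfied
    if (alloc_size > BlockSize)
      return allocated_memory(nullptr, 0);

    // Return the whole block so the caller knows it has more
    // memory available than it actually asked for
    void* memory = malloc(BlockSize);
    return allocated_memory(memory, memory ? BlockSize : 0);
  }

  void deallocate(void* ptr)
  {
    free(ptr);
  }
};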

So a container will most likely end up with something like the following when adding and removing elements.

// Creating a new object in a container
allocated_memory new_memory = m_allocator.allocate( sizeof(T) );
if (new_memory.first)
  T* new_object = new(new_memory.first) T;

// Losing the object now we’re done with it
old_object->~T();
m_allocator.deallocate(old_object);

This is fine and gives us a very simple entry point for allocators. But by forcing the use of placement new and the destructor call in the container itself, we are limiting what an allocator can do. While allocators are required to return raw memory, that doesn’t mean they have to work with raw memory internally. Some allocators might pre-create the objects before returning them so creation is front-loaded, and the forced placement new could mean we’re overwriting an object that has already been created.

Construction Functions
As a result, we want developers to be able to override not only the memory allocation, but also the object creation. 99% of allocators won’t need to do this, so we don’t want to add additional requirements to the allocator. Instead we can create non-member, non-friend functions, specialised on the allocator, which will do the creation for us.

template <typename TConstruct, typename TAlloc>
TConstruct* construct(TAlloc& allocator);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(TAlloc& allocator, const TConstruct& obj);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(void* allocated_mem);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(void* allocated_mem, const TConstruct& obj);

template <typename TConstruct, typename TAlloc>
void destroy(TConstruct* ptr);

template <typename TConstruct, typename TAlloc>
void destroy(TConstruct* ptr, TAlloc& allocator);

So our point of allocation/creation now becomes something much simpler and much more powerful.

// Creating a new object in a container
T* new_object = alloc::construct<T>(m_alloc);

// Lose our existing object and return it back to the allocator
alloc::destroy(old_object, m_alloc);

The default version of construct performs the allocation and placement new within the construct function, but should the allocator need something more (or less), simply overloading the function on the allocator type allows complete control over both memory allocation and object creation. The same goes for the destroy function and the automatic call of the object’s destructor.
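To make that concrete, a plausible default implementation (a sketch assuming the allocator interface above, not the exact FTL code) doesn't need to be much more than this:

#include <new>  // placement new

// Possible default construct/destroy - allocation, placement new and
// destruction all in one place, ready to be overloaded per allocator type
template <typename TConstruct, typename TAlloc>
TConstruct* construct(TAlloc& allocator)
{
  allocated_memory memory = allocator.allocate( sizeof(TConstruct) );
  return memory.first ? new(memory.first) TConstruct : nullptr;
}

template <typename TConstruct, typename TAlloc>
void destroy(TConstruct* ptr, TAlloc& allocator)
{
  if (ptr)
  {
    ptr->~TConstruct();
    allocator.deallocate(ptr);
  }
}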

No Base Allocator
One thing I didn’t add to the allocators was a base allocator interface. The allocators are created within the container, with their type defined at the point of the container’s declaration. The addition of a base interface (and the associated virtual functions) would have increased the size overhead of the allocator, which is something I wanted to avoid and something I thought didn’t add enough to warrant the cost. I was less worried about the overhead of calling a virtual function, as that would be insignificant compared to everything else that was going on.

Conclusion
So by introducing an allocator model that is much simpler (only 2 required functions), with more extensible customisation should an allocator need it, developers have complete control over all the memory allocation and, importantly, the object creation itself. Due to its initial simplicity, developers have no problem creating new allocators that improve both memory and container use in a project and can start to experiment with better and more efficient allocators quickly.

Title image by Squeezy.  Used with permission.

Simplifying the Flag Set

In the last few posts, I’ve covered the development of a type safe and easy to use flag set, and as a result it now takes 3 template parameters, two of which have default values.

template <
          typename TEnum,
          int TMaxFlags = 8,
          typename TNames = union_names<bit_set_size<TMaxFlags>::Value>
         >
class flag_set

The last two parameters have default values, but to be honest, people don’t want to have to add a Noof option to the declaration just to define the flag names. And they don’t want to keep adding long type declarations, even when using typedefs to cut down on the clutter.

We have the following coding standard at work, which provides an interesting structure for defining both our flags and flag names:

Enums should be defined within a structure to provide proper name scoping

struct MyProperty
{
  enum Enum
  {
    FlagEntry1,
    FlagEntry2,
  };
};
MyProperty::Enum property = MyProperty::FlagEntry1;

In the past, enums were defined with the name of the enum appended to the names of their entries. While this does reduce name conflicts, it is a poor way to scope a property.

Since we can guarantee that our enums are defined like the above, we can make the assumption that people will (or will be happy to) add their flag names as part of this structure.

// Structure defining the flag enums and also
// defining the flag names which we need
struct StateFlags
{
  // Flags
  enum Enum
  {
    HasTarget,
    CarryingWeapon,
    Drugged,

    Noof,
  };

  // Names
  struct Names
  {
    uint HasTarget:1;
    uint CarryingWeapon:1;
    uint Drugged:1;
  };
};

Making that assumption allows us to define the following structure

template <typename TFlags>
struct flag_set_ex
{
  typedef flag_set <
                     typename TFlags::Enum,
                     TFlags::Noof,
                     typename TFlags::Names
                   > flags;
};

This then allows users to define their flag sets quickly using all the available properties

flag_set_ex<StateFlags>::flags myStateFlags;

If programmers don’t want to use the flag names, they can obviously use the base flag set type. And to keep things consistent, we can add a ‘::flags’ typedef to the base flag set, allowing the definitions to be the same regardless of the flag set type being used.

template < typename EnumT, uint SizeT, typename NamesT >
class flag_set
{
public:
  typedef flag_set<EnumT, SizeT, NamesT>     flags;
};

// Define a normal flag set with the same syntax as the flag_set_ex making
// it much easier to interchange the types depending on their needs
flag_set<StateFlags::Enum>::flags myStateFlags;

Debugging Type Safe Flags

In the previous post I described how I implemented a type safe, template driven system for declaring and using flags in C++. But one of the main issues with the solution was not being able to see which flags were set when debugging.

Which Flags Are Set?
When setting bits to store flags there are varying levels of output you can get from them in the debugger.

One of them absolutely sucks – you don’t want to be doing bit arithmetic to know what’s been set.

The next one is better, it gives you some idea of what’s been set, but not much.

Finally, there is the following, which is much better.

A normal way of getting this behaviour out of the debugger when using base type flags is declaring them alongside a structure containing the individual flag entries inside a union.

struct  StateFlags
{
  union
  {
    uint m_allFlags;
    struct
    {
      uint HasTarget:1;
      uint CarryingWeapon:1;
      uint Drugged:1;
    } m_names;
  };
};

This provides you with the best possible output in the debugger so it’s something I wanted to add to the flag set to make debugging much easier.

Flag Set ‘Names’
Since this is an interface driven implementation the ideal syntax starts out as the following

// The enums used to define the flag set
enum StateFlags
{
  HasTarget,
  CarryingWeapon,
  Drugged,
};

// The ‘names’ we’d like to be displayed in the debugger
struct StateNames
{
  uint HasTarget;
  uint CarryingWeapon;
  uint Drugged;
};

We need to be able to pass these names through to the container, and since we want the type defined on a per flag basis we can add these names to the flag set definition.

template <typename TEnum, int TMaxFlags, typename TNames>
class flag_set

Since we already have our own defined type to store the flags being set, and we know that we can use a union to combine the storage type and the flag names, we simply append our name type alongside the storage type so that when a flag is set the name is automatically set alongside it.

template< int TMaxFlags, typename TNames >
struct bit_union
{
  // Container type with enough bits to store TMaxFlags flags
  typedef bit_set<TMaxFlags> container_type;
  typedef TNames name_type;

  union
  {
    container_type m_type;
    name_type m_name;
  };
};

I’ve extracted the bit_set into its own structure rather than including it directly inside the union for two reasons

  • It allows the bit set to be specialised based on the number of flags without worrying about the union. If we had a large number of specialised storage structures, we’d have to duplicate the union inside each one.
  • It allows us to strip out the union and the names in builds that don’t need it, without affecting the specialised bit sets.

So the flag set now contains user defined names without having to worry about how they are implemented.  The templated utility functions used to set, remove and check the flags also don’t need to know anything about it, so they aren’t affected.

template < typename TEnum, int TMaxFlags, typename TNames >
class flag_set
{
public:
  void set(const TEnum flag)
  {
    // The utility functions still take the same
    // type regardless of the name union
    set_bit_entry(m_unionFlags.m_type, flag, true);
  }

private:
  static const uint BitSetSize = ((( TMaxFlags + 7 ) & ~7 ) >> 3)*8;
  bit_union< BitSetSize, TNames > m_unionFlags;
};

Now we can simply define our flag set with a unique set of names, and in the debugger we get exactly what we were looking for (annoyingly, we can’t get around the need to define the integers in the structure as one-bit entries if we’re going to add them to the union).

struct MyFlags
{
  enum Enum
  {
    UnderAttack,
    InCover,
    // etc.

    Noof,
  };

  struct Names
  {
    uint UnderAttack:1;
    uint InCover:1;
    // etc.
  };
};

flag_set<MyFlags::Enum, MyFlags::Noof, MyFlags::Names> flagSet;

flagSet.set(MyFlags::UnderAttack);
flagSet.set(MyFlags::InCover);

Default Names
But we have one problem.  We don’t want to force people to have flag names if they don’t want them. Depending on the type and volatility of the flags they might not need the names in the debugger and it can be a lot of code to add for every single flag set in use.  And since we have a default template parameter for the number of flags, we need to provide a similar default parameter for the names.

Defining the default parameters isn’t much of a problem.  We can quickly define union name companions for 8+ flags (a few small macros make this much easier), but how do we drop them into the flag set template as a multiple of 8?
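To give a flavour of what those defaults might look like (this is a guess at the shape of the generated code rather than the real macro output), the 8 flag companion could be as simple as:

// Hypothetical default name companion for an 8 flag set - in practice
// Flag0..Flag7 would be spat out by the macros mentioned above
template <uint TSize>
struct union_names;

template <>
struct union_names<8>
{
  uint Flag0:1;
  uint Flag1:1;
  uint Flag2:1;
  uint Flag3:1;
  uint Flag4:1;
  uint Flag5:1;
  uint Flag6:1;
  uint Flag7:1;
};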

Since we already calculate our multiple of 8 flag count when defining the bit_union type, we can do the same when declaring the default name structure.  But we certainly don’t want such an over-complicated calculation twice in the class.

Since we need the multiple-of-8 value outside the class in the template declaration, we can’t just declare it inside the class, so by storing it in its own type we can use it wherever we need it.

template<uint TSize>
struct bit_set_size
{
  enum
  {
    Value = ((( TSize + 7 ) & ~7 ) >> 3)*8,
  };
};

We can now use this to declare our default flag set name, where union_names is correctly specialised depending on how many flags the user has declared.

template <
           typename TEnum,
           int TMaxFlags = 8,
           typename TNames = union_names<bit_set_size<TMaxFlags>::Value>
         >
class flag_set
{
private:
  bit_union< bit_set_size<TMaxFlags>::Value, TNames > m_unionFlags;
};

So now a user can simply declare their flag set with the enum type and by default they will get something out of the debugger that’s more useful than just showing the single base type value.

flag_set<MyFlags::Enum> flagSet;

flagSet.set(MyFlags::UnderAttack);
flagSet.set(MyFlags::InCover);

Flag Set Drawbacks
In adding custom names to the flag set, we’ve added a number of new templated structures – the default name types, the bit_set_size and the bit_union – and all this comes with a cost.  Obviously we’re generating a number of structures for all the flags that use this system, and it certainly does affect compilation speed.

Based on compilation speeds of a couple of projects that use the flag set heavily over a few months, I’d estimate that adding the flag set (before the debug names were added) put an extra 20-30 seconds on a full rebuild.  After adding the debug names, it probably added another 30 seconds or more, bringing a rebuild of the entire project up from 3 minutes to around 4.

But I can live with that for a number of reasons

  • The flag set has introduced a more type safe environment for developers.  On a project that has a number of developers dipping into it, this is vital and does reduce the possibility of human errors causing simple but difficult to track down bugs
  • The code is clearer and easier to read, maintain and debug.  This is a bonus for all the same reasons above.
  • I rarely rebuild the entire project.  A well structured project reduces the number of rebuilds greatly, and as such the longer rebuilds don’t actually affect me that much (an average build usually takes <10 seconds)

There’s one more thing I want to look at in regards to the flag set, and that’s how we can take the basic template declaration and simplify it so users can quickly and easily declare a flag set with a large number of flags and their own flag names within a single structure.

But I’ll cover that in another post.

A C++ Flag Set Example

As was pointed out in the last post, I… ahem… forgot to actually put down some working examples of the flag set, so I ended up describing something but not actually showing off the final product.

So I thought a good start would be to include a few of the flag set unit tests (since unit tested code is known to be good ‘working’ documentation in itself) followed by a simple working example based on the examples used at the start of the last post.

So, here’s a load of code that you can feast your eyes on (the unit test library being used here is UnitTest++)

struct TestFlagSet_Small
{
  enum Enum
  {
    Flag1,
    Flag2,
    Flag3,
    Flag4,

    Noof,
  };
};

struct TestFlagSet_Large
{
  enum Enum
  {
    Flag1,
    Flag2,
    Flag3,
    Flag4,
    Flag5,
    Flag6,
    Flag7,
    Flag8,
    Flag9,
    Flag10,
    Flag11,
    Flag12,

    Noof,
  };
};

class ExampleFixture
{
public:
  // Define a couple of flags, one defaulting to 8 flags
  flag_set<TestFlagSet_Small::Enum> smallFlagSet;
  flag_set<TestFlagSet_Large::Enum, TestFlagSet_Large::Noof> largeFlagSet;

  ExampleFixture(void)
  {
  }

  ~ExampleFixture(void)
  {
  }
};

// Checks that the flag set starts in an empty state
TEST_FIXTURE(ExampleFixture, InitialFlagSet_Empty)
{
  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
  CHECK_EQUAL(largeFlagSet.is_clear(), TRUE);
}

// Tests that compilation works when defining flags with sizes < 8
TEST(DefaultTemplateSet)
{
  flag_set<TestFlagSet_Small::Enum, TestFlagSet_Small::Noof>    smallFlagSet;
  flag_set<TestFlagSet_Large::Enum, TestFlagSet_Large::Noof>    largeFlagSet;
}

// Checks if the flag set is clear when flag set removed
TEST_FIXTURE(ExampleFixture, SetThenRemoveIsClear)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.remove(TestFlagSet_Small::Flag1);

  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
}

// Checks if the flag set is clear when cleared or no flags are set
TEST_FIXTURE(ExampleFixture, ClearSetIsClear)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.clear();

  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
}

// Checks if the set function correctly sets the right flags
TEST_FIXTURE(ExampleFixture, SetFlagSetIsSet_IsSet)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag1), TRUE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag4), TRUE);
}

// Check that returned C type is correctly set
TEST_FIXTURE(ExampleFixture, AsCTypeFunction)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag2);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  // as_c_type allows the flag set to be explicitly passed
  // through to legacy functions and for the underlying
  // data to be stored as a base type if needed
  const uint8* flagSetBits = smallFlagSet.as_c_type();

  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag1) );
  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag2) );
  CHECK( ((*flagSetBits) & (1<<TestFlagSet_Small::Flag3)) == FALSE );
  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag4) );
}

// Test that setting from a C type correctly sets the flags
TEST_FIXTURE(ExampleFixture, FromCTypeFunction)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag2);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  const uint8 tempStorage = *(smallFlagSet.as_c_type());

  smallFlagSet.clear();
  smallFlagSet.from_c_type(&tempStorage);

  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag1), TRUE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag2), TRUE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag4), TRUE);
}

// And a (very) simple real world example

// Define a couple of flag types
struct BattleProperties
{
  enum Enum
  {
    HasTarget,
    CarryingWeapon,
  };
};

struct CharacterState
{
  enum Enum
  {
    IsStunned,
    IsPoisoned,
  };
};

// Create a flag set for the character states
typedef flag_set<CharacterState::Enum> StateFlags;
StateFlags m_currentStateFlags;

void UpdateState(const StateFlags& state);  // Forward declaration so we can call it below

void UpdateCharacter(void)
{
  // We’ve been stunned
  m_currentStateFlags.set(CharacterState::IsStunned);

  // ... do some more stuff

  // Whoops, I would have just made a mistake, but I can’t do this
  // m_currentStateFlags.set(BattleProperties::CarryingWeapon);

  // Update what ever has happened
  UpdateState(m_currentStateFlags);

  // We’ve dealt with our states now
  m_currentStateFlags.clear();
}

void UpdateState(const StateFlags& state)
{
  // Are we stunned?
  if (state.is_set(CharacterState::IsStunned))
  {
  }

  // Another mistake I can no longer make
  // if (state.is_set(BattleProperties::CarryingWeapon))
  // {
  // }
}

Type Safe Flags in C++

Using bits and shifting enums is a pretty standard way of storing and reading flags. It’s quick, it’s compact and the syntax isn’t too difficult to pick up for people who haven’t seen it before. But since a bit is only a bit and an enum is only an enum, it’s far too easy to make mistakes with these. There is no type checking when you accidentally use the wrong enum, and passing an unsigned int around loses all relevance to what these ‘flags’ actually are.

So I wanted to look for a solution that not only stopped us accidentally using the wrong flag with the wrong variable (and allowed the compiler to let us know if it happens) but also removed the loss of relevance when the flags were passed between different systems.

The Problem
Anyone who’s taken Programming 101 will know what I’m talking about.

enum CharacterBattleProperties
{
  HasTarget = (1<<0),
  CarryingWeapon = (1<<1),
};

enum CharacterState
{
  IsStunned = (1<<0),
  IsPoisoned = (1<<1),
};

unsigned int battleFlags;
unsigned int stateFlags;

// Set or remove some flags based on whats happening
// Character has just lost sight of the player
battleFlags &= ~HasTarget;

// We've just picked up a weapon
battleFlags |= CarryingWeapon;

// ...

// Some more code, some more function calls and
// now we have a player who is stunned
battleFlags |= IsStunned; 

// Whoops, I just set the wrong flag on the wrong variable...

The First Solution
While investigating a couple of solutions, I came across a post on Game Architect titled Evolution of a Flag Set Class which examined a lot of the questions I was asking myself. And it took it pretty far, coming up with a set of solutions that more-or-less did what I was looking for, but I wanted something that would be easier to scale and specialise based on the number of flags we had, and not one based on an integer as the smallest possible storage unit.

So in trying to keep it as simple as possible the flag set initially ended up with a very simple interface

// Stripping it down to its most basic behaviour
template < typename TEnum, typename TBaseType = unsigned char >
class flag_set
{
public:
    void set(const TEnum flag)
    {
      m_flags |= (1 << flag);
    }

    void remove(const TEnum flag)
    {
      m_flags &= ~(1 << flag);
    }

private:
    TBaseType m_flags;
};

You’ll notice a couple of things with this implementation (on top of ‘remove’ being highlighted in pink for some probably awesome reason)

  • The first is that the default value of the second template parameter is set to an unsigned char, meaning that in the majority of cases the variable can be declared simply with the enum type, but it’s easy to increase the size as needed (though it obviously has limits).
  • The second is that the object itself deals with the shifting of the enum value. This was a conscious decision to reinforce the idea that when setting a flag you are not ‘setting a bit’, rather you’re setting a property – something I wanted the interface to promote.

And this works pretty well, but contradicts something quite important.

I wanted the flag_set to move away from ‘setting bits’, so why do we have to specify a base type which makes us think about how many bits we need? And since the flag_set won’t know how many enum entries we have until we start to set them at runtime, we could end up with a flag_set that doesn’t have enough space to store all our flags, and the compiler won’t be able to tell us about it (in real life it asserts when you try to use a flag outside the range of the storage type, but that’s not good enough).

A More Appropriate Solution
So I wanted to look at working with the number of flags, and work on making it have as small a footprint as possible. So taking a step back I started out with the interface I wanted.

template < typename TEnum, int TMaxFlags = 8 >
class flag_set

Where people could define them as follows

// This will assume there are 8 or fewer flags
flag_set< MyFlags::Enum > myDefaultSizeFlagSet;  

// This will create enough space for MyFlags::Noof flags
flag_set< MyFlags::Enum, MyFlags::Noof > myDeclaredFlagSet;

So we need a way to create enough storage space for the number of flags we have at compile time, and since the base type of a bitset in the FTL is a 32-bit unsigned int, that’s not a good enough solution. There is no point storing 3 flags in an integer when they could drop into a char. If multiple flag sets were packed together, the smaller the storage space the better.

Creating enough storage space is pretty simple: we store an array of unsigned chars sized according to how many flags we have.

// Generate the array length based on an 8-bit type
unsigned char m_flags[ ((( TMaxFlags + 7 ) & ~7 ) >> 3) ];

// Figure out which bit to set/remove when needed 
// I'm not worrying about endian code in this example...
const int index = ((( (entry+1) + 7 ) & ~7 ) >> 3)-1;
const int offset = entry - (8*index);

// Set the bit
m_flags[index] |= 1 << offset;

Optimising For Certain Types
So we now have a flag set which allows us to have as many flags as we could possibly want (and yes, I have seen code with more than 128 'flags' in the past - don't ask me to justify that, I have no idea...) but with a slight problem. In the original version there were no arrays, no calculations when figuring out which bit to set and no loops to check its state.

All of this over-complicates what could be just 'one bit being set on one variable'.

We can easily get around this by specialising the flag set for any types of 8, 16 or 32 bit flags. But I didn't want to specialise the whole flag_set, especially when we might continue this specialisation up to 64 and 128 bit flags - and specialising a whole class just becomes a nightmare to maintain.

So we extract the storage type into its own structure and specialise that based on the number of flags. This allows the general flag_set behaviour to be defined within the one class, and the length based behaviour to be extracted into its own implementation.

// Just a note that the following code is simplified to make it 
// easier to read on a blog post, but the general gist is the same.

template < int TBitCount >
struct bit_set
{
  enum
  {
    ArrayLength = ((( TBitCount + 7 ) & ~7 ) >> 3),
  };
  unsigned char m_bitArray[ArrayLength];
};

template <>
struct bit_set<8>
{
  enum
  {
    ArrayLength = 1,
  };
  unsigned char m_bitArray;
};

Then we simply replace our array member in the flag set with an instance of the bit_set (where we calculate the bit_set size as a multiple of 8 based on the number of flags we specify in the flag_set).

Since the flag_set doesn't want to worry about what kind of data it's using, we can replace the in-class implementation with a number of utility functions. This not only reduces the amount of code duplication but, when specialised for the same reasons as the bit_set, allows us to keep the same interface and use the same class, with only a small number of functions specialised as needed.

// Basic, non-specialised function for setting or clearing
// a flag as called from flag_set::set and flag_set::remove
template < typename ContainerT >
void set_bit_entry(ContainerT& flags, const uint entry, const bool value)
{
  // Get the entry point to the bit list
  const int index = ((( (entry+1) + 7 ) & ~7 ) >> 3)-1;
  const int offset = entry - (8*index);

  // Set or clear the bit
  if (value)
    flags.m_bitArray[index] |= 1 << offset;
  else
    flags.m_bitArray[index] &= ~(1 << offset);
}

// Specialised for a bit count of 8 - no arrays, no additional calculations
template< >
void set_bit_entry< bit_set< 8 > >(bit_set< 8 >& flags, const uint entry, const bool value)
{
  if (value)
    flags.m_bitArray |= 1 << entry;
  else
    flags.m_bitArray &= ~(1 << entry);
}
// So our basic non-specialised flag set looks something like this
// (code simplified to make it a bit more presentable)
template < typename TEnum, int TMaxFlags = 8 >
class flag_set
{
public:
    void set(const TEnum flag)
    {
      set_bit_entry(m_flags, flag, true);
    }

    void remove(const TEnum flag)
    {
      set_bit_entry(m_flags, flag, false);
    }

private:
    static const uint BitSetSize = ((( TMaxFlags + 7 ) & ~7 ) >> 3)*8;
    bit_set< BitSetSize > m_flags;
};

This method makes it very easy to specialise the flag set for flags that can be stored in a built-in type, falling back to an array if the size doesn't fit into one type. If we want to specialise for 16 or 32 bit flags, then we simply create specialised versions of our utility functions, keeping the flag_set implementation the same no matter how much additional work is being done behind the scenes.
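For example (a sketch only, assuming a bit_set<16> specialisation that holds a single unsigned short), a 16 bit version would be little more than a copy of the 8 bit one:

// Sketch of a possible 16 bit specialisation
template <>
struct bit_set<16>
{
  enum
  {
    ArrayLength = 2,
  };
  unsigned short m_bitArray;
};

// Matching utility function - still no arrays or index calculations
template< >
void set_bit_entry< bit_set< 16 > >(bit_set< 16 >& flags, const uint entry, const bool value)
{
  if (value)
    flags.m_bitArray |= 1 << entry;
  else
    flags.m_bitArray &= ~(1 << entry);
}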

Final Outcome
So we now have a flag set that meets the issues I wanted to resolve head on. Passing a structure with strong type behaviour stops programmers accidentally using the wrong flag with the wrong set (and the compiler's the one that stops them) and passing a named structure to a function keeps the meaning and reason behind the flags intact.

The code obviously increases in complexity over a simple bitwise operation, but from a client point of view the readability of the calling code improves. Calls to .set, .remove and .clear make it very clear what the intention of the code is and make it easier to avoid simple keyboard mistakes.

It also makes it immeasurably easier to add the debugging features to the flag set, making it easier to see what is set and when. But this post has been long enough and I'll look at that in my next post.

Deque Memory Management

I’ve wanted to blog more about our internal STL implementation for a while, and one of the interesting things I wanted to cover was how we implemented the STL deque, which I think is a sometimes misunderstood and unfortunately not very often used container.

But first, what is a deque?

A good place to start is to say it’s pronounced ‘deck’ and is shorthand for ‘double ended queue’, giving you an idea of what its main strengths might be. Like a vector it offers constant time operations when adding or removing from the back, but unlike the vector it also offers the same when adding or removing from the front, making it a perfect container for queue-style use.

And it’s this latter ability that makes it vastly more different from a vector than the interface might otherwise suggest.

One major missing set of functions is something that you should always be using by default on a vector – reserve() and capacity(). And those missing functions should indicate an underlying difference in the way memory is allocated.

An STL deque implementation usually allocates memory in pools, each able to contain N entries. Exceed the number that can fit in a pool and another pool is created, linked to the previous one. This gives it the ability to add values to the front and back of the container, in effect creating a chunky linked list structure internally.
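As a purely conceptual sketch (not any particular vendor's implementation), you can picture that layout something like this:

// Conceptual sketch of the pool based layout a standard deque tends to use
template <typename T, size_t PoolSize>
struct deque_pool
{
  T           m_entries[PoolSize];  // Fixed number of entries per pool
  deque_pool* m_prevPool;           // Linked in when the front grows
  deque_pool* m_nextPool;           // Linked in when the back grows
};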

And it’s a good idea. By enabling a chunky (or sometimes not so chunky) list structure you have less of a need to preallocate memory, inserting values leans more towards a fixed, or more predictable, model and you lose the spikes that often accompany a vector should you keep extending past the current capacity.

But it’s not perfect. And its primary strength can also be its primary weakness.

Game developers like static memory. They like to allocate memory and stick with it. They don’t like memory that is allocated left, right and centre, and they don’t like to have to guess at what’s been allocated and when. Consoles like static memory too, and they like memory to be in the general vicinity of memory they are already looking at.

Coming from this, game developers and consoles generally don’t like linked lists (though that’s not to say we don’t use them when we need to). And if we have to use them, it’s good to explicitly use them, or even better to use an intrusive list instead (more on those another day).

But surely you can be a bit clever. Maybe setting the size of a pool to be however many elements you want to add, in other words trying to mimic reserve(). But then you still need to allocate more memory when you add to the front, or the back, or wherever, depending on where your particular implementation of a deque drops in the first elements (and since there is nothing guiding the implementation in the standard, it could be anywhere).

And some implementations (Visual Studio I am looking squarely at you) have horrific ways of specifying how big these pools should be and simply do not provide the flexibility you need when working with them.

So what did we want and how did we approach implementing our own Deque container?

The first thing we wanted was our reserve and capacity back (along with our custom resize_capacity and other related functions) because that gives programmers vital control over what’s being allocated and when. Granted you could probably get similar behaviour by using a different allocator, but we don’t want to have to use a unique allocator 99% of the time!

As a result (and it’s a good one), that leads us back to having a block of contiguous memory allocated within the deque, which can make traversing the container much more cache friendly and makes memory management much easier. It also removes the question of “if I remove this element, will the memory layout change?”. We’re sticking with vector-style memory, which dictates that memory will not be freed unless specifically requested – another feature that ties in with developers needing to be confident of what their containers are doing.

This also allows us to lead quite easily towards a fixed::deque container with the same interface and very similar behaviour which is much harder with the chunky linked list approach.

But obviously a vector has this memory layout too, and doesn’t have constant time insertion at the start. So something needs to be different.

A ring buffer is a pretty common data structure in computer science and fits the bill perfectly. Add an element to the back or front, and simply wrap around the allocated buffer until your start and end hit each other. At that point, either stop accepting new entries (as a fixed deque would) or allocate additional memory and keep adding (in our case we increase the allocated size by 50% to keep it consistent with the vector).
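A toy sketch of the wrap-around indexing (ignoring growth, object construction and the rest of the real book-keeping) looks something like this:

// Minimal ring buffer index handling - a rough illustration of the idea,
// not the FTL implementation
struct ring_indices
{
  size_t m_capacity;  // Size of the allocated buffer
  size_t m_front;     // Physical index of the first element
  size_t m_size;      // Number of elements currently stored

  // Physical index one past the current back element
  size_t back_insert_index(void) const
  {
    return (m_front + m_size) % m_capacity;
  }

  // Physical index a new front element would be written to
  size_t front_insert_index(void) const
  {
    return (m_front + m_capacity - 1) % m_capacity;
  }
};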

This allows us to have constant time insertion at the start and end, but it does make our iterators a bit more complicated as they need to be aware of their heap and buffer boundaries. That’s only a few pointers though, so it’s not too bad (and traversing linked lists using iterators isn’t much different, so it’s not much of a trade off).

Obviously going down this route has its downsides. Inserting elements one after another without reserving is much slower, but developers shouldn’t be doing that anyway, and most importantly traversal of the deque is much improved. Benchmarks on the platforms we develop for at work show an increase in traversal speed in the order of 100% on some platforms, with the worst being around 30% faster, which is nothing to sniff at.

But a fundamental change like this does mean that people coming to this deque when they are intimately familiar with a std::deque will either miss the sometimes small tweaks and use it inefficiently, or be thrown by what they see and be less inclined to use it. But decent documentation and mentoring can easily overcome those hurdles.

From the outset, memory management and traversal were the areas we were most concerned about when looking at the ftl::deque, and while insertion is certainly slower if you don’t reserve (though no-one should be using an FTL deque without reserving memory beforehand), this is a price I’m quite happy to pay.

Title image by mdezemery