Engineering Game Development

Lee Winder, Technical Team Lead at SEGA Europe on Software Engineering and Game Development

Browsing Posts in Programming


I’ve been using Jenkins as my CI server for a while and though user permissions has always been a bit of an issue (I’ll explain why in another blog post covering my Unity build process) running command line tools has never been to much of a problem once it got going.

At least not until I tried to run Ruby 1.9.3 via RVM on our Mac Jenkins server.

I’d developed the Ruby scripts outside Jenkins so I knew they worked, but when I came to run them through Jenkins (using nothing more than the ‘Execute Shell’ build step) it started behaving strangely. Running the script caused the step to fail instantly claiming it couldn’t find any of my installed Gems.

A quick ‘gem query –local’ on the command line listed all the gems I needed were there.

As an experiment I added a build step that installed the Trollop gem (a fantastic gem, you should check it out!) to see if that made any difference, in effect forcing the install in the service that would run the Ruby script. I was surprised when the install worked, but it installed it for Ruby 1.8 rather than Ruby 1.9.

Adding ‘ruby –version’ as a build step showed that for some reason the Jenkins server was using Ruby 1.8.7 rather than 1.9.3.

It turns out that RVM is not available when run through a non-interactive shell script, which is a bit inconvenient when you need it run as part of an automated process.

Searching around I came across this Stack Overflow answer suggesting I make changes to my .bash_profile but those changes were already present meaning I had little luck in finding a solution.

Other suggestions involved rather convoluted steps to just get the thing working, something I neither had the time for nor thought should be required.

Luckily Jenkins provides a RVM Plugin which allows the build step to run inside an RVM managed environment meaning it will respect the RVM settings I’ve set outside of Jenkins…

Now when running ‘rvm list’ via a build script shows that we have multiple versions of Ruby available with 1.9.3 set as the default.

And all of a sudden my Ruby scripts work like a charm!

I originally posted this to #AltDevBlogADay on Saturday November 12, 2011.

That's Enough

Refactor: Refactoring is a disciplined technique for restructuring an existing body of code, altering its internal structure without changing its external behavior.

 

I pretty much wish that word had never been invented. The above definition (taken from Martin Fowler’s Refactoring Home Page) seems to have lost nearly all meaning when used in day to day programming conversations.

To further describe the original definition of refactoring:

Its heart is a series of small behavior preserving transformations. Each transformation does little, but a sequence of transformations can produce a significant restructuring.

 

But when someone comes to me and says “I’ll just refactor that…” I can no longer assume to know what they’ll be doing. Based on how people now use this term it could be any one of a number of things including (in order of violence)

  • Making small changes to make the system more understandable, simpler and easier to use without affecting it’s perceived behaviour
  • Optimising elements of the system which should hopefully(?) not effect the external behaviour
  • Making wide ranging changes to a system which will effect elements of its behaviour
  • Deleting the whole thing and starting again without even looking at what we have now

It seems like there is a reluctance to use the words ‘re-write’, ‘modify’, ‘break’ or (in a some cases) ‘trash’ when discussing intentions towards an existing system. As a result what was a very descriptive and clear term has lost all meaning and resulted in something what can no longer be used to clearly describe a useful and often necessary development process.

Do I wish the word hadn’t been invented? Maybe I just wish it hasn’t taken on an image that ‘refactoring’ is somehow cooler or more elite than simply saying what you mean when describing what you’ll be doing.
 

I originally posted this to #AltDevBlogADay on Saturday 30th July 2011.

Different... Quite a few years back I started developing a custom STL implementation which has eventually be adopted and used throughout our company. One of the most important things to get right when developing any kind of generic container library is the way memory is allocated. It needs to be lightweight, highly flexible and above all easy to understand so people are willing to experiment it.

But STL Allocators have a bad reputation and it’s for a good reason. They are complex, hard to understand and have some interesting behaviour that seems designed to confuse. As a result, I needed to look at how we were going to provide a custom allocation system that was both easy to use and simple to understand without restricting what people could do with them.

A Bit of History
A while back Bitsquid published a post entitled Custom Memory Allocation in C++. This excellent post covered how the BitSquid engine used an allocator model to perform memory allocation throughout their entire engine.

But the FTL requires a slightly different approach so I won’t be treading over already covered ground here.

FTL allocators are well hidden inside containers, the only control you have is specifying the allocator type which means it’s use, and how and when objects are allocated and created is completely fixed with only the allocator itself being able to effect the outcome. Because of this the allocator behaviour needs to be as customisable as possible without requiring any changes the container itself.

When the FTL was original started, it was quite small scale and only used by a couple of developers, so allocators were not a priority. The flexibility wasn’t needed so in-container malloc and free allowed us to concentrate on container creation but obviously this wasn’t going to last.

The follow just describes the various approaches we took, why we dismissed them and what we eventually ended up with.

The Initial Approach
Allocators should have a minimal over-head. What we didn’t want to do was increase the size of every container dramatically when we rolled out customisable allocators. As a result, we initially used an approach used by a couple of vendors and defined the allocator specification using only static members – removing the need for an internal allocator object.

// Completely made up code showing the use of a static based allocator
template <typename T, typename Alloc>
class vector
{
  void  push_back(void)
  {
    void* new_memory = Alloc::allocate( sizeof(T) );
    T* new_object = new(new_memory) T;
  }
};

I knew this would limit the flexibility of the allocators, but it had minimal over-head (especially using default template parameters) and wouldn’t effect those containers already being used. And besides, programmers are a creative bunch and I wanted to see what people did with this before resigning it to the scrap heap.

But while programmers were able to work around the main limitation of not having allocator state per container, they were having to jump through hoops to get the behaviour they wanted. Which made it less likely that other programmers would feel confident enough writing their own allocators and their ability to really customise their behaviour was too limited.

So we ended up adding allocator state on a per container basis, making it much easier for developers to do what they wanted though it did add at least 4 bytes per container. But I felt that flexibility and simplicity were much more important.

// Completely made up code showing the use of an instanced allocator
template <typename T, typename Alloc>
class vector
{
  Alloc m_allocator;

  void  push_back(void)
  {
    void* new_memory = m_allocator.allocate( sizeof(T) );
    T* new_object = new(new_memory) T;
  }
};

Allocator Specification
The allocator specification is complicated. While I’m sure there are good reasons for some of the decisions I wanted ours it to be much simpler. So removing the type information (since our allocators work on raw memory and nothing else), removing the rebind(!) and exception handling (which we don’t use) we ended up with the following.

typedef ftl::pair<void*, size_t>  allocated_memory;

class Alloc
{
public:
  allocated_memory   allocate(const size_t alloc_size);
  void               deallocate(void* ptr);
};

And for the basic allocator, that’s it, it doesn’t require anything else to work, but it does have one interesting aspect.

typedef ftl::pair<void*, size_t>  allocated_memory;

As we can have all sorts of allocator types, what it allocates might not be exactly what you asked for. If it can’t allocate all the memory you requested, that’s fine, it simply returns allocated_memory(nullptr, 0) but it can also return more than was requested (for example, fixed sized block allocators will do this). This return type simply allows the allocator to return not only the allocated memory, but also how much was allocated, which allows calling objects to take advantage of this additional memory if possible.

Most of the time this isn’t queried and only what was asked for is given, but it offers an additional level of information which might allow better memory usage in some containers.

So a container will most likely end up with something like the following when adding and removing elements.

// Creating a new object in a container
allocated_memory new_memory = m_allocator.allocate( sizeof(T) );
if (new_memory.first)
  T* new_object = new(new_memory.first) T;

// Losing the object now we’re done with it
old_object->~T();
m_allocator.deallocate(old_object);

This is fine and gives us a very simple entry point for allocators. But by forcing the use of placement new and the destructor call in the container itself, we are limiting what an allocator can do. While the allocators are required to return raw memory, that doesn’t mean that it has to internally. Some allocators might pre-create the objects before returning them so creation is front loaded but the forced placement new could mean we’re over-riding an object that has already been created.

Construction Functions
As a result, we want developers to be able to not only over-ride the memory allocation, but also the object creation. 99% of allocators won’t need to do this so we don’t want to add additional requirements to the allocator so instead we can create non-member, non-friend functions, specialised on the allocator, which will do the creation for us.

template <typename TConstruct, typename TAlloc>
TConstruct* construct(TAlloc& allocator);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(TAlloc& allocator, const TConstruct& obj);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(void* allocated_mem);

template <typename TConstruct, typename TAlloc>
TConstruct* construct(void* allocated_mem, const TConstruct& obj);

template <typename TConstruct, typename TAlloc>
void destroy(TConstruct* ptr);

template <typename TConstruct, typename TAlloc>
void destroy(TConstruct* ptr, TAlloc& allocator);

So our point of allocation/creation now becomes something much simpler and much more powerful.

// Creating a new object in a container
T* new_object = alloc::construct<T>(m_alloc);

// Lose our existing object and return it back to the allocator
alloc::destroy(old_object, m_alloc);

The default version of construct performs the allocation and placement new within the construct function, but should the allocator need something more (or less), simply over-loading the function on the allocator type allows complete control over both memory allocation and object creation. The same goes for the destroy function and the automatic call of the objects destructor.

No Base Allocator
One thing I didn’t add to the allocators was a base allocator interface. The allocators are created within the container, with their type defined at the point of the containers declaration. The addition of a base interface (and the associated virtual functions) would have increase the size over-head of the allocator which is something I wanted to avoid and something I thought didn’t add enough to warrant the overhead. I was less worried about the overhead of calling a virtual function as that would be insignificant compared to the overhead of everything else what was going on.

Conclusion
So by introducing an allocator model that is much simpler (only 2 required functions) with more extensible customisation should an allocator should need it, developers have complete control over all the memory allocation and importantly the object creation itself. Due to it’s initial simplicity, developers have no problem creating new allocators that improve both memory and container use in a project and can start to experiment with better and more efficient allocators quickly.

Title image by Squeezy.  Used with permission.

In the last few posts, I’ve covered the development of a type safe and easy to use flag set, and as a result it now contains 3 required template parameters, two of which declare default values.

template <
          typename TEnum,
          int TMaxFlags = 8,
          typename TNames = union_names<bit_set_size<TMaxFlags>::Value>
         >
class flag_set

The last two properties have default values, but to be honest, people don’t want to have to add a Noof option to the declaration just to define the flag names.  And they don’t want to keep adding long type declarations, even when using typedef’s to cut down on the clutter.

We have the following coding standard at work which can provide an interesting structure defining both our flags and flag names

Enums should be defined within a structure to provide proper name scoping

struct MyProperty
{
  enum Enum
  {
    FlagEntry1,
    FlagEntry2,
  };
};
MyProperty::Enum property = MyProperty::FlagEntry1;

In the past enums were defined with the name of the enum appended to the names of its entries. While this does reduce name conflicts it is a poor way to scope a property.

Since we can guarantee that our enums are defined like the above, we can make the assumption that people will (or will be happy to) add their flag names as part of this structure.

// Structure defining the flag enums and also
// defining the flag names which we need
struct StateFlags
{
  // Flags
  enum Enum
  {
    HasTarget,
    CarryingWeapon,
    Drugged,

    Noof,
  };

  // Names
  struct Names
  {
    uint HasTarget:1;
    uint CarryingWeapon:1;
    uint Drugged:1;
  };
};

Making that assumption allows us to define the following structure

template <typename TFlags>
struct flag_set_ex
{
  typedef flag_set <
                     typename TFlags::Enum,
                     TFlags::Noof,
                     typename TFlags::Names
                   > flags;
};

This then allows users to define their flags sets quickly using all the available properties

flag_set_ex<StateFlags>::flags myStateFlags;

If programmers don’t want to use the flag names, they can obviously use the base flag set type. And to keep things consistent, we can add the a ‘::flags’ typedef to the base flag set allowing the definitions to be the same regardless of the flag set type being used.

template < typename EnumT, uint SizeT, typename NamesT >
class flag_set
{
public:
  typedef flag_set<EnumT, SizeT, NamesT>     flags;
};

// Define a normal flag set with the same syntax as the flag_set_ex making
// it much easier to interchange the types depending on their needs
flag_set<StateFlags::Enums>::flags myStateFlags;

In the previous post I described how I implemented a type safe, template driven, system for declaring and using flags in C++. But one of the main issues with the solution was not being able to see what flags were set when debugging.

Which Flags Are Set?
When setting bits to store flags there are varying levels of output you can get from them in the debugger.

One of them absolutely sucks – you don’t want to be doing bit arithmetic to know whats been set

The next one is better, it gives you some idea of what’s been set, but not much.

Finally, there is the following, which is much better.

A normal way of getting this behaviour out of the debugger when using base type flags is declaring them alongside a structure containing the individual flag entries inside a union.

struct  StateFlags
{
  union
  {
    uint m_allFlags;
    struct
    {
      uint HasTarget:1;
      uint CarryingWeapon:1;
      uint Drugged:1;
    } m_names;
  };
};

This provides you with the best possible output in the debugger so it’s something I wanted to add to the flag set to make debugging much easier.

Flag Set ‘Names’
Since this is an interface driven implementation the ideal syntax starts out as the following

// The enums used to define the flag set
enum StateFlags
{
  HasTarget,
  CarryingWeapon,
  Drugged,
};

// The ‘names’ we’d like to be displayed in the debugger
struct StateNames
{
  uint HasTarget;
  uint CarryingWeapon;
  uint Drugged;
};

We need to be able to pass these names through to the container, and since we want the type defined on a per flag basis we can add these names to the flag set definition.

template <typename TEnum, int TMaxFlags, typename TNames>
class flag_set

Since we already have our own defined type to store the flags being set and we know that we can use a union to combine the storage type and the flag names, we simply append our name type along side the storage type so when a flag is set the name is automatically set alongside it.

template< int TMaxFlags, typename TNames >
struct bit_union
{
  // Container type with enough bits to store TMaxFlags flags
  typedef bit_set<TMaxFlags> container_type;
  typedef TNames name_type;

  union
  {
    container_type m_type;
    name_type m_name;
  };
};

I’ve extracted the bit_set into it’s own structure rather than including it directly inside the union for two reasons

  • It allows the bit set to be specialised based on the number of flags without worrying about the union. If we had a large number of specilised storage structures, we’d have to duplicate the union inside each one.
  • It allows us to strip out the union and the names in builds that don’t need it, without affecting the speclialised bit sets.

So the flag set now contains user defined names without having to worry about how they are implemented.  The templated utility functions used to set, remove and check the flags also don’t need to know anything about it so aren’t effected.

template < typename TEnum, int TMaxFlags, typename TNames >
class flag_set
{
public:
  void set(const TEnum flag)
  {
    // The utility functions still take the same
    // type regardless of the name union
    set_bit_entry(m_unionFlags.m_type.m_bitArray, flag, true)
  }

private:
  static const uint BitSetSize = ((( TMaxFlags + 7 ) & ~7 ) >> 3)*8;
  bit_union< BitSetSize, TNames > m_unionFlags;
};

Now we can simply define our flag set with a unique set of names and in the debugger we get exactly what we we’re looking for (annoyingly we can’t get around the need to define the integers in the structure as one bit entries if we’re going to add them to the union).

struct MyFlags
{
  enum Enum
  {
    UnderAttack,
    InCover,
    // etc.

    Noof,
  };

  struct Names
  {
    uint UnderAttack:1;
    uint InCover:1;
    // etc.
  };
};

flag_set<MyFlags::Enum, MyFlags::Noof, MyFlags::Names> flagSet;

flagSet.set(MyFlags::InFormation);
flagSet.set(MyFlags::KnockedDown);

Default Names
But we have one problem.  We don’t want to force people to have flag names if they don’t want them. Depending on the type and volatility of the flags they might not need the names in the debugger and it can be a lot of code to add for every single flag set in use.  And since we have a default template parameter for the number of flags, we need to provide a similar default parameter for the names.

Defining the default parameters isn’t much of a problem.  We can quickly define union name companions for 8+ flags (a few small macros make this much easier) but how to drop them into the flag set template as a multiple of 8?

Since we already calculate our multiple of 8 flag count when defining the bit_union type, we can do the same when declaring the default name structure.  But we certainly don’t want such an over-complicated calculation twice in the class.

Since we need the multiple of 8 value outside the class in the template declaration, we can’t just declare it inside the class, so by storing it in it’s own type we can use it where-ever we need it.

template<uint TSize>
struct bit_set_size
{
  enum
  {
    Value = ((( TSize + 7 ) & ~7 ) >> 3)*8,
  };
};

We can now use this to declare our default flag set name where union names is correctly specialised depending on how many flags the user has declared.

template <
           typename TEnum,
           int TMaxFlags = 8,
           typename TNames = union_names<bit_set_size<TMaxFlags>::Value>
         >
class flag_set
{
private:
  bit_union< bit_set_size<TMaxFlags>::Value > m_unionFlags;
};

So now a user can simply declare their flag set with the enum type and by default they will get something out of the debugger that’s more useful than just showing the single base type value.

flag_set<MyFlags::Enum> flagSet;

flagSet.set(MyFlags::InFormation);
flagSet.set(MyFlags::KnockedDown);

Flag Set Drawbacks
In adding custom names to the flag set, we’ve added a number of new templated structures.  The default name types, the bit_set_size and the bit_union and all this comes with a cost.  Obviously we’re generating a number of structures for all the flags that use this system and it certainly does effect compilation speed.

Based on compilation speeds of a couple of projects that use the flag set heavily over a few months, I’d estimate that adding the flag set (before the debug names were added) put an extra 20-30 seconds on a full rebuild.  After adding the debug names, it probably added another 30 seconds or more, bringing a rebuild of the entire project up from 3 minutes to around 4.

But I can live with that for a number of reasons

  • The flag set has introduced a more type safe environment for developers.  On a project that has a number of developers dipping into it, this is vital and does reduce the possibility of human errors causing simple but difficult to track down bugs
  • The code is clearer and easier to read, maintain and debug.  This is a bonus for all the same reasons above.
  • I rarely rebuild the entire project.  A well structured project reduces the number of rebuilds greatly and as such the longer rebuilds actually don’t effect me that much (an average build usually takes <10 seconds)

There’s one more thing I want to look in regards to the flag set, and that’s how we can take the basic template declaration and simplify it so users can quickly and easily declare a flag set with a large number of flags and their own flag names within a single structure.

But I’ll cover that in another post.

As was pointed out in the last post, I… ahem… forgot to actually put down some working examples of the flag set, so I ended up describing something but not actually showing off the final product.

So I thought a good start would be to include a few of the flag set unit tests (since unit tested code is known to be good ‘working’ documentation in itself) followed by a simple working example based on the examples used at the start of the last post.

So, here’s a load of code that you can feast you eyes on (the unit test library being used here is UnitTest++)

struct TestFlagSet_Small
{
  enum Enum
  {
    Flag1,
    Flag2,
    Flag3,
    Flag4,

    Noof,
  };
};

struct TestFlagSet_Large
{
  enum Enum
  {
    Flag1,
    Flag2,
    Flag3,
    Flag4,
    Flag5,
    Flag6,
    Flag7,
    Flag8,
    Flag9,
    Flag10,
    Flag11,
    Flag12,

    Noof,
  };
};

class ExampleFixture
{
public:
  // Define a couple of flags, one defaulting to 8 flags
  flag_set<TestFlagSet_Small::Enum> smallFlagSet;
  flag_set<TestFlagSet_Large::Enum, TestFlagSet_Large::Noof> largeFlagSet;

  ExampleFixture(void)
  {
  }

  ~ExampleFixture(void)
  {
  }
};

// Checks that the flag set starts in an empty state
TEST_FIXTURE(ExampleFixture, InitialFlagSet_Empty)
{
  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
  CHECK_EQUAL(largeFlagSet.is_clear(), TRUE);
}

// Tests that compilation works when defining flags with sizes < 8
TEST(DefaultTemplateSet)
{
  flag_set<TestFlagSet_Small::Enum, TestFlagSet_Small::Noof>    smallFlagSet;
  flag_set<TestFlagSet_Large::Enum, TestFlagSet_Large::Noof>    largeFlagSet;
}

// Checks if the flag set is clear when flag set removed
TEST_FIXTURE(ExampleFixture, SetThenRemoveIsClear)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.remove(TestFlagSet_Small::Flag1);

  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
}

// Checks if the flag set is clear when cleared or no flags are set
TEST_FIXTURE(ExampleFixture, ClearSetIsClear)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.clear();

  CHECK_EQUAL(smallFlagSet.is_clear(), TRUE);
}

// Checks if the set function correctly sets the right flags
TEST_FIXTURE(ExampleFixture, SetFlagSetIsSet_IsSet)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag1), TRUE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag4), TRUE);
}

// Check that returned C type is correctly set
TEST_FIXTURE(ExampleFixture, AsCTypeFunction)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag2);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  // as_c_type allows the flag set to be explicitly passed
  // through to legacy functions and for the underlying
  // datatype to be stored as a base type is needed
  const uint8* flagSetBits = smallFlagSet.as_c_type();

  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag1) );
  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag2) );
  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag3) == FALSE );
  CHECK( (*flagSetBits) & (1<<TestFlagSet_Small::Flag4) );
}

// Test that setting from a C type correctly sets the flags
TEST_FIXTURE(ExampleFixture, FromCTypeFunction)
{
  smallFlagSet.set(TestFlagSet_Small::Flag1);
  smallFlagSet.set(TestFlagSet_Small::Flag2);
  smallFlagSet.set(TestFlagSet_Small::Flag4);

  const uint8 tempStorage = *(smallFlagSet.as_c_type());

  smallFlagSet.clear();
  smallFlagSet.from_c_type(&tempStorage);

  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag1), FALSE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag2), FALSE);
  CHECK_EQUAL(smallFlagSet.is_set(TestFlagSet_Small::Flag4), FALSE);
}

// And a (very) simple real world example

// Define a couple of flag types
struct BattleProperties
{
  enum Enum
  {
    HasTarget,
    CarryingWeapon,
  };
};

struct CharacterState
{
  enum Enum
  {
    IsStunned,
    IsPoisoned,
  };
};

// Create a flag set for the character states
typedef flag_set<CharacterState::Enum> StateFlags;
StateFlags m_currentStateFlags;

void UpdateCharacter(void)
{
  // We’ve been stunned
  m_currentStateFlags.set(CharacterState::IsStunned);

  // ... do some more stuff

  // Whoops, I would have just made a mistake, but I can’t do this
  // m_currentStateFlags.set(BattleProperties::CarryingWeapon);

  // Update what ever has happened
  UpdateState(m_currentStateFlags);

  // We’ve dealt with our states now
  m_currentStateFlags.clear();
}

void UpdateState(const StateFlags& state)
{
  // Are we stunned?
  if (state.is_set(CharacterState::IsStunned)
  {
  }

  // Another mistake I can no longer do
  // if (state.is_set(BattleProperties::CarryingWeapon)
  // {
  // }
}

Using bits and shifting enums is a pretty standard way of storing and reading flags. It’s quick, it’s compact and the syntax isn’t to difficult to pick up for people who haven’t seen it before. But since a bit is only a bit and an enum is only an enum it’s far to easy to make mistakes with these. There is no type checking when you accidentally use the wrong enum and passing an unsigned int around loses all relevance to what these ‘flags’ actually are.

So I wanted to look for a solution that not only stopped us accidentally using the wrong flag with the wrong variable (and allow the compiler to let us know if it happens) but to also to remove the loss of relevance when the flags were passed between different systems.

The Problem
Anyone who’s taken Programming 101 will know what I’m talking about.

enum CharacterBattleProperties
{
  HasTarget = (1<<0),
  CarryingWeapon = (1<<1),
};

enum CharacterState
{
  IsStunned = (1<<0),
  IsPoisoned = (1<<1),
};

unsigned int battleFlags;
unsigned int stateFlags;

// Set or remove some flags based on whats happening
// Character has just lost sight of the player
battleFlags &= ~HasTarget;

// We've just picked up a weapon
battleFlags |= CarryingWeapon;

// ...

// Some more code, some more function calls and
// now we have a player who is stunned
battleFlags |= IsStunned; 

// Whoops, I just set the wrong flag on the wrong variable...

The First Solution
While investigating a couple of solutions I came across a post on Game Architect titled Evolution of a Flag Set Class which examined a lot of the questions I was asking myself. And it took it pretty far, coming up with a set of solutions that more-or-less did what I was looking for, but I wanted something that would be easier to scale and specialise based on the number of flags we had and not one based on an integer as the smallest possible storage unit.

So in trying to keep it as simple as possible the flag set initially ended up with a very simple interface

// Stripping it down to it's most basic behaviour
template < typename TEnum, typename TBaseType = unsigned char >
class flag_set
{
public:
    void set(const TEnum flag)
    {
      m_flags |= (1 << flag);
    }

    void remove(const TEnum flag)
    {
      m_flags &= ~(1 << flag);
    }

private:
    TBaseType m_flags;
};

You’ll notice a couple of things with this implementation (on top of ‘remove’ being highlighted in pink for some probably awesome reason)

  • The first is the default value of the second template parameter is set to an unsigned char, meaning that in the majority of cases the variable can be declared simply with the enum type but it’s easy to increase as needed (though it obviously has limits).
  • The second is that the object itself deals with the shifting of the enum value. This was a conscious decision to remove the idea that when setting a flag you are not ‘setting a bit’ rather you’re setting a property, something I wanted the interface to promote.

And this works pretty well, but contradicts something quite important.

I wanted the flag_set to move away from ‘setting bits’ so why do we have to specify a base type which makes us think about how many bits we need. And since we won’t know how many enum entries we have until we start to set them at runtime, we could end up with a flag_set that doesn’t have enough space to store all our flags and the compiler won’t be able to tell us about it (in real life it asserts when you try to use a flag outside the range of the storage type, but that’s not good enough).

A More Appropriate Solution
So I wanted to look at working with the number of flags, and work on making it have as small a footprint as possible. So taking a step back I started out with the interface I wanted.

template < typename TEnum, int TMaxFlags = 8 >
class flag_set

Where people could define them as follows

// This will assume there are 8 or fewer flags
flag_set< MyFlags::Enum > myDefaultSizeFlagSet;  

// This will create enough space for MyFlags::Noof flags
flag_set< MyFlags::Enum, MyFlags::Noof > myDeclaredFlagSet;

So we need a way to create enough storage space for the number of flags we have as compile time and since the base type of a bitset in the FTL is a 32bit unsigned int that’s not a good enough solution. There is no point storing 3 flags in an integer when it could drop into a char. If multiple flag sets were packed together, the smaller the storage space the better.

Creating enough storage space is pretty simple, we store an array of unsigned chars based on how many flags we have.

// Generate the array length based on an 8-bit type
unsigned char m_flags[ ((( TMaxFlags + 7 ) & ~7 ) >> 3) ];

// Figure out which bit to set/remove when needed 
// I'm not worrying about endian code in this example...
const int index = ((( (entry+1) + 7 ) & ~7 ) >> 3)-1;
const int offset = entry - (8*index);

// Set the bit
m_flags[index] |= 1 << offset;

Optimising For Certain Types
So we now have a flag set which allows us to have as many flags as we could possible want (and yes I have seen code with more than 128 'flags' in the past - don't ask me to justify that, I have no idea...) but with a slight problem. In the original version there were no arrays, no calculations when figuring out which bit to set and no loops to check it's state.

All of this over complicates what could be just 'one bit being set on one variable'.

We can easily get around this by specialising the flag set for any types of 8, 16 or 32 bit flags. But I didn't want to specialise the whole flag_set, especially when we might continue this specialisation up to 64 and 128 bit flags - and specialising a whole class just becomes a nightmare to maintain.

So we extract the storage type into its own structure and specialise that based on the number of flags. This allows the general flag_set behaviour to be defined within the one class, and the length based behaviour to be extracted into it's own implementation.

// Just a note that the following code is simplified to make it 
// easier to read on a blog post, but the general gist is the same.

template  < typename TBitCount >
struct bit_set
{
  enum
  {
    ArrayLength = ((( TBitCount + 7 ) & ~7 ) >> 3),
  }
  unsigned char m_bitArray[ArrayLength];
};

template <>
struct bit_set<8>
{
  enum
  {
    ArrayLength = 1,
  }
  unsigned char m_bitArray;
};

Then we simply replace our array member in the flag set with an instance of the bit_set (where we calculate the bit_set size as a power of 8 based on the number of flags we specify in the flag_set).

Since the flag_set doesn't want to worry about what kind of data it's using, we can replace the in-class implementation with a number of utility functions which not only reduces the amount of code duplication but, when specialised for the same reasons as the bit_set, allows us to have the same interface and to use the same class with only a small number of functions specialised as needed.

// Basic, none specialised function for setting 
// a flag as called from flag_set::set
template < typename ContainerT >
void set_bit_entry(ContainerT& flags, const uint entry)
{
  // Get the entry point to the bit list
  const int index = ((( (entry+1) + 7 ) & ~7 ) >> 3)-1;
  const int offset = entry - (8*index);
 
  // Set the bit
  flags.m_bitArray[index] |= 1 << offset;
}

// Specilised for a bit count of 8 - no arrays, no additional calculations
template< >
void set_bit_entry< bit_set< 8 > >(bit_set< 8 >& flags, const uint entry)
{
  flags.m_bitArray |= 1 << entry;
}
// So our basic none-specialised flag set looks something like this
// (code simplified to make it a bit more presentable)
template < typename TEnum, int TMaxFlags = 8 >
class flag_set
{
public:
    void set(const TEnum flag)
    {
      set_bit_entry(m_flags.m_bitArray, flag, true)
    }

    void remove(const TEnum flag)
    {
      set_bit_entry(m_flags.m_bitArray, flag, false)
    }

private:
    static const uint BitSetSize = ((( TMaxFlags + 7 ) & ~7 ) >> 3)*8;
    bit_set< BitSetSize > m_flags;
};

This method makes is very easy to specialise the flag set for flags that can be stored in a built in type and falling back to an array if the size doesn't fit into one type. If we want to specialise for 16 or 32 bit flags, then we simply create specialised versions of our utility functions making the flag_set implementation the same no matter how much additional work is being done behind the scenes.

Final Outcome
So we now have a flag set that meets the issues I wanted to resolve head on. Passing a structure with strong type behaviour stops programmers accidentally using the wrong flag with the wrong set (and the compiler's the one that stops them) and passing a named structure to a function keeps the meaning and reason behind the flags intact.

The code obviously increases in complexity over a simple bitwise operation but from a client point of view, the readability of the client code becomes easier. Calls to .set, .remove and .clear make it very clear what the intention of the code is and makes it easier to avoid simple keyboard mistakes.

It also makes it immeasurably easier to add the debugging features to the flag set, making it easier to see what is set and when. But this post has been long enough and I'll look at that in my next post.

I’ve wanted to blog more about our internal STL implementation for a while and one of the interesting things I wanted to cover was how we implemented the STL deque, which I think is a sometimes misunderstood and unfortunately not a very often used container.

But first, what is a deque.

A good place to start is to say it’s pronounced ‘Deck’ and is short hand for ‘double ended queue’, giving you an idea of what it’s main strengths might be. Like a vector it offers constant time operations when adding or removing from the back, but unlike the vector it also offers the same when adding or removing from the front, making it a perfect container for queue style containers.

And it’s this later ability that makes it vastly different from a vector than the interface might otherwise suggest.

One major missing set of functions is something that you should always be using by default on a vector – reserve() and capacity(). And those missing functions should indicate an underlying difference in the way memory is allocated.

An STL implemented deque usually allocates memory in pools which are able to contain N number of entries in the container. Exceed the number that can fit in a pool and another pool is created, linked to the previous one. This gives it the ability to add values to the front and back of the container, in effect creating a chunky link list structure internally.

And it’s a good idea. By enabling a chunky (or sometimes not so chunky) list structure you have less of a need to preallocate memory, inserting values leans more towards a fixed, or more predictable, model and you lose the spikes that often accompany a vector should you keep extending past the current capacity.

But it’s not perfect. And it’s primary strength can also be it’s primary weakness.

Game developers like static memory. They like to allocate memory and stick with it. They don’t like memory that is allocating left, right and centre and they don’t like to have to guess at whats been allocated and when. Consoles like static memory too, and they like memory to be in the general vicinity of memory they are already looking at.

Coming from this, game developers and consoles generally don’t like link lists (though that’s not to say we don’t use them when we need to). And if we have to use them, it’s good to explicitly use them, or even better to use an intrusive list instead (more on those another day).

But surely you can be a bit clever. Maybe setting the size of a pool to be how ever many elements you want to add, in other words trying to mimic reserve(). But then you still need to allocate more memory when you add to the front, or the back, or where-ever depending where your particular implementation of a deque drops in the first elements (and since there is nothing guiding the implementation in the standard it could be anywhere).

And some implementations (Visual Studio I am looking squarely at you) have horrific ways of specifying how big these pools should be and simply do not provide the flexibility you need when working with them.

So what did we want and how did we approach implementing our own Deque container?

The first thing we wanted was our reserve and capacity back (along with our custom resize_capacity and other related functions) because that gives programmers vital control over what’s being allocated and when. Granted you could probably get similar behaviour by using a different allocator, but we don’t want to have to use a unique allocator 99% of the time!

As a result (and it’s a good one) that leads us back to having a block of continuous memory allocated within the deque which can make traversing the container much more cache friendly and makes memory management much easier. It also removes the question of “if I remove this element will the memory layout change?”. We’re sticking with vector style memory, which dictates that memory will not be freed unless specifically requested, another feature that ties in with developers needing to be confident of what their containers are doing.

This also allows us to lead quite easily towards a fixed::deque container with the same interface and very similar behaviour which is much harder with the chunky linked list approach.

But obviously a vector has this memory layout, and doesn’t have constant time insertion at the start of the deque. So something needs to be different?

A Ring Buffer is a pretty common data structure in computer science and fits the bill perfectly. Add an element to the back or front, and simply wrap around the available heap until your buffer start/end hit each other. At that point either stop accepting new entries (as a fixed deque would) or allocate additional memory and keep adding (in our case we increase the allocated size by 50% to keep it consistent with the vector).

This allows us to have constant time insertion at the start/end but it does make our iterators a bit more complicated as they need to be aware of their heap and buffer boundaries, but that’s only a few pointers so it’s not to bad (and traversing linked lists using iterators isn’t much different so it’s not much of a trade off).

Obviously going down this route has it’s downsides. Inserting elements one after another without reserving is much slower but developers shouldn’t be doing that anyway and most importantly traversal of the deque is much improved. Benchmarks on the platform we develop for at work show an increase in traversal speed in the order of 100% on some platforms, with the worst being around 30% faster which is nothing to sniff at.

But a fundamental change like this does means that people coming to this Deque when they are intimately familiar with a std::deque will either miss the sometimes small tweaks and use it inefficiently or be thrown by what they see and be less inclined to use it. But decent documentation and mentoring can easily overcome those hurdles.

From the outset memory management and traversal were the areas we were most concerned about when looking at the ftl::deque, and while insertion is certainly slower if you don’t reserve (though no-one should be using an FTL Deque without reserving memory before hand) this is a price I’m quite happy to pay.

Title image by mdezemery

One of the most difficult things when it comes to unit testing is cutting down on the number of duplicated tests. In the past I’ve often developed objects which have very different implementations but externally produce very similar results (for example when developing a vector compared to a fixed vector, or implementing different types of iterators), and one thing I don’t want to do is have a set of tests which are almost identical to another set of tests I wrote last week.

One method I’ve used is to write a common set of tests independent of a specific type and had the type defined in another file, along with other flags which are used to indicate which tests should or shouldn’t be run.

So for example, when developing a ring buffer style iterator, it was useful to know that the const and non-const versions produced the same results in a lot of different situations without having to have comparison tests or duplicating a large amount of test code.

The following examples are written using UnitTest++ but will work with a number of different unit testing Frameworks

// In the file RingBufferIterator_Tests.inl

// Tests are defined in a separate file assuming certain
// settings have been defined
TEST(Blah)
{
RINGBUFFER_ITERATOR_TYPE myItor;

//...  Doing tests with this type
}

Now we simply need to define the type of iterator and indicate if some tests should or shouldn’t be run

// In the file RingBufferIterator_TestTypes.cpp

SUITE(RingBufferItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer

// Define a couple of settings which the tests file uses to make
// sure some type specific tests are run or excluded

// This one simply indicates that the tests checking the ability
// to alter the content of the iterator will be run
#define RINGERBUFFER_ITERATOR_NONCONST_CONTENT

// Now we simply need to include the file used to
// implement all the unit tests for this kind of iterator
#include "RingBufferIterator_Tests.inl"
}

We do the same for the const version of the iterator we are testing

// In the file RingBufferConstIterator_TestTypes.cpp

SUITE(RingBufferConstItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer_const

// In this case we don't have any additional settings so we
// simply include the test files and let the tests run
#include "RingBufferIterator_Tests.inl"
}

You can take this as far as you want but there is obviously going to be a point where the number of flags you’re defining starts to make the tests hard to read or unmanageable. Limiting it to about two, where you can easily group the tests within these defines generally works quite well.

You don’t need to limit it to a single file. For example, some types might be similar enough to share 50% of the tests but not the rest. Splitting up the common tests is easy enough.

// In the file VectorContainer_TestTypes.cpp

SUITE(Vector)
{
// Define our type
#define VECTOR_TYPE ftl::vector

// Include our common tests
#include "Vector_CommonTests.inl"

// Include only the tests for this type
#include "Vector_DynamicTests.inl"
}

When tests are not written like this there is a a big gap in the ability to test if comparable types behave the same (fixed and non-fixed containers for example) especially when simple things like human error can get in the way of creating duplicate behaviour tests. By changing over to this style of testing we can automatically test the behaviour of similar types without creating additional work.

It does have it’s draw backs, the main one being that you can get multiple test failures (if a test is used by >1 type and they both fail at the same time) or worse you get one fail and you’re not sure which type caused the problem. An easy solution to this would be to allow the failure message to have a option component, allowing you to identify the type in the message, but this isn’t supported by any test framework that I know of.

Occasionally my compilers dependancy checker doesn’t cope very well, compiling only one of the type files rather than all of them if a test changes, but this has been remarkably rare and these draw backs don’t overshadow the ability to confirm that your types are functionally the same even if their implementation is vastly different.

Unity builds. I don’t like them.

Of all the tools at your disposal to make a build faster, this is the worst. And it’s not just the “hey let’s #include .cpp files” weirdness, but the way that it can make a well structure, modular code base become worse than spaghetti junction at rush hour, and the worse thing is that it’s not the fault of the programmer, especially when something as exceptional as Visual Assist can start helping you create non-modular code because of the way Unity Builds collect the code you write.

What Is a Unity Build?
Unity Builds have nothing to do with the excellent Unity Tool Set. I’ll just clear that up right off the bat.

These Unity Builds are a way of reducing build over-head (specifically opening and closing files and reducing link times by reducing the number of object files generated) and as such are used to drastically speed up build times. I say drastically because they are usually used after a code base has starting generating build times that totally cut down on a programmer’s productivity. This might be 10 minutes, 30 minutes or even hours.

Because Unity Builds give you a quick gain, it’s seen as a pretty magical solution, and when a full build is reduced from 30 to 3 minutes you can see why. But the caveat in that statement? Full builds.

I won’t go through how to generate Unity Builds here as the blog post The Magic of Unity Builds does a great job so have a look there. It’s interesting that I’m linking to a blog post that is in the total opposite direction that I’m coming from, but unless you know both sides of a tale, you don’t know the tale.

So without any more delay, why don’t I like them?

Say Goodbye to Local Properties
Well written code is modular, and modular code relies on a few assumptions. One of those being that a single translation unit (a cpp file and any associated include files) is isolated from other translation units, so (unless you extern the contents) anything defined within a cpp file is not available outside, and as a consequence you can have local objects being called the same things without conflict.

Take for example the following simple code (easily expanded to a real world example I’m sure)

// In VectorTest.cpp
namespace
{
   const uint StandardTestCount = 10;
}
// In ListTest.cpp
namespace
{
   const uint StandardTestCount = 3;
}

In a normal compilation environment these variables are local to the file they are working in, and this is a good thing. Why should the vector tests file care what is defined within another test file?

But if you have a Unity Build with the following then you’re going to have issues…

#include "VectorTest.cpp"
#include "ListTest.cpp"
#include "ArrayTest.cpp"
#include "QueueTest.cpp"

Specifically errors relating to the variable ::StandardTestCount already being defined. So we now have to be aware of what is defined throughout the entire project and resolve any issues, and inevitable this will end up with all variables being pre-appended with (in this example) VectorTest_ and ListTest_ etc.

I don’t want to refer to the ‘Magic’ post to much, as it’s a good article, but there is a statement in there referring directly to this. Specifically the following

“Technically you shouldn’t have this problem if you’re writing “proper” code”

This is wrong. “Proper” code is well structured and well maintained, meaning it’s modular and only refers to what it needs to refer to. In a lot of cases you will have variables with the same name, functions with the same name and other elements that you naturally write as you go along. That’s why the anonymous namespace (or the C style ‘static’) is so useful, and so useless if you switch to Unity Builds.

Using Is (even more) Lethal
Never use the ‘using’  keyword in a header file. It’s a standard statement and one that stands up easily. The reasons behind this are pretty clear (declaring ‘using’ in a header file forces all code including the header file to use that statement – in effect wiping out a namespace).

But using it in a .cpp file isn’t a crime. In fact it’s remarkably useful, even within the whole files scope. As a developer, I should know what I’m using in a local file, what elements I’m bringing into the file and what would/could conflict. Some people might not agree that ‘using’ at the top of a cpp file is a good idea at all, and I see their point, but on a whole file scope, it can be massively useful and as it’s localised it’s not going to pollute anything other than the file it’s in.

But in Unity Builds, using (that is not scoped within a function or type etc.) is lethal. Declaring ‘using ‘ in a cpp file makes it available throughout the code base (or more confusingly only in those files that are included after the one declaring it). Suddenly, using is everywhere. And your namespaces are now useless.

Every Change is a Rebuild
This is one of my biggest problems with Unity Builds. In well structured projects (more on that below) changing the content of a cpp file (or even a local header file) shouldn’t require everything to be rebuilt. Every. Single. Time. Yes, the unity build rebuilds are faster than doing a full build without a unity file, but doing even a fast rebuild every time will build up. Quickly.

When-ever I change a .cpp file, all I need to build is that .cpp file. Granted, link times are a tad long because it still has to re-link the other object files, but it takes a second to compile that one file. On average (I took count today) when I changed a header file it compiled on average 5 .cpp files. And it (again on average) took about 5 seconds to build.

Very rarely should I be required to re-build the entire project, and most of the time I’m don’t. And that saves me a lot of time. Every single day.

Multiple Configurations
This is mainly a bugbear of mine rather than a direct issue with Unity Builds, but I see it in nearly every Unity Build project I use. The main project is the Unity Build project, but there is another project that is built and maintained that doesn’t use Unity Builds. Now there is a point here – by having an additional ‘normal’ project you are forcing the modularity that can collapse with Unity Builds to be checked (usually a Continuous Build server will be building this as well as the Unity Build every time).

But we have problems with this.

Firstly, the non-unity build is only ever being built on the CB server. So any problems are going to break the build, and it will break if people are not using it day-to-day. Secondly you now have multiple projects to maintain. Not too much of a problem if you have an automated project generation step (see below) but it is still another project to maintain.

People may occasionally have to use the non-unity configurations, especially if they are getting an unknown error on the CB server. So now they are left working on a configuration that is uncared for and probably builds so slowly and erratically that they are probably losing all the time they saved from those quick unity builds they have been doing all day.

But What About My Build Times Then?
Well structured software builds quickly (or as quickly as they can anyway). But what is a well structured project when it comes to build times?

  • Sensible file inclusion – Only include the files you need and try as best you can to limit them to .cpp files. This means the compiler only needs to bring in what’s necessary and when you change one of these header files only those things that directly rely on it will change. If you find yourself having to include files that constantly change into a large number of cpp files, then I’d wager that a refactoring of the large header file would be in order. You should be building only a small number of cpp files when you change the content of a header file.
  • Forward Declarations – Where possible use forward declarations rather than including the header file in your class or function declaration. Annoyingly you cannot forward declare enums (on standard compliant compliers anyway) which sometimes throws this out of the window. But by not including the files until use, you’re cutting down on the amount of code being included and the number of files being opened.
  • Pre-compiled Headers – Use Pre-compiled Headers (PCH). Using PCH’s is the one (built into the IDE) feature that will cause your build times to plummet, especially if you are being sensible with them (such as including files that don’t change – SDK’s and common, fixed headers for example). Using pre-compiled headers across multiple platforms can be a pain, but it is possible to unify them and get a massive boost. I’ll cover these a little bit more below.
  • Library Files – Modular code usually leads towards easily extracting common code into a collection of libs. Reducing the amount of code in a project (and as a result how much you need to build) can speed up your productivity quickly.
  • Parallel Building – If you’re IDE supports it (and it might do and you just don’t know) and you have a multi-core machine, turn on parallel building of files. Building individual files at the same time is obviously going to cut down on your compile times no matter how quick they are at the moment.
  • Get a Better PC – It goes without saying (but I will anyway) doing this will speed everything up.

Pre-Compiled Headers
Pre-compiled headers are one of the best ways to get a massive boost to your compile times. So how fast are my build times compared to that of a Unity Build version when using PCH’s and taking into account the other build improvements suggestions?

As stated above the ‘average’ build time of a minimal build throughout the day was around 5 seconds. On a Unity Build it was a full build every time and was around 2 minutes on the slowest platform.

Changes to ‘core’ files, which are included by a higher than average number of files, resulted in builds of around 30 seconds on around 20 files. Again on a Unity Build this would have been around 2 minutes regardless.

Full rebuild (which I did twice today) was around 3 minutes. Granted a whole minute slower than a Unity Build but I did it twice rather than every single time.

Pre-compiled headers are not a silver bullet solution. Nothing is. And because of this here are issues that you do need to be aware of

  • Compiler Support – Some compilers out there simply do not support PCH’s. On a daily basis I use between 4 or 5 different compilers and while I’ve never encountered one that doesn’t, they are out there. This can shoot my argument in the foot, but it is a rare problem and one most people won’t encounter.
  • PCH Quirks – While I use a number of compilers that do support PCH’s, every single one of them has a slightly different way of building them and slightly different requirements before they can be used. This doesn’t affect the code that is written but does affect the content of your project, especially if you want to make them seem as consistent as possible.
  • Over-Inclusion – Because your PCH usually includes common files and files that rarely change, it does mean that some files are being brought into the project in a compilation unit that wouldn’t otherwise be required

Unity builds are a solution for the wrong problem. Long compile times are not caused by not using Unity Builds, they are the result of having badly structured and badly referenced code. Fix that (and have better code to boot) and you’ll be able to use ‘proper’ build tools without resorting to a quick fix.

Making Unity Builds Better
I don’t want to just leave it at that, because no matter how much I argue, Unity Builds will always exist and people will always use them (after all it’s quicker to rescue a broken down build by making it a Unity Build than doing it the hard way). So what can people do to actually make Unity Builds a bit more desirable and less of a problematic fix?

  • Automate Adding Files – A lot of teams have auto project generation code already (XML files, template projects etc.) so it’s important that you automate adding files to the project, otherwise people will forget to remove the file from the build step and they will forget to add it to the unity build file.
  • Multiple Unity Files – Running a Unity Build doesn’t require you to have one unity file with every cpp file inside it. You can have multiple build files (usually per module or per component) which means at least some of the translation unit leaking is limited to each module rather than the whole program.
  • Additional Project – No, this isn’t a contradiction from the above “Multiple Configurations” comment above. In this situation you will have a project that contains the cpp and header files so you can reference them but this project isn’t built. Instead, you have the ‘proper’ project simply contain the unity file(s). This isn’t something I’ve personally tried, but it does get around the issues of adding files if you don’t have an automated step.