Git Off My Lawn – Large and Unmergable Assets

I posted up the Git talk myself and Andrew Fray did at Develop 2013 and mentioned I’d have a few follow up posts going into more detail where I thought it was probably needed (since you often can’t get much from a slide deck and no-one recorded the talk).

One of the most asked questions was how we handled large and (usually) unmergable files (mostly in regards to art assets but it could be other things like Excel spreadsheets for localisation etc.). This was hinted to on slides 35 – 37 though such a large topic needs more than 3 slides to do it justice!

To start talking about this, it’s worth raising one of Git’s (or indeed any DVCS’s) major drawbacks and that’s how it stores assets that cannot be merged. Instead of storing history as a collection of deltas (as it does with mergable files) Git simply stores every version as a whole file, which if you have a 100 versions of a 100MB file, it means your repository could be 10GB in size just for that file alone (it’s not that clear-cut, but it explains the issue clearly enough).

While this is a drawback of DVCS in general it’s not necessarily a bad thing.

It’s how all SCM systems handle files that can’t be merged (some SCMS’s do the same with mergable files too – imagine how large their repositories are) but the problem comes with Git’s requirement that a clone pulls down the whole repository and it’s history rather than just the most recent version of all files. Suddenly you have massive repositories on everyones local drive and pulling that across any connection can be a killer.

As an example, the following image shows how a single server/client relationship might work, where each client pulls down the most recent file, while the history of that file is stored on the server alone.

But in a DVCS, the whole repository is cloned on all clients, resulting in a lot of data being transferred and stored on every clone and pull.

Looking at Sonic Dash, we have some pretty large files (though no-where near as large as they could be) most of them PSDs though we have smaller files like Excel spreadsheets that we use for localisation. Since none of these files are mergable and most of these files are large, we couldn’t store them in Git without drastically altering our workflow. So we needed a process that allowed them to be part of the general development flow but without bringing with them all the problems mentioned above.

Looking at the tools available, and looking at what tools were in use at the moment, it made sense to use Perforce as an intermediary. This would allow us to version our larger files without destroying the size of our Git repository but it did bring up some interesting workflow questions

  • How do we use the files in Perforce without over-complicating our one-button build process?
  • With Git, we could have dozens of active branches, how do they map to single versioned assets in Perforce?
  • How do we deal with conflicts if multiple Git repositories need different versions of the Perforce files?

 

By solving the the first point we start to solve the following points. We made it a rule that only the Git repository is required to build the game i.e if you want to get latest, develop and build you only need to use the Git repository, P4 is never a requirement. As a result of this the majority of the team never even opened P4V during development.

This means the P4 repository is designed to only hold source assets, and we have a structured or automated process that converts assets from the P4 repository into the relevant asset in the Git repository.

As an example, we store our localisation files as binary Excel files as thats whats expected by our localisation team but as that’s not a mergable format so we store it in P4. We could write (or probably buy) an Excel parser for Unity but again that wouldn’t help since we’d constantly run into merge conflicts when combining branches. So, we have a one-click script (written in Ruby if you’re interested) that converts the Excel sheet into per-platform, per-language XML files that are stored in the Git repository.

These files are much more Git friendly since the exported XML is deterministic and easily merged. Any complex conflicts and the conflict resolver can resolve how they see fit and just re-export the Excel file to get the latest version.

It also means that should a branch need a modified version of the converted asset they can either do it within the Git repository or roll back to a version they want in P4 and export the file again. The version in P4 is always classed as the master version, so any conflicts when combining branches can be resolved by exporting the file from P4 again to make sure you’re up to date.

Along with this we do have some additional branch requirements that help assets that might not be in Perforce (such as generating Texture Atlases from source textures) but that’s another topic I won’t go into yet.

2 thoughts on “Git Off My Lawn – Large and Unmergable Assets

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>