Incremental namespace refactoring for large code base

Why I bothered refactoring namespaces

On a weekend, I decided to finally clean up the messy namespaces in a ~140k LOC solution.

There is a single level namespace for each project, but the architecture has grown quite differently from what I initially wanted, so the scopes no longer stand for what they contain. Worse still, in a lot of places the usage of namespaces feel redundant, as they are not simplifying anything or avoiding any conflict, but instead adding plenty of syntactic noises.

Namespaces, when subdividing the project in a way matching to code cohesion, should greatly simplify naming within each of them, and also minimize the number of times these scope names being mentioned. Just like exceptions are not about writing try-catch, namespaces are never about accessing things with extra names.

The plan was to set up a solution-wide namespace for everything except for 3rd party libraries and remove some unnecessarily fine-grained ones. There are still self-contained projects that fit perfectly into their own territories, so these would become nested namespaces inside the solution-wide one. The new syntax in C++17 was another incentive to go for a nested namespace design because it would look a lot less verbose than it used to.

Trickier than you would think

Just 2 hours after I started refactoring, I felt it was going the wrong direction: everything I moved between namespaces was going to require modifying all its references, a lot of which are part of classes to be moved in the next step anyway. It is definitely possible to first add the fully-qualified namespace to every reference and remove them later, except that there is simply too much code to change.

Then, how about moving all of them before compilation? The brute-force way had not worked either because the compiler soon started to report mysterious errors like missing end of line (file?), which was probably related to either wrong order of inclusion, inclusion from inside a namespace, or something else that I had no chance to figure out, since there have been too many changes after the compilation was broken.

Using-directive to the rescue

I soon realized namespace using-directive was the right solution here. On the contrary to common coding guidelines that name pollution is best to be avoided, I needed exactly that both old and new scopes are temporarily treated as one. In fact, if the new namespace has a unique fully-qualified name, there can never be any "pollution" even when they are each other's alias.

And ... this worked as I thought! The bidirectional namespace using has granted compatibility before and after moving a class, because they are treated as one scope until I remove it again. After some searching that did not offer me any similar solution, I thought it would be nice to share this with everybody, and discuss several possible pitfalls when using this method.

Summary of steps

Preparation

Refactor the part of the code to be isolated, preferably in its own header and source file. This is an essential step, considering namespace changes would break the previous scope and involve unrelated classes, especially when there are nested classes. It would be too late to start fixing the part when the compilation is already broken.

before:

namespace GameBase  
{
    CDebugDataReporter::CDebugDataReporter(){/*...*/}
    /* stuff ...*/
    Input::GameKey Input::VKtoGameKey(unsigned long key){/*...*/}
    /* stuff ...*/
}

after:

namespace GameBase  
{
    //we plan to migrate CDebugDataReporter
    CDebugDataReporter::CDebugDataReporter(){/*...*/}
    /* stuff ...*/
}

namespace GameBase::Input  
{
    GameKey VKtoGameKey(unsigned long key){/*...*/}
    /* stuff ...*/
}
Adding using-directives

Add using-directives into a global header that is visible to the entire project. Note that we need to forward declare the namespace and the symbols inside the namespace in order to redirect them.

//global header "migration.h"
//moving symbols from GameBase to MyGame::GameBase
//forward declaration
namespace GameBase{ }

//note the inner name differs from the old one to avoid conflicts
namespace MyGame::GameBase_TOBERENAMED  
{
    //enables moved classes accessing everything else in the old namespace
    using namespace GameBase; 
}

namespace GameBase  
{
    //enables other classes accessing the moved ones
    using namespace MyGame::GameBase_TOBERENAMED;
}

Sometimes the source or target namespace is the global scope. In either case, we still need to redirect them, but simply using namespace GameBase is likely to cause name pollution and ambiguous symbols. There is a more appropriate way, though slightly more verbose than the using-directive version, still requires much less work than manually adding and removing namespaces from every reference:

//suppose we are moving these types and functions from global scope into namespace MyGame
namespace MyGame  
{
    class Landscape;
    class LandscapeEditor;
    class ELandscapeDrawMode;
    bool IsInEditMode(/*...*/);
    /* etc... */
}

//make sure the symbols can still be accessed as-is from the global scope
using MyGame::Landscape;  
using MyGame::LandscapeEditor;  
using MyGame::ELandscapeDrawMode;  
using MyGame::IsInEditMode;  
/* etc... */

The point is, by the time we finished moving all related code, most of these usings can be directly removed, and for those that would trigger compile time error, we append missing namespaces to the code instead. If we have obeyed the Single responsibility principle along the way, references to these classes should be predominantly within the module, and the namespace aims to match the module or its superset.

Applying changes

Now we can adjust the namespace of the class in both header and source files, so it would look like this afterwards:

namespace MyGame::GameBase_TOBERENAMED  
{
    CDebugDataReporter::CDebugDataReporter(){/*...*/}
    /* stuff ...*/
}
Compilation

Be sure to modify only one class at a time, and always keep it compiling.
Fail fast. Try compiling a single .cpp first, followed by unit tests, and the whole project at last.

Cleaning up

Now that all classes are moved as we wanted, try removing the using-directives and using-declarations from the global header. This could spot several cross-namespace places that indeed need to be added necessary namespaces. At last, do a global renaming of the temporary namespace with IDE or whatever tools available.

Caveats

Although with this method it should be easy to take one step forward at a time, there are special cases that need to be tackled carefully:

Forward declaration

If there were forward declarations before the change, they all have to be moved to the new namespace to avoid name conflicts. For incomplete types that appear in parameter types void func(class Foo* foo), or friend declaration friend class Foo from another class, they need to be forward declared in the new namespace in addition.

Specialization

All template specializations of a declaration need to stay in the same namespace, so keep in mind we need to move them all when moving the declaration.

Overloading

Rules for overloading differ from specialization that overloads can be defined across different namespaces. Depending on parameter types, the functions would either form Argument dependent lookup, or are just unrelated functions having the same name. This can easily lead to subtle bugs if one overload is missing but another one can be made compatible by implicitly converting parameters. It is better to just search and track down all such overloads ahead of time.

Conclusion

In this post, we went through a simple and pragmatic method to refactor namespaces incrementally. This way we could afford to make more scope adjustments continuously, and we neither have to modify thousands of call-sites that generally leads to "merge hell", nor worry about stalling the development since we can always keep it working and keep moving forward.