
Thoughts on software project sizes – or the pain of scaling.

Thoughts of the Fractional Chief

Lines of Code aren’t a quality metric, but they are a cognitive one.

As a codebase grows, the limiting factor isn’t performance or tooling – it’s how much of the system a human can realistically hold in their head.

Past certain size thresholds, individual understanding gives way to structural knowledge, tooling reliance, and eventually team coordination. This is where documentation, standards, and deliberate technical debt management stop being “nice to have” and become survival requirements.

Ignore this, and critical system knowledge quietly walks out the door one day – possibly for the last time.


While the mythical “LOC” is not an absolute measure in any way, it is still usable for roughly estimating the size and complexity of a project and, from that, what you may need or be able to do with the resources you have.

A rough measure of an application’s scale, echoed by most programmers, is the number of lines of code one can effectively keep in their head, and how much of the “big picture” you start to lose as the code size increases.

These are the rough guidelines I have arrived at over time. They are my personal perception; a lot of programmers tend to concur, though individuals may have their own varying standards:

  • Tiny – 1,000 lines or fewer.
    • “Trivial” or a single-problem solution like an individual automation.
    • Easy to keep every tiny detail and every line in your head and know exactly where it is.

  • Small – 1,000 to 5,000 lines.
    • You know the details of individual modules and functions, and can easily locate a particular piece of functionality to within a range of lines.
    • You still have deep knowledge of the bulk of the code.
  • Medium – 5,000 to 20,000 lines.
    • You as an individual start losing the detail.
    • You start to focus on the overall structure of the entire project in your head,
      and know roughly which module a particular function lives in and how far to scroll to find it.
    • You’re only maintaining details of the code you’re actively working on at this point.
  • Large – 20,000 to 50,000 lines.
    • Detailed knowledge is limited to your immediate work; for everything else you have a bird’s-eye sense of where things are in the code.
    • You’re working off high-level structural knowledge and will start using tools to pinpoint things whose exact location you’ve forgotten.
    • This also tends to be the mental threshold up to which an experienced programmer can hold a picture of the entire codebase in their head.

  • Very Large – 50,000 to 250,000+ lines.
    • You have multiple team members at this point and you’re treating the codebase as a set of interlocked projects.
    • Coordination for changes is absolutely essential at this point.
    • Only a handful of developers could maintain a complete mental picture of a codebase this size, and they would be deep specialists who have worked on it for years.
    • If this is maintained by a smaller team, or worse, an individual, you need to ensure standards and proper handovers for when (not if) people move jobs. The maintaining experience and knowledge literally walks out of the door every night and, at some point, for the last time.
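
To make the bands above concrete, here is a minimal Python sketch that maps a line count to one of the five size labels. The function name `size_band` and the choice to place a boundary value such as 1,000 or 5,000 in the smaller band are assumptions of this sketch, since the ranges in the list share their end points. It is an illustration of the rough guideline, not a precise tool.

```python
from bisect import bisect_left

# Inclusive upper bounds of each band, taken from the list above.
# Assumption: a count that sits exactly on a boundary (e.g. 5,000)
# is placed in the smaller band.
UPPER_BOUNDS = [1_000, 5_000, 20_000, 50_000]
LABELS = ["Tiny", "Small", "Medium", "Large", "Very Large"]


def size_band(loc: int) -> str:
    """Return the rough size band for a line count (hypothetical helper)."""
    if loc < 0:
        raise ValueError("line count cannot be negative")
    # bisect_left gives the index of the first bound >= loc, or
    # len(UPPER_BOUNDS) when loc is larger than every bound.
    return LABELS[bisect_left(UPPER_BOUNDS, loc)]


if __name__ == "__main__":
    for loc in (800, 12_000, 100_000):
        print(f"{loc:>7} lines: {size_band(loc)}")
```

The label only says which kind of mental model an individual is likely to be working from at that size; it says nothing about the quality of the code.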

Please keep in mind that:

  • These thresholds describe individual cognitive limits, not team or system limits. Also, 100k LOC in one tangled domain/monolith is not the same as 100k LOC spread across cleanly separated domains, such as miniservices (see the article about “when microservices go rogue”).

  • The numbers can be skewed upwards significantly by things like the following (in no specific order):

    • Good coding standards and principles. 
    • Good tools and IDEs supporting the developers, such as the JetBrains and similar toolchains.
    • Good specifications. (this is where you start…)
    • Good documentation. (Sorry guys, but no: code is NOT the documentation. It is the implementation, i.e. what you did.)
    • Documentation is what describes, in simple terms, how things hang together: the data formats used, connectivity, data and service relations, related services, etc., and the intent behind what you did, not what you did.
    • Good processes supporting the development coupled with development time frames allowing for the above.
    • Continuously caring about technical debt, and setting aside time to “decruft” the bad stuff.

(C) (BY) EmberLabs / Chris Sprucefield