This is part of a series of posts about early-stage startups. If there is anything you want me to dig into in the future, let me know in the comments.
Much has been written about technical debt, how it impacts software projects and businesses, and why it is a necessary evil if you want to ship something meaningful.
Startups, in general, are a hotbed of technical debt. They move fast, take shortcuts and make decisions on the fly, all of which makes architectural choices difficult. But it’s not the worst environment in the world in which to write code. The feeling of perpetual motion can be exhilarating; the problems come when technical debt builds up and delivery begins to slow down.
This post will look at technical debt in startups and what you can do to mitigate it while still delivering a product your customers love.
What is technical debt?
Technical debt accrues when a team makes a series of quality and scalability compromises in exchange for short-term gains, usually shipping sooner. It can be anything from poor architecture to untested code. The warning signs that you are accumulating too much of it are a team that is fearful of making significant changes to the codebase, frequent production bugs and slowing delivery.
Technical debt can build up for any number of reasons, the most common being pressure from management to ship features. We’ve all seen the quality/speed/cost triangle where you can only pick two. Some corners of the triangle are fixed by your circumstances; the others become the tradeoff.
In early-stage startups, cost and speed are often non-negotiable. You have a fixed budget because you need to manage your runway; you can’t bring in a team of engineers from another department or hire new ones to meet a tight deadline. You will also likely have fixed deadlines, as time-to-market is critical. If you don’t ship code and ship it quickly, you won’t learn what is working and what isn’t, and the quickest way to validate your product’s assumptions is to get something in front of users. Unfortunately, this means the corner that gets traded away is most often quality, and a lack of quality is where technical debt builds up.
Measuring technical debt
If we agree that technical debt is a problem and something that the team should manage, how do we measure it? Where are the most significant risks, and what aspects of the code should be marked for a refactor?
There are some undeniable signs that you are accumulating technical debt.
The first is that the team is apprehensive about pushing to production. Is there a high chance that something will break? What percentage of production deploys result in a customer issue or an elevated error rate?
The second is that features take longer to ship than they should. Are engineers trying to shoehorn code into older, less flexible parts of the codebase and running into problems? Does the team need to tackle significant refactors to make the architecture fit for purpose before the update is even possible?
There are many metrics that you can track to get an understanding of technical debt. These include:
- Error rates after each production deployment
- Test coverage
- Cyclomatic complexity
- Churn vs maintainability
- Deployment cadence
- Customer release cadence
No single metric will identify the sources of technical debt in isolation, but by combining them you can build an accurate picture of where you need to focus.
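As a rough illustration of tracking two of these over time, the Python sketch below computes a change failure rate (the share of deploys followed by a customer issue or error spike) and a deployment cadence from a list of deploy records. The `Deploy` record and its `caused_incident` flag are hypothetical; in practice the data would come from your CI/CD and incident tooling.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class Deploy:
    day: date
    # Hypothetical flag: set when this deploy was followed by a customer issue
    # or an elevated error rate (e.g. linked from your incident tracker).
    caused_incident: bool


def change_failure_rate(deploys: List[Deploy]) -> float:
    """Fraction of production deploys that led to a customer-facing problem."""
    if not deploys:
        return 0.0
    return sum(d.caused_incident for d in deploys) / len(deploys)


def deploys_per_week(deploys: List[Deploy]) -> float:
    """Average deployment cadence over the period the records cover."""
    if not deploys:
        return 0.0
    first = min(d.day for d in deploys)
    last = max(d.day for d in deploys)
    days_covered = (last - first).days + 1  # inclusive of both ends
    return len(deploys) * 7 / days_covered
```

The absolute numbers matter less than the trend: a failure rate creeping up while cadence drops is one of the warning signs described above.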
Below is a graph from CodeClimate illustrating Churn vs Maintainability. Maintainability, in this case, is a combination of test coverage and various static analysis metrics. Code that changes frequently and has a poor rating may be a source of production bugs, and combining this information with error logs can often point to sections of the codebase that may require a refactor.
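A minimal sketch of the churn-versus-maintainability idea, assuming you can export a per-file maintainability score between 0 and 1 from your static analysis tool (the score format and the 0.5 default are assumptions, not CodeClimate’s method). Churn is counted from the output of `git log --name-only --format=`, which lists the files each commit touched.

```python
from collections import Counter
from typing import Dict, List, Tuple


def churn_from_git_log(log_text: str) -> Counter:
    """Count how many commits touched each file.

    Expects the text printed by `git log --name-only --format=`:
    one path per line; blank lines between commits are ignored.
    """
    return Counter(line.strip() for line in log_text.splitlines() if line.strip())


def hotspots(churn: Counter, maintainability: Dict[str, float],
             top: int = 5) -> List[Tuple[str, float]]:
    """Rank files by churn weighted by how hard they are to maintain.

    `maintainability` is a hypothetical per-file score in [0, 1]
    (1 = easy to maintain). Files without a score are treated as
    middling (0.5), which is an assumption.
    """
    scored = {
        path: changes * (1.0 - maintainability.get(path, 0.5))
        for path, changes in churn.items()
    }
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)[:top]
```

Files at the top of this list are frequently changed and hard to maintain, which makes them good candidates to cross-check against your error logs.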
Types of technical debt to avoid
Codebase architecture in startups is challenging. Platforms evolve rapidly, with ever-changing roadmaps. Businesses pivot, realign and reprioritise. Your roadmap today is not going to be your roadmap in three or six months. What you need is a codebase flexible enough to adapt quickly to changes in the business, which means being able to throw away or rewrite sections of it with minimal impact. Highly coupled code is unhealthy in any codebase, and even more so in a startup, because it hinders your ability to reverse earlier decisions that may have been good at the time but are no longer the best route forward.
A well-tested codebase can make the difference between a fast, confident refactor and a risky, error-prone one. If you don’t have the confidence to make large-scale changes to the codebase and ship them to customers, you’ll end up becoming deploy-averse; the “no deploys on a Friday” mentality is a good sign of this problem. A lack of test coverage usually means you can’t confidently make rapid changes while keeping the bugs confined to your laptop or CI environment.
The worst types of technical debt are the unknown ones: the ones you don’t know about until you need to work on a specific part of the code. Was there an undocumented hack that somebody introduced to get something into production? Is there a TODO or comment at the top of a function, one that nobody has looked at in 12 months, explaining why you shouldn’t modify this code? If you don’t understand the reasoning behind the tradeoffs and shortcuts in your software, it’s hard to estimate accurately how long it will take to ship something new.
Reasons technical debt exists
Lack of time
Time pressure is the most common reason technical debt exists. If given the time, most engineers could refactor the same 500 lines of code for eternity and never be completely happy with it. When you are rushing, you may skimp on planning, testing, refactoring or fixing that buggy deploy script. If work is not essential to a feature’s delivery, it can often be removed from the scope or ignored altogether.
Lack of experience
The team building the product may be too junior to understand the tradeoffs they’re making. You see this a lot in early-stage startups, where the budget for senior engineers isn’t available. If there are seniors on the team, they are probably busy and often let significant structural issues slide so the team can ship, since addressing them through an in-depth review and refactor may set delivery back weeks.
Moving in the right direction
I came up with this matrix to try to articulate what can happen in an early-stage startup. You start with a greenfield codebase, and it can go several ways from there.
As you start building your platform, you’re immediately going to move into the “technical debt” quadrant if you want to release something. There are no two ways about this; you have to accrue the debt if you’re going to move forward.
What happens next is the key to whether you can scale your platform or become so mired in technical debt that it is hard to ship anything promptly and to the standard your customers expect.
In the worst-case scenario, you never pay the debt down: you never go back to refactor shortcuts and compromises, and you let them build up. In this case, you sacrifice stability for continuous short-term gains, and the situation quickly becomes unsustainable. Developers are wary of making any changes, things break frequently, and it’s nearly impossible to estimate how long it will take to change or ship anything.
If the codebase is left to rot, you soon enter the “death march” quadrant. You don’t want to be here: team morale will be at an all-time low, and very few features will ship. Unless you make drastic changes, such as a feature freeze while you carry out significant rewrites, you are going to have a tough time.
If you have millions of dollars in the bank and no pressure to ship, then you can move towards the “slow and steady” quadrant. I’ve only seen a handful of startups in this position, so we’re not going to dwell on this.
The happy path is to fluctuate between the “technical debt” and the “good place” quadrants. You accrue technical debt as you ship features, but you address it quickly so that the platform can scale as needed. Build scale into your platform incrementally: not so much that you prematurely build infrastructure you don’t yet need, but enough that you can keep shipping quickly and safely.
Things that will move you in the right direction
There are several things you can do to mitigate the effects of technical debt in your codebase.
Having the ability to deprecate and remove code safely is one of the most powerful tools in your kit in an early-stage startup. Don’t be precious about code. It may have taken you a month to write, but if it’s no longer adding value, don’t keep it hanging around. You may think it will be helpful in the future, but that is what version control is for: if you need that code again, it will always be there in your git history.
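Before deleting a function outright, it can help to flag it as deprecated for a release or two so that any remaining callers show up. A minimal Python sketch using the standard `warnings` module; `deprecated`, `export_report` and `export_report_v2` are hypothetical names.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Decorator that warns callers a function is on its way out."""
    def decorator(fn):
        @functools.wraps(fn)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{fn.__name__} is deprecated; use {replacement} instead",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not this wrapper
            )
            return fn(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement="export_report_v2")
def export_report(rows):
    # Hypothetical legacy CSV export, kept only until callers migrate.
    return ",".join(rows)
```

Python hides `DeprecationWarning` by default except in code run directly as `__main__` and under test runners such as `unittest`, so watch your test output or run with `-W default`. Once the warnings stop appearing, delete the function; git keeps the history.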
To make it easy to remove older code, always use bounded contexts: give each part of your codebase a defined public API limited to a handful of public functions. Highly coupled systems make refactoring a lot more complicated than it needs to be. Taken to its logical extreme, the idea of bounded contexts leads to microservices, but they are almost always a bad idea in startups. The infrastructure overhead is too high, and network boundaries are difficult for small teams to test and debug. A well-designed monolith with defined public internal interfaces is most likely the best way to go at this point.
When we talk about public APIs and interfaces, we’re not just talking about REST or GraphQL. We’re talking about a small, well-defined and well-documented set of entry points that the rest of the program can use to access a part of the codebase. Most commonly, these will be public functions in a package, module, or however your language of choice packages code.
As a package grows to thousands of lines of code over time, keeping its public surface small makes it significantly easier to manage. When you eventually need to replace it with another piece of code, or delete it altogether, the job is relatively simple.
Following on from the concept of bounded contexts, one way to quickly deprecate and replace functionality is API versioning. Your APIs are a contract between your teams and your customers. Even if you don’t plan to change much at the start, you will likely need to rearchitect your API in the future. Rather than layering deprecations and backward-compatibility breaks onto an existing API, it’s a lot easier to release a new version of the endpoint. So even if the version number does nothing at first, add one to your endpoints so that your consumers can migrate easily when you eventually realise you need new functionality.
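A framework-free Python sketch of version-prefixed routing: `/v1/users` and `/v2/users` are registered as separate handlers, so v2 can change the response shape without breaking v1 consumers. The `route` decorator, `dispatch` function and payloads are hypothetical; a real web framework would provide its own routing.

```python
from typing import Callable, Dict, Optional, Tuple

Handler = Callable[[dict], dict]

# Routing table keyed by (version, path): each version is its own contract.
ROUTES: Dict[Tuple[str, str], Handler] = {}


def route(version: str, path: str) -> Callable[[Handler], Handler]:
    """Register a handler for one version of one endpoint."""
    def register(handler: Handler) -> Handler:
        ROUTES[(version, path)] = handler
        return handler
    return register


@route("v1", "/users")
def list_users_v1(request: dict) -> dict:
    # Original contract: a bare list of names. Left untouched once v2 ships.
    return {"users": ["ada", "grace"]}


@route("v2", "/users")
def list_users_v2(request: dict) -> dict:
    # New contract with richer objects and pagination; v1 consumers are unaffected.
    return {"data": [{"name": "ada"}, {"name": "grace"}], "next_cursor": None}


def dispatch(url: str, request: Optional[dict] = None) -> dict:
    """Route URLs shaped like "/v1/users" to the matching versioned handler."""
    parts = url.split("/", 2)  # "/v1/users" -> ["", "v1", "users"]
    if len(parts) != 3:
        return {"status": 404}
    _, version, rest = parts
    handler = ROUTES.get((version, "/" + rest))
    if handler is None:
        return {"status": 404}
    return handler(request or {})
```

Once traffic to `/v1/users` drops to zero, retiring v1 is a matter of deleting one handler.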
Testing is the key to refactoring and deprecating. A well-tested codebase gives you the confidence to make significant changes to your platform without breaking production. How you test your code and infrastructure is another matter, but you should always aim for meaningful coverage: if your tests don’t fail when you break something, they’re useless. Spend time writing a combination of unit and integration tests. With modern tooling, you can easily spin up replica databases, mock services and anything else you need to write integration tests, and run them as part of your CI and deployment pipelines. Don’t scrimp on testing; if a job will take three days without testing and five days with, always pick the five and write the damn tests. You’ll thank yourself later.
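A small Python example of this unit-plus-integration style, using only the standard library: an in-memory SQLite database stands in for a replica database and `unittest.mock.Mock` stands in for an external email service. The `signup` function is a hypothetical feature under test, and SQLite is a simplification; a containerised copy of your production database engine would be closer to reality.

```python
import sqlite3
import unittest
from unittest.mock import Mock


def signup(conn, email, mailer):
    """Hypothetical feature under test: store a user and send a welcome email."""
    if "@" not in email:
        raise ValueError("invalid email")
    conn.execute("INSERT INTO users (email) VALUES (?)", (email,))
    conn.commit()
    mailer.send(to=email, template="welcome")


class SignupTest(unittest.TestCase):
    def setUp(self):
        # A throwaway in-memory database stands in for a production replica.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE users (email TEXT UNIQUE NOT NULL)")
        self.mailer = Mock()  # the external email service is mocked out

    def tearDown(self):
        self.conn.close()

    def test_signup_persists_user_and_sends_email(self):
        signup(self.conn, "ada@example.com", self.mailer)
        rows = self.conn.execute("SELECT email FROM users").fetchall()
        self.assertEqual(rows, [("ada@example.com",)])
        self.mailer.send.assert_called_once_with(to="ada@example.com", template="welcome")

    def test_duplicate_signup_is_rejected(self):
        signup(self.conn, "ada@example.com", self.mailer)
        with self.assertRaises(sqlite3.IntegrityError):
            signup(self.conn, "ada@example.com", self.mailer)
        self.assertEqual(self.mailer.send.call_count, 1)  # no second email

    def test_invalid_email_touches_nothing(self):
        with self.assertRaises(ValueError):
            signup(self.conn, "not-an-email", self.mailer)
        self.mailer.send.assert_not_called()
```

Saved as, say, `test_signup.py`, it runs with `python -m unittest test_signup`. The duplicate-signup test fails if someone drops the `UNIQUE` constraint, which is exactly the kind of breaking change a meaningful test should catch.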
A thought experiment that I like to use is “can this code be pushed to production at 5 pm on a Friday?”. Are you confident that you will not get pinged over the weekend and have to scramble to fix a bug? Are you convinced that customers will not have a degraded experience? Will any problematic updates roll back automatically?
Continuous delivery may not be immediately practical for your company, but working towards it as a goal is a great way to improve your platform.
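One ingredient of automatic rollback is a rule that decides when a fresh deploy looks worse than the release it replaced. The Python sketch below compares the post-deploy error rate against a baseline; `should_roll_back` and all of its thresholds are hypothetical, and the deployment tooling that would call it and actually revert the release is out of scope.

```python
def should_roll_back(baseline_error_rate: float,
                     errors_since_deploy: int,
                     requests_since_deploy: int,
                     tolerance: float = 2.0,
                     error_rate_floor: float = 0.001,
                     min_requests: int = 500) -> bool:
    """Return True if a fresh deploy looks bad enough, relative to the baseline, to revert.

    All thresholds are hypothetical defaults; tune them to your own traffic.
    """
    if requests_since_deploy < min_requests:
        return False  # too little traffic to judge yet; keep watching
    observed = errors_since_deploy / requests_since_deploy
    # The floor stops a near-zero baseline from flagging a single stray error.
    allowed = max(baseline_error_rate * tolerance, error_rate_floor)
    return observed > allowed
```

Requiring a minimum number of requests avoids reverting on noise from the first few minutes of traffic, while the tolerance multiplier decides how much worse than normal is acceptable.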
Monitoring your code quality
Keeping an eye on your code quality and monitoring for the early signs of technical debt is critical if you want to know where to focus. There are plenty of products out there that will help you do this. Platforms such as CodeClimate and Codacy ingest your static analysis and test artefacts (linting results, mess detection, cyclomatic complexity, coverage and so on) and give you an overview of the state of your repositories. Building these checks into your pull-request process will save you time during review and help keep your code in good shape.
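These platforms compute metrics like cyclomatic complexity for you, but a rough version is simple to sketch, which helps when interpreting the numbers. The Python example below uses the standard `ast` module to count decision points per function. It is a simplified approximation of McCabe’s metric, not the exact rules CodeClimate, Codacy or any other tool applies; `cyclomatic_complexity` and `complexity_report` are hypothetical names.

```python
import ast
from typing import Dict

# Node types that add one decision point. A simplified approximation of
# McCabe's cyclomatic complexity, not the rules of any particular tool.
_DECISION_NODES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
                   ast.ExceptHandler)


def cyclomatic_complexity(func: ast.AST) -> int:
    complexity = 1  # a function with no branches has a single path through it
    for node in ast.walk(func):
        if isinstance(node, _DECISION_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds two extra short-circuit branches.
            complexity += len(node.values) - 1
    return complexity


def complexity_report(source: str) -> Dict[str, int]:
    """Score every function in a module (nested functions also count toward their parent)."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
```

A pull-request check could flag any function above a threshold (McCabe suggested 10), but the number is better treated as a prompt for review than as a verdict.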
Documenting technical debt
Documenting any shortcuts you have taken is essential to understanding what needs fixing in the future. Ideally, you should be able to filter your backlog by a technical debt tag and understand which parts of the codebase have significant risk associated with them. These tickets should have clear explanations of the compromises taken and what is needed to clean them up.
It is helpful if you also document technical debt at the code level. Adding comments and explanations is beneficial for both yourself and anyone else working on the code. Explaining yourself clearly to reviewers, other engineers and possibly your future self is a valuable skill, so learn it and use it often.
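One way to link code-level comments to the backlog is a consistent comment convention that a script can find. The Python sketch assumes a hypothetical `TECH-DEBT(TICKET-ID): explanation` format; the tag, the ticket pattern and the function names are illustrative, not an established standard.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Hypothetical convention: "# TECH-DEBT(PAY-42): why the shortcut exists and how to fix it"
DEBT_PATTERN = re.compile(r"#\s*TECH-DEBT\((?P<ticket>[A-Z]+-\d+)\):\s*(?P<note>.+)")

Marker = Tuple[str, int, str, str]  # (file, line number, ticket, note)


def find_debt_markers(source: str, filename: str = "<string>") -> List[Marker]:
    """Return every tagged technical-debt comment in one file's source."""
    markers = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DEBT_PATTERN.search(line)
        if match:
            markers.append((filename, lineno, match["ticket"], match["note"].strip()))
    return markers


def scan_repo(root: str = ".") -> List[Marker]:
    """Scan all Python files under `root` for tagged technical-debt comments."""
    results = []
    for path in sorted(Path(root).rglob("*.py")):
        results.extend(find_debt_markers(path.read_text(encoding="utf-8"), str(path)))
    return results
```

Running `scan_repo()` in CI and comparing the ticket IDs with your tracker’s technical-debt tag shows which shortcuts are documented in both places and which have drifted.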
Removing technical debt is the tricky part. How do you pay the technical debt down if you’re regularly required to move forward and deliver new features?
One way is to do some preparatory refactoring, as described by Martin Fowler. Preparatory refactoring involves fixing existing issues before you write new code so that the changes you have to make become simple. The idea is often summed up by a line from Kent Beck:
“for each desired change, make the change easy (warning: this may be hard), then make the easy change.”
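A small before-and-after sketch in Python of what that can look like. The shipping example is hypothetical: the preparatory refactor moves carrier rates out of branching logic and into data without changing behaviour, after which the actual feature (adding a carrier) is a one-line change.

```python
# Before: adding a carrier means editing branching logic in place.
def shipping_cost_before(carrier: str, weight_kg: float) -> float:
    if carrier == "ups":
        return 5.0 + 1.2 * weight_kg
    elif carrier == "dhl":
        return 7.0 + 0.9 * weight_kg
    raise ValueError(carrier)


# Step 1, make the change easy: move the rates into data (no behaviour change).
RATES = {
    "ups": (5.0, 1.2),  # (base fee, cost per kg); hypothetical figures
    "dhl": (7.0, 0.9),
}


def shipping_cost(carrier: str, weight_kg: float) -> float:
    try:
        base, per_kg = RATES[carrier]
    except KeyError:
        raise ValueError(carrier) from None
    return base + per_kg * weight_kg


# Step 2, make the easy change: the new feature is now a single line.
RATES["fedex"] = (6.0, 1.0)
```

Because step 1 changes no behaviour, it can be verified on its own (the old and new functions should agree for every existing carrier) before the feature work starts.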
This work should be part of your delivery estimates. A good guideline for my teams is to spend 10-20% of their time addressing technical debt. Some managers schedule dedicated technical debt sprints, but I’ve always found these hard to make happen, and they often get deprioritised. It’s much easier to pay down the debt as you go and work on it in every sprint.
One roadblock to addressing technical debt is getting buy-in from managers that it is work that adds value. Luckily, in early-stage startups the reporting structures are such that you probably only need to convince your direct manager or a PM.
One analogy that I’ve always liked is that of building a house. Say you need to add an extension to an existing building. You wouldn’t turn up on day one and start hammering planks of wood onto the existing structure. You would look at the plans and figure out what the previous builder has done. You may need to do some prep work, such as modifying the foundations and rerouting water pipes and electricity, and generally make sure your work isn’t going to destroy the whole house.
It’s no different in software. Cleaning up technical debt is prep work.
I hope these points have been helpful, and thanks for reading. If you have any feedback or anything you want me to dig deeper on in future posts, leave a comment.