Why Bad Software Happens to Good People
Bad software is one of the few things in the world you cannot solve with money. Billion-dollar airlines have flight search apps that are often inferior to those built by groups of students. Established taxi companies the world over have terrible booking apps despite the threat they face from ride-sharing services. And painful corporate IT systems are usually projects with massive budgets, built over the course of many years. Whatever the cause of bad software is, it does not seem to be a lack of funding.
Surprisingly, the root cause of bad software has less to do with specific engineering choices, and more to do with how development projects are managed. The worst software projects often proceed in a very particular way:
The project owners start out wanting to build a specific solution and never explicitly identify the problem they are trying to solve. They then gather a long list of requirements from a large group of stakeholders. This list is then handed off to a correspondingly large external development team, who get to work building this highly customised piece of software from scratch. Once all the requirements are met, everyone celebrates as the system is launched and the project is declared complete.
However, though the system technically meets specifications, severe issues are found when it is put in the hands of actual users. It is slow, confusing, and filled with subtle bugs that make using it an exercise in frustration. Unfortunately, by this time the external development team has been dismissed and there are no resources left over to make the necessary fixes. By the time a new project can be initiated years later, all knowledge of what caused these problems has left the organisation and the cycle starts over again.
The right coding language, system architecture, or interface design will vary wildly from project to project. But there are characteristics particular to software that consistently cause traditional management practices to fail, while allowing small startups to succeed with a shoestring budget:
• Reusing good software is easy; it is what allows you to build good things quickly;
• Software is limited not by the amount of resources put into building it, but by how complex it can get before it breaks down; and
• The main value in software is not the code produced, but the knowledge accumulated by the people who produced it.
Understanding these characteristics may not guarantee good outcomes, but it does help clarify why so many projects produce bad outcomes. Furthermore, these lead to some core operating principles that can dramatically improve the chances of success:
1. Start as simple as possible;
2. Seek out problems and iterate; and
3. Hire the best engineers you can.
While there are many subtler factors to consider, these principles form a foundation that lets you get started building good software.
Reusing Software Lets You Build Good Things Quickly
Software is easy to copy. At a mechanical level, lines of code can literally be copied and pasted onto another computer. More generally, the internet is full of tutorials on how to build different kinds of systems using ready-made code modules that are available online. Modern software is almost never developed from scratch. Even the most innovative applications are built using existing software that has been combined and modified to achieve a new result.
The biggest source of reusable code modules is the open source community. Open source software is software in which code is freely published for anyone to see and use. Many of the largest contributors to the open source community are giant tech companies. If you want to use a state-of-the-art planet-scale database as Facebook does, just download the code for Cassandra that they open sourced in 2008. If you want to try out Google’s cutting-edge machine learning for yourself, download the TensorFlow system published in 2015. Using open source code does not just make your application development faster; it gives you access to technology that is far more sophisticated than anything you could have developed yourself. The most popular open source code is often more secure as well, as there are many more people paying attention and fixing vulnerabilities. This is the reason digital technology has made such rapid progress: even the newest engineers can build upon the most advanced tools our profession has to offer.
The advent of cloud services has taken reusability even further, offering the full use of even proprietary systems for just a subscription fee. Need a simple website? Just configure one in a few clicks using a website building service like Squarespace or Wix. A database? Subscribe to a virtual one from Amazon Web Services or Microsoft Azure. Cloud services allow developers to benefit from specialisation; the service provider handles the setup, maintenance, and continued development of a reliable, high-quality piece of software that is used by all its subscribers. This allows software developers to stop wasting time on solved problems and instead focus on delivering actual value.
You cannot make technological progress if all your time is spent on rebuilding existing technology. Software engineering is about building automated systems, and one of the first things that gets automated away is routine software engineering work. The point is to understand what the right systems to reuse are, how to customise them to fit your unique requirements, and how to fix novel problems discovered along the way.
Software Is Limited by Complexity
How useful a piece of software can be is usually limited by its complexity rather than the amount of resources invested in building it.
IT systems are often full of features but are still hated by users because of how confusing they become. In contrast, highly ranked mobile apps tend to be lauded for their simplicity and intuitiveness. Learning to use software is hard. Beyond a point, new features actually make things worse for users because the accumulated complexity starts to become overwhelming. For example, after serving as the hub of Apple’s media ecosystem for almost 20 years, iTunes was split into three different apps (for music, podcasts, and TV shows) in 2019, as its features had grown too complex for one app to handle. From a usability perspective, the limit is not how many features can be implemented, but rather what can fit into a simple intuitive interface.
Even ignoring usability, engineering progress slows to a halt once a project becomes too complex. Each new line of code added to an application has a chance of interacting with every other line. The bigger an application’s codebase, the more bugs are introduced whenever a new feature is built. Eventually, the rate of work created from new bugs cancels out the rate of work done from feature development. This is known as “technical debt” and is the main challenge in professional software development. It is the reason why many large IT systems have issues that go unfixed for years. Adding more engineers to the project just adds to the chaos: they start running faster in place as the codebase keels over from its own weight.
In such cases, the only way forward is to take a step back to rationalise and simplify the codebase. The system architecture can be redesigned to limit unexpected interactions. Non-critical features can be removed even if they have already been built. Automated tools can be deployed to check for bugs and badly written code. Bill Gates once said, “Measuring programming progress by lines of code is like measuring aircraft building progress by weight.” Human minds can only handle a finite amount of complexity, so how sophisticated a software system can get depends on how efficiently this complexity budget is used.
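Some rough arithmetic shows why limiting interactions matters so much. The sketch below is only a toy model, and its assumptions are deliberately crude: in a monolith every line may interact with every other line, while in a modular design lines interact freely only within their own module and each pair of modules interacts through a single interface. The function names are made up for this illustration.

```python
def pairwise(n: int) -> int:
    """Number of unordered pairs among n items: n * (n - 1) / 2."""
    return n * (n - 1) // 2


def monolith_interactions(total_lines: int) -> int:
    # Assumption: any line may interact with any other line.
    return pairwise(total_lines)


def modular_interactions(total_lines: int, modules: int) -> int:
    # Assumption: lines interact freely inside a module, and each pair
    # of modules interacts only through one well-defined interface.
    lines_per_module = total_lines // modules
    return modules * pairwise(lines_per_module) + pairwise(modules)


total = 10_000
print(monolith_interactions(total))             # 49995000
print(modular_interactions(total, modules=10))  # 4995045
```

Under these assumptions, splitting the same 10,000 lines into ten isolated modules cuts the number of potential interactions by roughly a factor of ten. Real codebases are messier, but the direction of the effect is the point.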
Building good software involves alternating cycles of expanding and reducing complexity. As new features are developed, disorder naturally accumulates in the system. When this messiness starts to cause problems, progress is suspended to spend time cleaning up. This two-step process is necessary because there is no such thing as platonically good engineering: it depends on your needs and the practical problems you encounter. Even a simple user interface such as Google’s search bar contains a massive amount of complexity under the surface that cannot be perfected in a single iteration. The challenge is managing this cycle, letting it get messy enough to make meaningful progress, but not letting it get so complicated that it becomes overwhelming.
Software Is about Developing Knowledge More than Writing Code
In software development, most ideas are bad; this is not anyone’s fault. It is just that the number of possible ideas is so large that any particular idea is probably not going to work, even if it was chosen very carefully and intelligently. To make progress, you need to start with a bunch of bad ideas, discard the worst, and evolve the most promising ones. Apple, a paragon of visionary design, goes through dozens of prototypes before landing on a final product. The final product may be deceptively simple; it is the intricate knowledge of why this particular solution was chosen over its alternatives that allows it to be good.
This knowledge continues to be important even after the product is built. If a new team takes over the code for an unfamiliar piece of software, the software will soon start to degrade. Operating systems will update, business requirements will change, and security problems will be discovered that need to be fixed. Handling these subtle errors is often harder than building the software in the first place, since it requires intimate knowledge of the system’s architecture and design principles.
In the short term, an unfamiliar development team can address these problems with stopgap fixes. Over time though, new bugs accumulate due to the makeshift nature of the additional code. User interfaces become confusing due to mismatched design paradigms, and system complexity increases as a whole. Software should be treated not as a static product, but as a living manifestation of the development team’s collective understanding.
This is why relying on external vendors for your core software development is difficult. You may get a running system and its code, but the invaluable knowledge of how it is built and what design choices were made leaves your organisation. This is also why handing a system over to new vendors for “maintenance” often causes problems. Even if the system is very well documented, some knowledge is lost every time a new team takes over. Over the years, the system becomes a patchwork of code from many different authors. It becomes harder and harder to keep running; eventually, there is no one left who truly understands how it works.
For your software to keep working well in the long term, it is important to have your staff learning alongside the external help to retain critical engineering knowledge in your organisation.
3 Principles for Good Software Development
1. Start as Simple as Possible
Projects that set out to be a “one-stop shop” for a particular domain are often doomed. The reasoning seems sensible enough: What better way to ensure your app solves people’s problems than by having it address as many as possible? After all, this works for physical stores such as supermarkets. The difference is that while it is relatively easy to add a new item for sale once a physical store is set up, an app with twice as many features is more than twice as hard to build and much harder to use.
Building good software requires focus: starting with the simplest solution that could solve the problem. A well-made but simplistic app never has problems adding necessary features. But a big IT system that does a lot of things poorly is usually impossible to simplify and fix. Even successful “do it all” apps like WeChat, Grab, and Facebook started out with very specific functionality and only expanded after they had secured their place. Software projects rarely fail because they are too small; they fail because they get too big.
Unfortunately, keeping a project focused is very hard in practice: just gathering the requirements from all stakeholders already creates a huge list of features.
One way to manage this bloat is by using a priority list. Requirements are all still gathered, but each is tagged according to whether it is an absolutely critical feature, a high-value addition, or a nice-to-have. This creates a much lower-tension planning process because features no longer need to be explicitly excluded. Stakeholders can then more sanely discuss which features are the most important, without worrying about something being left out of the project. This approach also makes explicit the trade-offs of having more features. Stakeholders who want to increase the priority for a feature have to also consider what features they are willing to deprioritise. Teams can start on the most critical objectives, working their way down the list as time and resources allow.
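A priority list of this kind can be sketched in a few lines of Python. This is an illustration rather than a description of any tool we actually use: the three tiers come from the text above, but the requirement names, the effort estimates, and the rule of stopping at the first item that no longer fits the budget are all assumptions made for the example.

```python
from dataclasses import dataclass
from enum import IntEnum


class Priority(IntEnum):
    CRITICAL = 1
    HIGH_VALUE = 2
    NICE_TO_HAVE = 3


@dataclass(frozen=True)
class Requirement:
    name: str
    priority: Priority
    effort_days: int  # hypothetical estimate, for illustration only


def plan(requirements, budget_days):
    """Work down the list in priority order until the budget runs out.

    Returns (selected, deferred). Nothing is deleted from the list:
    whatever does not fit is deferred, not excluded.
    """
    ordered = sorted(requirements, key=lambda r: r.priority)
    remaining = budget_days
    for i, req in enumerate(ordered):
        if req.effort_days > remaining:
            return ordered[:i], ordered[i:]
        remaining -= req.effort_days
    return ordered, []


# Hypothetical backlog, gathered in no particular order.
backlog = [
    Requirement("export parking history", Priority.NICE_TO_HAVE, 15),
    Requirement("pay for a parking session", Priority.CRITICAL, 20),
    Requirement("extend a session remotely", Priority.HIGH_VALUE, 10),
]

selected, deferred = plan(backlog, budget_days=30)
print([r.name for r in selected])  # ['pay for a parking session', 'extend a session remotely']
print([r.name for r in deferred])  # ['export parking history']
```

Raising one feature's priority pushes something else further down the list, which is exactly the trade-off stakeholders are asked to confront.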
We followed a similar process for all our most successful apps. Form.gov.sg started out as a manual Outlook Macro that took us six hours to set up for our first user but today has processed about a million public submissions. Data.gov.sg started out as a direct copy of an open source project and has since grown to over 300,000 monthly visits. Parking.sg had a massive list of almost 200 possible features that we never got around to building but still has over 1.1 million users today. These systems are well received not in spite of their simplicity but because of it.
2. Seek Out Problems and Iterate
In truth, modern software is so complicated and changes so rapidly that no amount of planning will eliminate all shortcomings. Like writing a good paper, awkward early drafts are necessary to get a feel of what the final paper should be. To build good software, you need to first build bad software, then actively seek out problems to improve on your solution.
This starts with something as simple as talking to the actual people you are trying to help. The goal is to understand the root problem you want to solve and avoid jumping to a solution based just on preconceived biases. When we first started on Parking.sg, our hypothesis was that enforcement officers found it frustrating to have to keep doing the mental calculations regarding paper coupons. However, after spending just one afternoon with an experienced officer, we discovered that doing these calculations was actually quite simple for someone doing it professionally. That single conversation saved us months of potentially wasted effort and let us refocus our project on helping drivers instead.
Beware of bureaucratic goals masquerading as problem statements. “Drivers feel frustrated when dealing with parking coupons” is a problem. “We need to build an app for drivers as part of our Ministry Family Digitisation Plans” is not. “Users are annoyed at how hard it is to find information on government websites” is a problem. “As part of the Digital Government Blueprint, we need to rebuild our websites to conform to the new design service standards” is not. If our end goal is to make citizens’ lives better, we need to explicitly acknowledge the things that are making their lives worse.
Having a clear problem statement lets you experimentally test the viability of different solutions that are too hard to determine theoretically. Talking to a chatbot may not be any easier than navigating a website, and users may not want to install yet another app on their phones no matter how secure it makes the country. With software, apparently obvious solutions often have fatal flaws that do not show up until they are put to use. The aim is not yet to build the final product, but to first identify these problems as quickly and as cheaply as possible. Non-functional mock-ups can test interface designs, semi-functional mock-ups can try out different features, and hastily written prototype code can garner feedback more quickly. Anything created at this stage should be treated as disposable. The desired output of this process is not the code written, but a clearer understanding of what the right thing to build is.
With a good understanding of the right solution, you can start work on building the actual product. You stop exploring new ideas and narrow down to identifying problems with your particular implementation. Begin with a small number of testers who will quickly spot the obvious bugs that need to be fixed. As problems are addressed, you can increasingly open up to a larger pool who will find more esoteric issues.
Most people only give feedback once. If you start by launching to a large audience, everyone will give you the same obvious feedback and you’ll have nowhere to go from there. Even the best product ideas built by the best engineers will start out with significant issues. The aim is to repeatedly refine the output, sanding down rough edges until a good product emerges.
Even after all this iteration, after launch is when problems with a product matter the most. A problem that happens only 0.1% of the time may not get noticed during testing. But once you have a million users, every day the problem goes unresolved is a thousand more angry people you have to deal with. You need to fix problems caused by new mobile devices, network outages, or security attacks before they cause substantial harm to your users. With Parking.sg we built a series of secondary systems that continuously check the main system for any discrepancies in payments, duplicate parking sessions, and application crashes. Building up an “immune system” over time lets you avoid being overwhelmed as new issues inevitably come up.
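To give a flavour of what one such secondary check might look like, here is a minimal sketch of a duplicate-session detector. It is not the actual Parking.sg implementation: the `Session` record, its field names, and the rule that two sessions for the same vehicle should never overlap in time are all assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Session:
    # Hypothetical record layout, for illustration only.
    session_id: str
    vehicle: str
    start: datetime
    end: datetime


def find_overlapping_sessions(sessions):
    """Flag each session that starts before an earlier session for the
    same vehicle has ended. Returns (earlier_id, later_id) pairs."""
    by_vehicle = defaultdict(list)
    for s in sessions:
        by_vehicle[s.vehicle].append(s)

    overlaps = []
    for vehicle_sessions in by_vehicle.values():
        ordered = sorted(vehicle_sessions, key=lambda s: s.start)
        latest = ordered[0]  # session with the latest end time seen so far
        for s in ordered[1:]:
            if s.start < latest.end:
                overlaps.append((latest.session_id, s.session_id))
            if s.end > latest.end:
                latest = s
    return overlaps


sessions = [
    Session("a", "vehicle-1", datetime(2019, 7, 1, 9, 0), datetime(2019, 7, 1, 10, 0)),
    Session("b", "vehicle-1", datetime(2019, 7, 1, 9, 30), datetime(2019, 7, 1, 10, 30)),
    Session("c", "vehicle-1", datetime(2019, 7, 1, 11, 0), datetime(2019, 7, 1, 12, 0)),
    Session("d", "vehicle-2", datetime(2019, 7, 1, 9, 0), datetime(2019, 7, 1, 9, 30)),
]
print(find_overlapping_sessions(sessions))  # [('a', 'b')]
```

A check like this would run on a schedule, independently of the main system, and alert the team whenever it returns anything, so that a rare failure is caught by a machine before it is reported by an angry user.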
Overall, the approach is to use these different feedback loops to efficiently identify problems. Small feedback loops allow for quick and easy correction but miss out on broader issues. Large feedback loops catch broader issues but are slow and expensive. You want to use both, resolving as much as possible with tight loops while still having wide loops to catch unexpected errors. Building software is not about avoiding failure; it is about strategically failing as fast as possible to get the information you need to build something good.
3. Hire the Best Engineers You Can
The key to having good engineering is having good engineers. Google, Facebook, Amazon, Netflix, and Microsoft all run a dizzying number of the largest technology systems in the world, yet they famously have some of the most selective interview processes while still competing fiercely to recruit the strongest candidates. There is a reason that the salaries for even fresh graduates have gone up so much as these companies have grown, and it is not because they enjoy giving away money.
Both Steve Jobs and Mark Zuckerberg have said that the best engineers are at least 10 times more productive than an average engineer. This is not because good engineers write code 10 times faster. It is because they make better decisions that save 10 times the work.
A good engineer has a better grasp of existing software they can reuse, thus minimising the parts of the system they have to build from scratch. They have a better grasp of engineering tools, automating away most of the routine aspects of their own job. Automation also means freeing up humans to work on solving unexpected errors, which the best engineers are disproportionately better at. Good engineers themselves design systems that are more robust and easier for others to understand. This has a multiplier effect, letting their colleagues build upon their work much more quickly and reliably. Overall, good engineers are so much more effective not because they produce a lot more code, but because the decisions they make save you from work you did not know could be avoided.
This also means that small teams of the best engineers can often build things faster than even very large teams of average engineers. They make good use of available open source code and sophisticated cloud services, and offload mundane tasks onto automated testing and other tools, so they can focus on the creative problem-solving aspects of the job. They rapidly test different ideas with users by prioritising key features and cutting out unimportant work. This is the central thesis of the classic book “The Mythical Man-Month” [1]: in general, adding more software engineers does not make a project go faster; it only makes it grow bigger.
Smaller teams of good engineers will also create fewer bugs and security problems than larger teams of average engineers. Similar to writing an essay, the more authors there are, the more coding styles, assumptions, and quirks there are to reconcile in the final composite product, exposing a greater surface area for potential issues to arise. In contrast, a system built by a smaller team of good engineers will be more concise, coherent, and better understood by its creators. You cannot have security without simplicity, and simplicity is rarely the result of large-scale collaborations.
The more collaborative an engineering effort, the better the engineers need to be. Problems in an engineer’s code affect not just their own work but that of their colleagues as well. In large projects, bad engineers end up creating more work for one another, as errors and poor design choices snowball to create massive issues. Big projects need to be built on solid, reliable code modules in an efficient design with very clear assumptions laid out. The better your engineers, the bigger your system can get before it collapses under its own weight. This is why the most successful tech companies insist on the best talent despite their massive size. The hard limit to system complexity is not the quantity of engineering effort, but its quality.
Good software development starts with building a clear understanding of the problem you want to solve. This lets you test many possible solutions and converge on a good approach. Development is accelerated by reusing the right open source code and cloud services, granting immediate access to established software systems and sophisticated new technology. The development cycle alternates between exploration and consolidation, quickly and messily progressing on new ideas, then focusing and simplifying to keep the complexity manageable. As the project moves forward, it gets tested with successively larger groups of people to eliminate increasingly uncommon problems. Launching is when the real work ramps up for a good development team: layers of automated systems should be built to handle issues quickly and prevent harm to actual users. Ultimately, while there are infinite intricacies to software development, understanding this process provides a basis to tackle the complexities of how to build good software.
ABOUT THE AUTHOR
Li Hongyi leads a team of engineers, designers, and product managers who build technology for the public good. Projects they have worked on include Parking.sg—an app to replace parking coupons, Form.gov.sg—a web app for building online government forms in minutes, and Data.gov.sg—the government’s open data repository. Prior to joining the public sector, Hongyi worked at Google on the distributed databases and image search teams. In his free time, he works on personal projects like typographing.com and chatlet.com.
[1] Frederick P. Brooks, Jr., The Mythical Man-Month: Essays on Software Engineering, Anniversary Edition, 2nd ed. (Boston, MA: Addison-Wesley Longman, 1995).