Marketplaces: Scaling with Operations vs Engineering?

There are many things that make building and scaling marketplace businesses hard: for example, there’s the quintessential chicken-and-egg problem of building and balancing supply and demand, and there’s the need to build two or more products in parallel to serve the needs of the different participants in your marketplace.

There is also the question of how you scale your marketplace once you’ve got product-market fit established and some unit economics that seem to work.

Electrons vs Atoms

Most marketplaces have to deal with the tangible, real world: unlike pure software/SaaS companies, marketplaces have to deal with whole atoms, rather than just electrons.

Those atoms might make up people, or cars, or meals, or apartments, but they are physical resources that have to be managed. This is why marketplaces tend to need significant operational headcount.

However, most marketplace companies aspire to be, and actively position themselves as, technology platform companies.  This of course requires an ongoing investment in product/engineering.

Given finite resources, how do you choose between scaling a marketplace through operational headcount versus product/engineering investment? How do you strike the right balance?

The Comparables

I did some quick research to look at what other marketplace businesses are doing.

I took a basket of marketplace companies at varying funding stages and looked at their employee counts on LinkedIn by role.

Firstly, let’s set the scene by looking at the absolute number of engineers that various marketplaces have and compare that to their funding stage:

Perhaps no surprises here: as marketplaces develop, they hire more engineers. I am struck, though, by the widely varying number of engineers that the earlier-stage marketplaces seem to have.

Now let’s look at the ratio between operational headcount and engineering headcount in these same companies:

This is also what you might expect. Although the data is noisy*, it seems that as marketplaces grow, they become less dependent on operational headcount. Presumably, their investments in product/engineering pay off in terms of automation and efficiency.

Of course there’s also survivorship bias here – these are only the marketplaces that are still around. Perhaps the ones that didn’t make it had wildly different ratios.

What would be great is to get historical data on these ratios and see how that correlates with outcomes. Unfortunately, I don’t have that data (if you do, let me know!).

My bet would be that a higher ratio of operational to engineering headcount is hard for marketplaces to wean themselves off – i.e. it’s hard to change the ratio over time. If your organization gets accustomed to scaling and solving problems by hiring ops people rather than hiring engineers to automate, that habit just gets amplified over time.

* My methodology here was simply to search LinkedIn for people with “engineer” and people with “operations” in their job titles. This is obviously error-prone for a number of reasons. For example, some engineering roles have “operations” in their job titles, and not all headcount is necessarily on LinkedIn, especially if a company outsources or off-shores some functions. However, given a sufficiently large sample, one would hope that these effects average out.
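For the concretely minded, the ratio arithmetic itself is trivial but worth pinning down. Here’s a minimal sketch of the calculation; the company names and headcounts below are made up for illustration, not the actual LinkedIn numbers:

```python
# Hypothetical headcounts (illustrative only -- not the real LinkedIn data).
companies = {
    # name: (operations headcount, engineering headcount, funding stage)
    "MarketplaceA": (40, 10, "Series A"),
    "MarketplaceB": (120, 45, "Series C"),
    "MarketplaceC": (600, 350, "Late stage"),
}

for name, (ops, eng, stage) in companies.items():
    ratio = ops / eng  # operations-to-engineering ratio
    print(f"{name} ({stage}): ops/eng = {ratio:.1f}")
```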

Brute Force Growth vs Long-term Value

Like any business, marketplaces have to continue to show top-line revenue growth in order to maintain the faith of investors and employees and be able to continue to raise money. The first, second, and third rules of business are “don’t run out of money”.

However, while it’s possible to “brute force” the growth of many marketplaces by relying on operational headcount in the short to medium term, I believe this strategy carries large dangers in the longer term.

In a perfect world, you could scale both operational and product/engineering headcount as needed but, in reality, you will be forced to choose between spending each $1 on one or the other. Here’s my quick take on the pros and cons:

In summary, the biggest danger with scaling by adding operational headcount is that it works…in the short term. It’s also cheaper. But the risk is that you win the battle and lose the war.

Agree or disagree, please leave a comment.

Why You Should (Almost) Never Rewrite Code – A Graphical Guide

I’m by no means the first person to write about the dangers of rewriting code.  The definitive work for me is “Things You Should Never Do, Part I” written by Joel Spolsky in 2000.  If you haven’t read that, you should.

A recent discussion caused me to create a series of charts to graphically illustrate the dangers of rewriting. I tend to think graphically – blame too much time in PowerPoint creating VC pitch decks.

I hope these charts help anyone considering rewriting code or being hectored by earnest, bright, young engineers and architects advocating a rewrite.  I’ve seen this movie before and I know how it ends.

The Status Quo

Firstly, here’s the status quo:

The more time and money you spend on an existing product, generally speaking, the more functionality you get.

Let’s then overlay the competitiveness of the product in its target market.  Of course, what you want to achieve is a continual improvement in the competitiveness of the product.  Competitiveness doesn’t increase as fast as functionality.  Your competitors don’t stand still so, typically, the best you can hope for is to increase competitiveness gradually over time – even staying flat is a big challenge.

Of course, this chart is idealized – functionality will increase in a lumpier way as you release major new chunks of functionality, and competitiveness will move up and down against your market. However, it demonstrates the point.

Note:  the Y axis on all these charts represents “functionality” which, for the purposes of this discussion, can be considered a blend of features and quality.  The distinction between the two is not really important in this analysis since you always want to be advancing on one or both and they both impact overall product competitiveness.
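If you’d like to sketch the idealized status-quo chart yourself, a few lines will do it. The curve shapes here are purely illustrative assumptions – functionality growing steadily with diminishing returns, competitiveness creeping up more slowly – not fits to any data:

```python
import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 10, 200)              # time/money invested in the product
functionality = 10 * np.sqrt(t)          # steady growth, gently diminishing returns
competitiveness = 3 + 0.4 * t            # improves far more slowly than functionality

plt.plot(t, functionality, label="Functionality")
plt.plot(t, competitiveness, label="Competitiveness")
plt.xlabel("Time / money invested")
plt.ylabel("Level (arbitrary units)")
plt.title("The status quo (idealized)")
plt.legend()
plt.show()
```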

Let’s Rewrite!

Cue your software architect.  He’s just been reading about this great new application framework and language called Ban.an.as – it’s so much better than the way you’ve been doing things previously. Hell, Ban.an.as has built-in back-hashed inline quadroople integration comprehensions. Plus, all the cool kids are using it.

If you’re a non-technical manager or executive, this can quickly become overwhelming and hard to argue against. They’re the experts, right, so they must know what they’re talking about.

[By the way, if you’re a non-engineer, there’s no such thing as “back-hashed inline quadroople integration comprehensions” – I made that up.  Sorry.
A little secret here: there really haven’t been any new programming paradigms invented since the 1970s.  Software folks are generally in their 20s and didn’t see them the first time around – they just get “re-discovered” and given new names. Sssh – don’t tell anyone.]

Just to be clear – I’m not saying that new languages, frameworks and technologies can’t create significant improvements in developer productivity – they can.  But, their introduction into an existing product has a big price, as we’ll see.

Lastly, lest I alienate or offend my fellow geeks, I have been that very software architect advocating for the rewrite.  I learned this the hard way.

So, this is how the rewrite is supposed to work in theory:

Let’s break this down:

The rework is expected to take some amount of time.  During this time, functionality won’t increase because developers are focusing on rebuilding the foundations.
In the graph, that’s the blue area you can see peeking through.  That blue area represents the cost of the rewrite.

But, once the rewrite is done, the idea is that progress will be massively greater than it was previously because the new technology used is inherently better than the old.  Developers will be more productive with the new.  There’s no need to work with the old spaghetti code – instead, there’s a beautiful new architecture free of all the baggage that came before.

The critical point in time is the break-even point – this is the point at which the functionality of your product starts to exceed where you would have been had you stuck with the original implementation and continued working on it.

During the rewrite period, the competitiveness of your product will typically decline since your competitors are not standing still and you can’t develop new functionality.

However, the claim typically made by the rewrite advocates is that the rework will be relatively easy, take a relatively short period of time and hence achieve break-even quickly, after which it’s non-stop to the moon…
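Before moving on, it’s worth putting rough numbers on that break-even claim. Here’s a back-of-the-envelope model – the velocities and durations are made-up inputs, and it assumes functionality accrues linearly – but it shows how sensitive the break-even point is to the rewrite’s assumptions:

```python
def break_even(old_rate: float, new_rate: float, rewrite_months: float) -> float:
    """Months until the rewritten product catches up with the never-rewritten one.

    Assumes functionality grows linearly: the old code advances at old_rate
    throughout, while the rewrite delivers nothing for rewrite_months and
    then advances at new_rate from the pre-rewrite baseline.
    """
    if new_rate <= old_rate:
        return float("inf")  # the new stack is no faster: you never catch up
    # Solve new_rate * (t - rewrite_months) = old_rate * t for t.
    return rewrite_months * new_rate / (new_rate - old_rate)

# The rosy pitch: a 6-month rewrite, then twice the velocity.
print(break_even(old_rate=1.0, new_rate=2.0, rewrite_months=6))    # 12.0 months

# A less rosy scenario: 12 months of rewriting for a 25% speed-up.
print(break_even(old_rate=1.0, new_rate=1.25, rewrite_months=12))  # 60.0 months
```

Doubling the rewrite’s duration and shrinking its promised speed-up pushes break-even from one year out to five. Keep those two scenarios in mind as you read on.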

What tends to go wrong – Part 1

So, what tends to go wrong? The most common problem is arguably the most common problem in software development generally: the rewrite takes significantly longer than expected.

There’s a discussion of why this tends to happen below but, for now, trust me that this often happens – if you’ve had any experience with software development at all, it’s highly likely that you’ve seen this happen too.

The result is that the cost of your rewrite is significantly larger than originally claimed.  (The blue area showing through on the chart is now much larger.)  This means that the break-even point is also significantly pushed back in time.

The knock-on effect is that the competitiveness of your product drops for a much longer period of time.

If you are a big company, you can probably (hopefully) absorb the pushed out break-even point – maybe other products provide revenue, maybe good channel relationships continue to deliver sales of your product even though it’s falling versus the competition or maybe you’ve simply got lots of cash reserves.  Joel Spolsky references several big company rewrites that failed but did not kill the company.

However, if you are a startup or smaller company (or even a less fortunate bigger company), this can be – some might argue, is very likely to be – fatal.

What tends to go wrong – Part 2

It gets worse.

Not only does the rewrite often take longer than expected but functionality is not even flat at the end of it – functionality is actually lost as a side-effect of the rewrite.

That’s because all of those small but important features, tweaks and bug-fixes that were in the original product don’t get reimplemented during the rewrite.  (These are the “hairs” that Joel discusses.)

Remember that the main focus of the rewrite, in the developer/architect’s mind, is generally to build a better architecture; these small features don’t seem important, and they mess up the beautiful new architecture.

Plus, however good the developers are, they will introduce new bugs that will only get exposed and fixed through usage.

The net result is that the product after the rewrite is, in the eyes of the end-user and the market, worse than the product before the rewrite, even if it’s better in the eyes of the developer.

This further pushes out the break-even point in terms of functionality, and the competitiveness of your product will take a long time to recover. If you listen carefully, you can probably hear your competitors wetting themselves laughing at your folly.

What tends to go wrong – Part 3

Oh dear.

The nail in the coffin is that not only does the rewrite take longer than expected, and functionality get lost in the process, but the benefits of the new technology/language/framework turn out not to be nearly as great as claimed. Meanwhile, had you stuck with the status quo, you would have become more productive through normal learning effects – i.e. while your team is climbing the learning curve with the new technology, you would have been getting better with the old one anyway.
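To fold that learning effect into the earlier back-of-the-envelope model, let the old team’s velocity compound while the rewrite’s stays flat. The 2%-a-month learning rate is a made-up illustration (and, in reality, the new team learns too), but the asymmetry is the point: the old code starts ahead and keeps improving.

```python
def months_to_catch_up(old_rate=1.0, new_rate=1.25, rewrite_months=12,
                       old_learning=0.02, max_months=240):
    """Simulate month by month; the 'old' team also improves (compounding)."""
    old_total = new_total = 0.0
    rate = old_rate
    for month in range(1, max_months + 1):
        old_total += rate
        rate *= 1 + old_learning          # the team you already have keeps learning
        if month > rewrite_months:
            new_total += new_rate         # the rewrite ships nothing until then
            if new_total >= old_total:
                return month
    return None  # never catches up within the horizon

print(months_to_catch_up(old_learning=0.0))  # 60 -- matches the linear model above
print(months_to_catch_up())                  # None -- never catches up in 20 years
```

With even modest compounding on the old stack, the rewrite never breaks even within a 20-year horizon.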

Any one of the three problems above is potentially fatal; all three together definitely are. The competitiveness of your product will likely never recover.

Why does this happen?

So, the above sections discuss what can go wrong.  It might help to understand why this happens.

In Joel’s seminal post cited above, he makes the point, “It’s harder to read code than to write it.”

Very true – it’s also more fun to write new code than learn someone else’s code.

Developers like to feel smart and, intellectually, learning someone else’s code can seem like a lose-lose scenario to a lot of developers – if the code is bad, it’s painful to learn and you’re going to have to fix it.  On the other hand, if it’s better than you would have written, it’s going to make you feel stupid.

Also, bear in mind that the fundamental motivation of most (not all) engineers is to Learn New Stuff ™.  They are always going to gravitate towards new things over old problems they feel they’ve already solved.  Again, I’m not faulting developers for this – it doesn’t make them bad people but, if you’re a non-developer, it’s critical you understand this motivation.

Put it this way: would you rather be the guy that architected the Golden Gate Bridge or one of the guys hanging off it on a rope, scraping off rust and repainting it?

Another problem is that when you rewrite, you typically make rapid progress early on which gives false validation to the decision to rewrite.  It makes you feel smart – what a beautiful cathedral I’m building.

That’s because you’re writing code in a vacuum.  No one is using the code yet; certainly no actual users.  But, as you start to reach launch date, all those small features, tweaks and bug-fixes – all that learning that was encapsulated in your old product – starts to become conspicuous by its absence.

The bottom-line is that, when the case to rewrite is being made, it is not comparing apples to apples – it is comparing theoretical benefits with actual benefits.  The actual benefit in question being, of course, that you have an existing product that works.

Comparing the pros and cons of writing a new application in language A versus language B is not the same as comparing an application already written in language A versus rewriting that application in language B. Why? Because any productivity gains of one language over the other are typically massively outweighed by the loss of all the domain knowledge, testing and fixes encapsulated in the existing product.

Should you EVER rewrite?

So, are there ever circumstances when you should rewrite? My answer here is an emphatic “maybe”.

Let’s consider some of the possible situations:

1.  An irretrievably sick code-base
The symptom here is that it takes exponentially longer to add each new feature – per the chart above. Another symptom is that defect reports come in as fast as, or faster than, you can fix them – you’re treading water.
However, an irretrievably sick development team is a much more likely culprit than an irretrievably sick product.
Don’t let developers convince you that they can’t maintain a codebase – what they most often mean is that they don’t want to maintain a codebase.

2.  The developers who wrote the code are not available
If you buy/find/inherit a codebase, make sure you have access to as many of the developers who wrote it as possible, for as long as possible.  If you didn’t…doh!
Again, don’t fall into the trap – developers will always prefer writing new code of their own to learning how the existing code works.

3.  A genuine change to a new problem domain
Sometimes, some of the fundamental assumptions and technology choices may no longer be valid for the problem domain that your product is addressing.  But, then I’d argue, you’re really talking about building a new product rather than rewriting an existing one.

4. Fundamentally incorrect or limiting technology choices
In some cases, the team that originally built the product may have made some poor choices in the technologies to use or the approaches to take.  They may simply not map well to the problem domain of your product.  However, in my experience, this is true much less often than developers claim it’s true.
Also, there are points where you start to hit fundamental scalability limits of certain technologies.  However, keep in mind that if you’re hitting those scalability limits, someone else probably has too before you, and solutions to the problem exist.

So, if you are even entertaining the suggestion of a rewrite, make sure you get the developers to give you a specific cost-benefit analysis of the rewrite.  Show them the charts above and get them to tell you why these problems won’t occur in this case, even though they occur in almost every case.

If and when you are convinced that this is one of those rare situations where a rewrite is the right approach, you need to make it surgical: Where will you make the incision? How deep do you cut? What are the risks? What are the potential side-effects? How can you make sure no functionality is lost? How can it be done incrementally?

Divide the rewrite into a series of smaller changes, during which functionality absolutely cannot be lost. Rewrite parts of the system at a time, with a well-defined, understood and testable interface between old and new, as in the sketch below.
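To make “parts of the system at a time” concrete, here’s a minimal sketch of one common way to do it – a routing facade, often called the strangler fig pattern. The PricingService interface, the class names and the SKU-based routing are all hypothetical; the point is that old and new implementations sit behind one well-defined, testable interface, and each migration step is small and reversible.

```python
from typing import Protocol

class PricingService(Protocol):
    """The well-defined, testable interface both implementations satisfy."""
    def quote(self, sku: str, qty: int) -> float: ...

class LegacyPricing:
    """The battle-tested old code path (stubbed here)."""
    def quote(self, sku: str, qty: int) -> float:
        return 9.99 * qty

class RewrittenPricing:
    """The new implementation; must reproduce legacy behaviour before shipping."""
    def quote(self, sku: str, qty: int) -> float:
        return 9.99 * qty

# Expand this set one slice at a time -- or shrink it to roll back.
MIGRATED_SKUS = {"SKU-42"}

def pricing_for(sku: str) -> PricingService:
    """Route each request to old or new, so functionality is never lost wholesale."""
    return RewrittenPricing() if sku in MIGRATED_SKUS else LegacyPricing()

print(pricing_for("SKU-42").quote("SKU-42", 3))  # served by the rewrite
print(pricing_for("SKU-7").quote("SKU-7", 3))    # still served by legacy code
```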

Better still, don’t rewrite.

Good luck.

Agree, disagree?  Please leave a comment.