Monday, December 23, 2024

Server? What server? Oh … that server


Who, Me? Hold your nose, gentle reader, as we dive headlong into the bucket of ice water that is the start of the working week. But fear not, for The Reg is here to warm your innards with a dose of Who, Me? – our weekly tale of technical shenanigans gone wrong.

This week our hero is a reader we’ll Regomize as “Emily” who worked at a stock trading firm as an application engineer – but, as Emily points out, when she started there the company was small enough that the tech team needed to know how to do just about everything. So the job title was more or less symbolic.

The firm at which Emily worked acquired a global unlimited license for its trading software, meaning it could run the code on as many servers as it wished.

The catch was that each license had to be tied to a hostname, and the hostnames had to be registered with the vendor. And while the license was unlimited, it wasn’t perpetual, so occasional renewals were necessary.

That got messy when renewals were requested for the wrong hostname, sending emails bouncing back and forth between the firm and the vendor in an attempt to activate the right license on the right host.

The situation became so tiresome that someone had the bright idea to apply one name to all the servers. And the name that was chosen was “server”.

That may sound daft, but it’s worth stressing at this point that this was a stock trading firm, so time was very much of the essence. A day when a server was not operating because of an expired license cost money. While calling every server “server” seemed fraught with danger, there was a certain logic at work.

The firm’s tech team also felt that it could differentiate each box thanks to info in its internal DNS table. And the vendor didn’t care what its clients called their servers.

Emily joined shortly after the shift to all the hostnames being called “server”.

Part of her task was to fine-tune the trading software. You know the old saying “there’s time, money, and quality, pick any two”? Well, that’s kind of how this software worked. On the one hand, time was of the aforementioned essence, so the software could be tuned to be very, very responsive. On the other, that responsiveness came at a cost in stability.

And we’ve already mentioned what happens when there’s downtime.

So Emily was tasked with tweaking instances of the trading software, seeking that sweet spot between speed and stability. She told Who, Me? that she would typically have as many as 50 instances open at once in the test environment, each one easily identifiable to the trained observer, but all called “server”.

And so it was during one of these tweaking sessions that she made an adjustment, then restarted the server to test the change.

The silence that followed was chilling. Then came a great deal of yelling, which confirmed Emily’s worst fear: the server she had just restarted was not in the test environment.

The CEO even came running in to find out what on Earth had just stopped the trading. And here is where the story is perhaps not quite a case of “Who, Me?” – because the ring of angry traders surrounding the red-faced application engineer on the IT desk made it very clear whose fault it all was.

What’s worse, there was absolutely nothing Emily could do to speed up the restart. While she waited for the “ping” to indicate success (the longest few minutes she has ever experienced), it looked to all the world (and the CEO) like she was doing absolutely nothing about the crisis she had triggered.

A lesson was learned from it all, thankfully. Emily’s next task was to ensure that each server’s prompt and background identified it at a glance, including whether it was in production or test.

We’re always on the hunt for tales like Emily’s – whether you got away with your goof or not, we want to know. Click here to send an email to Who, Me? and we may immortalize your shenanigans one Monday morn. ®
