I used to work at a prominent defense contractor in the US. I’m glad I did it as it had always been a dream of mine, but I’m also glad I’ve had the opportunity to move on. I learned a lot, met a TON of wonderful people, and will never abbreviate another variable as long as I live.
I worked on a contract regarding a stealth aircraft, which shall remain nameless, and the radar receivers that were made for it. “Why does a stealth aircraft need radar receivers?” you ask. Well, two reasons. These receivers were custom-made to receive radar pulses emitted by enemy systems and identify both their geographic location and their nationality. First of all, knowing where the enemy radars are, geographically, helps you not fly right over their heads by accident (what good is radar invisibility if your shadow flies right over the enemy air base?). Second, it gives you intel to analyze as to where the enemy is looking for you (if there’s a Russian SAM site on a mountain top that wasn’t there last month, you know something’s probably up).
With a receiver in the test lab, you’re usually looking at anywhere from 20-30 lbs of beautifully milled aircraft-grade aluminum filled with 10-20 custom-built electronics cards running $0.5-$1 million worth of software. Moving it around is pretty easy, and most of the ones I dealt with were in the lab for some sort of problem the Air Force had with them. 99.9% of these problems were electronic in nature, and so we had to be able to trick the box into thinking it was flying through the air and receiving radar pulses from enemy sites. The test equipment and software, which were probably cutting edge when purchased, were under contract to remain unchanged after the test system was approved. However, since they were approved in the early ’90s, one can only imagine the relics we had to deal with 20 years later. Ancient CPUs measured in MHz, OS licenses older than me, and software about as user friendly as a hibernating bear. Part of my job was to execute the new contract, in which we were charged with upgrading everything to modern test equipment and new software to run it. Then the fun began.
There were many layers to this enormous task. One layer that I had to deal with was running the old tests and the new tests on the same receiver, checking for any difference, finding the software root cause, and correcting the error. Things were pretty slow but steady as there were MANY small problems to be worked out, but none will stay in my memory as much as SlowPOP. Slow Rise-Time Pulse On Pulse, aka SlowPOP, was a test to make sure that the receiver could function when two overlapping radar pulses were received and their rise/fall times were very slow compared to normal. The exact details are not only boring, but classified, so suffice it to say, the results just didn’t quite look right. The input parameters had been tweaked so that the test passed… pretty much… but it still didn’t look right, and the post-tweaking parameters were not near enough to the original parameters for my comfort.
I rebooted. I re-calibrated. I re-installed. I re-ran and re-ran and re-ran.
I asked. I inquired. I requested. I quizzed. I conjectured.
We shipped other deliverables in the months that passed by.
It was funny: every time I asked about WaveGenAPI, everybody said “there’s no way that’s wrong, 80% of the other tests use it and it’s been perfect for over a year.”
Then at last, after turning every other stone, I knew that WaveGenAPI had to be checked.
When I looked into the code behind WaveGenAPI, after a few days of digging, I had my first suspect. One line didn’t seem to be right. It was adding a bunch of terms and one of the terms seemed incorrect. I brought the author of WaveGenAPI, a man I highly respected with 26 years of experience, down to the lab to sit with me and look at the results together. He stared at it in silence for nearly half an hour, only asking the odd question here and there, covering basic checks and possibilities. Then he finally just said “good catch” and we shook hands and he walked away.
The problem: One file. One line. One term. One variable. One letter.
The senior programmer was trained in the days of limited space for variable names, so his typical practice was to keep every variable name to 8 characters or fewer. In this case, the variable in question represented the “time of ten percent height on the falling edge” of the pulse. The term was supposed to be “Ttpfe”, but he had mistakenly typed “Ttpre”, which was a real term for the “rising edge” counterpart. Because “Ttpre” already existed, the typo didn’t flag an “undefined” error. Also, the resulting time error for all tests EXCEPT SlowPOP was probably less than a picosecond. Finding and fixing this error was the culmination of about 6 months of work and the most satisfying bug-find of my career so far. Now, SlowPOP will always be perfect instead of just close enough.
That is why I will never abbreviate my variable names for as long as I live.
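For readers who want to see the failure mode, here is a minimal Python sketch. It is not the real WaveGenAPI code, which I can't show. The `Pulse` fields, the three-term sum, and the numbers are all made up for illustration; only the names “Ttpre” and “Ttpfe” come from the story. It assumes the rising and falling terms are nearly equal for ordinary pulses and quite different for slow ones.

```python
from dataclasses import dataclass


@dataclass
class Pulse:
    """Hypothetical pulse description; all times in nanoseconds."""
    rising_ten_percent_time: float   # rising-edge term ("Ttpre" in the story)
    plateau_time: float              # made-up middle term for this sketch
    falling_ten_percent_time: float  # falling-edge term ("Ttpfe" in the story)


def total_time_abbreviated(pulse: Pulse) -> float:
    # Short names in the style described above.
    Ttpre = pulse.rising_ten_percent_time
    Ttpfe = pulse.falling_ten_percent_time
    Tplat = pulse.plateau_time
    # Bug: the last term should be Ttpfe. Ttpre is a real name, so nothing complains.
    return Ttpre + Tplat + Ttpre


def total_time_descriptive(pulse: Pulse) -> float:
    # With full names, "rising" in the falling slot is much easier to spot.
    return (pulse.rising_ten_percent_time
            + pulse.plateau_time
            + pulse.falling_ten_percent_time)


def error(pulse: Pulse) -> float:
    """Absolute difference between the buggy and the intended result."""
    return abs(total_time_abbreviated(pulse) - total_time_descriptive(pulse))


# Illustrative values only: edges nearly symmetric vs. slow and asymmetric.
fast_pulse = Pulse(0.0010, 100.0, 0.0011)
slow_pulse = Pulse(50.0, 100.0, 80.0)
```

In this toy setup the error for `fast_pulse` is far too small to notice, while for `slow_pulse` it is large, which is roughly the pattern that kept the typo hidden behind every test except SlowPOP.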
TL;DR: It took 6 months of work to find 1 wrong letter, a typo by an engineer with 26 more years of experience than me.
July 9, 2016 at 4:42 am
In my first professional software engineering job, I inherited a 650-page COBOL insurance company claims program that had some odd error. It took me several weeks to find the problem. A missing period. The absence of a period, which should have terminated an ‘IF’ statement, caused the program to execute one more line before it encountered that line’s terminating period.
With software, accuracy is king.
July 9, 2016 at 4:47 pm
I hate bugs like these. I always try to make sure my variable names differ by more than a character so single-character typos will always show up as a compile error.
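A rough sketch of how that habit could be checked automatically, in Python. The helper names here are made up for illustration and are not from any existing tool: it lists every pair of names that are exactly one substitution, insertion, or deletion apart.

```python
from itertools import combinations


def differ_by_one_character(a: str, b: str) -> bool:
    """True if a and b are one substitution, insertion, or deletion apart."""
    if a == b:
        return False
    if len(a) == len(b):
        # Same length: exactly one position may differ.
        return sum(x != y for x, y in zip(a, b)) == 1
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Removing exactly one character from the longer name must give the shorter.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def risky_name_pairs(names):
    """Return sorted pairs of distinct names that a single typo could confuse."""
    return [(a, b) for a, b in combinations(sorted(set(names)), 2)
            if differ_by_one_character(a, b)]
```

Calling `risky_name_pairs(["Ttpre", "Ttpfe", "Tplat"])` flags the pair from the post, while two longer descriptive names for the rising and falling edges would not be flagged.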
July 10, 2016 at 2:15 am
In fairness though, typos can happen with longer variable names too. While they may be easier to discover than variable name errors such as Tffghige and Tfdghige, the human eye/brain is not well suited to tedious work like that anyway.
July 10, 2016 at 3:30 am
I find your comment extremely ironic coming from someone with the username: “weghweh hwewehwhe”. 🙂
July 10, 2016 at 7:27 am
Very interesting and well written post!
Only one suggestion: a TL;DR at the end of the article is just too late, because people find it only after they have already read the article. Move it to the beginning.