The Calm Before the Storm
Last Friday, everything was perfect. I was working through minor tasks with a latte in hand, happily thinking about my weekend plans.
Then my phone rang. Oh gosh, it was a colleague from the sales team.
Not Verified? How? Why?
He said he was in the field investigating an issue, and the site was using a product that I'm in charge of.
The issue: even though he had correctly connected a LAN cable from a Wi-Fi router to the product (from now on I'll call it the server; it has dual LAN ports), it couldn't access the internet.
I was pretty sure the server was one I'd been working on. I was wrong.
I had never properly tested this server before. (For context, the hardware of our server products varies, while the software and OS are the same. So when an issue comes up, I usually suspect the hardware-related parts first.)
Investigation Must Continue
First, I quickly ran through the basics. Since it was a field issue, I had no choice but to rely on his descriptions over the phone:
- Was the LAN port link up? - Yes.
- Was the Wi-Fi router working well? - Yes. He tested it over Wi-Fi.
- Did he try the other LAN port? - Yes. Both LAN ports worked well.
- What was the result of the ping test to 8.8.8.8? - Destination host unreachable.
- What was the result of the ping test to the gateway? - Destination host unreachable.
Gotcha. The problem was narrowing down.
Since even the gateway was unreachable, it clearly wasn’t just a DNS issue.
That ping result often points to the default gateway, or the IP settings around it, not being set correctly.
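The phone checklist above is really a bottom-up walk through the network layers: cable first, then the gateway, then the internet, then DNS. Here is a minimal Python sketch of that reasoning. It is not something I ran in the field, and the names FieldReport and first_failing_layer are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FieldReport:
    """Answers collected over the phone (hypothetical field names)."""
    link_up: bool                   # is the LAN port link up?
    gateway_ping_ok: bool           # did ping to the gateway succeed?
    internet_ping_ok: bool          # did ping to 8.8.8.8 succeed?
    dns_ok: Optional[bool] = None   # None means "not tested yet"


def first_failing_layer(report: FieldReport) -> str:
    """Return the lowest layer that fails, checking from the cable upward."""
    if not report.link_up:
        return "physical link"
    if not report.gateway_ping_ok:
        return "local network / gateway"
    if not report.internet_ping_ok:
        return "routing beyond the gateway"
    if report.dns_ok is False:
        return "DNS"
    return "no failure found"


# The answers from this call: link up, but neither ping got through.
report = FieldReport(link_up=True, gateway_ping_ok=False, internet_ping_ok=False)
print(first_failing_layer(report))
```

With the answers from this call, the sketch stops at the gateway step, which is why DNS was off the suspect list so early.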
Digging into the Rabbit Hole
So I had my colleague configure the network settings through Ubuntu's Network Manager (the GUI, of course). Unfortunately, even after everything was set correctly, a ping to 8.8.8.8 kept printing Destination host unreachable. The issue turned out to be more complicated than expected. I had no choice but to suspect everything - software, hardware, environment, and so on. At this point, I was doubting everything like a child on April Fools' Day.
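If a terminal had been an option, one way to double-check what the GUI shows would be to look at the default routes directly. The sketch below is an assumption on my part, not what we did on the call: it parses text in the format that `ip route` prints on Linux and pulls out the default gateway entries. The function name parse_default_routes and the sample addresses and interface names are invented for the example.

```python
from typing import Dict, List


def parse_default_routes(ip_route_output: str) -> List[Dict[str, str]]:
    """Extract gateway, device, and metric from each 'default' route line."""
    routes = []
    for line in ip_route_output.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] != "default":
            continue  # skip subnet routes and blank lines
        route = {}
        for key in ("via", "dev", "metric"):
            # Only accept the keyword if a value follows it
            if key in tokens[:-1]:
                route[key] = tokens[tokens.index(key) + 1]
        routes.append(route)
    return routes


# Made-up sample output for a dual-LAN machine (not captured from the real server)
sample = """\
default via 192.168.0.1 dev enp2s0 proto dhcp metric 100
default via 10.0.0.1 dev enp3s0 proto static metric 200
192.168.0.0/24 dev enp2s0 proto kernel scope link src 192.168.0.15 metric 100
"""

for route in parse_default_routes(sample):
    print(route)
```

An empty result would mean no default gateway is configured at all. On a dual-LAN machine, more than one entry is worth a second look, since the route with the lowest metric is the one normally preferred.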
The Twin Test: Evidence of Innocence
So I quickly sent an email to the sales team to ask whether they had the same server model in storage as the one in the field. Fortunately, they did, and I got one. I set it up right away and tested it.
Guess what? There was no problem with the server at all. The hardware was working fine. The software in it was doing its best. Both LAN ports worked, of course.
In this situation, the conclusion was pretty clear - the issue was related to the environment where the server was installed. In my experience, environment-related issues are really hard, or even impossible, to resolve.
The Hardest Part of Debugging
So my last task was to convince my sales colleague of this. He didn't want to come back to the same site, so he wanted an immediate solution from me. I spent almost 20 minutes helping him fully understand. Unfortunately for him, despite my persistent explanations, he was scheduled to visit the site again the following week.
Yes. Sometimes the hardest and biggest task for developers is proving that an issue has nothing to do with them.