This morning, I was one of the many thousands who awoke to bricked phones. My iPhone 13 showed “SOS” and could not make or receive calls or SMS texts. My blood pressure rose like a sealed bottle of Diet Coke, shaken, with a pack of Mentos dropped in. All kinds of conspiracy theories flew around until service was eventually restored (for me, early afternoon). In my morning call with my brother Jay (made on my work iPhone, which is on the T-Mobile network and was unaffected), he noted it was likely a quality problem, like Boeing’s door bolts.
Jay was right, and the conspiracy theories were almost certainly wrong. AT&T says the outage was caused by a botched software update, as ABC News reported.
Sure, the Chinese could have infected the AT&T network with some kind of virus, or a backdoor in some chip that crippled cell service (but only for certain phones: my family’s iPhones on AT&T were unaffected). But why would they activate it now? A warning? Some complex telegraphing of “this is what could happen if you cross us”? If you think about it, that’s the worst possible thing they could do, because it would expose the capability and give us time to fix it. To the point of being nonsense.
Of course, that didn’t stop certain politicians, like Sen. Marco Rubio, from saber-rattling. Rubio tweeted, “I don’t know the cause of the AT&T outage. But I do know it will be 100 times worse when #China launches a cyber attack on America on the eve of a #Taiwan invasion.”
Will it? The Russians are well known for their hacking, and in fact have never stopped hacking Ukrainian media and government assets. But in the first days of the war, Microsoft engineers, along with Ukrainian cyber experts, stopped Russian and Chinese hackers in their tracks. You know who else is really good at hacking? The United States of America. Though for-profit ransomware operations have stung thousands of companies around the world, the Justice Department, CISA, and our intelligence community have had pretty good success shutting them down.
The Chinese do have some impressive tech, and have made mighty strides in digital currency, social media hacking, and other domains of cyber warfare, denial, and deception. I’m especially concerned with their research into quantum cryptography, including satellite-based systems that may be impossible to break. The Biden administration recognized this growing threat, which is why the export of certain chip technology to China is now banned.
Also, in 2022, the FCC banned the use of equipment from China-based Huawei and ZTE in American 5G networks, which includes the networks AT&T is currently using. Unless AT&T and other carriers somehow got around the ban, or the Chinese have penetrated our commercial communications grid far more thoroughly than anyone knows, it’s highly unlikely this outage was a CHINA-VIRUS. It’s also highly unlikely that, if China did launch a major cyber operation before attacking Taiwan, it could take America completely by surprise and do damage “100 times worse” than today’s AT&T blunder. We would see the PLA massing for days beforehand: between satellites, U-2s, and other assets, there’s no hiding 100,000 troops and a thousand ships in the Taiwan Strait.
So, if it wasn’t China or some cyber attack (I can’t rule that out completely; cyber criminals are pretty bold, just ask Colonial Pipeline, and they sometimes hit targets without realizing the carnage it will cause), and the company says it was a botched software update, then we have to ask: why did that happen?
The simplest explanation is that someone was careless in testing. With all the effort corporate America spends on cyber security, there seems to be a bad case of pencil-whipping when it comes to quality and attention to detail. Back in the old days of AT&T and the Bell System, the company would spend six months testing a new software release for the 5ESS telephone switching system before deploying it to a single live site. Bell Labs would spend years researching new technology before any of it made its way into the landline or long-distance network.
Nowadays, companies go out and set their patching servers to “automatic” and barely look at what’s being done before hitting the “deploy” button. Whatever software update was rolled out at 3:00 a.m. EST this morning caused a whole bunch of phones to brick themselves, as far as cell service goes. To me, this is an intolerably laissez-faire attitude toward quality and testing.
The reason? Money, of course. It costs money to fully test every patch and release. It takes effort to build a test rig out of real in-service equipment, staffed with personnel who simulate actual use. It takes people (technical folks who get paid competitive salaries) to design edge cases and test plans to ensure things like today don’t happen. And AT&T, ever aware of the bottom line, no longer spends the money to do that. So, we get what we got this morning.
If the only thing today’s outage caused was some inconvenience, then, oh well, sometimes my toaster oven won’t work either, and I have to go to Walmart to buy a new one. Sometimes lightning comes in over the coaxial cable (back before I converted to fiber), twice in the same summer, zapping two cable modems, a receiver, a TV, and an Apple TV box, all of which I had to replace.
Sometimes, AT&T shows up at my house to install the new fiber we just ordered, and leaves the cable on my lawn. Then a crew comes a few days later to bury the cable, only they cut the fiber in the process. Then the crew goes to lunch despite being told they had just cut the fiber, with my children screaming that they were in the middle of a raging session of Fortnite, or Lethal Company. Then AT&T sends another tech that afternoon, who tells us he can’t find the break, so he runs a new cable, leaves it on the lawn, and schedules a new crew to bury it next week (who will probably cut the fiber again). Groundhog Day, anyone?
(AT&T really sucks these days, okay?)
Where was I? Yes, a cell service outage is a big deal, especially when everyone has cut the cord and people can’t call 911 in places like San Francisco. That’s pretty bad. When that happens, it’s no longer an inconvenience; it becomes a health and safety problem. But apparently money is more important than people being able to call 911 for a few hours, despite the company’s commitment to damage control in the media: “Keeping our customers connected remains our top priority…” Yada yada.
What happened to AT&T is the same thing that happened to Boeing and the FAA. They decided to put efficiency and politics above proper engineering, even as they became sclerotic and inflexible in their own customer-facing operations. Boeing’s former subsidiary shipped 737 MAX 9 jets without bolts in the door plug, which caused the plug to blow out during flight, according to the NTSB. The FAA’s Notice to Air Missions (NOTAM) system failed because some contractors unintentionally deleted files. That alone delayed thousands of commercial flights, which could not depart without updated NOTAMs along their routes.
Goodness me, could it be industrial disease?
One of Jay’s and my favorite bands is Dire Straits (between us, I don’t know how many times we’ve seen them, but more than twice for sure). One of our favorite Dire Straits songs is a quirky little number called “Industrial Disease.” It’s a real thing, and it’s infected companies like AT&T. That’s a whole lot more believable than some China conspiracy, don’t you think?
*** I’m sick. I barely made it home from work, even though I left an hour early. My whole family has influenza B running through us like corn through a goose, and it seems today was my turn. Got home to a fever of around 100.8°F, which is fortunately down now due to the magic of Tylenol. We’ll see whether I make it to the office tomorrow, or whether I’ll be stuck at home with my 13-year-old and my wife, who have been struggling with the crud for a few days. My 14-year-old just got over it.
*** Congratulations to NASA and Intuitive Machines for landing the first American spacecraft on the lunar surface since 1972. It was kind of anticlimactic to watch this live (well, I watched the stream about 10 minutes after it happened), because there was no “Contact Light” call or “This is Tranquility Base.” It was more like “hey guys, we know this thing is on the Moon because we have a weak signal from the high-gain antenna; we just need to lock onto it, which will take some time.” But still, it’s pretty exciting.
From the vendor side, I managed the rollout of new products into the old Bell Atlantic/Verizon wireline telco network. Nothing reached the customer network unless it successfully navigated the Verizon test lab. The lab was a standalone baby telephone network lorded over by veteran hard hats. I found that this breed of telco engineer was not duplicated on the then-fledgling Verizon internet side. Mistakes happened, but their reach to customers was limited.
That's been the story for some time now: short-term decisions for shareholder profit > long-term sustainability and quality. Boeing used to be run by engineers; now it's not, and all that matters is the bottom line.