This single line of code cost AT&T $60,000,000!

Kiki's Bytes

23K subscribers

15K views

Published: 7 Oct 2024

Comments: 42

@grkuntzmd 7 months ago

I was working at AT&T Bell Labs in New Jersey when this happened. The person in the office across the hall from me, Dave, a distinguished member of the technical staff (DMTS) and a C expert (editor of the ANSI C standard), found the problem. The original author of the code had misunderstood how the break statement works in C.

@kikisbytes 7 months ago

OMG for real? That's insane!!! Was everyone in the office panicking?

@grkuntzmd 7 months ago

@kikisbytes Dave and I actually worked on the C compiler team, so it was not our direct responsibility, but Dave was asked to help because of his expertise and the gravity of the situation. He looked over the source code and pretty quickly found the problem.
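
For readers who want to see the break behavior described in this thread: in C, a break inside an if that sits inside a switch case leaves the whole switch, not just the if. The sketch below is not the AT&T code; process_message, msg_kind, and the result fields are hypothetical names made up for illustration.

```c
/* Illustrative only: all names here are hypothetical, not from the AT&T source. */
enum msg_kind { MSG_INCOMING = 1, MSG_OTHER = 2 };

struct result {
    int handled_early;   /* set inside the if-branch */
    int finished_case;   /* set at the end of the case body */
};

struct result process_message(enum msg_kind kind, int buffer_empty)
{
    struct result r = {0, 0};

    switch (kind) {
    case MSG_INCOMING:
        if (buffer_empty) {
            r.handled_early = 1;
            break;              /* exits the enclosing switch, not just the if */
        }
        r.finished_case = 1;    /* skipped whenever the break above runs */
        break;
    default:
        break;
    }
    return r;
}
```

Calling process_message(MSG_INCOMING, 1) returns with finished_case still 0, because the inner break jumped past the rest of the case. Someone who expected break to leave only the if block would expect that field to be set.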

@voidkid420 7 months ago

I once told a DB dev to set imgs to be 200x200 so he could debug some "empty" sections (they all sucked at css) ... came in next morning to find everyone freaking out about imgs, guess what happened? This v well known company also used some random url rewrite script it bought for $20, which meant if u asked for a jpg that wasn't there it would drop the entire site. One of the largest e-retailers in the UK.

@kikisbytes 7 months ago

Thank you for sharing! So this place bought a script for $20 from somewhere rather than just writing it in house? 🤨

@funkdefied1 7 months ago

Sounds like Wordpress development. Was it?

@voidkid420 7 months ago

@kikisbytes The in-house team of .NET developers didn't even consider the URL rewrite until halfway through developing the new site, when Wordpress folk got it as standard :D

@voidkid420 7 months ago

@kikisbytes Thinking about it, they screwed me over so I owe NEXT plc no loyalty on this one lol. Terrible place where the whole culture is "I didn't break it" and "it passed QA" ... There's zero "let's make something good"

@voidkid420 7 months ago

@funkdefied1 Wordpress was a lot more forward thinking :) .NET for "rapid development" because everything should be a fkin hehe

@mikemilner8080 7 months ago

This description is a bit of an oversimplification. The 4ESS proper was coded in EPLX, while the CCS7 messaging software running on the 3B20D was written in C. CCS7 was a message-passing overlay that allowed Dynamic Non-Hierarchical Routing. All prior routing was based on each switch in the hierarchy kicking a call to the next higher level if the given switch didn't know how to reach the desired destination. The unhandled race condition was in the CCS7 (Common Channel Signaling), not in the in-band trunk signaling that the 4ESS could and did use as well. I was working at Indian Hill, the Illinois location that handled switching system development, when this happened. Lucky for me I was with 5ESS, but the rumor was that the guy who wrote the bad code worked in an office a few hallways down. Would not want to have been him. For those who want to dive into the nitty-gritty of circuit-switched voice telephony (now largely replaced by packet-based switching), there are multiple issues of the Bell System Technical Journal that devote themselves to the hardware and software design of 4ESS and 5ESS.

@WackoMcGoose 7 months ago

My first reaction was "wow that was a fast turnaround on the outage, not even Cloudflare publishes COEs that fast"... then you said 1990. Topical either way!

@kingofcastlechaos 7 months ago

Impeccable timing on his part! I thought the same thing!

@jamesxiao4996 7 months ago

Haha learned about cascading rollbacks last semester in databases. Cool to see how it could potentially happen in the real world

@kikisbytes 7 months ago

ayyyyy nice!! Hope you did well in that class 😎

@tyo007 7 months ago

mark my word, you are going to be big. Very helpful video, nice pace, interesting and clear illustration.

@isiraadithya 2 months ago

I wonder how much crowdstrike lost due to recent events, it would be amazing to see a video related to that event as well :) Awesome content btw

@swagatochatterjee7104 7 months ago

Why the fuck do you even need a break there if you don't want the switch case to exit?

@kikisbytes 7 months ago

definitely one of those oopsies

@devSero 7 months ago

It's sometimes completely insane how the world is in the hands of our bugs especially when software is behind major companies.

@kikisbytes 7 months ago

agreed and imagine the ones that haven't been found yet

@andrewgonzalez2025 7 months ago

Really appreciate your videos. I feel like your channels should blow up soon. I was wondering what you use to create all the animation for the videos?

@kikisbytes 7 months ago

thank you for watching! I use motion canvas for my animation :)

@lokylee7872 7 months ago

For the algorithm!

@kikisbytes 7 months ago

hahah you're the best!! 😍😆

@beimichen8035 7 months ago

I took down the staging environment of my company’s data pipeline because the caching layer I wrote had a custom compare sort that didn’t fulfil antisymmetry properly. No one noticed for about two days lol.

@kikisbytes 7 months ago

Thank you for sharing! Did you have a dev env before merging to staging? These envs are to catch issues and good thing it was caught :)

@beimichen8035 7 months ago

@kikisbytes Yes, we did have a dev env, all unit tests and integration tests passed, and it worked fine there. The issue was insidious as it didn't show up until the second day. It kept chugging along fine for a day until it stopped caching on the second day.
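
On the antisymmetry point raised in this thread: a sort comparator is expected to flip sign when its arguments are swapped, and to return zero for equal keys. The commenter's actual comparator and language are not given, so the following is only a minimal C sketch in the style of a qsort callback. cmp_broken, cmp_ok, and is_antisymmetric are hypothetical names made up for illustration.

```c
#include <stddef.h>

/* Broken: for equal keys this returns -1 in both directions, so it claims
   a < b and b < a at the same time, which violates antisymmetry. */
int cmp_broken(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x <= y) ? -1 : 1;
}

/* Consistent: negative, zero, or positive, and swapping the arguments
   always flips the sign. */
int cmp_ok(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Returns 1 if sign(cmp(a, b)) == -sign(cmp(b, a)) for every pair in v,
   including each element paired with itself; returns 0 otherwise. */
int is_antisymmetric(int (*cmp)(const void *, const void *),
                     const int *v, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            int ab = cmp(&v[i], &v[j]);
            int ba = cmp(&v[j], &v[i]);
            int sab = (ab > 0) - (ab < 0);
            int sba = (ba > 0) - (ba < 0);
            if (sab != -sba)
                return 0;
        }
    }
    return 1;
}
```

A brute-force check like is_antisymmetric can go in a unit test over a small sample that includes duplicate keys. A comparator bug of this kind may stay hidden until equal keys actually show up in the data, which would fit a failure that only appears on the second day.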

@fwfy_ 7 months ago

maybe doesn't count because it was a hobby project of mine, but i host a self-made Telegram bot that a decent number of people use. it's a modular system with a whole host of features, ranging from animal pictures, to quote storage, a fake economy, and even some homebrewed generative text ai. i was making a small change to the code about a year ago now (can't quite remember what) and when i went to deploy the changes to the remote server, i ran an outdated deploy script that decided to overwrite the user data folder and then immediately restart the bot. RIP all of those saved settings, statistics, fake tokens, etc. that was a painful announcement to make lol. had the bot not been restarted, the data would've still been in memory and i'm sure that i could've restored it somehow. and of course, no backups :)

@kikisbytes 7 months ago

This definitely counts and thank you for sharing! And ohh man, that must have been one tough night. Were your users okay re-signing up for your service afterwards?

@fwfy_ 7 months ago

@kikisbytes it was more of a helper bot on a chat platform, so nobody had to re-signup, but they definitely were more than a little peeved at having to reset their settings to what they were before the big wipe, and i was sad at the loss of all that PLUS the super big dataset for the aforementioned generative text ai

@darkwoodmovies 7 months ago

AT&T is the most un-ambitious company that has ever existed.

@breathlessblizzard 7 months ago

Timely subject matter lol

@kikisbytes 7 months ago

hahahaha it's only been 34 years late

@mikechurvis2762 7 months ago

@kikisbytes AT&T had a nationwide outage last week. Took down half the phones in the US for 9 hours. My company got an alert level 1 from them, the first any of us have seen. Clicked on this vid because I thought it was about that.

@hunorportik5618 6 months ago

Have I ever brought down prod? Several times... Am I proud? No way, but I actually managed to fix with 99% accuracy at least, plus had lots of luck during that. I learned some key things for sure. Like: never ever run a one-liner cmd thinking it's just deleting a single entity in db which I'm intending to do, especially when u don't know the behaviour of the app DB layer 🙃

@kikisbytes 6 months ago

that's pretty good thank you for sharing!