People / companies allowed this to happen, so anyone related to this who is still choking on it while not saying or doing anything about it can go and screw themselves. I'm simply tired of people's ignorance and laziness.
Do not worry, soon this and others will be bought by Disney, who of course will then have all of our information, and we all know it is the land of dreams, so when that happens, all our problems will be over 😂
(this is really what happened) CrowdStrike: we updated a regex so there were 20 args instead of 21 *week goes by* CrowdStrike: we updated a template that required 21 args, not 20, and it crashed
@katrinabryce it was literally a null pointer dereference: the template update required that 21st arg to point somewhere, and it didn't. While the null pointer was triggered by a bad regex pattern this time, it was ultimately human error, not a regex or kernel bug.
@katrinabryce Unfortunately, due to the nature of the software, you kind of have to do it in the kernel. One day that won't be the case; today is not that day.
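To make the null-pointer explanation above concrete, here is a minimal hypothetical C sketch of what reading a 21st parameter from a table that only got 20 looks like. The names and layout are invented for illustration, not CrowdStrike's actual code:

```c
#include <stdio.h>
#include <string.h>

#define TEMPLATE_PARAM_COUNT 21   /* the new template expects 21 params */

/* Hypothetical content record filled from the channel file.
 * The bad update only supplied 20 values, leaving slot 20 NULL. */
typedef struct {
    const char *params[TEMPLATE_PARAM_COUNT];
} content_record;

static void evaluate_template(const content_record *rec) {
    for (int i = 0; i < TEMPLATE_PARAM_COUNT; i++) {
        /* No null check: strlen(NULL) dereferences a null pointer.
         * In user space that's a segfault; inside a kernel driver
         * it's a bugcheck, i.e. the blue screen everyone saw. */
        size_t len = strlen(rec->params[i]);
        printf("param %d has length %zu\n", i, len);
    }
}

int main(void) {
    content_record rec = {0};       /* every slot starts out NULL     */
    for (int i = 0; i < 20; i++)    /* the update fills only 20 of 21 */
        rec.params[i] = "regex-pattern";
    evaluate_template(&rec);        /* crashes reading slot 20        */
    return 0;
}
```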
This is an issue at a lot of tech companies. I'm a dev, and I've quit jobs over the company forcing bad practices. Most notably, KnifeCenter. I started there, realized they were torrenting pirated software on the same server where they held customer credit card information. I immediately left after being told I was "confrontational" with the CTO over it.
Can confirm this is 100% true, and the evidence is this very incident. Literally, if anyone had tested this, they would have experienced the crash. It impacted ANY Windows device, no matter the hardware. CrowdStrike's after-incident reports also admit as much: this type of update only goes through scripted testing. Edit: I watched the full video and realized you explained this haha. I was on the ground fixing this nonsense for over a week.
I was taking a flight the day it happened. It was a cluster F. I work in IT, so this was a dodged bullet on my end. But at the same time, I can now officially say I've worked on an airline's computer.
Yessir you did. I have never experienced anything like it before. I work IT for a major airline. It's insane how fast thousands of people called us at 3 a.m.; it collapsed our whole phone server. Meanwhile, all of our PCs were stuck in a boot loop. Felt hopeless for a few hours…
While you're correct that CrowdStrike should have checked the update and respected staging, there's actually an even bigger problem: the app isn't doing any sanity checking on the data files in the updates it gets. One bad read or flipped bit in an otherwise correct update would bluescreen an otherwise perfectly healthy PC, because the app just blindly executes the update.
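For illustration, here's a minimal C sketch of the kind of sanity check being described, assuming a hypothetical update format whose header carries magic bytes and a payload checksum (all names invented, not CrowdStrike's real format):

```c
#include <stdint.h>
#include <stdio.h>

#define UPDATE_MAGIC 0x43555044u  /* "CUPD", invented for this sketch */

/* Hypothetical update-file header: magic plus a checksum of the
 * payload, so a truncated file or a flipped bit is caught before
 * anything tries to act on the contents. */
typedef struct {
    uint32_t magic;
    uint32_t payload_len;
    uint32_t payload_crc;
} update_header;

/* Tiny CRC32 (reflected, polynomial 0xEDB88320) for the demo. */
static uint32_t crc32_buf(const uint8_t *buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int b = 0; b < 8; b++)
            crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int)(crc & 1));
    }
    return ~crc;
}

/* Returns 0 if the update looks sane, -1 if it must be rejected. */
static int validate_update(const update_header *hdr, const uint8_t *payload) {
    if (hdr->magic != UPDATE_MAGIC)                          return -1; /* wrong file   */
    if (hdr->payload_len == 0)                               return -1; /* empty/zeroed */
    if (crc32_buf(payload, hdr->payload_len)
            != hdr->payload_crc)                             return -1; /* corrupted    */
    return 0; /* only now hand the payload to the parser */
}

int main(void) {
    uint8_t payload[4] = {1, 2, 3, 4};
    update_header hdr = { UPDATE_MAGIC, 4, crc32_buf(payload, 4) };
    printf("clean update: %s\n", validate_update(&hdr, payload) ? "rejected" : "accepted");
    payload[2] ^= 0x10;  /* simulate one flipped bit in transit */
    printf("flipped bit:  %s\n", validate_update(&hdr, payload) ? "rejected" : "accepted");
    return 0;
}
```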
For the sake of efficiency and cost, these companies repeatedly cut redundancies. We may get stronger in the short term, but I doubt the cycle won't repeat itself as long as we have empty-headed business types on top of engineering companies.
It's shocking that some normal shops had to close because the register wasn't working. That's why we in Germany love our cash. It's a total joke that a shop full of goods has to turn away customers because the computer isn't working.
When the store computer system goes down, it won't matter if you have cash. Many stores are not able to function without their computer(s). Try telling staff to write out receipts on a piece of paper or even use a calculator to add up the cost (and calculate the correct amount of change) and watch all the blank stares. (I usually pay with a credit card, but I do carry cash with me.)
It won't help to have cash in this case. If the whole system is down, products cannot be scanned to update inventory, receipts cannot be printed, and incoming money cannot be registered as received. (Cashiers can easily pocket the cash in these cases, and since inventory is not updated, no sale is registered that would indicate a sale was made at all 🤷♀️)
@daphne8406 And that's the problem: a normal cash register doesn't need the internet, and once the system is back, you can update it from the register's printout. That's typical for our time; paper and pen aren't good enough anymore, it seems. I wonder how supermarkets did it in the '90s.
As a QA engineer, I cringed at whatever lax testing they have in place. Automated tests are great, but there's still something to be said for taking an actual device and doing a manual test to make sure things still work right. Obviously I don't know their processes but clearly there was a dangerous gap.
FWIW, this was not a code update but a data-file update: data the CrowdStrike software uses to analyze computer activity. The data file was full of zeroes, which made the CrowdStrike software malfunction and set off a cascading failure.
@DicksonJuma-iv2sc Why the hell would the CrowdStrike driver not validate the data file when it reads it? Bad data in such a file is inevitable sooner or later; software should validate while parsing.
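A hypothetical sketch of what that parse-time validation could look like in C, assuming a simple invented record layout (not CrowdStrike's real channel-file format). Note that an all-zero file is rejected on the very first check:

```c
#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Invented channel-file layout: a count followed by fixed-size records. */
typedef struct {
    uint32_t record_count;
    uint32_t record_size;
} channel_header;

/* Reject structurally impossible files instead of trusting them. */
static int parse_channel_file(const uint8_t *data, size_t len) {
    if (len < sizeof(channel_header)) return -1;          /* truncated     */
    const channel_header *hdr = (const channel_header *)data;
    if (hdr->record_count == 0 || hdr->record_size == 0)
        return -1;                                        /* all-zero file */
    /* Overflow-safe bounds check: do all records fit in the buffer? */
    size_t body = len - sizeof(channel_header);
    if ((size_t)hdr->record_count > body / hdr->record_size)
        return -1;                                        /* header lies   */
    /* ...only now walk the records, still range-checking each field... */
    return 0;
}

int main(void) {
    uint8_t zeroes[64] = {0};  /* the infamous "file full of zeroes" */
    printf("all-zero file: %s\n",
           parse_channel_file(zeroes, sizeof zeroes) ? "rejected" : "accepted");
    return 0;
}
```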
This is not how staging works. Staging is the last test environment in the pipeline before anything is released to production. A company can have a staging environment where it tests new software before installing it on its production systems.
Also, to highlight how bad they are: 3+ months before this, their Linux version had a similar issue with a pushed update, and then three WEEKS before, there was another Linux update issue. That should have triggered a massive rethink of QA processes for all their products.
I work at a small software company, and there is a lot of stress on adequate testing before releasing and pushing new versions of our programs to our clients. Seeing how reckless one of the biggest security companies turned out to be in that regard makes me feel a lot better about the job that I do.
I get the impression CrowdStrike doesn't test its updates thoroughly enough before rolling them out. I can understand them wanting to protect computers from the latest malware as quickly as possible, but a faulty update can cause even more damage than the malware.
They don't have live QA, just automated tests, and that worked for many, many updates before... it's just that this time they didn't think to add the automated test needed to catch the crash.
@mrminecraft4772 No, that's not answering the question. HOW could other companies buy their service without any proper procedure? HOW could they trust CrowdStrike enough to buy their service? Is it just blind buying?
It's horrifying that the entire world, including emergency services, is reliant on updates from companies they can't control, both CrowdStrike and Windows. I still can't believe essential services rely on things that could be compromised because one lazy employee (or hell, an adversary) pushed a bad update.
It's how any electrical system works. The power company goes down, and millions of people lose access. Computers are always at risk of failure; so are phones, so are radios... literally anything can fail at scale.
The amount of money lost in less than a day must have been absolutely astronomical!!! 😱 In the airline and travel industry alone, so many people needed compensation or accommodation while waiting for their cancelled flights, and there were so many travellers because of the summer holidays (at least in Europe).
This happened during my vacation. We had just gotten to our destination, and spent our first night there. My husband got a call from work since he's the IT Manager of a utility company. They were hoping he was nearby, but unfortunately we were in Vegas, and we live near the Mississippi River. They thought about flying him back, but that was a bust. I don't know what they did, but they eventually got back online with minimal outages.
CrowdStrike IT 1: Sir, we did an oopsie. A big one.
CrowdStrike IT Manager: Then fix it.
CrowdStrike IT 1: We can't, the systems are blue-screened of death.
It didn't take a team of professionals to realize something was very wrong, but it did take a team of professionals to screw things up this badly. This is what happens when you don't test your software. If they had, they would have caught the problem.
I work at a small credit union. Had no idea what was going on until a member told me lol. Got an email from the IT department saying we were all good. Don't freak out, lol.
This is the first video I've saved for the future. It just confirmed everything I'm doing. My scale is a million, if not a billion, times smaller: I'm trying to make streams for local hometown games. And I ALWAYS double-check EVERYTHING: computer, cables, camera, internet, the stream itself. 98% of the time the tests come back 10/10, but I'm focused on those 2%, trying to get them below 0.5%. And so far my method is the most stable in amateur games in my country.
The airlines actually put the ground stops in place themselves; it wasn't an FAA mandate. And the timeline was off: airlines caught the problem much earlier. As early as midnight, some computers were already blue-screening. By 3 o'clock Eastern, three major airlines plus Allegiant had already issued their first wave of ground stops. Of course, regular customers didn't find out until their boarding time.
Whoa, this CrowdStrike snafu is a doozy! It's insane how a single borked update can brick so many systems across the globe. Makes you wonder how robust our digital infrastructure really is. Kudos to this video for breaking it down in layman's terms. This is a cautionary tale for all the sysadmins and DevOps engineers out there!
I fail to understand why it is that the Crowdstrike CEO has not been hauled into government hearing after government hearing all across the world since this outage happened. This is not something that should be pushed aside and forgotten about. There needs to be legislation to prevent this from happening again.
Crowd*STROKE*. Thanks for getting me 7+ hours of paid time to sit around and watch YouTube, since my employer was down and everyone company-wide was getting the ol' blue screen lol. I was the first at my organization to get the blue screen, so I thought my PC had crapped out. I had Windows reinstalled an hour later, and then my boss was telling me not to bother logging in, that nothing worked lol.
Just a small correction: you described staging as a little-by-little release. Staging is a different term, which you later used correctly. The little-by-little rollout is called a canary release.
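For anyone curious, a tiny hypothetical C sketch of how a percentage-based canary gate can work: hash a stable machine ID into a bucket and only give the new version to machines below the current rollout percentage (all names invented for illustration):

```c
#include <stdint.h>
#include <stdio.h>

/* FNV-1a: a simple, stable hash so the same machine always lands
 * in the same bucket across runs. */
static uint32_t fnv1a(const char *s) {
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

/* A machine is in the canary cohort if its bucket (0-99) is below
 * the rollout percentage. Bump the percentage in stages, e.g.
 * 1% -> 10% -> 50% -> 100%, watching crash telemetry in between. */
static int in_canary(const char *machine_id, unsigned rollout_pct) {
    return fnv1a(machine_id) % 100 < rollout_pct;
}

int main(void) {
    const char *fleet[] = { "host-0001", "host-0002", "host-0003", "host-0004" };
    unsigned rollout_pct = 10;  /* first stage: 10% of machines */
    for (size_t i = 0; i < sizeof fleet / sizeof fleet[0]; i++)
        printf("%s -> %s\n", fleet[i],
               in_canary(fleet[i], rollout_pct) ? "new update" : "old version");
    return 0;
}
```

Had the bad channel file gone to a 1% cohort first, the boot loops would have shown up in telemetry long before the rest of the fleet ever saw it.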
Y2K did happen… it just took a while longer to happen. (I actually watched this happen at my job. 70% of our computers just completely crapped themselves for a few days.)
If anything, I find it worrisome how many things are connected to the net. Ehhh, I don't know, maybe treat your IT department as something other than an afterthought that only costs money.
I went from thinking it would be an easy day before my holiday collecting research data at a hospital to a wild one acting as a runner between clinical receptionists, nurses, doctors, and everyone in-between in order to keep a clinic running. No access to clinical software meant I couldn't do any data collection anyways, so I was a messenger. I then had to catch a flight that evening across the country but by that point, the software glitch had been resolved and I was only 3 hours late.
It would have been nice for you to explain that Microsoft isn't allowed to make user-level APIs (which would avoid the requirement for kernel-level code) because the EU doesn't let them, while Apple is allowed to. We run this same type of software from Palo Alto. The concept of "staging" for these types of intrusion-protection measures isn't real. The validation should happen at the vendor (CrowdStrike), and there should be some validation of the definitions brought into the driver through an update. The content of the update that was pushed was all 0's.
The only issue I had was that my Ring camera became unusable. They blamed CrowdStrike, but CrowdStrike doesn't affect app functionality, so I'm thinking it was another Ring hack on the same day.
Basically everything that wasn't running Linux. Luckily that was like 1% of the global market, mostly in the US. I didn't notice a single thing down.