Free Republic

How did a CrowdStrike config file crash millions of Windows computers? We take a closer look at the code
The Register ^ | 23 July 2024 | Thomas Claburn

Posted on 07/24/2024 10:48:47 AM PDT by ShadowAce

Analysis Last week, at 0409 UTC on July 19, 2024, antivirus maker CrowdStrike released an update to its widely used Falcon platform that caused Microsoft Windows machines around the world to crash.

The impact was extensive. Supply chain firm Interos estimates 674,620 direct enterprise customer relationships of CrowdStrike and Microsoft were affected. Microsoft said 8.5 million Windows machines failed. The results, beyond a massive amount of IT remediation time, included global flight and shipping delays due to widespread Windows system failures.

The cause, to the extent so far revealed by CrowdStrike, was "a logic error resulting in a system crash and blue screen (BSOD) on impacted systems."

That crash stemmed from quite possibly mangled data that somehow found its way into a Falcon configuration file called a Channel File, which controls the way CrowdStrike's security software works.

Channel Files are updated over time by CrowdStrike and pushed to systems running its software. In turn, Falcon on those machines uses information in the files to detect and respond to threats. This is part of Falcon's behavioral-based mechanisms that identify, highlight, and thwart malware and other unwanted activities on computers.

In this case, a configuration file was pushed to millions of Windows computers running Falcon that confused the security software to the point where it crashed the whole system. On rebooting an affected box, it would almost immediately start up Falcon and crash all over again.

According to CrowdStrike, Channel Files on Windows machines are stored in the following directory:

C:\Windows\System32\drivers\CrowdStrike\

The files use a naming convention that starts with "C-" followed by a unique identifying number. The errant file's name in this case started with "C-00000291-", followed by various other numbers, and ended with the .sys extension. But these are not kernel drivers, according to CrowdStrike; indeed, they are data files used by Falcon, which does run at the driver level.
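
For the curious, here is a minimal, hypothetical sketch in C, using the standard Win32 file-search API, that lists whatever channel files are present on a machine. The directory and the "C-" naming convention come from CrowdStrike's description above; everything else is illustrative and not CrowdStrike's code:

/* Hypothetical sketch: list CrowdStrike channel files on a Windows host.
 * The path and "C-<number>...sys" naming convention come from CrowdStrike's
 * own description; this is illustrative, not CrowdStrike's tooling. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    WIN32_FIND_DATAA fd;
    HANDLE h = FindFirstFileA(
        "C:\\Windows\\System32\\drivers\\CrowdStrike\\C-*.sys", &fd);

    if (h == INVALID_HANDLE_VALUE) {
        printf("No channel files found (Falcon may not be installed).\n");
        return 0;
    }
    do {
        printf("%s  (%lu bytes)\n", fd.cFileName, (unsigned long)fd.nFileSizeLow);
    } while (FindNextFileA(h, &fd));
    FindClose(h);
    return 0;
}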

That is to say, the broken configuration file was not a driver executable, but it was processed by CrowdStrike's highly trusted code, which is allowed to run within the operating system context. When the bad file caused that code to go off the rails, it brought down the whole surrounding operating system – Microsoft Windows in this saga.

"Channel File 291 controls how Falcon evaluates named pipe execution on Windows systems. Named pipes are used for normal, interprocess or intersystem communication in Windows," CrowdStrike explained in a technical summary published over the weekend.

"The update that occurred at 04:09 UTC was designed to target newly observed, malicious named pipes being used by common C2 frameworks in cyberattacks. The configuration update triggered a logic error that resulted in an operating system crash."

Translation: CrowdStrike spotted malware abusing a Windows feature called named pipes to communicate with that malicious software's command-and-control (C2) servers, which typically instruct the malware to perform all sorts of bad things. CrowdStrike pushed out a configuration file update to detect and block that misuse of pipes, but the config file broke Falcon.
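
For readers unfamiliar with the feature being abused: a named pipe is a kernel-managed endpoint under \\.\pipe\ that any two processes can use to exchange data. A minimal, hypothetical server sketch follows; the pipe name is made up, and this has nothing to do with CrowdStrike's code – it only shows the mechanism that malware C2 traffic can ride and that Channel File 291 was meant to help Falcon evaluate:

/* Hypothetical Windows named pipe server; the pipe name is invented. */
#include <windows.h>
#include <stdio.h>

int main(void)
{
    HANDLE pipe = CreateNamedPipeA(
        "\\\\.\\pipe\\example_pipe",          /* hypothetical name        */
        PIPE_ACCESS_DUPLEX,
        PIPE_TYPE_MESSAGE | PIPE_READMODE_MESSAGE | PIPE_WAIT,
        1, 4096, 4096, 0, NULL);
    if (pipe == INVALID_HANDLE_VALUE)
        return 1;

    if (ConnectNamedPipe(pipe, NULL)) {       /* wait for a client        */
        char buf[256];
        DWORD got = 0;
        if (ReadFile(pipe, buf, sizeof(buf) - 1, &got, NULL)) {
            buf[got] = '\0';
            printf("received: %s\n", buf);    /* echo whatever arrived    */
        }
    }
    CloseHandle(pipe);
    return 0;
}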

While there has been speculation that the error was the result of null bytes in the Channel File, CrowdStrike insists that's not the case.

"This is not related to null bytes contained within Channel File 291 or any other Channel File," the cybersecurity outfit said, promising further root cause analysis to determine how the logic flaw occurred.

Specific details about the root cause of the error have yet to be formally disclosed – CrowdStrike CEO George Kurtz has just been asked to testify before Congress over this matter – though security experts such as Google Project Zero guru Tavis Ormandy and Objective-See founder Patrick Wardle have argued convincingly that the offending Channel File in some way caused Falcon to access information in memory that simply wasn't present, triggering a crash.

It appears Falcon reads entries from a table in memory in a loop and uses those entries as pointers into memory for further work. When, as a result of the config file, at least one of those entries was incorrect or missing and instead contained a garbage value, the kernel-level code used that garbage as though it were valid, causing it to access unmapped memory.
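
A heavily simplified illustration of that failure mode – not CrowdStrike's actual code – is a loop that trusts pointer values taken from a data-driven table and falls over the moment one entry holds garbage:

/* Simplified illustration (not CrowdStrike code): code that walks a table
 * of pointers supplied by configuration data. If one slot was never filled
 * in and holds garbage, dereferencing it faults. */
#include <stdio.h>

typedef struct {
    const char *name;
} rule_t;

int main(void)
{
    rule_t pipe_rule = { "named-pipe rule" };

    /* Imagine this table is built from a channel file. Slot 2 is never
     * filled in, so it contains whatever garbage happened to be there:
     * a "wild pointer". */
    rule_t *table[3];
    table[0] = &pipe_rule;
    table[1] = &pipe_rule;
    /* table[2] deliberately left uninitialized. */

    for (int i = 0; i < 3; i++) {
        /* Dereferencing the bad entry is undefined behavior and will
         * typically crash when it reaches the uninitialized slot. */
        printf("rule %d: %s\n", i, table[i]->name);
    }
    return 0;
}

In user space a fault like this just kills the process; in kernel mode there is no process to kill, so Windows bugchecks and shows the blue screen.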

That bad access was caught by the processor and operating system, and sparked a BSOD because at that point the OS knows something unexpected has happened at a very low level. It's arguably better to crash in this situation than attempt to continue and scribble over data and cause more damage.

Wardle told The Register the crash dump and disassembly make it clear that the crash arose from trying to use uninitialized data as a pointer – a wild pointer – but further specifics remain unknown.

"We still don’t have the exact reason, though, why the channel file triggered that," he said.

The Register spoke with cybersecurity veteran Omkhar Arasaratnam, general manager of OpenSSF, about how things fell apart.

Arasaratnam said the exact cause remains a matter of speculation because he doesn't have access to the CrowdStrike source code or the Windows kernel.

CrowdStrike's Falcon software, he said, has two components: a digitally signed, Microsoft-approved driver called CSAgent.sys, and the Channel Files used to update the software with the latest security information.

"What CrowdStrike did is they essentially have a driver that's signed that then loads a bunch of these channel configurations," said Arasaratnam. "We don't know what the channel configuration file actually entails. It's a combination of what's in the file, as well as how CSAgent.sys interprets that."

Based on one stack trace, he explained, CSAgent.sys gets terminated for performing what's known as a bad pointer dereference. It tried to access memory from the address 0x000000000000009c, which didn't exist.

"It was an area of memory that it shouldn't have had access to," said Arasaratnam.

"Now, the Catch-22 you get into when you have a very low-level program like this, is the kernel overall is supposed to be responsible for the operating system doing many low-level things, including allocating memory," Arasaratnam said.

"So if the kernel is trying, in essence, is trying to access memory that it shouldn't access, the appropriate thing to do, just from an operating system theory perspective, is to assume that none of the memory that's been allocated is safe, because if the kernel doesn't know, who the heck does, and basically halt the system."

The situation was complicated by the way the Windows driver architecture works, Arasaratnam explained.

"The way that it works is that drivers can set a flag called boot-start," he said.

"So normally, if you've got a driver that's acting kind of buggy and causes a failure like this, Windows can auto resume by simply not loading the driver the next time. But if it is set as boot-start, which is supposed to be reserved for critical drivers, like one for your hard drive, Windows will not eliminate that from the startup sequence and will continue to fail over and over and over and over again, which is what we saw with the CrowdStrike failure."

(We believe the reason why Microsoft recommended people reboot affected Windows virtual machines on Azure as many as 15 times to clear the problem is because there was a small chance each time that the errant config file would be automatically updated to a non-broken one before the CSAgent.sys driver started parsing it. After multiple reboots, you would eventually win that race condition.)
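
The arithmetic behind the repeated-reboot advice is simple: if each boot has some independent chance p of pulling down the fixed channel file before CSAgent.sys parses the broken one, the probability of recovering within n reboots is 1 - (1 - p)^n. A tiny sketch of that calculation, with p picked purely for illustration:

/* Why repeated reboots help: with an assumed per-boot chance p of fetching
 * the fixed channel file in time, the probability of recovering within n
 * reboots is 1 - (1 - p)^n. p = 0.15 is purely illustrative. */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double p = 0.15;                      /* hypothetical per-boot chance */
    for (int n = 1; n <= 15; n++)
        printf("reboots=%2d  P(recovered)=%.2f\n", n, 1.0 - pow(1.0 - p, n));
    return 0;
}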

Arasaratnam said that, beyond that, we don't yet know how the Channel File update that led CSAgent.sys to perform a bad pointer dereference managed to pass quality assurance (QA).

"It seems obvious that something slipped past QA given the frequency with which the crash occurred," he said. "It seems like even a trivial amount of QA would have caught this. This isn't some edge case where it's like one in 1,000 machines, right?"

Arasaratnam said there are several best practices that should have been observed. One is to avoid running software in kernel mode if you can help it. The second is to ensure that QA is more thorough. The third is to do what Google does and deploy incremental canary releases.

He explained, "One of the techniques employed by Google, which we used when I was there, is to do what's called Canary releases – gradual or slow rollouts – and observe what's occurring rather than crashing what Microsoft estimated were 8.5 million machines." ®
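
A bare-bones sketch of the kind of staged-rollout gate Arasaratnam describes: hash a stable machine identifier into one of 100 buckets and only serve the new configuration to machines whose bucket falls under the current rollout percentage. All names and numbers here are invented; this is not how CrowdStrike or Google actually implement it:

/* Illustrative canary-rollout gate: hash a stable machine ID into 100
 * buckets and only serve the new config when the bucket falls below the
 * current rollout percentage. */
#include <stdio.h>
#include <stdint.h>

/* FNV-1a hash: cheap, deterministic bucketing of a machine identifier. */
static uint32_t fnv1a(const char *s)
{
    uint32_t h = 2166136261u;
    while (*s) { h ^= (uint8_t)*s++; h *= 16777619u; }
    return h;
}

static int gets_new_config(const char *machine_id, unsigned rollout_pct)
{
    return fnv1a(machine_id) % 100 < rollout_pct;
}

int main(void)
{
    const char *fleet[] = { "host-a", "host-b", "host-c", "host-d" };
    unsigned stage = 5;   /* e.g. a 5% canary before going wider */

    for (size_t i = 0; i < sizeof(fleet) / sizeof(fleet[0]); i++)
        printf("%s -> %s\n", fleet[i],
               gets_new_config(fleet[i], stage) ? "new config" : "old config");
    return 0;
}

Had the 291 update gone to a small bucket first, the crash loop would have surfaced on thousands of machines rather than millions, and the rollout could have been halted.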


TOPICS: Computers/Internet
KEYWORDS: crash; crowdstrike

1 posted on 07/24/2024 10:48:47 AM PDT by ShadowAce

To: rdb3; JosephW; martin_fierro; Still Thinking; zeugma; Vinnie; ironman; Egon; raybbr; AFreeBird; ...

2 posted on 07/24/2024 10:49:01 AM PDT by ShadowAce (Linux - The Ultimate Windows Service Pack )

To: ShadowAce

>> tux

exactly


3 posted on 07/24/2024 10:51:15 AM PDT by Gene Eric (Don't be a statist! )

To: ShadowAce

CROWDSTRIKE is a DEMOCRAT COMPANY! I sure hope only Democrats USE it!!


4 posted on 07/24/2024 10:51:53 AM PDT by Ann Archy (Abortion....... The HUMAN Sacrifice to the god of Convenience.)

To: ShadowAce

Excellent thread. Spent over forty years of my life playing software engineer.


5 posted on 07/24/2024 10:52:31 AM PDT by kawhill (kawhill)

To: ShadowAce

error handling is a thing.


6 posted on 07/24/2024 10:53:05 AM PDT by xoxox

To: ShadowAce
Thanks for posting this. I just searched my C: drive for C:\Windows\System32\drivers\CrowdStrike\ and nothing came up, so I assume that I have no CrowdStrike software on my machine, not that I should have.

However, now I know, Thanks again.

7 posted on 07/24/2024 10:57:05 AM PDT by Navy Patriot (Celebrate Decivilization)

To: xoxox
error handling is a thing.

So is testing updates but apparently not to the team at ClownStrike.



8 posted on 07/24/2024 10:57:42 AM PDT by T.B. Yoits

To: ShadowAce

Never be the first to download an OS, or any update, or new whiz bang software.


9 posted on 07/24/2024 10:59:17 AM PDT by null and void (I identify as a conspiracy theorist. My personal pronouns are told/you/so.)

To: ShadowAce

It was not a mundane detail Michael.


10 posted on 07/24/2024 10:59:47 AM PDT by pas

To: ShadowAce

Keep in mind not every Windows system used Crowdstrike Falcon. In fact, it really was/is an enterprise-level tool.

There are many to choose from, including:

• Cynet
• ESET Endpoint Security
• Trend Micro Apex One
• Symantec Endpoint Detection and Response
• Stormshield Endpoint Security
• CrowdStrike Falcon Insight
• Cybereason Total Enterprise Protection
• Malwarebytes Endpoint Protection
• Panda Endpoint Protection
• FireEye Endpoint Security
• Comodo Advanced Endpoint Protection

The one used the most is Symantec.

Falcon Pro is about $99 per endpoint and Falcon Enterprise is $190 per endpoint (both are annual subscriptions)


11 posted on 07/24/2024 11:03:50 AM PDT by Alas Babylon! (Repeal the Patriot Act; Abolish the DHS; reform FBI top to bottom!)

To: ShadowAce

Just a happy dance because you use Linux….. 😂😂😂😂


12 posted on 07/24/2024 11:09:14 AM PDT by Lockbox (politicians, they all seemed like game show hosts to me.... Sting)

To: ShadowAce

This guy, an old MS NT developer, has a couple of pretty good videos...

Essentially, CS was allowed to access and write code that operated in the kernel. Oh, and the EU (European Union) didn't allow MS to implement ways to protect the kernel... like Apple was allowed to. The EU was concerned about the monopoly that MS had over its OS.

https://www.youtube.com/@DavesGarage/videos


13 posted on 07/24/2024 11:10:42 AM PDT by dhs12345

To: Alas Babylon!
A cyber security guy recommended against using these types of software packages (Norton, McAfee, ...). He was more concerned about access, security, and privacy than about BSODs. But his point was well taken by me.

He said that the Windows version of protection is adequate — safety vs privacy.

14 posted on 07/24/2024 11:14:12 AM PDT by dhs12345

To: All

So they came out with a workaround which required booting into safe mode and then deleting a sys file. Now, I was on vacation at the time and my company or personal stuff wasn’t affected, but here are the problems I immediately saw with the workaround. I use a hotel front desk clerk as an example because I was affected on the drive home when my hotel couldn’t make door key cards and had to escort us and unlock our hotel room door with a master key.

Some of you way smarter folks can perhaps tweak my understanding of this wherever you’ve seen I’m going wrong:

1. You’re not going to be able to “remote into” a failing computer since it’s in a BSOD/boot loop. Gonna have to fix on site. You’re either going to have to:

a. Travel to the site and fix computers one by one.
b. Overnight and ship a new computer with the fix applied.
c. Talk a user through the workaround via phone.

2. Hard enough for ME to remember how to boot into safe mode let alone some front desk clerk at a hotel or manager at a bank. So good luck getting a non IT employee to boot into safe mode for you.

3. I believe once you boot into safe mode you’re going to need a local admin password for that machine. How many remote IT departments are going to let THAT one out over a phone call with a front desk clerk at a hotel? Most will guard that admin password with their lives.

So, assuming your IT support is remote rather than inhouse, you’re going to be dispatching a bunch of techs or shipping a bunch of systems all over the place for a while.

Am I getting this generally correct?


15 posted on 07/24/2024 11:17:07 AM PDT by mmichaels1970

To: ShadowAce

yeah. a domino server crash doesn’t entirely explain why some of these companies have been down for days. but hey if you hire an IT company named ‘crowd strike,’ and let it install stuff across your enterprise without vetting, as an engineer, i don’t have much sympathy for you.


16 posted on 07/24/2024 11:22:42 AM PDT by dadfly

To: ShadowAce

Putin and Xi laugh and take notes.


17 posted on 07/24/2024 11:39:03 AM PDT by dynachrome (Auslander Raus!)

To: dadfly
but hey if you hire an IT company named ‘crowd strike,’ and let it install stuff across your enterprise without vetting, as an engineer, i don’t have much sympathy for you.

I don't have a 100% grasp on all of the ins and outs, but I believe it's a bit more convoluted than that. As the end-user client company, usually you hire an IT support company (rather than staffing up your own inhouse IT department). That IT support provider assumes responsibility for protecting your network from cyber threats. If you ever get hacked, or some goofball clicks on an emailed hyperlink to let loose a bunch of russkie bits and bytes on your machine, you go after your IT support company and grill them for not adequately protecting you.

THAT IT support company decides to go with Crowdstrike which is one of several security software systems out there. Some dude at Crowdstrike messes up, IT support company has your systems set to auto update security software quickly (since these updates are usually responding to emerging threats), client company's computers all go poof.

Crowdstrike immediately says "oops, we messed up. But here's a little workaround that can fix the issue in minutes." Unfortunately, very few people actually sitting at these computers have the expertise OR security access to actually perform this workaround.

President of client company calls IT support company and threatens to fire them all if the issue isn't resolved. IT support company gets overwhelmed as they have more than one client doing this. Three tech guys quit cause they decide it's no longer worth the aggravation. Too much coffee ends up being drunk...IT anarchy reigns.

Right now I blame CrowdStrike...and ONLY CrowdStrike.
18 posted on 07/24/2024 11:44:41 AM PDT by mmichaels1970

To: ShadowAce

Wasn’t crowdstrike a huge part of the Clinton email scandal?


19 posted on 07/24/2024 11:52:18 AM PDT by DouglasKC

To: ShadowAce

You’d think Microsoft would know something about rolling out software updates...err..wait...


20 posted on 07/24/2024 12:07:42 PM PDT by bigbob

