Πηγαίνετε εκτός σύνδεσης με την εφαρμογή Player FM !
Defensive Security Podcast Episode 275
Manage episode 433153146 series 1344233
Links:
- https://www.crowdstrike.com/wp-content/uploads/2024/08/Channel-File-291-Incident-Root-Cause-Analysis-08.06.2024.pdf
- https://www.theregister.com/2024/08/05/crowdstrike_is_not_at_all/
- https://www.theverge.com/2024/8/6/24214371/microsoft-delta-letter-crowdstrike-response-comments
- https://www.linkedin.com/posts/alexstamos_why-crowdstrikes-baffling-bsod-disaster-activity-7224046054076243969-1An8?utm_source=combined_share_message&utm_medium=ios_app
- https://www.linkedin.com/posts/choff_why-crowdstrikes-baffling-bsod-disaster-activity-7224078879445958658-ymuc?utm_source=combined_share_message&utm_medium=member_ios
- https://www.securityweek.com/thousands-of-devices-wiped-remotely-following-mobile-guardian-hack/
- https://www.bleepingcomputer.com/news/security/stackexchange-abused-to-spread-malicious-pypi-packages-as-answers/
- https://www.bleepingcomputer.com/news/security/hunters-international-ransomware-gang-targets-it-workers-with-new-sharprhino-malware/
Transcript:
Jerry: Today is Wednesday, August 7th, 2024. And this is episode 275 of the Defensive Security Podcast. My name is Jerry Bell and joining me tonight as always is Mr. Andrew Kalat.
Andrew: Good evening, Jerry. How are you? Good, sir.
Jerry: I am amazing. It is blistering hot at the beach, but it’s awesome.
Andrew: recording from your southern compound.
Jerry: I am.
Andrew: Nice.
Jerry: Yeah, Bell Estate South.
Andrew: And Debbie was not an issue.
Jerry: Debbie not here. We got probably 45 minutes worth of rain.
Andrew: Yeah, it seems, at this point, in real time, stalled out over South Carolina
Jerry: Yeah, it looks several feet of rain hitting like Savannah and That is nuts. But no, it was not a big issue here. I was pretty worried. I packed up all my Milwaukee batteries with lights and whatnot in preparation for the worst got extra tranquilizer for my dog who hates storms.
But no, it’s been absolutely amazing here.
Andrew: So you took the tranks instead? Is that what I’m hearing?
Jerry: Absolutely. You gotta sleep somehow.
Andrew: That’s fair. I’m glad it was a non event, at least for your little neck of the
Jerry: Yeah, it was Nice you could actually see some of the storm clouds off in the distance. And that was the best way to watch a hurricane is when it’s far away.
Andrew: That’s true. That
A few I’ve been through. Stuck on islands, but
Jerry: Yeah, that’s right. since I’ve been here, I have been in the building for two hurricanes, and the building’s been hit by three tornadoes. And then there was also a unsuccessful base jump.
Andrew: So we’re saying you are cursed. Is that what we’re saying?
Jerry: am the human equivalent to a plastic flamingo.
which attracts tornadoes for those who don’t know. Anyway.
Yeah.
Andrew: after that meteorological update,
Jerry: Yeah. just a reminder that the thoughts and opinions we express on the show are ours and do not represent those of our employers past, present, or future.
Andrew: maybe even our
Jerry: Or our pets. my pet is licking me right now and she says, nope, it’s not her opinion.
Andrew: fair,
Jerry: Okay I would say that this is going to be a CrowdStrike heavy episode.
Andrew: three weeks in a row.
Jerry: Yeah, it continues to get more and more interesting. Obviously the main event itself is largely behind us and now we are in the lawyer up phase of the party.
Andrew: the blamestorming
Jerry: blamestorming has indeed begun. The first topic we have to talk about here is the actual formal full root cause analysis was released yesterday by CrowdStrike and it is a 12 page long document. It has lots of marketing fluff in it.
And only I would say a little bit of substance. I don’t think there’s anything that is remarkably telling or revolutionary in the document, but it does indicate technically what went wrong. And it gives some indications of the, potential improvements for their quality assurance, which I think is where a lot of this went wrong.
So the, I’m not going to go through the details in uber technical specificity, but the net is that this channel file update is for this inter process communication agent, for lack of a better term, I’ll call it. And that agent, expects configuration files that have
20 parameters, but through some unfortunate
bad planningtheir test harness actually was Marking the 21st as a catch all, as an asterisk. It was effectively being marked as not used. And so in this particular update, they actually started using it, and that ended up causing their parser to perform what ultimately ended up being an out of bounds read.
Because that parser wasn’t set up to actually read it. And so when that read attempted to happen in kernel space, it tried to access memory. It wasn’t allowed to access, wasn’t allocated. And that caused the blue screen. And because the same thing happened every time it booted up.
You just had this endless boot loop until that particular file got removed. I think the more substantive issue, and that’s the kind of thing that can happen,
Andrew: So let me restate that to make
The application was expecting. a file that had 21 fields in it, and it got a file with 20.
Jerry: Yes.
Andrew: And where it went to read that 21st, it wasn’t allowed to read, and the way that systems protect themselves to do a kernel panic and shut down if you’re trying to read something you’re not allowed to
Jerry: Yes.
Andrew: If you’re in
Jerry: Windows basically says something is horrifically wrong. This should not happen.
Andrew: If I went by that criteria, I’d shut down every day.
Jerry: And so if that were to happen in user space, the application that performed that read would crash. But when it happens in kernel space, Windows attempts to protect itself and it blue screens.
And so the challenge is that testing harness was built assuming that 21st parameter was always set up as a catch all and so effectively was being ignored.
And I think there were really two issues here. One was they didn’t have a very thorough, their testing harness obviously wasn’t, Properly designed, but then they also did not have staged deployments. Like they, what they have a process where once it goes through that test harness and passes it, it goes out far and wide.
There is no staged, deployment ring concept that you have in, let’s say, Microsoft Windows updates and whatnot. And because of that, it, it blasted out. Everybody implicitly trusted CrowdStrike updates and those got applied to pretty much as, fast as they were delivered and the rest is now history.
Andrew: I think it’s a very complicated series of events that led to this. And I think just reacting to a lot of the zeitgeist in the social media world around this, there’s a lot of angry finger pointing some of which is probably well warranted, but it’s interesting to see how the inner chain came together.
And going back to another area I know a little bit about is like aviation incident investigation and things like when space shuttles explode or fall apart, I’ve read a lot of books on those and that sort of thing. Anyway, it’s interesting how there’s very rarely one root cause is where I’m going with this. Usually series of events, an air chain that led to this situation. If one of these situations have been slightly different, this would have been caught and all the Swiss cheese holes lined up just right this situation to happen, not absolving or apologizing for it. It’s just interesting how complex these situations truly are. Compared to how a lot of people will knee jerk their opinions on things, usually based on their own bias around what they care about.
Jerry: When I was reading it, it reminded me of the show. I think you’ve probably watched it too called engineering disasters. And the history or the learning in each one of those episodes that a sequence of disconnected things all lined up in just the right way. for that disaster to happen.
Andrew: right.
Jerry: And I think that is definitely what happened here.
Andrew: For everybody involved, but there’s a part of me that finds these things fascinating to watch play out.
Jerry: I still think, for me, what is most, troublesome, because this is not unprecedented, right? Obviously the amount of systems that were impacted, is unprecedented, but that’s probably more a function of how interconnected and dependent we are on computers than any point in time.
But what’s interesting is that this sort of thing has happened in the past, right? This has happened with Symantec and McAfee and Microsoft probably about five or six different times and several others that I’m probably missing. But one of the things that, distinguishes this from those is that those were much less impactful because they did stage rollouts.
And so when it happened, it was devastating to the people who were among the first, the canaries that they had the problem. But this is a different thing. I think that the fundamental Coding and architecture errors are hard to foresee.
They’re easy to see in hindsight, right? this is like the signal and the noise thing. The failure is easy to identify after the fact, because it’s obvious. Like you can’t, duh, it’s so obvious that this was going to happen, but it’s only obvious after the fact.
Andrew: Certainly.
Jerry: weren’t looking at it beforehand and saying, Oh, we’re just going to accept the risk. They just, it wasn’t in their mind. And so, that part I find less, obviously it is the thing that caused it, but what I find most problematic is the fact that they hadn’t adopted what I would call the industry standard practice of the tiered rollouts.
Andrew: I’m sure that was an intentional decision. I obviously don’t know for sure. I have no idea about decisions that go on a cross track. I’ve never worked there. However, those in their mind, I would imagine a value in not doing, in
Jerry: sure,
Andrew: So dig, if you will, the picture, they are using these sensors, not just to stop things, but also together
these sensors are reporting back to CrowdStrike all the time about potentially malicious or confirm malicious activity. And then they use that data to shotgun this out. And this is basically a reject expression. They were trying to get out to the world about something they thought was malicious. they’re. Go to market strategy is to stop breaches specifically around ransomware. So their business model is really geared towards a ransomware. That’s in my opinion, they’re, I’m not speaking for them. It’s just my read of it. And they care greatly about stopping ransomware and windows environments. So they know that ransomware spreads quickly and rapidly, and I’m sure. In their mind that the value of getting these updates out as quickly as possible to as many sensors as possible a valuable thing to do. So let’s say they know about a new attack type and they pick it up at one customer and they start staging this and they stage it to 10 percent of the customers and 30 percent of the customers and then 50 percent of the customers.
Meanwhile, you’re at a customer who hasn’t gotten staged yet and you get hit by it. Are you happy with their staged deployment at that point?
No. End. So I’m just playing devil’s advocate a little bit on, it probably was an intentional decision to not do staged rollouts. It’s not like they just were unaware of that concept. I think they felt that there was value in updating as rapidly as they possibly could to as many sensors as they could.
Jerry: Likely, I would say that is definitely the proof is in the pudding, right? That is exactly what they were doing.
Andrew: A, in hindsight, it’s a bad decision, but it might not have been a bad decision at the time.
Inputs as to why they made that call, I guess is what I’m saying.
Jerry: I’m sure there is. And by the way, I’m not advocating for, something like what with Microsoft where the rollout happens over the course of days or weeks, I’m more talking about. Hours that, potentially in my mind, at least is a happy medium where you could conceivably have canaries that go an hour before the rest of the world, right?
So yeah, you miss an hour. And I’m sure that potentially exposes some number of customers and maybe this is such a rare thing that one hour difference is something that their customers feel important, to accept the risk on. I don’t know. it’s a fair question.
Andrew: They are saying in their root cause analysis that one of the things they will do as part of the mitigation is provide customers more control over the deployment of rapid release content updates. So to me that means maybe choose when and where rapid release content updates are deployed is what they’re saying. Like they want to go down that path of, hey, if you want to be as safe as possible from ignoring operational risk, or I should say, minimizing your concern about operational risk, deploy as quick as you can. If you are concerned about operational risk, here’s these tools, right? But the trade off you might be More exposed to malware risk.
Jerry: Correct. And, from a customer perspective, it does give you the opportunity to do your own soak testing. But I think you probably also, again, depending on your risk tolerance and size of your environment and exposure and whatnot, would do well to do your own sort of testing.
Before you push it out far and wide. But again, that takes resources. It takes time. It takes people, it takes labor. And if it drops on,after work, right before a long weekend, are you going to, you’re going to have the appetite to do that?
Andrew: and how consistent is your fleet?
Are you on the same patch level? Are you in the same build? Are you on the same hardware? There’s a whole lot of things that are coming to play. And if you’re only testing on a couple of examples,
Jerry: Yeah.
Andrew: yeah, I don’t disagree. I know I’m being a little contrarian on this. It’s more a matter of, again, people are saying things like, if you only just, and I’m like think that through a little, and you’re going to find out that there’s still holes in your planning and that you may have to accept some risk for the benefit of the tool. And have a way to solve that for you. There’s no perfect answer.
It’s you’ve got to figure out that trade off for your own organization.
Jerry: Yeah, I think that’s fair. The other, again, I don’t know a whole lot about their release pipeline process, but it, I find it fascinating that they clearly don’t as part of their release process, verify that it doesn’t blue screen a system like there, there’s a very different distinction between, this.
channel file update causes CrowdStrike to crash or for it to not detect things and then require a subsequent update. to fix it versus the agent is now creating an unbootable situation. And it seems like there’s a very utilitarian, very specific set of tests. And I think deterministic tests that they could put in line in their release pipeline when you deploy it onto.
X number of different windows systems. Does it crash? And if it doesn’t, then you release it with the assumption that if it’s not a catastrophic problem, any other problem could be fixed with an update.
Andrew: Yeah. Or if it’s not crashing on the largest deployed population set example, any crashes you have will be limited to relative small percentage of systems.
Jerry: All right. So anyway, it’s an interesting read. Um, there’s obviously I think it’s 12 or 13 pages long, a lot more technical details about the actual nature of the crash and whatnot. So I invite you to read it if you’re interested in that sort of thing. Moving on to the next story, which is related, this one I’ll take a step back and say for those of you who aren’t aware, there’s a bit of a war of words between Delta Airlines CrowdStrike and Microsoft.
So Delta CEO, Ed Bastian is very famously now said that this outage has cost Delta Airlines about half a billion us dollars in losses and that he feels compelled on behalf of his customers and shareholders and employees to try to recoup some of those losses. And so he’s signaled that he’s going to be suing.
both CrowdStrike and Microsoft. And we’ve talked about this a little bit in previous shows, but now CrowdStrike and Microsoft have separately responded to those legal threats. I don’t think they’ve actually materialized as filed lawsuits yet, but the first one is from the register and the title is CrowdStrike Unhappy About Delta’s Litigation Threat Claims Airline Refused Free On Site Help.
So this particular story is about a open letter that one of CrowdStrike’s lawyers sent to the legal counsel that Delta retained. And I think that legal counsel is pretty some fairly high profile legal team that was involved in in prior high profile cases. In this, I’ll summarize this letter is basically saying, you, Delta should be, cautious about what you ask for.
And by the way, if you are going to proceed with this, we would expect you to retain a certain set of information that we think will be useful in this litigation. And there was one quote, I think summarizes the whole thing very well. And it says, Should Delta pursue this path, meaning the lawsuit, Delta will have to explain to the public, its shareholders, and ultimately a jury, why CrowdStrike took responsibility for its actions swiftly, transparently, constructively, while Delta did not.
Now, there is an interesting adjunct to this that both Microsoft and CrowdStrike had offered assistance to Delta. And in the industry, there’s lots of, armchair quarterbacking about how Silly that is, because what would they really be able to do, right? Like in this particular instance, you need hands and feet going and sitting in front of computers to do the thing, to get the systems back up.
And so what were either CrowdStrike or Microsoft, really going to do? And I think that’s a fair characterization, but It’s not highlighted in this particular letter, but Delta, to both Microsoft and CrowdStrike basically said, no we’re good.
We don’t need your help. And so now both CrowdStrike and Microsoft are throwing that back in their face saying, if it was so horrible, why? Why didn’t you accept our offer of help? But I think that does beg the question, what help could they really have been? And I actually don’t have an answer for that.
Andrew: it’s interesting because these lawyers for CrowdStrike and Microsoft also go into this whole, and this is clearly for public, and posturing
Jerry: Oh yeah.
Andrew: lawsuits, but saying, Hey, what you Delta clearly were different in the way you approach this and your competitors. So what’s up with your organization that makes you different?
When why didn’t you want our help? What were you hiding? And, implying that. dirty laundry would come out on Delta’s side, and they’ve got skeletons in the closet that, Microsoft and CrowdStrike are aware of, that they are brandishing of, hey, if you want to drag us into court. You’re not going to come out of this unscathed, which I’m sure this was going to come up great during negotiations for renewal on CrowdStrike with Delta. I’m sure that sales team is really happy right now.
Jerry: Oh
Yes.
Andrew: this is wild. I also, and we’ve talked about this before, I don’t know what sort of legal leg Delta really has to stand on.
I guess this is what the court system will explore, but yeah it’s an ugly one. And it was pretty aggressive on both Microsoft and CrowdStrike’s lawyers parts to be like, Delta. Yeah, we screwed up, but you screwed up a lot too.
Jerry: So the Microsoft response. And there’s a similar article from the verge titled Microsoft says Delta ignored Satya Nadella’s offer of CrowdStrike help. And Microsoft similarly wrote a letter to Delta’s lawyers, but this one actually shines a light on a, perhaps a question or line of questioning that I hadn’t thought of before, and it does highlight again that Microsoft obviously did reach out and offer help, and they were told that, no, they Delta have it under control.
And what Microsoft and the reporter on the verge seemed to be insinuating is that the problem for Delta was not actually with windows. Obviously the problem started with CrowdStrike. Causing their window systems blue screen. But what they seem to be asserting is that when that happened, it created downstream problems in their legacy infrastructure, which is not based on windows and that then had to go get fixed.
And that is what took the place. All the time. Now that is an interesting point, although I will say it stains a little bit in contrast to some of the images that we saw of contractors standing up on ladders and airports rebooting. Display terminals, I think well after the, like up to a week after the outage, it’s an interesting point, what Microsoft is asserting here is that Delta has chosen not to modernize its it.
And when this incident happened, because of the fragility of their system, they were able to get their Windows systems back up and running relatively fast, as evidenced by their refusal. the problem they had that took all the time was with their old aging IT infrastructure.
Now it’s an interesting thing because legally, It almost doesn’t matter, because even if that’s true, and the court fines in favor of Delta, the fact that Delta didn’t invest doesn’t necessarily absolve either Microsoft or CrowdStrike of anything.
Andrew: Yeah. And Southwest is still running on the Commodore 64 and they were fine.
Jerry: true.
Andrew: counterpoint,
Jerry: That’s very true.
Andrew: it’s almost do you want the court to, Find what is due diligence and what is appropriate level of I. T. modernization. And you’ve heard this point a couple of times. You don’t think this is ever going to
Jerry: No, this is they’re trying this in the court of public opinion. This is never going to go to trial. This is going to get settled out of court. None of these parties, maybe Microsoft, I don’t know. They may be okay, but certainly CrowdStrike and Delta don’t want.
They don’t want, they’re not going to want it to go through the process of discovery and having all of this shit about them in their it programs and development programs laid bare for the public to see, they’re just not going to want that. So I think it’s going to get settled out of court. And right now this is about PR and damage control and everybody’s trying to make sure that their own we don’t have a story about it, but CrowdStrike’s shareholders have filed a lawsuit against CrowdStrike for for loss of shareholder value.
Andrew: Indeed.
It’s interesting. So I have one other topic on the CrowdStrike thing before I move on, if you’ll indulge me.
Jerry: And I think this is the LinkedIn posts, right? And I’ve got, I’m going to put these links in the show notes for you.
Andrew: Okay. So, Alex Stamos, who’s well known and, was at all sorts of interesting organizations recently joined SentinelOne Now SentinelOne is a direct competitor of CrowdStrike, so keep that in mind. And full disclosure. I happen to use Sentinel 1, so I know it better, but I don’t have any, strong opinions one way or the other, but I just want to be transparent about that. Anyway, so about a week ago, Alex put a post out on LinkedIn talking about this outage. So keep in mind it’s coming from a competitor. And he’s basically alleging it is false to say this could have happened to anybody. It’s clear that CrowdStrike made intentional architectural engineering and process decisions that led to this global catastrophe. All right. And he has a bunch of points, but there’s one specifically that I want to drill into here, which is 0.
4 in his post. And this addresses why are people running a kernel level? And why do they have kernel for these tools? And a lot of people have said, Hey, you shouldn’t do that. You don’t need to do that. You don’t do that on Mac. You don’t do that on other systems. Why are you doing in Windows? Point four, quoting, Additionally, there are models for architecting EDR with minimal kernel access, and the team at Sentinel One is, quote, willing to work with Microsoft on exploring these models, assuming Microsoft holds their own products to the same standards, end quote. So as I think this is a very key point to this entire discussion and this entire debate, because what this is saying, and this goes back to some things that Microsoft 2009, the EU through antitrust settlement Microsoft, who was competing at a, with a security tooling. other vendors to give other vendors the exact same access as Microsoft had to the kernel for the efficacy of their security tooling. So what Microsoft is saying, look, we didn’t have a choice. The government, the EU made us give folks access to this kernel. And what I’m reading here is saying, we also have access to the kernel because we have to compete with Microsoft’s tooling. So if Microsoft is willing to not be in the kernel with their security tooling, we’ll do the same. And that is calling attention, I think, to there’s obviously some benefit to running your security tooling at the kernel level if they’re worried about competing with Microsoft who has kernel access and they wouldn’t
Jerry: When I read that, whole thing is interesting because. It’s coming from a competitor of CrowdStrike. And so I come back to my new favorite legal term, which is corporate puffery,
Andrew: indeed.
Jerry: And I see corporate puffery. Obviously there’s some well founded points in here, but.
I will say that the first point and that fourth point stand in contrast to each other. Because on the one hand, they seem to be saying nobody else is doing it that way. CrowdStrike is standing alone in how they do this. And by the way we look forward to working with Microsoft to figure out how we can also not do it.
Andrew: Right?
Jerry: I don’t know. it’s an interesting point. there is associated video where Alex talks a bit more about his thoughts in depth. I think it fundamentally isn’t wrong. if there is an alternative way to do this, that is safer outside of the kernel and we don’t lose visibility we should be doing that.
Andrew: Yeah. speed. Severity of impact of the agent. There’s multiple things there. And I just think that there’s more at play here than just, we chose to do it because it was easier. I think there’s a competition issue here with Microsoft.
Jerry: Yeah. There was one point in here that I wanted to talk about. I picked up on his second point, which is quote, it is dangerous to claim that any security product could have caused this global outage because you’re telling CEOs, CIOs, and boards around the world that it is highly risky to deploy advanced security products.
In the long run, that makes it the world harder to defend and less secure. in any event so I don’t disagree conceptually. Like we shouldn’t be doing things that make our job harder, but we also shouldn’t be sugarcoating things and telling our senior leadership that, there is no risk
Andrew: The other thing I find interesting about this point that is not ever included in these conversations is the cost of the software.
So I’m making up numbers here. Don’t quote me, but I’m in the right area. Let’s say for the average company, average CrowdStrike agent per host. I, again, I’m making up numbers, but just. Work with me, 50 bucks a head, right? So it’s probably more than that, but let’s just say it’s 50 bucks a unit. What if I offered you a thousand bucks a unit, but I promise you it’ll never crash. Is it worth it to you? I’ve got to apply so many more resources and so much more time and so much more energy to do that extra checking and I’m going to develop slower and I’m going to be behind everybody else, but it’s more stable. Is it worth it to you? There has to be that trade off. There has to be that question of speed versus cost versus stability.
Jerry: So I,
Andrew: And we never
Jerry: yeah I think it just becomes unaffordable, right? It’s not a, do I want it? Do I think it would be, is it necessary? at that price point you’re basically. Probably spending two or three times more than your overall budget just on a single tool, not even including, what it would require to run the tool.
So I think it would be awesome to have, aviation, life safety grade stuff, but it’s really not. It’s not practical, but I guess my,
Andrew: that’s my point. It’s do we acknowledge that ever? That, hey, look, there are folks who develop things that run aircraft, that cost, many multiples of standard software development because of life, grade. I used a good term that I’ve blanked on. That isn’t affordable for the average generic system. But is it then fair to assume that they would have the same stability that highly refined, or however you want to say it, system?
Or are we just deluding ourselves?
Jerry: no I strongly suspect most organizations would not say that it’s worth it to them, right? They would trade off some amount of downtime or the chance of some amount of downtime for. To be able to a, not spend that exorbitant amount of money, and also to have generally a reasonable amount of protection in place, which I think is the implicit or the tacit trade off that we’ve made.
The thing that I guess is concerning to me is that this particular bullet seems to be asserting that, we shouldn’t. Be, upfront about the potential downside, right? I think what Alex and again, I didn’t talk to Alex. I don’t know, Alex, but I think what he’s trying to say is we have a hard enough time as it is.
We have a hard enough time allocating money for security programs as it is. And if we go and we tell our senior leadership team that, hey, not only do we have to spend all this money, but by the way, there’s, a non significant chance that it’s going to cause, a global outage for us because, it happens to everybody.
I think that’s what he’s railing against, but on the other side, I think we, at our own detriment, tell our boards that, there’s no risk in running CrowdStrike or SentinelOne or any other program
Andrew: let’s take it back to password managers, right? When somebody has a problem like LastPass did, for instance, We all came out and said, look, don’t throw password managers out.
Jerry: right?
Andrew: Like you’re still better off running a password manager statistically and everything else.
You’re still better off using unique complex passwords on every site. You manage through password manager. It’s still better even with these sorts of circumstances. So I feel that’s a similar conversation. Yeah, there’s a risk, but there’s no zero risk here. You can’t be in business and have zero risk. So got to, I think Alex knows better, frankly, than to try to imply that there’s a zero risk option here. There’s always some trade
Jerry: Yeah I think he’s two, two things. One is he works for a vendor and so he needs to make sure that companies still keep buying XDR software. But I think more broadly, he is hoping that we don’t shoot ourself in the foot by talking our senior leadership team out of allocating money for these important products.
Andrew: fair. And by the way, I don’t know Alex yet. I think I’ve met him once or twice, but I certainly am not trying to, Know what he’s saying. I think there’s a, it’s somewhat disingenuous to say that there’s zero risk in running software like this,
Jerry: All right. So moving on to more exciting things.
Andrew: From security week. The title here is , thousands of devices wiped remotely following mobile guardian hack. So mobile guardian is anMDM solution, mobile device management solution, who focuses on the education sector.
Jerry: they had an incident on August 4th that resulted in some number of their iPads managed through their platform, being deregistered in the process of being deregistered that caused the devices to be wiped. And there’s actually some fairly sad looking pictures of kids in front of stacks of broken iPads, because they’ve been deregistered.
wiped. this particular article doesn’t go into it, but it seems like this company mobile guardian has actually had a run of some security related issues leading up to this one. This one apparently isn’t related to the prior ones, but it does give me.
Concerns about the health of their offering. right now, in fact, they’ve actually got their offering deactivated and any of the systems that are managed by them are currently not functional until they. fix whatever issues they have. I think they took a proactive step of disabling their central management infrastructure.
So that’s rendered the functionality of those devices pretty significantly.
Andrew: Yeah, it’s an interesting attack factor. Somebody gets into your centralized management software and could just wipe all your devices. That’s tough to recover from. Not one I’d thought about before, to be honest, from an MDM
Jerry: Yeah. The it tools have long concerned me where you have central management of your infrastructure. In this particular case, it’s outsourced to a third party company, but we have lots of different flavors, whether that’s like Ansible or salt or Active Directory or many others that serves as a concentration point, like it is a place that an adversary can go in with one, you area cause untold chaos for your organization.
And so it really needs to be well protected. And, in this instance, the country of Singapore was pretty significantly impacted by this. And so they have actually decided to move off of a mobile guardian. Going to guess they won’t be the last one.
Andrew: Yeah, this is an interesting corollary of, is it worth it to have the protection? They’ve decided, nope, it
Jerry: Yeah I think
Andrew: need
Jerry: my guess is they’ll
Move to something else, but
Andrew: I should say not even just the protection, but the manageability. Which yeah, you’re right. They’re going to go to a competitor. I’m sure. But the capabilities offered by the centralized manageability is what can be used against you, which is what you were saying.
Jerry: Yeah,
Andrew: but I don’t know how you’d manage a modern large infrastructure without it. you just got to
Jerry: I don’t think you can, but I also don’t think that many it shops spend enough time thinking about how to protect those orchestration and management platforms. And I also don’t think that we necessarily don’t do enough due diligence on. companies like Mobile Guardian.
And again, I don’t know, this is the first time I’ve ever heard about them. I don’t know anything about them. They could be an absolutely fine company, just having a bad run. I don’t really know, but I think it’s a, to me, a highlight on an issue that I think is going to continue to grow in prominence.
So now on to, the other topic I want to talk about, which is really focused on targeting I. T. people. So the first story is from Bleeping Computer. The title here is Stack Exchange Abused to Spread Malicious PyPy Packages as Answers. And it, again, it’s not a, like a massive attack here, but I thought the technique was super interesting and worth talking about.
There’s a couple of different blockchain platforms, one of them called radium and I forget the name of the other one. Radium doesn’t have any Python modules. And so what the adversary here did was they created some malicious. Repositories in the PiPi repo. And then they created a whole bunch of fake accounts on stack exchange and they started answering questions and they started basically talking up how to use these bogus packages that they.
Uploaded into the PyPy repo and it worked, they got almost 2, 100 downloads.
Andrew: that’s actually really what we’re up to with the show. It’s a long con to start telling people to go install malware once we’ve gotten over 5, 000 episodes out and people trust us. That’s what we’re doing
Jerry: we’ll be there sometime around 2050.
Andrew: but
Jerry: amazing.
Andrew: They’re just, They’re abusing the reputational trust that comes with that platform of valid appropriate answers and Oh yeah, it’s like the easy button to go find an answer and yeah, I trust it. You get some malware, have a nice day.
Jerry: Yeah. So this particular one was, capturing credentials and other sensitive information out of big crypto wallets Telegram and other instant messaging software. And then also information out of browsers. But what I thought was quite interesting was that it was, focused on developers.
It was targeted at developers and As I guess now the former CISO, this causes me a lot of concern, right? That you have a population of developers who are, I would argue what makes an effective developer these days is figuring out how to get efficient answers to your technical problems. you can’t hire developers that know everything, that are completely prescient and omnipotent.
the average IT person is an effective IT person because they know where to go to get the answers to the problems they have. Whether that’s programming, or system administration, or helpdesk, or anything like that. And this worries me. And I think we have to be very cognizant of this trend going forward.
And I think in the near term, we at least have to be incorporating something about this into our security education for developers. But, I think this is probably going to come up more broadly. And that, by the way, dovetails into The other way that it manifests, or has manifested, and I’m sure there’s lots more ways, this one also comes from Bleeping Computer and the title is Ransomware Gain Targets IT Workers with New Sharp Rhino Malware.
And this is more of a watering hole slash
It’s, watering hole, SEO, domain hijacking, where they are purporting to be the angry IP scanner. They’re doing typo squatting or malicious ad buys, trying to divert you. The IT person to a malicious website that they control that looks like the legitimate source of the angry IP scanner for you to download.
And somebody who’s downloading that in the context of an IT or a security person in a company, probably more often than not, has some sort of elevated privileges because they’re probably trying to get that to do some important IT work.
Andrew: Indeed.
Jerry: Between the two, I think we should expect an increase focus on targeting it workers. I think in I think we’ve seen it people being targeted by more sophisticated adversaries for quite some time. What I think is becoming more novel is that this is really becoming a commodity type.
tactic where you’ve got, like lower tier ransomware actors and people trying to steal crypto wallets, actually targeting IT people. So I think we’re almost certainly going to see more and more criminal elements coming in and using these sorts of tactics to try to infiltrate, we know that entry as a service or, that initial entry point into an organization is part of the industrialization of the attack chain where You can, as a ransomware actor or as somebody trying to steal intellectual property, go and buy access.
So you’ve got bad actors who are trying to find ways of Infiltrating companies. And this I think is going to become a much more prominent and effective way in the future, because we, it people hold ourselves in high regard in our ability to not fall for stuff.
Yeah.
Andrew: Believe me, it happens. And the bad guys can keep getting more sophisticated and keep trying different tactics and they start to hit on things that work for a while or, catch certain people at a certain time in a certain set of stressors or certain circumstances that just, it’s, I was curious, going back to the malicious PyPy packages. I do wonder if any static code analysis tools would have picked that up. I don’t know. Maybe I’ll play with that.
Jerry: This is
Andrew: Or that sort of
Jerry: the they were not detected by. Like when the normal anti malware, but I don’t know if they would have been detected by, your aquas and and whatnot.
Andrew: And is it worth running? Having a process of any third party code? That you bring in and scan. I don’t know. I’m just musing out loud here. I honestly don’t know. And I think for a while, wasn’t Google, I think a couple people were trying to do like a safe package repository where, it was vetted and secured.
And I don’t know how well that’s been adopted, but it might be another idea. Don’t just get random stuff, like only go to the safe
Jerry: There’s,
Andrew: But that may slow you
Jerry: yeah.
Andrew: may not be able to answer all the questions you need that way. And, you may not find what you need there.
Jerry: lot, there’s a lot of. focus on supply chain. And I think we’re seeing a lot more focus on supply chain security as it pertains to open source? But my observation is that tends to focus more on Products where we’re, to some extent becoming compelled to care about this, because for example, the U S government has created some, regulatory hoopla where you now have to create S bombs and you have to make attestations about the integrity of your software development practice and whatnot.
So I think. in certain contexts that make sense. But there’s a lot of companies that have a lot of developers that don’t make products for sale or offer services for sale. We have huge numbers of people creating internal applications and whatnot. And I don’t expect, I think it’s a very long time before we see that kind of maturity being applied to purely internal operations.
Andrew: fair. And, these are also expensive sort of things to ask an organization to do it. You’d have to be pretty mature and pretty well staffed and well funded to, to do some of these things too.
Jerry: I definitely agree, but I think we have to be mindful about the ecosystem that we operate our, what I’ll call privileged users in, right? it goes back to that quote, with great power comes great responsibility. If you’re going to have IT people with elevated privileges, they should have to operate within a tighter set of parameters.
And whether that’s, the context of what Microsoft used to call their red forest, I forget what they call it these days. I’m getting old. But I think it’s very difficult to combat all of the different ways these things can manifest. And in particular, a lot of companies aren’t going to have the tolerance, to, stop their employees from leveraging open source that way.
But that doesn’t mean that we get to throw up our hands and say it’s just too hard. I think it means that we need to take a different look at how we construct our environment in a way that is more tolerant to the problems that can come from this sort of thing.
Andrew: Yep. I would agree. It makes me think also somewhat similar, like if they’re a high value target, Similar to how executives are a high value target. It’s only a reason they shouldn’t break the rules or have exceptions to the rules because the consequences of them being hacked in some way are so much higher than the average employee.
Jerry: Oh, a hundred percent. Absolutely. All right. That is the show for today. I appreciate everybody’s attention. Thank you for working through my noisy cat and our technical problems. If you like the show, you can find all of our back episodes on our website at defensive security. org. You can also find us on your favorite podcast platform.
I think we’ve got them all now, if I’m not mistaken, pretty much all.
Andrew: Unless there’s ones out there we don’t know about.
Jerry: Yeah,
Andrew: We’ll
Jerry: absolutely. You can follow me on the Fediverse at Jerry at infosec. exchange. You can follow Andy where?
Andrew: Also on the Fediverse at lurg, L E R G, at infosec. exchange. Also, I’m still hanging around the Twitter slash X world. Also at lurg, L E R G.
Jerry: All right. I look forward to talking to everybody again real soon. I hope you have a great week.
Andrew: Thanks everybody.
Jerry: Take care.
Andrew: bye.
272 επεισόδια
Defensive Security Podcast Episode 275
Defensive Security Podcast - Malware, Hacking, Cyber Security & Infosec
Manage episode 433153146 series 1344233
Links:
- https://www.crowdstrike.com/wp-content/uploads/2024/08/Channel-File-291-Incident-Root-Cause-Analysis-08.06.2024.pdf
- https://www.theregister.com/2024/08/05/crowdstrike_is_not_at_all/
- https://www.theverge.com/2024/8/6/24214371/microsoft-delta-letter-crowdstrike-response-comments
- https://www.linkedin.com/posts/alexstamos_why-crowdstrikes-baffling-bsod-disaster-activity-7224046054076243969-1An8?utm_source=combined_share_message&utm_medium=ios_app
- https://www.linkedin.com/posts/choff_why-crowdstrikes-baffling-bsod-disaster-activity-7224078879445958658-ymuc?utm_source=combined_share_message&utm_medium=member_ios
- https://www.securityweek.com/thousands-of-devices-wiped-remotely-following-mobile-guardian-hack/
- https://www.bleepingcomputer.com/news/security/stackexchange-abused-to-spread-malicious-pypi-packages-as-answers/
- https://www.bleepingcomputer.com/news/security/hunters-international-ransomware-gang-targets-it-workers-with-new-sharprhino-malware/
Transcript:
Jerry: Today is Wednesday, August 7th, 2024. And this is episode 275 of the Defensive Security Podcast. My name is Jerry Bell and joining me tonight as always is Mr. Andrew Kalat.
Andrew: Good evening, Jerry. How are you? Good, sir.
Jerry: I am amazing. It is blistering hot at the beach, but it’s awesome.
Andrew: recording from your southern compound.
Jerry: I am.
Andrew: Nice.
Jerry: Yeah, Bell Estate South.
Andrew: And Debbie was not an issue.
Jerry: Debbie not here. We got probably 45 minutes worth of rain.
Andrew: Yeah, it seems, at this point, in real time, stalled out over South Carolina
Jerry: Yeah, it looks several feet of rain hitting like Savannah and That is nuts. But no, it was not a big issue here. I was pretty worried. I packed up all my Milwaukee batteries with lights and whatnot in preparation for the worst got extra tranquilizer for my dog who hates storms.
But no, it’s been absolutely amazing here.
Andrew: So you took the tranks instead? Is that what I’m hearing?
Jerry: Absolutely. You gotta sleep somehow.
Andrew: That’s fair. I’m glad it was a non event, at least for your little neck of the
Jerry: Yeah, it was Nice you could actually see some of the storm clouds off in the distance. And that was the best way to watch a hurricane is when it’s far away.
Andrew: That’s true. That
A few I’ve been through. Stuck on islands, but
Jerry: Yeah, that’s right. since I’ve been here, I have been in the building for two hurricanes, and the building’s been hit by three tornadoes. And then there was also a unsuccessful base jump.
Andrew: So we’re saying you are cursed. Is that what we’re saying?
Jerry: am the human equivalent to a plastic flamingo.
which attracts tornadoes for those who don’t know. Anyway.
Yeah.
Andrew: after that meteorological update,
Jerry: Yeah. just a reminder that the thoughts and opinions we express on the show are ours and do not represent those of our employers past, present, or future.
Andrew: maybe even our
Jerry: Or our pets. my pet is licking me right now and she says, nope, it’s not her opinion.
Andrew: fair,
Jerry: Okay I would say that this is going to be a CrowdStrike heavy episode.
Andrew: three weeks in a row.
Jerry: Yeah, it continues to get more and more interesting. Obviously the main event itself is largely behind us and now we are in the lawyer up phase of the party.
Andrew: the blamestorming
Jerry: blamestorming has indeed begun. The first topic we have to talk about here is the actual formal full root cause analysis was released yesterday by CrowdStrike and it is a 12 page long document. It has lots of marketing fluff in it.
And only I would say a little bit of substance. I don’t think there’s anything that is remarkably telling or revolutionary in the document, but it does indicate technically what went wrong. And it gives some indications of the, potential improvements for their quality assurance, which I think is where a lot of this went wrong.
So the, I’m not going to go through the details in uber technical specificity, but the net is that this channel file update is for this inter process communication agent, for lack of a better term, I’ll call it. And that agent, expects configuration files that have
20 parameters, but through some unfortunate
bad planningtheir test harness actually was Marking the 21st as a catch all, as an asterisk. It was effectively being marked as not used. And so in this particular update, they actually started using it, and that ended up causing their parser to perform what ultimately ended up being an out of bounds read.
Because that parser wasn’t set up to actually read it. And so when that read attempted to happen in kernel space, it tried to access memory. It wasn’t allowed to access, wasn’t allocated. And that caused the blue screen. And because the same thing happened every time it booted up.
You just had this endless boot loop until that particular file got removed. I think the more substantive issue, and that’s the kind of thing that can happen,
Andrew: So let me restate that to make
The application was expecting. a file that had 21 fields in it, and it got a file with 20.
Jerry: Yes.
Andrew: And where it went to read that 21st, it wasn’t allowed to read, and the way that systems protect themselves to do a kernel panic and shut down if you’re trying to read something you’re not allowed to
Jerry: Yes.
Andrew: If you’re in
Jerry: Windows basically says something is horrifically wrong. This should not happen.
Andrew: If I went by that criteria, I’d shut down every day.
Jerry: And so if that were to happen in user space, the application that performed that read would crash. But when it happens in kernel space, Windows attempts to protect itself and it blue screens.
And so the challenge is that testing harness was built assuming that 21st parameter was always set up as a catch all and so effectively was being ignored.
And I think there were really two issues here. One was they didn’t have a very thorough, their testing harness obviously wasn’t, Properly designed, but then they also did not have staged deployments. Like they, what they have a process where once it goes through that test harness and passes it, it goes out far and wide.
There is no staged, deployment ring concept that you have in, let’s say, Microsoft Windows updates and whatnot. And because of that, it, it blasted out. Everybody implicitly trusted CrowdStrike updates and those got applied to pretty much as, fast as they were delivered and the rest is now history.
Andrew: I think it’s a very complicated series of events that led to this. And I think just reacting to a lot of the zeitgeist in the social media world around this, there’s a lot of angry finger pointing some of which is probably well warranted, but it’s interesting to see how the inner chain came together.
And going back to another area I know a little bit about is like aviation incident investigation and things like when space shuttles explode or fall apart, I’ve read a lot of books on those and that sort of thing. Anyway, it’s interesting how there’s very rarely one root cause is where I’m going with this. Usually series of events, an air chain that led to this situation. If one of these situations have been slightly different, this would have been caught and all the Swiss cheese holes lined up just right this situation to happen, not absolving or apologizing for it. It’s just interesting how complex these situations truly are. Compared to how a lot of people will knee jerk their opinions on things, usually based on their own bias around what they care about.
Jerry: When I was reading it, it reminded me of the show. I think you’ve probably watched it too called engineering disasters. And the history or the learning in each one of those episodes that a sequence of disconnected things all lined up in just the right way. for that disaster to happen.
Andrew: right.
Jerry: And I think that is definitely what happened here.
Andrew: For everybody involved, but there’s a part of me that finds these things fascinating to watch play out.
Jerry: I still think, for me, what is most, troublesome, because this is not unprecedented, right? Obviously the amount of systems that were impacted, is unprecedented, but that’s probably more a function of how interconnected and dependent we are on computers than any point in time.
But what’s interesting is that this sort of thing has happened in the past, right? This has happened with Symantec and McAfee and Microsoft probably about five or six different times and several others that I’m probably missing. But one of the things that, distinguishes this from those is that those were much less impactful because they did stage rollouts.
And so when it happened, it was devastating to the people who were among the first, the canaries that they had the problem. But this is a different thing. I think that the fundamental Coding and architecture errors are hard to foresee.
They’re easy to see in hindsight, right? this is like the signal and the noise thing. The failure is easy to identify after the fact, because it’s obvious. Like you can’t, duh, it’s so obvious that this was going to happen, but it’s only obvious after the fact.
Andrew: Certainly.
Jerry: weren’t looking at it beforehand and saying, Oh, we’re just going to accept the risk. They just, it wasn’t in their mind. And so, that part I find less, obviously it is the thing that caused it, but what I find most problematic is the fact that they hadn’t adopted what I would call the industry standard practice of the tiered rollouts.
Andrew: I’m sure that was an intentional decision. I obviously don’t know for sure. I have no idea about decisions that go on a cross track. I’ve never worked there. However, those in their mind, I would imagine a value in not doing, in
Jerry: sure,
Andrew: So dig, if you will, the picture, they are using these sensors, not just to stop things, but also together
these sensors are reporting back to CrowdStrike all the time about potentially malicious or confirm malicious activity. And then they use that data to shotgun this out. And this is basically a reject expression. They were trying to get out to the world about something they thought was malicious. they’re. Go to market strategy is to stop breaches specifically around ransomware. So their business model is really geared towards a ransomware. That’s in my opinion, they’re, I’m not speaking for them. It’s just my read of it. And they care greatly about stopping ransomware and windows environments. So they know that ransomware spreads quickly and rapidly, and I’m sure. In their mind that the value of getting these updates out as quickly as possible to as many sensors as possible a valuable thing to do. So let’s say they know about a new attack type and they pick it up at one customer and they start staging this and they stage it to 10 percent of the customers and 30 percent of the customers and then 50 percent of the customers.
Meanwhile, you’re at a customer who hasn’t gotten staged yet and you get hit by it. Are you happy with their staged deployment at that point?
No. End. So I’m just playing devil’s advocate a little bit on, it probably was an intentional decision to not do staged rollouts. It’s not like they just were unaware of that concept. I think they felt that there was value in updating as rapidly as they possibly could to as many sensors as they could.
Jerry: Likely, I would say that is definitely the proof is in the pudding, right? That is exactly what they were doing.
Andrew: A, in hindsight, it’s a bad decision, but it might not have been a bad decision at the time.
Inputs as to why they made that call, I guess is what I’m saying.
Jerry: I’m sure there is. And by the way, I’m not advocating for, something like what with Microsoft where the rollout happens over the course of days or weeks, I’m more talking about. Hours that, potentially in my mind, at least is a happy medium where you could conceivably have canaries that go an hour before the rest of the world, right?
So yeah, you miss an hour. And I’m sure that potentially exposes some number of customers and maybe this is such a rare thing that one hour difference is something that their customers feel important, to accept the risk on. I don’t know. it’s a fair question.
Andrew: They are saying in their root cause analysis that one of the things they will do as part of the mitigation is provide customers more control over the deployment of rapid release content updates. So to me that means maybe choose when and where rapid release content updates are deployed is what they’re saying. Like they want to go down that path of, hey, if you want to be as safe as possible from ignoring operational risk, or I should say, minimizing your concern about operational risk, deploy as quick as you can. If you are concerned about operational risk, here’s these tools, right? But the trade off you might be More exposed to malware risk.
Jerry: Correct. And, from a customer perspective, it does give you the opportunity to do your own soak testing. But I think you probably also, again, depending on your risk tolerance and size of your environment and exposure and whatnot, would do well to do your own sort of testing.
Before you push it out far and wide. But again, that takes resources. It takes time. It takes people, it takes labor. And if it drops on,after work, right before a long weekend, are you going to, you’re going to have the appetite to do that?
Andrew: and how consistent is your fleet?
Are you on the same patch level? Are you in the same build? Are you on the same hardware? There’s a whole lot of things that are coming to play. And if you’re only testing on a couple of examples,
Jerry: Yeah.
Andrew: yeah, I don’t disagree. I know I’m being a little contrarian on this. It’s more a matter of, again, people are saying things like, if you only just, and I’m like think that through a little, and you’re going to find out that there’s still holes in your planning and that you may have to accept some risk for the benefit of the tool. And have a way to solve that for you. There’s no perfect answer.
It’s you’ve got to figure out that trade off for your own organization.
Jerry: Yeah, I think that’s fair. The other, again, I don’t know a whole lot about their release pipeline process, but it, I find it fascinating that they clearly don’t as part of their release process, verify that it doesn’t blue screen a system like there, there’s a very different distinction between, this.
channel file update causes CrowdStrike to crash or for it to not detect things and then require a subsequent update. to fix it versus the agent is now creating an unbootable situation. And it seems like there’s a very utilitarian, very specific set of tests. And I think deterministic tests that they could put in line in their release pipeline when you deploy it onto.
X number of different windows systems. Does it crash? And if it doesn’t, then you release it with the assumption that if it’s not a catastrophic problem, any other problem could be fixed with an update.
Andrew: Yeah. Or if it’s not crashing on the largest deployed population set example, any crashes you have will be limited to relative small percentage of systems.
Jerry: All right. So anyway, it’s an interesting read. Um, there’s obviously I think it’s 12 or 13 pages long, a lot more technical details about the actual nature of the crash and whatnot. So I invite you to read it if you’re interested in that sort of thing. Moving on to the next story, which is related, this one I’ll take a step back and say for those of you who aren’t aware, there’s a bit of a war of words between Delta Airlines CrowdStrike and Microsoft.
So Delta CEO, Ed Bastian is very famously now said that this outage has cost Delta Airlines about half a billion us dollars in losses and that he feels compelled on behalf of his customers and shareholders and employees to try to recoup some of those losses. And so he’s signaled that he’s going to be suing.
both CrowdStrike and Microsoft. And we’ve talked about this a little bit in previous shows, but now CrowdStrike and Microsoft have separately responded to those legal threats. I don’t think they’ve actually materialized as filed lawsuits yet, but the first one is from the register and the title is CrowdStrike Unhappy About Delta’s Litigation Threat Claims Airline Refused Free On Site Help.
So this particular story is about a open letter that one of CrowdStrike’s lawyers sent to the legal counsel that Delta retained. And I think that legal counsel is pretty some fairly high profile legal team that was involved in in prior high profile cases. In this, I’ll summarize this letter is basically saying, you, Delta should be, cautious about what you ask for.
And by the way, if you are going to proceed with this, we would expect you to retain a certain set of information that we think will be useful in this litigation. And there was one quote, I think summarizes the whole thing very well. And it says, Should Delta pursue this path, meaning the lawsuit, Delta will have to explain to the public, its shareholders, and ultimately a jury, why CrowdStrike took responsibility for its actions swiftly, transparently, constructively, while Delta did not.
Now, there is an interesting adjunct to this that both Microsoft and CrowdStrike had offered assistance to Delta. And in the industry, there’s lots of, armchair quarterbacking about how Silly that is, because what would they really be able to do, right? Like in this particular instance, you need hands and feet going and sitting in front of computers to do the thing, to get the systems back up.
And so what were either CrowdStrike or Microsoft, really going to do? And I think that’s a fair characterization, but It’s not highlighted in this particular letter, but Delta, to both Microsoft and CrowdStrike basically said, no we’re good.
We don’t need your help. And so now both CrowdStrike and Microsoft are throwing that back in their face saying, if it was so horrible, why? Why didn’t you accept our offer of help? But I think that does beg the question, what help could they really have been? And I actually don’t have an answer for that.
Andrew: it’s interesting because these lawyers for CrowdStrike and Microsoft also go into this whole, and this is clearly for public, and posturing
Jerry: Oh yeah.
Andrew: lawsuits, but saying, Hey, what you Delta clearly were different in the way you approach this and your competitors. So what’s up with your organization that makes you different?
When why didn’t you want our help? What were you hiding? And, implying that. dirty laundry would come out on Delta’s side, and they’ve got skeletons in the closet that, Microsoft and CrowdStrike are aware of, that they are brandishing of, hey, if you want to drag us into court. You’re not going to come out of this unscathed, which I’m sure this was going to come up great during negotiations for renewal on CrowdStrike with Delta. I’m sure that sales team is really happy right now.
Jerry: Oh
Yes.
Andrew: this is wild. I also, and we’ve talked about this before, I don’t know what sort of legal leg Delta really has to stand on.
I guess this is what the court system will explore, but yeah it’s an ugly one. And it was pretty aggressive on both Microsoft and CrowdStrike’s lawyers parts to be like, Delta. Yeah, we screwed up, but you screwed up a lot too.
Jerry: So the Microsoft response. And there’s a similar article from the verge titled Microsoft says Delta ignored Satya Nadella’s offer of CrowdStrike help. And Microsoft similarly wrote a letter to Delta’s lawyers, but this one actually shines a light on a, perhaps a question or line of questioning that I hadn’t thought of before, and it does highlight again that Microsoft obviously did reach out and offer help, and they were told that, no, they Delta have it under control.
And what Microsoft and the reporter on the verge seemed to be insinuating is that the problem for Delta was not actually with windows. Obviously the problem started with CrowdStrike. Causing their window systems blue screen. But what they seem to be asserting is that when that happened, it created downstream problems in their legacy infrastructure, which is not based on windows and that then had to go get fixed.
And that is what took the place. All the time. Now that is an interesting point, although I will say it stains a little bit in contrast to some of the images that we saw of contractors standing up on ladders and airports rebooting. Display terminals, I think well after the, like up to a week after the outage, it’s an interesting point, what Microsoft is asserting here is that Delta has chosen not to modernize its it.
And when this incident happened, because of the fragility of their system, they were able to get their Windows systems back up and running relatively fast, as evidenced by their refusal. the problem they had that took all the time was with their old aging IT infrastructure.
Now it’s an interesting thing because legally, It almost doesn’t matter, because even if that’s true, and the court fines in favor of Delta, the fact that Delta didn’t invest doesn’t necessarily absolve either Microsoft or CrowdStrike of anything.
Andrew: Yeah. And Southwest is still running on the Commodore 64 and they were fine.
Jerry: true.
Andrew: counterpoint,
Jerry: That’s very true.
Andrew: it’s almost do you want the court to, Find what is due diligence and what is appropriate level of I. T. modernization. And you’ve heard this point a couple of times. You don’t think this is ever going to
Jerry: No, this is they’re trying this in the court of public opinion. This is never going to go to trial. This is going to get settled out of court. None of these parties, maybe Microsoft, I don’t know. They may be okay, but certainly CrowdStrike and Delta don’t want.
They don’t want, they’re not going to want it to go through the process of discovery and having all of this shit about them in their it programs and development programs laid bare for the public to see, they’re just not going to want that. So I think it’s going to get settled out of court. And right now this is about PR and damage control and everybody’s trying to make sure that their own we don’t have a story about it, but CrowdStrike’s shareholders have filed a lawsuit against CrowdStrike for for loss of shareholder value.
Andrew: Indeed.
It’s interesting. So I have one other topic on the CrowdStrike thing before I move on, if you’ll indulge me.
Jerry: And I think this is the LinkedIn posts, right? And I’ve got, I’m going to put these links in the show notes for you.
Andrew: Okay. So, Alex Stamos, who’s well known and, was at all sorts of interesting organizations recently joined SentinelOne Now SentinelOne is a direct competitor of CrowdStrike, so keep that in mind. And full disclosure. I happen to use Sentinel 1, so I know it better, but I don’t have any, strong opinions one way or the other, but I just want to be transparent about that. Anyway, so about a week ago, Alex put a post out on LinkedIn talking about this outage. So keep in mind it’s coming from a competitor. And he’s basically alleging it is false to say this could have happened to anybody. It’s clear that CrowdStrike made intentional architectural engineering and process decisions that led to this global catastrophe. All right. And he has a bunch of points, but there’s one specifically that I want to drill into here, which is 0.
4 in his post. And this addresses why are people running a kernel level? And why do they have kernel for these tools? And a lot of people have said, Hey, you shouldn’t do that. You don’t need to do that. You don’t do that on Mac. You don’t do that on other systems. Why are you doing in Windows? Point four, quoting, Additionally, there are models for architecting EDR with minimal kernel access, and the team at Sentinel One is, quote, willing to work with Microsoft on exploring these models, assuming Microsoft holds their own products to the same standards, end quote. So as I think this is a very key point to this entire discussion and this entire debate, because what this is saying, and this goes back to some things that Microsoft 2009, the EU through antitrust settlement Microsoft, who was competing at a, with a security tooling. other vendors to give other vendors the exact same access as Microsoft had to the kernel for the efficacy of their security tooling. So what Microsoft is saying, look, we didn’t have a choice. The government, the EU made us give folks access to this kernel. And what I’m reading here is saying, we also have access to the kernel because we have to compete with Microsoft’s tooling. So if Microsoft is willing to not be in the kernel with their security tooling, we’ll do the same. And that is calling attention, I think, to there’s obviously some benefit to running your security tooling at the kernel level if they’re worried about competing with Microsoft who has kernel access and they wouldn’t
Jerry: When I read that, whole thing is interesting because. It’s coming from a competitor of CrowdStrike. And so I come back to my new favorite legal term, which is corporate puffery,
Andrew: indeed.
Jerry: And I see corporate puffery. Obviously there’s some well founded points in here, but.
I will say that the first point and that fourth point stand in contrast to each other. Because on the one hand, they seem to be saying nobody else is doing it that way. CrowdStrike is standing alone in how they do this. And by the way we look forward to working with Microsoft to figure out how we can also not do it.
Andrew: Right?
Jerry: I don’t know. it’s an interesting point. there is associated video where Alex talks a bit more about his thoughts in depth. I think it fundamentally isn’t wrong. if there is an alternative way to do this, that is safer outside of the kernel and we don’t lose visibility we should be doing that.
Andrew: Yeah. speed. Severity of impact of the agent. There’s multiple things there. And I just think that there’s more at play here than just, we chose to do it because it was easier. I think there’s a competition issue here with Microsoft.
Jerry: Yeah. There was one point in here that I wanted to talk about. I picked up on his second point, which is quote, it is dangerous to claim that any security product could have caused this global outage because you’re telling CEOs, CIOs, and boards around the world that it is highly risky to deploy advanced security products.
In the long run, that makes it the world harder to defend and less secure. in any event so I don’t disagree conceptually. Like we shouldn’t be doing things that make our job harder, but we also shouldn’t be sugarcoating things and telling our senior leadership that, there is no risk
Andrew: The other thing I find interesting about this point that is not ever included in these conversations is the cost of the software.
So I’m making up numbers here. Don’t quote me, but I’m in the right area. Let’s say for the average company, average CrowdStrike agent per host. I, again, I’m making up numbers, but just. Work with me, 50 bucks a head, right? So it’s probably more than that, but let’s just say it’s 50 bucks a unit. What if I offered you a thousand bucks a unit, but I promise you it’ll never crash. Is it worth it to you? I’ve got to apply so many more resources and so much more time and so much more energy to do that extra checking and I’m going to develop slower and I’m going to be behind everybody else, but it’s more stable. Is it worth it to you? There has to be that trade off. There has to be that question of speed versus cost versus stability.
Jerry: So I,
Andrew: And we never
Jerry: yeah I think it just becomes unaffordable, right? It’s not a, do I want it? Do I think it would be, is it necessary? at that price point you’re basically. Probably spending two or three times more than your overall budget just on a single tool, not even including, what it would require to run the tool.
So I think it would be awesome to have, aviation, life safety grade stuff, but it’s really not. It’s not practical, but I guess my,
Andrew: that’s my point. It’s do we acknowledge that ever? That, hey, look, there are folks who develop things that run aircraft, that cost, many multiples of standard software development because of life, grade. I used a good term that I’ve blanked on. That isn’t affordable for the average generic system. But is it then fair to assume that they would have the same stability that highly refined, or however you want to say it, system?
Or are we just deluding ourselves?
Jerry: no I strongly suspect most organizations would not say that it’s worth it to them, right? They would trade off some amount of downtime or the chance of some amount of downtime for. To be able to a, not spend that exorbitant amount of money, and also to have generally a reasonable amount of protection in place, which I think is the implicit or the tacit trade off that we’ve made.
The thing that I guess is concerning to me is that this particular bullet seems to be asserting that, we shouldn’t. Be, upfront about the potential downside, right? I think what Alex and again, I didn’t talk to Alex. I don’t know, Alex, but I think what he’s trying to say is we have a hard enough time as it is.
We have a hard enough time allocating money for security programs as it is. And if we go and we tell our senior leadership team that, hey, not only do we have to spend all this money, but by the way, there’s, a non significant chance that it’s going to cause, a global outage for us because, it happens to everybody.
I think that’s what he’s railing against, but on the other side, I think we, at our own detriment, tell our boards that, there’s no risk in running CrowdStrike or SentinelOne or any other program
Andrew: let’s take it back to password managers, right? When somebody has a problem like LastPass did, for instance, We all came out and said, look, don’t throw password managers out.
Jerry: right?
Andrew: Like you’re still better off running a password manager statistically and everything else.
You’re still better off using unique complex passwords on every site. You manage through password manager. It’s still better even with these sorts of circumstances. So I feel that’s a similar conversation. Yeah, there’s a risk, but there’s no zero risk here. You can’t be in business and have zero risk. So got to, I think Alex knows better, frankly, than to try to imply that there’s a zero risk option here. There’s always some trade
Jerry: Yeah I think he’s two, two things. One is he works for a vendor and so he needs to make sure that companies still keep buying XDR software. But I think more broadly, he is hoping that we don’t shoot ourself in the foot by talking our senior leadership team out of allocating money for these important products.
Andrew: fair. And by the way, I don’t know Alex yet. I think I’ve met him once or twice, but I certainly am not trying to, Know what he’s saying. I think there’s a, it’s somewhat disingenuous to say that there’s zero risk in running software like this,
Jerry: All right. So moving on to more exciting things.
Andrew: From security week. The title here is , thousands of devices wiped remotely following mobile guardian hack. So mobile guardian is anMDM solution, mobile device management solution, who focuses on the education sector.
Jerry: they had an incident on August 4th that resulted in some number of their iPads managed through their platform, being deregistered in the process of being deregistered that caused the devices to be wiped. And there’s actually some fairly sad looking pictures of kids in front of stacks of broken iPads, because they’ve been deregistered.
wiped. this particular article doesn’t go into it, but it seems like this company mobile guardian has actually had a run of some security related issues leading up to this one. This one apparently isn’t related to the prior ones, but it does give me.
Concerns about the health of their offering. right now, in fact, they’ve actually got their offering deactivated and any of the systems that are managed by them are currently not functional until they. fix whatever issues they have. I think they took a proactive step of disabling their central management infrastructure.
So that’s rendered the functionality of those devices pretty significantly.
Andrew: Yeah, it’s an interesting attack factor. Somebody gets into your centralized management software and could just wipe all your devices. That’s tough to recover from. Not one I’d thought about before, to be honest, from an MDM
Jerry: Yeah. The it tools have long concerned me where you have central management of your infrastructure. In this particular case, it’s outsourced to a third party company, but we have lots of different flavors, whether that’s like Ansible or salt or Active Directory or many others that serves as a concentration point, like it is a place that an adversary can go in with one, you area cause untold chaos for your organization.
And so it really needs to be well protected. And, in this instance, the country of Singapore was pretty significantly impacted by this. And so they have actually decided to move off of a mobile guardian. Going to guess they won’t be the last one.
Andrew: Yeah, this is an interesting corollary of, is it worth it to have the protection? They’ve decided, nope, it
Jerry: Yeah I think
Andrew: need
Jerry: my guess is they’ll
Move to something else, but
Andrew: I should say not even just the protection, but the manageability. Which yeah, you’re right. They’re going to go to a competitor. I’m sure. But the capabilities offered by the centralized manageability is what can be used against you, which is what you were saying.
Jerry: Yeah,
Andrew: but I don’t know how you’d manage a modern large infrastructure without it. you just got to
Jerry: I don’t think you can, but I also don’t think that many it shops spend enough time thinking about how to protect those orchestration and management platforms. And I also don’t think that we necessarily don’t do enough due diligence on. companies like Mobile Guardian.
And again, I don’t know, this is the first time I’ve ever heard about them. I don’t know anything about them. They could be an absolutely fine company, just having a bad run. I don’t really know, but I think it’s a, to me, a highlight on an issue that I think is going to continue to grow in prominence.
So now on to, the other topic I want to talk about, which is really focused on targeting I. T. people. So the first story is from Bleeping Computer. The title here is Stack Exchange Abused to Spread Malicious PyPy Packages as Answers. And it, again, it’s not a, like a massive attack here, but I thought the technique was super interesting and worth talking about.
There’s a couple of different blockchain platforms, one of them called radium and I forget the name of the other one. Radium doesn’t have any Python modules. And so what the adversary here did was they created some malicious. Repositories in the PiPi repo. And then they created a whole bunch of fake accounts on stack exchange and they started answering questions and they started basically talking up how to use these bogus packages that they.
Uploaded into the PyPy repo and it worked, they got almost 2, 100 downloads.
Andrew: that’s actually really what we’re up to with the show. It’s a long con to start telling people to go install malware once we’ve gotten over 5, 000 episodes out and people trust us. That’s what we’re doing
Jerry: we’ll be there sometime around 2050.
Andrew: but
Jerry: amazing.
Andrew: They’re just, They’re abusing the reputational trust that comes with that platform of valid appropriate answers and Oh yeah, it’s like the easy button to go find an answer and yeah, I trust it. You get some malware, have a nice day.
Jerry: Yeah. So this particular one was, capturing credentials and other sensitive information out of big crypto wallets Telegram and other instant messaging software. And then also information out of browsers. But what I thought was quite interesting was that it was, focused on developers.
It was targeted at developers and As I guess now the former CISO, this causes me a lot of concern, right? That you have a population of developers who are, I would argue what makes an effective developer these days is figuring out how to get efficient answers to your technical problems. you can’t hire developers that know everything, that are completely prescient and omnipotent.
the average IT person is an effective IT person because they know where to go to get the answers to the problems they have. Whether that’s programming, or system administration, or helpdesk, or anything like that. And this worries me. And I think we have to be very cognizant of this trend going forward.
And I think in the near term, we at least have to be incorporating something about this into our security education for developers. But, I think this is probably going to come up more broadly. And that, by the way, dovetails into The other way that it manifests, or has manifested, and I’m sure there’s lots more ways, this one also comes from Bleeping Computer and the title is Ransomware Gain Targets IT Workers with New Sharp Rhino Malware.
And this is more of a watering hole slash
It’s, watering hole, SEO, domain hijacking, where they are purporting to be the angry IP scanner. They’re doing typo squatting or malicious ad buys, trying to divert you. The IT person to a malicious website that they control that looks like the legitimate source of the angry IP scanner for you to download.
And somebody who’s downloading that in the context of an IT or a security person in a company, probably more often than not, has some sort of elevated privileges because they’re probably trying to get that to do some important IT work.
Andrew: Indeed.
Jerry: Between the two, I think we should expect an increase focus on targeting it workers. I think in I think we’ve seen it people being targeted by more sophisticated adversaries for quite some time. What I think is becoming more novel is that this is really becoming a commodity type.
tactic where you’ve got, like lower tier ransomware actors and people trying to steal crypto wallets, actually targeting IT people. So I think we’re almost certainly going to see more and more criminal elements coming in and using these sorts of tactics to try to infiltrate, we know that entry as a service or, that initial entry point into an organization is part of the industrialization of the attack chain where You can, as a ransomware actor or as somebody trying to steal intellectual property, go and buy access.
So you’ve got bad actors who are trying to find ways of Infiltrating companies. And this I think is going to become a much more prominent and effective way in the future, because we, it people hold ourselves in high regard in our ability to not fall for stuff.
Yeah.
Andrew: Believe me, it happens. And the bad guys can keep getting more sophisticated and keep trying different tactics and they start to hit on things that work for a while or, catch certain people at a certain time in a certain set of stressors or certain circumstances that just, it’s, I was curious, going back to the malicious PyPy packages. I do wonder if any static code analysis tools would have picked that up. I don’t know. Maybe I’ll play with that.
Jerry: This is
Andrew: Or that sort of
Jerry: the they were not detected by. Like when the normal anti malware, but I don’t know if they would have been detected by, your aquas and and whatnot.
Andrew: And is it worth running? Having a process of any third party code? That you bring in and scan. I don’t know. I’m just musing out loud here. I honestly don’t know. And I think for a while, wasn’t Google, I think a couple people were trying to do like a safe package repository where, it was vetted and secured.
And I don’t know how well that’s been adopted, but it might be another idea. Don’t just get random stuff, like only go to the safe
Jerry: There’s,
Andrew: But that may slow you
Jerry: yeah.
Andrew: may not be able to answer all the questions you need that way. And, you may not find what you need there.
Jerry: lot, there’s a lot of. focus on supply chain. And I think we’re seeing a lot more focus on supply chain security as it pertains to open source? But my observation is that tends to focus more on Products where we’re, to some extent becoming compelled to care about this, because for example, the U S government has created some, regulatory hoopla where you now have to create S bombs and you have to make attestations about the integrity of your software development practice and whatnot.
So I think. in certain contexts that make sense. But there’s a lot of companies that have a lot of developers that don’t make products for sale or offer services for sale. We have huge numbers of people creating internal applications and whatnot. And I don’t expect, I think it’s a very long time before we see that kind of maturity being applied to purely internal operations.
Andrew: fair. And, these are also expensive sort of things to ask an organization to do it. You’d have to be pretty mature and pretty well staffed and well funded to, to do some of these things too.
Jerry: I definitely agree, but I think we have to be mindful about the ecosystem that we operate our, what I’ll call privileged users in, right? it goes back to that quote, with great power comes great responsibility. If you’re going to have IT people with elevated privileges, they should have to operate within a tighter set of parameters.
And whether that’s, the context of what Microsoft used to call their red forest, I forget what they call it these days. I’m getting old. But I think it’s very difficult to combat all of the different ways these things can manifest. And in particular, a lot of companies aren’t going to have the tolerance, to, stop their employees from leveraging open source that way.
But that doesn’t mean that we get to throw up our hands and say it’s just too hard. I think it means that we need to take a different look at how we construct our environment in a way that is more tolerant to the problems that can come from this sort of thing.
Andrew: Yep. I would agree. It makes me think also somewhat similar, like if they’re a high value target, Similar to how executives are a high value target. It’s only a reason they shouldn’t break the rules or have exceptions to the rules because the consequences of them being hacked in some way are so much higher than the average employee.
Jerry: Oh, a hundred percent. Absolutely. All right. That is the show for today. I appreciate everybody’s attention. Thank you for working through my noisy cat and our technical problems. If you like the show, you can find all of our back episodes on our website at defensive security. org. You can also find us on your favorite podcast platform.
I think we’ve got them all now, if I’m not mistaken, pretty much all.
Andrew: Unless there’s ones out there we don’t know about.
Jerry: Yeah,
Andrew: We’ll
Jerry: absolutely. You can follow me on the Fediverse at Jerry at infosec. exchange. You can follow Andy where?
Andrew: Also on the Fediverse at lurg, L E R G, at infosec. exchange. Also, I’m still hanging around the Twitter slash X world. Also at lurg, L E R G.
Jerry: All right. I look forward to talking to everybody again real soon. I hope you have a great week.
Andrew: Thanks everybody.
Jerry: Take care.
Andrew: bye.
272 επεισόδια
सभी एपिसोड
×Καλώς ήλθατε στο Player FM!
Το FM Player σαρώνει τον ιστό για podcasts υψηλής ποιότητας για να απολαύσετε αυτή τη στιγμή. Είναι η καλύτερη εφαρμογή podcast και λειτουργεί σε Android, iPhone και στον ιστό. Εγγραφή για συγχρονισμό συνδρομών σε όλες τις συσκευές.