Sunday, December 22, 2024

‘Error’ in Microsoft’s DDoS defenses amplified Azure outage


Do you have problems configuring Microsoft’s Defender? You might not be alone: Microsoft admitted that whatever it’s using for its defensive implementation exacerbated yesterday’s Azure instability.

No one has blamed the actual product named “Windows Defender,” we must note.

According to Microsoft, the initial trigger event for yesterday’s outage, which took out great swathes of the web, was a distributed denial-of-service (DDoS) attack. Such attacks are hardly unheard of, and an industry has sprung up around warding them off.

A DDoS attack aims to overwhelm the resources of the targeted system. It usually involves many malware-infected machines flooding the victim with network traffic. Admins employ various methods to tell real requests apart from malicious traffic, but according to F5 Labs, DDoS attacks still grew explosively in 2023.

“Attacks grew so much in fact that, on average, businesses can be expected to deal with a DDoS attack around eleven times a year, almost once a month,” the security vendor said.
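For readers wondering what "differentiating real requests from malicious traffic" can look like at its simplest, below is a minimal sketch of one common building block: per-source token-bucket rate limiting. It is purely illustrative and is not Microsoft's method, which the company has not detailed. The class names, the source addresses (taken from documentation-only IP ranges) and the capacity and rate values are all hypothetical. Note that a genuinely distributed attack spreads its traffic across many machines, so each one can stay under a per-source limit; that is one reason large-scale defenses layer many techniques on top of something like this.

```python
class TokenBucket:
    """Permits bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, now: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity  # a new source starts with a full burst allowance
        self.last = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, never exceeding capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # over the limit: drop or challenge this request


class PerSourceLimiter:
    """Hypothetical policy: one independent bucket per source address."""

    def __init__(self, capacity: float, rate: float) -> None:
        self.capacity = capacity
        self.rate = rate
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = TokenBucket(self.capacity, self.rate, now)
            self.buckets[source] = bucket
        return bucket.allow(now)


# Usage: ten requests from one address at the same instant.
limiter = PerSourceLimiter(capacity=5, rate=1.0)
flood = [limiter.allow("203.0.113.7", now=0.0) for _ in range(10)]
print(flood.count(True))  # 5: only the burst allowance gets through
print(limiter.allow("198.51.100.2", now=0.0))  # True: other sources are unaffected
```

Time is passed in explicitly rather than read from a clock, which keeps the sketch deterministic and easy to test. The design choice also illustrates how defenses can misfire: set the limits too tightly and legitimate users get dropped along with the attackers.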

Microsoft has published its strategy for defending against network-based DDoS attacks, noting that the company's global footprint makes its approach unique. Microsoft said that footprint lets it "utilize strategies and techniques that are unavailable to most other organizations," and that it can also draw on the collective knowledge of an extensive threat network.

“This intelligence, along with information gathered from online services and Microsoft’s global customer base, continuously improves Microsoft’s DDoS defense system that protects all of Microsoft online services’ assets.”

All of which assumes Microsoft actually implemented that strategy correctly.

For yesterday’s event, Microsoft’s DDoS protection mechanisms were indeed triggered correctly. However, the response did not go so well. “Initial investigations suggest that an error in the implementation of our defenses amplified the impact of the attack rather than mitigating it,” the Windows giant admitted last night.

The problem was global and affected a subset of customers attempting to connect to services, including Azure App Services, Application Insights, Azure IoT Central, Azure Log Search Alerts, Azure Policy, the Azure portal itself, and a subset of Microsoft 365 and Microsoft Purview services.

According to Microsoft, the incident lasted from approximately 1145 UTC to 1943 UTC, although the company reckoned the majority of the impact was successfully mitigated by 1410 UTC. The problem wasn't, however, declared over until 2048 UTC.

We contacted Microsoft to learn more about the implementation of its DDoS defenses, but the company has yet to respond. A Preliminary Post Incident Review (PIR) is due in approximately 72 hours, and the company will publish a Final PIR in around two weeks. ®
