Microsoft has taken immediate action to address a significant security incident that exposed a staggering 38 terabytes of private data. The leak was traced to the company’s AI research GitHub repository and is believed to have occurred inadvertently during the publication of open-source training data, according to Wiz, a cybersecurity research team.
The exposed data included a backup of two former employees’ workstations, containing sensitive information such as secrets, keys, passwords, and more than 30,000 internal Microsoft Teams messages.
The repository, named “robust-models-transfer,” has been made inaccessible. Before its takedown, it housed source code and machine learning models related to a 2020 research paper titled “Do Adversarially Robust ImageNet Models Transfer Better?”
Wiz’s report revealed that the breach resulted from an overly permissive Shared Access Signature (SAS) token, an Azure feature that enables data sharing through signed URLs but is difficult to monitor and revoke. Specifically, the repository’s README.md file directed developers to download models from an Azure Storage URL whose SAS token granted access not only to the intended files but to the entire storage account, exposing additional private data.
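To illustrate the difference in scope, here is a minimal sketch using the azure-storage-blob Python SDK that generates a narrowly scoped SAS token: read-only, limited to a single blob, and short-lived. The account, container, and blob names are hypothetical; this shows the safer configuration, not the one from the leaked repository.

```python
from datetime import datetime, timedelta, timezone

from azure.storage.blob import BlobSasPermissions, generate_blob_sas

# Hypothetical names for illustration only.
ACCOUNT_NAME = "examplestorage"
ACCOUNT_KEY = "<account-key>"

# A narrowly scoped token: read-only, one blob, seven-day expiry.
sas = generate_blob_sas(
    account_name=ACCOUNT_NAME,
    container_name="public-models",
    blob_name="robust_imagenet_model.pt",
    account_key=ACCOUNT_KEY,
    permission=BlobSasPermissions(read=True),  # no list/write/delete rights
    expiry=datetime.now(timezone.utc) + timedelta(days=7),
)

download_url = (
    f"https://{ACCOUNT_NAME}.blob.core.windows.net/"
    f"public-models/robust_imagenet_model.pt?{sas}"
)
print(download_url)
```

By contrast, an account-level SAS of the kind Wiz described is scoped to the whole storage account, so a single URL in a README can silently expose every container it holds.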
To address this issue, Microsoft promptly revoked the SAS token and blocked external access to the storage account. The company’s investigation found no unauthorized exposure of customer data and confirmed that no other internal services were compromised.
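Because an account SAS is signed with one of the storage account’s access keys rather than stored server-side, it cannot be revoked individually; invalidating it generally means rotating the signing key, which voids every token derived from it. A sketch of that rotation using the azure-mgmt-storage management SDK, with placeholder subscription, resource group, and account names:

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.storage import StorageManagementClient

# Placeholder identifiers for illustration only.
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "ai-research-rg"
ACCOUNT_NAME = "examplestorage"

client = StorageManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

# Rotating the signing key invalidates every SAS token signed with it.
client.storage_accounts.regenerate_key(
    RESOURCE_GROUP,
    ACCOUNT_NAME,
    {"key_name": "key1"},
)
```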
The company also identified a bug in its scanning system that had caused this specific SAS URL to be incorrectly dismissed as a false positive. To strengthen future detection, Microsoft has expanded its secret scanning service to flag SAS tokens with overly permissive settings.
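A detector for such tokens can be approximated by parsing the SAS query parameters from a URL and flagging broad permission sets, account-wide resource scopes, or far-future expiry dates. The heuristic sketch below is an assumption on my part about what such a check might look like, not Microsoft’s actual scanning logic:

```python
from urllib.parse import parse_qs, urlparse

# SAS permission letters beyond read: write, delete, list, add, create,
# update, process.
BROAD_PERMISSIONS = set("wdlacup")

def flag_permissive_sas(url: str) -> list[str]:
    """Return reasons a SAS URL looks overly permissive (heuristic only)."""
    params = parse_qs(urlparse(url).query)
    findings = []
    if "sig" not in params:
        return findings  # no signature parameter, so not a SAS URL

    perms = set(params.get("sp", [""])[0])
    if perms & BROAD_PERMISSIONS:
        findings.append(f"grants more than read access: sp={''.join(sorted(perms))}")

    # srt=sco marks an account SAS covering service, container, and object.
    if set("sco") <= set(params.get("srt", [""])[0]):
        findings.append("account SAS scoped to the entire storage account (srt=sco)")

    expiry = params.get("se", [""])[0]
    if expiry and expiry[:4] > "2030":  # crude far-future expiry check
        findings.append(f"expires far in the future: se={expiry}")

    return findings
```

The far-future check is deliberately crude; a production scanner would compare against a policy-defined maximum lifetime rather than a hard-coded year.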