An ongoing attack campaign targeting a reported vulnerability in Ray, a computing framework used by OpenAI, Uber, and Amazon, has compromised thousands of servers storing AI workloads and network credentials.
The attacks have been active for at least seven months and have led to the tampering of AI models. Network credentials have also been compromised, exposing tokens used to access internal networks, databases, and accounts on platforms such as OpenAI, Hugging Face, Stripe, and Azure. Besides corrupting models and stealing credentials, the attackers behind the campaign are installing cryptocurrency miners on the compromised infrastructure, which typically provides large amounts of computing power. The attackers have also installed reverse shells, which are text-based interfaces for remotely controlling servers.
Hitting the jackpot
“If an attacker gets their hands on a Ray production cluster, it's a jackpot,” researchers at security firm Oligo, which discovered the attack, wrote in a post. “Valuable corporate data and remote code execution make it easy to monetize attacks, all remaining in the shadows and completely undetected (undetectable using static security tools).”
Among the compromised sensitive information are AI production workloads, which allow the attackers to control or tamper with models during the training phase and, from there, corrupt the models' integrity. Vulnerable clusters expose a central dashboard to the Internet, a configuration that lets anyone who looks for it see a history of every command entered to date. That history allows an intruder to quickly learn how a model works and what sensitive data it has access to.
Oligo captured screenshots that exposed sensitive private data and displayed histories indicating the clusters had been actively hacked. Compromised resources included cryptographic password hashes and credentials to internal databases and to accounts on OpenAI, Stripe, and Slack.
Ray is an open source framework for scaling AI apps, meaning it allows huge numbers of them to run at once in an efficient manner. These apps typically run on huge clusters of servers. Key to making all of this work is a central dashboard that provides an interface for viewing and controlling running tasks and apps. One of the programming interfaces available through the dashboard, known as the Jobs API, allows users to send a list of commands to the cluster. The commands are issued using a simple HTTP request that requires no authentication.
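To make that last point concrete, the following Python sketch builds, but deliberately does not send, the kind of request the Jobs API accepts. The port 8265, the /api/jobs/ path, and the "entrypoint" field are assumptions based on Ray's commonly documented defaults and should be verified against the Ray version in use; the helper names are hypothetical.

```python
import json
import urllib.request

# Assumptions for illustration only: 8265 is the commonly documented default
# dashboard port and /api/jobs/ the job-submission path. Verify both against
# the Ray version you actually run.
DEFAULT_DASHBOARD_PORT = 8265
JOBS_PATH = "/api/jobs/"


def build_job_request(host: str, entrypoint: str,
                      port: int = DEFAULT_DASHBOARD_PORT) -> urllib.request.Request:
    """Build (but do not send) a job-submission request for a Ray dashboard."""
    body = json.dumps({"entrypoint": entrypoint}).encode("utf-8")
    return urllib.request.Request(
        url=f"http://{host}:{port}{JOBS_PATH}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def carries_credentials(request: urllib.request.Request) -> bool:
    """Return True if the request includes an Authorization header."""
    return request.has_header("Authorization")


if __name__ == "__main__":
    req = build_job_request("127.0.0.1", "echo hello")
    print(req.get_method(), req.full_url)
    print("credentials attached:", carries_credentials(req))
```

Nothing in the request identifies the sender: the body is a shell command and the only header describes the content type. That is why network placement, rather than the API itself, decides who can run code on the cluster.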
Last year, researchers at security firm Bishop Fox reported this behavior as a high-severity code execution vulnerability tracked as CVE-2023-48022.
Distributed execution framework
“By default, Ray does not enforce authentication,” wrote Berenice Flores Garcia, senior security consultant at Bishop Fox. “As a result, an attacker could freely submit jobs, delete existing jobs, obtain sensitive information, or exploit other vulnerabilities described in this advisory.”
Anyscale, the developer and maintainer of Ray, responded by disputing the vulnerability. Anyscale officials said they have always presented Ray as a framework for executing code remotely and, as a result, have long advised that it be properly segmented inside a properly secured network.
“Due to the nature of Ray as a distributed execution framework, Ray's security perimeter is outside of the Ray cluster,” Anyscale officials wrote. “Therefore, we emphasize the need to prevent access to Ray clusters from untrusted machines (such as the public Internet).”
Anyscale's response went on to say that the reported Jobs API behavior is not a vulnerability and would not be addressed in a near-term update. The company also said it plans to eventually introduce a change that will enforce authentication in the API. It explained:
We have considered very seriously whether something like that would be a good idea, and to date have not implemented it for fear that our users would put too much trust in a mechanism that might end up providing a false sense of security without properly securing their clusters in the way they imagined.
That said, we recognize that reasonable minds can differ on this issue. While we still do not believe that organizations should rely on isolation controls within Ray, such as authentication, we have decided that there can be value in it in certain contexts as part of a defense-in-depth strategy, and so we plan to implement this as a new feature in a future release.
Critics of Anyscale's response have noted that repositories meant to streamline the deployment of Ray in cloud environments bind the dashboard to 0.0.0.0, an address used to designate all network interfaces, and set up port forwarding on the same address. One such beginner boilerplate is available on the Anyscale website itself. Critics also point to another publicly available example of a vulnerable setup.
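The difference between binding to 0.0.0.0 and binding to a loopback address can be checked mechanically. The sketch below uses Python's standard ipaddress module; the function name and the three labels are hypothetical and chosen only for illustration.

```python
import ipaddress


def classify_bind_address(address: str) -> str:
    """Describe how widely a service bound to `address` can be reached.

    Accepts IP literals only; a hostname such as "localhost" raises ValueError.
    """
    ip = ipaddress.ip_address(address)
    if ip.is_unspecified:
        # 0.0.0.0 (IPv4) or :: (IPv6): listen on every network interface.
        return "all interfaces"
    if ip.is_loopback:
        # 127.0.0.0/8 or ::1: reachable only from the same machine.
        return "loopback only"
    # Any other address: reachable through that one interface.
    return "single interface"


if __name__ == "__main__":
    for candidate in ("0.0.0.0", "127.0.0.1", "10.0.0.5"):
        print(candidate, "->", classify_bind_address(candidate))
```

A dashboard classified as "all interfaces" is not necessarily on the public Internet, since a firewall or security group may still block it, but it is the configuration in which a single missing network control exposes the cluster.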
Critics also note that Anyscale's contention that the reported behavior is not a vulnerability has kept many security tools from flagging the attacks.
An Anyscale representative said in an email that the company plans to publish a script that will allow users to easily check whether their Ray instances are exposed to the internet.
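What follows is not Anyscale's script, only a minimal sketch of what an exposure check of this kind might look like. It assumes the dashboard listens on port 8265 and serves a job listing at /api/jobs/ (both assumptions to verify for a given Ray version), and the function name is hypothetical. It should be run only against hosts you are authorized to test, ideally from a machine outside the trusted network.

```python
import http.client
import urllib.error
import urllib.request


def answers_without_credentials(host: str, port: int = 8265,
                                path: str = "/api/jobs/",
                                timeout: float = 3.0) -> bool:
    """Return True if a credential-free HTTP GET receives a 2xx response."""
    url = f"http://{host}:{port}{path}"
    # An empty ProxyHandler ignores proxy environment variables, so the
    # check talks to the target host directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out, unreachable, malformed reply, or rejected with
        # an HTTP error such as 401/403 from a proxy in front of the service.
        return False


if __name__ == "__main__":
    # Replace with the address of a cluster you operate.
    print(answers_without_credentials("127.0.0.1"))
```

A True result means only that something at that address answered an anonymous request; it does not prove the responder is Ray, and a False result from inside the network says nothing about reachability from outside it.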
The ongoing attacks underscore the importance of properly configuring Ray. In their published guidance, Oligo and Anyscale list practices that are essential to locking down clusters. Oligo also provided a list of indicators that Ray users can use to determine whether their instances have been compromised.