The Nautilus Ceph storage cluster can be accessed via the S3 protocol. It uses our own storage, which is free for our users and is not related to Amazon or any commercial cloud.

Figure: Ceph filesystems data use (credit: Ceph data usage)

S3 ceph grafana dashboard


You should request your credentials (key and secret) in Matrix chat. Go there and let admins know you'd like to access S3, and which pool works best for you.

S3 regions settings

Use the appropriate endpoint URL for your S3 client or library.

| Pool | Inside endpoint | Outside endpoint |
| --- | --- | --- |
| West pool (default) | http://rook-ceph-rgw-nautiluss3.rook | |
| Central pool | http://rook-ceph-rgw-centrals3.rook-central | |
| East pool | http://rook-ceph-rgw-easts3.rook-east | |

Note that the inside endpoint is http (without SSL) and the outside endpoint is https (with SSL). You can use the outside endpoint within the Kubernetes cluster, but traffic will then go through a load balancer. With the inside endpoint, multiple parallel requests from one or many machines can hit multiple separate OSDs and therefore achieve very high bandwidth when reading training sets.
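
The choice between inside and outside endpoints can be automated. The following is a minimal Python sketch, assuming that the `KUBERNETES_SERVICE_HOST` environment variable (which Kubernetes sets in pods) is a good enough signal for running inside the cluster. The helper names are hypothetical, and the outside endpoint has to be passed in by the caller.

```python
import os

# Inside endpoints copied from the regions table; outside endpoints are not
# hard-coded here and must be supplied by the caller.
INSIDE_ENDPOINTS = {
    "west": "http://rook-ceph-rgw-nautiluss3.rook",
    "central": "http://rook-ceph-rgw-centrals3.rook-central",
    "east": "http://rook-ceph-rgw-easts3.rook-east",
}


def running_in_cluster(env=None):
    """Heuristic (assumption): pods have KUBERNETES_SERVICE_HOST set."""
    env = os.environ if env is None else env
    return "KUBERNETES_SERVICE_HOST" in env


def pick_endpoint(pool="west", outside_endpoint=None, env=None):
    """Return the inside endpoint in-cluster, else the caller's outside one."""
    if pool not in INSIDE_ENDPOINTS:
        raise ValueError(f"unknown pool: {pool!r}")
    if running_in_cluster(env):
        return INSIDE_ENDPOINTS[pool]
    if outside_endpoint is None:
        raise ValueError("an outside endpoint is required outside the cluster")
    return outside_endpoint
```

The returned URL can then be handed to whichever S3 client you use.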

Using Rclone

The easiest way to access S3 is Rclone.

Use these options:

Storage: Amazon S3 Compliant Storage Providers

S3 provider: Ceph Object Storage

AWS Access Key ID, AWS Secret Access Key: ask in Matrix chat

Endpoint: use the regions section
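
The same options can also be written directly into an rclone configuration file instead of being entered interactively. The sketch below renders such a section with Python's `configparser`. The remote name `nautilus-s3` and the function name are illustrative, and the key names (`type`, `provider`, `access_key_id`, `secret_access_key`, `endpoint`) are the ones rclone uses for S3 remotes.

```python
import configparser
import io


def rclone_remote_config(name, access_key, secret_key, endpoint):
    """Render one rclone.conf section for a Ceph S3 remote as text."""
    # interpolation=None so a '%' in a secret is stored literally
    parser = configparser.ConfigParser(interpolation=None)
    parser[name] = {
        "type": "s3",
        "provider": "Ceph",
        "access_key_id": access_key,
        "secret_access_key": secret_key,
        "endpoint": endpoint,
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

Appending the returned text to the file reported by `rclone config file` should make the remote available as `nautilus-s3:`.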

Using s3cmd

S3cmd is an open-source tool for accessing S3.

To configure s3cmd, create the ~/.s3cfg file with the following contents if you're accessing from outside the cluster:

access_key = <your_key>
host_base =
host_bucket =
secret_key = <your_secret>
use_https = True

or this if accessing from inside:

access_key = <your_key>
host_base = http://rook-ceph-rgw-nautiluss3.rook
host_bucket = http://rook-ceph-rgw-nautiluss3.rook
secret_key = <your_secret>
use_https = False
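
The two variants differ only in the endpoint and in `use_https`, so the file can be generated from the endpoint alone. This is a rough Python sketch, assuming `host_base` and `host_bucket` take the endpoint in the same form as the inside example above. The function name is hypothetical.

```python
from urllib.parse import urlparse


def s3cfg_text(access_key, secret_key, endpoint):
    """Render ~/.s3cfg contents, deriving use_https from the URL scheme."""
    scheme = urlparse(endpoint).scheme
    if scheme not in ("http", "https"):
        raise ValueError(f"endpoint must start with http:// or https://: {endpoint!r}")
    use_https = scheme == "https"
    lines = [
        f"access_key = {access_key}",
        f"host_base = {endpoint}",
        f"host_bucket = {endpoint}",
        f"secret_key = {secret_key}",
        f"use_https = {use_https}",
    ]
    return "\n".join(lines) + "\n"
```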

Run s3cmd ls to see the available buckets.

Uploading files

Upload files with s3cmd put:

$ s3cmd put <FILE> s3://<BUCKET>/<DIR>

Or, to make the uploaded file public, use the -P flag:

$ s3cmd put -P <FILE> s3://<BUCKET>/<DIR>
Public URL of the object is:
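
Since the Ceph endpoints here are addressed path-style, the URL of a public object is conventionally the endpoint followed by the bucket and the object key. Here is a small Python sketch of that convention. The function name and the example host are hypothetical, and the exact URL that s3cmd prints may differ.

```python
from urllib.parse import quote


def path_style_url(endpoint, bucket, key):
    """Join endpoint, bucket and key into a path-style object URL."""
    base = endpoint.rstrip("/")
    # quote() keeps '/' by default, so nested keys stay readable
    return f"{base}/{quote(bucket)}/{quote(key.lstrip('/'))}"
```

For example, `path_style_url("https://s3.example.invalid", "my-bucket", "dir/hello world.txt")` percent-encodes the space in the key.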

Using AWS S3 tool


First add your credentials to ~/.aws/credentials.

If you are familiar with the AWS CLI, you can create an additional profile, preserving your existing AWS credentials, by adding it to ~/.aws/credentials:

[prp]
aws_access_key_id = <your_key>
aws_secret_access_key = <your_secret>

If you don't use AWS then you can just add credentials to [default] and skip the [profile] selection.
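
Adding the profile by hand is usually simplest. If you want to script it, the following Python sketch adds a profile while keeping existing sections. It assumes the credentials file parses as plain INI, and note that `configparser` drops comments when it rewrites the file. In the credentials file the section is named after the profile itself (`prp` here). The function name is hypothetical.

```python
import configparser
import io


def add_profile(credentials_text, profile, access_key, secret_key):
    """Return credentials-file text with one profile added or replaced."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(credentials_text)
    parser[profile] = {
        "aws_access_key_id": access_key,
        "aws_secret_access_key": secret_key,
    }
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```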

We recommend using awscli-plugin-endpoint to set the endpoint URL in .aws/config instead of typing the endpoint on the command line repeatedly. Install the plugin with:

pip install awscli-plugin-endpoint

The awscli-plugin-endpoint documentation describes a few additional steps needed to install this plugin. If you do not wish to add this plugin, add --endpoint-url to all commands below.

Your .aws/config file should look like:

[profile prp]
s3api =
    endpoint_url =

[plugins]
endpoint = awscli_plugin_endpoint


The AWS CLI (command line interface) has two modes of operation for S3: aws s3 is used for basic file manipulations (copy, list, delete, move, etc.), and aws s3api is used for creating/deleting buckets, manipulating permissions, etc.

You can specify the endpoint on the command line (example: aws --endpoint s3 ls s3://bucket-name/path) or via the s3 endpoint plugin (which is sometimes hard to install).
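
If you go without the plugin, a thin wrapper can add the endpoint to every call. A minimal Python sketch, assuming the `aws` executable is on your `PATH`; `aws_command` is a hypothetical helper that only builds the argument list and does not run anything:

```python
import shlex


def aws_command(args, endpoint_url, profile=None):
    """Build an aws CLI argument list with an explicit endpoint."""
    command = ["aws", "--endpoint-url", endpoint_url]
    if profile:
        command += ["--profile", profile]
    return command + list(args)


if __name__ == "__main__":
    cmd = aws_command(["s3", "ls", "s3://bucket-name/path"],
                      "http://rook-ceph-rgw-nautiluss3.rook", profile="prp")
    # Print the shell-quoted command; pass cmd to subprocess.run to execute it.
    print(shlex.join(cmd))
```

Passing a list to `subprocess.run(cmd, check=True)` avoids shell quoting problems with unusual bucket or key names.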

  1. Create a bucket:

    aws s3api create-bucket --bucket BUCKETNAME --profile prp 
  2. List your buckets:

    aws s3api list-buckets --profile prp
    aws s3 ls --profile prp
  3. Upload a file:

    aws s3 cp ~/hello.txt s3://BUCKETNAME/path --profile prp
  4. Upload a file and make it publicly accessible:

    aws s3 cp ~/hello.txt s3://BUCKETNAME/path --profile prp --acl public-read

    You can now access this file via a browser as

  5. Download a file:

    aws s3 cp s3://BUCKETNAME/path/hello.txt hello.txt --profile prp

Give multiple users full access to the bucket

  1. Give multiple users full access to the bucket. This does not extend permission to the objects in the bucket; follow the next step to allow shared access to them:

    aws s3api put-bucket-acl --profile prp --bucket BUCKETNAME --grant-full-control "id=<user1id>,id=<user2id>"

    NOTE: These IDs must be the usernames that the PRP sysadmin used when issuing your key and secret (the ID is not the S3 key or secret; it is the username associated with those keys). Also note that this operation is not additive: if you first grant id=user1 and later grant id=user2, user1 will no longer have access. Instead, call get-bucket-acl to get the current list of IDs and apply them all in one call, as shown in the example. If you have more users than fit on the command line, use get-bucket-acl to get the full JSON output, edit it, and then call put-bucket-acl with --access-control-policy file://acl.json instead of the --grant-full-control option shown above.

  2. Give multiple users full access to all objects in the bucket (replace BUCKETNAME and create file policy.json):

     aws s3api put-bucket-policy --bucket BUCKETNAME --policy file://policy.json

    Create policy.json with the following text:

    {
       "Statement": [
          {
             "Effect": "Allow",
             "Principal": "*",
             "Action": "*",
             "Resource": "arn:aws:s3:::BUCKETNAME/*"
          }
       ]
    }

    More detailed policy.json examples at:
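
Because put-bucket-acl replaces the grant list rather than extending it, merging the existing IDs with the new ones before each call helps avoid locking someone out. The following Python sketch does that. It assumes the usual get-bucket-acl JSON layout (a `Grants` list whose entries carry `Grantee.ID` and `Permission`), and it also builds the policy document from step 2. All function names are hypothetical.

```python
import json


def full_control_ids(acl):
    """Extract user IDs that hold FULL_CONTROL from get-bucket-acl output."""
    return [
        grant["Grantee"]["ID"]
        for grant in acl.get("Grants", [])
        if grant.get("Permission") == "FULL_CONTROL" and "ID" in grant.get("Grantee", {})
    ]


def grant_full_control_value(existing_ids, new_ids):
    """Merge IDs (order kept, duplicates dropped) into an 'id=a,id=b' string."""
    merged = list(dict.fromkeys([*existing_ids, *new_ids]))
    return ",".join(f"id={user_id}" for user_id in merged)


def bucket_policy(bucket):
    """Build the allow-all object policy from step 2 for one bucket."""
    return {
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": "*",
                "Action": "*",
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ]
    }


if __name__ == "__main__":
    # Writes nothing; just shows the JSON that would go into policy.json.
    print(json.dumps(bucket_policy("BUCKETNAME"), indent=3))
```

The string returned by `grant_full_control_value` is what goes after `--grant-full-control`.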

Using Cyberduck

Cyberduck is a free S3 client for Mac and Windows. It can be used to upload and download files to/from S3 buckets. To use Cyberduck with Ceph S3 endpoints you need to leverage "deprecated" path style requests. The simplest way to do this is to install the appropriate profile into Cyberduck referenced in the Cyberduck profiles documentation, S3 (Deprecated path style requests).cyberduckprofile.

Once you add the profile, you can connect to the S3 endpoint by entering the endpoint hostname in the "Server" field. If you enter it as a URL instead of a hostname, it will likely trigger the selection of a different and undesired connection profile. For example, to connect to the S3 endpoint for the PRP project's western region, you would enter in the "Server" field. You can then enter your access key and secret key in the "Access Key ID" and "Secret Access Key" fields, respectively.
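
If you only have the endpoint as a URL, the hostname for the "Server" field can be extracted mechanically. A tiny Python sketch using the standard library (the function name and example host are hypothetical):

```python
from urllib.parse import urlparse


def server_field(endpoint):
    """Return only the hostname part of an endpoint URL or host string."""
    # Prefix '//' when no scheme is present so urlparse sees a network location
    parsed = urlparse(endpoint if "://" in endpoint else f"//{endpoint}")
    if not parsed.hostname:
        raise ValueError(f"no hostname in {endpoint!r}")
    return parsed.hostname
```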

S3 Cookbook

S3 from tensorflow

import smart_open

with smart_open.open('s3://bucket/myfile.mat', 'rb') as f:
    # yield your samples from the f file in your tensorflow dataset as usual
    pass

Note that smart_open supports both local and S3 files, so the same code works when you test it on a local file and when you run it on the cluster with a file located on S3. See this TFRecord presentation for details.
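
To keep the reading logic testable without S3 access, the opener can be made a parameter. This is a sketch under that assumption: `iter_chunks` is a hypothetical helper that defaults to the built-in `open`, and on the cluster you would pass `opener=smart_open.open` instead. Configuring smart_open for a non-AWS endpoint is not shown here.

```python
import os
import tempfile


def iter_chunks(path, chunk_size=1 << 20, opener=open):
    """Yield fixed-size byte chunks from path using the given opener."""
    with opener(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


if __name__ == "__main__":
    # Local smoke test with a throwaway file
    fd, tmp_path = tempfile.mkstemp()
    os.write(fd, b"abcdefghij")
    os.close(fd)
    try:
        print(list(iter_chunks(tmp_path, chunk_size=4)))
    finally:
        os.remove(tmp_path)
```

A generator like this can be wrapped with `tf.data.Dataset.from_generator` once each chunk is decoded into samples.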

Setting up s3fs (posix mount)

To mount an S3 bucket as a filesystem, use s3fs-fuse. Also see the [FUSE docs](/userdocs/storage/fuse/).

Example mounting commands are as follows.

Access from outside the cluster:

s3fs bucket /mount/point -o passwd_file=${HOME}/.passwd-s3fs -o url= -o use_path_request_style -o umask=0007,uid=$UID

Access from inside the cluster:

s3fs bucket /mount/point -o passwd_file=${HOME}/.passwd-s3fs -o url=http://rook-ceph-rgw-nautiluss3.rook -o use_path_request_style -o umask=0007,uid=$UID

_Things to Note_ (2 and 3 are from the issue here: )

  1. `-o use_path_request_style` is required for non-Amazon S3-compatible storage.
  2. `-o umask=0007` sets the access permissions. By default, s3fs allows no access to any object, for POSIX compliance.
  3. `-o uid=$UID` sets the owner of the objects. The default is root.

**unmount**

sudo umount /mount/point

or, for an unprivileged user:

fusermount -u /mount/point

**fstab**

Add the following line to `/etc/fstab`.

Outside the cluster:

s3fs#mybucket /path/to/mountpoint fuse _netdev,allow_other,use_path_request_style,url=,passwd_file=/path/to/passwd-file,umask=0007,uid=1001 0 0

Inside the cluster:

s3fs#mybucket /path/to/mountpoint fuse _netdev,allow_other,use_path_request_style,url=http://rook-ceph-rgw-nautiluss3.rook,passwd_file=/path/to/passwd-file,umask=0007,uid=1001 0 0

You can find the current user ID through
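
The fstab entries in this section hard-code `uid=1001` and point at a password file. The following Python sketch writes that file and formats the fstab line with the calling user's ID from `os.getuid()`. It assumes the s3fs convention of a single `ACCESS_KEY:SECRET_KEY` line in a file with mode 600; both function names are hypothetical.

```python
import os
import tempfile


def write_passwd_file(path, access_key, secret_key):
    """Write 'ACCESS_KEY:SECRET_KEY' to path, readable by the owner only."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(f"{access_key}:{secret_key}\n")
    os.chmod(path, 0o600)  # enforce the mode even if the file already existed
    return path


def fstab_line(bucket, mountpoint, url, passwd_file, uid=None):
    """Format an /etc/fstab entry like the ones in this section."""
    uid = os.getuid() if uid is None else uid
    options = ",".join([
        "_netdev", "allow_other", "use_path_request_style",
        f"url={url}", f"passwd_file={passwd_file}", "umask=0007", f"uid={uid}",
    ])
    return f"s3fs#{bucket} {mountpoint} fuse {options} 0 0"


if __name__ == "__main__":
    # Demo in a throwaway directory; real use would target ~/.passwd-s3fs
    demo_dir = tempfile.mkdtemp()
    passwd = write_passwd_file(os.path.join(demo_dir, ".passwd-s3fs"), "KEY", "SECRET")
    print(fstab_line("mybucket", "/path/to/mountpoint",
                     "http://rook-ceph-rgw-nautiluss3.rook", passwd))
```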

Using S3 in GitLab CI

In your GitLab project go to `Settings`->`CI/CD`, open the `Variables` tab, and add the variables holding your S3 credentials: `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY`. Choose `protect variable` and `mask variable`.

Your `.gitlab-ci.yml` file can look like:

build:
  image: ubuntu
  before_script:
    - apt-get update && apt-get install -y curl unzip
    - curl | bash
  stage: build
  script:
    - rclone config create nautilus-s3 s3 endpoint provider Ceph access_key_id $ACCESS_KEY_ID secret_access_key $SECRET_ACCESS_KEY
    - rclone ls "nautilus-s3:"
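
A CI job fails more clearly if it checks for the credential variables before calling rclone. Below is a minimal Python sketch of that check. `rclone_config_args` is a hypothetical helper that reads `ACCESS_KEY_ID` and `SECRET_ACCESS_KEY` from the environment and returns the `rclone config create` argument list without running it. The endpoint is passed in by the caller.

```python
import os


def rclone_config_args(endpoint, env=None, remote="nautilus-s3"):
    """Build the `rclone config create` argument list from CI variables."""
    env = os.environ if env is None else env
    required = ("ACCESS_KEY_ID", "SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing CI variables: " + ", ".join(missing))
    return [
        "rclone", "config", "create", remote, "s3",
        "endpoint", endpoint,
        "provider", "Ceph",
        "access_key_id", env["ACCESS_KEY_ID"],
        "secret_access_key", env["SECRET_ACCESS_KEY"],
    ]
```

The list can be passed to `subprocess.run(args, check=True)`. Avoid printing it, since it contains the secret.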

Creating a new bucket in S3

Create a new bucket (change the profile to match what is in `~/.aws/credentials`, and the endpoint to the appropriate one; Ceph/S3/West is used in this example):

aws --endpoint s3api create-bucket --bucket my-bucket-name --profile prp