Ceph S3

Table of Contents:

1 Credentials
2 Access
3 Using AWS CLI
4 Using s3cmd
5 S3 API References
6 Rclone for access from outside
7 Setting up s3fs (posix mount)

Accessing Ceph via S3

The Ceph storage cluster can be accessed via the S3 protocol. This uses our own Ceph storage, which is still free for our users, and is not related to Amazon.

1 Credentials

Request credentials (key and secret) and add them to ~/.aws/credentials. Soon we will have an auto-registration service in the portal. If you already use the AWS CLI, you can preserve your existing AWS credentials by adding the new ones as an additional profile in ~/.aws/credentials:

[default]
aws_access_key_id=xxxx
aws_secret_access_key=yyyy

[prp]
aws_access_key_id=iiiiii
aws_secret_access_key=jjjjj

If you don’t use AWS, you can simply add the credentials under [default] and skip the --profile option.
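If you keep the credentials under a named profile, you can also avoid repeating --profile on every command by exporting the standard AWS_PROFILE environment variable (the profile name prp matches the example above):

```shell
# Make the prp profile the default for this shell session;
# every subsequent aws command will use it automatically.
export AWS_PROFILE=prp
```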

2 Access

Note that the inside endpoint is http (without SSL) and the outside endpoint is https (with SSL). You can use the outside endpoint from within the Kubernetes cluster, but requests will go through a load balancer. By using the inside endpoint, multiple parallel requests from one or many machines can hit multiple separate OSDs and therefore achieve very large training-set bandwidth. See this TFRecord presentation for details.

We recommend using awscli-plugin-endpoint to store the endpoint URL in ~/.aws/config instead of typing it on the command line repeatedly. Your ~/.aws/config file should look like:

[profile prp]
s3api =
    endpoint_url = https://s3.nautilus.optiputer.net

[plugins]
endpoint = awscli_plugin_endpoint

3 Using AWS CLI

Since aws s3 doesn’t support regionless S3 buckets, use aws s3api instead.

  1. Create a bucket:
    aws s3api create-bucket --bucket my-bucket-name --profile prp 
    
  2. List your buckets:
    aws s3api list-buckets --profile prp 
    
  3. Upload a file:
    aws s3api put-object --bucket my-bucket-name --key hello.txt --body ~/hello.txt --profile prp
    
  4. Upload a file and make it publicly accessible:
    aws s3api put-object --bucket my-bucket-name --key hello.txt --body ~/hello.txt --profile prp --acl public-read
    

    You can now access this file via a browser at https://s3.nautilus.optiputer.net/my-bucket-name/hello.txt

  5. Download a file:
    aws s3api get-object --bucket my-bucket-name --key hello.txt hello.txt --profile prp
    
  6. Give multiple users full access to the bucket (this does not extend permission to objects in the bucket):
    aws s3api put-bucket-acl --profile prp --bucket my-bucket-name --grant-full-control id=<user1id>,id=<user2id>
    

    NOTE: These IDs must be the names the PRP sys admin used when providing you your key and secret. Also note that this operation is not additive: if you first grant id=user1 and later grant only id=user2, user1 will no longer have access. Instead, call get-bucket-acl to get the current list of IDs and include them along with the new ID.

  7. Give multiple users full access to all objects in the bucket (replace BUCKETNAME and create file policy.json):

    aws s3api put-bucket-policy --bucket BUCKETNAME --policy file://policy.json
       
    # Create file policy.json with the following text:
    {
       "Statement": [
          {
             "Effect": "Allow",
             "Principal": "*",
             "Action": [
                "s3:GetObject",
                "s3:DeleteObject",
                "s3:PutObject"
             ],
             "Resource": "arn:aws:s3:::BUCKETNAME/*"
          }
       ]
    }
    

    More detailed policy.json examples at: https://docs.aws.amazon.com/cli/latest/reference/s3api/put-bucket-policy.html

  8. Use awscli_plugin_endpoint
    Note that you can skip typing the endpoint every time by installing awscli-plugin-endpoint (assuming your AWS CLI is set up correctly, with the prp profile):

    pip install awscli-plugin-endpoint
    aws configure set plugins.endpoint awscli_plugin_endpoint
    aws configure --profile prp set s3.endpoint_url https://s3.nautilus.optiputer.net
    

    If you want, you can define a shell function to further reduce typing. Add the following to your .bashrc:

    s3prp() {
        aws s3 --profile prp "$@"
    }
    
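The policy document from step 7 can be written and sanity-checked locally before uploading it with put-bucket-policy. A sketch, using a temporary directory and python3 -m json.tool purely as a JSON validator (the bucket name is a placeholder):

```shell
# Write the bucket policy from step 7 for a specific bucket,
# then verify it is valid JSON before uploading it.
BUCKET=my-bucket-name
POLICY=$(mktemp -d)/policy.json

cat > "$POLICY" <<EOF
{
   "Statement": [
      {
         "Effect": "Allow",
         "Principal": "*",
         "Action": [
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:PutObject"
         ],
         "Resource": "arn:aws:s3:::${BUCKET}/*"
      }
   ]
}
EOF

# Fails with a parse error if the file is not valid JSON.
python3 -m json.tool "$POLICY" > /dev/null && echo "policy OK"

# Upload with:
#   aws s3api put-bucket-policy --profile prp --bucket "$BUCKET" --policy "file://$POLICY"
```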
4 Using s3cmd

S3cmd is an open-source tool for accessing S3.

To configure it, create the ~/.s3cfg file with the following contents if you’re accessing from outside the cluster:

[default]
access_key = <your_key>
host_base = s3.nautilus.optiputer.net
host_bucket = s3.nautilus.optiputer.net
secret_key = <your_secret>
use_https = True

or this if accessing from inside:

[default]
access_key = <your_key>
host_base = rook-ceph-rgw-nautiluss3.rook
host_bucket = rook-ceph-rgw-nautiluss3.rook
secret_key = <your_secret>
use_https = False

Run s3cmd ls to see the available buckets.

5 S3 API References
6 Rclone for access from outside

The AWS CLI has a ‘sync’ command that works much like rsync. For more sophisticated, large-scale syncing, install Rclone.

$ rclone config
Current remotes:

Name                 Type
====                 ====

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> n
name> nautilus_s3
Type of storage to configure.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Alias for a existing remote
   \ "alias"
 2 / Amazon Drive
   \ "amazon cloud drive"
 3 / Amazon S3 Compliant Storage Providers (AWS, Ceph, Dreamhost, IBM COS, Minio)
   \ "s3"
 4 / Backblaze B2
   \ "b2"
 5 / Box
   \ "box"
 6 / Cache a remote
   \ "cache"
 7 / Dropbox
   \ "dropbox"
 8 / Encrypt/Decrypt a remote
   \ "crypt"
 9 / FTP Connection
   \ "ftp"
10 / Google Cloud Storage (this is not Google Drive)
   \ "google cloud storage"
11 / Google Drive
   \ "drive"
12 / Hubic
   \ "hubic"
13 / JottaCloud
   \ "jottacloud"
14 / Local Disk
   \ "local"
15 / Mega
   \ "mega"
16 / Microsoft Azure Blob Storage
   \ "azureblob"
17 / Microsoft OneDrive
   \ "onedrive"
18 / OpenDrive
   \ "opendrive"
19 / Openstack Swift (Rackspace Cloud Files, Memset Memstore, OVH)
   \ "swift"
20 / Pcloud
   \ "pcloud"
21 / QingCloud Object Storage
   \ "qingstor"
22 / SSH/SFTP Connection
   \ "sftp"
23 / Webdav
   \ "webdav"
24 / Yandex Disk
   \ "yandex"
25 / http Connection
   \ "http"
Storage> 3
Choose your S3 provider.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Amazon Web Services (AWS) S3
   \ "AWS"
 2 / Ceph Object Storage
   \ "Ceph"
 3 / Digital Ocean Spaces
   \ "DigitalOcean"
 4 / Dreamhost DreamObjects
   \ "Dreamhost"
 5 / IBM COS S3
   \ "IBMCOS"
 6 / Minio Object Storage
   \ "Minio"
 7 / Wasabi Object Storage
   \ "Wasabi"
 8 / Any other S3 compatible provider
   \ "Other"
provider> 2
Get AWS credentials from runtime (environment variables or EC2/ECS meta data if no env vars).
Only applies if access_key_id and secret_access_key is blank.
Enter a boolean value (true or false). Press Enter for the default ("false").
Choose a number from below, or type in your own value
 1 / Enter AWS credentials in the next step
   \ "false"
 2 / Get AWS credentials from the environment (env vars or IAM)
   \ "true"
env_auth>
AWS Access Key ID.
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
access_key_id> YOUR-ID-HERE
AWS Secret Access Key (password)
Leave blank for anonymous access or runtime credentials.
Enter a string value. Press Enter for the default ("").
secret_access_key> YOUR-KEY-HERE
Region to connect to.
Leave blank if you are using an S3 clone and you don't have a region.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Use this if unsure. Will use v4 signatures and an empty region.
   \ ""
 2 / Use this only if v4 signatures don't work, eg pre Jewel/v10 CEPH.
   \ "other-v2-signature"
region>
Endpoint for S3 API.
Required when using an S3 clone.
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
endpoint> https://s3.nautilus.optiputer.net
Location constraint - must be set to match the Region.
Leave blank if not sure. Used when creating buckets only.
Enter a string value. Press Enter for the default ("").
location_constraint>
Canned ACL used when creating buckets and/or storing objects in S3.
For more info visit https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl
Enter a string value. Press Enter for the default ("").
Choose a number from below, or type in your own value
 1 / Owner gets FULL_CONTROL. No one else has access rights (default).
   \ "private"
 2 / Owner gets FULL_CONTROL. The AllUsers group gets READ access.
   \ "public-read"
   / Owner gets FULL_CONTROL. The AllUsers group gets READ and WRITE access.
 3 | Granting this on a bucket is generally not recommended.
   \ "public-read-write"
 4 / Owner gets FULL_CONTROL. The AuthenticatedUsers group gets READ access.
   \ "authenticated-read"
   / Object owner gets FULL_CONTROL. Bucket owner gets READ access.
 5 | If you specify this canned ACL when creating a bucket, Amazon S3 ignores it.
   \ "bucket-owner-read"
   / Both the object owner and the bucket owner get FULL_CONTROL over the object.
 6 | If you specify this canned ACL when creating a bucket, Amazon S3 ignores it.
   \ "bucket-owner-full-control"
acl>
Edit advanced config? (y/n)
y) Yes
n) No
y/n> n
Remote config
--------------------
[nautilus_s3]
type = s3
provider = Ceph
access_key_id = YOUR-ID-HERE  
secret_access_key = YOUR-KEY-HERE  
endpoint = https://s3.nautilus.optiputer.net
--------------------
y) Yes this is OK
e) Edit this remote
d) Delete this remote
y/e/d> y
Current remotes:

Name                 Type
====                 ====
nautilus_s3          s3

e) Edit existing remote
n) New remote
d) Delete remote
r) Rename remote
c) Copy remote
s) Set configuration password
q) Quit config
e/n/d/r/c/s/q> q


7 Setting up s3fs (posix mount)

To mount an S3 bucket as a filesystem, use s3fs-fuse.
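s3fs reads credentials from a password file containing a single ACCESS_KEY:SECRET_KEY line, and it rejects the file if other users can read it. A setup sketch (the key and secret are placeholders, and the demo writes to a temporary directory rather than the real ~/.passwd-s3fs):

```shell
# Create an s3fs password file: one line, ACCESS_KEY:SECRET_KEY.
# s3fs refuses the file unless its permissions are restricted (e.g. 600).
PASSWD_FILE=$(mktemp -d)/.passwd-s3fs
echo "YOUR-ACCESS-KEY:YOUR-SECRET-KEY" > "$PASSWD_FILE"
chmod 600 "$PASSWD_FILE"
# Then pass it to s3fs with: -o passwd_file="$PASSWD_FILE"
```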

Example mount commands:

access from outside the cluster

s3fs bucket /mount/point -o passwd_file=${HOME}/.passwd-s3fs -o url=https://s3.nautilus.optiputer.net -o use_path_request_style -o umask=0007,uid=$UID

access inside the cluster

s3fs bucket /mount/point -o passwd_file=${HOME}/.passwd-s3fs -o url=http://rook-ceph-rgw-nautiluss3.rook -o use_path_request_style -o umask=0007,uid=$UID

Things to Note

(2 and 3 are from this issue: https://github.com/s3fs-fuse/s3fs-fuse/issues/673)

  1. -o use_path_request_style is required for non-Amazon, S3-compatible storage.

  2. -o umask=0007 sets the access permissions. For POSIX compliance, s3fs defaults to granting no access to any object.

  3. -o uid=$UID sets the owner of the objects. The default is root.

unmount

sudo umount /mount/point

or, as an unprivileged user:

fusermount -u /mount/point

fstab

Add the following line to /etc/fstab:

outside the cluster

s3fs#mybucket /path/to/mountpoint fuse _netdev,allow_other,use_path_request_style,url=https://s3.nautilus.optiputer.net,passwd_file=/path/to/passwd-file,umask=0007,uid=1001 0 0

inside the cluster

s3fs#mybucket /path/to/mountpoint fuse _netdev,allow_other,use_path_request_style,url=http://rook-ceph-rgw-nautiluss3.rook,passwd_file=/path/to/passwd-file,umask=0007,uid=1001 0 0

You can find your current user ID by running:

id
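For the uid= value in the fstab lines above, only the numeric ID is needed; id -u prints it by itself:

```shell
# Print just the numeric user ID (the value to use for uid= in /etc/fstab).
id -u
```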
