The Nautilus cluster uses the Ceph distributed storage system, orchestrated by Rook. It provides several ways to access your data.
Storage is now located in several geographic zones. Make sure you use compute nodes in the right zone to get optimal speed when accessing Ceph!
Ceph POSIX volumes
Persistent data in Kubernetes comes in the form of Persistent Volumes (PVs), which can only be seen by cluster admins. To request a PV, you have to create a PersistentVolumeClaim (PVC) of a supported StorageClass in your namespace, which will allocate storage for you.
Currently available StorageClasses:
| StorageClass | Filesystem Type | Zone | AccessModes | Restrictions | Storage Type | Size |
|---|---|---|---|---|---|---|
| rook-cephfs | CephFS | US West | ReadWriteMany | | Spinning drives with NVMe meta | 2.1 PB |
| rook-cephfs-east | CephFS | US East | ReadWriteMany | | Mixed | 500 TB |
| rook-cephfs-haosu | CephFS | US West | ReadWriteMany | Hao Su and Ravi cluster | NVMe | 131 TB |
| rook-cephfs-suncave | CephFS | US West | ReadWriteMany | UCSD Suncave data only | SSD | 8 TB |
| rook-ceph-block | RBD (default) | US West | ReadWriteOnce | | Spinning drives with NVMe meta | 2.1 PB |
| rook-cephfs-hawaii | CephFS | Hawaii+Asia (coming soon) | ReadWriteMany | | Spinning drives with NVMe meta | 192 TB |
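You can also check which StorageClasses are currently available directly from the cluster (the live list may differ from the table above as storage is added):

```
kubectl get storageclass
```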
Ceph shared filesystem (CephFS) is the primary way of storing data in Nautilus and allows mounting the same volume from multiple pods in parallel.

Ceph block storage allows RBD (RADOS Block Devices) to be attached to a single pod at a time. It provides the fastest access to the data and is preferred for smaller datasets (below 500 GB) and for all datasets not needing shared access from multiple pods.
Creating and mounting the PVC
Use kubectl to create the PVC:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: examplevol
spec:
  storageClassName: <required storage class>
  accessModes:
  - <access mode, e.g. ReadWriteOnce>
  resources:
    requests:
      storage: <volume size, e.g. 20Gi>
```
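For example, assuming you saved the manifest above as `pvc.yaml` (a filename chosen here for illustration) and your namespace is `your-namespace`:

```
kubectl create -f pvc.yaml -n your-namespace
```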
After you've created a PVC, you can check its status with `kubectl get pvc pvc_name`. Once its status is `Bound`, you can attach it to your pod (`claimName` should match the name you gave your PVC):
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: vol-pod
spec:
  containers:
  - name: vol-container
    image: ubuntu
    args: ["sleep", "36500000"]
    volumeMounts:
    - mountPath: /examplevol
      name: examplevol
  restartPolicy: Never
  volumes:
  - name: examplevol
    persistentVolumeClaim:
      claimName: examplevol
```
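Once the pod is running, you can verify the volume is mounted, for example by checking the mount point inside the pod (using the pod name and mount path from the example above):

```
kubectl exec -it vol-pod -- df -h /examplevol
```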
Using the right zone for your pod
Latency significantly affects I/O performance. If you want optimal access speed to Ceph, add a zone affinity to your pod for the correct zone:
```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: topology.kubernetes.io/zone
            operator: In
            values:
            - us-west
```
You can list the nodes' zone labels using:
```
kubectl get nodes -L topology.kubernetes.io/zone
```
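If you only need the zone of a single node, a jsonpath query works too (`node-name` below is a placeholder; note that the dots in the label key must be escaped):

```
kubectl get node node-name -o jsonpath='{.metadata.labels.topology\.kubernetes\.io/zone}'
```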
Mounting pre-assigned folders (deprecated)
If you have a CephFS FOLDER assigned with a secret CEPH_KEY, to use it you first need to create a secret in your NAMESPACE:
```
kubectl create secret -n NAMESPACE generic ceph-fs-secret --from-literal=key=CEPH_KEY
```
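To confirm the secret was created (keeping the `NAMESPACE` placeholder from above):

```
kubectl get secret ceph-fs-secret -n NAMESPACE
```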
Then use the secret in your pod volume (by default the folder name in path corresponds to your user name):
```yaml
volumes:
- name: fs-store
  flexVolume:
    driver: ceph.rook.io/rook
    fsType: ceph
    options:
      clusterNamespace: rook
      fsName: nautilusfs
      path: /FOLDER
      mountUser: USER
      mountSecret: ceph-fs-secret
```
Also add a volumeMounts section (see above) to mount the volume into your pod.
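Put together, the relevant part of the pod spec might look like this (a sketch; the `fs-store` volume name and `/FOLDER` placeholders are carried over from the snippet above, and the container name and mount path are chosen here for illustration):

```yaml
spec:
  containers:
  - name: vol-container
    image: ubuntu
    args: ["sleep", "36500000"]
    volumeMounts:
    # Mount the CephFS folder into the container at /FOLDER
    - mountPath: /FOLDER
      name: fs-store
  volumes:
  - name: fs-store
    flexVolume:
      driver: ceph.rook.io/rook
      fsType: ceph
      options:
        clusterNamespace: rook
        fsName: nautilusfs
        path: /FOLDER
        mountUser: USER
        mountSecret: ceph-fs-secret
```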