Scheduling

Explore the system

Let’s start by looking at what’s available in the system. You have already had a chance to see the list of all the nodes:

kubectl get nodes

Let’s now dig a little deeper. For example, you can see which nodes have which GPU type:

kubectl get nodes -L gpu-type

Now, pick one node, and see what other resources it has:

kubectl get node nodename -o yaml

If you picked a node with a GPU, look for the “gpu-type” in the output. Did it match?

Now pick another attribute from the output and use it instead of gpu-type in the query above. Did it return what you expected?
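If you want a label that is guaranteed to be present, the standard kubernetes.io/arch label (set by the kubelet on every node) is a safe choice to try first:

```shell
kubectl get nodes -L kubernetes.io/arch
```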

Validating requirements

In the next section we will play with adding requirements to Pod yamls. But first let’s make sure those resources are actually available.

As a simple example, let’s pick a specific GPU type:

kubectl get node -l 'gpu-type=1080'

Did you get any hits?

Here we look for nodes with 100Gbps NICs:

kubectl get node -l 'nautilus.io/network=100000'

Did you get any hits?

How about a negative selector? Let’s see what we get:

kubectl get node -l 'gpu-type!=1080, gpu-type!=1070' -L gpu-type

Requirements in pods

You have already used requirements in the pods. Here is the very first pod we made you launch:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
  - name: mypod
    image: centos:centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
      requests:
        memory: 100Mi
        cpu: 100m
    command: ["sh", "-c", "sleep infinity"]

But we set them to be really low, so it was virtually guaranteed that the Pod would start.

Now, let’s add one more requirement: a GPU. We also change the container image, so that the proper drivers are in place. Note: while you can ask for a fraction of a CPU, you cannot ask for a fraction of a GPU in our current setup. You should also keep the requests and limits the same.

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  containers:
  - name: mypod
    image: nvidia/cuda:10.1-runtime-centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Once the pod has started, log in using kubectl exec and check what kind of GPU you got:

kubectl exec -it test-gpupod -- nvidia-smi

Let’s now ask for a specific GPU type (remember to destroy the old pod):

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  nodeSelector:
    gpu-type: "1080Ti"
  containers:
  - name: mypod
    image: nvidia/cuda:10.1-runtime-centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Did the pod start?
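One way to answer that is to check which node, if any, the Pod was scheduled on (the -o wide output format adds a NODE column):

```shell
kubectl get pod test-gpupod -o wide
```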

Log into the Pod and check if you indeed got the desired GPU type.

Preferences in pods

Sometimes you would prefer something, but can live with less. In this example, let’s ask for the fastest GPUs in our pool, but not as a hard requirement:

apiVersion: v1
kind: Pod
metadata:
  name: test-gpupod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: gpu-type
            operator: In
            values:
            - "2080Ti"
            - "V100"
  containers:
  - name: mypod
    image: nvidia/cuda:10.0-runtime-centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Now check your Pod. How long did it take to start? What GPU type did you get?
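The Events section of kubectl describe shows when the Pod was scheduled and when its container started, and nvidia-smi inside the Pod reveals the GPU type, for example:

```shell
kubectl describe pod test-gpupod
kubectl exec -it test-gpupod -- nvidia-smi
```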

Using tolerations

There are several parts of the PRP Kubernetes pool that are off-limits to regular users. One of them is the Science-DMZ nodes, which do not have access to the regular internet.

Here is a Pod yaml that will try to run on one:

apiVersion: v1
kind: Pod
metadata:
  name: test-suncavepod
spec:
  nodeSelector:
    kubernetes.io/hostname: clu-fiona2.ucmerced.edu
  containers:
  - name: mypod
    image: nvidia/cuda:10.1-runtime-centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Go ahead, request a pod like that. Did the pod start?

It should not have. The node matching your selector carries a taint that your Pod does not tolerate, so the scheduler will not place the Pod there. Check for yourself:

kubectl get events --sort-by=.metadata.creationTimestamp

Now, look up the list of nodes that were supposed to match:

kubectl get nodes -l 'kubernetes.io/hostname=clu-fiona2.ucmerced.edu'

Look at the node details:

kubectl get nodes -l 'kubernetes.io/hostname=clu-fiona2.ucmerced.edu' -o yaml

Search for any taints. You should see something along the lines of:

...
spec:
  taints:
  - effect: NoSchedule
    key: nautilus.io/science-dmz
    value: "true"
...

You have been granted permission to run on those nodes, so let’s now add the toleration that will allow you to run there (remember to remove the old Pod):

apiVersion: v1
kind: Pod
metadata:
  name: test-scidmzpod
spec:
  nodeSelector:
    kubernetes.io/hostname: clu-fiona2.ucmerced.edu
  tolerations:
  - effect: NoSchedule
    key: nautilus.io/science-dmz
    operator: Exists
  containers:
  - name: mypod
    image: nvidia/cuda:10.1-runtime-centos7
    resources:
      limits:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
      requests:
        memory: 100Mi
        cpu: 100m
        nvidia.com/gpu: 1
    command: ["sh", "-c", "sleep infinity"]

Try to submit this one.

Did the Pod start?
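As before, you can verify the placement yourself; with the toleration in place, the NODE column should now show clu-fiona2.ucmerced.edu:

```shell
kubectl get pod test-scidmzpod -o wide
```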

The end

Please make sure you did not leave any pods behind.
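A quick way to check and clean up (the pod name below assumes you kept the names used in this exercise):

```shell
kubectl get pods
kubectl delete pod test-scidmzpod
```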