Special use

Our cluster combines various hardware resources from multiple universities and other organizations. By default you can only use the production nodes (see the resources page of the portal).

You may only use the tolerations described on this page. Any other toleration requires explicit permission from the cluster admins.

If your project is related to one of the people listed below, and you have specified that person as the PI in the namespace description, it will also be assigned to nodes tainted as Chase-CI (“chaseci”). This gives you more GPU nodes and shorter wait times.

Ken Kreutz-Delgado, Tajana Simunic Rosing, Amit K. Roy Chowdhury, Walid Najjar,
Nikil Dutt, Trevor Darrell, Lise Getoor, Anshul Kundaje,
Gary Cottrell, Frank Wuerthwein, Hao Su, Dinesh Bharadia,
YangQuan Chen, Jeff Krichmar, Charless Fowlkes, Padhraic Smyth,
James Demmel, Yisong Yue, Shawfeng Dong, Rajesh Gupta,
Todd Hylton, Falko Kuester, Jurgen Schulze, Arun Kumar,
Ron Dror, Ravi Ramamoorthi, John Sheppard, Nuno Vasconcelos,
Ramakrishna Akella, Manmohan Chandraker, Baris Aksanli, Dimitris Achlioptas,
Ilkay Altintas, Brad Smith, Christopher Paolini, Jerry Sheehan

In addition, there are GPU nodes attached to our cluster from two video rooms:

Room            Nodes   GPU type                  Used as general pool
UCSD Suncave    34      2 x 1080 or 2 x 1080Ti    no
UCMerced Wave   10      2 x 1080                  no

Because these nodes can be used for demos at any time, any job running on them can be cancelled without warning. If you’re fine with this, you can use them by adding the corresponding toleration to your job:

spec:
  tolerations:
  - key: "nautilus.io/suncave"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "nautilus.io/wave"
    operator: "Exists"
    effect: "NoSchedule"
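
For context, here is a minimal sketch of how such a toleration might sit inside a complete Pod manifest. The pod name, container image, command, and GPU request are hypothetical placeholders and not taken from this page; the nvidia.com/gpu resource name assumes the cluster exposes GPUs through the standard NVIDIA device plugin.

apiVersion: v1
kind: Pod
metadata:
  name: wave-gpu-example        # hypothetical name
spec:
  restartPolicy: Never
  containers:
  - name: main
    image: ubuntu:22.04         # placeholder image
    command: ["sleep", "3600"]  # placeholder workload
    resources:
      limits:
        nvidia.com/gpu: 1       # assumed GPU resource name
  tolerations:
  # Allows (but does not force) scheduling on the Wave nodes
  - key: "nautilus.io/wave"
    operator: "Exists"
    effect: "NoSchedule"

Note that a toleration only permits the scheduler to place the pod on the tainted nodes. It does not guarantee that the pod lands there.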

If you want to use a specific node, use the nodeSelector:

spec:
  nodeSelector:
    kubernetes.io/hostname: <node_name>

Or use nodeAffinity to make more complex bindings:

spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nautilus.io/group
            operator: In
            values:
            - haosu
          - key: env
            operator: In
            values:
            - production
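
If a hard requirement is too strict, Kubernetes also supports a soft preference. The following sketch reuses the nautilus.io/group label and haosu value from the example above purely for illustration; the weight of 50 is an arbitrary assumption. The scheduler favors matching nodes but will still place the pod elsewhere if none are available.

spec:
  affinity:
    nodeAffinity:
      # Soft rule: prefer matching nodes, fall back to others
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 50              # arbitrary weight, range 1-100
        preference:
          matchExpressions:
          - key: nautilus.io/group
            operator: In
            values:
            - haosu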

The cluster currently has other taints that reserve nodes for specific groups or other purposes. You cannot use the corresponding tolerations without prior approval from the cluster admins.

While the university is closed due to COVID-19, the Suncave is not used for demos, and the toleration has been removed.

Other tolerations

Some nodes in the cluster do not have access to the public Internet and can only reach the educational network. They can still pull images from Docker Hub through a proxy.

If your workload does not use public Internet resources, you can tolerate the nautilus.io/science-dmz taint to get access to additional nodes.
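
A sketch of what that toleration might look like, following the same pattern as the Suncave and Wave examples. The NoSchedule effect is an assumption carried over from those examples and is not confirmed for this taint.

spec:
  tolerations:
  - key: "nautilus.io/science-dmz"
    operator: "Exists"
    effect: "NoSchedule"   # assumed, by analogy with the other taints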