Our cluster combines various hardware resources from multiple universities and other organizations. By default, you can only use the production nodes (see the resources page of the portal).
You may only add tolerations for the nodes mentioned on this page. All other tolerations require explicit permission from the cluster admins.
If your project is related to one of the people listed below, which means you have specified that person as the PI in the namespace description, it will also be assigned to nodes tainted as Chase-CI (“chaseci”). This will give you more GPU nodes and shorter wait times.
|Ken Kreutz-Delgado||Tajana Simunic Rosing||Amit K. Roy Chowdhury||Walid Najjar|
|Nikil Dutt||Trevor Darrell||Lise Getoor||Anshul Kundaje|
|Gary Cottrell||Frank Wuerthwein||Hao Su||Dinesh Bharadia|
|YangQuan Chen||Jeff Krichmar||Charless Fowlkes||Padhraic Smyth|
|James Demmel||Yisong Yue||Shawfeng Dong||Rajesh Gupta|
|Todd Hylton||Falko Kuester||Jurgen Schulze||Arun Kumar|
|Ron Dror||Ravi Ramamoorthi||John Sheppard||Nuno Vasconcelos|
|Ramakrishna Akella||Manmohan Chandraker||Baris Aksanli||Dimitris Achlioptas|
|Ilkay Altintas||Brad Smith||Christopher Paolini||Jerry Sheehan|
In addition, there are GPU nodes attached to our cluster from two video rooms:
|room||nodes||GPU type||used as general pool|
|UCSD Suncave||34||2 x 1080 or 2 x 1080Ti||no|
|UCMerced Wave||10||2 x 1080||no|
Because these nodes can be used for demos at any time, any job running on them can be cancelled without warning. If you’re fine with this, you can use them by adding the corresponding toleration to your job:
spec:
  tolerations:
  - key: "nautilus.io/suncave"
    operator: "Exists"
    effect: "NoSchedule"
  - key: "nautilus.io/wave"
    operator: "Exists"
    effect: "NoSchedule"
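Because work on these nodes can be interrupted, it may help to wrap the workload in a Kubernetes Job, so that the Job controller can start a replacement pod if the original one is removed. The following is only a minimal sketch: the Job name, container name, image, command, and `backoffLimit` value are hypothetical placeholders, not settings prescribed by the cluster. Note that in a Job the tolerations go under `spec.template.spec`, not directly under `spec`.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: suncave-example            # hypothetical name
spec:
  backoffLimit: 4                  # assumed retry budget; tune for your workload
  template:
    spec:
      restartPolicy: Never
      tolerations:
      # Allows scheduling onto the Suncave nodes; the pod can still
      # land on regular nodes, and can be cancelled without warning.
      - key: "nautilus.io/suncave"
        operator: "Exists"
        effect: "NoSchedule"
      containers:
      - name: worker               # hypothetical container
        image: busybox:1.36        # placeholder image
        command: ["sh", "-c", "echo started; sleep 60"]
```

A toleration only permits scheduling on the tainted nodes; it does not force the pod onto them.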
If you want to use a specific node, use the nodeSelector:
spec:
  nodeSelector:
    kubernetes.io/hostname: <node_name>
Or use nodeAffinity to make more complex bindings:
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: nautilus.io/group
            operator: In
            values:
            - haosu
          - key: env
            operator: In
            values:
            - production
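As a further illustration of what nodeAffinity allows beyond nodeSelector, the sketch below restricts a pod to any one of several specific nodes. The host names `<node_a>` and `<node_b>` are placeholders to be replaced with real node names. In Kubernetes, all expressions inside one `matchExpressions` list must match (logical AND), while separate entries under `nodeSelectorTerms` are alternatives (logical OR).

```yaml
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        # A single term: the pod may run on either of the listed hosts.
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - <node_a>             # placeholder node name
            - <node_b>             # placeholder node name
```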
We currently have other tolerations that reserve nodes for specific groups or other purposes. You cannot use them without prior approval from the cluster admins.
While the university is closed for COVID-19, the Suncave is not used for demos, and the toleration is removed.
Some nodes in the cluster don’t have access to the public Internet and can only access the educational network. They can still pull images from Docker Hub using a proxy.
If your workload does not use public Internet resources, you can tolerate nautilus.io/science-dmz to get access to additional nodes.
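A toleration for these nodes might look like the fragment below. Only the key `nautilus.io/science-dmz` comes from this page; the `operator` and `effect` values are assumptions that mirror the Suncave and Wave tolerations shown earlier, so check them against the actual node taints before relying on them.

```yaml
spec:
  tolerations:
  - key: "nautilus.io/science-dmz"
    operator: "Exists"             # assumption: same pattern as the tolerations above
    effect: "NoSchedule"           # assumption: verify against the node's taint
```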