Nautilus documentation

Batch jobs

Running batch jobs

Kubernetes has built-in support for running batch jobs. A Job is a controller that watches your pod and makes sure it exits with status 0. If the pod fails for any reason, it is restarted up to backoffLimit times.

Since jobs in Nautilus are not limited in runtime, you can only run jobs with a meaningful command field. Running in manual mode (a sleep infinity command followed by a manual start of the computation) is prohibited.

Let's run a simple job and get its result.

Create a job.yaml file with the following contents and submit it with kubectl create -f job.yaml:

apiVersion: batch/v1
kind: Job
metadata:
  name: pi
spec:
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          limits:
            memory: 200Mi
            cpu: 1
          requests:
            memory: 50Mi
            cpu: 50m
      restartPolicy: Never
  backoffLimit: 4

Explore what's running:

kubectl get jobs
kubectl get pods

When the job is finished, your pod will stay in the Completed state, and the Job will show 1/1 in the COMPLETIONS field. For long jobs, the pods can go through Error, Evicted, and other states until they finish properly or backoffLimit is exhausted.

Our job did not use any storage and wrote its result to STDOUT, which can be seen in the pod logs:

kubectl logs pi-<hash>

By default, the pod and job remain available for inspection for ttlSecondsAfterFinished=604800 seconds (1 week). You can adjust this value in your job definition if desired.
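As a minimal sketch of adjusting that retention period, the job below is the same pi job with ttlSecondsAfterFinished set explicitly in the Job spec. The job name pi-short-ttl and the value 3600 (1 hour) are illustrative assumptions, not values recommended by this guide.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: pi-short-ttl            # hypothetical name, chosen to avoid clashing with the pi job
spec:
  ttlSecondsAfterFinished: 3600 # illustrative: delete the job and its pods 1 hour after it finishes
  backoffLimit: 4
  template:
    spec:
      containers:
      - name: pi
        image: perl
        command: ["perl", "-Mbignum=bpi", "-wle", "print bpi(2000)"]
        resources:
          limits:
            memory: 200Mi
            cpu: 1
          requests:
            memory: 50Mi
            cpu: 50m
      restartPolicy: Never
```

The field sits at the top level of spec, next to backoffLimit, not inside the pod template. Make sure you have collected the logs before the period expires, since the pod is removed together with the job.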

You can use the more advanced example when ready.

The end

Please make sure you do not leave any pods or jobs behind. Deleting the job with kubectl delete job pi also removes its pods.