Department of Computer Science

High Throughput GPU Cluster 3 (HTGC3)

General Information

The CS High Throughput GPU Cluster 3 (HTGC3) is an HTCondor cluster that focuses on GPU-related computational applications such as TensorFlow, OpenCV, and MATLAB. HTCondor is a job submission and queuing system that provides process-level parallelization for computationally intensive tasks. All CS staff and students with a valid CSLab UNIX account are eligible to use it.

The Cluster

Currently, the HTGC3 is composed of one job submission node and 10 job execution nodes as shown below:

Nodes                             two dual-GPU nodes, one quad-GPU node   7 single-GPU nodes
GPU                               2+2+4 x Nvidia V100                     7 x Nvidia A100
Maximum memory size per process   256GB or 512GB                          128GB
O/S                               Ubuntu 18.04                            Ubuntu 18.04
GPU Memory                        32 GB                                   32 GB
CUDA Runtime Version              10.1                                    10.1
CUDA Driver Version               11.0                                    11.0
CUDA Capability                   7.0                                     7.0
CUDA Device Name                  Tesla V100-SXM2-32GB                    Tesla V100S-PCIE-32GB

The HTGC3 can be accessed with any Secure Shell (SSH) client connecting to (within the CS departmental network)

Please do not run jobs on the submission node. Jobs running for longer than an hour will be killed without prior notice.

To compile and test code, please log on

User Data

Besides users' home directories, all nodes in the HTGC3 mount 22TB of shared NFS storage on the path '/public', which is shared with other clusters and the gateway server. Users can make their own folder there. Each user account has a default quota of 200GB of disk space in '/public'. There is no backup for files in '/public', and all files not accessed for 30 days will be removed.
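Because files in '/public' are removed after 30 days without access, it can help to list files that are getting close to that limit. The following is a minimal sketch, not an official tool: the function name list_stale_files, the 25-day default, and the folder '/public/$USER' are assumptions for illustration, and the cluster's own clean-up may measure access time differently from find.

```shell
#!/bin/bash
# list_stale_files DIR [DAYS]
# Print regular files under DIR whose last access time is more than DAYS
# days old (find rounds file ages down to whole days). DAYS defaults to 25,
# an arbitrary early-warning threshold below the 30-day removal limit.
list_stale_files() {
    local dir="$1"
    local days="${2:-25}"
    find "$dir" -type f -atime +"$days" -print
}

# Usage sketch (the folder name is hypothetical):
#   list_stale_files "/public/$USER"
#   list_stale_files "/public/$USER" 20
```

Reading a listed file, or copying it elsewhere, is left to the user; the sketch only reports candidates.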

Job submission script

To submit jobs to the HTGC3, a submission script is needed. Below is a simple example. Create a file called


and put the following lines in it:

# normally a shell script
executable =
# optional parameter
requirements = (CUDADeviceName == "Tesla V100-SXM2-32GB")
# optional parameter to request up to 4 GPUs for a job
request_GPUs = 2

error      = myproc.err
log        = myproc.log

# command line arguments
arguments  = arg1 ...
# optional file for stdin
input      =
# optional file for stdout
output     = arg1.out
# submit a single job
queue

# submit another job in the same script
executable = myproc2
# Process ID as argument
arguments  = $(Process) ...
# optional file that depends on Process ID
input      = $(Process).in
output     = $(Process).out
# submit 4 jobs with Process ID 0..3
queue 4


where ‘’ is a normal shell script that can be run in a normal SSH terminal session.
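As an illustration of what such a shell script might look like, here is a minimal sketch of a wrapper that reports where it runs before starting the real work. The file name run_job.sh and the function count_visible_gpus are hypothetical. The sketch also assumes that HTCondor exports CUDA_VISIBLE_DEVICES for jobs that were assigned GPUs, which is its usual behaviour but has not been confirmed for the HTGC3.

```shell
#!/bin/bash
# run_job.sh (hypothetical name): a wrapper that could serve as the executable.

# Print how many GPU IDs are listed in CUDA_VISIBLE_DEVICES (e.g. "0,1" -> 2).
# Prints 0 when the variable is unset or empty.
count_visible_gpus() {
    local devices="${CUDA_VISIBLE_DEVICES:-}"
    local ids=()
    IFS=',' read -r -a ids <<< "$devices"
    echo "${#ids[@]}"
}

main() {
    echo "host: $(hostname)"
    echo "arguments: $*"
    echo "visible GPUs: $(count_visible_gpus)"
    # The real workload would be started here, for example a Python program.
}

main "$@"
```

Running the same script by hand in a terminal session is a quick way to check it before it is submitted.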
To submit jobs, simply use the condor_submit command like

# condor_submit sample.condor
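The second job block in the sample script reads one input file per process ID, so "queue 4" expects 0.in to 3.in. A small check before submitting can catch missing inputs early. This is a sketch only; the function name missing_inputs is hypothetical and the check is not part of HTCondor.

```shell
#!/bin/bash
# missing_inputs N [DIR]
# Print the names of the expected input files 0.in .. (N-1).in that do not
# exist in DIR (default: current directory). Prints nothing if all exist.
missing_inputs() {
    local n="$1"
    local dir="${2:-.}"
    local p
    for ((p = 0; p < n; p++)); do
        [ -f "$dir/$p.in" ] || echo "$p.in"
    done
}

# Usage sketch:
#   missing_inputs 4          # run in the submit directory before condor_submit
```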

No matter how many jobs are submitted, each user can have at most 6 jobs executed at the same time.

Jobs that do not specify "request_GPUs" will run on single-GPU and dual-GPU nodes. Jobs that run on dual-GPU nodes count double toward resource usage. To restrict jobs to single-GPU nodes, please specify "request_GPUs=1" in the condor submit file.
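To make the single-GPU case concrete, the sketch below prints a minimal submit description with "request_GPUs = 1". The function name write_single_gpu_submit, the executable ./run_job.sh, and the file names myproc.out and single_gpu.condor are hypothetical; only the submit commands themselves come from the sample script.

```shell
#!/bin/bash
# write_single_gpu_submit EXECUTABLE
# Print a minimal submit description that requests exactly one GPU.
write_single_gpu_submit() {
    local exe="$1"
    cat <<EOF
executable   = $exe
request_GPUs = 1
error        = myproc.err
log          = myproc.log
output       = myproc.out
queue
EOF
}

# Usage sketch (file names are hypothetical):
#   write_single_gpu_submit ./run_job.sh > single_gpu.condor
#   condor_submit single_gpu.condor
```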

Sample condor demo files can be found at /public/condor_demo

Frequently used HTCondor commands

Job submission:   /usr/bin/condor_submit
Job enquiry:      /usr/bin/condor_q
Job removal:      /usr/bin/condor_rm {Job ID}
HTCondor Status:  /usr/bin/condor_status
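The commands above are often used together: submit, note the job (cluster) ID, then query or remove by that ID. The sketch below extracts the ID from the text condor_submit prints. It assumes the usual message "N job(s) submitted to cluster ID."; the function name cluster_id_from_submit is hypothetical.

```shell
#!/bin/bash
# cluster_id_from_submit
# Read condor_submit output on stdin and print the cluster ID found in a
# line such as "1 job(s) submitted to cluster 42.". Prints nothing if no
# such line is present.
cluster_id_from_submit() {
    sed -n 's/.*submitted to cluster \([0-9][0-9]*\)\..*/\1/p'
}

# Usage sketch (not run here):
#   cluster=$(condor_submit sample.condor | cluster_id_from_submit)
#   condor_q "$cluster"     # show only the jobs of that cluster
#   condor_rm "$cluster"    # remove all jobs of that cluster
```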

For detailed HTCondor references, please refer to the link

For any queries, please send email to support[at]