Department of Computer Science

High Throughput GPU Cluster (HTGC)

General Information

The CS High Throughput GPU Cluster (HTGC) is an HTCondor cluster that focuses on GPU-related computational applications such as TensorFlow, OpenCV, and MATLAB. HTCondor is a job submission and queuing system that provides process-level parallelization for computationally intensive tasks. All CS staff and students with a valid CSLab UNIX account are eligible to use it.

The Cluster

Currently, the HTGC is composed of one job submission node and 7 job execution slots as shown below:

GPU:                              7 x Nvidia V100
Maximum memory size per process:  64 GB
O/S:                              Ubuntu 16.04
GPU Memory:                       16 GB
CUDA Runtime Version:             9.0
CUDA Driver Version:              9.2
CUDA Capability:                  7.0
CUDA Device Name:                 Tesla V100-SXM2-16GB

The HTGC can be accessed from within the CS departmental network using any Secure Shell (SSH) client connecting to

Please do not run jobs on the submission node. Jobs running for longer than an hour will be killed without prior notice.

To compile and test code, please log on to

User Data

Besides users' home directories, all nodes in the HTGC mount a 12TB shared NFS storage at the path '/public'. Users can create their own folders there. Each user account has a default quota of 200GB of disk space in '/public'. Files in '/public' are not backed up, and any file not accessed for 30 days will be removed.
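Because of the 200GB quota and the 30-day removal rule, it can help to check a folder under '/public' from time to time. The following is a minimal sketch, assuming GNU du and find are available (as on Ubuntu). The function names public_usage and stale_files, and the per-user folder '/public/$USER' in the usage lines, are illustrative assumptions and not part of the cluster setup.

```shell
#!/bin/sh
# Hypothetical helpers; the directory is passed as a parameter so that
# nothing here depends on a particular folder layout under /public.

# Print the total disk usage of a directory in human-readable form.
public_usage() {
    du -sh "$1"
}

# List regular files whose last access is more than N days old.
# N defaults to 25 to leave a margin before the 30-day limit.
stale_files() {
    find "$1" -type f -atime +"${2:-25}"
}
```

After sourcing these functions in a shell, they could be used like

# public_usage /public/$USER
# stale_files /public/$USER 25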

Job submission script

To submit jobs to the HTGC, a submission script is needed. Below is a simple example. Create a file called

and put the following lines in it:

# Comments are placed on their own lines: HTCondor only ignores lines that begin with '#'.
# normally a shell script
executable =
# optional parameter
requirements = (CUDADeviceName == "Tesla V100-SXM2-16GB")
error      = myproc.err
log        = myproc.log

# command line arguments
arguments  = arg1 ...
# optional file for stdin
input      =
# optional file for stdout
output     = arg1.out
# submit a single job
queue

# submit another job in the same script
executable = myproc2
# Process ID as argument
arguments  = $(Process) ...
# optional file that depends on the Process ID
input      = $(Process).in
output     = $(Process).out
# submit 4 jobs with Process ID 0..3
queue 4


where the executable is a normal shell script that can be run in a normal SSH terminal session.
To submit jobs, simply use the condor_submit command as follows:

# condor_submit sample.condor

No matter how many jobs are submitted, each user can have at most 5 jobs executed at the same time.

Sample condor demo files can be found at /public/condor_demo
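
The second job block in the example script reads its stdin from files named after the Process ID, so with 'queue 4' the files 0.in to 3.in should exist before the jobs are submitted. The following is a minimal sketch for preparing them. The function name make_inputs and the file contents are placeholders chosen for illustration.

```shell
#!/bin/sh
# Hypothetical helper: create N input files named 0.in .. (N-1).in in a
# target directory, matching "input = $(Process).in" with "queue N".
# The text written to each file is a placeholder for real input data.
make_inputs() {
    dir=$1
    count=$2
    i=0
    while [ "$i" -lt "$count" ]; do
        printf 'input for process %s\n' "$i" > "$dir/$i.in"
        i=$((i + 1))
    done
}
```

After sourcing the function, running

# make_inputs . 4

in the directory that holds the submission script would create 0.in, 1.in, 2.in and 3.in there.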

Frequently used HTCondor commands

Job submission:   /usr/bin/condor_submit
Job enquiry:      /usr/bin/condor_q
Job removal:      /usr/bin/condor_rm {Job ID}
HTCondor Status:  /usr/bin/condor_status

For detailed HTCondor references, please refer to the link

For any queries, please send email to support[at]