High Throughput GPU Cluster 1 (HTGC1)
General Information
The CS High Throughput GPU Cluster (HTGC) is an HTCondor cluster for GPU-based computational applications such as TensorFlow, OpenCV, and MATLAB. HTCondor is a job submission and queuing system that provides process-level parallelization for computationally intensive tasks. All CS staff and students with a valid CSLab UNIX account are eligible to use it.
The Cluster
Currently, the HTGC is composed of one job submission node and 7 job execution slots as shown below:
GPU | 7 x Nvidia V100 |
Maximum memory size per process | 64 GB |
O/S | Ubuntu 20.04 |
GPU Memory | 16 GB |
CUDA Runtime Version | 11.2 |
CUDA Driver Version | 11.6 |
CUDA Capability | 7.0 |
CUDA Device Name | Tesla V100-SXM2-16GB |
The HTGC can be accessed with any Secure Shell (SSH) client by connecting to
htgc1.cs.cityu.edu.hk (within the CS departmental network)
To compile and test code, please log on to
htgc1t.cs.cityu.edu.hk
User Data
Besides users' home directories, all nodes in the HTGC mount shared NFS storage at the path '/public'. Users can create their own folders there. Each user account has a default quota of 200 GB of disk space in '/public'. Files in '/public' are not backed up, and any file not accessed for 30 days will be removed.
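Because files in '/public' are removed once they have gone 30 days without access, it can be useful to list files that are approaching that limit. The following is a minimal sketch only: the function name list_stale, the 25-day default, and the folder name /public/$USER are assumptions for illustration, not part of the CSLab setup.

```shell
#!/bin/sh
# list_stale DIR [DAYS]
# Print regular files under DIR whose last access is older than DAYS days.
# find counts age in whole 24-hour periods, so "-atime +25" matches files
# last accessed at least 26 days ago.
list_stale() {
    find "$1" -type f -atime +"${2:-25}" -print
}

# Example call; /public/$USER is a hypothetical folder name.
# list_stale "/public/$USER" 25
```

Anything this prints should be copied elsewhere if it is still needed, since '/public' has no backup.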
Job submission script
To submit jobs to the HTGC, a submission script is needed. Below is a simple example. Create a file called
sample.condor
and put the following lines in it:
# normally a shell script
executable = myproc.sh
# optional parameter
requirements = (CUDADeviceName == "Tesla V100-SXM2-16GB")
error = myproc.err
log = myproc.log
# command line arguments
arguments = arg1 ...
# optional file for stdin
input = arg1.in
# optional file for stdout
output = arg1.out
# submit a single job
queue

# submit another job in the same script
executable = myproc2
# Process ID as argument
arguments = $(Process) ...
# optional file depends on Process ID
input = $(Process).in
output = $(Process).out
# submit 4 jobs with Process ID 0..3
queue 4
.
.
where ‘myproc.sh’ is an ordinary shell script that can also be run in a normal SSH terminal session.
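As an illustration of what such a script might contain, here is a hypothetical myproc.sh that only reports where it is running and which arguments it received. The function name describe_job is invented for this sketch, and it assumes nothing about the node beyond a standard shell. CUDA_VISIBLE_DEVICES is printed only if the environment happens to set it.

```shell
#!/bin/sh
# myproc.sh: hypothetical job wrapper; a real job would replace the
# reporting below with the actual computation.

describe_job() {
    # Print the execution host and the arguments passed on the command line.
    echo "host: $(hostname)"
    echo "args: $*"
    # Show the GPU assignment variable if present, otherwise "unset".
    echo "gpus: ${CUDA_VISIBLE_DEVICES:-unset}"
}

describe_job "$@"

# Show GPU status when nvidia-smi is available on the node.
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi || true
fi
```

Running the script by hand on the test node first (for example `sh myproc.sh arg1`) is a quick way to catch errors before queuing it.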
To submit jobs, use the condor_submit command, for example:
# condor_submit sample.condor
No matter how many jobs are submitted, each user can have at most 5 jobs executed at the same time.
Sample condor demo files can be found at /public/condor_demo
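The second batch in the sample script reads standard input from $(Process).in, so the files 0.in to 3.in need to exist before submission. A minimal sketch for creating them follows. The helper name make_inputs and the placeholder file contents are assumptions for illustration.

```shell
#!/bin/sh
# make_inputs COUNT [DIR]
# Create 0.in .. (COUNT-1).in in DIR (default: current directory),
# matching the $(Process).in names used with "queue COUNT".
make_inputs() {
    count="$1"
    dir="${2:-.}"
    i=0
    while [ "$i" -lt "$count" ]; do
        # Placeholder content; replace with the real stdin data for job $i.
        printf 'input for job %d\n' "$i" > "$dir/$i.in"
        i=$((i + 1))
    done
}

# Example: create 0.in .. 3.in in the current directory.
# make_inputs 4
```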
Frequently used HTCondor commands
Job submission: /usr/bin/condor_submit
Job enquiry: /usr/bin/condor_q -nobatch
Job removal: /usr/bin/condor_rm {Job ID}
HTCondor Status: /usr/bin/condor_status
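To see how many of your jobs are currently executing, the JobStatus attribute of each queued job can be counted. HTCondor uses 1 for Idle, 2 for Running, and 5 for Held. The sketch below assumes condor_q's -autoformat option is available on the submission node. The helper name count_running is hypothetical.

```shell
#!/bin/sh
# count_running: read one JobStatus code per line on stdin and print
# how many of them are 2 (Running).
count_running() {
    # grep -c still prints 0 when nothing matches but exits non-zero;
    # "|| true" keeps that from being treated as a failure.
    grep -c '^2$' || true
}

# Intended use on the submission node (assumes -autoformat is supported):
# condor_q "$USER" -autoformat JobStatus | count_running
```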
For detailed HTCondor documentation, please refer to http://research.cs.wisc.edu/htcondor/
For any queries, please send an email to support[at]cs.cityu.edu.hk