High Throughput GPU Cluster 2 (HTGC2)
General Information
The CS High Throughput GPU Cluster 2 (HTGC2) is an HTCondor (version 8.6.8) cluster that focuses on single-precision GPU computational applications such as TensorFlow, Keras, Torch and MATLAB. HTCondor is a job submission and queuing system that provides process-level parallelization for computationally intensive tasks. All CS staff and students with a valid CSLab UNIX account are eligible to use it.
The Cluster
Currently, the HTGC2 is composed of one job submission node and 16 job execution slots, as shown below:

GPU                         | 16 x RTX 2080 Ti
Maximum memory size per job | 128 GB
O/S                         | Ubuntu 18.04
GPU Memory                  | 11 GB
CUDA Runtime Version        | 10.0
CUDA Driver Version         | 10.2
CUDA Capability             | 7.5
CUDA Device Name            | GeForce RTX 2080 Ti
The HTGC2 can be accessed by any Secure Shell (SSH) client connecting to
htgc2.cs.cityu.edu.hk (within the CS departmental network)
which is the job submission node and is not equipped with a GPU card. This cluster supports only Python 3 for AI/learning applications such as TensorFlow and Keras.
To compile and test code, please log on to
htgc2t.cs.cityu.edu.hk
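For example, an interactive session might begin as follows ('username' is a placeholder for your CSLab UNIX account; the '# ' prefix is the shell prompt, as in the commands later on this page):

```text
# ssh username@htgc2.cs.cityu.edu.hk     (submit jobs; no GPU on this node)
# ssh username@htgc2t.cs.cityu.edu.hk    (compile and test code)
```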
User Data
Besides users' home directories, all nodes in the HTGC2 share NFS storage with other clusters on the path '/public'. Users can create their own folders there if they have not yet done so. Each user account has a default quota of 200 GB of disk space in '/public'. There is no backup for files in '/public', and any file not accessed for 30 days will be removed.
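As a sketch, creating your own folder and checking its usage might look like the following. On HTGC2 the shared path is /public; here it defaults to a temporary directory so the commands can be tried anywhere, which is an assumption for illustration only.

```shell
# On HTGC2 this would be PUBLIC=/public; the default below is only so
# the sketch can be dry-run on any machine.
PUBLIC=${PUBLIC:-/tmp/public_demo}
ME=${USER:-$(id -un)}
mkdir -p "$PUBLIC/$ME"    # create your own folder if you have not done so
du -sh "$PUBLIC/$ME"      # files here count against the 200 GB quota
```

Remember that files under /public are not backed up, so keep a copy of anything important in your home directory.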
Job submission script
To submit jobs to the HTGC2, an HTCondor submission script is needed. Below is a sample text file called
myjob.condor
which has the following contents:
executable = myproc.sh       # normally a shell script
requirements = (CUDADeviceName == "GeForce RTX 2080 Ti")   # optional parameter
error = myproc.err
log = myproc.log
arguments = arg1 ...         # command line arguments for myproc.sh
input = arg1.in              # optional file for stdin
output = arg1.out            # optional file for stdout
queue                        # submit a single job

executable = myproc2         # submit another job in the same script
arguments = $(Process) ...   # Process ID as argument
input = $(Process).in        # optional file depends on Process ID
output = $(Process).out
queue 4                      # submit 4 jobs with Process ID 0..3
where ‘myproc.sh’ is a normal executable shell script which can be run under a normal ssh terminal session.
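As an illustration, a minimal 'myproc.sh' might look like the sketch below. The names here ('train.py' in particular) are placeholders, not part of the cluster setup.

```shell
#!/bin/bash
# Hypothetical myproc.sh job script for HTGC2 (a sketch, not the
# cluster's own example). HTCondor runs this on an execute slot; the
# submit file's "arguments" line arrives as $1, $2, ..., and the
# "input"/"output" files are wired to stdin/stdout.
echo "job started on $(hostname) at $(date)"
echo "arguments: $*"
# The real workload would go here, for example:
#   python3 train.py "$@"    # train.py is a placeholder name
echo "job finished at $(date)"
```

Make sure the script is executable (chmod +x myproc.sh) and that it runs correctly in an ordinary SSH session on htgc2t before submitting it.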
To submit jobs, simply use the condor_submit command like
# condor_submit myjob.condor
No matter how many jobs are submitted, each user can have at most 5 jobs executed concurrently.
To test a job, please submit it using the '-batch-name Test' option:
# condor_submit -batch-name Test myjob.condor
Test jobs will be terminated after running for 10 minutes.
Sample condor demo files can be found at /public/condor_demo
Frequently used HTCondor commands
Job submission:  /usr/bin/condor_submit
Job enquiry:     /usr/bin/condor_q
Job removal:     /usr/bin/condor_rm {Job ID}
HTCondor status: /usr/bin/condor_status
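Put together, a typical session might look like the following (the job ID 23.0 is a made-up example; yours will differ):

```text
# condor_submit myjob.condor     (queue the jobs described in myjob.condor)
# condor_q                       (check the state of your queued jobs)
# condor_rm 23.0                 (remove the job with ID 23.0 if necessary)
# condor_status                  (view the state of the execution slots)
```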
For detailed HTCondor references, please refer to the link https://research.cs.wisc.edu/htcondor/manual/v8.6/Contents.html
For any queries, please send an email to support[at]cs.cityu.edu.hk