High Throughput GPU Cluster 3 (HTGC3)
General Information
The CS High Throughput GPU Cluster 3 (HTGC3) is an HTCondor cluster that focuses on GPU-related computational applications such as TensorFlow, OpenCV, and MATLAB. HTCondor is a job submission and queuing system that provides process-level parallelization for computationally intensive tasks. All CS staff and students with a valid CSLab UNIX account are eligible to use it.
The Cluster
Currently, the HTGC3 is composed of one job submission node and 10 job execution nodes as shown below:
| | Dual/Quad-GPU nodes | Single-GPU nodes |
| Nodes | two dual-GPU nodes, one quad-GPU node | 7 single-GPU nodes |
| GPU | 2+2+4 Nvidia V100 | 7 x Nvidia V100S |
| GPU Memory | 32 GB | 32 GB |
| Maximum memory size per process | 256GB or 512GB | 128GB |
| O/S | Ubuntu 18.04 | Ubuntu 18.04 |
| CUDA Runtime Version | 10.1 | 10.1 |
| CUDA Driver Version | 11.0 | 11.0 |
| CUDA Capability | 7.0 | 7.0 |
| CUDA Device Name | Tesla V100-SXM2-32GB | Tesla V100S-PCIE-32GB |
The HTGC3 can be accessed with any Secure Shell (SSH) client by connecting to
htgc3.cs.cityu.edu.hk (within the CS departmental network)
Please do not run jobs on the submission node. Jobs running for longer than an hour will be killed without prior notice.
To compile and test code, please log on to
htgc3t.cs.cityu.edu.hk
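A typical session from within the CS network might look like the following sketch (the username jdoe is illustrative; use your own CSLab UNIX account):
ssh jdoe@htgc3t.cs.cityu.edu.hk      # compile and test code on the test node
ssh jdoe@htgc3.cs.cityu.edu.hk       # then log on to the submission node to submit jobs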
User Data
Besides users' home directories, all nodes in the HTGC3 mount a shared NFS storage at the path '/public', which is also shared with other clusters and the gateway server. Users can create their own folders there. Each user account has a default quota of 200GB of disk space in '/public'. There is no backup for files in '/public', and all files not accessed for 30 days will be removed.
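As an illustration (the folder name jdoe is hypothetical; use your own username), you could create a working folder in the shared storage and check how much space it uses:
mkdir /public/jdoe
du -sh /public/jdoe      # show the total size of the folder against the 200GB quota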
Job submission script
To submit jobs to the HTGC3, a submission script is needed. Below is a simple example: create a file called
sample.condor
and put the following lines into it
executable = myproc.sh                 # normally a shell script
requirements = (CUDADeviceName == "Tesla V100-SXM2-32GB")   # optional parameter
request_GPUs = 2                       # optional parameter to request up to 4 GPUs for a job
error = myproc.err
log = myproc.log
arguments = arg1 ...                   # command line arguments
input = arg1.in                        # optional file for stdin
output = arg1.out                      # optional file for stdout
queue                                  # submit a single job

executable = myproc2                   # submit another job in the same script
arguments = $(Process) ...             # Process ID as argument
input = $(Process).in                  # optional file depends on Process ID
output = $(Process).out
queue 4                                # submit 4 jobs with Process ID 0..3
.
.
where ‘myproc.sh’ is a normal shell script that can be run in an ordinary SSH terminal session.
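As a sketch of such a script (the project path, virtual environment, and train.py are assumptions; adapt them to your own setup), myproc.sh could look like:
#!/bin/bash
# hypothetical job script: run a Python GPU program with the arguments passed by HTCondor
cd "$HOME/myproject"
source venv/bin/activate
python train.py "$@"
Remember to make the script executable, e.g. with chmod +x myproc.sh, before submitting.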
To submit jobs, simply use the condor_submit command like
# condor_submit sample.condor
No matter how many jobs are submitted, each user can have at most 6 jobs executed at the same time.
Jobs that do not specify "request_GPUs" will be run on both single-GPU and dual-GPU nodes. Jobs running on dual-GPU nodes count double towards the resource usage limit. To restrict jobs to single-GPU nodes, please specify "request_GPUs = 1" in the condor submit file.
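For example, a minimal submit file pinned to the single-GPU nodes could look like the sketch below (the executable and file names are illustrative):
executable = myproc.sh
request_GPUs = 1          # restrict the job to single-GPU nodes
error = myproc.err
log = myproc.log
output = myproc.out
queue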
Sample condor demo files can be found at /public/condor_demo
Frequently used HTCondor commands
Job submission: /usr/bin/condor_submit
Job enquiry: /usr/bin/condor_q
Job removal: /usr/bin/condor_rm {Job ID}
HTCondor status: /usr/bin/condor_status
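As an illustration, a typical job lifecycle with these commands might look like the following (the job ID 123.0 is made up; use the ID reported by condor_q):
condor_submit sample.condor    # submit the jobs described in sample.condor
condor_q                       # list your jobs and their status in the queue
condor_rm 123.0                # remove job 123.0 from the queue
condor_status                  # show the state of the execution nodes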
For detailed HTCondor references, please refer to http://research.cs.wisc.edu/htcondor/
For any queries, please send an email to support[at]cs.cityu.edu.hk