The CS High Throughput Computing Cluster (HTCC) is a local implementation of the HTCondor job submission and queuing system. This system provides process-level parallelization for computationally intensive tasks. Thousands of computing jobs can be submitted with a single batch command in HTCondor. All CS staff and students who have a valid CSLab UNIX account are eligible to use it.
CSLab provides the following administrative and support services for the HTCC:
- System administration and performance monitoring
- Software installation and maintenance
- User account management
- Job queue management
- Allocation of system resources such as disk quota, CPU shares, etc.
Currently, the HTCC is composed of one job preparation/submission node and 6 job execution nodes. Each node has the following configuration:
|CPU||Two 6-core Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz (24 Logical Processors)|
|NIC||Dual 10GE Ethernet|
|GPU||One Nvidia Tesla K20|
|GPU Memory||4.69 GB|
|GPU CUDA Version||5.5|
|GPU Driver Version||319.37|
The HTCC can be accessed with any Secure Shell (SSH) client by connecting to
- htcc1.cs.cityu.edu.hk (within CS departmental network)
- htcc1g.cs.cityu.edu.hk (from elsewhere)
htcc1.cs.cityu.edu.hk is an alias for host c8k13, which is the HTCondor submission node. The job execution nodes are c8k14, c8k15, ..., c8k19. All regular Linux programs/scripts can run on them.
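As a small illustration of the two entry points above, the following sketch prints the ssh command for either case. The function name 'htcc_ssh_command' and the account name 'alice' are made up for this example; only the two host names come from the text.

```shell
#!/bin/sh
# Print the ssh command for the HTCC submission node.
# $1: CSLab UNIX account name ("alice" below is a hypothetical example)
# $2: "inside" when connecting from within the CS departmental network,
#     anything else when connecting from elsewhere
htcc_ssh_command() {
    if [ "$2" = "inside" ]; then
        host=htcc1.cs.cityu.edu.hk
    else
        host=htcc1g.cs.cityu.edu.hk
    fi
    echo "ssh $1@$host"
}

htcc_ssh_command alice inside
htcc_ssh_command alice outside
```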
The job submission procedure is listed in the logon message of c8k13.
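The logon message remains the authoritative procedure. For orientation only, here is a generic HTCondor sketch of how one submit description file can queue many jobs at once. The file names 'my_job.sh', 'batch.sub', 'batch.log' and the 'out' folder are hypothetical, and the job count of 100 is arbitrary.

```shell
#!/bin/sh
# Hypothetical names throughout: my_job.sh, batch.sub, batch.log, out/
mkdir -p out

# The quoted heredoc keeps $(Process) literal so HTCondor, not the shell, expands it.
cat > batch.sub <<'EOF'
# HTCondor submit description file (sketch)
executable = my_job.sh
arguments  = $(Process)
output     = out/job.$(Process).out
error      = out/job.$(Process).err
log        = batch.log
queue 100
EOF

# Submit only where the HTCondor tools and the job script are actually present.
if command -v condor_submit >/dev/null 2>&1 && [ -x my_job.sh ]; then
    condor_submit batch.sub
else
    echo "condor_submit not run; wrote batch.sub"
fi
```

Each of the 100 queued jobs receives its own index (0 to 99) through '$(Process)', so a single script can select a different input per job.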
All processes running on the cluster are limited to 32GB of memory. Processes that exceed this limit will be terminated automatically.
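To see how close a running process is to this limit, a check along the following lines may help. It is a sketch under two assumptions that the text does not confirm: that '32GB' means 32 GiB, and that resident memory (VmRSS in '/proc') is the relevant measure. The helper names are made up for this example.

```shell
#!/bin/sh
# Assumption: "32GB" is read as 32 GiB, expressed in kB as /proc reports it.
LIMIT_KB=$((32 * 1024 * 1024))

# Succeeds (exit status 0) when the given memory figure in kB is within the limit.
within_limit() {
    [ "$1" -le "$LIMIT_KB" ]
}

# Print the current resident set size in kB of a PID (Linux only).
# Prints nothing if the status file cannot be read.
rss_kb() {
    awk '/^VmRSS:/ {print $2}' "/proc/$1/status" 2>/dev/null
}

pid=$$   # the current shell, purely as a demonstration
kb=$(rss_kb "$pid")
if [ -n "$kb" ]; then
    if within_limit "$kb"; then
        echo "PID $pid uses $kb kB, within the limit"
    else
        echo "PID $pid uses $kb kB, above the limit"
    fi
fi
```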
In addition to users' home directories, all nodes in the HTCC mount a 4TB shared NFS storage volume at '/public'. Users can create their own folders there, and each user can use up to 100GB of disk space in '/public'. '/public' is hosted on a high-speed storage system, and users are encouraged to use it for data involved in computation. However, data and program results should NOT be kept in '/public' for archiving: there is no backup for files in '/public', and all files not accessed for 30 days will be removed.
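Because of the 100GB allowance and the 30-day removal rule, it can be useful to check a folder's size and find files that have not been accessed recently. The sketch below assumes a personal folder named after the login name under '/public', which is only a guess at a naming convention; the function name 'stale_files' and the 25-day warning threshold are likewise illustrative.

```shell
#!/bin/sh
# List regular files under a directory that were last accessed more than
# N full days ago (find rounds file ages down to whole days).
# $1: directory to scan   $2: age threshold in days
stale_files() {
    find "$1" -type f -atime +"$2" -print
}

# Hypothetical personal folder; the text only says users can make their own folder.
mydir="/public/${USER:-$(id -un)}"
if [ -d "$mydir" ]; then
    du -sh "$mydir"            # total usage, to compare with the 100GB allowance
    stale_files "$mydir" 25    # files approaching the 30-day removal window
fi
```

Files reported by 'stale_files' should be copied elsewhere if they are still needed, since '/public' is not backed up.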
Research teams can 'Bring Your Own Data Storage' to the HTCC. For projects involving multiple terabytes of data, PIs can purchase their own iSCSI or NFS storage, which the HTCC can then mount for processing.
Owing to the limited number of concurrent Matlab licenses, Matlab code MUST be compiled before it is submitted to the HTCC. Compiled Matlab code does not occupy a Matlab license while running.
Interactive Matlab sessions should ONLY be run on the submission node (c8k13) for coding, testing and compilation. Interactive Matlab sessions running on the job execution nodes will be terminated without prior notice. Also, each user is limited to at most 3 concurrent interactive Matlab sessions on the HTCC.
Listed below is the compilation procedure for a sample Matlab M-script (demo1.m) and M-function (speedup.m).
This script (demo1.m) calls a function speedup(n), which computes the speedup ratio of multiplying two n-by-n random matrices with and without the use of a GPU.
    # cat speedup.m
    function s = speedup(n)

    a = rand(n);
    b = rand(n);

    tic;
    c = a*b;
    time1 = toc;

    ga = gpuArray(a);
    gb = gpuArray(b);

    tic;
    gc = ga*gb;
    time2 = toc;

    s = time1/time2;
    # cat demo1.m
    for i=1000:2000:9000
        disp(sprintf('GPU speedup for array multiplicaton of array size %d is %.0f', i, speedup(i)));
    end
Use /usr/local/bin/mcc to compile the M-script and M-function:
/usr/local/bin/mcc -v -m demo1.m speedup.m
Execute the demo by
where '$MCRROOT' is an environment variable and is currently set to '/usr/local/matlab'.
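The '$MCRROOT' variable is typically passed to the launcher script that mcc generates. As a hedged sketch, assuming mcc's default Linux naming (an executable 'demo1' plus a wrapper 'run_demo1.sh' in the current directory), running the compiled demo might look like this:

```shell
#!/bin/sh
# Assumption: "mcc -m demo1.m speedup.m" left ./demo1 and the wrapper
# ./run_demo1.sh in the current directory (mcc's usual Linux output).
# Fall back to the value quoted in the text if MCRROOT is not already set.
MCRROOT="${MCRROOT:-/usr/local/matlab}"

if [ -x ./run_demo1.sh ]; then
    # The wrapper sets up LD_LIBRARY_PATH from the given runtime root, then runs ./demo1.
    ./run_demo1.sh "$MCRROOT"
else
    echo "run_demo1.sh not found; compile with mcc first" >&2
fi
```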
The output shows the speedup ratio for different array sizes.
    ------------------------------------------
    Setting up environment variables
    ---
    LD_LIBRARY_PATH is .:/usr/local/matlab/R2013a/runtime/glnxa64:...
    Warning: Unable to open display ':0.0'. You will not be able to display graphics on the screen.
    GPU speedup for array multiplicaton of array size 1000 is 236
    GPU speedup for array multiplicaton of array size 3000 is 1644
    GPU speedup for array multiplicaton of array size 5000 is 5190
    GPU speedup for array multiplicaton of array size 7000 is 11491
    GPU speedup for array multiplicaton of array size 9000 is 23064
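This matrix multiplication test suggests that a significant speedup can be achieved by using the GPU for large arrays. Because GPU operations in Matlab can run asynchronously and speedup.m does not wait for the GPU before calling toc, the ratios above may overstate the actual gain.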