High Performance Computing (HPC)

The Mailman School of Public Health, in collaboration with the Center for Computational Biology and Bioinformatics (C2B2), is now able to provide faculty with secure High Performance Computing (HPC) capabilities for research use. This resource is being funded by the Dean’s Office for one year, so that faculty who may not be familiar with the uses and benefits of HPC for data analysis have a chance to evaluate the possibilities before making any investment.

C2B2 maintains several high-performance computing systems, including multiple compute clusters and high-memory servers. For those of you who are tech-minded, this translates into 6,336 CPU cores and 73,728 CUDA cores (GPU) with a maximum performance of 212 TFLOPS. For the rest, just know that the system is on the 2013 Top500 list of supercomputers worldwide.

Access to the compute cluster is managed by Rebecca Yohannes (IT/Biostats), who serves as “Research Computing Liaison.” In this role Rebecca works closely with individual researchers to answer HPC-related questions, resolve issues, and provide complete operational support, designing custom solutions that encompass software installation, program optimization, task automation, and more.

In addition, Rebecca will be rolling out HPC trainings, workshops, and information sessions to give an overview of HPC capabilities, available resources (computing, software, and storage), and examples of how our faculty can leverage these resources in their work. If you are interested in using the cluster, please contact Rebecca.

Capabilities

The HPC resource at the Center for Computational Biology and Bioinformatics (C2B2) is a Linux-based (CentOS 6.5) compute cluster consisting of 528 HP blade systems, two large-memory (1 TB) servers, two head nodes, and a virtualized pool of login (submit) nodes, with job scheduling handled by Sun Grid Engine (SGE). Each node has 12 cores (two hex-core processors) and either 32 GB (480 nodes) or 96 GB (48 nodes) of memory.
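
Jobs are submitted to the scheduler from a login node with SGE's qsub command. As a hedged illustration only, the minimal Python batch script below shows what a submission might look like; the job name, resource requests, and interpreter path are placeholders, and the cluster's actual queue and resource settings may differ. It could be submitted with a command along the lines of: qsub -S /usr/bin/python example_job.py

    #!/usr/bin/env python
    # Minimal SGE batch-job sketch (hypothetical names and resource requests).
    # SGE reads the "#$" lines below as if they were qsub command-line options.

    #$ -N example_job
    #$ -cwd
    #$ -l h_vmem=4G
    #$ -o example_job.out
    #$ -e example_job.err

    # Toy workload: sum of squares, just enough to confirm the job ran.
    total = 0
    for i in range(1000000):
        total += i * i
    print(total)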

The cluster provides 6,336 compute cores and 73,728 CUDA cores (GPU) with 20 TB of total RAM. Each node has a 10 Gbps Ethernet connection to the cluster network, and 32 of the nodes are additionally linked with 40 Gbps QDR InfiniBand.

The HPC clusters are housed in two data centers totaling more than 3,000 sq. ft. of floor space. The facility maintains two high-memory systems with 1 TB of system memory each, and a pool of computational servers for compilation, debugging, and job control, along with storage areas that can meet varying objectives for data integrity, performance, and capacity.

C2B2 has designed a variety of best-practices data storage protocols to ensure that all data remains secure. These include security measures compliant with Columbia University Information Technology (CUIT) requirements and HIPAA, as well as regular data snapshots, replication, and offsite backup.

The wide range of scientific and computational software available includes the latest GNU and Intel compilers for C and Fortran, Perl interpreters, Java SDKs, Matlab, BLAST, EMBOSS, HMMER, MUMmer, ClustalW, PAML, PHYLIP, Bioconductor, Phred and Phrap, GeneHunter, Fastlink, Merlin, PDT, TRANSMIT, Pseudomarker, Analyze, Autoscan, GOLD, and many other utilities and programs.
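
As a quick, hedged sketch of how one of these tools might be driven from a script, the Python snippet below wraps a BLAST+ protein search; the query file, database path, and output name are placeholders, and it assumes a blastp executable is on your PATH (the exact BLAST build installed on the cluster may differ).

    import subprocess

    # Hypothetical inputs: substitute your own query file and database.
    query_file = "my_protein.fasta"
    database = "/path/to/blast/db/nr"   # placeholder database location

    # Run a protein BLAST search and write the tab-separated report to a file.
    with open("blast_hits.tsv", "w") as out:
        subprocess.check_call(
            ["blastp",
             "-query", query_file,
             "-db", database,
             "-outfmt", "6",       # tabular output
             "-evalue", "1e-5"],
            stdout=out,
        )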

Contact Us

Elizabeth S. Tashiro
Assistant Dean, Information Technology
Mailman School IT
600 West 168th St., 5th Floor, Suite 510
Tel: 212 342 3021
Email: elizabeth.tashiro@columbia.edu

Rebecca Yohannes
Director, HPC Resources
Mailman School IT 
722 West 168th St., 6th Floor, Rm 613
Tel: 212 342 0487
Email: ry111@cumc.columbia.edu