FAQ

Access

How do I get access to the cluster?

To obtain access to the cluster, Mailman faculty members can send a request to ry111@cumc.columbia.edu with the following information:

  • Full name
  • UNI
  • Name of Department
  • Project Description 
  • Project Name
  • Required Software Packages

How do I login?

To log in to the cluster, SSH (Secure Shell) to login.c2b2.columbia.edu with your provided username and password. You will then be automatically placed on one of the login nodes, where you can submit jobs, monitor tasks, and start interactive sessions. Windows users can use PuTTY or Cygwin; macOS users can use the built-in Terminal application.
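
For example, from a terminal (your_uni below is a placeholder; substitute your own UNI):

ssh your_uni@login.c2b2.columbia.edu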

Note: As the login nodes are intended as a gateway resource to storage and computing systems, no computation should take place on the login nodes.

I forgot my password, how do I reset it?

You can send an email to ry111@cumc.columbia.edu with your UNI and we will reset your password for you.

Usage

How do I find and run applications?

Please check this link for information on how to run programs, along with sample scripts.

How do I submit my jobs?

The qsub command is used to submit a job to SGE (Sun Grid Engine). It has the following syntax:

  • qsub [ options ] [ scriptfile | -- [ script args ]]

The following basic options may be used when submitting a job; a sample submission script follows the list.

  • -A [account name] -- Specify the account under which to run the job
  • -N [name] -- The name of the job
  • -l h_rt=hr:min:sec -- Maximum walltime for this job
  • -r [y,n] -- Should this job be re-runnable (default y)
  • -pe [type] [num] -- Request [num] slots in the [type] parallel environment.
  • -cwd -- Place the output files in the current working directory. The default is to place them in the user's home directory.
  • -S [shell path] -- Specify shell to use when running the job script
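
For example, a minimal submission script, here called my_job.sh (the job name, runtime, and shell are illustrative values; adjust them for your project), might look like:

#!/bin/bash
# Job name shown in qstat
#$ -N my_job
# Maximum walltime of one hour
#$ -l h_rt=01:00:00
# Write output files to the current working directory
#$ -cwd
# Shell used to run the job script
#$ -S /bin/bash

echo "Running on $(hostname)"

Submit it with qsub my_job.sh; SGE prints a job ID that you can later use with qstat and qdel.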

How do I check the status of my submitted job?

Once your job has been submitted, you can check its status in the queue using the qstat command:

-bash-3.2$ qstat
job-ID   prior    name      user      state  submit/start at      queue  slots  ja-task-ID
3017227  0.00000  hostname  sge_user  qw     05/30/2013 15:49:13

From this output, we can see that the job is in the qw state, which stands for queued and waiting. After a few seconds, the job transitions to the r (running) state, at which point it begins executing:

-bash-3.2$ qstat
job-ID   prior    name      user      state  submit/start at      queue  slots  ja-task-ID
3017227  0.00000  hostname  sge_user  r      05/30/2013 15:49:25

Once the job has finished, it is removed from the queue and no longer appears in the output of qstat.
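
If you want more detail on a particular job, most SGE installations also support querying it by ID, for example using the job ID from the listing above:

-bash-3.2$ qstat -j 3017227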

How do I delete a job I submitted?

You can delete a job that you submitted using the qdel command in Sun Grid Engine. Below, we launch a simple job, test_del.sh, and then kill it using qdel:

-bash-3.2$ qsub test_del.sh 

Your job 3021355 ("test_del.sh") has been submitted

-bash-3.2$ qdel 3021355

sge_user has deleted job 3021355
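
For reference, test_del.sh can be any ordinary job script; a minimal version (contents assumed here purely for illustration) could be:

#!/bin/bash
# Job name shown in qstat
#$ -N test_del
# Write output files to the current working directory
#$ -cwd
# Sleep long enough for the job to be deleted while queued or running
sleep 300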

Support

How do I get support?

Support is provided free of charge to members of the School of Public Health. We work closely with individual clients to determine each project's requirements and design a custom solution that encompasses storage requirements, software environment, and automation of tasks. Members who need support using the cluster can send an email to ry111@cumc.columbia.edu with a full description of the issue, including the submitted script and any errors that appear.

I want to write HPC into my grant. Is there language I can use?

Yes. Below is an NIH blurb you can use in your grant.

The Mailman School of Public Health provides faculty with secure, high-performance computing (HPC) capabilities for research use. The multiple high-performance compute clusters, as well as high-memory systems, are housed in two data centers totaling more than 3,000 sq. ft. of floor space. The facility has redundant air conditioning, state-of-the-art networking, a 1 MW uninterruptible power supply (UPS), and 24/7/365 security.

The cluster includes 6,336 CPU cores and 73,728 CUDA cores (GPU), with a maximum performance of 212 TFlops, 10 Gb/s Ethernet fabric throughout, 40 Gb/s QDR InfiniBand, GPU-enhanced computing, and low-power hardware architecture. All of the clusters run current variants of the Linux operating system and are managed by Univa Grid Engine. We support the Java, Perl, MATLAB, and R languages, but can support other program sets as needed. We maintain two high-memory systems with 1 TB of system memory each, and a pool of computational servers for compilation, debugging, and job control. In total, we provide over 1.4 PB of high-speed redundant storage for our compute clusters and user data. A secondary Isilon clustered file system provides daily replication of valuable data to a secondary site, as well as additional iSCSI Ethernet SAN storage. We have designed a variety of best-practice data storage protocols to ensure that all data remains secure; these include Columbia University Information Technology (CUIT) and HIPAA-compliant security measures, as well as regular data snapshots, replication, and offsite backup. The system is on the Top500 list of supercomputers worldwide.

Requirement

What are the charges for using the HPC?

The high-performance computing platform is centrally funded by the School of Public Health and is available to researchers free of charge for the first year (a minimal payment may apply thereafter).