Introduction:
This article will guide you through the process of accessing the San Diego Supercomputer Center (SDSC) Comet cluster and running a simple test job. Comet is a powerful resource available to researchers, and this guide will help you get started with accessing the system and submitting your first job. This guide assumes you have an active account on SDSC Comet. If you do not have an account, please refer to the SDSC Comet website for information on how to apply.
1. Prerequisites
Before you begin, ensure you have the following:
2. Logging into Comet via Web Console SSH
This section outlines how to log into Comet using the web console SSH, the preferred method for UCR users.
Access the Web Console: Open your web browser and navigate to the UCR Research Computing web console. If you are using a general web console SSH client instead, use the specific link provided by UCR Research Computing or SDSC, if applicable.
Locate the SSH Client: Within the web console interface, find and launch the “SSH Client” or similar tool. This will open an SSH terminal directly in your browser.
Connect to Comet: In the SSH terminal, you will need to specify the hostname for Comet. The primary login node for Comet is typically comet.sdsc.edu. You will likely need to enter the following command in the terminal:
ssh <your_comet_username>@comet.sdsc.edu
Replace <your_comet_username> with your actual SDSC Comet username.
Two-Factor Authentication: After entering the command and pressing Enter, you will be prompted for your Comet password. After the password prompt, you will be asked to complete two-factor authentication, either by entering a passcode generated by your Duo Mobile app (or your configured two-factor method) or by choosing one of the listed options:
Duo push for Username
Enter a passcode or select one of the following options:
1. Duo Push to XXX-XXX-XXXX
2. Phone call to XXX-XXX-XXXX
Choose an option or enter passcode:
Type 1 and press Enter to receive a Duo Push notification on your registered device. Approve the notification on your device to complete the login.
Successful Login: If your credentials and two-factor authentication are correct, you will be successfully logged into Comet. You should see the Comet command prompt.
3. Setting Up Your Environment (Brief Overview)
Once logged in, you are in your home directory on Comet. For running jobs, it’s important to understand a few key aspects of the environment:
Home directory (/home/<username>): This is where you land upon login. It has quotas and is intended for configuration files, scripts, and smaller datasets.
Scratch directory (/scratch/users/<username>): This is designed for job input and output data. It is typically a high-performance file system but may have purge policies, so it’s not for long-term storage. For test jobs, using your scratch space is recommended.
Software modules: To list the software modules available on Comet, run module avail.
4. Writing a Simple Test Job Script (Slurm)
Comet uses Slurm as its job scheduler, which is also the default scheduler on UCR’s HPCC (Ursa Major). You need to create a Slurm job script to define the resources your job needs and the commands to execute.
Create a Script File: Use a text editor (like nano, vim, or emacs, all available on Comet) to create a new file named test_job.slurm.
nano test_job.slurm
Add Slurm Directives and Commands: Paste the following content into your test_job.slurm file. This is a very basic test job that will print the hostname and current date:
#!/bin/bash
#SBATCH --job-name=test_job
# Partition (queue); see the Comet documentation for other appropriate partitions
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --output=test_job.out
#SBATCH --error=test_job.err
# Job script commands start here
echo "Running on host: $(hostname)"
date
echo "Job finished."
Explanation of Slurm Directives:
#!/bin/bash: Specifies the script interpreter as bash.
#SBATCH --job-name=test_job: Sets the job name for easy identification.
#SBATCH --partition=compute: Specifies the partition (queue) to submit the job to. compute is a common partition for general compute jobs on Comet. Consult the Comet documentation for available partitions and their characteristics.
#SBATCH --nodes=1: Requests 1 node.
#SBATCH --ntasks-per-node=1: Requests 1 task per node.
#SBATCH --cpus-per-task=1: Requests 1 CPU per task.
#SBATCH --mem=1GB: Requests 1GB of memory.
#SBATCH --output=test_job.out: Specifies the file that receives standard output.
#SBATCH --error=test_job.err: Specifies the file that receives standard error.
Note for UCR Users: Within the UCR environment, you typically do not include a time limit, email notification, or allocation name in your SBATCH directives. However, when submitting to SDSC Comet directly, you might need to specify these, depending on the system requirements and your allocation. For this simple test job, the directives above should suffice for demonstration purposes. Always refer to Comet’s documentation for specific requirements.
Save the Script: In nano, press Ctrl+X, then Y to save, and Enter to confirm the filename.
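If you would rather not use an interactive editor, the same script can be written from the shell with a here-document. The sketch below is one way to do that, not an official Comet procedure. The helper name write_test_job and its optional arguments are assumptions made for illustration; --time and --account are the standard Slurm options for a time limit and an allocation, which the note above says you may need on Comet.

```shell
# Sketch: write the test job script without opening an editor.
# write_test_job is a hypothetical helper name, not a Slurm or Comet command.
# Usage: write_test_job FILE [TIME_LIMIT] [ACCOUNT]
write_test_job() {
    # The quoted 'EOF' keeps $(hostname) literal, so it is evaluated
    # inside the job rather than while the file is being written.
    cat > "$1" <<'EOF'
#!/bin/bash
#SBATCH --job-name=test_job
#SBATCH --partition=compute
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1GB
#SBATCH --output=test_job.out
#SBATCH --error=test_job.err
EOF

    # Optional directives, added only when a value is supplied.
    # They must appear before the first command in the script.
    if [ -n "${2:-}" ]; then
        echo "#SBATCH --time=$2" >> "$1"
    fi
    if [ -n "${3:-}" ]; then
        echo "#SBATCH --account=$3" >> "$1"
    fi

    cat >> "$1" <<'EOF'

# Job script commands start here
echo "Running on host: $(hostname)"
date
echo "Job finished."
EOF
}
```

Running write_test_job test_job.slurm would produce the script shown earlier. A call such as write_test_job test_job.slurm 00:05:00 abc123 would also add a five-minute time limit and an allocation name, where abc123 is a placeholder rather than a real allocation.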
5. Submitting Your Test Job
Navigate to Script Location: Ensure you are in the directory where you saved your test_job.slurm script (likely your home directory or a subdirectory within your scratch space).
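If you want a dedicated directory for the test, a small helper along these lines could create it and print its path. This is only a sketch: make_workdir is a hypothetical name, and the scratch path in the usage comment is the one quoted in this guide, which you should verify on the system before relying on it.

```shell
# make_workdir is a hypothetical helper: create a job directory
# (including missing parents) and print its path.
# Usage: make_workdir BASE_DIR NAME
make_workdir() {
    mkdir -p "$1/$2" && printf '%s\n' "$1/$2"
}

# Possible usage on the cluster (path taken from this guide, not verified):
# cd "$(make_workdir "/scratch/users/$USER" test_job)"
```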
Submit the Job: Use the sbatch command to submit your script to the Slurm scheduler:
sbatch test_job.slurm
Job ID Output: Upon successful submission, Slurm will output a job ID. For example:
Submitted batch job 1234567
Make note of this job ID, as you will need it to monitor your job.
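Rather than copying the job ID by hand, you could capture it in a shell variable. The sketch below assumes sbatch prints the message in exactly the form shown above; job_id_from_sbatch is a hypothetical helper name.

```shell
# job_id_from_sbatch is a hypothetical helper: extract the numeric ID
# from sbatch's "Submitted batch job <id>" message.
job_id_from_sbatch() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Possible usage on the cluster:
# JOB_ID="$(job_id_from_sbatch "$(sbatch test_job.slurm)")"

job_id_from_sbatch "Submitted batch job 1234567"   # prints 1234567
```

Slurm also provides sbatch --parsable, which prints the job ID without the surrounding text. On multi-cluster setups the ID may be followed by a semicolon and the cluster name.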
6. Monitoring Your Job Status
You can check the status of your job using the squeue command.
Check Job Queue: To see the status of your job, use the command:
squeue -u <your_comet_username>
Replace <your_comet_username> with your Comet username.
Interpret Output: The squeue command will display information about your job, including:
ST: The job state (PD for Pending, R for Running, CD for Completed).
NODELIST(REASON): The nodes allocated to the job or the reason if pending.
Example squeue Output:
JOBID    PARTITION  NAME      USER           ST  TIME  NODES  NODELIST(REASON)
1234567  compute    test_job  your_username  R   0:05  1      comet-ln01
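Wait until the ST column shows CD (Completed) or the job no longer appears in the squeue output. By default, squeue lists only pending, running, and completing jobs, so a finished job usually just disappears from the list.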
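Instead of rerunning squeue by hand, you could poll until the job leaves the queue. The following is a minimal sketch under the assumption that an empty squeue listing for the job ID means the job has finished; job_in_queue and wait_for_job are hypothetical helper names, and the 30-second default interval is an arbitrary choice.

```shell
# job_in_queue succeeds while squeue still lists the given job ID.
job_in_queue() {
    # -h suppresses the header and -j selects a single job. Errors for
    # job IDs that Slurm no longer knows about are discarded, so empty
    # output is taken to mean the job has left the queue.
    [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]
}

# Poll every INTERVAL seconds (default 30) until the job leaves the queue.
# Usage: wait_for_job JOB_ID [INTERVAL]
wait_for_job() {
    while job_in_queue "$1"; do
        sleep "${2:-30}"
    done
    echo "Job $1 is no longer in the queue."
}

# Possible usage on the cluster:
# wait_for_job 1234567
```

A longer interval is gentler on the scheduler, which matters on a shared system where many users query the queue.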
7. Checking Your Job Output
Once your job has completed, you can check the output files you specified in your Slurm script (test_job.out and test_job.err).
View Output Files: Use the cat command to view the contents of the output file:
cat test_job.out
You should see output similar to this:
Running on host: comet-cnXXXX
Tue Oct 24 10:30:00 PDT 2023
Job finished.
Check Error File: If there were any errors, they would be in the error file (test_job.err). For this simple test job, it should be empty. You can check it with:
cat test_job.err
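The two checks above can be combined into one step. The sketch below prints the output file and flags a non-empty error file; check_job_output is a hypothetical helper name, and the default file names are the ones used in this guide's script.

```shell
# check_job_output is a hypothetical helper: print the job's standard
# output and report whether the error file contains anything.
# Usage: check_job_output [OUT_FILE] [ERR_FILE]
check_job_output() {
    out_file="${1:-test_job.out}"
    err_file="${2:-test_job.err}"

    if [ ! -f "$out_file" ]; then
        echo "Output file $out_file not found; the job may not have run yet."
        return 1
    fi
    cat "$out_file"

    # -s is true only when the file exists and is not empty.
    if [ -s "$err_file" ]; then
        echo "Errors were reported in $err_file:"
        cat "$err_file"
        return 1
    fi
    echo "No errors reported."
}
```

Run without arguments in the directory that holds the output files, it returns a non-zero status when the output file is missing or the error file has content, which makes it usable in a larger script.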
8. Conclusion and Further Resources
Congratulations! You have successfully logged into SDSC Comet and run a test job. This is a basic example to get you started. To utilize Comet effectively for your research, you will need to learn more about:
For further assistance or if you have any questions, please do not hesitate to contact UCR Research Computing:
We are here to help you make the most of high-performance computing resources for your research!