Accessing SDSC Comet and Running a Test Job

Introduction:

This article guides you through accessing the San Diego Supercomputer Center (SDSC) Comet cluster and running a simple test job. Comet is a powerful resource available to researchers, and this guide will help you get started with logging into the system and submitting your first job. It assumes you have an active account on SDSC Comet; if you do not, please refer to the SDSC Comet website for information on how to apply.

Table of Contents:

  1. Prerequisites
  2. Logging into Comet via Web Console SSH
  3. Setting Up Your Environment (Brief Overview)
  4. Writing a Simple Test Job Script (Slurm)
  5. Submitting Your Test Job
  6. Monitoring Your Job Status
  7. Checking Your Job Output
  8. Conclusion and Further Resources

1. Prerequisites

Before you begin, ensure you have the following:

  • An Active SDSC Comet Account: You must have a valid user account on the SDSC Comet system.
  • Duo Mobile or another Two-Factor Authentication Method: SDSC Comet requires two-factor authentication for login. Ensure you have your Duo Mobile app or another configured method ready.
  • Web Console SSH Access (Preferred Method): UCR Research Computing recommends using the web console SSH for accessing remote resources, including SDSC Comet. This method simplifies the connection process.

2. Logging into Comet via Web Console SSH

This section outlines how to log into Comet using the web console SSH, the preferred method for UCR users.

  1. Access the Web Console: Open your web browser and navigate to the UCR Research Computing web console. (Insert link to the UCR Research Computing web console here if applicable; otherwise see the general instructions below.) If you are using a general web console SSH, use the specific link provided by UCR Research Computing or SDSC.

  2. Locate the SSH Client: Within the web console interface, find and launch the “SSH Client” or similar tool. This will open an SSH terminal directly in your browser.

  3. Connect to Comet: In the SSH terminal, you will need to specify the hostname for Comet. The primary login node for Comet is typically comet.sdsc.edu. You will likely need to enter the following command in the terminal:

    ssh <your_comet_username>@comet.sdsc.edu
    

    Replace <your_comet_username> with your actual SDSC Comet username.

  4. Two-Factor Authentication: After running the ssh command, you will be prompted for your password; enter your Comet password. Following the password prompt, you will be asked for your two-factor authentication code. Enter the code generated by your Duo Mobile app (or respond through your configured two-factor method).

    • Example Duo Prompt:
      Duo push for Username
      Enter a passcode or select one of the following options:
      
      1. Duo Push to XXX-XXX-XXXX
      2. Phone call to XXX-XXX-XXXX
      
      Choose an option or enter passcode:
      

      Type 1 and press Enter to receive a Duo Push notification on your registered device. Approve the notification on your device to complete the login.

  5. Successful Login: If your credentials and two-factor authentication are correct, you will be successfully logged into Comet. You should see the Comet command prompt.

    • Visual Aid: A screenshot showing a successful SSH login to Comet with the Comet command prompt would be helpful here.

3. Setting Up Your Environment (Brief Overview)

Once logged in, you are in your home directory on Comet. For running jobs, it’s important to understand a few key aspects of the environment:

  • File System: Comet has different file systems for different purposes.
    • Home Directory (/home/<username>): This is where you land upon login. It has quotas and is intended for configuration files, scripts, and smaller datasets.
    • Scratch File System: This space is designed for job input and output data. On Comet, the Lustre scratch file system is typically reached at /oasis/scratch/comet/<username>/temp_project (confirm the current path in the Comet user guide). It is a high-performance file system but is subject to purge policies, so it is not for long-term storage. For test jobs, using your scratch space is recommended.
  • Modules: Comet uses modules to manage software environments. Modules allow you to easily load and unload different software packages and versions. For simple test jobs, you might not need to load any modules, but for more complex tasks they will be essential. You can list available modules with the command module avail; a short example follows this list.
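
For example, a typical first session might create a working directory in scratch and inspect the module system. This is a minimal sketch; the scratch path follows the pattern described above, so confirm it against the Comet user guide before relying on it:

    # Create a working directory in scratch for this test
    # (path pattern per the Comet user guide; confirm before use)
    mkdir -p /oasis/scratch/comet/$USER/temp_project/test_job
    cd /oasis/scratch/comet/$USER/temp_project/test_job

    # Inspect the module system
    module avail    # list all available software modules
    module list     # show modules currently loaded in this session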

4. Writing a Simple Test Job Script (Slurm)

Comet uses Slurm as its job scheduler, which is also the default scheduler on UCR’s HPCC (Ursa Major). You need to create a Slurm job script to define the resources your job needs and the commands to execute.

  1. Create a Script File: Use a text editor (like nano, vim, or emacs available on Comet) to create a new file named test_job.slurm.

    nano test_job.slurm
    
  2. Add Slurm Directives and Commands: Paste the following content into your test_job.slurm file. This is a very basic test job that will print the hostname and current date:

    #!/bin/bash
    #SBATCH --job-name=test_job
    #SBATCH --partition=compute  # 'compute' allocates whole nodes; 'shared' or 'debug' may suit a small test job (see Comet documentation)
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=1
    #SBATCH --cpus-per-task=1
    #SBATCH --mem=1G
    #SBATCH --output=test_job.out
    #SBATCH --error=test_job.err
    
    # Job script commands start here
    echo "Running on host: $(hostname)"
    date
    
    echo "Job finished."
    

    Explanation of Slurm Directives:

    • #!/bin/bash: Specifies the script interpreter as bash.
    • #SBATCH --job-name=test_job: Sets the job name for easy identification.
    • #SBATCH --partition=compute: Specifies the partition (queue) to submit the job to. On Comet, compute is the standard partition for general, node-exclusive compute jobs; the shared partition allows jobs that use only part of a node. Consult the Comet documentation for available partitions and their characteristics.
    • #SBATCH --nodes=1: Requests 1 node.
    • #SBATCH --ntasks-per-node=1: Requests 1 task per node.
    • #SBATCH --cpus-per-task=1: Requests 1 CPU per task.
    • #SBATCH --mem=1G: Requests 1 GB of memory.
    • #SBATCH --output=test_job.out: Specifies the output file for standard output.
    • #SBATCH --error=test_job.err: Specifies the error file for standard error.

    Note for UCR Users: Per UCR Research Computing guidance, you typically do not include time, email notification, or allocation name in your SBATCH directives when submitting jobs within the UCR environment. When submitting to SDSC Comet directly, however, you may need to specify these depending on your allocation's requirements; an example of supplying them at submission time follows this list. For this simple test job, the directives above should suffice for demonstration purposes. Always refer to Comet's documentation for specific requirements.

  3. Save the Script: In nano, press Ctrl+X, then Y to save, and Enter to confirm the filename.
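
If your Comet allocation does require a walltime limit, an account name, or email notifications, you can supply them at submission time rather than editing the script. A minimal sketch, assuming a hypothetical account name abc123 and your own email address:

    sbatch --time=00:05:00 --account=abc123 \
           --mail-type=END --mail-user=you@ucr.edu \
           test_job.slurm

These are standard Slurm options, and flags given on the command line override the corresponding #SBATCH directives in the script.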

5. Submitting Your Test Job

  1. Navigate to Script Location: Ensure you are in the directory where you saved your test_job.slurm script (likely your home directory or a subdirectory within your scratch space).

  2. Submit the Job: Use the sbatch command to submit your script to the Slurm scheduler:

    sbatch test_job.slurm
    
  3. Job ID Output: Upon successful submission, Slurm will output a job ID. For example:

    Submitted batch job 1234567
    

    Make note of this job ID, as you will need it to monitor your job.
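
    If you are submitting from a script, sbatch's --parsable flag prints only the job ID (no surrounding text), which is convenient to capture in a shell variable. A small sketch:

      # Capture just the job ID for later monitoring
      JOBID=$(sbatch --parsable test_job.slurm)
      echo "Submitted job $JOBID"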

6. Monitoring Your Job Status

You can check the status of your job using the squeue command.

  1. Check Job Queue: To see the status of your job, use the command:

    squeue -u <your_comet_username>
    

    Replace <your_comet_username> with your Comet username.

  2. Interpret Output: The squeue command will display information about your job, including:

    • JOBID: The job ID.
    • PARTITION: The partition the job is running in.
    • NAME: The job name.
    • USER: The username.
    • ST: The job state (e.g., PD for Pending, R for Running, CD for Completed).
    • TIME: The elapsed time for running jobs.
    • NODES: Number of nodes allocated.
    • NODELIST(REASON): The nodes allocated to the job or the reason if pending.

    • Example squeue Output:

        JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      1234567   compute test_job your_use  R       0:05      1 comet-14-01
      

    Completed jobs leave the squeue listing shortly after they finish, so your job may briefly show CD (Completed) in the ST column and then disappear from the output. Once it no longer appears, it has finished; the sketch below shows how to confirm its final state.
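
    Because finished jobs drop out of squeue quickly, Slurm's accounting command sacct is often the easier way to confirm completion (this assumes job accounting is enabled, as it is on most production clusters). You can also poll the queue with watch. A short sketch using the example job ID:

      # Show the final state and exit code of a specific job
      sacct -j 1234567 --format=JobID,JobName,State,ExitCode,Elapsed

      # Or re-run squeue every 10 seconds until the job disappears
      watch -n 10 squeue -u $USER

    A State of COMPLETED with an ExitCode of 0:0 indicates the job ran successfully.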

7. Checking Your Job Output

Once your job has completed, you can check the output files you specified in your Slurm script (test_job.out and test_job.err).

  1. View Output Files: Use the cat command to view the contents of the output file:

    cat test_job.out
    

    You should see output similar to this:

    Running on host: comet-14-01.sdsc.edu
    Tue Oct 24 10:30:00 PDT 2023
    Job finished.
    
  2. Check Error File: If there were any errors, they would be in the error file (test_job.err). For this simple test job, it should be empty. You can check it with:

    cat test_job.err
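
    For a quick scripted check, the shell's -s test reports whether the error file contains anything:

      # -s is true if the file exists and is non-empty
      [ -s test_job.err ] && echo "errors were logged" || echo "no errors"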
    

8. Conclusion and Further Resources

Congratulations! You have successfully logged into SDSC Comet and run a test job. This is a basic example to get you started. To utilize Comet effectively for your research, you will need to learn more about:

  • Comet Partitions and Queues: Understand the different partitions available and choose the appropriate one for your job requirements.
  • Slurm Scripting in Detail: Learn more about the various Slurm directives to customize resource requests, manage dependencies, and optimize job execution.
  • Software Modules on Comet: Explore available software modules and how to load them for your specific applications.
  • Data Management on Comet: Learn best practices for transferring data to and from Comet and managing data within the Comet file systems.
  • Comet Documentation: The official SDSC Comet documentation is your primary resource for in-depth information and up-to-date details. (Link to SDSC Comet documentation here).

For further assistance or if you have any questions, please do not hesitate to contact UCR Research Computing.

We are here to help you make the most of high-performance computing resources for your research!