The Center for Integrated Research Computing (CIRC) of SOME SCHOOL is hard to use because it lacks proper documentation and examples. My man, no one has time to sit through 3-hour training sessions for weeks, especially since the people who really need the cluster are PhD students, who are typically very busy. Just provide good documentation and examples so people can start working.
To this end, I am creating this repository to store the necessary commands and tips so everyone can start using CIRC.
To register for an account, use this link: https://registration.circ.rochester.edu/account
There are primarily two ways to use CIRC:
- Using Terminal and the SLURM job scheduler.
  Pros:
  - Transparent and flexible: you can work just like on any other remote server.
  - Great for running scripts.
  - After creating a virtual environment, you don't need to connect it to a Jupyter kernel.
  - You have to set up your Terminal anyway, and it is good practice to upload/download files via ssh.
  Cons:
  - Bad experience if you have never used a server or worked in a Terminal.
  - SLURM has a learning curve for anyone who has never used a cluster scheduler (meaning all of us).
  - Hard to use R without RStudio or R in JupyterLab.
  - You have to write a script first and test it several times locally.
- Using JupyterLab.
  Pros:
  - Easy to use. No need to learn Terminal or SLURM.
  - Most people are used to working in Jupyter Notebook.
  - Easy for R.
  Cons:
  - You have to set up your Terminal anyway.
  - Dead-kernel problems are serious when you connect your own environment to a JupyterLab kernel. CIRC allows very little flexibility, so they are also hard to debug.
  - Unclear waiting time and session duration.
Setting up the Terminal is something you have to do either way, because a virtual environment must be created.

ssh YourNetIDHere@bluehive.circ.rochester.edu

The default Python version should be 3.6.x, which is really old. However, the system does have other versions. You can see them by running:

module avail python3

You can unload the current Python and load a new version with:

module unload python3
module load python3/3.12.4/b1

Python 3.12.4 is the only version I confirmed with CIRC that includes the ssl library, which pip needs. You cannot install ssl yourself because it requires sudo.
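Because pip depends on the ssl module, a quick check right after loading a Python module can save a failed install later. The following is a minimal sketch; `check_ssl` is a made-up helper name, not something CIRC provides, and it only tries to import ssl with whichever interpreter you pass in.

```shell
# check_ssl is a hypothetical helper name, not a CIRC or SLURM command.
check_ssl() {
    # $1: Python executable to test (defaults to python3)
    if "${1:-python3}" -c "import ssl" 2>/dev/null; then
        echo "ssl available"
    else
        echo "ssl missing"
    fi
}

# Test whichever python3 the loaded module put on your PATH.
check_ssl python3
```

If this reports "ssl missing", pip will most likely fail on that interpreter, so try another version from `module avail python3`.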
Here is an example of creating a virtual environment for a project that mainly uses the UMAP library. As you don't have sudo, packages like pipx and virtualenv can't be installed. This is why I am using Python's built-in venv module.

python3 -m venv ~/myvenv/umap

Use the following line to activate your virtual environment before installing any Python package.

source ~/myvenv/umap/bin/activate

Many high-performance Python packages rely on C extensions, which need the GCC compiler to build, so you have to load it as well. I only tested the following version of the GCC compiler.

module unload gcc
module load gcc/13.2.0/b1

The following are the packages I need for my project using UMAP and HDBSCAN. Feel free to install anything you want.
pip install pyarrow
pip install polars
pip install pandas
pip install atomicwrites
pip install "matplotlib<3.10"
pip install "numpy >= 1.23"
pip install "scipy >= 1.3.1"
pip install "scikit-learn >= 1.6"
pip install "numba >= 0.51.2"
pip install "pynndescent >= 0.5"
pip install tqdm
pip install umap-learn
pip install hdbscan
pip install fast_hdbscan
pip install git+https://github.com/Jacobsonradical/rabbitlord.git

You have two storage spaces. The first one is in

/home/YourNetIDHere

This has only 20 GB of space, so I suggest not uploading anything here. This directory should contain only your virtual environments.
The second one is in

/scratch/YourNetIDHere

Here you have 200 GB of space, so your scripts and data should go here. Note that this space is not backed up by CIRC, so download your results immediately after computing.
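To keep an eye on how close you are to those limits, something like the sketch below can help. `usage_report` is a hypothetical helper, the 20 and 200 GB figures are simply the ones quoted above, and it assumes your login name on the cluster is your NetID so that `/scratch/$USER` is your scratch space.

```shell
# usage_report is a hypothetical helper; the quotas are the figures quoted above.
usage_report() {
    # $1: directory to measure, $2: quota in GB
    used_kb=$(du -sk "$1" 2>/dev/null | cut -f1)
    used_kb=${used_kb:-0}
    echo "$1: $((used_kb / 1024)) MB used of $2 GB quota"
}

# Assumes your login name is your NetID.
if [ -d "$HOME" ]; then
    usage_report "$HOME" 20
fi
if [ -d "/scratch/$USER" ]; then
    usage_report "/scratch/$USER" 200
fi
```

`du` walks every file, so this can take a while on a full scratch directory.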
The good thing about CIRC is that they have rsync, so please also install rsync on your computer. To upload, use the following commands.

If you only want to upload a file:

rsync -avz --info=progress2 -e ssh local/file/path YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/

If you want to upload the whole folder:

rsync -avz --info=progress2 -e ssh local/folder/directory YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/

If you want to upload everything inside a folder, but not the folder itself (note the trailing slash):

rsync -avz --info=progress2 -e ssh local/folder/directory/ YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/

If you only want to download a file:

rsync -avz --info=progress2 -e ssh YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/your/filepath /local/directory

If you want to download the whole folder:

rsync -avz --info=progress2 -e ssh YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/your/directory /local/directory

If you want to download everything inside a folder, but not the folder itself (note the trailing slash):

rsync -avz --info=progress2 -e ssh YourNetIDHere@bluehive.circ.rochester.edu:/scratch/YourNetIDHere/your/directory/ /local/directory

To run a SLURM job, you need to create a .slurm file. Let us see an example of such a file:
#!/bin/bash
#SBATCH -p gpu
#SBATCH -t 2-00:00:00
#SBATCH --nodelist=bhg[0012-0018,0020,0022,0024-0027]
#SBATCH --job-name=run123
#SBATCH --output=logs/run123_%j.out
#SBATCH --error=logs/run123_%j.err

# Load the same Python module and virtual environment that were set up earlier
module load python3/3.12.4/b1
source ~/myvenv/umap/bin/activate
# Placeholder script name; replace it with your own
python my_script.py

Polars does not support CPUs that lack avx2, fma, bmi1, bmi2, lzcnt, and movbe. Basically, these CPUs are really outdated (pre-2015 or so) and really cheap. I don't know about other libraries, but unless you are doing really basic stuff, perhaps avoid these CPUs as well, or at least know what to possibly blame when your code has a problem.
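To find out whether the node you landed on has those CPU features, you can compare the list against `/proc/cpuinfo`. The sketch below is one way to do it, not an official check: `missing_cpu_flags` is a made-up helper name, and it assumes that Linux reports LZCNT under the name `abm`, which is the usual behaviour as far as is known here.

```shell
# Flags named in the text above; missing_cpu_flags is a hypothetical helper name.
REQUIRED_FLAGS="avx2 fma bmi1 bmi2 lzcnt movbe"

missing_cpu_flags() {
    # $1: space-separated CPU flags, e.g. the "flags" line of /proc/cpuinfo
    have=" $1 "
    missing=""
    for flag in $REQUIRED_FLAGS; do
        case "$have" in
            *" $flag "*) continue ;;
        esac
        # Assumption: Linux usually lists LZCNT under the name "abm"
        if [ "$flag" = "lzcnt" ]; then
            case "$have" in
                *" abm "*) continue ;;
            esac
        fi
        missing="$missing $flag"
    done
    echo "${missing# }"
}

# Check the node this runs on (Linux only).
if [ -r /proc/cpuinfo ]; then
    node_flags=$(grep -m1 '^flags' /proc/cpuinfo | cut -d: -f2)
    echo "Missing CPU flags: $(missing_cpu_flags "$node_flags")"
fi
```

Running this at the top of a job script writes the result to the job's output log, so an empty list after "Missing CPU flags:" means the node should be fine for Polars.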
Instead of listing the restrictions (CPU types) to avoid, I list the ones you should prefer. They are grouped by partition, because you have to request by partition anyway:
- debug: None
- gpu: E52695v4, Gold6140, Gold6130, Gold6330, Gold6226R
- gpu-debug: None
- gpu-interactive: E52695v4
- highmem: None
- interactive: E52695v4
- preempt: E52695v4, E52699v4, E2650v4, Gold6148, Gold6126, epyc7501, Gold6248, Platinum8268, Gold6330, 4114, Gold6230, 4214R, AMD7413, Gold6338
- standard: E52695v4, Gold6330
- visual: None
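If these CPU names are exposed as SLURM node features on BlueHive (an assumption here; running `sinfo -p standard -o "%N %f"` on the cluster should list the features each node advertises), a job header can ask for any of the preferred CPUs with `--constraint`. A minimal sketch for the standard partition, with an arbitrary 2-hour limit:

```shell
#!/bin/bash
#SBATCH -p standard
#SBATCH -t 0-02:00:00
# Assumes the CPU names above are registered as SLURM features.
# "|" means OR: a node carrying either feature is acceptable.
#SBATCH --constraint="E52695v4|Gold6330"
```

The SLURM manual describes `--nodelist` as a list of hosts the job must include, so a constraint is the documented way to say "any node with one of these CPUs".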
The best way to access JupyterLab is through JupyterHub.
- Go to: https://info.circ.rochester.edu/#Web_Applications/JupyterHub/
- Click the "JupyterHub" link in the first sentence.
- Log in.
- You will see a request window for starting a session. I will cover it in detail later, but once you request a session, it spawns a JupyterLab instance and you can play around.

Okay, here is the hardest part.
- Go to this link: https://info.circ.rochester.edu/#BlueHive/Compute_Nodes/
- You will see a list of partitions.
- Click one, for example gpu.

Yeah, as you can see, we don't have many GPU nodes. Take the first row as an example: it means we have 12 nodes, and each node has 24 cores, 62 GB of CPU RAM, and 2 Tesla K20Xm GPUs (I don't even know why we provide these; one costs about 200 dollars nowadays).

The way this works is that the system assigns you a node, not specific resources. Hence, simply request a node that has at least what you need. To request one, ignore everything else in the request window and put the following directly in the additional options field:

-p gpu -t 2-00:00:00 --nodelist=bhg[0012-0018,0020,0022,0024-0027]

It tells the system you want
- the gpu partition (for example, if you want the debug partition, use -p debug)
- 2 days of time (the format is days-hours:minutes:seconds)
- any of the nodes in the node list.
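The time format is easy to misread, because SLURM takes the number before the dash as days. The sketch below converts a limit to whole hours so you can sanity-check a request; `slurm_time_to_hours` is a hypothetical helper that only handles the D-HH:MM:SS and HH:MM:SS forms and ignores minutes and seconds.

```shell
# slurm_time_to_hours is a hypothetical helper, not a SLURM command.
slurm_time_to_hours() {
    # $1: a time limit written as D-HH:MM:SS or HH:MM:SS
    t=$1
    days=0
    case "$t" in
        *-*) days=${t%%-*}; t=${t#*-} ;;
    esac
    hours=${t%%:*}
    hours=${hours#0}    # drop one leading zero so "08" is not read as octal
    echo $(( days * 24 + ${hours:-0} ))
}

slurm_time_to_hours "2-00:00:00"   # prints 48
slurm_time_to_hours "02:00:00"     # prints 2
```

So `-t 2-00:00:00` asks for 48 hours, while a 2-hour session would be `-t 02:00:00`.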
Another example:
-p preempt -t 2-00:00:00 --nodelist=bhg0059

- Go to the same link: https://info.circ.rochester.edu/#BlueHive/Compute_Nodes/
- At the very top of the page you can see the time limits.
- They tell you the maximum amount of time you can use. For example, GPU nodes have a maximum of 5 days.
You have to register your virtual environment as a Jupyter kernel, or you cannot use it in your Jupyter session.
First, make sure your virtual environment is activated:

source ~/myvenv/umap/bin/activate

Then, install ipykernel:

pip install ipykernel

Next, run the following line with your own modifications.

python -m ipykernel install --user --name=umap --display-name="Python3.12.4-b1 (UMAP)"

You can set --display-name to whatever you like. However, keep --name equal to the folder name right before /bin/activate. For example, if you activate your virtual environment with source /some/directory/to/hello/bin/activate, then you should use --name=hello.
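The naming rule above can be written down as a small helper, which is handy if you manage several environments. This is only a sketch: `kernel_name_from_activate` is a made-up name, and it simply takes the folder two levels above the activate script.

```shell
# kernel_name_from_activate is a hypothetical helper name.
kernel_name_from_activate() {
    # $1: path to a virtual environment's bin/activate script
    basename "$(dirname "$(dirname "$1")")"
}

kernel_name_from_activate /some/directory/to/hello/bin/activate   # prints hello
kernel_name_from_activate ~/myvenv/umap/bin/activate              # prints umap

# With the environment already activated, venv sets VIRTUAL_ENV, so the
# same name is available as: basename "$VIRTUAL_ENV"
```

After installing the kernel, `jupyter kernelspec list` shows the registered names, so you can confirm that yours is there.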
Now go to your JupyterLab and refresh your browser. You should see the "myvenv" folder in the file browser.

Next, click the "New" button in the top right corner and scroll down. You should see your environment listed there.

If you want to uninstall this kernel from Jupyter:

jupyter kernelspec uninstall umap

Change umap to whatever you put for --name. For example, if you used --name=hello, replace umap with hello.
