Multinode quantum simulation using HTCondor on GCP

In this tutorial, you will configure HTCondor to run multiple simulations of a quantum circuit in parallel across multiple nodes. This method can be used to accelerate Monte Carlo simulations of noisy quantum circuits.

Objectives of this tutorial:

  • Use terraform to deploy an HTCondor cluster
  • Run a multinode simulation using HTCondor
  • Query cluster information and monitor running jobs in HTCondor
  • Use terraform to destroy the cluster

1. Configure your environment

Although this tutorial can be run from your local computer, we recommend the use of Google Cloud Shell. Cloud Shell has many useful tools pre-installed.

Once you have completed the Before you begin tutorial, open the Cloud Shell in the Cloud Console.

Clone this repo

In your Cloud Shell window, clone this GitHub repo:

git clone https://github.com/quantumlib/qsim.git

If you get an error saying something like qsim already exists, you may need to delete the qsim directory with rm -rf qsim and rerun the clone command.

Change directory

Change directory to the tutorial:

cd qsim/docs/tutorials/multinode/terraform

This is where you will use terraform to create the HTCondor cluster required to run your jobs.

Edit init.sh file to match your environment

Using your favorite text editor, open the init.sh file. The first few lines should look like this:

# ---- Edit below -----#

export TF_VAR_project=[USER_PROJECT]
export TF_VAR_zone=us-east4-c
export TF_VAR_region=us-east4

Replace [USER_PROJECT] with the project name you chose on the Before you begin page.

The remaining lines can optionally be modified to adjust your environment.

  • The TF_VAR_zone and TF_VAR_region lines select the Google Cloud zone and region where your cluster's VMs will be created.
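
For example, with a hypothetical project ID of quantum-sim-demo and the default region, the edited lines would read:

export TF_VAR_project=quantum-sim-demo
export TF_VAR_zone=us-east4-c
export TF_VAR_region=us-east4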


Source the init.sh file

The edited init.sh file should be "sourced" in the Cloud Shell:

source init.sh

Respond Agree to any pop-ups that request permissions on the Google Cloud Platform.

The final outcome of this script will include:

  • A correctly configured gcloud setup
  • A new service account
  • The appropriate permissions assigned to the service account
  • A key file to enable the use of Google Cloud automation

This will take up to 60 seconds. At the end you will see output about permissions and the configuration of the account.

2. Run terraform

After the previous steps are completed, you can use terraform to create your cluster. The first step is to initialize the terraform state:

terraform init

A successful result will contain the text:

Terraform has been successfully initialized!

Run the make command

For convenience, some terraform commands are prepared in a Makefile. This means you can now create your cluster with a single make command:

make apply

A successful run will show:

Apply complete! Resources: 4 added, 0 changed, 0 destroyed.
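
Under the hood, the make targets are thin wrappers around terraform commands. Conceptually, make apply runs something like the following (a sketch for orientation; see the Makefile in this directory for the actual target):

apply:
	terraform apply -auto-approve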

3. Connect to the submit node for HTCondor

Although there are ways to run HTCondor commands from your local machine, the normal path is to log in to the submit node. From there, you can run commands to submit and monitor jobs on HTCondor.

List the VMs that were created for the cluster

To see the VMs that terraform created for the HTCondor cluster, run:

gcloud compute instances list

At this point in the tutorial, you will see two instances listed:

NAME: c-manager
ZONE: us-east4-c
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE:
INTERNAL_IP: X.X.X.X
EXTERNAL_IP: X.X.X.X
STATUS: RUNNING

NAME: c-submit
ZONE: us-east4-c
MACHINE_TYPE: n1-standard-1
PREEMPTIBLE:
INTERNAL_IP: X.X.X.X
EXTERNAL_IP: X.X.X.X
STATUS: RUNNING

Connecting to the submit node

To connect to the submit node, click the Compute Engine item on the Cloud dashboard. This opens the VM Instances page, where you should see the two instances listed above. In the c-submit row, click the SSH button to open a new window connected to the submit node. During this step, you may see a prompt that reads Connection via Cloud Identity-Aware Proxy Failed; simply click Connect without Identity-Aware Proxy and the connection should complete.

This new window is a terminal logged in to your HTCondor cluster. You will see a command prompt that looks something like this:

[mylogin@c-submit ~]$

The following steps should be performed in this window.
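
Alternatively, if you prefer to stay in the Cloud Shell, you can open an SSH session on the submit node with gcloud, using the zone you set in init.sh:

gcloud compute ssh c-submit --zone=us-east4-c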

Checking the status

You can run condor_q to verify that the HTCondor installation has completed. The output should look something like this:

-- Schedd: c-submit.c.quantum-htcondor-14.internal : <10.150.0.2:9618?... @ 08/18/21 18:37:50
OWNER BATCH_NAME      SUBMITTED   DONE   RUN    IDLE   HOLD  TOTAL JOB_IDS

Total for query: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for drj: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
Total for all users: 0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended

If you get command not found, wait a few minutes for the HTCondor installation to complete, then try again.

4. Get the sample code and run it

The HTCondor cluster is now ready to run your jobs. For this tutorial, sample jobs are provided in the GitHub repo.

Clone the repo on your cluster

On the submit node, you can clone the repo to get access to previously created submission files:

git clone https://github.com/quantumlib/qsim.git

Then change to the tutorial directory:

cd qsim/docs/tutorials/multinode

Submit a job

Now it is possible to submit a job:

condor_submit noiseless.sub

This job will run the code in noiseless3.py, which executes a simple circuit and prints the results as a histogram. If successful, the output will be:

Submitting job(s).
1 job(s) submitted to cluster 1.
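
For reference, noiseless3.py is a small Cirq/qsim program. The sketch below shows the general shape of such a job, a minimal circuit sampled 1000 times with the results printed as a histogram; the actual circuit in the repo may differ.

# Minimal sketch of a noiseless sampling job (illustrative circuit,
# not necessarily identical to noiseless3.py in the repo).
import cirq
import qsimcirq

q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(
    cirq.H(q0),                     # put q0 in superposition
    cirq.CNOT(q0, q1),              # entangle q0 and q1
    cirq.measure(q0, q1, key='m'),  # measure both qubits
)

# qsim's drop-in replacement for the standard Cirq simulator.
simulator = qsimcirq.QSimSimulator()
result = simulator.run(circuit, repetitions=1000)
print(result.histogram(key='m'))    # prints a collections.Counter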

You can see the job in the queue with the condor_q command.

The job will take several minutes to finish; this time includes creating a compute-node VM, installing HTCondor on it, and running the job. When complete, the following files will be stored in the out directory:

  • out/log.1-0 contains a progress log for the job as it executes.
  • out/out.1-0 contains the final output of the job.
  • out/err.1-0 contains any error reports. This should be empty.

To view one of these files in the shell, you can run cat out/[FILE], replacing [FILE] with the name of the file to be viewed.
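
For example, to view the final output of the job:

cat out/out.1-0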

5. Run multinode noise simulations

Noise simulations make use of a Monte Carlo method for quantum trajectories: each run samples one random realization of the noise, so aggregating many runs estimates the noisy output distribution.
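
The sketch below illustrates the trajectory method in Cirq/qsim, assuming a simple depolarizing noise model for concreteness; the circuit and noise model in noise3.py may differ.

# Sketch of a trajectory-based noise simulation. The depolarizing
# model here is an assumption for illustration, not noise3.py itself.
import cirq
import qsimcirq

q0, q1 = cirq.LineQubit.range(2)
circuit = cirq.Circuit(
    cirq.H(q0),
    cirq.CNOT(q0, q1),
    cirq.measure(q0, q1, key='m'),
)

# Attach 1% depolarizing noise after every moment of the circuit.
noisy = circuit.with_noise(cirq.depolarize(p=0.01))

# Each repetition samples one random noise realization (trajectory),
# so the counts fluctuate from run to run.
simulator = qsimcirq.QSimSimulator()
result = simulator.run(noisy, repetitions=1000)
print(result.histogram(key='m'))

Submitting 50 copies of such a job, as noise.sub does below, simply collects 50 independent batches of trajectories in parallel.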

The noise.sub file

To run multiple simulations, you can define a "submit" file. noise.sub is an example of this file format, and is shown below. Notable features include:

  • universe = docker means that all jobs will run inside a Docker container.
  • queue 50 submits 50 separate copies of the job.

universe                = docker
docker_image            = gcr.io/quantum-builds/github.com/quantumlib/jupyter_qsim:latest
arguments               = python3 noise3.py
should_transfer_files   = YES
transfer_input_files    = noise3.py
when_to_transfer_output = ON_EXIT
output                  = out/out.$(Cluster)-$(Process)
error                   = out/err.$(Cluster)-$(Process)
log                     = out/log.$(Cluster)-$(Process)
request_memory          = 10GB
queue 50

The job can be submitted with the condor_submit command.

condor_submit noise.sub

The output should look like this:

Submitting job(s)..................................................
50 job(s) submitted to cluster 2.

To monitor the progress of the running jobs, you can use the Linux watch command to run condor_q and condor_status repeatedly:

watch "condor_q; condor_status"

The output of this command will show you the jobs in the queue as well as the VMs being created to run the jobs. There is a limit of 20 VMs for this configuration of the cluster.

When the queue is empty, the command can be stopped with CTRL-C.

The output from all trajectories will be stored in the out directory. To see the results of all simulations together, you can run:

cat out/out.2-*

The output should look something like this:

Counter({3: 462, 0: 452, 2: 50, 1: 36})
Counter({0: 475, 3: 435, 1: 49, 2: 41})
Counter({0: 450, 3: 440, 1: 59, 2: 51})
Counter({0: 459, 3: 453, 2: 51, 1: 37})
Counter({3: 471, 0: 450, 2: 46, 1: 33})
Counter({3: 467, 0: 441, 1: 54, 2: 38})
Counter({3: 455, 0: 455, 1: 50, 2: 40})
Counter({3: 466, 0: 442, 2: 51, 1: 41})
.
.
.
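
Each line is the Counter from one trajectory batch. If you want a single aggregated distribution, a short post-processing script can merge them (a hypothetical helper, not part of the repo):

# Merge the per-job Counter lines into one aggregate distribution.
import ast
import collections
import glob

total = collections.Counter()
for path in glob.glob('out/out.2-*'):
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Each line looks like "Counter({3: 462, 0: 452, ...})".
            if line.startswith('Counter(') and line.endswith(')'):
                counts = ast.literal_eval(line[len('Counter('):-1])
                total += collections.Counter(counts)
print(total)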

6. Shutting down

If your Cloud Shell is still open, simply run:

make destroy

If your Cloud Shell closed at any point, you'll need to re-initialize it. Open a new shell and run:

cd qsim/docs/tutorials/multinode/terraform
source init.sh
make destroy

After these commands complete, check the Compute Engine dashboard to verify that all VMs have been shut down. This tutorial makes use of an experimental autoscaling script to create and remove VMs as needed. If any VMs remain after several minutes, you may need to shut them down manually from the VM Instances page in the Cloud Console.

Next steps

The file being run in the previous example was noise3.py. To run your own simulations, create a new Python file with your circuit and change the noise3.py references in noise.sub to point to the new file, as shown below.
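
For example, for a hypothetical circuit file named mycircuit.py, the two relevant lines in noise.sub would become:

arguments               = python3 mycircuit.py
transfer_input_files    = mycircuit.py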

A detailed discussion of how to construct various types of noise in Cirq can be found in the Cirq documentation.

For more information about managing your VMs, see the Compute Engine documentation from Google Cloud.

As an alternative to Google Cloud, you can download the Docker container or the qsim source code to run quantum simulations on your own high-performance computing platform.
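
For example, the container image used by the jobs in this tutorial can be pulled to any machine with Docker installed:

docker pull gcr.io/quantum-builds/github.com/quantumlib/jupyter_qsim:latest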