qsim and qsimh are designed to be extensible to a variety of different
applications. The base versions of each are qsim_base
and qsimh_base
;
sample extensions are provided in
apps. To compile the
code, just run make qsim
. Binaries of the form qsim(h)_*.x
will be added
to the apps
directory.
Sample circuits are provided in circuits.
qsim_base usage
./qsim_base.x -c circuit_file -d maxtime -t num_threads -f max_fused_size -v verbosity -z
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d maxtime |
maximum time |
-t num_threads |
number of threads to use |
-f max_fused_size |
maximum fused gate size |
-v verbosity |
verbosity level (0,1,2,3,4,5) |
-z |
set flush-to-zero and denormals-are-zeros MXCSR control flags |
qsim_base computes all the amplitudes and just prints the first eight of them (or a smaller number for 1- or 2-qubit circuits).
Verbosity levels are described in the following table.
Verbosity level | Description |
---|---|
0 | no additional information |
1 | add total simulation runtime |
2 | add initialization runtime and fuser runtime |
3 | add basic fuser statistics |
4 | add simulation runtime for each fused gate |
5 | additional fuser information (qubit indices for each fused gate) |
Example:
./qsim_base.x -c ../circuits/circuit_q24 -d 16 -t 8 -v 1
qsim_von_neumann usage
./qsim_von_neumann.x -c circuit_file -d maxtime -t num_threads -f max_fused_size -v verbosity -z
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d maxtime |
maximum time |
-t num_threads |
number of threads to use |
-f max_fused_size |
maximum fused gate size |
-v verbosity |
verbosity level (0,1,2,3,4,5) |
-z |
set flush-to-zero and denormals-are-zeros MXCSR control flags |
qsim_von_neumann computes all the amplitudes and calculates the von Neumann entropy. Note that this can be quite slow for large circuits and small thread numbers as the calculation of logarithms is slow.
Example:
./qsim_von_neumann.x -c ../circuits/circuit_q24 -d 16 -t 4 -v 1
qsim_amplitudes usage
./qsim_amplitudes.x -c circuit_file \
-d times_to_save_results \
-i input_files \
-o output_files \
-f max_fused_size \
-t num_threads -v verbosity -z
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d times_to_save_results |
comma-separated list of circuit times to save results at |
-i input_files |
comma-separated list of bitstring input files |
-o output_files |
comma-separated list of amplitude output files |
-t num_threads |
number of threads to use |
-f max_fused_size |
maximum fused gate size |
-v verbosity |
verbosity level (0,1,2,3,4,5) |
-z |
set flush-to-zero and denormals-are-zeros MXCSR control flags |
qsim_amplitudes reads input files of bitstrings, computes the corresponding amplitudes at specified times and writes them to output files.
Bitstring files should contain bitstings (one bitstring per line) in text format.
Example:
./qsim_amplitudes.x -c ../circuits/circuit_q24 -t 4 -d 16,24 -i ../circuits/bitstrings_q24_s1,../circuits/bitstrings_q24_s2 -o ampl_q24_s1,ampl_q24_s2 -v 1
qsim_qtrajectory_cuda usage
./qsim_qtrajectory_cuda.x -c circuit_file \
-d times_to_calculate_observables \
-a amplitude_damping_const \
-p phase_damping_const \
-t traj0 -n num_trajectories \
-f max_fused_size \
-v verbosity
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d times_to_calculate_observables |
comma-separated list of circuit times to calculate observables at |
-a amplitude_damping_const |
amplitude damping constant |
-p phase_damping_const |
phase damping constant |
-t traj0 |
starting trajectory |
-n num_trajectories |
number of trajectories to run starting with traj0 |
-f max_fused_size |
maximum fused gate size |
-v verbosity |
verbosity level (0,1,2,3,4,5) |
qsim_qtrajectory_cuda runs on GPUs. qsim_qtrajectory_cuda performs quantum trajactory simulations with amplitude damping and phase damping noise channels. qsim_qtrajectory_cuda calculates observables (operator X at each qubit) at specified times.
Example:
./qsim_qtrajectory_cuda.x -c ../circuits/circuit_q24 -d 8,16,32 -a 0.005 -p 0.005 -t 0 -n 100 -f 4 -v 0
qsimh_base usage
./qsimh_base.x -c circuit_file \
-d maxtime \
-k part1_qubits \
-w prefix \
-p num_prefix_gates \
-r num_root_gates \
-t num_threads -v verbosity -z
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d maxtime |
maximum time |
-k part1_qubits |
comma-separated list of qubit indices for part 1 |
-w prefix |
prefix value |
-p num_prefix_gates |
number of prefix gates |
-r num_root_gates |
number of root gates |
-t num_threads |
number of threads to use |
-v verbosity |
verbosity level (0,1,4,5) |
-z |
set flush-to-zero and denormals-are-zeros MXCSR control flags |
qsimh_base just computes and just prints the first eight amplitudes. The hybrid
Schrödinger-Feynman method is used. The lattice is split into two parts.
A two level checkpointing scheme is used to improve performance. Say, there
are N
gates on the cut. We split those into three parts: p+r+s=N
, where
p
is the number of "prefix" gates, r
is the number of "root" gates and
s
is the number of "suffix" gates. The first checkpoint is executed after
applying all the gates up to and including the prefix gates and the second
checkpoint is executed after applying all the gates up to and including the
root gates. The full summation over all the paths for the root and suffix gates
is performed.
The path for the prefix gates is specified by prefix
. It is just a value of
bit-shifted path indices in the order of occurrence of prefix gates in the
circuit file. This is primarily used for distributed execution - see the
Distributed execution
section below for more details.
Example (running on one machine):
./qsimh_base.x -c ../circuits/circuit_q30 -d 16 \
-k 0,1,2,6,7,8,12,13,14,18,19,20,24,25,26 \
-t 8 -w 0 -p 0 -r 5 -v 1
Choosing flag values for qsimh
-k defines how the lattice will be split up. In the examples above, the
lattice has the structure below (cuts are denoted by the |
symbol):
0 1 2 | 3 4 5
6 7 8 | 9 10 11
12 13 14 | 15 16 17
18 19 20 | 21 22 23
24 25 26 | 27 28 29
Deciding which cuts are optimal for a given circuit is computationally hard. However, splitting the grid into roughly equal parts with the fewest cuts possible (as is done for the lattice above) produces a circuit that performs reasonably well in most cases.
The runtime of an execution is heavily influenced by -p, as there is no summation over the "prefix" gates. The unique "prefix" path is specified by -w; see the "Distributed execution" section below for details on this.
-r implicitly specifies the number of the "suffix" gates: the total number of gates on the cut minus the values specified by -p and -r. For performance, the "suffix" gates should typically be the gates on the cut with maximum "time".
qsimh_amplitudes usage
./qsimh_amplitudes.x -c circuit_file \
-d maxtime \
-k part1_qubits \
-w prefix \
-p num_prefix_gates \
-r num_root_gates \
-i input_file -o output_file \
-t num_threads -v verbosity -z
Flag | Description |
---|---|
-c circuit_file |
circuit file to run |
-d maxtime |
maximum time |
-k part1_qubits |
comma-separated list of qubit indices for part 1 |
-w prefix |
prefix value |
-p num_prefix_gates |
number of prefix gates |
-r num_root_gates |
number of root gates |
-i input_file |
bitstring input file |
-o output_file |
amplitude output file |
-t num_threads |
number of threads to use |
-v verbosity |
verbosity level (0,1,4,5) |
-z |
set flush-to-zero and denormals-are-zeros MXCSR control flags |
qsimh_amplitudes reads the input file of bitstrings, computes the corresponding amplitudes and writes them to the output file. The hybrid Schrödinger-Feynman method is used, see above.
Bitstring files should contain bitstrings (one bitstring per line) in text format.
Example (do not execute - see below):
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 -t 8 -w 0 -p 0 -r 13 -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1 -v 1
This command could take weeks to run, since parallelism on a single machine is limited by the -t flag and the available cores on the device. For large circuits like this, distributed execution is recommended.
Distributed execution
By setting -p to be greater than zero, the workload of qsimh_amplitudes can be
distributed across multiple machines. Each machine should use the same
arguments to ./qsimh_amplitudes.x
, with the exception of the -w flag, which specifies the path that machine will evaluate.
Example:
# Machine 1
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 -t 8 -w 0 -p 9 -r 4 -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1_w0 -v 1
# Machine 2
./qsimh_amplitudes.x -c ../circuits/circuit_q40 -d 47 -k 0,1,2,3,4,5,6,7,8,9,10,13,14,15,16,17,23,24 -t 8 -w 1 -p 9 -r 4 -i ../circuits/bitstrings_q40_s1 -o ampl_q40_s1_w1 -v 1
# ...additional executions...
Each execution above computes a portion of the overall amplitude for the specified bitstrings. Summing across these results will give the final amplitudes, with fidelity dependent on the number of paths executed.