Cirq Triage

Original RFC: bit.do/cirq-triage

Objective

The goals for this document are as follows:

  • to define a set of lightweight community processes that make it easy for users and maintainers to understand where issues stand, when and how they will be resolved, and what is blocking them

  • to provide visibility into project and release status

Automation: Triage Party and GitHub Actions

Triage Party is a stateless web app to optimize issue and PR triage for large open-source projects using the GitHub API.

Our deployed version is here (a static IP, domain request is in progress): http://bit.do/cirq-triage-party

GitHub Actions is GitHub's workflow automation platform. We use it for continuous integration testing as well as for the stale issue handling described later in this document.

Issue states and labels

Issue kinds

The following are the kinds of issues that Cirq uses:

  • kind/bug-report - the user found a bug
  • kind/feature-request - for new functionality
  • kind/question - if an issue turns out to be a question, mark it with kind/question and close it after answering. Point the user to Quantum Computing Stack Exchange for usage questions and to the cirq-dev@googlegroups.com list for contribution-related questions.
  • kind/health - for CI/testing/release process/refactoring/technical debt items
  • kind/docs - documentation problems, ideas, requests
  • kind/roadmap-item - for higher level roadmap items to capture conversations and feedback (not for project tracking)
  • kind/task - for tracking progress on larger efforts

Most issues go through the following phases:

  • triage - do we want to take on this task at all,
  • prioritization - how urgent it is,
  • identifying feature area,
  • signalling difficulty,
  • signalling work,
  • assigning work,
  • and closing.

We will explore these phases one by one.

Triage

Triage states are

  • triage/accepted - there is consensus amongst maintainers that this is a real bug or a reasonable feature to add with a reasonable design, hence it is ready to be implemented.
  • triage/discuss - can be applied to any of the issue types to bring them up during our weekly Cirq Cynque meeting (join cirq-dev to get an invite!) and/or to signal the need for a decision. If you mark an issue with triage/discuss, consider pinging the maintainers on the issue who need to come to consensus around it.
  • triage/needs-reproduction - for bugs only
  • triage/needs-feasibility - for feature requests (maybe bugs).
  • triage/needs-more-evidence - for feature requests - the feature request seems plausible but we need more understanding if it is valuable for enough users to warrant implementing and maintaining it.
  • triage/stale - GitHub Actions automatically marks some issues as stale and then closes them after 30 days of inactivity.
  • triage/duplicate - we mark duplicated issues with this label.

While these are fairly straightforward and intuitive, the workflows are depicted below.

Bug report triage

Figure 1. Bug workflow (to edit, see mermaid source)

Feature request triage

Figure 2. Feature request workflow (to edit, see mermaid source)

Other issue types

For kind/docs, the label triage/accepted has to be added by at least one of the maintainers.

For kind/health, kind/roadmap-item and kind/task there is no particular intake workflow, as we assume that only maintainers create them to track specific work items.

Prioritization

Labels for priority capture the community's intent around when a certain feature/bug/task should be done. Priority is decided by the Triage team, based on negotiation with the user who opened the issue. It is expected to change throughout the lifetime of the issue as the expectations around it evolve.

  • priority/p0 should be very rare, only cases of emergency, and when a major critical user journey is blocked (e.g. users are exposed to a security vulnerability or they can't install Cirq)
  • priority/p1 is reserved for issues that need to be addressed for high priority work (e.g. a publication that is planned earlier than the next release)
  • priority/p2 and priority/p3 are used to tie into release planning conversations and to signal to contributors important work that can be picked up.

Features and Bugs with no priority label on them will still be up for grabs for contributors. Community contributors assigned to an issue that has no priority have the discretion to choose which release they will finish the issue by.

Labels for feature area

The goal of feature area labels is to enable easy filtering by area. This can help during planning, exploring problematic areas, and finding duplicate issues. Multiple area/* labels can be added to a single issue.

Signalling difficulty

Difficulty is a function of

  • complexity - the size/hardness of the issue
  • the skills required by the issue and the contributor's skills

Complexity

  • complexity/low - involves introducing/modifying 1-2 concepts, should take at most 1-2 days for an advanced contributor
  • complexity/medium - involves introducing/modifying 3-5 concepts, takes up to a month for an advanced contributor
  • complexity/high - involves introducing/modifying 6+ concepts, can take more than a month for an advanced contributor to work through, and/or modifies core concepts in Cirq

Skill level required (skill/)

  • none: no special background knowledge required
  • beginner: little to no background knowledge is required in the given area/* labels
  • advanced: requires a solid understanding of at least one of the areas signalled by the area/* labels
  • expert: requires deep insight about one or more area/* labels to design the right abstractions

Signalling work for contributors

  • good first issue: (level/beginner in the areas needed and complexity/low to complexity/medium) - the issue is relatively small, self-contained, and doesn't require too much QC knowledge
  • good for learning: (level/advanced in the areas needed and complexity/low) - the issue is relatively small and self-contained, but requires digging into some areas and developing a solid understanding. Should be a bit harder than a "good first issue".
  • good part time project: (level/advanced and complexity/medium) - the issue might take a couple of months, needs a design and multiple conversations, and can require digging deep into a couple of papers. It is still self-contained and doesn't have too many dependencies on the rest of Cirq.
  • help wanted - If a project lead wants help on a certain task or a high priority item needs to be done but no one is assigned to it yet, we should put the help wanted label on it.
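
The mapping from skill level and complexity to the first three contributor labels above can be sketched as a small lookup. This is only an illustration of the rules as listed; the function name `contributor_signal` is hypothetical, and it takes the bare skill and complexity values rather than full label names.

```python
from typing import Optional


def contributor_signal(skill: str, complexity: str) -> Optional[str]:
    """Suggest a contributor-facing label from a skill level and a complexity.

    Hypothetical helper: it encodes only the three combinations listed above.
    "help wanted" is a maintainer decision and is not derived here.
    """
    if skill == "beginner" and complexity in ("low", "medium"):
        return "good first issue"
    if skill == "advanced" and complexity == "low":
        return "good for learning"
    if skill == "advanced" and complexity == "medium":
        return "good part time project"
    # Any other combination has no contributor-facing label in this sketch.
    return None


if __name__ == "__main__":
    print(contributor_signal("beginner", "medium"))
    print(contributor_signal("expert", "high"))
```

Combinations that the list does not mention, such as expert skill with high complexity, deliberately return None instead of guessing a label.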

Implementation and design

After an issue arrives at triage/accepted there can be two avenues: it is ready to be implemented (most of the cases) or it needs design work upfront.

When an issue is ready to be implemented, no extra label is required to signal the readiness; that is the default.

However, when there is a need for design, we add the label needs agreed design. The design could be as lightweight as a discussion in the issue itself or as heavy as a full-fledged RFC proposal; which one is expected should be clear from the comments.

Assigning work

Assignment should be a function of

  • willingness - contributors should volunteer to take issues or maintainers should take them actively.
  • priority - critical issues shouldn't depend on part time work.
  • complexity - highly complex, large pieces are not necessarily feasible or rewarding as part-time work.
  • skills - if someone does not have the skills for a given issue, they will have to factor in the learning that's required to do it.

Closing

Issues should be closed automatically by PRs that use the phrase Fixes #XYZD in their description, or manually, with a reference to the PR, in case the PR author forgot to add the phrase.
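
As a rough illustration of the closing phrase, the sketch below extracts the issue numbers a PR description would close. It assumes GitHub's documented closing keywords (close, fix, resolve and their inflections) and handles only same-repository references such as `Fixes #123`; the function name `issues_closed_by` is hypothetical.

```python
import re
from typing import List

# Closing keywords as documented by GitHub, matched case-insensitively.
# The leading \b keeps words like "prefix" from matching "fix".
_CLOSING_PHRASE = re.compile(
    r"\b(?:close[sd]?|fix(?:e[sd])?|resolve[sd]?)\s*:?\s+#(\d+)",
    re.IGNORECASE,
)


def issues_closed_by(description: str) -> List[int]:
    """Return the issue numbers referenced by a closing phrase, in order."""
    return [int(number) for number in _CLOSING_PHRASE.findall(description)]


if __name__ == "__main__":
    print(issues_closed_by("Adds a new gate. Fixes #123."))
```

A triager could use a check like this to spot PRs that mention an issue without a closing phrase, which are the cases that need manual closing.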

Stale issues

  • Bugs and feature requests in the triage/needs-reproduction and triage/needs-design-work states, i.e. where the author is required to provide more details, get an automated comment "This issue has not received any updates in 30 days"; after 60 days they are marked as triage/stale and closed.
  • Documentation (kind/docs) issues without triage/accepted or triage/discuss are subject to the 60-day staleness policy as well.
  • Roadmap items and tasks, and issues in the triage/accepted or triage/discuss state, never become stale automatically; they are subject to review during daily / weekly triage and the twice-yearly Bug Smash.

To summarize, all issues are subject to staleness-check, except the following:

  • triage/accepted
  • triage/discuss
  • kind/health
  • kind/roadmap-item
  • kind/task

The staleness check is automated with GitHub Actions; the current definition of staleness lives in our staleness GitHub Action workflow.
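
The staleness rules can be sketched as a small decision function. This is not the actual workflow definition, which remains the authoritative source; the 30 and 60 day thresholds and the exempt labels are taken from the text above, while the function name, the returned strings, and the inclusive boundaries are assumptions.

```python
from typing import Iterable

# Labels that exempt an issue from the staleness check, per the summary above.
EXEMPT_LABELS = frozenset({
    "triage/accepted",
    "triage/discuss",
    "kind/health",
    "kind/roadmap-item",
    "kind/task",
})
COMMENT_AFTER_DAYS = 30  # automated "no updates" comment
STALE_AFTER_DAYS = 60    # triage/stale label and closing


def staleness_action(labels: Iterable[str], days_inactive: int) -> str:
    """Return the action this sketch would take for an issue.

    Hypothetical helper; boundaries are treated as inclusive by assumption.
    """
    if EXEMPT_LABELS.intersection(labels):
        return "exempt"
    if days_inactive >= STALE_AFTER_DAYS:
        return "mark-stale-and-close"
    if days_inactive >= COMMENT_AFTER_DAYS:
        return "comment"
    return "none"


if __name__ == "__main__":
    print(staleness_action(["kind/bug-report", "triage/needs-reproduction"], 45))
```

Checking the exemption first mirrors the summary: an exempt label wins regardless of how long the issue has been inactive.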

Processes

Daily triage

Goals:

  • P0 - notice high priority issues as soon as possible and organize a fix for them.
  • P1 - keep the issue and PR backlog clean
    • maintain a backlog that makes it easy to match contributors as well as maintainers to work items.
    • for pull requests we are aiming for
      • responsiveness - people can get their work done - we don't want to block community / our team members.
      • clean workspace - stale PRs are wasteful, as clutter is a cognitive cost for maintainers. Stale PRs are also a resource cost on GitHub, eating into other contributors' capacity to execute GitHub Actions / checks.

Who

  • [mandatory] Cirq maintainers on weekly Cirq rotation - key thing is to cover p0 bugs.
  • [optional] any maintainer who has Triage access rights to the repo.

When

  • daily, continuously - Cirq maintainer rotation is weekly

What

Issues: Daily triage should make sure that each issue has the following labels:

  • triage/*
  • area/*
  • complexity/*
  • skill/*
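
A triager could check the list above mechanically. The sketch below reports which of the four label groups are still missing from an issue; the function name `missing_label_groups` and the example label `area/simulation` are hypothetical.

```python
from typing import Iterable, List

# Label groups that daily triage should ensure are present on each issue.
REQUIRED_PREFIXES = ("triage/", "area/", "complexity/", "skill/")


def missing_label_groups(labels: Iterable[str]) -> List[str]:
    """Return the required label groups (as "prefix/*") absent from labels."""
    present = list(labels)
    return [
        prefix + "*"
        for prefix in REQUIRED_PREFIXES
        if not any(label.startswith(prefix) for label in present)
    ]


if __name__ == "__main__":
    # "area/simulation" is a made-up label used only for illustration.
    print(missing_label_groups(["triage/accepted", "area/simulation"]))
```

An empty result means the issue carries at least one label from every required group.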

Pull requests:

  • As a triager it is your responsibility to review as many PRs as possible during your triage week.

Weekly discussions

Goals

  • make design decisions together with the maintainers on items that need to be discussed
  • provide a forum for feedback and blockages
  • plan features and releases together as a community

Who

  • everyone on the cirq-dev email list is invited

When:

  • 11:00AM-12:00PM PST Wednesdays

What:

Cirq Cynque should be the place to discuss

  • as much of the triage/discuss items as possible and make decisions about controversial bugs and feature requests.
  • prioritization requests - stakeholders such as quantum platform providers and research teams should be able to advocate for raising the priority of certain items
  • release planning / status: only issues with owners should be added to milestones. The owners are responsible for notifying the maintainers in case the issue won't be resolved by the release.

Bug smash - every 6 months

Goals:

  • keep the triage alive: catch up on untriaged issues
  • keep the backlog of issues clean and relevant
  • use the outstanding backlog as the driver for roadmap planning

Who:

  • core maintainers

When:

  • every 6 months

What:

Every 6 months, after every other release, the team should come together to review and revisit triage/accepted items. This is also a chance to catch up on daily triage in case it slipped.