Cirq Triage

Original RFC: bit.do/cirq-triage

Objective

The goals for this document are as follows:

  • Define a set of lightweight community processes that make it easy for users and maintainers to understand where certain issues stand, when and how they are going to be resolved, and what is blocking them

  • Provide visibility for project and release status

Automation: Triage Party and GitHub Actions

Triage Party is a stateless web app to optimize issue and PR triage for large open-source projects using the GitHub API.

Our deployed version is here (a static IP, domain request is in progress): http://bit.do/cirq-triage-party

GitHub Actions is GitHub's workflow automation platform. We use it for continuous integration testing as well as for the stale issue handling described later in this document.

Issue states and labels

Issue kinds

The following are the kinds of issues that Cirq uses:

  • kind/bug-report: a report that something doesn’t work
  • kind/design-issue: a topic that needs software design
  • kind/feature-request: a request for new functionality
  • kind/question: the issue is actually a question (in which case, it should be closed after being answered, and the user may be pointed to other resources if appropriate, such as the Quantum Computing Stack Exchange for general usage questions)
  • kind/health: for CI/testing/release process/refactoring/technical debt items
  • kind/docs: documentation problems, ideas, requests
  • kind/roadmap-item: for higher-level roadmap items to capture conversations and feedback (not for project tracking)
  • kind/task: for tracking progress on larger efforts

For most issues, there are the following phases:

  • Triage to decide whether we want to take on this issue at all,
  • Prioritization of how urgent it is,
  • Identifying the relevant feature area,
  • Signalling its anticipated difficulty,
  • Signalling the amount of expected work,
  • Assigning work,
  • And (eventually) closing.

In the following subsections, we explore these phases one by one.

Triage

The triage states are:

  • triage/accepted – there is consensus among the maintainers that this is a real bug, or a reasonable feature to add with a reasonable design, and hence it is ready to be implemented
  • triage/duplicate – for issues that turn out to be essentially the same as other existing issues (in which case, they're likely to be closed)
  • triage/discuss – can be added to any issue type to bring it up during the next Cirq Cynq meeting and/or to signal the need for a decision (in addition, consider pinging the maintainers who need to come to consensus around the issue)
  • triage/needs-reproduction – for bugs only
  • triage/needs-feasibility – for feature requests (maybe bugs)
  • triage/needs-more-evidence – for plausible requests that nevertheless need more compelling evidence about their value for enough users to warrant implementation and maintenance
  • triage/stale – issues are automatically marked stale after 90 days of inactivity, and closed 60 days after that
  • triage/wont-fix – for when the decision is against pursuing something (perhaps because the thing in question is working as designed, or because changing it is impractical, or some other reason)

The workflows are illustrated below.

Bug report triage

Figure 1. Bug workflow (to edit, see mermaid source)

Feature request triage

Figure 2. Feature request workflow (to edit, see mermaid source)

Other issue types

For kind/docs, the label triage/accepted has to be added by at least one of the maintainers.

For kind/health, kind/roadmap-item and kind/task, there is no particular intake workflow, as we assume that only maintainers create them to track specific work items.

Prioritization

Labels for priority capture the community's intent around when a certain feature/bug/task should be done by. Priority is decided by the triage team, in negotiation with the user who opened the issue. It is expected to change throughout the lifetime of the issue as the expectations around it evolve.

  • priority/p0 should be very rare, only cases of emergency, and when a major critical user journey is blocked (e.g., users are exposed to a security vulnerability or they can't install Cirq)
  • priority/p1 is reserved for issues that need to be addressed for high priority work (e.g., a publication that is planned earlier than the next release)
  • priority/p2 and priority/p3 are used to tie into release planning conversations and to signal to contributors important work that can be picked up.

Features and Bugs with no priority label on them will still be up for grabs for contributors. Community contributors assigned to an issue that has no priority have the discretion to choose which release they will finish the issue by.

Labels for feature area

The goal of feature area labels is to enable easy filtering by area. This can help during planning, exploring problematic areas, and finding duplicate issues. Multiple area/* labels can be added to a single issue.

Signalling difficulty

Difficulty is a function of

  • Complexity – the size/hardness of the issue
  • The skills required by the issue and the contributor's skills

Complexity

  • complexity/low – involves introducing/modifying at most 1-2 concepts, should take 1-2 days max for an advanced contributor
  • complexity/medium – involves introducing/modifying 3-5 concepts, takes up to a month for an advanced contributor
  • complexity/high – involves introducing/modifying 6+ concepts, can take more than a month for an advanced contributor to work through it, and/or modifies core concepts in Cirq
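The thresholds above can be sketched as a small helper. This is only an illustration: the function name `complexity_label` and the `touches_core` flag are hypothetical, and counting "concepts" remains a judgment call for the triager.

```python
def complexity_label(num_concepts: int, touches_core: bool = False) -> str:
    """Map a rough concept count to a complexity/* label (illustrative only)."""
    if num_concepts < 0:
        raise ValueError("num_concepts must be non-negative")
    # Modifying core concepts in Cirq counts as high complexity regardless of size.
    if touches_core or num_concepts >= 6:
        return "complexity/high"
    if num_concepts >= 3:
        return "complexity/medium"
    return "complexity/low"
```

For example, `complexity_label(4)` would give complexity/medium, while `complexity_label(1, touches_core=True)` would give complexity/high.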

Skill level required (skill/level)

  • none: no special background knowledge required
  • beginner: little to no background knowledge is required in the given area/* labels
  • advanced: requires solid understanding of at least one of the areas signalled by the area/* labels
  • expert: requires deep insight about one or more area/* labels to design the right abstractions

Signalling work for contributors

  • good first issue: (level/beginner in the areas needed and complexity/low to complexity/medium) – the issue is relatively small, self contained, doesn't require too much quantum-computing knowledge
  • good for learning: (level/advanced in the areas needed and complexity/low) – the issue is relatively small, self contained, but requires digging into some areas and developing a solid understanding. Should be a bit harder than "good first issues".
  • good part time project – (level/advanced and complexity/medium) – the issue might take up a couple of months, needs a design and multiple conversations, can require digging deep into a couple of papers. It is still self-contained, doesn't have too many dependencies on the rest of Cirq.
  • help wanted – If a project lead wants help on a certain task or a high-priority item needs to be done but no one is assigned to it yet, we should put the help wanted label on it.
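The mapping from skill level and complexity to these labels can be written down as a lookup. The sketch below is hypothetical (the function name `contributor_label` is not part of any tooling), and it leaves out help wanted, which is applied by a person rather than derived from other labels.

```python
from typing import Optional


def contributor_label(skill: str, complexity: str) -> Optional[str]:
    """Suggest a contributor-facing label from skill level and complexity.

    `skill` is e.g. "beginner" or "advanced"; `complexity` is "low",
    "medium" or "high". Returns None when no label applies.
    """
    if skill == "beginner" and complexity in ("low", "medium"):
        return "good first issue"
    if skill == "advanced" and complexity == "low":
        return "good for learning"
    if skill == "advanced" and complexity == "medium":
        return "good part time project"
    # Everything else (e.g. expert-level or high-complexity work) gets no
    # contributor label from this mapping.
    return None
```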

Implementation and design

After an issue arrives at triage/accepted there can be two avenues: it is ready to be implemented (most of the cases) or it needs design work upfront.

When an issue is ready to be implemented, no extra label is required to signal the readiness, because that is the default.

However, when there is a need for design, we add the label needs agreed design. The design can be as lightweight as a discussion in the issue itself or as heavy as a full-fledged RFC proposal; which one is expected should be clear from the comments.

Assigning work

Assignment should be a function of

  • Willingness – contributors should volunteer to take issues or maintainers should take them actively.
  • Priority – critical issues shouldn't depend on part time work.
  • Complexity – highly complex, large pieces are not necessarily feasible or rewarding as part-time work.
  • Skills – if someone does not have the skills for a given issue, they will have to factor in the learning that's required to do it.

Closing

Issues should be closed automatically by PRs that use the phrase "Fixes #XYZD" in their description, or manually, with a reference to the PR, in case the PR author forgot to add the phrase.
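To check whether a PR description contains such a phrase, a triager could use a pattern like the one below. This is a rough sketch: the helper name `closed_issue_numbers` is hypothetical, and the regular expression only approximates GitHub's closing keywords (close, fix, resolve and their variants) for issues in the same repository.

```python
import re
from typing import List

# Approximation of GitHub's closing keywords followed by an issue reference.
_CLOSING_REF = re.compile(
    r"\b(?:close[sd]?|fix(?:es|ed)?|resolve[sd]?):?\s+#(\d+)",
    re.IGNORECASE,
)


def closed_issue_numbers(pr_description: str) -> List[int]:
    """Return the issue numbers a PR description would close, in order."""
    return [int(num) for num in _CLOSING_REF.findall(pr_description)]
```

An empty result suggests the issue will need to be closed manually once the PR is merged.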

Stale issues

  • Bugs and feature requests in the states triage/needs-reproduction and triage/needs-design-work (i.e., where the author is required to provide more details) get an automated comment "This issue has not received any updates in 90 days" and are then marked as triage/stale and closed after 60 days.
  • Documentation (kind/docs) issues without triage/accepted or triage/discuss are subject to 90 days staleness policy as well.
  • Roadmap items and tasks, and issues in the triage/accepted or triage/discuss state, never get stale automatically; they are subject to review during daily/weekly triage and the twice-yearly Bug Smash.

To summarize, all issues are subject to staleness-check, except the following:

  • triage/accepted
  • triage/discuss
  • kind/health
  • kind/roadmap-item
  • kind/task

The staleness check automation is implemented via GitHub Actions; the latest definition of staleness is in our staleness GitHub Action workflow.
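As an illustration of the rules in this section (not the workflow itself), the exemption list and the 90-day/60-day timeline might be modelled as follows. The names used here are hypothetical, and the authoritative values are whatever the staleness workflow currently defines.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# Labels that exempt an issue from the staleness check (from the summary above).
EXEMPT_LABELS = frozenset({
    "triage/accepted",
    "triage/discuss",
    "kind/health",
    "kind/roadmap-item",
    "kind/task",
})

# Assumed timings, taken from the triage/stale description in this document.
STALE_AFTER = timedelta(days=90)
CLOSE_AFTER_STALE = timedelta(days=60)


def is_subject_to_staleness(labels: Iterable[str]) -> bool:
    """True if none of the issue's labels exempts it from the check."""
    return EXEMPT_LABELS.isdisjoint(labels)


def staleness_dates(last_activity: date) -> Tuple[date, date]:
    """Return (date marked stale, date closed) for an inactive issue."""
    stale_on = last_activity + STALE_AFTER
    return stale_on, stale_on + CLOSE_AFTER_STALE
```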

Processes

Daily triage

Goals:

  • P0 – notice high priority issues as soon as possible and organize a fix for them.
  • P1 – keep the issue and PR backlog clean:
      • Maintain a backlog that makes it easy to match contributors as well as maintainers to work items.
      • For pull requests, aim for:
          • Responsiveness – people can get their work done; we don't want to block the community or our team members.
          • Clean workspace – stale PRs are wasteful, as clutter is a cognitive cost for maintainers. Stale PRs are also a resource cost on GitHub, eating into other contributors' capacity to execute GitHub Actions/checks.

Who

  • [mandatory] Cirq maintainers on weekly Cirq rotation – key thing is to cover p0 bugs.
  • [optional] Any maintainer who has Triage access rights to the repo.

When

  • Daily, continuously – Cirq maintainer rotation is weekly

What

Issues: Daily triage should make sure that each issue has the following labels:

  • triage/*
  • area/*
  • complexity/*
  • skill/*
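A triager could check this requirement for a given issue with a few lines of code. The sketch below is hypothetical: it works on a plain list of label names and does not call the GitHub API.

```python
from typing import Iterable, List

# Label groups every triaged issue is expected to carry (from the list above).
REQUIRED_PREFIXES = ("triage/", "area/", "complexity/", "skill/")


def missing_label_groups(labels: Iterable[str]) -> List[str]:
    """Return the required label groups (e.g. "area/*") the issue lacks."""
    present = list(labels)
    return [
        prefix + "*"
        for prefix in REQUIRED_PREFIXES
        if not any(label.startswith(prefix) for label in present)
    ]
```

An issue labelled only kind/bug-report and triage/accepted, for instance, would still be missing area/*, complexity/* and skill/*.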

Pull requests:

  • As a triager it is your responsibility to review as many PRs as possible during your triage week.

Weekly discussions

Goals

  • Make design decisions together with the maintainers on items that need to be discussed
  • Provide a forum for feedback and blockages
  • Plan features and releases together as a community

Who

  • Everyone on the cirq-dev email list is invited

When:

  • 10:00AM-11:00AM Pacific Time Wednesdays

What:

Cirq Cynq (join cirq-dev to get an invite!) should be the place to discuss the following:

  • As much of the triage/discuss items as possible and to make decisions about controversial bugs and feature requests
  • Prioritization requests – stakeholders should be able to advocate for raising the priority of certain items
  • Release planning/status – only issues with owners should be added to milestones, and the owners are responsible for notifying the maintainers if the issue won't be resolved by the release

Bug smash – every 6 months

Goals:

  • Keep the triage alive: catch up on untriaged issues
  • Keep the backlog of issues clean and relevant
  • Use the outstanding backlog as the driver for roadmap planning

Who:

  • Core maintainers

When:

  • Every 6 months

What:

Every 6 months, after every other release, the team should come together to review and revisit triage/accepted items. This is also a chance to catch up on daily triage in case it slipped.