Batch Changes design
Why is the Batch Changes feature designed the way it is?
Principles
-
Declarative API (not imperative). You declare your intent, such as "lint files in all repositories with a package.json file".
-
Define a batch change in a file (not some online API). The source of truth of a batch change's definition is a file that can be stored in version control, reviewed in code review, and re-applied by CI. This is in the same spirit as IaC (infrastructure as code; e.g., storing your Terraform/Kubernetes/etc. files in Git). We prefer this approach over a (worse) alternative where you define a batch change in a UI with a bunch of text fields, checkboxes, buttons, etc., and need to write a custom API client to import/export the batch change definition.
-
Shareable and portable. You can share your batch specs, and it's easy for other people to use them. A batch spec expresses an intent that's high-level enough to (usually) not be specific to your own particular repositories. You declare and inject configuration and secrets to customize it instead of hard-coding those values.
-
Large-scale. You can run batch changes across tens of thousands of repositories. It might take a while to compute and push everything, and the current implementation might cap out lower than that, but the fundamental design scales well.
-
Accommodates a variety of code hosts and review/merge processes. Specifically, we don't want Batch Changes to only work for GitHub pull requests. (See current support list.)
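
The declarative and portable principles above can be illustrated with a minimal Python sketch. It is not the Batch Changes implementation: `Repo`, `BatchSpec`, `ChangesetSpec`, `compute_desired_state`, the `run-linter` command, and the `LINT_CONFIG` setting are all hypothetical names chosen for illustration. The spec only states the intent ("every repository with a given file") and leaves a placeholder for configuration that is injected at run time rather than hard-coded.

```python
from __future__ import annotations

from dataclasses import dataclass
from string import Template


@dataclass(frozen=True)
class Repo:
    """Hypothetical view of a repository: its name and the files it contains."""
    name: str
    files: frozenset[str]


@dataclass(frozen=True)
class BatchSpec:
    """Hypothetical, heavily simplified stand-in for a batch spec file."""
    name: str
    required_file: str  # intent: "all repositories with this file"
    command: str        # may contain ${NAME} placeholders for injected config


@dataclass(frozen=True)
class ChangesetSpec:
    """Desired state for one repository."""
    repo: str
    branch: str
    command: str


def compute_desired_state(
    spec: BatchSpec, repos: list[Repo], config: dict[str, str]
) -> list[ChangesetSpec]:
    """Expand the declared intent into one ChangesetSpec per matching repo."""
    # substitute() raises KeyError if a placeholder has no injected value.
    command = Template(spec.command).substitute(config)
    return [
        ChangesetSpec(repo=r.name, branch=f"batch-changes/{spec.name}", command=command)
        for r in repos
        if spec.required_file in r.files
    ]


# The spec names no specific repository and hard-codes no configuration.
spec = BatchSpec(
    name="lint-js",
    required_file="package.json",
    command="run-linter --config ${LINT_CONFIG}",  # hypothetical command
)
repos = [
    Repo("github.com/acme/web", frozenset({"package.json", "index.js"})),
    Repo("github.com/acme/api", frozenset({"go.mod", "main.go"})),
]
desired = compute_desired_state(spec, repos, {"LINT_CONFIG": "strict.json"})
```

Because the spec refers to repositories only by a property they share, the same spec can be reused against a different set of repositories with a different injected configuration.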
Comparison to other distributed systems
Kubernetes is a distributed system with an API that many people are familiar with. Batch Changes is also a distributed system. All APIs for distributed systems need to handle a similar set of concerns around robustness, consistency, etc. Here's a comparison showing how these concerns are handled for a Kubernetes Deployment and a batch change. In some cases, we've found Kubernetes to be a good source of inspiration for the Batch Changes API, but resembling Kubernetes is not an explicit goal.
| | Kubernetes Deployment | Sourcegraph Batch Changes |
|---|---|---|
| What underlying thing does this API manage? | Pods running on many (possibly unreliable) nodes | Branches and changesets on many repositories that can be rate-limited and externally modified (and our authorization can change) |
| Spec YAML | | |
| How desired state is computed | | |
| Desired state consists of... | | |
| Where is the desired state computed? | The deployment controller (part of the Kubernetes cluster) consults the DeploymentSpec and continuously computes the desired state. | |
| Reconciling desired state vs. actual state | The "deployment controller" reconciles the resulting PodSpecs against the current actual PodSpecs (and does smart things like rolling deploy). | The "Batch Changes controller" (i.e., our backend) reconciles the resulting ChangesetSpecs against the current actual changesets (and does smart things like gradual roll-out/publishing and auto-merging when checks pass). |

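
The reconciliation row of the comparison can be sketched as a small diffing function. This is an illustrative Python sketch under simplifying assumptions, not the actual controller: desired and actual state are modeled as plain mappings from repository name to diff, and the action names `create`, `update`, and `close` as well as the functions `reconcile` and `apply` are hypothetical.

```python
def reconcile(desired: dict[str, str], actual: dict[str, str]) -> list[tuple[str, str]]:
    """Return the (action, repo) pairs needed to move `actual` to `desired`.

    Both arguments map a repository name to the diff of its changeset.
    """
    actions = []
    for repo in sorted(desired.keys() | actual.keys()):
        if repo not in actual:
            actions.append(("create", repo))   # wanted, but does not exist yet
        elif repo not in desired:
            actions.append(("close", repo))    # exists, but is no longer wanted
        elif desired[repo] != actual[repo]:
            actions.append(("update", repo))   # exists, but is out of date
    return actions


def apply(
    actions: list[tuple[str, str]], desired: dict[str, str], actual: dict[str, str]
) -> dict[str, str]:
    """Return the new actual state after carrying out `actions`."""
    result = dict(actual)
    for verb, repo in actions:
        if verb == "close":
            del result[repo]
        else:
            result[repo] = desired[repo]
    return result


desired = {"github.com/acme/api": "diff-v2", "github.com/acme/docs": "diff-v1"}
actual = {"github.com/acme/api": "diff-v1", "github.com/acme/web": "diff-v1"}

actions = reconcile(desired, actual)
print(actions)
# [('update', 'github.com/acme/api'), ('create', 'github.com/acme/docs'), ('close', 'github.com/acme/web')]

# Reconciling again after applying the actions yields nothing left to do.
print(reconcile(desired, apply(actions, desired, actual)))
# []
```

The second call returning an empty list is the property that makes a declarative API robust to retries: running the reconciler again after a partial failure or an external modification only produces the actions that are still outstanding.
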
These docs explain more about Kubernetes' design: