Scaling Sourcegraph on Kubernetes
Sourcegraph can scale to accommodate large codebases and many users.
Increase resources according to the Scaling Overview per Service if you notice slower search or navigation.
Cluster resource guidelines
For production environments, we recommend allocating resources based on your instance size. See our resource estimator for estimates.
Improving performance with a large number of repositories
Here is a simplified list of the key parameters to tune when scaling Sourcegraph to many repositories:
- `sourcegraph-frontend` CPU/memory resource allocations
- `searcher` replica count
- `indexedSearch` replica count and CPU/memory resource allocations
- `gitserver` replica count
- `symbols` replica count and CPU/memory resource allocations
- `gitMaxConcurrentClones`, because `git clone` and `git fetch` operations are IO and CPU-intensive
- `repoListUpdateInterval` (in minutes), because each interval triggers `git fetch` operations for all repositories
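Replica counts and CPU/memory allocations are set on the corresponding Kubernetes objects, while `gitMaxConcurrentClones` and `repoListUpdateInterval` live in the site configuration rather than in manifests. Below is a minimal sketch of a Deployment override for `searcher`; the replica count and resource figures are placeholders for illustration, not sizing recommendations.

```yaml
# Illustrative excerpt of a searcher Deployment override.
# Replica count and resource figures are placeholders, not recommendations.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: searcher
spec:
  replicas: 3                  # scale out as the number of repositories grows
  template:
    spec:
      containers:
        - name: searcher
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
            limits:
              cpu: "4"
              memory: 8Gi
```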
Notes:
- If your change requires restarting `gitserver` pods and they are rescheduled to other nodes, they may go offline briefly (showing a `Multi-Attach` error). This is due to volume detach/reattach. Contact us for mitigation steps depending on your cloud provider.
- See the docs to understand each service's role.
Improving performance with large monorepos
Here is a simplified list of key parameters to tune when scaling Sourcegraph to large monorepos:
- `sourcegraph-frontend` CPU/memory resource allocations
- `searcher` CPU/memory resource allocations (allocate enough memory to hold all non-binary files in your repositories)
- `indexedSearch` CPU/memory resource allocations (for the `zoekt-indexserver` pod, allocate enough memory to hold all non-binary files in your largest repository; for the `zoekt-webserver` pod, allocate enough memory to hold ~2.7x the size of all non-binary files in your repositories)
- `symbols` CPU/memory resource allocations
- `gitserver` CPU/memory resource allocations (allocate enough memory to hold your Git packed bare repositories)
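As a rough sketch of the `indexedSearch` guidance above, assuming the standard `indexed-search` StatefulSet layout with `zoekt-webserver` and `zoekt-indexserver` in the same pod: the memory figures below assume roughly 20Gi of non-binary files in the monorepo and are illustrative only; substitute your own measurements.

```yaml
# Illustrative excerpt of an indexed-search StatefulSet override.
# Figures assume ~20Gi of non-binary files in the monorepo; adjust to your data.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: indexed-search
spec:
  template:
    spec:
      containers:
        - name: zoekt-webserver
          resources:
            requests:
              memory: 54Gi     # ~2.7x the non-binary file size, per the guidance above
            limits:
              memory: 54Gi
        - name: zoekt-indexserver
          resources:
            requests:
              memory: 20Gi     # enough to hold the non-binary files being indexed
            limits:
              memory: 20Gi
```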
Configuring faster disk I/O for caches
Many parts of Sourcegraph's infrastructure benefit from using SSDs for caches, and this is especially important for search performance. By default, disk caches use a Kubernetes hostPath volume, so their I/O speed matches the underlying node's disk. Even if the node's default disk is an SSD, it is likely network-mounted rather than local.
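One way to get local-SSD speed for a cache, as a minimal sketch: assume the node exposes a local SSD at /mnt/disks/ssd0 (an assumed mount point provisioned outside of Sourcegraph) and point the cache volume's hostPath at it. The mount path must match the cache directory the container is configured to use; whether a hostPath, an emptyDir on local SSD, or a local PersistentVolume is the right mechanism depends on your cluster and cloud provider.

```yaml
# Illustrative excerpt: backing a cache volume with an assumed local-SSD
# mount point (/mnt/disks/ssd0) on the node instead of the default hostPath.
spec:
  template:
    spec:
      containers:
        - name: searcher
          volumeMounts:
            - name: cache-ssd
              mountPath: /mnt/cache   # must match the container's cache directory
      volumes:
        - name: cache-ssd
          hostPath:
            path: /mnt/disks/ssd0/searcher
            type: DirectoryOrCreate
```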