Friday, November 22, 2024

PlatformCon: How Spotify Manages Infrastructure with GitOps


For Spotify, speeding up feature development and deployment all comes down to templates and pipelines.

The release cycle is almost completely automated. “The only people left in the process are developers,” said Tim Hansen, a senior engineer working on Backstage at music-streaming service Spotify, speaking at this week’s PlatformCon 2024 virtual conference. “This allows us to continuously release thousands of microservices, many times a day.”

The recipe comes from GitOps, an implementation of infrastructure as code (IaC). With GitOps, system configuration information, usually stored as YAML or other flat text files, is kept in a source code repository, most often git or a git-based code-hosting service, where it can then be read by a provisioning tool for deployment.

An agent then continually compares the “desired state,” as outlined in the configuration files, with the actual running state, and applies any changes needed to synchronize the two.
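The comparison step described above can be sketched in a few lines of Python. This is a minimal illustration, not the agent Spotify runs: the `State` type, the resource names and the `create`/`update`/`delete` actions are all assumptions made for the example, and the "actual" state is just an in-memory dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical model: resource name -> its configuration.
State = Dict[str, dict]


def plan_changes(desired: State, actual: State) -> List[Tuple[str, str]]:
    """Return the (action, resource) steps that would make actual match desired."""
    steps = []
    for name, spec in desired.items():
        if name not in actual:
            steps.append(("create", name))
        elif actual[name] != spec:
            steps.append(("update", name))
    for name in actual:
        if name not in desired:
            steps.append(("delete", name))
    return steps


def reconcile(desired: State, actual: State) -> State:
    """Apply the plan to a copy of the actual state and return the result."""
    result = dict(actual)
    for action, name in plan_changes(desired, actual):
        if action == "delete":
            del result[name]
        else:  # create and update both copy the desired spec
            result[name] = desired[name]
    return result


# Desired state as it might be declared in the repository (made-up values).
desired = {"web": {"replicas": 3}, "db": {"size": "small"}}
# State currently running (made-up values).
actual = {"web": {"replicas": 2}, "cache": {"size": "small"}}

print(plan_changes(desired, actual))
```

Running the plan a second time against the reconciled state yields no steps, which is the property that lets an agent repeat the comparison continually without side effects.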

“By having everything declarative in code, and having standard technology stacks, [the developer] basically gets infrastructure for free,” he said. “And if you need to make modifications, it’s easy.”

Built on GitOps

A GitOps system requires three aspects to work, Hansen explained:

  • A desired state of the system should be defined declaratively.
  • Each new state should have a different version number and be immutable (with any changes requiring a new version number).
  • Changes to the production environment are applied automatically.

In a typical developer environment, developers make changes to code and check them into source control, and the code gets compiled and built with Continuous Integration (CI) tools. But if the application needs new resources, or requires changes to a database schema, those are handled by an administrator. Manual testing and decommissioning the previous release may also be part of the process.

“So this process is a bit slow, and there are a few people involved here,” Hansen said. This process is also difficult to scale.

Spotify’s process is a bit more streamlined. The development process is fairly routine, following the steps above, but cloud resources are automatically synced, based on declarations in code. Database changes are based on code migrations. If there are new requirements for Kubernetes pods, those are based on code declarations as well.

Automating the infrastructure steps speeds up the whole release cycle. And git provides the review process, versioning, history and a rollback mechanism.

Stepping Through Build Process

Spotify created its own build system, called Tingle. It was built in-house, though it works similarly to GitHub Actions. The entire state of the site build is stored declaratively, Hansen explained. Tingle uses GitHub webhooks to set off new builds once a change is made to code in the repository.

Tingle has two sets of pipelines, one for a review build and one for a master build. Each has its own YAML-based pipeline definition, where the build tools, such as Maven, are specified.
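As a rough illustration of how a webhook could be routed to one of the two pipelines, the sketch below picks a pipeline from a GitHub webhook event. The `push` and `pull_request` event names, the `action` values and the `ref` field follow GitHub's webhook conventions; the `select_pipeline` function and the mapping to "review" and "master" are assumptions for this example, since the talk does not describe Tingle's internals.

```python
from typing import Optional

# Pull request actions that should (in this sketch) trigger a review build.
REVIEW_ACTIONS = {"opened", "synchronize", "reopened"}


def select_pipeline(event: str, payload: dict) -> Optional[str]:
    """Map a webhook event to a pipeline name, or None if no build is needed."""
    if event == "pull_request" and payload.get("action") in REVIEW_ACTIONS:
        return "review"
    if event == "push" and payload.get("ref") == "refs/heads/master":
        return "master"
    return None


print(select_pipeline("pull_request", {"action": "opened"}))
print(select_pipeline("push", {"ref": "refs/heads/master"}))
print(select_pipeline("push", {"ref": "refs/heads/feature-x"}))
```

Pushes to other branches return `None` here on the assumption that they are only built once a pull request exists; a real system might choose differently.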

Pipeline definitions can be quite large and copied across multiple locations, thanks to the company’s microservices architecture. So the Tingle agent uses a smaller template file that refers back to the original, which is consulted by the build process. Some apps need a full configuration, though most can be stood up with just the template itself.
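One way to picture the template mechanism is as a merge of a small per-service file over a shared base definition. The sketch below assumes that model; the `BASE_PIPELINES` contents, the `template` and `overrides` keys and the `resolve_pipeline` function are all hypothetical and are not Tingle's actual format.

```python
import copy

# Hypothetical shared pipeline definitions, maintained in one place.
BASE_PIPELINES = {
    "java-service": {
        "build": {"tool": "maven", "goals": ["verify"]},
        "test": {"enabled": True},
    }
}


def deep_merge(base: dict, overrides: dict) -> dict:
    """Return a copy of base with overrides applied, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def resolve_pipeline(service_config: dict) -> dict:
    """Expand a small per-service config into a full pipeline definition."""
    base = BASE_PIPELINES[service_config["template"]]
    return deep_merge(base, service_config.get("overrides", {}))


# Most services would need only the template reference.
simple = resolve_pipeline({"template": "java-service"})

# A service with special needs overrides just the parts that differ.
custom = resolve_pipeline(
    {"template": "java-service", "overrides": {"build": {"goals": ["package"]}}}
)
print(simple)
print(custom)
```

Because the base is copied rather than edited in place, a change to the shared definition reaches every service that references it the next time its pipeline is resolved.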

“This is a superpower for GitOps: Abstraction and simplification,” Hansen said. “By using GitOps and standard tech stacks, we can make builds absolutely brainless for developers.”

While Tingle was built in house, there are plenty of other tools available to run your own internal build pipelines, including GitLab CI, Google Cloud Build, Tekton Pipelines, Argo Workflows and Dagger.

What Is Spotify’s Deployment Process?


For deployment, Tingle kicks off a central deployment system called Tugboat. A custom YAML file once again establishes a set of defaults, making it easy to define powerful features. For a backend service that relies on Kubernetes, Tugboat seeks out a root-level Kubernetes folder in the source code repository, which contains standard Kubernetes instructions for services, deployments and custom definitions.

“So here we are not only storing instructions on how to deploy with git, but also declarations of what to deploy,” Hansen said.

Some GitOps-friendly automated infrastructure provisioning tools include Terraform (and its open source equivalent OpenTofu), Google Config Connector, Crossplane and Pulumi.

For monitoring, Spotify uses Grafana. Once again, it is deployed centrally from declarative definitions, and the deployment is triggered by Tingle. A set of templates defines graphs that can be used to visually display usage on Grafana dashboards. Alerts are also pre-configured for new apps coming online. The production team can update a Grafana setting once for all the different applications, simply by updating the template itself.
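The "update once for all applications" idea can be illustrated with a shared dashboard template rendered per service. The structure below is a simplified stand-in and not Grafana's dashboard schema; the `$service` placeholder, the metric names and the query strings are made up for the example.

```python
from typing import Dict, List

# Hypothetical shared template; "$service" is a placeholder chosen for this sketch.
DASHBOARD_TEMPLATE = {
    "title": "$service overview",
    "panels": [
        {"title": "Request count", "query": "requests_total{service='$service'}"},
        {"title": "Error count", "query": "errors_total{service='$service'}"},
    ],
}


def render_dashboard(service: str, template: dict) -> dict:
    """Fill the service name into every templated string."""
    def fill(text: str) -> str:
        return text.replace("$service", service)

    return {
        "title": fill(template["title"]),
        "panels": [
            {"title": panel["title"], "query": fill(panel["query"])}
            for panel in template["panels"]
        ],
    }


def render_all(services: List[str], template: dict) -> Dict[str, dict]:
    """Render one dashboard per service from the same template."""
    return {name: render_dashboard(name, template) for name in services}


dashboards = render_all(["search", "playlist"], DASHBOARD_TEMPLATE)
print(dashboards["search"]["title"])
```

Adding a panel to the template and calling `render_all` again gives every service the new panel, without touching any per-service file.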

Each service gets its own set of metadata, describing the deployment and other services, such as monitoring, that are required. This data is held by Backstage, an open source portal for developers originally developed by Spotify.

A Spotify software catalog entry for Backstage.


“The software catalog stores information about the service, along with information about relationships the service has with other software and resources,” Hansen said. “This means our catalog is not just a list of services, but a graph.”
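To show what treating the catalog as a graph makes possible, the sketch below stores relationships as edges and walks them to find everything a service depends on, directly or indirectly. The entity names, the `dependsOn` and `ownedBy` relation labels and the `CATALOG` structure are illustrative assumptions, not the contents or storage format of Spotify's catalog.

```python
from collections import deque
from typing import Dict, List, Tuple

# Hypothetical catalog: entity -> list of (relation, target entity) edges.
CATALOG: Dict[str, List[Tuple[str, str]]] = {
    "component:web-player": [("dependsOn", "component:playlist-api")],
    "component:playlist-api": [
        ("dependsOn", "resource:playlist-db"),
        ("ownedBy", "group:playlists"),
    ],
    "resource:playlist-db": [],
    "group:playlists": [],
}


def transitive_dependencies(catalog: dict, start: str) -> List[str]:
    """Breadth-first walk over dependsOn edges, returning entities in visit order."""
    found: List[str] = []
    visited = {start}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        for relation, target in catalog.get(entity, []):
            if relation == "dependsOn" and target not in visited:
                visited.add(target)
                found.append(target)
                queue.append(target)
    return found


print(transitive_dependencies(CATALOG, "component:web-player"))
```

A flat list of services could not answer this kind of question; with edges in place, the same traversal works for other relations, such as finding every component a group owns.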

Backstage provides a single pane of glass for all of Spotify’s developers. Much like with the build and deployment process, Spotify provides a set of templates, also defined in git, for easily creating a new application. They come pre-populated with the templates for the build and deployment process. The new project also receives an entry in the software catalog, and monitoring and alerting services are integrated as well.

“By using a customized Template on Backstage, and by filling out a few fields, a developer at Spotify can create a service for Spotify with just a few clicks,” Hansen said. “They never have to touch the YAML.”
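The "few fields in, full service out" flow can be sketched as a function that turns form values into a set of generated files. Everything here is hypothetical: the `scaffold_service` function, the field names and the file names and contents are invented for illustration and do not reflect Backstage's template format or Spotify's real templates.

```python
from typing import Dict


def scaffold_service(name: str, owner: str, stack: str = "java-service") -> Dict[str, str]:
    """Generate the files for a new service from a few form fields."""
    if not name or not name.replace("-", "").isalnum():
        raise ValueError(f"invalid service name: {name!r}")
    return {
        # Build and deployment configs only point at the shared templates.
        "build.yaml": f"template: {stack}\n",
        "deploy.yaml": f"template: {stack}\nservice: {name}\n",
        # Catalog entry so the service shows up in the developer portal.
        "catalog-entry.yaml": f"name: {name}\nowner: {owner}\n",
        # Monitoring config that opts in to the default dashboards and alerts.
        "monitoring.yaml": f"dashboard: default\nservice: {name}\n",
    }


files = scaffold_service("lyrics-api", owner="team-lyrics")
for path in sorted(files):
    print(path)
```

The developer supplies only the name and owner; the YAML is generated for them, which matches the point of the quote above.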

Spotify has since donated Backstage to the Cloud Native Computing Foundation, and it is now used by thousands of developers globally.

Enjoy the video of the full talk here:
