Modernizing the Netflix TV UI Deployment Process


At Netflix, we are always looking for ways to improve our member experience. We found that while we were developing the TV UI at high velocity, the real bottleneck was the UI’s canary deployment process. Let’s first take a look at our existing ESN-based canary approach.

First of all, what is an ESN?

An ESN, or Netflix Electronic Serial Number, is a globally unique identifier for each device. Even if you have two identical TVs at home (same brand, model, and year), both with the Netflix TV app installed, they will not share the same ESN.

Illustration of 2 TVs — each has a unique ESN

How do ESN-based canaries work?

Example Device buckets

Problems with ESN-based canaries

Example resulting device bucket after hashing algorithm is applied to ESN
Legacy TV UI deployment workflow

The Journey

We had several goals in mind when we set out to overhaul the process. The must-haves:

What is Murphy?

Murphy is an internal framework for automating the delivery of Netflix client applications. Murphy runs as a Titus service and is composable, pluggable, and testable. As a client team, we are able to automate all of the tasks previously done via Jenkins by writing plugins. Each plugin is a JavaScript class that accomplishes a unit of work. Plugins may be specific to a project (e.g. computing metadata to be used in a deployment) or generally useful (e.g. posting messages to a Slack channel). Most importantly, Murphy provides libraries that abstract the interactions with ABlaze, our centralized A/B testing platform, which makes A/B-test-based canaries possible.
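As described above, a plugin is a JavaScript class that accomplishes one unit of work. A minimal sketch, assuming a hypothetical plugin interface (the class shape, constructor options, and `run` method are illustrative assumptions, not Murphy’s actual API):

```javascript
// Hypothetical sketch of a Murphy-style plugin. The real plugin interface
// is internal; the article only says each plugin is a JavaScript class
// accomplishing a unit of work, such as posting to a Slack channel.
class PostToSlackPlugin {
  constructor({ channel }) {
    this.channel = channel;
  }

  // One unit of work: build the message this plugin would post.
  run({ text }) {
    return {
      channel: this.channel,
      text,
      postedAt: new Date().toISOString(),
    };
  }
}

const plugin = new PostToSlackPlugin({ channel: '#tvui-deploys' });
```

A project-specific plugin (say, one computing deployment metadata) would follow the same shape, which is what makes the framework composable.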

Murphy Framework

How do we leverage Murphy?

Each project defines a config that lists which plugins are available to its namespace (also referred to as a config group). At Murphy server runtime, an action server is created from this config to handle action requests for its namespace. As mentioned in the previous section, each Murphy plugin automates a unit of work. Each unit of work is exposed as an action, which is simply a Murphy client command. Actions run inside isolated Titus containers and are submitted to the TVUI Action Server. Our deployment pipeline leverages Spinnaker to chain these actions together, and can be configured to automatically retry Titus jobs to minimize the impact of any transient infrastructure issues.
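The relationship between a namespace config and its action server might be sketched like this; the config shape, plugin names, and `handleAction` dispatcher are assumptions for illustration only:

```javascript
// Illustrative sketch: a project config lists the plugins available to its
// namespace, and an action server dispatches incoming action requests to
// the matching plugin. All names here are hypothetical.
const tvuiConfig = {
  namespace: 'tvui',
  plugins: {
    computeMetadata: (params) => ({ build: params.build, metadata: 'computed' }),
    postToSlack: (params) => ({ channel: params.channel, status: 'posted' }),
  },
};

function handleAction(config, actionName, params) {
  const plugin = config.plugins[actionName];
  if (!plugin) {
    throw new Error(`Unknown action '${actionName}' in namespace '${config.namespace}'`);
  }
  return plugin(params);
}
```

In the real system each action would run in its own isolated Titus container, with Spinnaker chaining the actions into a pipeline; the sketch only shows the config-to-dispatch relationship.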

Interactions between Murphy Plugins and Murphy TVUI Action Server
Sample Murphy plugin handler request and response

Deployment Workflow Improvements

Since the adoption of Murphy, our deployment Slack channel has become the single source of truth for tracing the deployment process. Notifications are posted by our custom Slack bot (you guessed it, it’s called MurphyBot). MurphyBot posts a message to the deployment channel when the canary deployment begins; the message contains the link to the Spinnaker deployment pipeline as well as a link to roll back to the previous build. Throughout the deployment process, it keeps updating the same Slack thread with links to the ACA reports and the deployment status.
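The messages MurphyBot posts could be modeled roughly as below. The field names and helper functions are hypothetical, but they reflect what the message carries as described above: the pipeline link, a rollback link, and subsequent thread updates with ACA reports and status:

```javascript
// Hypothetical sketch of MurphyBot's message payloads (not the actual
// internal implementation).
function buildDeployMessage({ build, pipelineUrl, rollbackUrl }) {
  return {
    text: `Canary deployment started for build ${build}`,
    links: { pipeline: pipelineUrl, rollback: rollbackUrl },
  };
}

// Follow-up updates are posted into the same Slack thread.
function buildThreadUpdate({ acaReportUrl, status }) {
  return { text: `Deployment status: ${status}`, acaReport: acaReportUrl };
}
```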

Sample Deployment Rollout Slack Message
Sample Deployment Slack Thread

What about A/B-test-based canaries?

A/B-test-based canaries have unlocked our ability to perform an “apples-to-apples” comparison of the baseline and canary builds. Users allocated to test cell 1 receive the baseline build, while users allocated to test cell 2 receive the canary build. Leveraging the power of our ABlaze platform, we are now confident that the populations of cell 1 and cell 2 are nearly identical in terms of device representation.

ACA report showing the canary build had an elevated JavaScript exception count compared to the baseline
ACA report showing the canary build had elevated app memory usage compared to the baseline

The Future

In engineering, we always strive to make a good process even better. In the near future, we plan to explore the idea of device cohorts in our ACA reports. There will inevitably be new devices that Netflix wants to support, as well as older devices whose traffic volume is so low that they become hard to monitor statistically. We believe that grouping and monitoring devices with similar configurations and operating systems will provide better statistical power than monitoring individual devices alone (there are only so many “signature” devices that we can keep track of!). One example device cohort, grouped by operating system, would be “Android TV devices”; another would be “low memory devices”, covering devices with memory constraints.
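A sketch of how such cohorts might be derived, using hypothetical device fields and an assumed memory threshold:

```javascript
// Illustrative sketch of device cohorts as described above: group devices
// with similar operating systems or constraints rather than monitoring each
// model on its own. The device fields and the 512 MB threshold are assumptions.
function cohortsFor(device) {
  const cohorts = [];
  if (device.os === 'androidtv') cohorts.push('Android TV devices');
  if (device.memoryMb < 512) cohorts.push('low memory devices');
  return cohorts;
}

function groupByCohort(devices) {
  const groups = {};
  for (const device of devices) {
    for (const cohort of cohortsFor(device)) {
      (groups[cohort] = groups[cohort] || []).push(device.model);
    }
  }
  return groups;
}
```

Monitoring each cohort’s aggregate metrics gives more samples per signal than any single low-traffic device model could provide.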



Netflix Technology Blog


Learn more about how Netflix designs, builds, and operates our systems and engineering organizations