Data movement for Google services at Netflix

  • Application owners and data scientists were managing security credentials themselves and copying code around to authenticate with Google. The alternative was to set up a separate GCP project for each application. Either way, this raised the bar for transferring data between Google services and our data warehouse.
  • There was also a large observability gap: we could not see what data was moving from the data warehouse to Google and vice versa.

Google suite client architecture

The Google services discussed here are Docs, Slides, Drive and Sheets. Google provides REST APIs for each of these services, along with SDKs for different languages that convert user intent into the corresponding REST calls.

Fig 1: Google suite architecture with a Python client accessing Drive APIs and Java accessing Sheets.

Google proxy

Let us see how we can add a layer between Google services and the applications accessing them. We wanted to touch as little client code as possible and to hide from users the fact that they are not talking to Google directly. The proxy is a Spring Boot application in Java which depends on the Google API client jar. It acts as a proxy for all Google services that need to be accessed from within Netflix. Google API clients rely on HTTP headers and body information being populated correctly in the request, so the proxy has to preserve them. To talk to Google securely, we use service accounts provided by Google Cloud, each of which has an associated email address. The workflow to onboard a user is as follows:

  1. The user shares the concerned document with the email address associated with a service account.
  2. The application running inside the Netflix ecosystem has credentials managed by an internal tool called Metatron. This tool also manages credentials on the proxy.
  3. When the client makes a request to read or modify a resource on Google, the proxy receives the request, verifies the credentials and makes the request to Google on behalf of the client. The results are then returned to the client.
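The steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the actual Spring Boot service: the `X-Caller-Identity` header, the `verify_caller` helper and the token value are hypothetical stand-ins for the Metatron check and the service account credential, and no network call is made.

```python
from dataclasses import dataclass, field

GOOGLE_HOST = "https://sheets.googleapis.com"  # public Google Sheets API host

# Headers that must not be forwarded to Google as-is.
# "x-caller-identity" is a hypothetical internal header used only in this sketch.
_STRIPPED = ("x-caller-identity", "authorization", "host")


@dataclass
class Request:
    method: str
    path: str
    headers: dict = field(default_factory=dict)
    body: bytes = b""


class Unauthorized(Exception):
    """Raised when the caller's internal identity cannot be verified."""


def verify_caller(headers, trusted_callers):
    # Hypothetical stand-in for the Metatron check: a real deployment would
    # validate a credential instead of trusting a plain header value.
    caller = headers.get("X-Caller-Identity")
    if caller not in trusted_callers:
        raise Unauthorized(f"unknown caller: {caller!r}")
    return caller


def to_upstream(request, trusted_callers, service_account_token):
    """Verify the caller, then rebuild the request so it can be sent to Google."""
    caller = verify_caller(request.headers, trusted_callers)
    # Keep the client's other headers and body untouched, since Google API
    # clients rely on them; only swap the credentials.
    headers = {k: v for k, v in request.headers.items()
               if k.lower() not in _STRIPPED}
    headers["Authorization"] = f"Bearer {service_account_token}"
    upstream = Request(request.method, request.path, headers, request.body)
    return caller, GOOGLE_HOST + request.path, upstream
```

The client never sees the service account token; it only presents its internal identity, and the proxy attaches the Google credential on the way out.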

Client

The Google SDK works by providing hooks which convert user intent to HTTP calls for the REST endpoints provided by Google. This way different languages can just work on translating the intent to HTTP calls and the server would always have a single way of handling clients irrespective of the language that was used.
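To make the idea of "intent to HTTP call" concrete, here is a small sketch of what an SDK does when a user asks for a range of cells from a Sheet. The Sheets v4 path is the public REST endpoint; the `PROXY_ROOT` host and the `read_values_request` helper are hypothetical and only illustrate how, assuming the client's root URL is configurable, the same call could be pointed at a proxy without changing anything else.

```python
from urllib.parse import quote

GOOGLE_SHEETS_ROOT = "https://sheets.googleapis.com"
PROXY_ROOT = "https://google-proxy.example.internal"  # hypothetical proxy host


def read_values_request(spreadsheet_id, cell_range, root_url=GOOGLE_SHEETS_ROOT):
    """Translate "read this range from this sheet" into a REST request."""
    path = "/v4/spreadsheets/{}/values/{}".format(
        quote(spreadsheet_id, safe=""),
        quote(cell_range, safe=""),  # A1 notation, e.g. "Sheet1!A1:B2"
    )
    return {"method": "GET", "url": root_url + path}


# Same intent, two destinations: only the root URL differs.
direct = read_values_request("abc123", "Sheet1!A1:B2")
proxied = read_values_request("abc123", "Sheet1!A1:B2", root_url=PROXY_ROOT)
```

Because the path and method are identical in both cases, the server side can handle every client the same way regardless of the language the SDK is written in.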

Fig 2: Google client SDK
Fig 3: Proxy service

Data movement job

We wanted to make data transfer between our data warehouse and external Google services easy for anybody within Netflix. So we built it into our data portal, where people can query tables in Iceberg using Presto or Spark. When people query a table, we provide an easy way to set up a job that runs at a specified cadence and keeps the table's data refreshed in a particular Google Sheet.
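As a rough illustration of what one run of such a job has to produce, the sketch below turns query result rows into the request that the Sheets v4 `spreadsheets.values.update` endpoint expects. The `build_sheet_update` helper, the sample rows and the choice of `RAW` input are assumptions made for this example; the actual job and its scheduler are not shown in this post.

```python
import json
from urllib.parse import quote, urlencode


def build_sheet_update(spreadsheet_id, cell_range, columns, rows):
    """Build the REST request that overwrites a range with query results."""
    # First row is the header; None (SQL NULL) becomes an empty cell.
    values = [list(columns)]
    values += [["" if v is None else v for v in row] for row in rows]
    body = {"range": cell_range, "majorDimension": "ROWS", "values": values}
    path = "/v4/spreadsheets/{}/values/{}".format(
        quote(spreadsheet_id, safe=""), quote(cell_range, safe="")
    )
    # RAW stores values as-is, without parsing them as formulas or dates.
    query = urlencode({"valueInputOption": "RAW"})
    return {"method": "PUT", "path": f"{path}?{query}", "body": json.dumps(body)}


# Hypothetical rows, as they might come back from a Presto or Spark query.
request = build_sheet_update(
    "abc123", "Sheet1!A1", ["title", "views"], [("A", 10), ("B", None)]
)
```

A scheduled job would rebuild and send this request at each run, so the Sheet always reflects the latest query result.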

Lineage

This system has been used extensively in our production environment. We currently have around 1 million transactions per week (and rising). Because the system moves data between organizations, we need visibility into the type of data being moved and the direction of movement. We built this into the proxy so that we have an audit trail.

Fig 4: Lineage logging
  • Source (could be the source table, Google Sheet etc.)
  • Destination (destination table, Google Sheet etc.)
  • User / Application information (captured from the Metatron authentication context)
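A lineage record with these three fields could look something like the sketch below. The `LineageEvent` type, the field names, the example identifiers and the added timestamp are assumptions for illustration; the post does not describe the actual log format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class LineageEvent:
    source: str        # e.g. a warehouse table or a Google Sheet
    destination: str   # e.g. a Google Sheet or a warehouse table
    principal: str     # user or application from the authentication context
    recorded_at: str   # ISO 8601 timestamp (an addition for this sketch)


def lineage_event(source, destination, principal, now=None):
    """Create one audit record for a single data movement."""
    now = now or datetime.now(timezone.utc)
    return LineageEvent(source, destination, principal, now.isoformat())


def to_log_line(event):
    # One JSON object per line keeps the audit trail easy to ship and query.
    return json.dumps(asdict(event), sort_keys=True)


# Hypothetical identifiers for a table-to-Sheet export.
event = lineage_event("warehouse.db.table", "sheet:abc123", "app-1")
line = to_log_line(event)
```

Recording source and destination as separate fields captures the direction of movement, so the same record shape works for both exports to Google and imports from it.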

Production

We’ve been running our proxy in production for more than one and a half years with more than 500 scheduled jobs which move data. Along with this, we have services using the proxy in real time to get or post information from Google services. We’re managing more than 20,000,000 objects with this system and this is growing as more titles are handled by our studio organization.

Stay Tuned

We’re looking to build more into the architecture discussed here. For example, the current system does not provide a way for applications to restrict access to the objects at an application level and we’re looking into what would be the best way to add that capability. Please post your comments below and stay tuned for updates on how we’re handling problems in the Data Platform organization within Netflix.

Acknowledgements

We would like to thank the following people and teams for contributing to the Google Proxy service: Data Integrations Platform team (Andreas Andreakis, Yun Wang), Production Foundations Engineering (Sasha Joseph), Content Engineering (Ismael Gonzalez), Data Science and Engineering (Dao Mi, Girish Lingappa), Data Engineering Infrastructure (Ajoy Majumder), Information Security (Rob Cerda, Spencer Varney) and Jordan Carr.

Netflix Technology Blog