RecSysOps: Best Practices for Operating a Large-Scale Recommender System

Issue Detection

  • From the member perspective, the problem is straightforward: if a member chooses an item that the serving ranking model did not rank highly, that is a potential issue. Monitoring and analyzing these cases is therefore important for identifying problems, and the cases are also a great source of inspiration for future innovations.
  • From the items’ perspective, we need to engage with the teams responsible for items and understand their concerns. In the case of Netflix, these teams raised concerns about proper item cold-starting and potential production bias. Both are active research areas in the RecSys community, but to start with we helped those teams define metrics around their concerns and build tools to monitor them. We also helped them provide insight into whether those problems were occurring on a per-item basis. We later integrated those tools directly into our issue detection component. This enabled us to 1) expand issue detection coverage and 2) proactively address key item-related issues and build trust with our stakeholders.
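
The member-side check described above can be sketched in a few lines. This is a minimal illustration and not Netflix's implementation: the `Impression` record, the `rank_threshold` value, and the function names are all assumptions made for the example. It flags choices whose item fell below a rank cutoff (or was missing from the ranked list) and then counts the flagged cases per item, which is one simple way to look at the problem on a per-item basis.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Impression:
    """Hypothetical log record: what was ranked and what the member chose."""
    member_id: str
    ranked_items: Tuple[str, ...]  # items in the order the serving model ranked them
    chosen_item: str               # item the member actually picked


def find_low_ranked_choices(
    impressions: List[Impression], rank_threshold: int = 10
) -> List[Tuple[Impression, Optional[int]]]:
    """Return (impression, rank) pairs where the chosen item ranked below the cutoff.

    Rank is 1-based; None means the chosen item was not in the ranked list at all.
    """
    flagged = []
    for imp in impressions:
        try:
            rank = imp.ranked_items.index(imp.chosen_item) + 1
        except ValueError:
            rank = None
        if rank is None or rank > rank_threshold:
            flagged.append((imp, rank))
    return flagged


def flagged_counts_per_item(flagged) -> Counter:
    """Count flagged choices per item to surface items that are repeatedly under-ranked."""
    return Counter(imp.chosen_item for imp, _ in flagged)


# Toy data, assumed for illustration only.
impressions = [
    Impression("m1", ("a", "b", "c", "d"), "a"),
    Impression("m2", ("a", "b", "c", "d"), "d"),
    Impression("m3", ("b", "a", "c"), "z"),
    Impression("m4", ("a", "b", "c", "d"), "d"),
]
flagged = find_low_ranked_choices(impressions, rank_threshold=2)
per_item = flagged_counts_per_item(flagged)
```

In practice the cutoff would likely depend on the surface being monitored, and the per-item counts would be normalized by how often each item was shown; both are left out here to keep the sketch short.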

Issue Prediction

Issue Diagnosis

Issue Resolution

  • Is it possible to detect the issue faster, or maybe even predict it?
  • Is it possible to improve our tools to diagnose the issue faster?
  • Make sure that checks in detection or prediction components are running on a regular automated basis.
  • If human judgment is needed at some step, e.g. diagnosis or resolution, make sure that person has all the required information ready. This will enable them to make informed decisions quickly.
  • Make sure that deploying a hotfix is as simple as a couple of clicks.
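
One way to keep detection and prediction checks running automatically is to register them in one place and have a scheduler call a single entry point. The sketch below is an assumption-laden illustration, not the tooling described in this post: the check names, thresholds, and hard-coded metric values are hypothetical, and a real setup would read metrics from a store and be triggered by cron or a workflow scheduler.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str = ""


def run_checks(checks: Dict[str, Callable[[], Tuple[bool, str]]]) -> List[CheckResult]:
    """Run every registered check; a check that raises is recorded as a failure."""
    results = []
    for name, check in checks.items():
        try:
            passed, detail = check()
        except Exception as exc:
            passed, detail = False, f"check raised {exc!r}"
        results.append(CheckResult(name, passed, detail))
    return results


# Hypothetical checks with hard-coded values standing in for real metric lookups.
def low_ranked_play_rate_ok() -> Tuple[bool, str]:
    rate = 0.02
    return rate <= 0.05, f"rate={rate}"


def cold_start_coverage_ok() -> Tuple[bool, str]:
    launched, with_impressions = 40, 30
    coverage = with_impressions / launched
    return coverage >= 0.9, f"coverage={coverage:.2f}"


def metrics_store_reachable() -> Tuple[bool, str]:
    raise RuntimeError("metrics store unreachable")


CHECKS = {
    "low_ranked_play_rate": low_ranked_play_rate_ok,
    "cold_start_coverage": cold_start_coverage_ok,
    "metrics_store": metrics_store_reachable,
}

results = run_checks(CHECKS)
failures = [r for r in results if not r.passed]
```

Treating a crashing check as a failure, rather than letting it stop the run, means one broken check cannot silently hide the others. The `detail` string is also a natural place to carry the context a person needs if human judgment is required later.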


In this blog post we introduced RecSysOps, a set of best practices and lessons that we’ve learned at Netflix. RecSysOps consists of four components: issue detection, issue prediction, issue diagnosis, and issue resolution. We think these patterns are useful for anyone operating a real-world recommendation system who wants to keep it performing well and improve it over time. Developing such components for a large-scale recommendation system is an iterative process with its own challenges and opportunities for future work. For example, different kinds of models may be needed for issue detection and prediction, while issue diagnosis and resolution require a deeper understanding of ML architectures and design assumptions. Overall, putting these aspects together has helped us significantly reduce issues, increase trust with our stakeholders, and focus on innovation.





Netflix Technology Blog
