Words Matter: Testing Copy With Shakespeare

The Backstory

Since the DVD days, Netflix’s success has been shaped by product innovation validated by a/b testing — experiments where two or more variants of an experience are shown to users at random and statistical analysis determines which variation performs better for a given goal. Our UI is constantly evolving, and the experience you get today may not be the same one you find tomorrow.
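
The post doesn’t say which statistical test Netflix uses to decide which variation wins. As an illustration only, a two-proportion z-test is one common way to compare a yes/no goal metric (say, completing sign-up) between a control cell and a variant cell. The function name, inputs and numbers below are hypothetical.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Compare the conversion rates of cell A and cell B (illustrative, not Netflix's code).

    Returns (z, p_value): a positive z means B converted better than A,
    and p_value is two-sided.
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both cells convert equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical example: 10,000 users per cell, 5.0% vs 5.6% conversion.
z, p = two_proportion_z_test(conv_a=500, n_a=10_000, conv_b=560, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up numbers the variant looks better, but the p-value lands just above 0.05, which is why tests typically run until enough users have been allocated to separate real wins from noise.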
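
But when it comes to the words in that experience, how do we decide which ones work best? Should we:
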
  1. Copy others?
  2. Go with whatever we hear about most on Twitter?
  3. Hire an expert and let them decide?
  4. Let leadership make the call?
  5. Vote on what we think will work best?

Or should we test our copy the same way we test the rest of the product?

Introducing Shakespeare

To figure out a solution, the Internationalization team started by closely examining how copy testing was handled. Major pain points included:

  • Every test required non-trivial engineering support.
  • Tests weren’t easy to set up, configure or clean up afterwards.
  • Tests took days to weeks to deploy.
  • We didn’t have a uniform way to test across platforms.
  • There was no easy way to test localized or transcreated copy.
  • We lacked the infrastructure for continuous explore/exploit tests, in which real-time analysis detects winning variants early and shifts more users into those cells (unlike traditional a/b tests, which split users equally across cells).

Shakespeare, our copy-testing platform, addresses each of these pain points. With it:

  • Engineering dependencies are greatly reduced.
  • Tests are easier to set up, configure and clean up.
  • We have the ability to make real-time copy updates.
  • We’re able to consistently test across platforms.
  • Localized or transcreated copy can be tested independently of the English source.
  • We have the option for continuous explore/exploit.
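
The post doesn’t describe how Shakespeare’s explore/exploit option decides where to send users. One standard technique that fits the description in the pain-point list (spot likely winners early, then shift traffic toward them) is Thompson sampling, a multi-armed bandit method. The sketch below assumes a binary conversion metric, and every name in it is hypothetical.

```python
import random


class ThompsonAllocator:
    """Hypothetical explore/exploit allocator for copy-test cells (Thompson sampling).

    Each cell keeps a Beta(successes + 1, failures + 1) posterior over its
    conversion rate. Each new user goes to the cell with the highest random draw
    from its posterior, so cells that look like winners receive more traffic
    while weaker cells still get occasional exploratory traffic.
    """

    def __init__(self, cells, seed=None):
        self.stats = {cell: [0, 0] for cell in cells}  # cell -> [successes, failures]
        self.rng = random.Random(seed)

    def choose_cell(self):
        draws = {
            cell: self.rng.betavariate(successes + 1, failures + 1)
            for cell, (successes, failures) in self.stats.items()
        }
        return max(draws, key=draws.get)

    def record(self, cell, converted):
        self.stats[cell][0 if converted else 1] += 1


# Simulated run with made-up true conversion rates: cell 2 is the better copy.
true_rates = {1: 0.04, 2: 0.12}
allocator = ThompsonAllocator(true_rates, seed=7)
outcomes = random.Random(11)
assigned = {cell: 0 for cell in true_rates}
for _ in range(5_000):
    cell = allocator.choose_cell()
    assigned[cell] += 1
    allocator.record(cell, outcomes.random() < true_rates[cell])
print(assigned)  # most users should end up in cell 2
```

Compared with an equal split, the weaker copy is shown to far fewer users over the life of the test, at the cost of a less clean comparison between cells.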

Building Shakespeare

To build Shakespeare, we first needed a clear understanding of how strings are stored and fetched.

String storage

From an engineer’s code, strings are sent to a message repo that’s linked to our translation tool.
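
As a rough mental model only (the post doesn’t describe the repo’s schema), the message repo can be pictured as a map from string key to per-locale text, where adding an English source string queues translation work. The class, keys, locales and the fall-back-to-English behavior below are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class MessageRepo:
    """Toy message repo: string key -> locale -> text (hypothetical schema).

    Adding an English source string queues a translation request for every
    target locale, standing in for the link to the translation tool.
    """

    target_locales: list[str]
    messages: dict[str, dict[str, str]] = field(default_factory=dict)
    translation_queue: list[tuple[str, str]] = field(default_factory=list)

    def add_source(self, key: str, english_text: str) -> None:
        self.messages.setdefault(key, {})["en"] = english_text
        for locale in self.target_locales:
            self.translation_queue.append((key, locale))

    def add_translation(self, key: str, locale: str, text: str) -> None:
        self.messages[key][locale] = text

    def get(self, key: str, locale: str) -> str:
        # Assumed behavior: fall back to the English source until a translation arrives.
        by_locale = self.messages[key]
        return by_locale.get(locale, by_locale["en"])


repo = MessageRepo(target_locales=["fr", "ja"])
repo.add_source("signup.cta", "Get Started")
repo.add_translation("signup.cta", "fr", "Commencer")
print(repo.get("signup.cta", "fr"), "/", repo.get("signup.cta", "ja"))
```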

String fetching

A centralized platform service layer acts as an agent between the message repo and the client apps. Because the platform service layer is a simple pass-through, the client apps need to retain the mapping between test copy and test cells.
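
To make that pain point concrete, here is a hypothetical sketch of what the pass-through arrangement implies: every client app carries its own cell-to-string mapping, so each test means client code changes and, later, cleanup. The keys, copy and function names are invented for illustration.

```python
# Hypothetical pre-Shakespeare flow. Invented message repo contents:
MESSAGES = {
    "signup.cta": "Get Started",
    "signup.cta.variant2": "Join Today",
}


def platform_fetch(key: str) -> str:
    """Pass-through service layer: returns the stored string unchanged, with no test logic."""
    return MESSAGES[key]


# Mapping that each client app (TV, web, mobile, ...) had to ship, and later remove.
CLIENT_CELL_TO_KEY = {1: "signup.cta", 2: "signup.cta.variant2"}


def render_signup_cta(test_cell: int) -> str:
    # Users in an unmapped cell see the production string.
    key = CLIENT_CELL_TO_KEY.get(test_cell, "signup.cta")
    return platform_fetch(key)


print(render_signup_cta(2))
```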

Improving the Process

We designed Shakespeare to abstract away the manual steps and business logic. The person running a Shakespeare test just needs to tell the system where to find the production copy and enter the test variants through the Shakespeare Web UI. From there, Shakespeare automatically handles the mapping between test copy and test cells. (Note that Shakespeare can only be used to test new versions of existing UI strings, and only when the test involves no other design variant.)

Setting up a Shakespeare test

To run a copy test, the Content Designer, Language Manager or PM simply enters a pointer to the production copy and then enters the various copy variants.
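
A minimal sketch of what such a test definition might hold, assuming (hypothetically) that cell 1 is the control showing the production copy and the variants fill cells 2 and up. The post doesn’t specify the real data model or cell numbering.

```python
from dataclasses import dataclass, field


@dataclass
class CopyTest:
    """Hypothetical copy-test definition as entered in a web UI."""

    test_id: int
    production_key: str  # pointer to the existing production string
    variants: list[str] = field(default_factory=list)  # candidate copy to test

    def default_cell_mapping(self, production_text: str) -> dict[int, str]:
        """Assumed convention: cell 1 = control (production copy), cells 2..n = variants."""
        if not self.variants:
            raise ValueError("a copy test needs at least one variant")
        mapping = {1: production_text}
        for cell, text in enumerate(self.variants, start=2):
            mapping[cell] = text
        return mapping


test = CopyTest(test_id=1234, production_key="signup.cta",
                variants=["Join Today", "Start Watching"])
print(test.default_cell_mapping("Get Started"))
```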

Establishing test copy with a/b test-cell mappings

The other significant question we needed to answer was how to most effectively handle the mappings between test cells and test copy. We opted to delegate that responsibility to a rules engine. Once the copy-test runner defines the copy variant for each a/b test-cell number, the Shakespeare Web UI saves the mapping information and passes it down to the rules engine for cloud data publishing.
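
The post doesn’t document the rule format the rules engine receives. Purely as an illustration, the saved mapping could be flattened into one record per (test, cell) pair and serialized for publishing; the function and field names below are assumptions.

```python
import json


def build_rules(test_id: int, string_key: str, cell_to_copy: dict[int, str]) -> str:
    """Flatten a runner's cell -> copy choices into a JSON payload for publishing.

    Hypothetical rule shape: each record says which text replaces the production
    string `string_key` for users allocated to `cell` in test `test_id`.
    """
    records = [
        {"testId": test_id, "cell": cell, "stringKey": string_key, "text": text}
        for cell, text in sorted(cell_to_copy.items())
    ]
    return json.dumps(records, sort_keys=True)


payload = build_rules(1234, "signup.cta", {2: "Join Today", 1: "Get Started"})
print(payload)
```

Keeping the mapping in data rather than in client code is what lets the rules change without an app release.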

How strings are fetched with Shakespeare

The Shakespeare API examines the Netflix user’s a/b allocation and retrieves the correct copy for that user based on their cell allocation and the mapping rules.
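
A sketch of that lookup under stated assumptions. The post doesn’t expose ABlaze’s API, so a deterministic hash-based allocator (a common technique, but a stand-in here) takes its place; the rule shape and function names are likewise hypothetical.

```python
import hashlib


def allocate_cell(user_id: str, test_id: int, num_cells: int) -> int:
    """Stand-in for the a/b allocation service (NOT ABlaze): hash the user into cells 1..num_cells.

    The same user always lands in the same cell for a given test.
    """
    digest = hashlib.sha256(f"{test_id}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % num_cells + 1


def get_copy(user_id: str, string_key: str,
             production_copy: dict[str, str], rules: dict[str, dict]) -> str:
    """Return the copy this user should see for `string_key`.

    Hypothetical rules shape: {string_key: {"testId": int, "numCells": int,
    "cells": {cell: text}}}. Strings not under test, and cells without an
    override (such as a control cell), fall back to the production copy.
    """
    rule = rules.get(string_key)
    if rule is None:
        return production_copy[string_key]
    cell = allocate_cell(user_id, rule["testId"], rule["numCells"])
    return rule["cells"].get(cell, production_copy[string_key])


production = {"signup.cta": "Get Started", "nav.home": "Home"}
rules = {"signup.cta": {"testId": 1234, "numCells": 2, "cells": {2: "Join Today"}}}
for user in ["user-1", "user-2", "user-3"]:
    print(user, get_copy(user, "signup.cta", production, rules))
```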

Continuous Integration (CI) and Continuous Delivery/Deployment (CD)

The Shakespeare mapping rules are built, validated and deployed continuously, in real time.
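
The post doesn’t list what validation involves. As one hypothetical pre-deploy check, a pipeline could reject rules that point at missing production strings, out-of-range cells or empty copy before publishing anything; the rule shape and the specific checks are assumptions.

```python
def validate_rules(rules: dict, production_copy: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means the rules look deployable.

    Hypothetical rule shape: {string_key: {"testId": int, "numCells": int, "cells": {cell: text}}}.
    """
    problems = []
    for key, rule in rules.items():
        if key not in production_copy:
            problems.append(f"{key}: no production string to override")
        for cell, text in rule["cells"].items():
            if not 1 <= cell <= rule["numCells"]:
                problems.append(f"{key}: cell {cell} is outside 1..{rule['numCells']}")
            if not text.strip():
                problems.append(f"{key}: cell {cell} has empty copy")
    return problems


def deploy_if_valid(rules: dict, production_copy: dict[str, str], publish) -> None:
    """Publish only rules that pass validation; otherwise fail the pipeline step."""
    problems = validate_rules(rules, production_copy)
    if problems:
        raise ValueError("; ".join(problems))
    publish(rules)


production = {"signup.cta": "Get Started"}
good = {"signup.cta": {"testId": 1234, "numCells": 2, "cells": {2: "Join Today"}}}
bad = {"signup.ctaa": {"testId": 1234, "numCells": 2, "cells": {3: " "}}}
print(validate_rules(good, production))
print(validate_rules(bad, production))
```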

In summary

To create Shakespeare, we took it step by step.

  • The Shakespeare Web UI makes it easy to enter copy variants.
  • A rules engine extracts test-cell copy mapping logic.
  • A data-subscription service handles rules distribution.
  • Our proprietary tool ABlaze allocates tests.
  • Shakespeare checks each user’s test-cell allocation in real time and returns the override copy for that cell.
  • Continuous Integration and Continuous Delivery/Deployment provide easy integration and real-time deployment.
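
To show what rules distribution through a data-subscription service can look like, here is a toy, in-process version: publishers push versioned rule snapshots and every subscriber (for example, an API server) receives the new snapshot right away. The post doesn’t describe Netflix’s actual service, so everything here is a hypothetical sketch.

```python
from typing import Callable

RulesCallback = Callable[[int, dict], None]


class RulesSubscription:
    """Toy data-subscription service that pushes versioned rule snapshots."""

    def __init__(self) -> None:
        self.version = 0
        self.snapshot: dict = {}
        self.subscribers: list[RulesCallback] = []

    def subscribe(self, callback: RulesCallback) -> None:
        self.subscribers.append(callback)
        callback(self.version, self.snapshot)  # new subscribers get the current state

    def publish(self, rules: dict) -> None:
        self.version += 1
        # Copy the top-level mapping so later additions by the publisher don't alter this version.
        self.snapshot = dict(rules)
        for callback in self.subscribers:
            callback(self.version, self.snapshot)


class CopyApiInstance:
    """Hypothetical API server that swaps in new rules as soon as they arrive."""

    def __init__(self) -> None:
        self.rules_version = 0
        self.rules: dict = {}

    def on_rules(self, version: int, rules: dict) -> None:
        self.rules_version, self.rules = version, rules


service = RulesSubscription()
api = CopyApiInstance()
service.subscribe(api.on_rules)
service.publish({"signup.cta": {"testId": 1234, "numCells": 2, "cells": {2: "Join Today"}}})
print(api.rules_version, list(api.rules))
```

Pushing snapshots to subscribers, rather than having clients poll or embed rules, is what makes real-time copy updates possible without redeploying the apps.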

The Content Design Perspective

The lighter-weight copy testing made possible by Shakespeare provides more user insights into the copy we’re creating. Language-focused areas of testing that our Content Designers, Language Managers and others have explored or are planning to explore include:

  • Word choice for microcopy. Even the smallest change can have a huge impact.
  • Tone. Our voice attributes are Helpful, Warm, Playful, Relevant and Provocative. When should we lean into different tones? Shakespeare is helping us find out.
  • Global relevance. Sometimes a language hypothesis created in Silicon Valley or L.A. doesn’t resonate in other parts of the world, and the copy feels more natural when it’s customized for the market.
  • UX best practices. By adding step numbers to the copy in our onboarding flow, we were able to increase the completion rate because people knew how many steps to expect.
  • Style. We avoid all caps because they can feel shouty, but is there ever an exception to this rule?
  • Clarity. Is there a simpler, more intuitive or more inclusive way to explain something?
  • Context awareness. When the coronavirus pandemic began, we were able to quickly modify the text in our sign-up flow, since the “before” version felt tone-deaf in light of travel restrictions and more time spent at home.

Beyond the Metrics

Besides metric and UX wins, one bonus benefit of Shakespeare is how it has brought together Engineers, Content Designers, Globalization experts, PMs, Data Scientists and other cross-functional partners in new and unexpected ways. As Netflix has grown to 200 million global members and counting, it’s more important than ever to represent diverse perspectives in our product, including among the people who are building it.

Netflix Technology Blog