[SPIKE] Determine instrumentation requirements for A/B test section snippets
Closed, Resolved, Public

Description

Background

Search engines have been experimenting with surfacing subsection links and snippets for structured pages. The goal is to make it easier for users to find the section of a page most relevant to their search. We would like to test whether allowing our content to be displayed as snippets affects readership metrics.

We would like to perform an A/B test across multiple languages. The A/B test will show the snippets on a certain set of pages (potentially 50%). We will study the results of the test to ensure that there are no other effects on pageviews or other important readership metrics.
Languages for the test:

  • English
  • Spanish
  • French
  • Portuguese
  • Arabic
  • Hindi
  • Bengali
  • Indonesian
  • Japanese
  • Russian

Hypotheses

Pages in the test group (which are marked to display snippets within Google search results) will have more pageviews from external search referrals than pages in the control group, for all languages studied.
Overall pageviews in the test group will not be negatively affected, for all languages studied.

Questions to answer

Event Timeline

Big thanks to @Jdlrobson for pointing me in the right direction and pulling me out of the wrong rabbit hole - he provided the approaches to changing the robots meta tag and the code snippet below. And another big thanks to @phuedx for talking through the simplest approach for accessing the requested data.

What instrumentation is necessary to confirm the above hypotheses?

Presumably we'll want to track the following metrics for this instrument:

  • the page-id of articles in the test and control groups
  • an indicator of whether or not the page allows these rich snippets (i.e. the max-snippet directive is present in the robots meta tag)
  • an indicator that the page was accessed via an organic search referral
  • the organic search source

To collect this data, we'll need to bucket pages of the participating wikis according to the requested sampling rate. The related ticket in the description included support for a method and associated class that can hopefully be re-purposed here to add the max-snippet directive to the robots meta tag in the head section of test group pages.

As for how to add this method back in (previous work was done in the Wikibase extension), I'm leaning towards the 2nd of the following approaches:

  1. Add the PageSplitTester class (temporarily for the duration of the A/B test) to core (inside ../includes/page?) that can be imported for use in OutputPage.php along with the isSchemaTreatmentActiveForPageId method in a new temporary hook (or as a temporary method in OutputPage).
    • In OutputPage::getHeadLinksArray() (relevant line), isSchemaTreatmentActiveForPageId can be used to add the rich snippet:
      // Append the max-snippet directive only for pages in the test group.
      $policy = "{$this->mIndexPolicy},{$this->mFollowPolicy}";
      $p = self::isSchemaTreatmentActiveForPageId( $this->getTitle()->getArticleID() )
        ? "$policy,max-snippet=-1"
        : $policy;
    • Add a new instrumentation class in WikimediaEvents with a hook onOutputPageAfterGetHeadLinksArray to add max-snippet to the X-Analytics header (see below) for tracking this data point.
  2. Add a new instrumentation class in WikimediaEvents with the methods/properties of the PageSplitTester class, and add isSchemaTreatmentActiveForPageId to WikimediaEventsHooks.php (again temporarily).
    • Add the onOutputPageAfterGetHeadLinksArray hook handler to the new class. It appends max-snippet to the content attribute of $tags['meta-robots'] and sets a true/false value for a new max-snippet key in the X-Analytics header (see below).
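
To make the second approach a bit more concrete, here is a rough standalone sketch of what the tag rewrite could look like. The function name addMaxSnippetDirective and the exact HTML shape of the robots tag are assumptions for illustration only; a real handler would be registered for OutputPageAfterGetHeadLinksArray, receive $tags by reference, and get the treatment flag from the bucketing logic.

```php
<?php
/**
 * Hypothetical helper (not existing WikimediaEvents code).
 * Returns a copy of the head-links array with max-snippet=-1 added
 * to the robots meta tag when the page is in the treatment group.
 */
function addMaxSnippetDirective( array $tags, bool $inTreatment ): array {
	if ( !$inTreatment ) {
		// Control group and unsampled pages are left untouched.
		return $tags;
	}

	$directive = 'max-snippet=-1';

	if ( !isset( $tags['meta-robots'] ) ) {
		// Assumption: the robots tag may be absent on some pages,
		// in which case we add one carrying only the new directive.
		$tags['meta-robots'] = '<meta name="robots" content="' . $directive . '"/>';
		return $tags;
	}

	// Append the directive to the first content="..." attribute.
	// Assumes the attribute is double-quoted and non-empty.
	$tags['meta-robots'] = preg_replace(
		'/content="([^"]*)"/',
		'content="$1,' . $directive . '"',
		$tags['meta-robots'],
		1
	);

	return $tags;
}
```

Keeping the rewrite as a pure function of the tags array and a boolean would make it easy to unit test separately from the hook wiring.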

We can leverage similar config that was used in T208755 for sampling and bucketing.
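
For reference, deterministic bucketing by page ID could look roughly like the sketch below. This is not the PageSplitTester implementation; the function name getBucket, the crc32-based hashing, and the even treatment/control split are all assumptions made for illustration. It also assumes a 64-bit PHP build, where crc32() returns a non-negative integer.

```php
<?php
/**
 * Hypothetical bucketing helper (not the PageSplitTester code).
 *
 * @param int $pageId Page ID to bucket.
 * @param float $samplingRate Fraction of pages in the experiment, 0.0 to 1.0.
 * @return string|null 'treatment', 'control', or null if the page is not sampled.
 */
function getBucket( int $pageId, float $samplingRate ): ?string {
	// Hash the page ID so that sequential IDs are spread out,
	// then scale the 32-bit result into the range [0, 1).
	$scaled = crc32( (string)$pageId ) / 4294967296; // 2^32

	if ( $scaled >= $samplingRate ) {
		// Page is not part of the experiment.
		return null;
	}

	// Split the sampled pages evenly between the two groups.
	return $scaled < $samplingRate / 2 ? 'treatment' : 'control';
}
```

Because the bucket depends only on the page ID and the configured rate, a given page stays in the same group for the whole test without any stored state.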

For the EventLogging piece, we can take advantage of the special X-Analytics header and add a new key-value pair for tracking whether max-snippet has been added to the page. The value will be true for bucketed test pages and false otherwise. It will be available to the data analyst through the wmf.webrequest table when querying for pageviews. We'll want to document this on the X-Analytics Wikitech page.
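
As a rough illustration of the key-value format, the sketch below adds one item to a semicolon-separated X-Analytics string. The helper name withXAnalyticsItem and the sample header values are made up for this example; in practice the key would presumably be set through whatever mechanism the XAnalytics extension exposes rather than by editing the header string directly.

```php
<?php
/**
 * Hypothetical helper: returns the header value with $key set to $value,
 * overwriting any existing entry for that key.
 */
function withXAnalyticsItem( string $header, string $key, string $value ): string {
	$items = [];

	// Parse the existing "k1=v1;k2=v2" pairs.
	foreach ( explode( ';', $header ) as $pair ) {
		if ( $pair === '' ) {
			continue;
		}
		// Default to an empty value if a pair has no "=".
		[ $k, $v ] = explode( '=', $pair, 2 ) + [ 1 => '' ];
		$items[$k] = $v;
	}

	$items[$key] = $value;

	// Serialize back into the same format.
	$out = [];
	foreach ( $items as $k => $v ) {
		$out[] = "$k=$v";
	}
	return implode( ';', $out );
}

// Illustrative values only.
echo withXAnalyticsItem( 'ns=0;page_id=534366', 'max-snippet', 'true' ), "\n";
```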

Regarding the organic search referral properties, the data analyst can extract the needed info from the webrequest table using UDFs (user-defined functions). GetRefererSearchEngineUDF.java provides get_referer_search_engine which returns a classification string of a referer.

Can we do this with existing instrumentation?

We can hopefully re-use some of the historical code artifacts from https://gerrit.wikimedia.org/r/c/mediawiki/extensions/Wikibase/+/469926/ and from relevant tickets that were part of the epic T198970 to implement the instrument as proposed above.

No new schema for eventlogging should be necessary in this case if we can add a max-snippet key-value pair to the X-Analytics header.

Can we use a setup similar to the one we used for T206868: [Spike 24hrs] How do we measure the effects of the sameAs property on pageviews using an A/B test?

I do think we can bucket pages for this instrument similarly to how it was done previously as outlined in approaches above, but we'd have to do some work for the current state of core or WikimediaEvents to get the requested data points.


I'm fairly confident about the methods for adding the max-snippet directive, since that came from Jon, and about the measurement piece, since that came from Sam. More/other input is always welcome, and I'm happy to defer to more experienced/knowledgeable opinions.

cjming updated the task description.
cjming subscribed.
cjming removed cjming as the assignee of this task. (Feb 7 2022, 3:01 PM)
cjming moved this task from Doing to Code Review on the Web-Team-Backlog (Kanbanana-FY-2021-22) board.

Looks good @cjming! I think option "2" is the cleanest approach and fits well with the responsibilities of WikimediaEvents.

It looks like the meta-robots tag is conditional, as I don't see it in the HTML of the Barack Obama page, but I'm guessing we can also add the meta tag using the onOutputPageAfterGetHeadLinksArray hook if it's not there (or by using the addMeta method in OutputPage).

The x-analytics header is good to know about. I wasn't aware that existed 😊

Thank you @cjming and @nray! @cjming - would it be okay to assign this to you for setting up the implementation ticket for option 2?

"would it be okay to assign this to you for setting up the implementation ticket for option 2?"

sure thing

Somehow I missed T211262 which is an old proposal to move PageSplitTester to core. Presumably no longer relevant since we're deciding here to move that class into WikimediaEvents but wanted to note it here for posterity.