
Measure current article text highlighting behavior
Closed, Resolved · Public · 5 Estimated Story Points

Description

Measure:

  • How many readers come in from a Google search that drops them directly into a highlighted excerpt of an article
  • How many readers actively highlight article text on their own (Navigator Share API?)

...to understand who is currently highlighting article text and/or arriving on a Google-highlighted excerpt, so that the findings can inform the Share Highlight experiment.

Test Kitchen docs

Requirements

  • Determine audience for experiment
    • audience buckets: enwiki, ambassador wikis, all-others
    • define a percentage to run on
  • Set up an experiment for data collection
  • Spec the data model for instrumentation
    • count search hits from a fragment (count only)
    • count text selection highlight completion events (count only; debounce at, say, 5 seconds)
    • any other information needed? (page size, bucketed; use same bucket scheme as previous experiments)
  • Instrument incoming fragment link check (when available) and heuristic (when not available)
  • Instrument text selection highlighting
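The "debounce at, say, 5 seconds" requirement could look something like the minimal sketch below. It assumes the instrument listens to `selectionchange` and should emit a single count once the selection has been stable for 5 seconds. `createDebouncedCounter`, `onFire`, and the injectable `timers` object are hypothetical names; the timer injection exists only so the logic can be exercised outside a browser.

```javascript
// Hypothetical helper: collapse a burst of selection changes into one event.
// Each notify() restarts the wait; onFire runs only after `delayMs` of quiet,
// so dragging a selection across a paragraph produces a single count.
function createDebouncedCounter(
    onFire,
    delayMs = 5000,
    // Wrapped so the browser globals are never called with a foreign `this`.
    timers = { set: ( fn, ms ) => setTimeout( fn, ms ), clear: ( id ) => clearTimeout( id ) }
) {
    let pending = null;
    return {
        notify() {
            if ( pending !== null ) {
                timers.clear( pending );
            }
            pending = timers.set( () => {
                pending = null;
                onFire();
            }, delayMs );
        },
        // Drop any pending event without firing it (e.g. experiment disabled).
        cancel() {
            if ( pending !== null ) {
                timers.clear( pending );
                pending = null;
            }
        }
    };
}
```

In a page this might be wired as `document.addEventListener( 'selectionchange', () => counter.notify() )`, with `onFire` checking that the selection is still non-empty before logging; the actual logging call is left out because it isn't specified here.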

QA notes

  • Must not break when experiment disabled
  • Should record the data as expected when experiment enabled
    • Note: Firefox and Safari may have trouble detecting fragment links; this is a known limitation

Event Timeline

Here are some initial findings about what we can and can't measure in terms of text fragments in URLs:

Problem: Text fragment directives are not exposed by design

Text fragment directives (#:~:text=...) are not exposed to JavaScript. The browser strips the :~:text= portion from all URL-related APIs (location.hash, location.href, etc.) before any script executes. This is an intentional privacy decision in the spec — the concern is that fragment text may contain sensitive search terms.

There's also no server-side path here. Browsers treat the fragment directive the same as a regular anchor — it's never sent to the server, so it won't show up in request logs or Varnish.

The DOM APIs that seem like they might help don't:

  • document.fragmentDirective exists but returns an empty object — it's purely a feature-detection flag, not a way to inspect the active directive
  • document.querySelector(':target') only matches classic #id anchors, not text fragments
  • getComputedStyle(el, '::target-text') isn't supported — highlight pseudo-elements can't be inspected from JS
  • CSS.highlights only exposes highlights you've registered yourself, not the browser's internal text fragment highlight

Potential Solution: Heuristics-based approach

We don't need to know what was highlighted — we just want a general sense of how often readers are arriving on article pages with an active text fragment highlight. For that, we can combine a few indirect signals:

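// Run early in the page lifecycle, before the reader has had a chance to scroll.
const hasFragmentSupport = !!document.fragmentDirective; // feature detection only
const scrolledOnLoad = window.scrollY > 0;
const noHashAnchor = !location.hash; // :~:text= is already stripped from location.hash
const fromSearchEngine = /google\.|bing\.|duckduckgo\./.test( document.referrer );

// Chromium-only: Performance API sometimes retains the full URL
// including the :~:text= directive. This is an acknowledged browser
// bug, not a spec feature — could be patched at any time.
let chromiumFragmentDetected = false;
try {
    const navEntry = performance.getEntriesByType( 'navigation' )[ 0 ];
    if ( navEntry && navEntry.name.includes( ':~:text=' ) ) {
        chromiumFragmentDetected = true;
    }
} catch ( e ) {
    // Navigation Timing unavailable; fall back to the heuristic alone.
}

// The browser scrolled the page and no #section anchor explains it.
const likelyTextFragment = hasFragmentSupport && scrolledOnLoad && noHashAnchor;
// A search engine referrer on top of that raises confidence further.
const likelyTextFragmentFromSearch = likelyTextFragment && fromSearchEngine;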

The strongest signal is scrollY > 0 with an empty location.hash early in the page lifecycle — if the browser scrolled the page before the user had a chance to, and there's no #section anchor in the URL, a text fragment match is the most likely explanation. Combining that with a search engine referrer check increases confidence further.

The Chromium Performance API leak gives us a direct detection path for Chrome/Edge users specifically, which is a large share of our traffic. But it's a bug, not a feature, so we shouldn't depend on it long-term.
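One way to fold these signals into the two `action_subtype` values that appear in the prod results (`confirmed` and `heuristic`) is sketched below as a pure function, so it can be unit-tested away from a browser. `classifyArrival` and its input field names are hypothetical; mapping the Chromium leak to `confirmed` and the scroll heuristic to `heuristic` is an assumption based on the "fragment link check (when available) and heuristic (when not available)" requirement, not on the merged patch.

```javascript
// Hypothetical classifier for arrivals that may involve a text fragment.
// Inputs mirror the signals gathered in the snippet above:
//   hasFragmentSupport - !!document.fragmentDirective
//   scrollY            - window.scrollY, read early in the page lifecycle
//   hash               - location.hash (the directive is already stripped)
//   referrer           - document.referrer
//   navigationUrl      - name of the 'navigation' performance entry, if any
const SEARCH_ENGINE_RE = /google\.|bing\.|duckduckgo\./;

function classifyArrival( signals ) {
    const fromSearchEngine = SEARCH_ENGINE_RE.test( signals.referrer || '' );

    // Direct evidence: the Chromium Performance API leak. This is a browser
    // bug, so this branch may stop matching at any time.
    if ( typeof signals.navigationUrl === 'string' &&
        signals.navigationUrl.includes( ':~:text=' ) ) {
        return { subtype: 'confirmed', fromSearchEngine };
    }

    // Indirect evidence: the page was scrolled before the reader could
    // scroll it, and there is no #section anchor to explain the scroll.
    if ( signals.hasFragmentSupport && signals.scrollY > 0 && !signals.hash ) {
        return { subtype: 'heuristic', fromSearchEngine };
    }

    return null;
}
```

Reporting `fromSearchEngine` alongside the subtype, rather than filtering on it, would keep the raw heuristic count intact while still letting analysis look at the higher-confidence subset.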

This won't be a perfect count — we'll miss cases and occasionally get false positives — but it should be sufficient to establish a rough baseline for how common text fragment arrivals are, which is what we need to inform the Share Highlight experiment design.

Text selection and Navigator Share APIs

These are normal browser APIs that are fully accessible to us, so instrumenting anything here should be straightforward.
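The prod results later in this task split `select` events into `select_text`, `select_mixed`, and `select_image`. A minimal sketch of how a selection might be bucketed that way follows; the function names and the rule (text only, images only, or both) are assumptions inferred from the subtype names, not taken from the merged patch. In a browser, the inputs can come from `Selection.toString()` and from counting `img` elements in `selection.getRangeAt( 0 ).cloneContents()`.

```javascript
// Hypothetical bucketing of a completed selection into the subtypes seen in
// the prod results. `text` is the selected text, `imageCount` the number of
// <img> elements inside the selected range.
function classifySelection( { text, imageCount } ) {
    const hasText = typeof text === 'string' && text.trim().length > 0;
    const hasImage = imageCount > 0;
    if ( hasText && hasImage ) {
        return 'select_mixed';
    }
    if ( hasImage ) {
        return 'select_image';
    }
    if ( hasText ) {
        return 'select_text';
    }
    // Collapsed or whitespace-only selection: nothing worth counting.
    return null;
}

// Browser-side glue: summarise a Selection (e.g. document.getSelection()).
function summariseSelection( selection ) {
    if ( !selection || selection.rangeCount === 0 || selection.isCollapsed ) {
        return { text: '', imageCount: 0 };
    }
    const fragment = selection.getRangeAt( 0 ).cloneContents();
    return {
        text: selection.toString(),
        imageCount: fragment.querySelectorAll( 'img' ).length
    };
}
```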

HSwan-WMF set the point value for this task to 5. (Feb 11 2026, 5:52 PM)

Change #1249424 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/ReaderExperiments@master] WIP Metrics module for share highlight experiment baseline

https://gerrit.wikimedia.org/r/1249424

Change #1251190 had a related patch set uploaded (by Bvibber; author: Bvibber):

[operations/mediawiki-config@master] Enable ReaderExperiments Share Highlight subfeature for metrics

https://gerrit.wikimedia.org/r/1251190

Change #1249424 merged by jenkins-bot:

[mediawiki/extensions/ReaderExperiments@master] Metrics module for share highlight experiment baseline

https://gerrit.wikimedia.org/r/1249424

Change #1251194 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/ReaderExperiments@wmf/1.46.0-wmf.18] Metrics module for share highlight experiment baseline

https://gerrit.wikimedia.org/r/1251194

Change #1251195 had a related patch set uploaded (by Bvibber; author: Bvibber):

[mediawiki/extensions/ReaderExperiments@wmf/1.46.0-wmf.19] Metrics module for share highlight experiment baseline

https://gerrit.wikimedia.org/r/1251195

Change #1251194 abandoned by Bvibber:

[mediawiki/extensions/ReaderExperiments@wmf/1.46.0-wmf.18] Metrics module for share highlight experiment baseline

Reason:

not needed, .18 is dead long live .19

https://gerrit.wikimedia.org/r/1251194

Change #1251190 merged by jenkins-bot:

[operations/mediawiki-config@master] Enable ReaderExperiments Share Highlight subfeature for metrics

https://gerrit.wikimedia.org/r/1251190

Change #1251195 merged by jenkins-bot:

[mediawiki/extensions/ReaderExperiments@wmf/1.46.0-wmf.19] Metrics module for share highlight experiment baseline

https://gerrit.wikimedia.org/r/1251195

Mentioned in SAL (#wikimedia-operations) [2026-03-12T22:32:28Z] <bvibber@deploy2002> Started scap sync-world: Backport for [[gerrit:1251190|Enable ReaderExperiments Share Highlight subfeature for metrics (T416945)]], [[gerrit:1251195|Metrics module for share highlight experiment baseline (T416945)]]

Mentioned in SAL (#wikimedia-operations) [2026-03-12T22:34:25Z] <bvibber@deploy2002> bvibber: Backport for [[gerrit:1251190|Enable ReaderExperiments Share Highlight subfeature for metrics (T416945)]], [[gerrit:1251195|Metrics module for share highlight experiment baseline (T416945)]] synced to the testservers (see https://wikitech.wikimedia.org/wiki/Mwdebug). Changes can now be verified there.

Mentioned in SAL (#wikimedia-operations) [2026-03-12T22:39:17Z] <bvibber@deploy2002> Finished scap sync-world: Backport for [[gerrit:1251190|Enable ReaderExperiments Share Highlight subfeature for metrics (T416945)]], [[gerrit:1251195|Metrics module for share highlight experiment baseline (T416945)]] (duration: 06m 49s)

Prod data QA

Instrumentation spec: https://docs.google.com/spreadsheets/d/1N32fMXbDNqzG0DNiZxP0RXmhiyuuxLdy7a_y2ffCnaQ/edit?gid=0#gid=0

mfossati@stat1009:~$ kafkacat -C -b kafka-jumbo1016.eqiad.wmnet:9092 -t eqiad.product_metrics.web_base -o 1000000 | jq 'select(.experiment.enrolled == "share-highlight-baseline")'

  • navigate-to-highlight
  • select select-text
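For a quick count of what is arriving on the stream, the same filter can be applied to newline-delimited JSON in a small script. This sketch assumes each message is one JSON object with `experiment.enrolled`, `action`, and `action_subtype` fields, as the jq filter above and the SQL query below suggest; `tallyEvents` is a hypothetical helper. Unlike the SQL query, it counts raw events, not distinct sessions.

```javascript
// Hypothetical helper: count events per action/action_subtype for one
// experiment, given newline-delimited JSON such as kafkacat output.
function tallyEvents( ndjson, experimentName = 'share-highlight-baseline' ) {
    const counts = new Map();
    for ( const line of ndjson.split( '\n' ) ) {
        if ( !line.trim() ) {
            continue;
        }
        let event;
        try {
            event = JSON.parse( line );
        } catch ( e ) {
            continue; // skip partial or non-JSON lines
        }
        if ( !event.experiment || event.experiment.enrolled !== experimentName ) {
            continue;
        }
        const key = `${ event.action }/${ event.action_subtype }`;
        counts.set( key, ( counts.get( key ) || 0 ) + 1 );
    }
    return counts;
}
```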

Sampling

The number of events from the Indonesian and Vietnamese Wikipedias has been tiny over a span of a few days. The experiment is set to run for a couple of weeks, so I suggest raising the sampling rate.

Sampling rate has been updated:

  • 10% on ar/id/vi/fr/zh
  • 0.1% (max) for en
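To make the per-wiki rates concrete, here is a sketch of deterministic, session-keyed sampling using the rates listed above. It is an illustration only: the hash (32-bit FNV-1a over UTF-16 code units, followed by a MurmurHash3-style finalizer), the wiki database names, `SAMPLING_RATES`, and `isSampled` are all assumptions, not how Test Kitchen actually assigns enrollment.

```javascript
// Per-wiki sampling rates from the update above, as fractions.
// Wikis not listed are treated as unsampled in this sketch.
const SAMPLING_RATES = {
    arwiki: 0.1,
    idwiki: 0.1,
    viwiki: 0.1,
    frwiki: 0.1,
    zhwiki: 0.1,
    enwiki: 0.001
};

// 32-bit FNV-1a, then a finalizer so that similar tokens spread evenly
// across [0, 2^32).
function hashToken( token ) {
    let h = 0x811c9dc5;
    for ( let i = 0; i < token.length; i++ ) {
        h ^= token.charCodeAt( i );
        h = Math.imul( h, 0x01000193 );
    }
    h ^= h >>> 16;
    h = Math.imul( h, 0x85ebca6b );
    h ^= h >>> 13;
    h = Math.imul( h, 0xc2b2ae35 );
    h ^= h >>> 16;
    return h >>> 0;
}

// The same session token always gets the same answer on a given wiki,
// so a reader is either in or out for the whole session.
function isSampled( sessionToken, wiki ) {
    const rate = SAMPLING_RATES[ wiki ] || 0;
    return hashToken( sessionToken ) / 2 ** 32 < rate;
}
```

Keying on the session token rather than drawing a random number per pageview keeps each session's events together, which matters when results are counted as distinct sessions, as in the query below.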

Actions per target wiki, counted as unique sessions by performer.active_browsing_session_token (i.e., mw.eventLog.id.getSessionId(); see https://wikitech.wikimedia.org/wiki/Data_Platform/Sessions#Web) and ordered by session count within each wiki:

SELECT mediawiki.database AS wiki, action, action_subtype, COUNT(DISTINCT performer.active_browsing_session_token) AS session_count
FROM event.product_metrics_web_base
WHERE year = 2026
AND experiment.enrolled = 'share-highlight-baseline'
AND instrument_name = 'ShareHighlightInstrument'
GROUP BY wiki, action, action_subtype
ORDER BY wiki, session_count DESC

+------+---------------------+--------------+-------------+
|wiki  |action               |action_subtype|session_count|
+------+---------------------+--------------+-------------+
|arwiki|navigate-to-highlight|heuristic     |30285        |
|arwiki|select               |select_text   |9493         |
|arwiki|navigate-to-highlight|confirmed     |9249         |
|arwiki|select               |select_mixed  |538          |
|arwiki|select               |select_image  |8            |
|enwiki|navigate-to-highlight|heuristic     |11740        |
|enwiki|select               |select_text   |4276         |
|enwiki|navigate-to-highlight|confirmed     |2405         |
|enwiki|select               |select_mixed  |109          |
|enwiki|select               |select_image  |9            |
|frwiki|navigate-to-highlight|heuristic     |103232       |
|frwiki|select               |select_text   |31797        |
|frwiki|navigate-to-highlight|confirmed     |30497        |
|frwiki|select               |select_mixed  |1317         |
|frwiki|select               |select_image  |47           |
|idwiki|navigate-to-highlight|heuristic     |13957        |
|idwiki|select               |select_text   |4277         |
|idwiki|navigate-to-highlight|confirmed     |3935         |
|idwiki|select               |select_mixed  |127          |
|idwiki|select               |select_image  |5            |
+------+---------------------+--------------+-------------+