References in headers are not stripped anymore for search
Closed, Resolved · Public

Description

CirrusSearch is supposed to strip references from headings, but this no longer works, as the following dump shows:

http://cirrustest-cirrus-browser-bot.wmflabs.org/wiki/HasHeadingsWithReference?action=cirrusDump

Event Timeline

Restricted Application added projects: Discovery, Discovery-Search. · Apr 14 2016, 6:03 PM
Restricted Application added a subscriber: Aklapper.

It looks like this code in PageDataBuilder.php is to blame:

			$heading = preg_replace( '/<sup>\s*\[\s*\d+\s*\]\s*<\/sup>/', '', $heading );
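
To illustrate one way a pattern like this can silently stop matching, here is a minimal standalone sketch. The pattern is the one quoted above; the sample headings are hypothetical and were not taken from the cirrusDump output, so whether the real heading markup looks like this is an assumption.

```php
<?php
// The pattern quoted in the task, unchanged.
$pattern = '/<sup>\s*\[\s*\d+\s*\]\s*<\/sup>/';

// Hypothetical heading fragments, for illustration only.
$samples = [
    // A bare <sup> wrapping the bracketed number: the only shape the pattern accepts.
    'bare'       => 'History<sup>[1]</sup>',
    // Any attribute on <sup> defeats the literal "<sup>" at the start of the pattern.
    'attributes' => 'History<sup class="reference">[1]</sup>',
    // A nested link defeats the "\s*\[" that must directly follow "<sup>".
    'nested'     => 'History<sup><a href="#cite_note-1">[1]</a></sup>',
];

$results = [];
foreach ( $samples as $name => $html ) {
    $results[$name] = preg_replace( $pattern, '', $html );
}

foreach ( $results as $name => $stripped ) {
    echo $name, ': ', $stripped, "\n";
}
```

Only the `bare` sample loses its reference. The other two come back unchanged, which would match the symptom described in this task if the rendered headings carry attributes or nested links.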

Change 283495 had a related patch set uploaded (by Smalyshev):
Use HtmlFormatter to strip tags

https://gerrit.wikimedia.org/r/283495
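
The patch title suggests replacing the regex with parser-based tag stripping. The sketch below shows that general idea, not the merged change: it uses PHP's built-in DOMDocument instead of MediaWiki's HtmlFormatter so that it runs standalone. The function name `stripReferencesFromHeading`, the `wrapper` container, and the assumption that references are `<sup>` elements with the class `reference` are all hypothetical.

```php
<?php
/**
 * Hypothetical helper: drop reference markers from a heading fragment
 * and return the remaining plain text.
 */
function stripReferencesFromHeading( string $headingHtml ): string {
    $doc = new DOMDocument();

    // Heading fragments are not full documents, so silence libxml warnings.
    $previous = libxml_use_internal_errors( true );
    // The XML declaration prefix asks libxml to treat the input as UTF-8.
    $doc->loadHTML(
        '<?xml encoding="UTF-8"><div id="wrapper">' . $headingHtml . '</div>'
    );
    libxml_clear_errors();
    libxml_use_internal_errors( $previous );

    $xpath = new DOMXPath( $doc );

    // Assumption: a reference is any <sup> whose class list contains "reference".
    $query = '//sup[contains(concat(" ", normalize-space(@class), " "), " reference ")]';

    // Copy the node list first so removal does not disturb iteration.
    $references = iterator_to_array( $xpath->query( $query ) );
    foreach ( $references as $node ) {
        $node->parentNode->removeChild( $node );
    }

    // textContent drops all remaining tags and keeps only the text.
    $wrapper = $xpath->query( '//div[@id="wrapper"]' )->item( 0 );
    return trim( $wrapper->textContent );
}

$withReference = 'History<sup id="cite_ref-1" class="reference">'
    . '<a href="#cite_note-1">[1]</a></sup>';
echo stripReferencesFromHeading( $withReference ), "\n";
echo stripReferencesFromHeading( 'Plain <b>heading</b>' ), "\n";
```

Selecting by element and class rather than by exact string shape means extra attributes or nested links inside the reference no longer matter, which is the main appeal of this approach over a literal pattern.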

Change 283495 merged by jenkins-bot:
Fix reference handling

https://gerrit.wikimedia.org/r/283495

Restricted Application added a subscriber: TerraCodes. · Apr 19 2016, 11:02 PM
Deskana closed this task as Resolved. · Apr 21 2016, 10:13 PM
Deskana assigned this task to Smalyshev.
Deskana triaged this task as Normal priority.
Deskana moved this task from in progress to Done on the Discovery-Search (Current work) board.
Deskana added a subscriber: Deskana.

This is done. Updating task information for posterity.