
References in headers are not stripped anymore for search
Closed, Resolved (Public)

Description

Cirrus Search is supposed to strip references from headings, but this no longer works, as shown by:

http://cirrustest-cirrus-browser-bot.wmflabs.org/wiki/HasHeadingsWithReference?action=cirrusDump

Event Timeline

Restricted Application added a subscriber: Aklapper.

Looks like this code is to blame in PageDataBuilder.php:

			$heading = preg_replace( '/<sup>\s*\[\s*\d+\s*\]\s*<\/sup>/', '', $heading );
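
A minimal sketch of one way this pattern can miss a reference. The pattern requires a bare `<sup>` tag wrapped directly around `[n]`, so any attribute on the tag or any nested element defeats it. The heading strings below are hypothetical examples for illustration; they are not copied from the dump linked above, and the actual markup on the test wiki may differ.

```php
<?php
// The pattern quoted above from PageDataBuilder.php.
$pattern = '/<sup>\s*\[\s*\d+\s*\]\s*<\/sup>/';

// Hypothetical heading markup (assumed for illustration only).
$bare = 'Heading<sup>[1]</sup>';
$decorated = 'Heading<sup id="cite_ref-1" class="reference">'
	. '<a href="#cite_note-1">[1]</a></sup>';

// Bare <sup> directly around "[1]": the pattern matches and the reference is removed.
$bareResult = preg_replace( $pattern, '', $bare );

// <sup> with attributes and a nested <a>: the literal "<sup>" never occurs,
// so nothing matches and the string comes back unchanged.
$decoratedResult = preg_replace( $pattern, '', $decorated );

echo $bareResult, "\n";
echo $decoratedResult, "\n";
```

If the heading HTML looks anything like the second string, the reference survives into the indexed heading, which would be consistent with moving to a real HTML parser rather than a regular expression.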

Change 283495 had a related patch set uploaded (by Smalyshev):
Use HtmlFormatter to strip tags

https://gerrit.wikimedia.org/r/283495

Change 283495 merged by jenkins-bot:
Fix reference handling

https://gerrit.wikimedia.org/r/283495

Deskana assigned this task to Smalyshev.
Deskana triaged this task as Medium priority.
Deskana moved this task from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Deskana added a subscriber: Deskana.

This is done. Updating task information for posterity.