Content intended to be hidden appears in text extract
Closed, Resolved · Public

Description

Steps to reproduce

  1. Go to the Bethe formula article.
  2. Tap on the speed of light link under the formula section.

Expected results

A summary for the speed of light article is shown.

Actual results

A summary is shown but it includes a black spade character.

Response

https://en.wikipedia.org/api/rest_v1/page/summary/Speed_of_light
{
  "title": "Speed of light",
  "extract": "The speed of light in vacuum, commonly denoted c, is a universal physical constant important in many areas of physics. Its precise value is 7008299792458000000♠299792458 metres per second (approximately 7008300000000000000♠3.00×108 m/s), since the length of the metre is defined from this constant and the international standard for time. According to special relativity, c is the maximum speed at which all matter and information in the universe can travel. It is the speed at which all massless particles and changes of the associated fields (including electromagnetic radiation such as light and gravitational waves) travel in vacuum. Such particles and waves travel at c regardless of the motion of the source or the inertial reference frame of the observer.",
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Earth_to_Sun_-_en.png/320px-Earth_to_Sun_-_en.png",
    "width": 320,
    "height": 181
  },
  "lang": "en",
  "dir": "ltr"
}
https://en.wikipedia.org/w/api.php?action=query&format=json&formatversion=2&prop=extracts%7Cpageimages&redirects=true&exsentences=5&explaintext=true&piprop=thumbnail%7Cname&pithumbsize=320&titles=Speed_of_light
{
  "batchcomplete": true,
  "query": {
    "normalized": [
      {
        "from": "Speed_of_light",
        "to": "Speed of light"
      }
    ],
    "pages": [
      {
        "pageid": 28736,
        "ns": 0,
        "title": "Speed of light",
        "extract": "The speed of light in vacuum, commonly denoted c, is a universal physical constant important in many areas of physics. Its precise value is 7008299792458000000♠299792458 metres per second (approximately 7008300000000000000♠3.00×108 m/s), since the length of the metre is defined from this constant and the international standard for time. According to special relativity, c is the maximum speed at which all matter and information in the universe can travel. It is the speed at which all massless particles and changes of the associated fields (including electromagnetic radiation such as light and gravitational waves) travel in vacuum. Such particles and waves travel at c regardless of the motion of the source or the inertial reference frame of the observer.",
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/2/2e/Earth_to_Sun_-_en.png/320px-Earth_to_Sun_-_en.png",
          "width": 320,
          "height": 181
        },
        "pageimage": "Earth_to_Sun_-_en.png"
      }
    ]
  }
}

Environments observed

Service version: deploy/2016-02-02/68e38ec
App version: 00a1c69
Android OS versions: API 23
Device model: Nexus 6P
Device language: English

Event Timeline

Niedzielski updated the task description.
Niedzielski raised the priority of this task from to Normal.
Niedzielski added a subscriber: Niedzielski.
Restricted Application added a subscriber: Aklapper. · Feb 9 2016, 1:52 PM
Niedzielski updated the task description. · Feb 10 2016, 8:17 PM
Niedzielski set Security to None.
bearND added a subscriber: bearND.

As @Niedzielski mentioned in the edit, the same happens with the api.php endpoint. The issue must be upstream in the TextExtracts functionality.

This is ultimately caused by the Val template inserting sort keys into the actual article text, wrapped in display:none spans that are then flattened to plain text when TextExtracts is called with explaintext=true.

Its precise value is <b><span class="nowrap"><span style="display:none" class="sortkey">7008299792458000000♠</span>299<span style="margin-left:.25em;">792</span><span style="margin-left:.25em;">458</span>&#160;<a href="/wiki/Metres_per_second" class="mw-redirect" title="Metres per second">metres per second</a></span></b>
(approximately <span class="nowrap"><span style="display:none" class="sortkey">7008300000000000000♠</span>3.00<span style="margin-left:0.25em;margin-right:0.15em;">×</span>10<sup>8</sup>&#160;m/s</span>), since the length of the metre is defined from this constant and the <a href="/wiki/Second#International_second" title="Second">international standard for time</a>.
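
To illustrate the mechanism described above, here is a minimal Python sketch of how flattening HTML to plain text leaks the hidden sort key, and how skipping elements by class avoids it. This is not the TextExtracts code: `PlainTextExtractor`, `flatten`, and `remove_classes` are hypothetical names, and the snippet is a trimmed copy of the markup quoted above.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr"}


class PlainTextExtractor(HTMLParser):
    """Flatten HTML to text, optionally skipping elements with given classes."""

    def __init__(self, remove_classes=()):
        super().__init__(convert_charrefs=True)
        self.remove_classes = set(remove_classes)
        self.skip_depth = 0  # > 0 while inside a removed element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Already inside a removed element: track nesting only.
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.remove_classes.intersection(classes):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Inline styles such as display:none are ignored; only classes matter.
        if not self.skip_depth:
            self.parts.append(data)


def flatten(html, remove_classes=()):
    parser = PlainTextExtractor(remove_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


SNIPPET = (
    '<span class="nowrap"><span style="display:none" class="sortkey">'
    '7008299792458000000♠</span>299<span style="margin-left:.25em;">792</span>'
    '<span style="margin-left:.25em;">458</span></span>'
)

leaky = flatten(SNIPPET)                              # sort key leaks into the text
clean = flatten(SNIPPET, remove_classes={"sortkey"})  # sort key is dropped
```

A plain-text flattener never evaluates CSS, so `display:none` has no effect on it; the hidden span can only be excluded by matching on something structural such as its class.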
Jdlrobson added a subscriber: Jdlrobson.

Looks like something that will need to be fixed locally (on wiki).

Mholloway renamed this task from [Bug] Article summary shows encoding issue to [Bug] Content intended to be hidden appears in text extract. · Sep 29 2016, 7:59 PM
Niedzielski renamed this task from [Bug] Content intended to be hidden appears in text extract to Content intended to be hidden appears in text extract. · Nov 9 2016, 7:35 PM
Niedzielski added a project: Android-app-Bugs.
TheDJ added a subscriber: TheDJ.

Shouldn't we add the sortkey class to wgExtractsRemoveClasses, the list of classes that are stripped from extracts?

We do the same for coordinates, for instance.

Restricted Application added a subscriber: TerraCodes. · Mar 25 2017, 3:30 AM

Change 344742 had a related patch set uploaded (by Kaldari):
[mediawiki/extensions/TextExtracts@master] Adding sortkey class to ExtractsRemoveClasses

https://gerrit.wikimedia.org/r/344742

Change 344742 merged by jenkins-bot:
[mediawiki/extensions/TextExtracts@master] Adding sortkey class to ExtractsRemoveClasses

https://gerrit.wikimedia.org/r/344742

phuedx assigned this task to bmansurov. · Mar 29 2017, 10:33 PM
phuedx added a subscriber: phuedx.

At the very least, this can be verified tomorrow (Thursday, 30th) after the MediaWiki train rolls on by.

This appears to still be an issue on prod for both endpoint examples given. :/

This appears to be a caching issue. I purged the cache on Speed of light and now the action API query no longer returns the extraneous data. This will slowly roll out as pages are edited or otherwise have their cache reset.
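
For anyone verifying a page after a purge, a rough Python sketch of the check might look like the following. It only builds the requests and inspects an extract string; nothing is sent over the network. The helper names are hypothetical, the query parameters are a subset of those in the task description, and the pattern assumes the leaked sort key always appears as a run of digits followed by ♠, as in the examples above. Purging through `action=purge` is one option (it has to be sent as a POST); the comment above does not say which method was used.

```python
import re
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def extract_query_url(title):
    """Build the plain-text extract query used in the task description."""
    params = {
        "action": "query",
        "format": "json",
        "formatversion": 2,
        "prop": "extracts",
        "redirects": "true",
        "exsentences": 5,
        "explaintext": "true",
        "titles": title,
    }
    return API + "?" + urlencode(params)


def purge_request(title):
    """Return (url, POST body) for an action=purge request."""
    body = urlencode({"action": "purge", "format": "json", "titles": title})
    return API, body


# Assumption: a leaked sort key is a run of digits immediately followed by ♠.
SORTKEY_LEAK = re.compile(r"\d+♠")


def has_sortkey_leak(extract):
    """Return True if the extract text still contains a leaked sort key."""
    return bool(SORTKEY_LEAK.search(extract))
```

Running `has_sortkey_leak` over the `extract` field before and after a purge gives a quick sign-off check for a given title.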

phuedx added a comment. · Edited Apr 3 2017, 9:00 AM

This appears to still be an issue on prod for both endpoint examples given. :/

Sorry. I {could,should}'ve explained this better. Thanks for clarifying the situation and providing steps for testing/sign off @Deskana!


I reacquainted myself with the extension so that I could add a little more detail if necessary. However, I discovered that:

  • Extracts are stored in memcache indefinitely until the associated page is touched; and
  • The cache keys that the extension uses don't vary with the ExtractsRemoveClasses config variable (or some notion of a version of the codebase).

Changes to the ExtractsRemoveClasses config variable won't be reflected until the page is touched or its cache entries are evicted due to memory pressure. I'm not sure how often eviction occurs, but it isn't limited to the keys generated by TextExtracts, and keys with finite expiries are evicted first. Simply put, I can't be sure that extracts for long-tail pages will be refreshed.

Matching the parser cache's TTL doesn't seem unreasonable.
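
As a sketch of the two ideas above (a cache key that varies with the config, and a finite expiry), here is a small in-memory Python model. It is not how TextExtracts builds its keys: the `ExtractCache` class, the key layout, and the `touched` argument are all hypothetical, and a dict stands in for memcache.

```python
import hashlib
import json
import time


class ExtractCache:
    """In-memory stand-in for memcache with config-aware keys and a TTL."""

    def __init__(self, remove_classes, ttl_seconds, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock
        self.store = {}
        # Hash the config so that changing it changes every cache key.
        config = json.dumps(sorted(remove_classes)).encode("utf-8")
        self.config_hash = hashlib.sha256(config).hexdigest()[:12]

    def key(self, page_id, touched):
        # Hypothetical layout: page id, last-touched marker, config hash.
        return f"textextracts:{page_id}:{touched}:{self.config_hash}"

    def get(self, page_id, touched):
        entry = self.store.get(self.key(page_id, touched))
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            return None  # entry has outlived its TTL
        return value

    def set(self, page_id, touched, value):
        expires_at = self.clock() + self.ttl
        self.store[self.key(page_id, touched)] = (value, expires_at)
```

With keys like these, adding `sortkey` to the class list would make old entries unreachable at once, while the TTL alone only bounds how long a stale extract can survive.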

bmansurov removed bmansurov as the assignee of this task. · Apr 3 2017, 12:51 PM
bmansurov closed this task as Resolved.
bmansurov added a subscriber: bmansurov.

The immediate problem has been fixed. Feel free to create a separate task considering the second part of T126331#3150048.

phuedx added a subscriber: Legoktm. · Jun 20 2017, 4:04 PM

Matching the parser cache's TTL doesn't seem unreasonable.

This was done in rETEX43f3539a7cea: Set an expiry for memcache entries. Thanks, @Legoktm!