[Bug] No page/summary extract content for specific article
Closed, Resolved, Public

Description

Steps to reproduce

  1. Visit https://en.wikipedia.org/wiki/Intellectual_capital.
  2. Hover over "balance sheets".

Expected results

  • Preview is shown

Actual results

  • No preview is shown because no extract is available from RESTBase. The extract is available from MediaWiki.

Environments observed

  • Browser version: Chromium v74.0.3729.169
  • OS version: Ubuntu v19.04
  • Device model: Desktop
  • Device language: English

RESTBase
https://en.wikipedia.org/api/rest_v1/page/summary/Financial_statement

{
  "type": "standard",
  "title": "Financial statement",
  "displaytitle": "Financial statement",
  "namespace": {
    "id": 0,
    "text": ""
  },
  "wikibase_item": "Q192907",
  "titles": {
    "canonical": "Financial_statement",
    "normalized": "Financial statement",
    "display": "Financial statement"
  },
  "pageid": 93755,
  "thumbnail": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Ledger.png/320px-Ledger.png",
    "width": 320,
    "height": 168
  },
  "originalimage": {
    "source": "https://upload.wikimedia.org/wikipedia/commons/e/e6/Ledger.png",
    "width": 1706,
    "height": 896
  },
  "lang": "en",
  "dir": "ltr",
  "revision": "891354485",
  "tid": "e52c0a10-6321-11e9-bf14-fd0cdf6cc1d9",
  "timestamp": "2019-04-07T11:34:23Z",
  "description": "formal record of the financial activities and position of a business, person, or other entity",
  "content_urls": {
    "desktop": {
      "page": "https://en.wikipedia.org/wiki/Financial_statement",
      "revisions": "https://en.wikipedia.org/wiki/Financial_statement?action=history",
      "edit": "https://en.wikipedia.org/wiki/Financial_statement?action=edit",
      "talk": "https://en.wikipedia.org/wiki/Talk:Financial_statement"
    },
    "mobile": {
      "page": "https://en.m.wikipedia.org/wiki/Financial_statement",
      "revisions": "https://en.m.wikipedia.org/wiki/Special:History/Financial_statement",
      "edit": "https://en.m.wikipedia.org/wiki/Financial_statement?action=edit",
      "talk": "https://en.m.wikipedia.org/wiki/Talk:Financial_statement"
    }
  },
  "api_urls": {
    "summary": "https://en.wikipedia.org/api/rest_v1/page/summary/Financial_statement",
    "metadata": "https://en.wikipedia.org/api/rest_v1/page/metadata/Financial_statement",
    "references": "https://en.wikipedia.org/api/rest_v1/page/references/Financial_statement",
    "media": "https://en.wikipedia.org/api/rest_v1/page/media/Financial_statement",
    "edit_html": "https://en.wikipedia.org/api/rest_v1/page/html/Financial_statement",
    "talk_page_html": "https://en.wikipedia.org/api/rest_v1/page/html/Talk:Financial_statement"
  },
  "extract": "",
  "extract_html": ""
}

MediaWiki
https://en.wikipedia.org/w/api.php?action=query&format=json&prop=info%7Cextracts%7Cpageimages%7Crevisions%7Cinfo&formatversion=2&redirects=true&exintro=true&exchars=525&explaintext=true&piprop=thumbnail&pithumbsize=640&pilicense=any&rvprop=timestamp&inprop=url&titles=Financial_statement&smaxage=300&maxage=300&uselang=content&origin=*

{
  "batchcomplete": true,
  "query": {
    "normalized": [
      {
        "fromencoded": false,
        "from": "Financial_statement",
        "to": "Financial statement"
      }
    ],
    "pages": [
      {
        "pageid": 93755,
        "ns": 0,
        "title": "Financial statement",
        "contentmodel": "wikitext",
        "pagelanguage": "en",
        "pagelanguagehtmlcode": "en",
        "pagelanguagedir": "ltr",
        "touched": "2019-06-06T10:13:30Z",
        "lastrevid": 891354485,
        "length": 18764,
        "fullurl": "https://en.wikipedia.org/wiki/Financial_statement",
        "editurl": "https://en.wikipedia.org/w/index.php?title=Financial_statement&action=edit",
        "canonicalurl": "https://en.wikipedia.org/wiki/Financial_statement",
        "extract": "Financial statements (or financial reports) are formal records of the financial activities and position of a business, person, or other entity.\nRelevant financial information is presented in a structured manner and in a form which is easy to understand. They typically include four basic financial statements accompanied by a management discussion and analysis:\nA balance sheet or statement of financial position, reports on a company's assets, liabilities, and owners equity at a given point in time.\nAn income statement—or profit...",
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/e/e6/Ledger.png/640px-Ledger.png",
          "width": 640,
          "height": 336
        },
        "revisions": [
          {
            "timestamp": "2019-04-07T11:34:23Z"
          }
        ]
      }
    ]
  }
}

There have been no recent changes to the article.
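
The comparison above can be checked mechanically. The following is a minimal sketch, assuming both responses have already been fetched and parsed into dicts shaped like the two JSON bodies in this report (formatversion=2 for the MediaWiki query). The function name find_extract_mismatch is hypothetical and not part of any Wikimedia codebase.

```python
def find_extract_mismatch(restbase_summary, mediawiki_response):
    """Return the page title if RESTBase has an empty extract while
    MediaWiki returns a non-empty one for the same pageid, else None."""
    rest_extract = (restbase_summary.get("extract") or "").strip()
    if rest_extract:
        # RESTBase produced an extract, so there is nothing to report.
        return None

    # With formatversion=2, query.pages is a list of page objects.
    pages = mediawiki_response.get("query", {}).get("pages", [])
    for page in pages:
        same_page = page.get("pageid") == restbase_summary.get("pageid")
        mw_extract = (page.get("extract") or "").strip()
        if same_page and mw_extract:
            return page.get("title")
    return None


# Trimmed-down copies of the two responses in this report.
restbase = {"pageid": 93755, "title": "Financial statement", "extract": ""}
mediawiki = {
    "query": {
        "pages": [
            {
                "pageid": 93755,
                "title": "Financial statement",
                "extract": "Financial statements (or financial reports) are formal records...",
            }
        ]
    }
}
print(find_extract_mismatch(restbase, mediawiki))
```

Matching on pageid rather than title avoids false negatives from the underscore/space normalization visible in the MediaWiki response.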

Event Timeline

ovasileva triaged this task as Medium priority. Jun 11 2019, 1:43 PM

Thanks for the report, @Niedzielski. I'd call this a case of GIGO with the stray citation to a statistics article above the leading text, but I think we should investigate how TextExtracts is able to handle this case, and try to incorporate that.

@CKoerner_WMF could you please link to the reports of other similar cases that @Niedzielski mentioned? I don't believe that just purging the page would help in this case, which leads me to suspect that there might be something else going on separately.

Thanks, @Mholloway. I jumped to conclusions on this scenario being similar. These are the two reports which seem more like caching issues:

https://www.mediawiki.org/wiki/Topic:V0vpwesmmqofrl6w
https://it.wikipedia.org/wiki/Wikipedia:Officina#Anteprima_popup_anonimi

It turns out that TextExtracts isn't doing anything magical here; it just strips a couple more selectors that we don't but probably should.
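
To illustrate the idea in the comment above, here is a rough sketch of stripping elements by class before extracting text. It is not the mobileapps or TextExtracts implementation (mobileapps is a separate service with its own code); it uses only Python's standard html.parser. The class names in STRIP_CLASSES are assumptions chosen for illustration, since the actual selectors are not listed in this task.

```python
from html.parser import HTMLParser

# Hypothetical class names, for illustration only. The real list stripped by
# TextExtracts or by the patch on this task is not given here.
STRIP_CLASSES = {"noexcerpt", "reference"}

# Void elements never get an end tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class StrippingTextExtractor(HTMLParser):
    """Collect text, skipping any subtree whose root has a stripped class."""

    def __init__(self, strip_classes):
        super().__init__(convert_charrefs=True)
        self.strip_classes = set(strip_classes)
        self.skip_depth = 0  # > 0 while inside a stripped subtree
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.strip_classes.intersection(classes):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def extract_text(html, strip_classes=STRIP_CLASSES):
    """Return whitespace-normalized text with stripped subtrees removed."""
    parser = StrippingTextExtractor(strip_classes)
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


sample = ('<p><sup class="reference"><a href="#c1">[1]</a></sup>'
          'Financial statements are <b>formal</b> records.</p>')
print(extract_text(sample))
```

The sketch assumes well-formed markup with matching end tags. A production preprocessor would more likely remove nodes from a parsed DOM using CSS selectors.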

Change 516550 had a related patch set uploaded (by Mholloway; owner: Michael Holloway):
[mediawiki/services/mobileapps@master] Summary: Strip additional classes in preprocessing

https://gerrit.wikimedia.org/r/516550

Change 516550 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Summary: Strip additional classes in preprocessing

https://gerrit.wikimedia.org/r/516550