API returns error messages in place destined for end user
Closed, Invalid (Public)

Description

I believe I have found a bug that hasn't been reported before: the API seems to return error messages in an improper place in its response. The query

https://de.wikipedia.org/w/api.php?action=query&format=json&prop=coordinates%7Cpageimages%7Cextracts&colimit=10&pithumbsize=100&pilimit=1&explaintext=1&exintro=1&exlimit=1&pageids=5780474

responds with the body

{
  "batchcomplete": "",
  "query": {
    "pages": {
      "5780474": {
        "pageid": 5780474,
        "ns": 0,
        "title": "Psychologische Hochschule Berlin",
        "coordinates": [
          {
            "lat": 52.51280975,
            "lon": 13.4157896,
            "primary": "",
            "globe": "earth"
          }
        ],
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/99/Psychologische_Hichschule_Berlin_%28PHB%29_am_K%C3%B6llnischen_Park_2.jpg/100px-Psychologische_Hichschule_Berlin_%28PHB%29_am_K%C3%B6llnischen_Park_2.jpg",
          "width": 100,
          "height": 67
        },
        "pageimage": "Psychologische_Hichschule_Berlin_(PHB)_am_Köllnischen_Park_2.jpg",
        "extract": "Vorlage:Infobox Hochschule/Logo fehltVorlage:Infobox Hochschule/Mitarbeiter fehlt\n\nDie Psychologische Hochschule Berlin (PHB) wurde 2010 vom Berufsverband Deutscher Psychologinnen und Psychologen gegründet und ist im Haus der Psychologie am Köllnischen Park in Berlin-Mitte untergebracht. Sie wurde am 5. Mai 2010 vom Berliner Senat für Bildung, Wissenschaft und Forschung als nichtstaatliche Hochschule staatlich anerkannt und startete ihren Lehrbetrieb zum Wintersemester 2010/2011."
      }
    }
  }
}

The extract property starts with "Vorlage:Infobox Hochschule/Logo fehltVorlage:Infobox Hochschule/Mitarbeiter fehlt" (German: "Template:Infobox Hochschule/Logo missing", "Template:Infobox Hochschule/Staff missing"), which appears to be a pair of error messages about missing items in the infobox. I believe these error messages should not appear here, just as they don't show up when a user visits the https://de.wikipedia.org/wiki/Psychologische_Hochschule_Berlin page.
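To make the report easier to reproduce, here is a minimal Python sketch that rebuilds the query URL above and checks a response body for extracts starting with a template name. The function names and the "Vorlage:" prefix check are illustrative assumptions, not part of the API; the sketch does not perform the HTTP request itself.

```python
import json
from urllib.parse import urlencode

API_ENDPOINT = "https://de.wikipedia.org/w/api.php"


def build_query_url(page_id: int) -> str:
    """Rebuild the query from the report (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "coordinates|pageimages|extracts",
        "colimit": 10,
        "pithumbsize": 100,
        "pilimit": 1,
        "explaintext": 1,
        "exintro": 1,
        "exlimit": 1,
        "pageids": page_id,
    }
    # urlencode percent-encodes "|" as %7C, as in the reported URL.
    return f"{API_ENDPOINT}?{urlencode(params)}"


def extracts_with_template_residue(response_body: str, marker: str = "Vorlage:") -> dict:
    """Return {page id: extract} for extracts that start with `marker`.

    Assumption: a leading "Vorlage:" (German "Template:") indicates leaked
    tracking-link text. This is a heuristic, not an API guarantee.
    """
    pages = json.loads(response_body).get("query", {}).get("pages", {})
    return {
        page_id: page["extract"]
        for page_id, page in pages.items()
        if page.get("extract", "").startswith(marker)
    }
```

Fetching `build_query_url(5780474)` with any HTTP client and passing the body to `extracts_with_template_residue` would flag the page from this report, assuming the response still looks like the one shown above.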

Event Timeline

Robert-RtC3V raised the priority of this task from to Needs Triage.
Robert-RtC3V updated the task description.
Robert-RtC3V subscribed.
Robert-RtC3V renamed this task from "API returns error messages in" to "API returns error messages in place destined for end user". Nov 26 2015, 12:17 PM
Robert-RtC3V set Security to None.
Anomie subscribed.

If you check the source of the page, you'll see

<p><span style="display: none;"><a href="/w/index.php?title=Vorlage:Infobox_Hochschule/Logo_fehlt&amp;action=edit&amp;redlink=1" class="new" title="Vorlage:Infobox Hochschule/Logo fehlt (Seite nicht vorhanden)">Vorlage:Infobox Hochschule/Logo fehlt</a></span><span style="display: none;"><a href="/w/index.php?title=Vorlage:Infobox_Hochschule/Mitarbeiter_fehlt&amp;action=edit&amp;redlink=1" class="new" title="Vorlage:Infobox Hochschule/Mitarbeiter fehlt (Seite nicht vorhanden)">Vorlage:Infobox Hochschule/Mitarbeiter fehlt</a></span></p>

i.e. the text is there, just hidden with CSS. The TextExtracts extension is picking up this text when generating its extracts.
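The effect can be illustrated with a small Python sketch. It is not how TextExtracts is implemented; it only shows that an extractor which collects text nodes without interpreting CSS will keep text inside a `display: none` span. The class name, the helper function, and the shortened `href="#"` values in the sample are assumptions made for the illustration.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects every text node and ignores all attributes, including style."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def naive_plain_text(html: str) -> str:
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.chunks)


# Shortened version of the markup quoted above (href values replaced by "#").
SAMPLE = (
    '<p><span style="display: none;"><a href="#" class="new">'
    "Vorlage:Infobox Hochschule/Logo fehlt</a></span>"
    '<span style="display: none;"><a href="#" class="new">'
    "Vorlage:Infobox Hochschule/Mitarbeiter fehlt</a></span></p>"
)

print(naive_plain_text(SAMPLE))
```

The hidden link text comes out concatenated, matching the prefix seen in the extract property.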

"fehlt" (German for "is missing")... These sound like "tracking" links used to trace errors in templates. That is fine, but display:none alone is not enough to handle such content.

Define a semantic class, and make sure the various tools filter out elements with that class (English Wikipedia uses the metadata class for this, for instance). In this case, even the pre-existing semantic class "error" would probably work, provided you keep the "display:none;" in addition to the class.
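A minimal Python sketch of what class-based filtering means on the tool side, assuming well-formed HTML. The extractor below drops any element, and everything nested inside it, whose class attribute contains one of the excluded names. The class `ClassFilteringExtractor` and the function `plain_text_without` are hypothetical, and which classes a real tool such as TextExtracts filters depends on that tool's configuration.

```python
from html.parser import HTMLParser

# Elements without a closing tag; they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class ClassFilteringExtractor(HTMLParser):
    """Collects text, skipping subtrees marked with an excluded class."""

    def __init__(self, excluded_classes):
        super().__init__()
        self.excluded = set(excluded_classes)
        self.skip_depth = 0  # > 0 while inside an excluded subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.excluded.intersection(classes):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def plain_text_without(html: str, excluded_classes=("error", "metadata")) -> str:
    parser = ClassFilteringExtractor(excluded_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)
```

With this approach the template keeps its tracking link and its display:none style, and only gains a class that extraction tools can recognize.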

Jhernandez subscribed.

While this has not been fixed, is there any workaround? Getting HTML instead of plain text, i.e. removing the explaintext=1 parameter (https://de.wikipedia.org/w/api.php?action=query&format=json&prop=coordinates|pageimages|extracts&colimit=10&pithumbsize=100&pilimit=1&exintro=1&exlimit=1&pageids=5780474), does not help.
It results in

[...]"extract": "<p><span>Vorlage:Infobox Hochschule/Logo fehlt</span><span>Vorlage:Infobox Hochschule/Mitarbeiter fehlt</span></p>\n\n\n[...]

where the span lacks the style="display: none;" attribute.

While this has not been fixed, is there any workaround?

Looking at the source of the extension, I think that a potential workaround might be: Always make sure that "tracking links" are wrapped inside a <div> element (and not just a <span> element).

This should be done for other reasons anyway (styling issues with margins).

Note: In the meantime, I have added the workaround from T119702#2188976 to the template in question, so error messages from this specific template will not appear in TextExtracts anymore.

In general, putting such "tracking" links into templates is a very popular practice at dewiki (it is even documented on a help page), and I don't think it is realistic to assume that this will stop anytime soon.

Jdlrobson subscribed.

The solution is to mark this up with the appropriate class. See https://www.mediawiki.org/wiki/Extension:TextExtracts#FAQ