
API returns error messages in place destined for end user
Closed, Invalid · Public

Description

I believe I have found a bug that has not been reported before: the API appears to return error messages in the wrong place in its response. The query

https://de.wikipedia.org/w/api.php?action=query&format=json&prop=coordinates%7Cpageimages%7Cextracts&colimit=10&pithumbsize=100&pilimit=1&explaintext=1&exintro=1&exlimit=1&pageids=5780474

responds with the body

{
  "batchcomplete": "",
  "query": {
    "pages": {
      "5780474": {
        "pageid": 5780474,
        "ns": 0,
        "title": "Psychologische Hochschule Berlin",
        "coordinates": [
          {
            "lat": 52.51280975,
            "lon": 13.4157896,
            "primary": "",
            "globe": "earth"
          }
        ],
        "thumbnail": {
          "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/99/Psychologische_Hichschule_Berlin_%28PHB%29_am_K%C3%B6llnischen_Park_2.jpg/100px-Psychologische_Hichschule_Berlin_%28PHB%29_am_K%C3%B6llnischen_Park_2.jpg",
          "width": 100,
          "height": 67
        },
        "pageimage": "Psychologische_Hichschule_Berlin_(PHB)_am_Köllnischen_Park_2.jpg",
        "extract": "Vorlage:Infobox Hochschule/Logo fehltVorlage:Infobox Hochschule/Mitarbeiter fehlt\n\nDie Psychologische Hochschule Berlin (PHB) wurde 2010 vom Berufsverband Deutscher Psychologinnen und Psychologen gegründet und ist im Haus der Psychologie am Köllnischen Park in Berlin-Mitte untergebracht. Sie wurde am 5. Mai 2010 vom Berliner Senat für Bildung, Wissenschaft und Forschung als nichtstaatliche Hochschule staatlich anerkannt und startete ihren Lehrbetrieb zum Wintersemester 2010/2011."
      }
    }
  }
}

The extract property starts with Vorlage:Infobox Hochschule/Logo fehltVorlage:Infobox Hochschule/Mitarbeiter fehlt (roughly "logo missing" and "staff missing"), which appear to be error messages about missing fields in the infobox. I believe these messages should not appear in the extract, just as they are not shown when a user visits the https://de.wikipedia.org/wiki/Psychologische_Hochschule_Berlin page.
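For anyone who wants to reproduce the request, the query above can be assembled programmatically. The following is only a minimal sketch: the function name `build_extract_query` and the `plain_text` switch are hypothetical and were chosen for illustration, while the endpoint and parameters are taken from the URL in this report. It only builds the URL and performs no network request.

```python
from urllib.parse import urlencode

# Endpoint taken from the query in this report.
API_ENDPOINT = "https://de.wikipedia.org/w/api.php"


def build_extract_query(page_id: int, plain_text: bool = True) -> str:
    """Build the API URL used in this report (hypothetical helper)."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "coordinates|pageimages|extracts",
        "colimit": 10,
        "pithumbsize": 100,
        "pilimit": 1,
        "exintro": 1,
        "exlimit": 1,
        "pageids": page_id,
    }
    if plain_text:
        # explaintext=1 asks for a plain-text extract instead of HTML.
        params["explaintext"] = 1
    # urlencode percent-encodes the "|" separators as %7C.
    return f"{API_ENDPOINT}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_extract_query(5780474))
```

The parameter order differs slightly from the URL above, which should not matter for the API.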

Event Timeline

Robert-RtC3V raised the priority of this task to Needs Triage.
Robert-RtC3V updated the task description.
Robert-RtC3V added a project: MediaWiki-API.
Robert-RtC3V added a subscriber: Robert-RtC3V.
Restricted Application added subscribers: StudiesWorld, Aklapper. Nov 26 2015, 12:11 PM
Robert-RtC3V renamed this task from "API returns error messages in" to "API returns error messages in place destined for end user". Nov 26 2015, 12:17 PM
Robert-RtC3V set Security to None.
Anomie added a subscriber: Anomie.

If you check the source of the page, you'll see

<p><span style="display: none;"><a href="/w/index.php?title=Vorlage:Infobox_Hochschule/Logo_fehlt&amp;action=edit&amp;redlink=1" class="new" title="Vorlage:Infobox Hochschule/Logo fehlt (Seite nicht vorhanden)">Vorlage:Infobox Hochschule/Logo fehlt</a></span><span style="display: none;"><a href="/w/index.php?title=Vorlage:Infobox_Hochschule/Mitarbeiter_fehlt&amp;action=edit&amp;redlink=1" class="new" title="Vorlage:Infobox Hochschule/Mitarbeiter fehlt (Seite nicht vorhanden)">Vorlage:Infobox Hochschule/Mitarbeiter fehlt</a></span></p>

i.e. the text is there, just hidden with CSS. The TextExtracts extension is picking up this text when generating its extracts.
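To illustrate why hidden text can end up in an extract, here is a small sketch of a text extractor that collects every text node and never looks at CSS. This is not the TextExtracts implementation (the extension is written in PHP); the class `NaiveTextExtractor`, the function `naive_extract`, and the shortened sample markup (simplified `href` values, truncated article sentence) are assumptions made for illustration only.

```python
from html.parser import HTMLParser


class NaiveTextExtractor(HTMLParser):
    """Collects all text nodes; inline styles such as display:none are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Every text node is kept, whether or not a browser would display it.
        self.chunks.append(data)


def naive_extract(html: str) -> str:
    parser = NaiveTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Simplified version of the markup quoted above.
SAMPLE_HTML = (
    '<p><span style="display: none;"><a href="#" class="new">'
    "Vorlage:Infobox Hochschule/Logo fehlt</a></span>"
    '<span style="display: none;"><a href="#" class="new">'
    "Vorlage:Infobox Hochschule/Mitarbeiter fehlt</a></span></p>"
    "<p>Die Psychologische Hochschule Berlin (PHB) ...</p>"
)

if __name__ == "__main__":
    print(naive_extract(SAMPLE_HTML))
```

In this sketch the hidden link text comes out first, directly followed by the visible article text, which matches the shape of the extract property in the report.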

TheDJ added a subscriber: TheDJ. Nov 26 2015, 1:31 PM

"fehlt" ("missing")... These sound like "tracking" links used to track down errors in templates. That is fine, but display:none alone is not enough to deal with such pieces of content.

Define a semantic class, and make sure that the various tools filter out that class (English Wikipedia uses the metadata class for this, for instance). In this case, even the pre-existing semantic class "error" would probably work, provided you keep the "display:none;" in addition to the class.
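The class-based filtering suggested here could look roughly like the sketch below. It is a generic illustration, not the code of TextExtracts or any other tool: the class `ClassFilteringExtractor` and the function `extract_without_classes` are hypothetical names, and the excluded class names ("error", "metadata") are simply the ones mentioned in this comment. The sketch assumes reasonably well-formed HTML in which every non-void element has a closing tag.

```python
from html.parser import HTMLParser

# Void elements have no closing tag and must not affect the depth counter.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ClassFilteringExtractor(HTMLParser):
    """Collects text, skipping any element that carries an excluded class."""

    def __init__(self, excluded_classes):
        super().__init__()
        self.excluded = set(excluded_classes)
        self.skip_depth = 0  # > 0 while inside an excluded element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            # Nested element inside an excluded subtree.
            self.skip_depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.excluded.intersection(classes):
            self.skip_depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.chunks.append(data)


def extract_without_classes(html, excluded_classes=("error", "metadata")):
    parser = ClassFilteringExtractor(excluded_classes)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    html = (
        '<p><span class="error" style="display: none;">'
        '<a href="#">Vorlage:Infobox Hochschule/Logo fehlt</a></span>'
        "Visible text</p>"
    )
    print(extract_without_classes(html))
```

The point of filtering on a class rather than on the inline style is that the markup then states its intent, so any tool that knows the class can drop the content without having to interpret CSS.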

Jhernandez triaged this task as Medium priority. Nov 26 2015, 6:09 PM
Jhernandez added a subscriber: Jhernandez.

While this has not been fixed, is there any workaround? Getting HTML instead of plain text, i.e. removing the explaintext=1 parameter (https://de.wikipedia.org/w/api.php?action=query&format=json&prop=coordinates|pageimages|extracts&colimit=10&pithumbsize=100&pilimit=1&exintro=1&exlimit=1&pageids=5780474), does not help.
It results in

[...]"extract": "<p><span>Vorlage:Infobox Hochschule/Logo fehlt</span><span>Vorlage:Infobox Hochschule/Mitarbeiter fehlt</span></p>\n\n\n[...]

where the span lacks the style="display: none;" attribute.

"While this has not been fixed, is there any workaround?"

Looking at the source of the extension, I think a potential workaround might be to always wrap "tracking links" in a <div> element (and not just a <span> element).

This should be done for other reasons anyway (styling issues with margins).

Jhernandez updated the task description. Jun 22 2016, 4:46 PM

Note: In the meantime, I have added the workaround from T119702#2188976 to the template in question, so error messages from this specific template will not appear in TextExtracts anymore.

In general, putting such "tracking" links into templates is a very popular practice at dewiki (it is even documented on a help page), and I don't think it is realistic to assume that this will stop anytime soon.

Jdlrobson closed this task as Invalid. Jun 15 2017, 9:34 PM
Jdlrobson added a subscriber: Jdlrobson.

The solution is to mark this content up with the appropriate class. See https://www.mediawiki.org/wiki/Extension:TextExtracts#FAQ