Investigate references endpoint
Closed, Resolved · Public · 5 Story Points

Description

Motivation
The references endpoint provides a structured output of reference lists found on a particular page, and also classifies footnotes into types (see T182652). Let's find out how the references endpoint can benefit reference previews.

Acceptance Criteria

  • It can be estimated how many points the switch to using the references endpoint would take
  • We know on what basis it is decided which type a reference has

Notes
In case of questions @bearND would be a great person to talk to :)

Event Timeline

Restricted Application added a subscriber: Aklapper. · Jan 29 2019, 1:21 PM
Lea_WMDE updated the task description. · Jan 29 2019, 2:36 PM
Lea_WMDE set the point value for this task to 5.

Some notes from today:

Change 490305 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Popups@master] [WIP] Add RESTbased gateway for reference previews

https://gerrit.wikimedia.org/r/490305

Change 490964 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Popups@master] Add all reference type icons and messages

https://gerrit.wikimedia.org/r/490964

We know on what basis it is decided which type a reference has

The endpoint happily delivers one of these types: book, journal, news, web, and generic.

NOTE: Technically, this is a trivial, global whitelist of CSS class names, not a list of template names configured per wiki, as I had expected. The communities are expected to use, for example, <cite class="web"> in their templates. The relevant code starts at https://github.com/wikimedia/mediawiki-services-mobileapps/blob/master/lib/transformations/references/structureReferenceListContent.js#L10.
NOTE: Our code must duplicate this whitelist because it can only support reference types with icons and headlines it knows about.
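
To illustrate how small such a whitelist check is, here is a minimal sketch in TypeScript. It is not the code from the mobileapps service or from Popups; the names KNOWN_TYPES and detectReferenceType are hypothetical, and the rule that the first whitelisted type wins when several classes match is an assumption.

```typescript
// Hypothetical whitelist mirroring the specific types named above.
// Anything not listed here falls back to 'generic'.
const KNOWN_TYPES = ['book', 'journal', 'news', 'web'] as const;

type ReferenceType = typeof KNOWN_TYPES[number] | 'generic';

// Takes the raw value of a class attribute, e.g. from <cite class="citation web">,
// and returns the reference type it maps to.
function detectReferenceType(classAttribute: string): ReferenceType {
  // Split on whitespace so only whole class names match ('website' is not 'web').
  const classes = classAttribute.split(/\s+/).filter((name) => name.length > 0);

  // Assumption: if several whitelisted classes are present, whitelist order decides.
  for (const type of KNOWN_TYPES) {
    if (classes.indexOf(type) !== -1) {
      return type;
    }
  }
  return 'generic';
}
```

Because the function only needs the class attribute as a string, the same check could be applied to markup scraped from the page or to any other source of class names.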

It can be estimated how many points the switch to using the references endpoint would take

Yes, we can.

Questions:

  • What configuration do we want to introduce? One approach is full auto-detection, another is to follow what page previews already do:
    • PopupsReferenceGateway could specify one of two or three gateway types: scraping and restbase, and possibly none to disable reference previews (which would allow us to remove PopupsReferencePreviews).
    • PopupsReferenceRestGatewayEndpoint specifies the endpoint URL.
  • By switching to the RESTBase endpoint we might accidentally drop support for certain fake references that the endpoint cannot deliver but our HTML scraping can.
  • It turns out we can replicate the type detection in our fallback scraping code, as it is just a whitelist of CSS class names.
    • Major advantages: No extra HTTP requests. No need to implement caching. Just one relatively trivial gateway for all users.
    • Disadvantages: Code duplication.
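
The configuration question above could be sketched roughly as follows. This is only an illustration of the proposed settings, not merged code: the function chooseReferenceGateway and the GatewayChoice shape are hypothetical, and falling back to scraping when the value is unknown or the endpoint URL is missing is an assumption.

```typescript
type GatewayType = 'scraping' | 'restbase' | 'none';

// The two setting names are the ones proposed in the list above.
interface ReferenceConfig {
  PopupsReferenceGateway?: string;
  PopupsReferenceRestGatewayEndpoint?: string;
}

// Hypothetical result shape: which gateway to use and, for restbase, where to ask.
interface GatewayChoice {
  type: GatewayType;
  endpoint?: string;
}

function chooseReferenceGateway(config: ReferenceConfig): GatewayChoice {
  switch (config.PopupsReferenceGateway) {
    case 'none':
      // Reference previews are disabled entirely.
      return { type: 'none' };
    case 'restbase':
      // Assumption: without an endpoint URL, fall back to HTML scraping.
      if (config.PopupsReferenceRestGatewayEndpoint) {
        return { type: 'restbase', endpoint: config.PopupsReferenceRestGatewayEndpoint };
      }
      return { type: 'scraping' };
    default:
      // Assumption: scraping is the default for missing or unknown values.
      return { type: 'scraping' };
  }
}
```

With type detection replicated in the scraping code, the default branch would already cover all users, which is the "just one gateway" advantage listed above.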

Change 491521 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Popups@master] Add reference type detection to HTML scraping gateway

https://gerrit.wikimedia.org/r/491521

Change 490964 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Add all reference type icons and messages

https://gerrit.wikimedia.org/r/490964

Lea_WMDE closed this task as Resolved. · Feb 20 2019, 9:11 AM
Lea_WMDE claimed this task.
Lea_WMDE moved this task from Demo to Done on the WMDE-QWERTY-Sprint-2019-02-06 board.

Change 491722 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/Popups@master] Reference gateway accepts .mw-reference-text and .reference-text

https://gerrit.wikimedia.org/r/491722

Change 491722 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Reference gateway accepts .mw-reference-text and .reference-text

https://gerrit.wikimedia.org/r/491722

Change 491521 merged by jenkins-bot:
[mediawiki/extensions/Popups@master] Add reference type detection to HTML scraping gateway

https://gerrit.wikimedia.org/r/491521