Develop Metadata JSON API
Closed, Resolved, Public

Description

Develop a JSON API that returns all structured data of a page that is not included in the other PCS JSON APIs (Summary, References, Gallery).

Documentation of the data to include is here:
https://docs.google.com/spreadsheets/d/1RoP3gzbd-DbthjTbfim5z-c-qgjKXJOQE5rgZH3k4d0/edit#gid=0

Some important data this will include:

  • Table of Contents
  • Categories
  • Hatnotes
  • Page issues
  • Geo data
  • Spoken version URL
  • Protection status

The API should return an array of image objects (with thumb and full sizes) like in other MCS APIs, with the other information included.

Additionally, we should return the section that contains each image, as well as a link to the image on that page (so it can be found).

Example output (as of 26 Feb 2018)

http://localhost:6927/en.wikipedia.org/v1/page/metadata/Meridian_High_School_(Washington)

{
    "revision": "823906120",
    "tid": "9a7c8cfe-128d-11e8-90dd-a48f3970329f",
    "hatnotes": [
        {
            "section": 0,
            "html": "For other schools of a similar name, see <a rel=\"mw:WikiLink\" href=\"./Meridian_High_School_(disambiguation)\" title=\"Meridian High School (disambiguation)\" class=\"mw-redirect\">Meridian High School</a>."
        }
    ],
    "issues": [
        {
            "section": 0,
            "html": "This article <b>does not <a rel=\"mw:WikiLink\" href=\"./Wikipedia:Citing_sources\" title=\"Wikipedia:Citing sources\">cite</a> any <a rel=\"mw:WikiLink\" href=\"./Wikipedia:Verifiability\" title=\"Wikipedia:Verifiability\">sources</a></b>.<span class=\"hide-when-compact\"> Please help <a rel=\"mw:ExtLink\" href=\"//en.wikipedia.org/w/index.php?title=Meridian_High_School_(Washington)&amp;action=edit\">improve this article</a> by <a rel=\"mw:WikiLink\" href=\"./Help:Introduction_to_referencing_with_Wiki_Markup/1\" title=\"Help:Introduction to referencing with Wiki Markup/1\">adding citations to reliable sources</a>. Unsourced material may be challenged and <a rel=\"mw:WikiLink\" href=\"./Wikipedia:Verifiability#Burden_of_evidence\" title=\"Wikipedia:Verifiability\">removed</a>.</span>  <small><i>(August 2011)</i></small><small class=\"hide-when-compact\"><i> (<a rel=\"mw:WikiLink\" href=\"./Help:Maintenance_template_removal\" title=\"Help:Maintenance template removal\">Learn how and when to remove this template message</a>)</i></small>"
        },
        {
            "section": 0,
            "html": "This article <b>may be in need of reorganization to comply with Wikipedia's <a rel=\"mw:WikiLink\" href=\"./Wikipedia:Manual_of_Style/Layout\" title=\"Wikipedia:Manual of Style/Layout\">layout guidelines</a></b>.<span class=\"hide-when-compact\"> Please help by <a rel=\"mw:ExtLink\" href=\"//en.wikipedia.org/w/index.php?title=Meridian_High_School_(Washington)&amp;action=edit\">editing the article</a> to make improvements to the overall structure.</span>  <small><i>(March 2008)</i></small><small class=\"hide-when-compact\"><i> (<a rel=\"mw:WikiLink\" href=\"./Help:Maintenance_template_removal\" title=\"Help:Maintenance template removal\">Learn how and when to remove this template message</a>)</i></small>"
        }
    ],
    "toc": {
        "title": "Contents",
        "entries": [
            {
                "toclevel": 1,
                "tocsection": 1,
                "tocnumber": "1",
                "href": "Athletics",
                "text": "Athletics"
            },
            {
                "toclevel": 2,
                "tocsection": 2,
                "tocnumber": "1.1",
                "href": "State_championships",
                "text": "State championships"
            },
            {
                "toclevel": 1,
                "tocsection": 3,
                "tocnumber": "2",
                "href": "External_links",
                "text": "External links"
            }
        ],
        "flags": {}
    },
    "categories": [
        {
            "ns": 14,
            "title": "Category:All articles lacking sources",
            "hidden": true
        },
        {
            "ns": 14,
            "title": "Category:Articles lacking sources from August 2011",
            "hidden": true
        },
        {
            "ns": 14,
            "title": "Category:Articles with multiple maintenance issues",
            "hidden": true
        },
        {
            "ns": 14,
            "title": "Category:Coordinates on Wikidata",
            "hidden": true
        },
        {
            "ns": 14,
            "title": "Category:High schools in Whatcom County, Washington",
            "hidden": false
        },
        {
            "ns": 14,
            "title": "Category:Public high schools in Washington (state)",
            "hidden": false
        },
        {
            "ns": 14,
            "title": "Category:Wikipedia articles needing reorganization from March 2008",
            "hidden": true
        }
    ],
    "coordinates": [
        {
            "lat": 48.85611111,
            "lon": -122.49027778,
            "primary": true,
            "globe": "earth"
        }
    ],
    "protection": {}
}
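
As a rough illustration of how a client might consume this response, here is a minimal Python sketch that lists the non-hidden categories and renders the TOC entries as an indented outline. The `sample` dict is a trimmed copy of the example output above; the helper names `visible_categories` and `toc_outline` are hypothetical and not part of the service.

```python
# Trimmed copy of the example response above; only the fields used here.
sample = {
    "toc": {
        "title": "Contents",
        "entries": [
            {"toclevel": 1, "tocsection": 1, "tocnumber": "1",
             "href": "Athletics", "text": "Athletics"},
            {"toclevel": 2, "tocsection": 2, "tocnumber": "1.1",
             "href": "State_championships", "text": "State championships"},
            {"toclevel": 1, "tocsection": 3, "tocnumber": "2",
             "href": "External_links", "text": "External links"},
        ],
    },
    "categories": [
        {"ns": 14, "title": "Category:Coordinates on Wikidata", "hidden": True},
        {"ns": 14, "title": "Category:High schools in Whatcom County, Washington",
         "hidden": False},
        {"ns": 14, "title": "Category:Public high schools in Washington (state)",
         "hidden": False},
    ],
}


def visible_categories(metadata):
    """Return the titles of categories that are not flagged as hidden."""
    return [c["title"] for c in metadata.get("categories", [])
            if not c.get("hidden")]


def toc_outline(metadata):
    """Render each TOC entry as one line, indented two spaces per toclevel."""
    entries = metadata.get("toc", {}).get("entries", [])
    return ["  " * (e["toclevel"] - 1) + e["tocnumber"] + " " + e["text"]
            for e in entries]


if __name__ == "__main__":
    print("\n".join(toc_outline(sample)))
    print(visible_categories(sample))
```

Both helpers tolerate a missing `toc` or `categories` property by falling back to an empty list, which seems prudent since the example shows that some properties (such as `protection`) can be empty.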

Event Timeline

Change 409074 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: initial stub

https://gerrit.wikimedia.org/r/409074

Change 409074 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: initial stub

https://gerrit.wikimedia.org/r/409074

Change 410618 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add coordinates

https://gerrit.wikimedia.org/r/410618

Change 410624 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add spoken wikipedia files

https://gerrit.wikimedia.org/r/410624

Change 410625 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add revision + tid

https://gerrit.wikimedia.org/r/410625

Change 410624 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add spoken wikipedia files

https://gerrit.wikimedia.org/r/410624

Change 410625 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add revision + tid

https://gerrit.wikimedia.org/r/410625

@Dbrant @Mhurd @JoeWalsh Are you guys ok using just the HTML version for page issues and hatnotes? In other words can we drop the plain text version of these?

To add some context:

The MCS mobile-sections-lead endpoint response includes 'issues' and 'hatnotes' properties that report the page issues and hatnotes, if present (but only for the lead section). Unless I'm missing something, the Android app isn't using either of these properties, but is parsing page issues from the page HTML in client-side JS. Since iOS isn't using MCS for page loads at all, I don't suppose it's relying on these either.

The current format for hatnotes in mobile-sections-lead is an array of html strings:

hatnotes: [ '<b>foo</b>', '<b>bar</b>' ]

For page issues, the current format is as follows:

issues: [ { text: '<b>foo</b>' }, { text: '<b>bar</b>' } ]

For the new metadata endpoint, I've proposed standardizing the structure for both so that they contain both the HTML and plain text representations of the issue or hatnote, as well as the section ID, since the entire article will be handled.

hatnotes: [
  {
    section: 0,
    html: '<b>foo</b>',
    text: 'foo'
  },
  {
    section: 1,
    html: '<b>bar</b>',
    text: 'bar'
  }
]

The question is: would the plain text version of these be useful to you, or should we leave them off?

Side note: it's unfortunate that the current mobile-sections-lead 'issues' property provides HTML strings in a property called 'text'. Part of the goal of the proposed structure would be to reduce the ambiguity here.
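
To make the proposal concrete, here is a minimal Python sketch that maps the two legacy mobile-sections-lead shapes onto the proposed common structure. It is an illustration only, not the service's implementation: the function names are hypothetical, the plain text is derived by simply dropping tags with the standard library's `html.parser`, and the section defaults to 0 on the assumption that the legacy data only ever covers the lead section.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def plain_text(html):
    """Return the fragment with tags dropped and entities decoded."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts)


def normalize_hatnotes(legacy_hatnotes, section=0):
    """Legacy shape: a list of HTML strings."""
    return [{"section": section, "html": h, "text": plain_text(h)}
            for h in legacy_hatnotes]


def normalize_issues(legacy_issues, section=0):
    """Legacy shape: a list of {'text': <HTML string>} objects."""
    return [{"section": section, "html": i["text"], "text": plain_text(i["text"])}
            for i in legacy_issues]


if __name__ == "__main__":
    print(normalize_hatnotes(["<b>foo</b>", "<b>bar</b>"]))
    print(normalize_issues([{"text": "<b>foo</b>"}]))
```

Note that the legacy `text` property of an issue ends up in `html` here, which is the ambiguity the proposed structure is meant to remove.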

Change 412998 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/extensions/GeoData@master] Output 'primary' as a boolean if formatversion=2

https://gerrit.wikimedia.org/r/412998

Change 412998 merged by jenkins-bot:
[mediawiki/extensions/GeoData@master] Output 'primary' as a boolean if formatversion=2

https://gerrit.wikimedia.org/r/412998
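
For context on the GeoData change: with the MediaWiki Action API's original JSON format, a true boolean is conventionally reported as a key with an empty-string value and a false one by omitting the key, whereas formatversion=2 uses real JSON booleans. A client that has to cope with both could normalize along these lines. This is a sketch under that assumption; `is_primary` is a hypothetical helper, not something from the patch.

```python
def is_primary(coordinate):
    """Interpret the 'primary' flag of a coordinate object.

    Accepts either a real boolean (formatversion=2 style) or the
    legacy convention of an empty string meaning true and a missing
    key meaning false.
    """
    value = coordinate.get("primary", False)
    return value is True or value == ""


if __name__ == "__main__":
    print(is_primary({"lat": 48.85611111, "lon": -122.49027778, "primary": True}))
    print(is_primary({"lat": 48.85611111, "lon": -122.49027778}))
```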

Update (to T177428#3980479, T177428#3980622): we're taking out text. Holler if you need it and we'll put it back.

Change 413224 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: add protection

https://gerrit.wikimedia.org/r/413224

Change 410618 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add coordinates

https://gerrit.wikimedia.org/r/410618

Change 413224 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: add protection

https://gerrit.wikimedia.org/r/413224

Change 414741 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add table of contents

https://gerrit.wikimedia.org/r/414741

Change 414856 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add Wikidata sitelinks

https://gerrit.wikimedia.org/r/414856

Change 415323 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Handle notoc, forcetoc, toclimit in TOC construction

https://gerrit.wikimedia.org/r/415323

@Mholloway would you list some example pages where those behavioral flags are used so it's easier to test?

Sure:
toclimit(=3): http://localhost:6927/en.wikipedia.org/v1/page/metadata/Albert_Einstein
notoc: http://localhost:6927/en.wikipedia.org/v1/page/metadata/Glen_Roy_Conservation_Park
forcetoc: http://localhost:6927/en.wikipedia.org/v1/page/metadata/Alistair_Fox

I'll add these to the commit message in the next iteration as well.

BTW, https://en.wikipedia.org/wiki/Special:PagesWithProp is pretty handy for finding other pages with notoc or forcetoc if you want. toclimit seems really common, especially on bigger articles.
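
For anyone testing the toclimit case by hand, a depth limit on the TOC entries can be sketched as below. This is only an illustration: `limit_toc` and its `max_level` parameter are hypothetical, and how the service actually maps a page's toclimit value onto `toclevel` is not spelled out in this task.

```python
def limit_toc(entries, max_level):
    """Keep only TOC entries whose toclevel does not exceed max_level."""
    return [e for e in entries if e["toclevel"] <= max_level]


if __name__ == "__main__":
    entries = [
        {"toclevel": 1, "tocnumber": "1", "text": "Athletics"},
        {"toclevel": 2, "tocnumber": "1.1", "text": "State championships"},
        {"toclevel": 1, "tocnumber": "2", "text": "External links"},
    ]
    for entry in limit_toc(entries, 1):
        print(entry["tocnumber"], entry["text"])
```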

Change 414741 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add table of contents

https://gerrit.wikimedia.org/r/414741

Change 415323 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Handle notoc, forcetoc, toclimit in TOC construction

https://gerrit.wikimedia.org/r/415323

Change 415863 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add langlinks + langlinkscount

https://gerrit.wikimedia.org/r/415863

Change 414856 abandoned by Mholloway:
Metadata: Add Wikidata sitelinks

Reason:
Superseded by I98b033d315a65df6049dc817526499df776e399e

https://gerrit.wikimedia.org/r/414856

Change 415863 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata: Add langlinks

https://gerrit.wikimedia.org/r/415863

Change 417026 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata endpoint response tweaks

https://gerrit.wikimedia.org/r/417026

Change 417026 merged by jenkins-bot:
[mediawiki/services/mobileapps@master] Metadata endpoint response tweaks

https://gerrit.wikimedia.org/r/417026