
Extract categories as a structured array in the metadata endpoint
Closed, Resolved, Public

Description

We should extract a page's categories into a structured array as part of the new /page/metadata/{title} endpoint.

If a MediaWiki (MW) API provides this, we can call it directly. Otherwise we should parse the categories out of the page HTML.

Event Timeline

Please no... categories should remain out of the payload of the MCS endpoint. In mobile we only need to show them behind a categories button. The HTML for categories can be quite large and is not necessary for an initial view, and arguably not needed even for a below-the-fold view. I'd recommend creating a new endpoint for categories if needed for scalability.

Is anything needed here beyond a simple list of categories?

It would be trivially easy to get these directly from the Parsoid HTML with something like:

const categoryLinks = doc.querySelectorAll('link[rel="mw:PageProp/Category"]');
return [].map.call(categoryLinks, link => link.getAttribute('href'));
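
For reference, the snippet above can be fleshed out into a small self-contained sketch. The function names `extractCategories` and `toCategoryTitle` are hypothetical, and the title clean-up assumes category hrefs of the form "./Category:Some_title#sort%20key"; that href shape is an assumption about the Parsoid markup, not something confirmed in this task.

```javascript
// Sketch only: helper names are hypothetical, not part of mobileapps.
// Assumes `doc` is a DOM Document parsed from Parsoid HTML.

// Turn a category href into a plain title.
// Assumed href shape: "./Category:Some_title#sort%20key".
function toCategoryTitle(href) {
  // Drop the sort-key fragment, if any.
  const withoutFragment = href.split('#')[0];
  // Drop the leading "./" of the relative link.
  const relative = withoutFragment.replace(/^\.\//, '');
  // Drop the (possibly localized) namespace prefix up to the first colon.
  const colon = relative.indexOf(':');
  const title = colon === -1 ? relative : relative.slice(colon + 1);
  // Decode percent-escapes and show underscores as spaces.
  return decodeURIComponent(title).replace(/_/g, ' ');
}

// Collect all category links in the document as an array of titles.
function extractCategories(doc) {
  const links = doc.querySelectorAll('link[rel="mw:PageProp/Category"]');
  return Array.from(links, (link) => toCategoryTitle(link.getAttribute('href')));
}
```

Stripping everything up to the first colon, rather than a literal "Category:" prefix, is a deliberate choice here so the sketch would also cope with localized namespace names.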

Interesting. Parsoid HTML currently doesn't expose hidden categories. I wonder how @ssastry would feel about adding those.

Change 410627 had a related patch set uploaded (by Mholloway; owner: Mholloway):
[mediawiki/services/mobileapps@master] Metadata: Add page categories

https://gerrit.wikimedia.org/r/410627

Doing this from the MW API for now so we're not blocked on T186919.
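
A rough sketch of what a MediaWiki API lookup could look like, for illustration only. It is not taken from the patch above; the function names are hypothetical, and it assumes the standard `action=query&prop=categories` module with `clprop=hidden` so that hidden categories are flagged in the response.

```javascript
// Sketch only: function names are hypothetical; this is not the merged patch.

// Build the query string for a categories lookup on one page title.
function buildCategoriesQuery(title) {
  const params = new URLSearchParams({
    action: 'query',
    format: 'json',
    formatversion: '2',
    prop: 'categories',
    clprop: 'hidden', // ask the API to mark hidden categories
    cllimit: 'max',
    titles: title,
  });
  return params.toString();
}

// Reduce an API response to a structured array of categories.
function parseCategories(response) {
  const pages = (response.query && response.query.pages) || [];
  // formatversion=2 returns an array of pages; formatversion=1 returns an object map.
  const page = Array.isArray(pages) ? pages[0] : Object.values(pages)[0];
  const categories = (page && page.categories) || [];
  return categories.map((c) => ({
    title: c.title,
    // Present-and-not-false covers both `hidden: true` and the older `hidden: ""`.
    hidden: 'hidden' in c && c.hidden !== false,
  }));
}
```

The query string would be appended to the wiki's api.php URL. Pages with many categories may need follow-up requests using the `continue` values the API returns, which this sketch does not handle.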

bearND renamed this task from "Extract categories as a structured array" to "Extract categories as a structured array in the metadata endpoint". Feb 15 2018, 3:56 AM
bearND updated the task description.

Updated title and description of this task to reflect the current plan.

Change 410627 merged by Mholloway:
[mediawiki/services/mobileapps@master] Metadata: Add page categories

https://gerrit.wikimedia.org/r/410627