Paste P4281

WikiDev16-API-Notes-T122818.txt
Authored by RobLa-WMF on Oct 21 2016, 3:46 PM.
Session name: MediaWiki Action API design discussion: the amazing/good/bad/ugly
Meeting goal: Anomie has been working on the MediaWiki API; let's gather ideas
Meeting style: Problem-solving (problem discovery?): surveying many possible solutions
Phabricator task link: https://phabricator.wikimedia.org/T122818
Topics for discussion:
Use cases
Bots/tools/gadgets
historical primary use-case
need to query content & perform actions
action API geared towards information about lots of pages
Google: wants clean Wikipedia data. They have written a wikitext parser (parsing to structured data) and access templates through the API. Template contents still differ from what is visible on the HTML page, so what the user sees differs from the template. They are trying to clean up templates to unify implementations. Similar to Wikidata's goal: human- and machine-readable data.
If you access infoboxes by template vs. HTML, even the number of infoboxes on the page is different.
Broader issue: language agnosticism. The action API serves a specific installation; RESTBase is a "Cassandra-backed persistent cache layer", with modules.
Pain points
What is the best way to query infobox information? ...can there be better ways?
one problem with infoboxes is that they are written by different people, with different inputs and outputs; Wikidata is one answer to standardise that
See also content format discussions https://phabricator.wikimedia.org/T119022
Discoverability of existing features
for example it is hard to understand what each API module will give back
Cirrus is another example; people might not be interested in that
automatically generated documentation: https://en.wikipedia.org/w/api.php
human-(un)maintained documentation https://www.mediawiki.org/wiki/API:Main_page
API sandbox https://en.wikipedia.org/wiki/Special:ApiSandbox
currently undergoing a rewrite by anomie
modules are hard to categorise and relate to each other (e.g. "if you are doing x on page see also module y")
Ctrl-F stopped working with the API redesign
all help in a single page https://en.wikipedia.org/w/api.php?action=help&recursivesubmodules=1 (!!!!)
The way the XML dumps, the database and the API represent deleted fields is different and poorly documented.
Related https://phabricator.wikimedia.org/T114019
Inconsistencies between API access and dumps (e.g. bitfields)
A lot of the "actions" aren't actually actions. action=query and action=edit make sense; action=flow doesn't help me "flow" something. "action" has become a top-level categorization
YES.
Following on from the point about best practices when writing API modules, this is an important part of the code review process (as well as clear documentation)
"action" is really which module to ask
Too many ways of doing similar but not identical tasks (e.g. fetching current page text)
part of the problem is fragmentation; often the solution is to ask somebody who has come across the same problem
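To illustrate the "similar but not identical" point about fetching current page text, here is a minimal sketch that builds request URLs for four overlapping routes. The helper name page_text_urls and the choice of en.wikipedia.org as the target wiki are assumptions for illustration; the sketch only constructs URLs and does not send any request.

```python
from urllib.parse import urlencode

# Assumed target wiki; any MediaWiki installation exposes the same entry points.
API = "https://en.wikipedia.org/w/api.php"
INDEX = "https://en.wikipedia.org/w/index.php"


def page_text_urls(title):
    """Return URLs for overlapping ways to fetch a page's current text."""
    return {
        # Wikitext of the latest revision, via the query module.
        "query_revisions": API + "?" + urlencode({
            "action": "query", "prop": "revisions", "rvprop": "content",
            "titles": title, "format": "json",
        }),
        # Wikitext again, but via the parse module.
        "parse_wikitext": API + "?" + urlencode({
            "action": "parse", "page": title, "prop": "wikitext",
            "format": "json",
        }),
        # Rendered HTML rather than wikitext, also via the parse module.
        "parse_html": API + "?" + urlencode({
            "action": "parse", "page": title, "prop": "text",
            "format": "json",
        }),
        # Raw wikitext from index.php, outside the action API entirely.
        "index_raw": INDEX + "?" + urlencode({"title": title, "action": "raw"}),
    }


urls = page_text_urls("Main Page")
for name, url in urls.items():
    print(name, url)
```

Each route wraps the text in a different response shape, which is part of why newcomers end up asking someone who has already hit the same problem.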
Versioning: let's talk about it. Versioning modules. Brad: where possible, add a new parameter instead of versioning. Issues: complexity creep, how to balance?
Versioning could help substantially with addressing the inconsistencies between data (API/XML/Database/etc). Without versioning, we can't refactor without breaking things.
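A minimal sketch of Brad's "add a new parameter instead of versioning" approach, applied to the bitfield inconsistency mentioned earlier. Everything here is hypothetical: the handler describe_revision, the flagformat parameter, and the output keys are invented for illustration, and the bit values are assumed rather than taken from any module's documentation.

```python
# Bit values assumed for illustration only.
DELETED_TEXT = 1
DELETED_COMMENT = 2
DELETED_USER = 4

FLAG_NAMES = (
    (DELETED_TEXT, "text"),
    (DELETED_COMMENT, "comment"),
    (DELETED_USER, "user"),
)


def describe_revision(rev, params):
    """Hypothetical module handler.

    Clients that send no 'flagformat' parameter keep getting the legacy
    raw bitfield; clients that opt in with flagformat=names get a list
    of named flags instead.
    """
    out = {"revid": rev["revid"]}
    if params.get("flagformat", "bitfield") == "names":
        out["hidden"] = [name for bit, name in FLAG_NAMES if rev["deleted"] & bit]
    else:
        out["deleted"] = rev["deleted"]
    return out


rev = {"revid": 7, "deleted": DELETED_TEXT | DELETED_USER}
print(describe_revision(rev, {}))                       # legacy shape
print(describe_revision(rev, {"flagformat": "names"}))  # opt-in shape
```

The trade-off raised in the session shows up directly: existing clients never break, but every such parameter adds another branch that has to be documented and maintained.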
Design features
Querying revisions independent of page/user (SELECT * FROM revision WHERE rev_timestamp BETWEEN "2014" and "2015")
check out the allrevisions module (https://www.mediawiki.org/wiki/API:Allrevisions)
example of discoverability issues
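For reference, a minimal sketch of how the timestamp-range query above might be expressed with list=allrevisions. The helper name allrevisions_url, the target wiki, and the chosen arvprop fields are assumptions; the sketch only builds the URL and does not handle continuation.

```python
from urllib.parse import urlencode

# Assumed target wiki.
API = "https://en.wikipedia.org/w/api.php"


def allrevisions_url(start, end, limit=50):
    """Build a list=allrevisions request for revisions between two timestamps."""
    params = {
        "action": "query",
        "list": "allrevisions",
        # Enumerate oldest first, so arvstart is the earlier bound.
        "arvdir": "newer",
        "arvstart": start,
        "arvend": end,
        "arvprop": "ids|timestamp|user",
        "arvlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


print(allrevisions_url("2014-01-01T00:00:00Z", "2015-01-01T00:00:00Z"))
```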
Useful: provide a link to the example queries in API Sandbox (in api.php module docs)
More caching:
Can caching work for sub-modules of the action API?
possible, but needs someone willing to work on it. anomie happy to review.
RESTBase being single-page-oriented is easier to cache/purge; the action API not so much, since it operates on many pages
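A toy sketch of why multi-page responses are harder to purge than single-page ones: the cache has to remember every page each response touched, so that an edit to one page invalidates all responses that included it. The class PageKeyedCache and its key format are hypothetical and do not describe how RESTBase or the action API actually cache.

```python
from collections import defaultdict


class PageKeyedCache:
    """In-memory cache with a reverse index from page title to cache keys."""

    def __init__(self):
        self._entries = {}
        self._keys_by_page = defaultdict(set)

    def put(self, key, pages, value):
        """Store a response and record every page it depends on."""
        self._entries[key] = value
        for page in pages:
            self._keys_by_page[page].add(key)

    def get(self, key):
        return self._entries.get(key)

    def purge_page(self, page):
        """Drop every cached response that included this page."""
        for key in self._keys_by_page.pop(page, set()):
            self._entries.pop(key, None)


cache = PageKeyedCache()
cache.put("single:A", ["A"], "text of A")
cache.put("single:B", ["B"], "text of B")
cache.put("multi:A|B", ["A", "B"], "info about A and B")

cache.purge_page("A")  # an edit to A invalidates both entries that touched it
print(cache.get("single:A"), cache.get("multi:A|B"), cache.get("single:B"))
```

With single-page responses the reverse index is trivial (one key per page); with multi-page queries it grows with every combination of pages requested.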
Mobile views API module should work on more than one article at a time. (depends on the MobileFrontend extension)
Can we query the API via PHP in MediaWiki? Most queries/actions internally directly access the databases.
not ATM; going back and changing that is a huge amount of work to properly separate things
Would the team be interested in someone working on this with them? Yes! "I'd like to review that code." --anomie
Can standardize how we access data because there are some nuances in normalization/etc.
Standardization on this can provide common language
Unified way of accessing page properties
[discoverability] Grouping of actions--what goes together? E.g. Cirrus-related could go together so only people who care about it notice it
possible GCI/hackathon project; make a place for information to go, maybe on mw.org
Grouping of actions would deal with the action=flow issue mentioned above, where that action is essentially a group of everything Flow
General notes
Is there a long-term plan for the action API? (Currently work is done ad-hoc)
https://www.mediawiki.org/wiki/Requests_for_comment/API_roadmap
https://www.mediawiki.org/wiki/API/Architecture_work/Planning
bd808's notion of pioneer/settler/town planner roles for code (http://blog.gardeviance.org/2015/03/on-pioneers-settlers-town-planners-and.html among others)
Is the purpose to avoid dealing with wikitext? No, not really--you can get HTML out of it, but also handle wikitext.
API in layers--wikitext, template, other information to allow user parsing?
Quarry (web interface for db queries) records queries, which can be a useful learning tool for newcomers. Replicate the same for the API sandbox?
on the same theme, see also JupyterHub on Labs to control pywikibot
Action items with owners:
Fhocutt: suggest API use-case categorization for hackathon
!Brad: ask Brad/anomie to review code for API modules, and set aside time to deal with resulting comments. Add anomie as a reviewer on an API-related patch, and if he's not looking at it ping him via email/IRC.
vague, no one is assigned to it: fix up API documentation. Make a list of pages that need fixing?
Conversations to have:
Attendees:
Aaron Halfaker
Filippo Giunchedi
Darian Fitzpatrick
Niklas Laxström
Jordan Adler (Google)
Bryan Davis
Zhicheng Zheng (Google)
Yanan Qian (Google)
Stas Malyshev
Frances Hocutt
Sam Smith
Joaquin Hernandez
DON’T FORGET: When the meeting is over, copy any relevant notes (especially areas of agreement or disagreement, useful proposals, and action items) into the Phabricator task.
See https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit_2016/Session_checklist for more details.