There are a lot of Python libraries for working with MediaWiki. It turns out that there are at least 10 for interacting with the API! Let's meet up to talk about what's working and what's not. Let's also talk about opportunities where we could consolidate effort, share code, work together, etc.
Event Timeline
Are you still planning to run this session? Would you prefer to have it scheduled in advance (i.e. to promote it in our circles)?
@Qgil yes. I'd like this session to take place. Is there something you feel that we have missed? Maybe there's a specific place that you think we should promote this but we haven't yet?
Interested mainly in regard to Flow. pywikibot support is in progress as a GSoC project (T67119); I would be glad to reach out to the other library developers if applicable.
It is time to promote Wikimedia-Hackathon-2015 activities in the program (training sessions and meetings) and main wiki page (hacking projects and other ongoing activities). Follow the instructions, please. If you have questions about this message, ask here.
Notes pasted at P675
1 | People: jayvdb, legoktm, yurik, Pierre Selim, Jean-Fred, Lyon Epitech students (Antoine and Dimitri), valhallasw, multichill, halfak, ladsgroup, とある白い猫, hashar, yuri, Yuvi |
---|---|
2 | Title: Meeting: Developers of python libraries for MediaWiki |
3 | Task: https://phabricator.wikimedia.org/T97950 |
4 | |
5 | == Round of introductions == |
6 | We all hate compat |
7 | |
8 | |
9 | Existing Python libraries: |
10 | |
11 | API libraries |
12 | See https://www.mediawiki.org/wiki/API:Client_code |
13 | See https://www.mediawiki.org/wiki/API:Client_code/Evaluations |
14 | pywikibot-core -- BOT OPERATING SYSTEM |
15 | pywikibot-compat |
16 | https://github.com/goldsmith/Wikipedia -- API library and BOT OPERATING SYSTEM |
17 | mwapi https://github.com/yuvipanda/python-mwapi -- Basic API library |
18 | mwclient https://github.com/mwclient/mwclient/ -- Basic API library |
19 | wikitools https://github.com/alexz-enwp/wikitools -- API library that implements API Structure |
20 | https://github.com/halfak/Mediawiki-Utilities -- General collection of utilities for extracting MediaWiki data (research focus). API, DB, XML & utilities for extracting sessions, reverts and title parsing |
21 | https://github.com/wikimedia/analytics-zero-sms/blob/master/scripts/api.py |
22 | http://git.wikimedia.org/blob/integration%2Fjenkins.git/master/bin%2Fmw-api-siteinfo.py (thin python layer for api.php query parameters) |
23 | |
24 | TL;DR: there are too many of them! |
25 | |
26 | OAuth |
27 | https://github.com/wikimedia/MediaWiki-OAuth -- General mediawiki OAuth utility |
28 | https://github.com/valhallasw/flask-mwoauth -- Flask mediawiki OAuth routes |
29 | |
30 | Machine learning / Artificial Intelligence |
31 | https://github.com/halfak/Revision-Scoring -- Machine learning and feature extraction system that focuses on the revision |
32 | https://github.com/halfak/Objective-Revision-Evaluation-Service (ORES) -- Restful web service |
33 | https://github.com/halfak/Wiki-Labels -- Handcoding utility (Restful server & gadget) |
34 | https://github.com/halfak/Wiki-Class -- A machine learning system for predicting WP 1.0 assessment classes |
35 | |
36 | Live systems support |
37 | https://github.com/halfak/MediaWiki-events -- A generalized event data source (reads DB, API, RCStream and [IRC]) |
38 | |
39 | Data extraction utilities |
40 | https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (mwcites) |
41 | https://github.com/halfak/mwrefs -- Extracts <ref> tags from XML dumps (both current and historical) |
42 | https://github.com/halfak/mwmetrics -- Extracts basic behavioral stats for new editors using the MediaWiki DB |
43 | https://github.com/halfak/MediaWiki-Streaming -- A hadoop stream processing framework for extracting information from XML dumps |
44 | https://github.com/halfak/MediaWiki-edit-quality-scoring -- A content persistence extraction system that uses the MediaWiki API |
45 | https://github.com/earwig/mwparserfromhell -- Parses WikiText into abstract syntax trees |
46 | |
47 | == Discussion == |
48 | * halfak: figure out what libraries are out there and try and work together |
49 | * yurik: pywikibot's Page object is a kitchen sink, Site should be used to interact with the API |
50 | * yurik: library should use requests, and do very simple things, separation of storage objects and "Site" object |
51 | * legoktm: pywikibot is heavy, it needs user-config.py |
52 | |
53 | We could publish pywikibot to PyPI if we wanted to. Would need a stable release that we could maintain. |
54 | Yurik: asking about splitting pywikibot to a thin/lower level and the heavy one |
55 | Jay: yeah we are actually doing it |
56 | valhallasw: You can do this already! |
57 | Jay: that is the Site object, named methods correspond to the API requests |
58 | Yurik: makes it hard to catch up with upstream changes |
59 | valhallasw: there is a network layer, a simple API layer, and then a layer on top of that (Site) |
60 | jay: ...explore api programmatically via action=paraminfo |
61 | hashar: lacking docs on how to use lower level parts of pywikibot, made it easier for me to write what I needed instead of reusing existing code. |
62 | halfak: smallest library we can possibly use |
63 | Jean-Fred: pywikibot was a hassle to install, e.g. in venvs (using tox) for testing; install should be made easier (pip install pywikibot ??) |
64 | jay: problem with the list of wiki families growing. |
65 | - one library that loads interwiki.cdb, builds interwiki matrix |
66 | - i18n is in its own package since it keeps being updated over and over (ex: namespaces, edit comments) |
67 | halfak: but that is a utility, not a library! The painful part is all the things meant for bots |
68 | |
69 | |
70 | ACTION: define the layers of pywikibot and where we draw the limit. |
71 | |
72 | Segments of pywikibot |
73 | - interwiki.cdb (families) |
74 | - i18n data |
75 | - pywikibot (Recursive?) |
76 | - scripts |
77 | - API |
78 | |
79 | valhallasw: you can use pip install right now, also by pointing it to the git repo |
80 | jay: some dependencies are just optionals |
81 | |
82 | legoktm: agreement to split the framework in well isolated libraries. |
83 | valhallasw: someone who wants to run a script would want to download a tarball that has all the dependencies included. |
84 | ToAruShiroiNeku: everything through pip, tarballs for people who want them, not necessarily with any programming knowledge (just to run scripts perhaps) |
85 | hashar: can use wheels maybe? :-D |
86 | lang=qqq!!! |
87 | yurik: one reference implementation per language to set standard for low level libraries, well known, commonly |
88 | |
89 | |
90 | KEY IDEAS/ACTIONS |
91 | |
92 | pywikibot |
93 | |
94 | * API Auto Documentation for the low layers of pywikibot |
95 | ** Jay, Maarten, Antoine, Amir, legoktm |
96 | ** AGREED to use Sphinx and .rst |
97 | ** AGREED Publish it to doc.wikimedia.org |
98 | *** English up to date docs first, then look into how to maybe (??) localize them |
99 | |
100 | * Second documentation work is to write specs/RFC/architecture/design documentation |
101 | * AGREED For now we agree that we won't do i18n on generated documentation which is geared toward devs |
102 | * AGREED Scripts: we keep the (i18ned) documentation on MediaWiki.org |
103 | * ACTION: define user groups and their doc requirements |
104 | |
105 | Expected outcome: easier for developers to reuse the code potentially as a library. |
106 | |
107 | |
108 | low level API library "mwapi" |
109 | * mwapi is a bad name - has to be python specific |
110 | ** AGREED to bikeshed about renaming on the mailing list later on. One potential proposal: pymwapi |
111 | * AGREED minimum dependencies e.g. "requests" |
112 | * AGREED no hardcoding of any API names (exceptions: login) |
113 | * AGREED Supports Login and Session (non-stored but accessible) |
114 | * AGREED no automatic badtoken handling (middleware should handle that). I.e. a badtoken should raise an exception and the library would not attempt to log in again automatically |
115 | * AGREED Supports all api value types, e.g. timestamps, list -> "str|str|str" |
116 | * AGREED Does not handle errors. Only reports them wrapped in an APIError |
117 | * AGREED Configurable retries for HTTP errors and exponential back-off? Left up to the user who constructs the session object. |
118 | * AGREED Query / continuation (uses new style continuation, does not handle errors during continuation) |
119 | * maxlag <-- Proposal? |
120 | * Discussion |
121 | ** Can we repurpose the name "mwapi"? Who owns that utility? YUVI! Yay! https://gerrit.wikimedia.org/r/#/admin/projects/pywikiapi -- use this one, it's empty, I created it (if you want of course) |
122 | ** Can create another repo as needed. |
123 | ** Where do we draw the limit between layers? |
124 | |
125 | Amir: https://github.com/wikimedia/pywikibot-core/tree/master/pywikibot/comms |
126 | |
127 | Middle level? |
128 | * Handles errors and stuff? I guess? |
129 | * OAUTH ??? |
130 | |
131 | |
132 | other libraries? |
133 | |
134 | Revscoring integration with pywikibot |
135 | |
136 | We are all crazy. |
137 | + 1 |
138 | Not at all, my good sir! [originally in French] |
139 | Oh my god, French =P |
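To make the AGREED points for the low level API library a bit more concrete, here is a minimal sketch of three of them: encoding API value types (list -> "str|str|str", timestamps), reporting errors wrapped in an APIError without handling them, and new style query continuation. This is not code from mwapi, pywikibot or any other library listed in the notes. The names `encode_value`, `encode_params`, `query` and the `do_request` callable are hypothetical, and treating naive datetimes as UTC is an assumption of the sketch.

```python
from datetime import datetime, timezone


class APIError(Exception):
    """Reports an API-level error; it does not try to recover from it."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def encode_value(value):
    """Convert one Python value to the string form api.php accepts."""
    if isinstance(value, datetime):
        # Assumption of this sketch: naive datetimes are already in UTC.
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, (list, tuple)):
        # Multi-value parameters are pipe separated. Values that themselves
        # contain "|" are not handled here.
        return "|".join(encode_value(item) for item in value)
    return str(value)


def encode_params(params):
    """Encode a dict of parameters without hardcoding any API names."""
    encoded = {}
    for name, value in params.items():
        if value is None or value is False:
            continue  # a boolean parameter is false when it is left out
        encoded[name] = "" if value is True else encode_value(value)
    return encoded


def query(do_request, params):
    """Yield the "query" part of each response, following continuation.

    do_request is a hypothetical callable that takes a dict of string
    parameters and returns the decoded JSON response as a dict. Retries and
    back-off are left to whoever supplies it.
    """
    base = dict(params, action="query", format="json")
    last_continue = {"continue": ""}  # opts in to new style continuation
    while True:
        response = do_request(encode_params({**base, **last_continue}))
        if "error" in response:
            error = response["error"]
            # No automatic handling (e.g. of badtoken): just report it.
            raise APIError(error.get("code"), error.get("info"))
        yield response.get("query", {})
        if "continue" not in response:
            return
        last_continue = response["continue"]
```

With requests as the only dependency, `do_request` could be something like `lambda p: session.get(api_url, params=p).json()` for a `requests.Session` that the caller constructs and keeps access to; `api_url` is a placeholder here. Keeping the transport outside the functions is one way to leave retries, login and token handling to a middle layer.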