
Meeting: Developers of python libraries for MediaWiki
Closed, Resolved (Public)

Authored By: Halfak, May 3 2015, 7:23 PM

Description

There are a lot of Python libraries for doing work with MediaWiki. It turns out that there are at least 10 for interacting with the API! Let's meet up to talk about what's working and what's not. Let's also talk about opportunities where we could consolidate effort, share code, work together, etc.

http://etherpad.wikimedia.org/p/python

Event Timeline

Halfak raised the priority of this task from to Needs Triage.
Halfak updated the task description.
Halfak moved this task to Meeting proposals on the Wikimedia-Hackathon-2015 board.
Halfak subscribed.
Legoktm renamed this task from Developers of python libraries for mediawiki to Developers of python libraries for MediaWiki meetup at 2015 Lyon hackathon. May 3 2015, 7:26 PM
Legoktm updated the task description.
Legoktm set Security to None.
Legoktm subscribed.

Are you still planning to run this session? Would you prefer to have it scheduled in advance (i.e. to promote it in our circles)?

@Qgil yes. I'd like this session to take place. Is there something you feel that we have missed? Maybe there's a specific place that you think we should promote this but we haven't yet?

Interested mainly in regard to Flow. pywikibot support is in progress as a GSoC project (T67119); I would be glad to reach out to the other library developers if applicable.

It is time to promote Wikimedia-Hackathon-2015 activities in the program (training sessions and meetings) and main wiki page (hacking projects and other ongoing activities). Follow the instructions, please. If you have questions about this message, ask here.

This will be at 17:00 in the Croix Rousse room (right after the action API session).

Halfak renamed this task from Developers of python libraries for MediaWiki meetup at 2015 Lyon hackathon to Meeting: Developers of python libraries for MediaWiki. May 23 2015, 7:41 AM
Halfak updated the task description.

Notes pasted at P675

People: jayvdb, legoktm, yurik, Pierre Selim, Jean-Fred, Lyon Epitech students (Antoine and Dimitri), valhallasw, multichill, halfak, ladsgroup, とある白い猫, hashar, yuri, Yuvi
Title: Meeting: Developers of python libraries for MediaWiki
Task: https://phabricator.wikimedia.org/T97950

== Round of introductions ==
We all hate compat


Existing Python libraries:

API libraries
 See https://www.mediawiki.org/wiki/API:Client_code
 See https://www.mediawiki.org/wiki/API:Client_code/Evaluations
pywikibot-core -- BOT OPERATING SYSTEM
pywikibot-compat
https://github.com/goldsmith/Wikipedia -- API library and BOT OPERATING SYSTEM
mwapi https://github.com/yuvipanda/python-mwapi -- Basic API library
mwclient https://github.com/mwclient/mwclient/ -- Basic API library
wikitools https://github.com/alexz-enwp/wikitools -- API library that implements API Structure
https://github.com/halfak/Mediawiki-Utilities -- General collection of utilities for extracting MediaWiki data (research focus). API, DB, XML & utilities for extracting sessions, reverts and title parsing
https://github.com/wikimedia/analytics-zero-sms/blob/master/scripts/api.py
http://git.wikimedia.org/blob/integration%2Fjenkins.git/master/bin%2Fmw-api-siteinfo.py (thin python layer for api.php query parameters)

TL;DR: there are too many of them!

OAuth
https://github.com/wikimedia/MediaWiki-OAuth -- General MediaWiki OAuth utility
https://github.com/valhallasw/flask-mwoauth -- Flask MediaWiki OAuth routes

Machine learning / Artificial Intelligence
https://github.com/halfak/Revision-Scoring -- Machine learning and feature extraction system that focuses on the revision
https://github.com/halfak/Objective-Revision-Evaluation-Service (ORES) -- RESTful web service
https://github.com/halfak/Wiki-Labels -- Handcoding utility (RESTful server & gadget)
https://github.com/halfak/Wiki-Class -- A machine learning system for predicting WP 1.0 assessment classes

Live systems support
https://github.com/halfak/MediaWiki-events -- A generalized event datasource (reads DB, API, RCStream and [IRC])

Data extraction utilities
https://github.com/halfak/Extract-scholarly-article-citations-from-Wikipedia (mwcites)
https://github.com/halfak/mwrefs -- Extracts <ref> tags from XML dumps (both current and historical)
https://github.com/halfak/mwmetrics -- Extracts basic behavioral stats for new editors using the MediaWiki DB
https://github.com/halfak/MediaWiki-Streaming -- A hadoop stream processing framework for extracting information from XML dumps
https://github.com/halfak/MediaWiki-edit-quality-scoring -- A content persistence extraction system that uses the MediaWiki API
https://github.com/earwig/mwparserfromhell -- Parses WikiText into abstract syntax trees

== Discussion ==
* halfak: figure out what libraries are out there and try to work together
* yurik: pywikibot's Page object is a kitchen sink, Site should be used to interact with the API
* yurik: library should use requests, and do very simple things, separation of storage objects and "Site" object
* legoktm: pywikibot is heavy, it needs user-config.py

We could publish pywikibot to PyPI if we wanted to. That would need a stable release that we could maintain.
Yurik: asking about splitting pywikibot into a thin/lower level and the heavy one
Jay: yeah we are actually doing it
valhallasw: you can do this already!
Jay: that is the Site object, named methods correspond to the API requests
Yurik: makes it hard to catch up with upstream changes
valhallasw: there is a network layer, a simple API layer, and then a layer on top of that (Site)
jay: ...explore api programmatically via action=paraminfo
hashar: lacking docs on how to use lower level parts of pywikibot, made it easier for me to write what I needed instead of reusing existing code.
halfak: smallest library we can possibly use
Jean-Fred: pywikibot was a hassle to install, e.g. in venvs (using tox) for testing; install should be made easier (pip install pywikibot ??)
jay: problem with the list of wiki families growing.
- one library that loads interwiki.cdb, builds interwiki matrix
- i18n is in its own package since it keeps being updated over and over (ex: namespaces, edit comments)
halfak: how so? That is a utility, not a library!! The pain in the ass is all the things meant for bots
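
jay's point about exploring the API through action=paraminfo could look roughly like the sketch below. It only builds the request parameters and reads an already parsed response, so no network access is involved. The helper names (paraminfo_request, parameter_names) are made up for illustration, and the response shape (a "paraminfo" object holding a "modules" list whose entries carry "name" and "parameters") is an assumption about the JSON output, not something recorded in these notes.

```python
def paraminfo_request(*modules):
    """Build query-string parameters asking the API to describe some modules.

    Module paths such as "query+allpages" are joined with "|", the usual
    multi-value separator of the action API.
    """
    return {
        "action": "paraminfo",
        "format": "json",
        "modules": "|".join(modules),
    }


def parameter_names(doc):
    """Map each described module name to the list of its parameter names.

    `doc` is the parsed JSON response; its layout is assumed (see lead-in).
    """
    names = {}
    for module in doc.get("paraminfo", {}).get("modules", []):
        names[module["name"]] = [p["name"] for p in module.get("parameters", [])]
    return names
```

A client built this way would not need hardcoded lists of API parameters; it could ask the wiki what a module accepts before building a request.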


ACTION: define the layers of pywikibot and where we draw the limit.

Segments of pywikibot
- interwiki.cdb (families)
- i18n data
- pywikibot (Recursive?)
- scripts
- API

valhallasw: you can use pip install right now, also by pointing it to the git repo
jay: some dependencies are just optional

legoktm: agreement to split the framework into well isolated libraries.
valhallasw: someone who wants to run a script would want to download a tarball that has all the dependencies included.
ToAruShiroiNeku: everything through pip, tarballs for people who want them - not necessarily with any programming knowledge (just to run scripts perhaps)
hashar: can use wheels maybe? :-D
lang=qqq!!!
yurik: one reference implementation per language to set the standard for low level libraries, well known, commonly


KEY IDEAS/ACTIONS

pywikibot

* API Auto Documentation for the low layers of pywikibot
** Jay, Marteen, Antoine, Amir, legoktm
** AGREED to use Sphinx and .rst
** AGREED Publish it to doc.wikimedia.org
*** English up to date docs first, then look into how to maybe (??) localize them

* Second documentation work is to write specs/RFC/architecture/design documentation
* AGREED For now we agree that we won't do i18n on generated documentation which is geared toward devs
* AGREED Scripts: we keep the (i18ned) documentation on MediaWiki.org
* ACTION: define user groups and their doc requirements

Expected outcome: easier for developers to reuse the code, potentially as a library.


low level API library "mwapi"
* mwapi is a bad name - has to be python specific
** AGREED to bikeshed about renaming on mailing list later on. One potential proposal: pymwapi
* AGREED minimum dependencies e.g. "requests"
* AGREED no hardcoding of any API names (exceptions: login)
* AGREED Supports Login and Session (non-stored but accessible)
* AGREED no automatic badtoken handling (middleware should handle that). I.e. a badtoken should raise an exception and the library would not attempt to relogin automatically
* AGREED Supports all api value types, e.g. timestamps, list -> "str|str|str"
* AGREED Does not handle errors. Only reports them wrapped in an APIError
* AGREED Configurable retries for HTTP errors and exponential back-off? Left up to the user who constructs the session object.
* AGREED Query / continuation (uses new style continuation, does not handle errors during continuation)
* maxlag <-- Proposal?
* Discussion
** Can we repurpose the name "mwapi"? Who owns that utility? YUVI! Yay! https://gerrit.wikimedia.org/r/#/admin/projects/pywikiapi -- use this one, it's empty, I created it (if you want of course)
** Can create another repo as needed.
** Where do we draw the limit between layers?
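
A few of the AGREED points above (value types such as list -> "str|str|str", errors only reported as an APIError, new-style continuation) can be sketched in a handful of lines. This is an illustration under assumptions, not the library that was agreed on: all function names are hypothetical, naive datetimes are assumed to already be in UTC, and `send` stands for any caller-supplied callable that takes a dict of parameters and returns the parsed JSON response (for instance a small wrapper around a requests session).

```python
from datetime import datetime, timezone


class APIError(Exception):
    """Raised when a response carries an "error" object; no recovery is attempted."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def encode_value(value):
    """Turn a Python value into the string form used in API parameters."""
    if isinstance(value, datetime):
        if value.tzinfo is not None:
            value = value.astimezone(timezone.utc)
        # Assumption: naive datetimes are already UTC.
        return value.strftime("%Y-%m-%dT%H:%M:%SZ")
    if isinstance(value, (list, tuple)):
        return "|".join(encode_value(item) for item in value)
    return str(value)


def encode_params(params):
    """Encode every value of a parameter dict."""
    return {key: encode_value(value) for key, value in params.items()}


def check_response(doc):
    """Return the response unchanged, or raise APIError if it reports an error."""
    if "error" in doc:
        error = doc["error"]
        raise APIError(error.get("code"), error.get("info"))
    return doc


def query_continue(send, params):
    """Yield each response of a query, following new-style continuation.

    Errors during continuation are not handled: an APIError simply propagates.
    """
    base = dict(params, action="query", format="json")
    last = {"continue": ""}  # an empty "continue" opts in to new-style continuation
    while True:
        request = dict(base)
        request.update(last)
        doc = check_response(send(encode_params(request)))
        yield doc
        if "continue" not in doc:
            break
        last = doc["continue"]
```

Keeping `send` outside the sketch matches the point that retries and back-off are left to whoever constructs the session object.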

Amir: https://github.com/wikimedia/pywikibot-core/tree/master/pywikibot/comms

Middle level?
* Handles errors and stuff? I guess?
* OAUTH ???
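
One thing a middle layer could do, given that the low level library raises on badtoken instead of recovering, is refresh the token and retry. The sketch below is a guess at what that might look like, not a design agreed in the meeting: `APIError` is a stand-in for the low-level error type, and `with_token_retry`, `do_request` and `fetch_token` are hypothetical names for caller-supplied pieces.

```python
class APIError(Exception):
    """Stand-in for the low-level library's error type (hypothetical)."""

    def __init__(self, code, info=""):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def with_token_retry(do_request, fetch_token, max_attempts=2):
    """Call do_request(token), fetching a fresh token after a badtoken error.

    Any other APIError, or a badtoken on the last attempt, is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    token = fetch_token()
    for attempt in range(1, max_attempts + 1):
        try:
            return do_request(token)
        except APIError as error:
            if error.code != "badtoken" or attempt == max_attempts:
                raise
            token = fetch_token()  # token was stale: get a new one and retry
```

The low layer stays predictable (it only raises), while policy such as how many times to retry lives in the layer above it.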


other libraries?

Revscoring integration with pywikibot

We are all crazy.
+1
Not at all, my good sir!
Oh my god, French =P

Qgil claimed this task.