Implement OAuth Support for Pywikibot
This project focuses on implementing OAuth support for Pywikibot, a collection of tools that automates work on MediaWiki sites. OAuth will offer a more reasonable, secure and robust way of authenticating users who use Pywikibot to maintain their MediaWiki sites.
Name and contact information
- Name: Jiarong Wei
- Email: firstname.lastname@example.org
- IRC Nickname: VcamX
- Blog: http://vcamx.net
- GitHub: https://github.com/VcamX
- Resume: http://vcamx.net/resume/
- Location: Hangzhou, Zhejiang
- Typical working hours: (UTC+8:00) 9:30 to 17:00. Waking hours are 8:30 to 23:00.
OAuth is a popular open standard that allows third-party applications to access protected resources on a website on behalf of users. Without it, an application that needs a user's data on another website usually has to be given the user's username and password, which risks leaking them. OAuth solves this with tokens: applications receive tokens instead of the user's name and password. Different tokens carry different privileges, tokens can be granted and revoked, and applications can only access the resources that users allow them to access.
As a widely used toolkit for MediaWiki, Pywikibot should offer users a reasonable, secure and robust authentication method. MediaWiki supports OAuth 1.0a through the OAuth extension. Because Pywikibot is an automated tool that helps people manage wiki sites, it may run with high-level accounts such as sysop accounts. Careless users could also leak their passwords through logs, and leaking such credentials is very serious. It is therefore reasonable to give applications that serve MediaWiki projects, such as Pywikibot, tokens (limited privileges) instead of passwords (unlimited privileges). Supporting OAuth in Pywikibot alongside the default password authentication is thus a necessary and urgent task.
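Under OAuth 1.0a, the application never sends its secrets with a request. Instead, each request carries a signature computed from the request and the secrets. Below is a minimal standard-library sketch of the HMAC-SHA1 signature defined in RFC 5849 (the OAuth 1.0 specification), which is roughly what oauthlib computes internally. It is illustrative only and is neither Pywikibot nor oauthlib code. The function names are hypothetical, and details such as stripping default ports from the URL are omitted.

```python
import base64
import hashlib
import hmac
from urllib.parse import parse_qsl, quote, urlsplit


def percent_encode(value: str) -> str:
    # RFC 5849 section 3.6: escape everything except unreserved characters.
    # quote() always leaves A-Z, a-z, 0-9 and "-._~" unescaped; safe=""
    # makes it escape "/" as well.
    return quote(value, safe="")


def signature_base_string(method: str, url: str, oauth_params: dict) -> str:
    """Build the RFC 5849 signature base string (hypothetical helper).

    Query parameters are read from `url`; `oauth_params` holds the
    oauth_* protocol parameters (plus any form-encoded body parameters).
    Assumes the URL carries no explicit default port.
    """
    parts = urlsplit(url)
    base_uri = f"{parts.scheme.lower()}://{parts.netloc.lower()}{parts.path or '/'}"
    pairs = parse_qsl(parts.query, keep_blank_values=True) + list(oauth_params.items())
    # Encode first, then sort by name and value (section 3.4.1.3.2).
    encoded = sorted((percent_encode(k), percent_encode(v)) for k, v in pairs)
    normalized = "&".join(f"{k}={v}" for k, v in encoded)
    return "&".join(percent_encode(s) for s in (method.upper(), base_uri, normalized))


def hmac_sha1_signature(base_string: str, consumer_secret: str, token_secret: str) -> str:
    # The HMAC key is the encoded consumer secret and token secret joined by "&".
    key = f"{percent_encode(consumer_secret)}&{percent_encode(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode("ascii")
```

Only the resulting signature is transmitted, in the oauth_signature field. The consumer secret and token secret never appear in the request itself.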
According to the description of T74065 on Phabricator, the requirements are clear:
1. OAuth support
The implementation of OAuth 1.0a support.
2. Unit tests and deployment
Two mandatory unit tests:
- A unit test should perform a login and logout using OAuth with assertions that verify APISite._userinfo is correct.
- A unit test should login, edit a userpage, and confirm the edit was performed using the OAuth-authenticated account.
These two are mandatory, but more tests will probably be needed, since further testing requirements may arise during development.
The unit tests should be configured to run on Travis CI when the secret key is available in the Travis CI configuration, and skipped when it isn't.
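Travis CI only exposes encrypted environment variables to builds it trusts. Pull requests from forks, for example, do not get them. The OAuth tests therefore have to detect missing secrets and skip instead of failing. Below is a minimal sketch using the standard unittest module. The environment variable names and the test class are hypothetical. The real tests would build on Pywikibot's own test framework rather than a plain unittest.TestCase.

```python
import os
import unittest

# Hypothetical variable names; the real ones would be whatever is stored
# as encrypted variables in the Travis CI configuration.
OAUTH_ENV_VARS = (
    "PYWIKIBOT_OAUTH_CONSUMER_KEY",
    "PYWIKIBOT_OAUTH_CONSUMER_SECRET",
    "PYWIKIBOT_OAUTH_ACCESS_KEY",
    "PYWIKIBOT_OAUTH_ACCESS_SECRET",
)


def oauth_credentials():
    """Return the four OAuth credentials, or None if any of them is missing."""
    values = tuple(os.environ.get(name) for name in OAUTH_ENV_VARS)
    return values if all(values) else None


# The condition is evaluated once, when the module is imported.
@unittest.skipUnless(oauth_credentials(), "OAuth secrets are not available")
class OAuthLoginTestCase(unittest.TestCase):
    """Hypothetical home for the two mandatory OAuth tests."""

    def test_credentials_present(self):
        # Placeholder: the real test would log in via OAuth and check
        # APISite._userinfo against the expected account.
        self.assertEqual(len(oauth_credentials()), 4)
```

Evaluating the condition once at import time matches how Travis CI behaves: the secrets are either present for the whole build or absent from it.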
OAuth is a new authentication method for Pywikibot, so a How-to document about its usage is necessary. Documentation for developers may also be needed.
- Before May 25, 2015
- Plan and confirm implementation details (see below).
- Implement OAuth support.
- May 25, 2015 (Students begin coding for their Google Summer of Code projects) to June 25, 2015
- Implement OAuth support.
- June 26, 2015 (Mentors and students can begin submitting mid-term evaluations) to Aug 16, 2015
- Implement the two mandatory unit tests (and more if needed) and configure them to run on Travis CI.
- Write How-to and other related documents.
- Fix bugs.
- Aug 17, 2015 (Suggested 'pencils down' date) to August 21, 2015 (Firm 'pencils down' date)
- Finish the final report and present the results to the community and Google.
- August 28, 2015 (Students can begin submitting required code samples to Google)
- Submit required code samples to Google.
I'd like to separate it into four parts:
- Get familiar with Pywikibot code and MediaWiki API
- Get familiar with OAuth-related knowledge, libraries (e.g. MediaWiki-OAuth) and OAuth extension of MediaWiki software
- Get familiar with Pywikibot test code, Travis CI and its configuration
- Build development environment
Frankly speaking, I only started working with Pywikibot and MediaWiki in February this year. I'm getting familiar with Pywikibot's code and its inner workings. With the help of jayvdb and Nullzero, I've fixed some bugs (T74974 and T57140) in Pywikibot's login.py and got the fixes merged. I built a local MediaWiki site with the Bitnami stack for testing and bug replication. I also read the source code of MediaWiki-OAuth and learned how to use Travis CI.
I think the preparation for me is mostly done.
The mentors discussed the implementation details on Phabricator. Thanks to Halfak's work, there is MediaWiki-OAuth, a generic OAuth handshake helper in Python dedicated to MediaWiki OAuth, which helps me a lot. As Halfak said, what's left is how to sign new requests with the access token obtained through MediaWiki-OAuth. The discussion concluded with two possible schemas for signing new requests. The first is to stick with httplib2 and implement our own signing strategy with oauthlib. The second is to switch from httplib2 to requests as the HTTP communication handler and use requests-oauthlib to sign requests. I've investigated both, and I think the two schemas focus on different points.
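Whichever schema is chosen, signing a request with the access token ends with an `Authorization: OAuth ...` header (RFC 5849, section 3.5.1). The standard-library sketch below shows that last step. The signer is passed in as a callable, for instance an HMAC-SHA1 implementation or, under the first schema, a thin wrapper around oauthlib. All function names are hypothetical and are not Pywikibot API. Form-encoded POST bodies would also need to be passed to the signer, which this sketch omits.

```python
import secrets
import time
from typing import Callable, Dict, Optional
from urllib.parse import quote
from urllib.request import Request

# A signer takes (method, url, oauth_params) and returns the signature.
Signer = Callable[[str, str, Dict[str, str]], str]


def oauth_authorization_header(method: str, url: str, consumer_key: str,
                               access_key: str, sign: Signer,
                               nonce: Optional[str] = None,
                               timestamp: Optional[int] = None) -> str:
    """Build an OAuth 1.0a Authorization header value (hypothetical helper)."""
    params = {
        "oauth_consumer_key": consumer_key,
        "oauth_token": access_key,
        "oauth_signature_method": "HMAC-SHA1",
        "oauth_timestamp": str(timestamp if timestamp is not None else int(time.time())),
        "oauth_nonce": nonce or secrets.token_hex(16),
        "oauth_version": "1.0",
    }
    # The signature covers every protocol parameter except itself.
    params["oauth_signature"] = sign(method, url, dict(params))
    # Each value is percent-encoded and quoted; fields are comma-separated.
    fields = ", ".join(f'{quote(k, safe="")}="{quote(v, safe="")}"'
                       for k, v in params.items())
    return f"OAuth {fields}"


def signed_request(method: str, url: str, consumer_key: str,
                   access_key: str, sign: Signer) -> Request:
    """Attach the header to a urllib Request without sending it."""
    header = oauth_authorization_header(method, url, consumer_key, access_key, sign)
    return Request(url, method=method, headers={"Authorization": header})
```

Keeping the signer separate from header assembly means the same code path could serve either schema. Only the callable would change.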
The first schema implements our own OAuth signing strategy. It is closer to the goal of this project, since we would be implementing the OAuth signing ourselves. However, as Halfak said, this will be hard to do and will need experienced developers to review the implementation, because a bug in authentication code is critical.
The second schema is, I think, more of a migration. Since requests and requests-oauthlib support OAuth 1 and 2 natively and are widely used, the OAuth authentication would be more robust. However, in my opinion it's not sensible to use requests/requests-oauthlib and httplib2 simultaneously, so a full migration to requests/requests-oauthlib would be necessary for consistency. Pywikibot doesn't use httplib2 as-is; it adds several features on top of it, e.g. cookie support, multi-threading and a connection pool. So the main work is confirming that requests has equivalent functions and implementing wrappers for requests where needed.
Both schemas have their own pros and cons. httplib2 is older and more compact, while requests is more popular and powerful. It's hard to judge which one is better. However, I think fully migrating to requests would be more painful, since httplib2 is integrated so tightly into Pywikibot and a lot of adaptation work has already been done. Migrating may also be less worthwhile, considering we only need OAuth. So I prefer the first schema. That's just my opinion on this dilemma. (I don't have a strong bias: both schemas make sense to me, and discussion with the two mentors is necessary.)
Some other implementation details also need to be considered: storing OAuth keys and tokens, adapting the Site object, exception handling and so on.
UPDATE: My investigation of the differences between the customized httplib2 inside Pywikibot and requests:
- Error handler
Pywikibot: error_handling_callback in http.py
Requests: native support
- Custom callbacks
- Connection pool
Pywikibot: 1 pool (1 thread), 5 connections per identifier (scheme and authority)
Requests: 10 pools, 10 connections per pool, using urllib3.PoolManager
- Keeping cookies thread-safe
Pywikibot: LockableCookieJar inside pywikibot/comms/threadedhttp.py
Requests: documentation says it’s thread-safe (A thread on StackOverflow also talked about it)
According to a reply on a Python community mailing list, cookielib is not thread-safe. After roughly reading the source code, I think it's partially thread-safe: CookieJar does have a lock and uses it in some methods, but not in others, e.g. clear().
- Redirect handling
Pywikibot: disables the httplib2 redirect mechanism and implements its own. (The mechanism is somewhat convoluted to me. Maybe it exists to support old httplib2 versions, e.g. httplib2 0.6.)
Requests: native support (records redirect history; HEAD requests do not follow redirects by default)
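The clear() gap noted above can be closed without replacing the whole jar. The sketch below subclasses the standard http.cookiejar.CookieJar (cookielib in Python 2) so that clear() takes the same lock that the other methods already use. It relies on CPython's private `_cookies_lock` attribute, which is an implementation detail that may change. It is also not Pywikibot's LockableCookieJar. The class and helper names are hypothetical.

```python
import http.cookiejar
import threading


class ClearLockingCookieJar(http.cookiejar.CookieJar):
    """CookieJar whose clear() holds the jar's internal lock.

    Assumption: CPython's CookieJar keeps a private RLock in
    ``_cookies_lock`` and acquires it in set_cookie(), extract_cookies()
    and clear_expired_cookies(), but not in clear().
    """

    def clear(self, domain=None, path=None, name=None):
        # The lock is re-entrant, so this also works when called from
        # clear_expired_cookies(), which already holds it.
        with self._cookies_lock:
            super().clear(domain, path, name)


def make_cookie(name, value, domain="example.org", expires=None):
    # Build a minimal cookie for the demo below.
    return http.cookiejar.Cookie(
        version=0, name=name, value=value, port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True, secure=False, expires=expires,
        discard=expires is None, comment=None, comment_url=None, rest={},
    )


def stress(jar, writers=4, rounds=200):
    """Let several threads add cookies while another thread clears the jar."""
    def write(i):
        for j in range(rounds):
            jar.set_cookie(make_cookie(f"c{i}_{j}", "v"))

    def wipe():
        for _ in range(rounds):
            jar.clear()

    threads = [threading.Thread(target=write, args=(i,)) for i in range(writers)]
    threads.append(threading.Thread(target=wipe))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Under the second schema, whether requests' own cookie handling needs a similar guard would still have to be confirmed.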
The OAuth implementation requires the following changes and updates:
- pywikibot/comms: This sub-package provides the basic HTTP request/response handlers, so MediaWiki-OAuth needs to be integrated here to handle OAuth handshakes between Pywikibot and the web server. Signing requests with the access token when using OAuth authentication also goes here. The first schema needs to extend pywikibot/comms/threadedhttp.py with our own OAuth request signing. The second schema is more complicated: most parts of pywikibot/comms/http.py and pywikibot/comms/threadedhttp.py need to be changed (there's a commit on Gerrit for this, mentioned in the discussion on Phabricator).
- pywikibot/config2.py: This works as a template for user-config.py, which users provide. Since OAuth differs from password authentication, new configuration items need to be added here.
- pywikibot/login.py: This implements the basic login mechanism, so this module needs to be updated.
- pywikibot/data/api.py: This contains a wrapper for LoginManager in pywikibot/login.py, so I assume it also needs to be updated.
- pywikibot/site.py: This contains the abstraction of wiki sites. If users choose OAuth, the access token might be stored in the Site object, along with flags indicating that OAuth is in use.
- pywikibot/exceptions.py: This contains the exceptions that might be raised. Exceptions that tell users what went wrong during OAuth authentication need to be added.
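To make the config2.py, login.py and exceptions.py items above more concrete, the sketch below shows one possible way to look up per-site OAuth credentials from user-config.py and to fall back to password login when none are configured. The `authenticate` dictionary, the function name and the exception class are all hypothetical. The real configuration names would be settled with the mentors.

```python
from fnmatch import fnmatchcase
from typing import Dict, Optional, Tuple


class OAuthConfigError(Exception):
    """Hypothetical exception for malformed OAuth settings."""


# Hypothetical user-config.py entry: hostname pattern -> 4-tuple of
# (consumer_key, consumer_secret, access_key, access_secret).
authenticate: Dict[str, Tuple[str, ...]] = {
    "*.wikipedia.org": ("consumer-key", "consumer-secret",
                        "access-key", "access-secret"),
}


def oauth_tokens_for(hostname: str,
                     config: Dict[str, Tuple[str, ...]]
                     ) -> Optional[Tuple[str, ...]]:
    """Return the OAuth 4-tuple for `hostname`, or None for password login."""
    host = hostname.lower()
    # An exact entry wins over wildcard patterns.
    entry = config.get(host)
    if entry is None:
        entry = next((value for pattern, value in config.items()
                      if fnmatchcase(host, pattern.lower())), None)
    if entry is None:
        return None  # no OAuth settings: keep the default password login
    if len(entry) != 4:
        raise OAuthConfigError(
            f"expected 4 OAuth values for {hostname}, got {len(entry)}")
    return tuple(entry)
```

With the decision in one helper, login.py would only have to branch on whether a tuple is returned. The Site object could then keep the tokens and an OAuth flag.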
For OAuth support, we should test that Pywikibot obtains the right user identity through OAuth authentication and can use that identity to perform the proper actions.
I suggest adding a separate test module, e.g. pywikibot/test/oauth_tests.py, under pywikibot/test, so that the two mandatory tests and other related tests can go there. Supporting these tests may also require the following changes:
- pywikibot/test/aspects.py: This module provides building blocks for tests. RequireUserMixin provides user login checking, and MetaTestCaseClass provides metadata for configuration. The corresponding code may need to be added to these classes. We should also provide something like OAuthSiteTestCase alongside DefaultSiteTestCase to distinguish the two authentication methods, and use it in our tests.
- pywikibot/test/http_tests.py: This tests pywikibot/comms. All its tests should pass, and additional tests may be needed here if we choose to migrate from httplib2 to the requests library.
This part may include comments in code, documentation in Pywikibot's manual and documentation for developers.
The comments in code should be meaningful and concise.
The How-to documentation on using OAuth authentication could be added to Manual:Pywikibot/Basic use.
The documentation for developers should describe the design ideas and the basic structure, to make bug fixing and improvement easier.
The implementation details above are based on my current understanding of the code. If there are bugs or mistakes, I'd appreciate it if you could point them out and help me fix them, so I can improve the details :D
Communication of progress
- IRC channels: This is always my first option for help whenever I am stuck. I'll be available on the IRC channels #pywikibot and #wikimedia-dev under the nickname VcamX.
- Mailing lists: I subscribed to mailing lists such as wikitech-l and Pywikipedia-l. If I can't get an instant response on IRC, the mailing lists are my second choice.
- Weekly report: A weekly report helps me sum up and review what I have done and what I need to change. It's a good way of communicating progress.
Publishing source code
- Gerrit: Wikimedia Code Review
How and where you plan to ask for help?
- Try to solve it by myself: read documentation, search online and so on.
- Ask the mentors and the community for help through IRC and the mailing lists.
Education completed or in progress
B.S. in Computer Science, Zhejiang University, Hangzhou, China
How did you hear about this program?
I searched for organizations available on GSoC 2014 and found MediaWiki. On its Phabricator, I found this project, which seemed a good fit for me. The project was originally for GCI 2014 and I was not sure whether it was available for GSoC 2015, so I got confirmation from jayvdb.
Will you have any other time commitments, such as school work, another job, planned vacation, etc., during the duration of the program?
Until the end of June, I must spend some time on my graduation project and graduation affairs, so I decided to start coding earlier to compensate.
We advise all candidates eligible to Google Summer of Code and FOSS Outreach Program for Women to apply for both programs. Are you planning to apply to both programs and, if so, with what organization(s)?
Xapian is an open source search engine library written in C++. In GSoC 2014, my work focused mainly on refactoring the LETOR module.
I've been using Python for many years. I like its conciseness and power.
Besides those, I write some Python code for fun. I also have project experience in C/C++ and Java.
I like writing Python code, and Pywikibot is what I need. I am not applying to other projects: for me, concentrating on a single project is better than spreading my energy across many, and focusing makes me more efficient.
Any other info
The Wikimedia Foundation is one of the greatest nonprofit organizations in the world. Like everyone on earth, I have benefited so much from Wikipedia and its sibling projects. I'm very willing to get involved in the Pywikibot project and MediaWiki to learn and contribute.