
[GSOC 2016 proposal] Article author attribution: on a Generalised Sequence of Contributors (GSoC)
Closed, Declined · Public


See T120738 for details.

Name: Sigma WP
IRC: freenode as SigmaWP
Web Page: N/A
Resume: Censored edition available on request
Location: the California Republic
Typical working hours: Afternoon on weekdays

Title: Article author attribution: on a Generalised Sequence of Contributors (GSoC)

Wikipedia is an online encyclopedia that ranks among the most visited websites in the world. The text of its articles is licensed under CC BY-SA and the GFDL. Both licenses require that the material be attributed to all of its authors whenever content is copied, whether to another wiki or outside wikis altogether, e.g. into PDFs. On Wikimedia wikis there is currently no way to get a list of all authors other than exporting the page as a PDF and copying the list from there, and even then the list cannot be sorted or filtered, and no other details about the contributors can be viewed. This project would add such a feature to MediaWiki.

Possible mentors: Addshore, Niharika

Because the problem concerns the basic act of transferring text out of a wiki, the feature belongs either in core or in an extension. An extension for this already exists (Extension:Contributors), and core has a built-in feature, action=credits, that is disabled on WMF wikis. However, action=credits as currently implemented does not scale, chiefly because nothing is cached, which is why it is disabled on WMF wikis.

We propose two new database tables. The first, “contributors”, contains id, page_id, user_id, user_text, is_author (boolean) and revision_count (int). The second, “contributors_props”, contains id, prop_name and prop_value (cf. the existing page_props table).
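
As a rough sketch only, the two proposed tables might be declared as follows. SQLite is used so the example runs standalone; a real MediaWiki extension would ship schema files for the wiki's own database. The UNIQUE constraint (one row per contributor per page), the convention that user_id 0 marks an anonymous editor, and the reading of contributors_props.id as a reference to a contributors row are all assumptions, not part of the proposal.

```python
import sqlite3

# Hypothetical DDL for the two proposed tables. Column names follow the
# proposal; constraints and comments are assumptions for illustration.
SCHEMA = """
CREATE TABLE contributors (
    id             INTEGER PRIMARY KEY,
    page_id        INTEGER NOT NULL,
    user_id        INTEGER NOT NULL,            -- assumed: 0 for anonymous editors
    user_text      TEXT    NOT NULL,            -- user name or IP address
    is_author      INTEGER NOT NULL DEFAULT 0,  -- boolean flag
    revision_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (page_id, user_text)                 -- assumed: one row per contributor per page
);
CREATE TABLE contributors_props (
    id         INTEGER NOT NULL,                -- assumed to reference contributors.id
    prop_name  TEXT    NOT NULL,
    prop_value TEXT,
    PRIMARY KEY (id, prop_name)
);
"""

def create_schema(conn: sqlite3.Connection) -> None:
    """Create both tables on the given connection."""
    conn.executescript(SCHEMA)

conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)  # ['contributors', 'contributors_props']
```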

With this in place, both the current version of Extension:Contributors and action=credits could read from the contributors table. The table can be updated either synchronously or asynchronously. Reading the list should then never require recalculating a page's contributors from scratch or running expensive database queries. Additional columns may be needed, but that can be discussed during GSoC itself.
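
The sketch below illustrates the idea that each saved edit touches a single row, so the read path never has to walk the page history. record_edit and list_contributors are hypothetical names; in MediaWiki this logic would presumably run from an edit hook rather than be called directly. It uses a portable UPDATE-then-INSERT (on MySQL, INSERT ... ON DUPLICATE KEY UPDATE would do the same in one statement), and it does not handle concurrent edits by the same user.

```python
import sqlite3

# Minimal copy of the hypothetical contributors table, assuming one row
# per (page, contributor).
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE contributors (
    id INTEGER PRIMARY KEY,
    page_id INTEGER NOT NULL,
    user_id INTEGER NOT NULL,
    user_text TEXT NOT NULL,
    is_author INTEGER NOT NULL DEFAULT 0,
    revision_count INTEGER NOT NULL DEFAULT 0,
    UNIQUE (page_id, user_text)
)""")

def record_edit(conn, page_id, user_id, user_text):
    """Hypothetical handler called once per saved revision.

    Bumps the editor's revision_count for the page, inserting a row on
    their first edit. The is_author flag is left for a separate rule.
    """
    cur = conn.execute(
        "UPDATE contributors SET revision_count = revision_count + 1 "
        "WHERE page_id = ? AND user_text = ?", (page_id, user_text))
    if cur.rowcount == 0:
        conn.execute(
            "INSERT INTO contributors (page_id, user_id, user_text, revision_count) "
            "VALUES (?, ?, ?, 1)", (page_id, user_id, user_text))
    conn.commit()

def list_contributors(conn, page_id):
    """Read path: one query over stored counts, no recalculation."""
    return conn.execute(
        "SELECT user_text, revision_count FROM contributors "
        "WHERE page_id = ? ORDER BY revision_count DESC, user_text",
        (page_id,)).fetchall()

for user_id, name in [(1, "Alice"), (2, "Bob"), (1, "Alice"), (0, "192.0.2.7")]:
    record_edit(conn, page_id=42, user_id=user_id, user_text=name)
print(list_contributors(conn, 42))
# [('Alice', 2), ('192.0.2.7', 1), ('Bob', 1)]
```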

The second table is the more extensible one and would hold other non-essential details, e.g. whitespace-only changes, characters copied and pasted, etc.
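
A minimal sketch of how contributors_props could hold arbitrary per-contributor details in the key-value style of page_props. The property names (whitespace_only_edits, chars_pasted) and the helpers set_prop and get_props are illustrative only, and values are stored as text because the proposal does not specify types.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical key-value table modelled on page_props; "id" is assumed to
# point at a contributors row.
conn.execute("""
CREATE TABLE contributors_props (
    id INTEGER NOT NULL,
    prop_name TEXT NOT NULL,
    prop_value TEXT,
    PRIMARY KEY (id, prop_name)
)""")

def set_prop(conn, contributor_id, name, value):
    """Store or overwrite one property for a contributor row."""
    conn.execute(
        "INSERT OR REPLACE INTO contributors_props (id, prop_name, prop_value) "
        "VALUES (?, ?, ?)", (contributor_id, name, str(value)))

def get_props(conn, contributor_id):
    """Return all properties for a contributor as a dict of strings."""
    rows = conn.execute(
        "SELECT prop_name, prop_value FROM contributors_props "
        "WHERE id = ? ORDER BY prop_name", (contributor_id,))
    return dict(rows.fetchall())

# Property names below are illustrative only.
set_prop(conn, 7, "whitespace_only_edits", 3)
set_prop(conn, 7, "chars_pasted", 1200)
set_prop(conn, 7, "whitespace_only_edits", 4)  # overwrites, no new row
print(get_props(conn, 7))
# {'chars_pasted': '1200', 'whitespace_only_edits': '4'}
```

New kinds of detail can then be recorded without a schema change, at the cost of losing typed columns.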

In essence, this task amounts to recording data in SQL tables and reading it back out.

I plan to maintain a status report in my userspace, where I will list the parts of the project that have been completed, tasks that are in progress, and things I would work on next. I also plan to keep a log of what I have done, in order to make it easier for mentors to track my progress.

The source code will be maintained in a public repository on GitHub. Parts of the project will be submitted as separate patchsets as I finish them, in order to allow for wider review and feedback as I work on the project.

I will communicate with my mentors mostly through IRC. Beyond reading documentation and searching online, I will ask for help in MediaWiki-General on IRC as needed. I will respond to feedback on Gerrit, on my talk page and on IRC.

Format: week starting on this date: tasks
23 May: settle on the best database schema for the data. The schema described above may need minor modifications depending on feedback.
30 May: create a mockup of what the UI will probably look like and explain what everything in it does. This should correspond to operations that can be done to extract data from the tables.
7 June: continue the above and finish the mockup if necessary, then move on to the backend. Each edit will update the table outlined above, so retrieving the list of contributors never requires recalculating anything; this is what lets the extension scale to heavily trafficked pages. In practice, this means adding hooks to action=edit that update the tables with information about each edit as it is made. By the start of the following week, the extension's internals should be essentially finished.
13 June: integrate the new backend with the existing extension, modifying it to read data from the new tables, and connect the updated extension to the UI so that it displays real data from those tables. Write basic backend unit tests. Fix bugs and make changes as needed based on code review. (Patchset)
20 June (midterm evaluation): continue the above. Set up a working prototype (emphatically not called the “MVP”) running on a Labs instance or a private wiki.
27 June: write more detailed unit tests for the backend. Address issues based on code review. (Patchset)
4 July: work on other features, e.g. sorting by a given field or displaying miscellaneous information about users (to be decided with mentors when the task is claimed). These should only require additional handling in the action=edit hooks described above, and the UI should only need variations of the same SQL queries with minor processing.
11 July: continue the above. Write unit tests as needed to cover the additions.
18 July: continue the above. Fix bugs and make improvements based on code review, and update docs if necessary. Look into integrating with the API. (Patchset)
25 July: integrate with the API. All functionality available via the regular UI will also be accessible via the API.
1 August: write and update docs as necessary.
8 August: tidy up loose ends and make improvements based on code review. (Patchset)
15 August: wrap everything up
23 August: wrap everything else up
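
The 4 July item describes sorting and filtering as permutations of SQL queries. One hedged way to do that safely is to map UI sort keys to a fixed set of ORDER BY clauses, since column names cannot be passed as bound query parameters. The function name credits, the sort keys and the registered_only filter are hypothetical, and the table is a stripped-down stand-in for the proposed contributors table.

```python
import sqlite3

# Hypothetical mapping from UI sort keys to SQL ORDER BY clauses. Only
# these whitelisted clauses are ever interpolated into the query.
SORT_CLAUSES = {
    "name": "user_text ASC",
    "edits": "revision_count DESC, user_text ASC",
}

def credits(conn, page_id, sort="edits", registered_only=False):
    """Return (user_text, revision_count) rows for one page."""
    if sort not in SORT_CLAUSES:
        raise ValueError(f"unsupported sort key: {sort!r}")
    sql = "SELECT user_text, revision_count FROM contributors WHERE page_id = ?"
    if registered_only:
        sql += " AND user_id <> 0"  # user_id 0 assumed to mean anonymous
    sql += " ORDER BY " + SORT_CLAUSES[sort]
    return conn.execute(sql, (page_id,)).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE contributors (page_id INTEGER, user_id INTEGER, "
             "user_text TEXT, revision_count INTEGER)")
conn.executemany("INSERT INTO contributors VALUES (?, ?, ?, ?)",
                 [(42, 1, "Alice", 5), (42, 2, "Bob", 9), (42, 0, "192.0.2.7", 2)])
print(credits(conn, 42, sort="name", registered_only=True))
# [('Alice', 5), ('Bob', 9)]
```

Each new sort or filter option then becomes one more whitelisted clause rather than a new code path.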

I have been active on the English Wikipedia since 2011; since then I have written three bots, and I maintain several tools on Labs.
I am motivated to work on this project because, as a Wikipedian, I know that the free movement of content is essential, so making it easier will have tangible benefits. As a programmer, I also feel that the project is a good match for my skills. My GitHub profile is available privately on request.

Event Timeline

Sigma created this task. (Mar 24 2016, 8:44 PM)
Restricted Application added a subscriber: Aklapper. (Mar 24 2016, 8:44 PM)
Sigma updated the task description. (Mar 24 2016, 8:45 PM)

Hi @Sigma. Please add a more detailed timeline for your project. We look for well-scoped quantifiable goals per week or fortnight. You can look at proposals by other applicants to see what a good proposal looks like. T130585: Pywikibot Support for Thanks (GSoC Proposal) for example. Bear in mind that the MVP needs to be complete by mid-term. Thanks and good luck!

Sigma updated the task description. (Apr 1 2016, 3:28 AM)

I wrote new stuff.

01tonythomas reassigned this task from Addshore to Sigma. (Apr 8 2016, 4:36 AM)
01tonythomas added a subscriber: 01tonythomas.

Project proposals should be assigned to the author, in the review phase.

@Sigma: Please share your Gerrit account name so that we can take a look at your previous commits. Thank you!

Sigma added a comment. (Apr 19 2016, 8:30 PM)

@01tonythomas I don't have a gerrit account.

Restricted Application added a subscriber: TerraCodes. (Apr 19 2016, 8:30 PM)

Hi @Sigma, did you submit your proposal on the official Summer of Code website? If yes, please do let us know under which name.

Sigma added a comment. (Apr 20 2016, 5:58 AM)

@Niharika Would it be possible for me to email that info to you?

@Sigma, sure.

01tonythomas closed this task as Declined. (Apr 23 2016, 6:13 AM)

Thank you for your proposal, but sadly it was not selected this time. You are welcome to apply for Outreachy round 13 or GSoC round 14, with the same proposal (if it still has consensus) or a new one, if eligible. Please tell your siblings under 18 years of age about the Google Code-in 2016 round, and add yourself as a mentor for it if eligible. Closing the proposal as Declined; see you around in #wikimedia-dev.