Final summary
Overall, I'm quite satisfied with this internship. The project turned out to be very interesting and challenging (the way I like it!). I didn't finish everything that was planned, but the plan was quite ambitious from the start. Over the course of the internship I also stumbled upon many issues that we had not foreseen when scheduling the work, so I think I put a bit more effort into the project than planned. I will be finishing the work that I started in the coming months.
I have definitely learned '''a lot''', not only about PHP programming and testing, but also about software design, working with large codebases and more. Huge thanks to all reviewers of my code and my mentors, who were very helpful and gave me numerous hints.
This project affects many wikis in a significant way and it's quite satisfying to work on something that will be used by many. The best moment of this internship for me was finally deploying the finished manual reverts feature to Wikipedia and seeing it in action. Really cool.
Finished work
Directly related to the project
- T258952 – Introduce the EditResult class and pass it to extensions. This object encapsulates information about the effects of the edit in the context of the page, for example whether the edit was a revert and which edits were reverted. It greatly simplifies revert detection for extensions.
- T252366 – Choose the definition of a "reverted" edit. Decided on the revert detection method to be used in the project.
- T256001 – Detect manual reverts and mark them with the mw-manual-revert tag. The biggest feature I've managed to finish. Based on the comments, I think Wikipedians mostly like it.
- T256915 – Fix handling of multi-edit undos. Up until now it was impossible to inform extensions about multi-edit undos, as EditPage lacked code that would properly track all required parameters. That is fixed now; the patch also introduced a few integration tests that ensure undos are detected properly in all cases.
- T259732 – Make EditResult serializable. This is used to store it in the DB for possible later use. It will also be used to serialize RevertedTagUpdateJob.
- T259733 – Save the serialized EditResult in ct_params field of the change_tag table for revert tags. This information can be later retrieved for analytics purposes. It can also be used to schedule a delayed RevertedTagUpdateJob.
- T257766 – Fix handling of null edits in PageUpdater. This was discovered due to a production issue with duplicate notification emails that affected pretty much all WMF wikis. The issue was… bad, but we managed to fix it quickly once we identified it.
- Don't mark "dirty" undos with the mw-undo tag. This is to prevent users from marking arbitrary edits as undos. It is mostly in preparation for the reverted tag.
- T257215 – Deprecate the RollbackComplete hook. The hook is now redundant with PageSaveComplete providing all necessary information to extensions.
- T258963 – Document the implemented features. Some of it is not finished yet, as it has to wait for the relevant features to be completed.
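As a rough illustration of the ideas behind the items above (hash-based manual revert detection and a serializable result object), here is a minimal Python sketch. It is not MediaWiki's actual PHP code: `EditResultSketch`, its field names, `detect_manual_revert` and the `search_radius` parameter are all hypothetical names chosen for this example, and the real EditResult class carries more information than shown here.

```python
import hashlib
import json
from dataclasses import dataclass, asdict
from typing import List, Optional, Sequence, Tuple


@dataclass
class EditResultSketch:
    """Hypothetical stand-in for EditResult; the field names are illustrative."""
    is_revert: bool
    restored_rev_id: Optional[int]  # revision whose content the edit restored
    reverted_rev_ids: List[int]     # revisions undone by the edit, oldest first


def content_hash(text: str) -> str:
    """SHA-1 of the revision text, used only to compare content cheaply."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def detect_manual_revert(
    history: Sequence[Tuple[int, str]],
    new_text: str,
    search_radius: int = 15,  # assumed limit on how far back to look
) -> EditResultSketch:
    """history holds (rev_id, text) pairs for one page, oldest first."""
    if search_radius < 1:
        raise ValueError("search_radius must be at least 1")
    not_a_revert = EditResultSketch(False, None, [])
    if not history:
        return not_a_revert
    new_hash = content_hash(new_text)
    # Saving the same content as the current revision is a null edit, not a revert.
    if content_hash(history[-1][1]) == new_hash:
        return not_a_revert
    recent = history[-search_radius:]
    # Walk from the second-newest revision back to the oldest one in range.
    for idx in range(len(recent) - 2, -1, -1):
        rev_id, text = recent[idx]
        if content_hash(text) == new_hash:
            reverted = [r for r, _ in recent[idx + 1:]]
            return EditResultSketch(True, rev_id, reverted)
    return not_a_revert


def to_json(result: EditResultSketch) -> str:
    """Serialize the result, e.g. for storing it next to a change tag."""
    return json.dumps(asdict(result))


def from_json(payload: str) -> EditResultSketch:
    return EditResultSketch(**json.loads(payload))
```

For a page with revisions 1, 2 and 3 containing "A", "B" and "C", saving "A" again would be reported as a revert that restores revision 1 and reverts revisions 2 and 3. Bounding the search keeps the check cheap on pages with long histories, at the cost of missing reverts to very old revisions.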
Extras
- T258951 – Add getRevisionIdsBetween method to RevisionStore. This method is to be used by core and various extensions that want to determine which revisions were reverted, given the newest and oldest reverted revisions.
- Add INCLUDE_* constants in RevisionStore. This is a minor extra to clean up the use of 'include_*' options in various get*Between methods.
- T216297 – Indicate reverts in EventBus data. Now the Product Analytics team can fully use all information about reverts provided by the new EditResult object. The task involved modifying the appropriate event schema and writing code for filling it with information.
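To make the purpose of getRevisionIdsBetween more concrete, the following Python sketch shows the general idea of listing the revisions between an oldest and a newest revision, with options controlling whether the endpoints are included. The function name, its signature and the constant values below are assumptions made for this illustration; they are not the RevisionStore API, which works against the database rather than an in-memory list.

```python
from typing import Collection, List, Sequence

# Hypothetical option names, loosely modelled on the INCLUDE_* idea above.
INCLUDE_OLD = "include_old"
INCLUDE_NEW = "include_new"
INCLUDE_BOTH = "include_both"


def revision_ids_between(
    page_rev_ids: Sequence[int],
    old_id: int,
    new_id: int,
    options: Collection[str] = (),
) -> List[int]:
    """Return the revision IDs between old_id and new_id.

    page_rev_ids must list the page's revisions oldest first. By default
    both endpoints are excluded; the INCLUDE_* options add them back.
    """
    start = page_rev_ids.index(old_id)  # raises ValueError if not on the page
    end = page_rev_ids.index(new_id)
    if start > end:
        raise ValueError("old_id must not be newer than new_id")
    include_old = INCLUDE_OLD in options or INCLUDE_BOTH in options
    include_new = INCLUDE_NEW in options or INCLUDE_BOTH in options
    lo = start if include_old else start + 1
    hi = end + 1 if include_new else end
    return list(page_rev_ids[lo:hi])
```

With page revisions 10 to 14, asking for the revisions between 11 and 14 yields 12 and 13 by default, and 12, 13 and 14 when `INCLUDE_NEW` is passed. The latter matches the revert case, where the oldest endpoint is the restored revision and everything after it up to the newest reverted revision was undone.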
(Yet) unfinished work
- T254074 – The reverted tag. It's mostly finished, but still in code review. This is the main feature that I was supposed to implement, but it turned out to be ''really'' hard to implement properly. When I started implementing it, I tried to imagine as many ways to abuse it as possible and there turned out to be quite a lot. I have… some experience with wiki communities and I know that there are always people who want to abuse a wiki's mechanisms to prove their point or to wage "wars" on other users. I really wouldn't want to see people using it this way, so I came up with a number of mitigations that are supposed to minimize its abuse potential. These mechanisms are quite complex (see T259014 for details) and reviewing them is equally hard, so this particular feature being a bit late doesn't surprise me much.
- T260524 – Integrating the reverted tag with FlaggedRevs. There is a new revert approval mechanism introduced by the reverted tag that extensions can hook into. FlaggedRevs is a content management extension used on Wikimedia wikis that have their code updated weekly, so FlaggedRevs should be the first extension we integrate with. The commit is mostly ready; it just needs CR.
- T258921 – The maintenance script for filling the manual revert and reverted tags. This is an extra feature that I have proposed to implement as it's something that I would personally gladly use on my wiki. I started implementing it, but then I stumbled upon all the issues revolving around the reverted tag, so I had to temporarily put this on hold until I finish the reverted tag. I will come back to this task later for sure. I think it should be shipped in 1.36 together with the reverted tag to make the new feature "complete".
Other things I may work on later
- T153570 – Echo integration. The notification system currently uses a complex revert detection system that is now largely redundant. That needs to be cleaned up. ProcrastinatingReader has started working on this, but I will probably try to help him a bit later.
- FlaggedRevs improvements. When Echo starts using the new EditResult-based revert detection, we will be able to remove Echo-specific code in FlaggedRevs. Besides that, FlaggedRevs currently does not mark its rejects as undos properly. With the improvements I made to EditPage it should now be possible for FlaggedRevs to do that. This way all revisions in a multi-edit reject will be marked as reverted.
- Integrating the reverted tag with Approved Revs and possibly Moderation. I will have to look into both of these extensions carefully and I'll probably suggest some patches to integrate them properly with the reverted tag feature.
The original proposal
Profile Information
Name: I don't want to disclose it publicly :)
IRC nickname on Freenode: Ostrzyciel
User page: https://www.mediawiki.org/wiki/User:Ostrzyciel
GitLab: https://gitlab.com/Ostrzyciel
Location: Poland
Typical working hours: 10AM to 10PM CEST (UTC+2)
Synopsis
Based on T164307:
It would be valuable to be able to filter out or highlight edits that have already been "rolled back" or "undone". This task is about adding a "Reverted" filter to the Tagged Edits interface in MediaWiki-Recent-changes
We will want to identify the edit that is being reverted and tag it with a "reverted" tag.
We'll need to detect when a revision is being rolled back / undone, and then apply a tag to the revision that it is rolling back. This involves defining a tag and implementing a hook which can apply that tag in certain conditions.
I think a lot of the logic for this can be based on the discussion in T152434: Add method to Revision to check if it was a Revert, and whether an edit was Reverted. I want to implement this task as well, as a lot of the logic overlaps; it's just a question of how we integrate these two functionalities. I would propose that this method just check whether the revision has the reverted tag, as that would be present anyway. A useful addition may be a maintenance script for reprocessing all edits in the DB and applying the reverted tag retroactively.
I have worked a lot with MW (see section Past Experience), so I think a more ambitious approach to this project would suit me :)
Possible Mentor(s)
@kostajh and @Catrope
I have not contacted them yet. I initially planned to do a completely different project and I focused on that. Unfortunately I didn't find anyone willing to mentor that (see T247406).
Deliverables and timeline
Period | Task |
May 4 to May 31 | Community bonding period – refining the proposal and timeline, initial research. First draft on the definition of a reverted edit. |
June 1 to June 7 | Finalizing the definition of a reverted edit (a small and quick RfC with the community maybe?), see T152434 for a discussion on this. Deeper research into technical options, choosing implementation details. |
June 8 to June 21 | Adding the EditResult class, refactoring code in the PageUpdater (T152434). |
June 22 to June 28 | Updating hooks to use the new class. |
June 29 to July 3 | Phase 1 evaluation |
July 4 to July 12 | Implementing SHA1-based revert detection. |
July 13 to July 19 | Adding the new "reverted" edit tag. |
July 20 to July 26 | Saving additional info about the revert in the reverted edit tag. |
July 27 to July 31 | Phase 2 evaluation |
August 1 to August 15 | (optional, but I'd love to have something like this as a system admin) Writing a maintenance script for filling out the reverted tag retroactively for old edits. Probably of little use to Wikimedia given Wikipedia's huge DB size, but could be useful for third-party wikis. |
August 16 to August 23 | Final corrections, maybe submitting a few patches to extensions and other modules. |
August 29 to August 31 | Submitting the final work |
Note: due to the unexpected complexity of the reverted tag feature, I had to give up on implementing the maintenance script, at least for now. I hope the extra effort put into other parts of the project makes up for it. :)
Participation
To schedule tasks, report progress and do most other dev things I will use Phabricator. The code will of course be in Wikimedia's Gerrit. As for communication I prefer IRC and email, but I can use any other means of communication, such as Zulip.
About Me
I study at the Warsaw University of Technology. I have a BSc in Computer Science and I am currently pursuing an MSc in Data Science (a CS equivalent). I think I stumbled upon the GSoC program accidentally while browsing mediawiki.org. During the summer I will not have any other significant commitments. I may be unavailable for a few days here and there, but I will definitely deliver everything on time :)
I'm interested in the idea of free culture and software, and I currently lead a project dedicated to free humor (more in the next section). I think free culture and open-source software are vital for humanity and that is partially why I participate in MediaWiki-related projects.
Past Experience
For over a year I have been a system admin of a medium-sized wiki – Nonsensopedia, which is kind of like Uncyclopedia in Polish, but completely different in some regards. Most notably, it puts a much larger focus on proper licensing and making sure everything there really is free and funny. It also has much stricter standards regarding hate speech and controversial stuff.
I am also a sysop of this wiki, so I am very familiar with Special:RecentChanges interface and other MediaWiki moderation tools.
As a sysadmin I have gathered a lot of experience with installing and maintaining MediaWiki. Over the last year I also wrote a few MW extensions for Nonsensopedia that could be useful for other wikis as well; you can find them listed on my user page. All MW-related code I wrote is on our GitLab group, including some forks of extensions and other tools.
Two of the extensions I wrote use OOUI for their interface:
- RatePage uses it for its administration panel (all in PHP, similar to AbuseFilter's UI).
- Svetovid uses OOUI for a complex and dynamic interface that enables semi-automatic link creation (JS and PHP).
I also wrote some patches for MW and other extensions. This includes T246127, T231481, T240893, T205219, d7ff338a4cb3, T228584, T228579 and T248826. There are also a few patches in different states that weren't merged (yet).
I also attended the Wikimedia Hackathon in 2019, which really got me interested in MW development. I met a lot of interesting people there who helped me do a lot of this stuff, thank you! During that hackathon I worked with Isarra on THICC, an extension that was supposed to utilize Multi-Content Revisions heavily. We never finished it, but I had to get an understanding of MCR in the process :)
As for other programming experience I wrote things in all kinds of languages (C, C++, C#, PHP, JS, Lua, Python, R, Matlab, Forth, 6502 assembly, yeah, I'll stop), but of course I am not proficient in all of them :) I did a lot of university and hobby projects and I also wrote a few commercial applications (databases, production management, webdev, office automation). No programming can of worms scares me :)