
Parsoid-based wikitext "linting" tool for "buggy" / "deprecated" wikitext usage; keywords: broken wikitext information
Closed, Resolved (Public)

Description

During parsing and while running various transformations, Parsoid collects enough information about buggy wikitext usage to report it back to editors for fixing (e.g. content fostered out of table rows because of missing td wikitext markup, missing newlines, etc.).
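To make the table-row case concrete, here is a minimal sketch of a line-based check for content that would likely be fostered out of a table. It is only an illustration: Parsoid works on the parsed DOM rather than on raw lines, the function name `find_fostered_lines` is hypothetical, and nested tables and templates are ignored.

```python
def find_fostered_lines(wikitext: str) -> list[tuple[int, str]]:
    """Return (line_number, text) for lines likely to be fostered out of a table.

    Heuristic (an assumption of this sketch, not Parsoid's algorithm): after a
    table start "{|" or a row marker "|-", any non-blank line that does not
    open a cell ("|"), a header ("!"), or a caption ("|+") has no cell to live in.
    """
    findings = []
    in_table = False
    expecting_cell = False  # True after "{|" or "|-" until a cell starts
    for number, raw in enumerate(wikitext.splitlines(), start=1):
        line = raw.strip()
        if line.startswith("{|"):
            in_table, expecting_cell = True, True
        elif line.startswith("|}"):
            in_table, expecting_cell = False, False
        elif not in_table or not line:
            continue
        elif line.startswith("|-"):
            expecting_cell = True
        elif line.startswith(("|", "!")):
            expecting_cell = False  # a cell, header, or caption has started
        elif expecting_cell:
            findings.append((number, line))
    return findings


sample = "{|\n|-\n| ok cell\n|-\nstray text\n| another cell\n|}"
print(find_fostered_lines(sample))
```

Text that follows an already opened cell is treated as part of that cell, so only content sitting directly under a row marker is reported.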

This can also be a good way to slowly deprecate editors' reliance on edge-case behavior (e.g. multi-comment whitespace lines are treated differently from single-comment whitespace lines; this is just a side effect of the PHP parser code and should be made consistent in the parser once the usage has been deprecated).

This is more of a longer-term goal and could be a good self-contained project for someone.


See also:

Details

Reference
bz46705

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:29 AM
bzimport set Reference to bz46705.

Another thing we could potentially lint for is auto-inserted start / end tags as mentioned in bug 51945. These are fairly common, so some filter would be needed to narrow it down to cases that are likely to cause problems.
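As a rough sketch of that idea, the following uses Python's standard `html.parser` to find start tags that are never closed, then filters the result down to a set of tags. The contents of `REPORTED_TAGS` are purely an assumption for illustration (which tags are actually "likely to cause problems" would need real data), and this is not how Parsoid itself tracks auto-inserted tags.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "wbr"}  # elements that never take an end tag
# Hypothetical filter: only report tags assumed to matter when left unclosed.
REPORTED_TAGS = {"div", "span", "table", "small", "center", "font"}


class MissingEndTagFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of currently open tag names
        self.missing = []    # tags a parser would have to close automatically

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray end tag; out of scope for this sketch
        # Everything opened after the matching start tag was never closed.
        while self.open_tags:
            top = self.open_tags.pop()
            if top == tag:
                break
            self.missing.append(top)


def find_missing_end_tags(markup):
    """Return unclosed tag names, restricted to REPORTED_TAGS."""
    finder = MissingEndTagFinder()
    finder.feed(markup)
    finder.close()
    # Tags still open at end of input are also missing their end tags.
    finder.missing.extend(reversed(finder.open_tags))
    return [tag for tag in finder.missing if tag in REPORTED_TAGS]


print(find_missing_end_tags("<div><small>note<p>text</div>"))
```

In the example, both `<small>` and `<p>` are left unclosed, but only `small` is reported because unclosed `<p>` tags are common and are excluded by the filter.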

*** Bug 51945 has been marked as a duplicate of this bug. ***

To help decide whether to be more lenient in accepting bad table-row wikitext, here's an example of how things can go wrong:
https://fr.wikipedia.org/w/index.php?title=Aquila_Italiana&diff=101605787&oldid=90404051
(from chatting with Subbu, it seems to be caused by unnecessary | marks).

Actually, https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia is a related project that we should incorporate into our discussion.

See the following section on that page, which is relevant to this project:
https://en.wikipedia.org/wiki/Wikipedia_talk:WikiProject_Check_Wikipedia#Round_2

Arlolra set Security to None.

Wikimedia will apply to Google Summer of Code and Outreachy on Tuesday, February 17. If you want this task to become a featured project idea, please follow these instructions.

Hardik Juneja already worked on this for last year's GSoC. At this point we are awaiting internal resources to polish it off, deal with a few more things, and enable it in production.

marcoil lowered the priority of this task from Low to Lowest. Feb 12 2015, 11:47 AM
marcoil subscribed.

Just checking, is there someone working on this task in Wikimedia-Hackathon-2015? If not, please remove the project.

The current implementation plan is to have Parsoid send API requests to MediaWiki. The newly created Linter extension will receive those requests, store the errors in the database, and expose them to users via a special page and an API module.
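The data flow in that plan could be sketched roughly as follows. Everything here is hypothetical and only meant to show the shape of the idea: the `LintError` fields, the JSON body, and the in-memory `LintStore` stand in for the real request format and database tables, which this task does not specify. Replacing a page's stored errors on each request is also an assumption of the sketch.

```python
import json
from collections import defaultdict
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class LintError:
    page: str
    revision: int
    category: str  # illustrative names, e.g. "fostered", "missing-end-tag"
    start: int     # source offsets of the problematic wikitext
    end: int


def build_request_body(errors):
    """Parser side: serialize lint errors into a JSON request body."""
    return json.dumps([asdict(error) for error in errors])


class LintStore:
    """Extension side: in-memory stand-in for the database."""

    def __init__(self):
        self._by_page = {}

    def record(self, body):
        """Store the errors from one request; return how many were stored."""
        incoming = defaultdict(list)
        for item in json.loads(body):
            incoming[item["page"]].append(LintError(**item))
        # Assumption: a new report for a page replaces its old errors.
        self._by_page.update(incoming)
        return sum(len(errors) for errors in incoming.values())

    def by_category(self, category):
        """What a special page or API module could list for one category."""
        return [
            error
            for errors in self._by_page.values()
            for error in errors
            if error.category == category
        ]


store = LintStore()
errors = [
    LintError("Sandbox", 1, "fostered", 10, 20),
    LintError("Sandbox", 1, "missing-end-tag", 30, 35),
]
store.record(build_request_body(errors))
print(store.by_category("fostered"))
```

Keeping the parser side limited to building a request and the extension side limited to storing and querying mirrors the split described above, where Parsoid only reports and the extension owns persistence and presentation.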

The Linter extension is now deployed to all Wikimedia wikis, so the main infrastructure is in place. We've already seen two new lint categories added during the rollout phase, so more checks can always be added.