
Write extension to store data about what revisions were imported from what wikis
Open, Low, Public

Description

Especially in this age of Scribunto, it sometimes happens that revisions will be imported that end up breaking templates or causing other problems. E.g., you might import a bunch of templates from Wikipedia and then import some from other wikis and find that you're getting script errors because you overwrote something you didn't mean to. It can be hard to sort out which templates are causing what problems. Re-importing from Wikipedia doesn't necessarily help because those revisions might be older than the revisions that are causing the problems.

It would be helpful to be able to run a query to find out which pages' current revisions (page.page_latest) were imported from which wikis. That way, one could sort out what needs to be reverted. Therefore, I propose one of two solutions for storing in the database the source wiki name that appears in the <sitename> element of the XML import file:
(1) Add a revision.rev_imported field to store the name of the source wiki of imported revisions.
(2) Add a new table that will store the same data as in option 1.

Option 1 makes sense if a large proportion of the revisions on the wiki will have been imported from other wikis. Option 2 makes sense if the proportion is lower. I suspect that people are going to want to go with option 2.
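
To make option 2 and the query described above concrete, here is a minimal sketch using Python's sqlite3 module. The revision_import table and its ri_rev_id and ri_source columns are hypothetical names invented for illustration, the page and revision tables are cut down to the few columns the query needs, and all rows are made-up data. It is not a proposed schema patch.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy data (all rows are made up)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Simplified stand-ins for MediaWiki's page and revision tables.
        CREATE TABLE page (
            page_id INTEGER PRIMARY KEY,
            page_title TEXT NOT NULL,
            page_latest INTEGER NOT NULL
        );
        CREATE TABLE revision (
            rev_id INTEGER PRIMARY KEY,
            rev_page INTEGER NOT NULL
        );
        -- Hypothetical option 2 table: one row per imported revision.
        CREATE TABLE revision_import (
            ri_rev_id INTEGER PRIMARY KEY,
            ri_source TEXT NOT NULL
        );
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?, ?)",
        [
            (1, "Template:Infobox", 11),
            (2, "Module:Citation", 21),
            (3, "Main_Page", 30),
        ],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)",
        [(10, 1), (11, 1), (20, 2), (21, 2), (30, 3)],
    )
    # Revisions 10 and 20 came from one wiki, 11 from another;
    # revisions 21 and 30 are local edits and have no row here.
    conn.executemany(
        "INSERT INTO revision_import VALUES (?, ?)",
        [(10, "Wikipedia"), (11, "Example Wiki"), (20, "Wikipedia")],
    )
    return conn


def current_revision_sources(conn):
    """Return (page_title, source wiki) for pages whose current revision was imported."""
    return conn.execute(
        """
        SELECT page_title, ri_source
        FROM page
        JOIN revision_import ON ri_rev_id = page_latest
        ORDER BY page_title
        """
    ).fetchall()


if __name__ == "__main__":
    for title, source in current_revision_sources(build_demo_db()):
        print(title, "<-", source)
```

In this toy data, Module:Citation has an imported revision in its history, but its current revision is a local edit, so it is not reported. Only pages whose page_latest points at an imported revision show up, which is the set one would look at when deciding what to revert.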


Version: master
Severity: enhancement

Details

Reference
bz57490

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:40 AM
bzimport set Reference to bz57490.
bzimport added a subscriber: Unknown Object (MLST).

Another potential use of this might be if a wiki wanted to make sure it was in compliance with licensing requirements; it could do a query to find out what pages have revisions imported from Wikipedia, and verify that those pages also have the CC-by-SA license template applied (if necessary).
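
A rough sketch of that compliance check, again with sqlite3 and toy data. The revision_import table is a hypothetical name, templatelinks is reduced to two columns, and the license template title "CC-BY-SA" is only a placeholder; a real wiki's template name and schema would differ.

```python
import sqlite3


def pages_missing_license(conn, source, license_template):
    """List pages with a revision imported from `source` that do not
    transclude `license_template`."""
    rows = conn.execute(
        """
        SELECT DISTINCT page_title
        FROM page
        JOIN revision ON rev_page = page_id
        JOIN revision_import ON ri_rev_id = rev_id
        WHERE ri_source = ?
          AND NOT EXISTS (
              SELECT 1 FROM templatelinks
              WHERE tl_from = page_id AND tl_title = ?
          )
        ORDER BY page_title
        """,
        (source, license_template),
    ).fetchall()
    return [row[0] for row in rows]


def build_license_demo_db():
    """Toy database; every table is simplified and every row is made up."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER);
        -- Hypothetical table recording the source wiki of imported revisions.
        CREATE TABLE revision_import (ri_rev_id INTEGER PRIMARY KEY, ri_source TEXT);
        -- Simplified: which page (tl_from) transcludes which template (tl_title).
        CREATE TABLE templatelinks (tl_from INTEGER, tl_title TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO page VALUES (?, ?)",
        [(1, "Albert_Einstein"), (2, "Marie_Curie"), (3, "Local_News")],
    )
    conn.executemany(
        "INSERT INTO revision VALUES (?, ?)", [(10, 1), (20, 2), (30, 3)]
    )
    conn.executemany(
        "INSERT INTO revision_import VALUES (?, ?)",
        [(10, "Wikipedia"), (20, "Wikipedia")],
    )
    # Only Albert_Einstein carries the placeholder license template.
    conn.execute("INSERT INTO templatelinks VALUES (1, 'CC-BY-SA')")
    return conn
```

With this data, pages_missing_license(build_license_demo_db(), "Wikipedia", "CC-BY-SA") reports Marie_Curie only: Albert_Einstein already has the template, and Local_News has no imported revisions.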

It could also produce some interesting statistics for WikiApiary on which revisions, pages, etc. are widely imported, and where, in the wikisphere. We could determine what percentage of a wiki's revisions (and current revisions) consist of imported content, which could give better measures of how much new content a wiki has added. For a given page, we could diff the most recently imported revision against the current revision and use how much of the content differs as the statistic for that page.
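
As a rough illustration of those two statistics, here is a small Python sketch. The function names are made up for this example, and the line-based similarity ratio from difflib is just one possible way to measure "how much of the content differs"; a real implementation might prefer a word-level or byte-level diff.

```python
import difflib


def imported_share(imported_count, total_count):
    """Percentage of revisions that were imported (0.0 for an empty wiki)."""
    if total_count == 0:
        return 0.0
    return 100.0 * imported_count / total_count


def changed_fraction(imported_text, current_text):
    """Fraction (0.0 to 1.0) by which the current revision differs from the
    most recently imported one, based on a line-by-line comparison."""
    matcher = difflib.SequenceMatcher(
        None, imported_text.splitlines(), current_text.splitlines()
    )
    # ratio() is 1.0 for identical line sequences and 0.0 when no lines match.
    return 1.0 - matcher.ratio()


if __name__ == "__main__":
    imported = "{{Infobox}}\nOriginal intro."
    current = "{{Infobox}}\nRewritten intro."
    print(imported_share(1, 4))                 # share of imported revisions
    print(changed_fraction(imported, current))  # how far the page has drifted
```

A page whose current revision is identical to the imported one scores 0.0, and a page that has been entirely rewritten scores 1.0, so averaging the score over pages would give one crude estimate of how much of a wiki's content is its own.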

For WikiApiary purposes, an acceptable alternative might be to fix bug 60090, adding the revision data to log_params.