
Wikipedia requires a patch to load its data from the dumps with mwdumper
Open, Low, Public

Description

I just read: http://www.xarg.org/2016/06/importing-entire-wikipedia-into-mysql/

This could be several things:

  • The importer is outdated
  • The gap between HEAD and the latest stable released version
  • The dumps have some kind of incompatibility
  • The MediaWiki code is old/not suitable for Wikipedias
  • The MySQL databases in production are not in sync with MediaWiki

Investigate and solve the issue, in case it is not already reported.

This is the link to the patch suggested in that article: https://gist.github.com/infusion/3c5007c73410b3fea3de76a10628c31e

Event Timeline

jcrespo created this task.Oct 2 2016, 3:19 PM
jcrespo triaged this task as Low priority.Oct 13 2016, 12:48 PM
jcrespo moved this task from Triage to Backlog (help welcome) on the DBA board.

Low because it is not yet verified or clear.

The problem seems to be that mwdumper still emits the page_counter column, which was removed in MediaWiki 1.25:

Given the timestamp of mwdumper files, it's clearly outdated.
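
To help verify the page_counter diagnosis, here is a minimal Python sketch that lists columns named in an INSERT statement but absent from the target schema. The statement shape, the helper name `unknown_columns`, and the shortened column sets are assumptions for illustration; they are not taken from mwdumper's source or from the linked patch.

```python
import re

# Matches the column list of a statement such as
# "INSERT INTO page (col_a,col_b) VALUES ...". The exact statement shape
# emitted by mwdumper is an assumption here.
INSERT_HEADER = re.compile(
    r"INSERT\s+INTO\s+`?(?P<table>\w+)`?\s*\((?P<cols>[^)]*)\)\s*VALUES",
    re.IGNORECASE,
)


def unknown_columns(statement, schema_columns):
    """Return columns named in an INSERT header that the schema lacks."""
    match = INSERT_HEADER.search(statement)
    if match is None:
        # Not an INSERT with an explicit column list; nothing to report.
        return []
    named = [c.strip().strip("`") for c in match.group("cols").split(",")]
    return [c for c in named if c and c not in schema_columns]


# Hypothetical, shortened inputs for illustration only.
statement = "INSERT INTO page (page_id,page_title,page_counter) VALUES (1,'Foo',0);"
current_schema = {"page_id", "page_title"}
print(unknown_columns(statement, current_schema))
```

In practice the schema set would come from the page table of the installed MediaWiki version, and the statement from the first INSERT in the importer's SQL output.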

awight added a subscriber: awight.Apr 13 2017, 8:50 AM
brion claimed this task.Sep 27 2017, 3:47 PM
brion added a subscriber: brion.

Going to land a couple maintenance patches this weekend, adding this to my queue.

awight removed a subscriber: awight.Mar 21 2019, 4:00 PM
Aklapper removed brion as the assignee of this task.Jun 19 2020, 4:28 PM
Aklapper added a subscriber: Aklapper.

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the record, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)