
"page_is_redirect" wrong on Indonesian (and probably other non-English) wiki dumps. Doesn't recognize localization of "#REDIRECT"
Closed, Resolved (Public)

Description

Probably a duplicate of bug 12507 and possibly related to bugs 10931 and 30513.

I downloaded the idwikibooks and idwiki dumps (e.g. idwiki-pages-articles.xml.bz2). After installing them locally and going through random pages several times, I stumbled on many 'broken' redirects that behave as if opened with the parameter "redirect=no": they don't redirect me to the proper pages.

From what I found out, the site does not recognize pages with the "#ALIH [[title]]" tag (Indonesian for "#REDIRECT [[title]]") as redirects, because they are marked "page_is_redirect = 0" in the "page" table. Only redirects with the #REDIRECT tag are recognized (and thus redirected properly). I've checked the special pages online and they seem to list all the redirects properly, so this problem seems to affect only the dumps. (Note: I've already set $wgLanguageCode = "id", so that isn't the cause.)

I can create new redirects using "#ALIH [[title]]" just fine, but I can't fix the existing ones. I've tried purging the page and running "refreshLinks.php --redirects-only", to no avail. A null edit gave a worse result: the page displays "1. ALIH [[title]]", as if it were an ordered list. That is strange, because new redirects made with "#ALIH" work, yet a null edit on an existing "#ALIH" page goes wrong.

I've also downloaded idwiki-latest-redirect.sql.gz and loaded it into the "redirect" table, but as soon as I ran rebuildall.php, its rows were overwritten by data derived from the "page" table.

Right now my only remaining option is to run a bot locally and change all "#ALIH"s to "#REDIRECT"s. I hope this gets fixed in the next version.
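The text substitution such a bot would perform can be sketched as follows. This is only an illustration: the function name `normalize_redirect` and the alias tuple are hypothetical, and it assumes "#ALIH" is the only localized keyword in use and that it appears at the very start of the page text.

```python
import re

# Assumption: "#ALIH" is the only localized alias to rewrite.
# Add other aliases here for other wikis.
LOCAL_ALIASES = ("ALIH",)

# Match the keyword only at the very start of the page text,
# ignoring case and any leading whitespace.
_PATTERN = re.compile(
    r"^\s*#(?:%s)\b" % "|".join(re.escape(a) for a in LOCAL_ALIASES),
    re.IGNORECASE,
)


def normalize_redirect(wikitext):
    """Replace a leading localized redirect keyword with #REDIRECT."""
    return _PATTERN.sub("#REDIRECT", wikitext, count=1)
```

Text that does not begin with the keyword is returned unchanged, so the function is safe to apply to every page; saving the result back to the wiki is left to whatever bot framework is used.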


Version: unspecified
Severity: normal

Details

Reference
bz38919

Event Timeline

bzimport raised the priority of this task to Needs Triage. Nov 22 2014, 12:49 AM
bzimport set Reference to bz38919.

Is this a problem with the data in the dump, or the dump import process?

How are you importing them, and into what?

A quick spot check of idwiki-20120727-pages-articles.xml.bz2 shows that <page>s whose current <text> is #ALIH [[whatever]] are marked with <redirect> elements, so the dump looks ok:

<page>
  <title>Main Page</title>
  <ns>0</ns>
  <id>92</id>
  <redirect title="Halaman Utama" />
  <revision>
    <id>2637971</id>
    <parentid>652005</parentid>
    <timestamp>2009-10-30T04:14:53Z</timestamp>
    <contributor>
      <username>Bennylin</username>
      <id>469</id>
    </contributor>
    <minor />
    <comment>+kat</comment>
    <sha1>a8p1abnpz1lzdoypdtv1fffrtfch444</sha1>
    <text xml:space="preserve">#ALIH [[Halaman Utama]]

[[Kategori:Pengalihan yang dilindungi]]</text>

  </revision>
</page>
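A spot check like this can be automated. The sketch below is one possible approach, not the procedure actually used for the check above: it streams the dump and lists pages whose text starts with the localized keyword but which lack a <redirect> element. The function names are hypothetical, and it assumes the dump layout shown in the excerpt (possibly with an XML namespace on the tags).

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def find_unmarked_redirects(stream, keyword="#ALIH"):
    """Yield titles of pages whose text starts with `keyword`
    but that carry no <redirect> element in the dump."""
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title = None
        text = None
        has_redirect = False
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "redirect":
                has_redirect = True
            elif name == "text":
                text = child.text or ""
        starts_with_keyword = (
            text is not None
            and text.lstrip().upper().startswith(keyword.upper())
        )
        if starts_with_keyword and not has_redirect:
            yield title
        elem.clear()  # drop the page's children; dumps are large
```

The stream can be any binary file object, for example one returned by the standard library's `bz2.open(path, "rb")`. An empty result would agree with the observation that the dump itself marks these pages correctly.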

If you're using maintenance/importDump.php to import the XML into MediaWiki, then this is probably related to bug 12507.

(In reply to comment #1)
> Is this a problem with the data in the dump, or the dump import process?
> How are you importing them, and into what?

I'm importing them with MWDumper, into a fresh installation of MW 1.18.

Benjamin Collier's build, to be exact. I found out the problem was in MWDumper's code: https://github.com/bcollier/mwdumper/blob/master/src/org/mediawiki/importer/Revision.java
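To illustrate the kind of check that would produce this symptom, here is a minimal Python sketch. It does not reproduce the Java in Revision.java; it assumes an importer that tests only for the English "#REDIRECT" prefix and contrasts it with an alias-aware test. The alias table and both function names are hypothetical; as far as I know, MediaWiki itself takes the aliases from the "redirect" magic word of the content language.

```python
import re

# Assumption: hand-maintained alias table, for illustration only.
REDIRECT_ALIASES = {
    "en": ("#REDIRECT",),
    "id": ("#ALIH", "#REDIRECT"),
}


def naive_is_redirect(text):
    """English-only check: misses localized keywords such as #ALIH."""
    return text.upper().startswith("#REDIRECT")


def is_redirect(text, lang="en"):
    """Alias-aware check: keyword, optional colon, then a [[link]]."""
    aliases = REDIRECT_ALIASES.get(lang, ("#REDIRECT",))
    keyword = "|".join(re.escape(a) for a in aliases)
    pattern = r"\s*(?:%s)\s*:?\s*\[\[[^\]]+\]\]" % keyword
    return re.match(pattern, text, re.IGNORECASE) is not None
```

With the English-only check, a page starting with "#ALIH [[Halaman Utama]]" is not flagged as a redirect, which would match pages ending up with page_is_redirect = 0 after import; the alias-aware check flags it when `lang="id"`.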

Close this as Fixed. Thanks Brion.

*** This bug has been marked as a duplicate of bug 7497 ***