Page MenuHomePhabricator

Different field order of pagelinks only for Southern Azerbaijani Wikipedia
Closed, DeclinedPublic

Description

Hello folks at Wikimedia :)
I noticed that the dump pagelinks for the Southern Azerbaijani Wikipedia follows the following table:

CREATE TABLE `pagelinks` (
  `pl_from` int(10) unsigned NOT NULL DEFAULT '0',
  `pl_from_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_title` varbinary(255) NOT NULL DEFAULT '',
  PRIMARY KEY (`pl_from`,`pl_namespace`,`pl_title`),
  KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
  KEY `pl_backlinks_namespace` (`pl_from_namespace`,`pl_namespace`,`pl_title`,`pl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

while all the other Wikipedias follow:

CREATE TABLE `pagelinks` (
  `pl_from` int(8) unsigned NOT NULL DEFAULT '0',
  `pl_namespace` int(11) NOT NULL DEFAULT '0',
  `pl_title` varbinary(255) NOT NULL DEFAULT '',
  `pl_from_namespace` int(11) NOT NULL DEFAULT '0',
  PRIMARY KEY (`pl_from`,`pl_namespace`,`pl_title`),
  KEY `pl_namespace` (`pl_namespace`,`pl_title`,`pl_from`),
  KEY `pl_backlinks_namespace` (`pl_from_namespace`,`pl_namespace`,`pl_title`,`pl_from`)
) ENGINE=InnoDB DEFAULT CHARSET=binary ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=8;

Is possible to change it to the standard table order?

Keep up the great work, Enrico

Event Timeline

The sql in maintenance/tables.sql for creating the pagelinks table is the following:

--
-- Track page-to-page hyperlinks within the wiki.
--
CREATE TABLE /*_*/pagelinks (
  -- Key to the page_id of the page containing the link.
  pl_from int unsigned NOT NULL default 0,
  -- Namespace for this page
  pl_from_namespace int NOT NULL default 0,

  -- Key to page_namespace/page_title of the target page.
  -- The target page may or may not exist, and due to renames
  -- and deletions may refer to different page records as time
  -- goes by.
  pl_namespace int NOT NULL default 0,
  pl_title varchar(255) binary NOT NULL default '',
  PRIMARY KEY (pl_from,pl_namespace,pl_title)
) /*$wgDBTableOptions*/;

so in theory, azb wiki has the correct create statement and all of the other wikis are wrong.

In any case, it shouldn't really matter what the order of the fields in the table is, as long as they are all present and of the right type. Can you explain a bit how this is causing a problem for you?

ArielGlenn triaged this task as Normal priority.Tue, Sep 24, 6:41 AM
ArielGlenn moved this task from Backlog to Active on the Dumps-Generation board.Tue, Sep 24, 6:48 AM

Good morning Ariel :)
Pretty much I found this as an inconsistency while processing all Wikipedia dumps, so I reported it.

Loading the dump into a DB is way too slow, so (as tricky as it sounds) I transform the needed SQL files into CSV on the fly.

Of course this ordering difference later causes the real issue for me ;)

Enrico

Good morning Enrico!

After looking into the issue with one of our dbas:
When the from_namespace field was added to the table in 2014, the CREATE TABLE statement got it added as the second field, like azbwiki has it. The ADD statement to add it to existing wikis did not have an AFTER clause to specify a field after which it should be added, so by default it would be added to the end of the row. See https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/117373/14/maintenance/tables.sql and https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/117373/14/maintenance/archives/patch-pl_from_namespace.sql

This could be fixed by a schema change but it's a lot of work, and since the order of the fields makes no difference for mysql, we'd prefer to avoid it.

Might you instead update your sql to cv converter to account for the field names and put them in a fixed order? You might go with the first column (usually a key and usually some unique id), and then the rest in alphabetical order.

I see, thanks! Actually not bad as a suggestion, I'll keep it in mind :)

Have a nice day, Enrico

@Ebonetti90 I'd like to close this task as declined, meaning that we won't update the schema and you'll take steps on your end to adjust your script. OK by you?

I understand the situation, I do agree.

Enrico

ArielGlenn closed this task as Declined.Wed, Oct 2, 8:48 AM
ArielGlenn moved this task from Active to Done on the Dumps-Generation board.Thu, Oct 3, 6:52 AM