Refactor GlobalUsage to support Commons Datasets
Open, Normal, Public

Description

When using data from Commons, a fairly standard set of requirements arises: usage tracking, updates, statistics, etc. This is essentially what Extension:GlobalUsage does for shared images from Commons, so it makes sense to build upon it. I think that GlobalUsage can continue tracking files only, while code specific to shared data can live in its own extension.

The existing GU schema:

CREATE TABLE /*_*/globalimagelinks (
	-- Wiki id
	gil_wiki varchar(32) not null,
	-- page_id on the local wiki
	gil_page int unsigned not null,
	-- Namespace, since the foreign namespaces may not match the local ones
	gil_page_namespace_id int not null,
	gil_page_namespace varchar(255) not null,
	-- Page title
	gil_page_title varchar(255) binary not null,
	-- Image name
	gil_to varchar(255) binary not null
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX globalimagelinks_to_wiki_page 
	ON /*_*/globalimagelinks (gil_to, gil_wiki, gil_page);
CREATE INDEX globalimagelinks_wiki 
	ON /*_*/globalimagelinks (gil_wiki, gil_page);
CREATE INDEX globalimagelinks_wiki_nsid_title
	ON /*_*/globalimagelinks (gil_wiki, gil_page_namespace_id, gil_page_title);

This table already has 360M+ rows so altering it would be no fun and I'd like to avoid that.

Besides the image-specific table name, the extension always assumes that gil_to points to an image. Also, we probably want to store the subtype of data. Putting this together, the schema for a separate table looks like this so far:

CREATE TABLE /*_*/globaldatalinks (
	-- Wiki id
	gdl_wiki varchar(32) not null,
	-- page_id on the local wiki
	gdl_page int unsigned not null,
	-- Namespace, since the foreign namespaces may not match the local ones
	gdl_page_namespace_id int not null,
	gdl_page_namespace varchar(255) not null,
	-- Page title
	gdl_page_title varchar(255) binary not null,
	-- Data page namespace
	gdl_to_namespace int not null,
	-- Data page title
	gdl_to_title varchar(255) binary not null,
	-- Data type, currently 'tabular' or 'map'
	gdl_type varchar(16) not null
) /*$wgDBTableOptions*/;
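
The draft above does not list any indexes yet. As a minimal sketch only, assuming the new table simply mirrors the three indexes on globalimagelinks, with the (gdl_to_namespace, gdl_to_title) pair standing in for gil_to, they might look like this. The index names are hypothetical and not part of the proposal:

```sql
-- Hypothetical indexes for globaldatalinks, modeled on the existing
-- globalimagelinks indexes; names and column order are assumptions.
CREATE UNIQUE INDEX globaldatalinks_to_wiki_page
	ON /*_*/globaldatalinks (gdl_to_namespace, gdl_to_title, gdl_wiki, gdl_page);
CREATE INDEX globaldatalinks_wiki
	ON /*_*/globaldatalinks (gdl_wiki, gdl_page);
CREATE INDEX globaldatalinks_wiki_nsid_title
	ON /*_*/globaldatalinks (gdl_wiki, gdl_page_namespace_id, gdl_page_title);
```

Whether gdl_type should also be part of an index would depend on how usage is queried, which the task does not specify.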

Event Timeline

MaxSem created this task. Dec 23 2016, 9:31 PM
Restricted Application added a project: Multimedia. Dec 23 2016, 9:31 PM
Restricted Application added a subscriber: Aklapper.
MaxSem renamed this task from "Refactor GlobalUsage to support shared data" to "Refactor GlobalUsage to support Commons Datasets". Dec 24 2016, 12:16 AM
MaxSem updated the task description.

Change 339585 had a related patch set uploaded (by MaxSem):
Move files into subdirectories

https://gerrit.wikimedia.org/r/339585

Change 339586 had a related patch set uploaded (by MaxSem):
Namespace this extension

https://gerrit.wikimedia.org/r/339586

Change 339587 had a related patch set uploaded (by MaxSem):
Convert to new array syntax

https://gerrit.wikimedia.org/r/339587

Change 339585 merged by jenkins-bot:
[mediawiki/extensions/GlobalUsage@master] Move files into subdirectories

https://gerrit.wikimedia.org/r/339585

Change 339587 merged by jenkins-bot:
[mediawiki/extensions/GlobalUsage@master] Convert to new array syntax

https://gerrit.wikimedia.org/r/339587

debt triaged this task as Normal priority. Jun 9 2017, 7:46 PM
debt added subscribers: Gehel, debt.

Hi @Gehel - can you confirm that this is now done and merged?

Gehel added a comment. Jun 9 2017, 9:11 PM
This comment was removed by Gehel.
Gehel added a comment. Jun 9 2017, 9:19 PM

From what I can see, at least one patch mentioned above has not been merged (https://gerrit.wikimedia.org/r/#/c/339586/). All this seems to be related to the MediaWiki GlobalUsage extension, and honestly, I don't know anything about MediaWiki or about the lifecycle of extensions.

In short: this does not look completed yet (patch not yet merged), but I don't really know what this is about...

debt added a comment. Jun 9 2017, 9:45 PM

Thanks, @Gehel. @MaxSem can you shed some light on what we were really trying to do with this ticket?

MaxSem added a comment. Jun 9 2017, 9:46 PM

No. A bit of cleanup was done, but the task as described is not done.

Moving off the sprint board - the Discovery team won't be able to finish this work at this time.

Change 339586 abandoned by MaxSem:
Namespace this extension

https://gerrit.wikimedia.org/r/339586