When using data from Commons, a fairly standard set of requirements arises: usage tracking, updates, statistics, etc. This is essentially what Extension:GlobalUsage does for shared images from Commons, so it makes sense to build upon it. I think GlobalUsage can continue tracking files only, while code specific to shared data can live in its own extension.
The existing GU schema:
CREATE TABLE /*_*/globalimagelinks (
	-- Wiki id
	gil_wiki varchar(32) not null,
	-- page_id on the local wiki
	gil_page int unsigned not null,
	-- Namespace, since the foreign namespaces may not match the local ones
	gil_page_namespace_id int not null,
	gil_page_namespace varchar(255) not null,
	-- Page title
	gil_page_title varchar(255) binary not null,
	-- Image name
	gil_to varchar(255) binary not null
) /*$wgDBTableOptions*/;

CREATE UNIQUE INDEX globalimagelinks_to_wiki_page ON /*_*/globalimagelinks (gil_to, gil_wiki, gil_page);
CREATE INDEX globalimagelinks_wiki ON /*_*/globalimagelinks (gil_wiki, gil_page);
CREATE INDEX globalimagelinks_wiki_nsid_title ON /*_*/globalimagelinks (gil_wiki, gil_page_namespace_id, gil_page_title);
This table already has 360M+ rows so altering it would be no fun and I'd like to avoid that.
Besides the table name, the existing schema always assumes that gil_to points to an image. We also probably want to store the subtype of data. Putting this together, the proposed schema looks like this so far:
CREATE TABLE /*_*/globaldatalinks (
	-- Wiki id
	gdl_wiki varchar(32) not null,
	-- page_id on the local wiki
	gdl_page int unsigned not null,
	-- Namespace, since the foreign namespaces may not match the local ones
	gdl_page_namespace_id int not null,
	gdl_page_namespace varchar(255) not null,
	-- Page title
	gdl_page_title varchar(255) binary not null,
	-- Data page namespace
	gdl_to_namespace int not null,
	-- Data page title
	gdl_to_title varchar(255) binary not null,
	-- Data type, currently 'tabular' or 'map'
	gdl_type varchar(16) not null
) /*$wgDBTableOptions*/;
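
The proposed table does not define any indexes yet. As a rough sketch only, assuming the access patterns are the same as for globalimagelinks, the GU indexes could be mirrored as below. The index names and the lookup query are illustrative assumptions, not part of the proposal, and the namespace number and title in the query are placeholders.

```sql
-- Hypothetical indexes, modeled on the existing globalimagelinks ones.
-- The "to" index leads with namespace + title because the target is no longer just an image name.
CREATE UNIQUE INDEX globaldatalinks_to_wiki_page
	ON /*_*/globaldatalinks (gdl_to_namespace, gdl_to_title, gdl_wiki, gdl_page);
CREATE INDEX globaldatalinks_wiki
	ON /*_*/globaldatalinks (gdl_wiki, gdl_page);
CREATE INDEX globaldatalinks_wiki_nsid_title
	ON /*_*/globaldatalinks (gdl_wiki, gdl_page_namespace_id, gdl_page_title);

-- Example lookup: which pages on which wikis use a given data page?
-- The namespace id (0) and the title are placeholders for illustration.
SELECT gdl_wiki, gdl_page_namespace, gdl_page_title, gdl_type
FROM /*_*/globaldatalinks
WHERE gdl_to_namespace = 0
	AND gdl_to_title = 'Example.tab';
```

With the first index, the lookup above can be served by an index prefix scan on (gdl_to_namespace, gdl_to_title), much as GU looks up usages by gil_to today.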