
Commons limit on data is 2,048 kilobytes
Open, Needs Triage, Public

Description

I was adding data to Commons and got "Error: The text you have submitted is 3,171.888 kilobytes long, which is longer than the maximum of 2,048 kilobytes. It cannot be saved."

This was for creating an interactive heat map on Wikipedia like the one at https://ourworldindata.org/per-capita-co2.

Event Timeline

Doc_James created this task. · Mon, Oct 7, 8:12 AM
Restricted Application added a subscriber: Aklapper. · Mon, Oct 7, 8:12 AM
Aklapper changed the task status from Open to Stalled. · Mon, Oct 7, 8:59 AM

@Doc_James: Please follow https://www.mediawiki.org/wiki/How_to_report_a_bug and provide context and clearer steps to reproduce - thanks!

Sure, so I tried to add a bunch more data, covering another 100 years, here: https://commons.wikimedia.org/wiki/Data:CO2PerCapita.tab

The result was 3,171.888 kilobytes long, which triggered the error mentioned above.

Thanks! I assume this is about StructuredDataOnCommons, hence adding the project tag so others can find this task when searching for tasks under that project or looking at that project's workboard.

Aklapper renamed this task from "Commons limit on data" to "Commons limit on data is 2,048 kilobytes". · Mon, Oct 7, 9:49 AM
Aklapper changed the task status from Stalled to Open.

The Data: namespace is tabular data, not structured data, and as far as I’m aware that’s a separate project.

Yurik added a subscriber: Yurik. · Mon, Oct 7, 4:18 PM

Correct, this is the tabular data hitting the 2 MB page limit. One relatively simple solution would be to fix the JsonConfig base class to store data as compact rather than pretty-printed JSON (there shouldn't be any externally visible consequences, because the JSON is always reformatted before saving anyway). That would immediately increase the maximum storable data by a significant percentage, especially for .map pages (GeoJSON tends to contain lots of small arrays, so when they are broken up across lines and prefixed with heavy indentation, the size grows to several times the original). I suspect Wikibase has had to solve a similar problem when storing its items in the MW engine.
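
For a rough sense of the scale involved, here is a minimal sketch (plain PHP json_encode, not the actual JsonConfig code path) comparing a compact and a pretty-printed encoding of a .tab-style payload made of many small rows:

<?php
// Sketch only: approximates a Data:*.tab payload (schema plus many small
// data rows) and compares compact vs. pretty-printed JSON sizes.
$rows = [];
for ( $i = 0; $i < 10000; $i++ ) {
    $rows[] = [ 'AFG', 1900 + ( $i % 120 ), round( $i * 0.001, 3 ) ];
}
$doc = [
    'license' => 'CC-BY-4.0',
    'schema' => [ 'fields' => [
        [ 'name' => 'country', 'type' => 'string' ],
        [ 'name' => 'year', 'type' => 'number' ],
        [ 'name' => 'tonnes', 'type' => 'number' ],
    ] ],
    'data' => $rows,
];

$compact = json_encode( $doc );
$pretty  = json_encode( $doc, JSON_PRETTY_PRINT );

// Pretty-printing puts every array element on its own indented line,
// which is where the size blow-up comes from.
printf( "compact: %d bytes\npretty:  %d bytes (%.1fx)\n",
    strlen( $compact ), strlen( $pretty ),
    strlen( $pretty ) / strlen( $compact ) );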

It’s already stored compactly:

lucaswerkmeister-wmde@mwmaint1002:~$ mwscript shell.php commonswiki
Psy Shell v0.9.9 (PHP 7.2.22-1+0~20190902.26+debian9~1.gbpd64eb7+wmf1 — cli) by Justin Hileman
>>> $services = MediaWiki\MediaWikiServices::getInstance();
=> MediaWiki\MediaWikiServices {#208}
>>> $revision = $services->getRevisionStore()->getRevisionByTitle( $services->getTitleParser()->parseTitle( 'Data:CO2PerCapita.tab' ) );
=> MediaWiki\Revision\RevisionStoreRecord {#742}
>>> $services->getBlobStore()->getBlob( $revision->getSlot( 'main' )->getAddress() );
=> "{"license":"CC-BY-4.0","description":{"en":"CO<sub>2</sub> emissions per capita"},"sources":"https://ourworldindata.org/per-capita-co2","schema":{"fields":[{"name":"country","type":"string","title":{"en":"ISO Country Code"}},{"name":"year","type":"number","title":{"en":"Year"}},{"name":"tonnes","type":"number","title":{"en":"tonnes per capita"}}]},"data":[["AFG",1900,0],["AFG",1901,0],["AFG",1902,0],["AFG",1903,0],…

Is it possible to double or triple the maximum allowed size?
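
Assuming the limit being hit is the standard $wgMaxArticleSize check (which is measured in kilobytes and defaults to 2048, matching the "maximum of 2,048 kilobytes" in the error message), raising it would be a small site-configuration change along these lines:

<?php
// Sketch of the relevant site configuration (e.g. LocalSettings.php),
// assuming $wgMaxArticleSize is the limit in play here. Whether raising
// it on Commons is acceptable operationally is a separate question.
$wgMaxArticleSize = 4096; // kilobytes; would allow pages up to ~4 MB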

Yurik added a comment. · Mon, Oct 7, 9:54 PM

@Lucas_Werkmeister_WMDE thanks, but this is very surprising; I was 99.99% certain it was being stored pretty-printed... Either that, or the size-limit check was done on the pretty-printed version before storing. Would it be possible to do a direct SQL query for that data, and also to run a MAX( LEN( data ) ) to see the largest page in the Data namespace on Commons? Thanks for checking!

I don’t think that’s possible, but you can check the page_len for yourself in Quarry.
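
For reference, a Quarry query along these lines should list the largest Data: pages by page_len (a sketch; it assumes the Data: namespace ID on commonswiki is 486, which is worth double-checking):

-- Largest pages in the Data: namespace on commonswiki, by stored page length.
-- Run against the commonswiki_p replica in Quarry; 486 is assumed to be the
-- Data: namespace defined by JsonConfig.
SELECT page_title, page_len
FROM page
WHERE page_namespace = 486
ORDER BY page_len DESC
LIMIT 20;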