
Need a way to test with data set reasonably close to production
Closed, Declined | Public

Description

To produce data sets for testing various query scenarios, we need to test Wikidata against large data sets that resemble production data. Unfortunately, the labs databases (wikidatawiki.labsdb) are not useful for this: they do not contain the actual page data, so there is no way to access Wikidata content. Loading the whole Wikidata dump anew for each instance we need to test seems wasteful, so it would be nice to have a central repository with a read-only copy of production data, or a reasonable approximation of it, that tests could run against.

Event Timeline

Smalyshev raised the priority of this task from to Needs Triage.
Smalyshev updated the task description.
Smalyshev subscribed.

Are we even able to get a 'full dump' as a starting point for creating this dataset?

I was able to install the stack and browse to the interface, but swiftly discovered it was completely empty. With such a complex tool, example data is crucial. If I have to spin it up on a large server to test my ideas, that's fine, but to do any of that I need some kind of 'easily' loadable data dump and an 'easy' procedure to load it.

chasemp subscribed.

How big is big in concrete terms?