Create a script to generate lots of Items/Properties with lots of Terms
Closed, Resolved, Public

Description

This script will be useful for generating lots of data so that we can run performance tests and get measurements for different implementations of anything we have performance questions about. Also remember to allow a high degree of duplication in term texts across languages, term types and entity types.
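As an illustration of the duplication requirement, here is a minimal Python sketch. It is not the Wikibase maintenance script, which lives in PHP and whose options are not described here. The function `generate_entities`, its `duplication_ratio` and `seed` parameters, the language list and the output dict layout are all hypothetical. The idea: keep a pool of every term text generated so far and, with probability `duplication_ratio`, reuse a text from that pool instead of inventing a new one. Because the pool is shared, identical texts end up spread across languages, term types (label, description, alias) and entity types (items and properties).

```python
import random
import string
from typing import Dict, List

# Hypothetical defaults chosen for illustration only.
LANGUAGES = ["en", "de", "fr", "ar"]
TERM_TYPES = ["label", "description", "alias"]


def random_text(rng: random.Random, length: int = 8) -> str:
    """Return a random lowercase string of the given length."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_entities(count: int, duplication_ratio: float = 0.5,
                      seed: int = 0) -> List[dict]:
    """Generate `count` entities (alternating items and properties) with terms.

    With probability `duplication_ratio`, each term text is reused from the
    shared pool of texts generated so far, so duplicates occur across
    languages, term types and entity types.
    """
    if not 0.0 <= duplication_ratio <= 1.0:
        raise ValueError("duplication_ratio must be between 0 and 1")
    rng = random.Random(seed)  # seeded so test data is reproducible
    pool: List[str] = []
    entities: List[dict] = []
    for i in range(count):
        prefix, entity_type = ("Q", "item") if i % 2 == 0 else ("P", "property")
        terms: Dict[str, Dict[str, List[str]]] = {}
        for lang in LANGUAGES:
            terms[lang] = {}
            for term_type in TERM_TYPES:
                if pool and rng.random() < duplication_ratio:
                    text = rng.choice(pool)  # reuse an existing text
                else:
                    text = random_text(rng)  # brand-new text
                    pool.append(text)
                terms[lang][term_type] = [text]
        entities.append({"id": f"{prefix}{i // 2 + 1}",
                         "type": entity_type,
                         "terms": terms})
    return entities
```

For example, `generate_entities(10, duplication_ratio=0.8)` yields Q1, P1, Q2, P2 and so on, with most of the 120 term texts repeated. Setting the ratio to 0 produces (almost certainly) unique texts, which gives a low-duplication baseline to compare against.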

Event Timeline

Restricted Application removed a project: Patch-For-Review. Apr 5 2019, 2:38 PM

@alaa_wmde are you working on this? If so, please link the stuff you have so far.

While not the best from a design or flexibility perspective, I suspect the most pragmatic approach here is to just create a MW maintenance script in Wikibase (Repo?).

@JeroenDeDauw Yup, that's what I did. I'll push it in a moment; I thought I already had 😓. It still needs a final few lines of code.

Change 503359 had a related patch set uploaded (by Alaa Sarhan; owner: Alaa Sarhan):
[mediawiki/extensions/Wikibase@master] Add random entities and terms generator maintenance script to repo.

https://gerrit.wikimedia.org/r/503359

Yesterday, while thinking about design stuff, it occurred to me that we might not need a script like this. Can't we just use https://github.com/Wikidata/WikibaseImport to import a bunch of real entities? If that is too slow, then perhaps we can use https://github.com/JeroenDeDauw/Replicator to import JSON dumps.

@JeroenDeDauw Sure, we can use that too. Even better, actually, as we would be testing with production data.

alaa_wmde added a subscriber: Ladsgroup.

@Ladsgroup if you agree with just using WikibaseImport or Replicator or whatever that already exists, please feel free to close this one as Declined ;)

WikibaseImport contains a limited number of items and properties which is good for testing but not enough. I think we should keep this maintenance script.

Okay, will get it done then.

WikibaseImport contains a limited number of items and properties

What does this mean? I thought WikibaseImport gets items and properties from Wikidata. How does it contain a limited number?

@JeroenDeDauw I'm testing WikibaseImport in the meantime. It isn't really limited technically: according to its README, one seems to be able to import all properties and entities in given ranges. However, it seems to be: 1) quite slow, 2) importing everything (incl. statements and linked entities) with no way to disable it, which adds to the slowness, and 3) dependent on a separate Wikibase instance to import from (a dependency one might not want to have, especially locally).

This script generates entities with only terms attached to them (no statements yet, though these could be added later behind an option, say --with-statements). These randomly generated entities can be used for stress tests, and maybe as fixtures for integration tests.
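To illustrate the "terms only by default, statements as an opt-in" idea, here is a small Python `argparse` sketch. Only `--with-statements` comes from the comment above, and even that was floated as a possible future option. `--count`, `--entity-type`, `build_parser` and `parse_options` are hypothetical names; the real maintenance script is PHP and its actual options are not listed in this task.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for a hypothetical entity generator CLI."""
    parser = argparse.ArgumentParser(
        description="Generate random entities with terms (illustrative sketch)."
    )
    # Hypothetical option: how many entities to create.
    parser.add_argument("--count", type=int, default=100,
                        help="number of entities to generate")
    # Hypothetical option: which entity type to create.
    parser.add_argument("--entity-type", choices=["item", "property"],
                        default="item", help="type of entity to generate")
    # Opt-in flag as floated in the discussion; terms-only output stays the default.
    parser.add_argument("--with-statements", action="store_true",
                        help="also attach random statements")
    return parser


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (sys.argv[1:] when None) and validate --count."""
    parser = build_parser()
    options = parser.parse_args(argv)
    if options.count < 1:
        # parser.error prints the message and exits with status 2.
        parser.error("--count must be at least 1")
    return options
```

Using `action="store_true"` keeps existing invocations unchanged when statement support is added later: omitting the flag still yields terms-only entities.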

Change 503359 merged by jenkins-bot:
[mediawiki/extensions/Wikibase@master] Add random entities and terms generator maintenance script to repo.

https://gerrit.wikimedia.org/r/503359

alaa_wmde closed this task as Resolved. May 22 2019, 10:27 AM