
Create randomly split partial entity dumps
Closed, Invalid; Public

Description

Randomly split partial dumps would still be useful to have, since they would let us parallelize reading massive dumps, similar to what is possible with the XML dumps. (Each entity would be in exactly one partial dump.)
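
As a rough illustration of the "each entity in one dump" property, here is a minimal sketch that assigns entity IDs to partial dumps. It is not how the dump scripts work: it assumes a stable hash of the entity ID as a stand-in for a truly random split, and the names `shard_for` and `split_entities` are hypothetical.

```python
import hashlib


def shard_for(entity_id: str, num_shards: int) -> int:
    """Map an entity ID to a partial-dump index in [0, num_shards).

    Assumption: a stable hash stands in for a random split, so the same
    entity always lands in the same partial dump across runs.
    """
    digest = hashlib.sha256(entity_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def split_entities(entity_ids, num_shards):
    """Group entity IDs into num_shards lists, one list per partial dump."""
    shards = [[] for _ in range(num_shards)]
    for entity_id in entity_ids:
        shards[shard_for(entity_id, num_shards)].append(entity_id)
    return shards


if __name__ == "__main__":
    # Hypothetical entity IDs, used only to show the shard sizes.
    ids = [f"Q{i}" for i in range(1, 1001)]
    for index, shard in enumerate(split_entities(ids, 4)):
        print(index, len(shard))
```

Because the assignment depends only on the entity ID, each partial dump could be written and read independently, which is what would make parallel processing possible.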

Event Timeline

Lucas_Werkmeister_WMDE renamed this task from "Partial random dumps" to "Create randomly split partial entity dumps". Jun 22 2021, 9:41 AM
Lucas_Werkmeister_WMDE updated the task description.

We should think about splitting obvious classes (academic papers and astronomical objects) before going random with the rest.

I'm closing this for now, as we'll need to look at the topic more holistically. I believe a random split is probably last on the list of things we want to do, after other, more meaningful splits.