Further knowledge transfer on the specific parts of the Hadoop Java API that are relevant to solving the "too many revisions" problem encountered when publishing and splitting large XML files.
Description
Status | Assigned | Task
---|---|---
Open | VirginiaPoundstone | T345988 [Epic] XML MediaWiki data dumps for right to fork
Resolved | Milimetric | T330296 Dumps 2.0 Phase I: Proof of concept for MediaWiki XML content dump via Event Platform, Iceberg and Spark
Open | None | T346147 Generate XML dumps for simplewiki
Resolved | Milimetric | T335862 Implement job to generate Dump XML files
Resolved | Milimetric | T344693 Understand Hadoop OutputFormat and how to solve the problem