
Alternative output formats
Closed, Resolved (Public)

Description

parse_wiki.exs should be able to output either JSON Lines or CSV / TSV. Implement a command-line flag that switches between these output formats.
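As a rough illustration of the intended behavior, here is a minimal sketch in Python rather than the script's own Elixir. The flag name `--format`, its accepted values, and the example column names are assumptions made for this sketch, not something parse_wiki.exs is known to implement.

```python
import argparse
import csv
import json


def parse_format(argv):
    """Read the hypothetical --format flag; defaults to JSON Lines."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--format", choices=["jsonl", "csv", "tsv"], default="jsonl")
    return parser.parse_args(argv).format


def write_rows(rows, fmt, out):
    """Write a list of flat page records (dicts) to `out` in the chosen format."""
    if fmt == "jsonl":
        # One JSON object per line.
        for row in rows:
            out.write(json.dumps(row) + "\n")
        return
    if fmt not in ("csv", "tsv"):
        raise ValueError(f"unknown format: {fmt}")
    delimiter = "," if fmt == "csv" else "\t"
    # Column order is taken from the first record (an assumption of this sketch).
    fieldnames = list(rows[0].keys()) if rows else []
    writer = csv.DictWriter(
        out, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
```

Calling `write_rows(rows, parse_format(["--format", "tsv"]), sys.stdout)` would then emit a header line followed by one tab-separated line per page. This only works cleanly for flat, scalar columns; nested values would need separate handling.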

Some columns may not fit nicely in a CSV file, mainly potential_ref_transclusions. Rather than jamming these into some special format, we could produce multiple output files and let such columns land in a dedicated file. That file might still be JSON, or it could be a CSV in which each row no longer corresponds to a page and instead has columns like (template name, number of pages where we see it producing refs).
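The per-template CSV could be derived along these lines. This is a Python sketch for illustration only; it assumes that each page record carries potential_ref_transclusions as a list of template names, which may not match the real data shape, and the output column names `template_name` and `page_count` are hypothetical.

```python
import csv
from collections import Counter


def template_page_counts(pages):
    """Count, per template, the number of pages on which it appears."""
    counts = Counter()
    for page in pages:
        # set() so a template used several times on one page counts once.
        counts.update(set(page.get("potential_ref_transclusions", [])))
    return counts


def write_template_counts(counts, out):
    """Write one CSV row per template, most widely used first."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["template_name", "page_count"])
    for name, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([name, n])
```

Counting pages rather than individual transclusions matches the "number of pages where we see it" column suggested above; a raw transclusion count would simply drop the `set()` call.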

For Review