
Enable header/footer generation with RdfWriter without buffering full output.
Closed, Resolved · Public

Description

RdfWriter currently cannot generate very large output files without buffering the full output, mainly because of how prefix/namespace declarations are handled. After some discussion with Stas, we identified the following key changes:

  • The writer needs a method for creating sub-document writers that share the parent's prefix declarations but are not attached to the parent writer's output buffer. This method could be called snippet(). Alternatively, it could be called sub(), with the current sub() method renamed to inline().
  • It should be possible to call drain() at any time, e.g. to generate the "header" part of the output. drain() would trigger a state transition to DOCUMENT.
  • There should be a finish() method (for top-level writers) that closes the document (a transition to the DRAIN state, which should be renamed to END), then calls drain() and returns the resulting string.
  • For now, prefix() can only be called on the top-level writer, and only before start().
  • reset() should reset declared prefixes only when called on a top-level (document role) writer.
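
The proposed lifecycle can be sketched roughly as follows. This is an illustration in Python, not the actual RdfWriter code: the class name SketchWriter, the triple() helper, the Turtle-style output, and the exact point at which the header is written are all assumptions made for the example.

```python
from enum import Enum


class State(Enum):
    START = 1     # nothing emitted yet; prefix() is still allowed
    DOCUMENT = 2  # header emitted; statements may follow
    END = 3       # document closed; nothing more may be written


class SketchWriter:
    """Hypothetical writer illustrating the proposal; not the real RdfWriter API."""

    def __init__(self, parent=None):
        self._parent = parent
        # Sub-writers share the parent's prefix table but own a separate buffer.
        self._prefixes = parent._prefixes if parent is not None else {}
        self._buffer = []
        self._state = State.DOCUMENT if parent is not None else State.START

    def prefix(self, name, iri):
        # Only allowed on the top-level writer, and only before start().
        if self._parent is not None or self._state is not State.START:
            raise RuntimeError("prefix() must be called on the top-level writer before start()")
        self._prefixes[name] = iri

    def start(self):
        if self._state is not State.START:
            raise RuntimeError("start() called twice")
        for name, iri in self._prefixes.items():
            self._buffer.append(f"@prefix {name}: <{iri}> .\n")
        self._state = State.DOCUMENT

    def snippet(self):
        # Sub-document writer: shared prefixes, detached output buffer.
        return SketchWriter(parent=self)

    def triple(self, subject, predicate, obj):
        if self._state is not State.DOCUMENT:
            raise RuntimeError("triple() is only valid in the DOCUMENT state")
        self._buffer.append(f"{subject} {predicate} {obj} .\n")

    def drain(self):
        # Callable at any time; a first call on a fresh writer yields the header.
        if self._state is State.START:
            self.start()
        out = "".join(self._buffer)
        self._buffer.clear()
        return out

    def finish(self):
        if self._parent is not None:
            raise RuntimeError("finish() is only valid on the top-level writer")
        if self._state is State.START:
            self.start()
        # A format with a footer (e.g. a closing XML tag) would append it here.
        self._state = State.END
        return self.drain()
```

With a writer like this, a dump script could write the result of drain() once for the header, create one snippet() writer per entity and flush it immediately, and write the result of finish() last, so the full document is never held in memory.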

To make use of these changes, RdfSerializer needs to get startDocument() and endDocument() methods.
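
A minimal sketch of how such serializer methods might drive the writer, again in Python and under stated assumptions: TinyWriter is a hypothetical stand-in that emits an XML-like header and footer, and start_document(), serialize_entity(), and end_document() are illustrative names mirroring startDocument() and endDocument(), not the real RdfSerializer interface.

```python
import io


class TinyWriter:
    """Hypothetical stand-in for a writer that offers drain() and finish()."""

    def __init__(self):
        self._buffer = ["<rdf:RDF>\n"]  # header stays pending until the first drain()
        self._closed = False

    def about(self, subject):
        if self._closed:
            raise RuntimeError("document already finished")
        self._buffer.append(f'  <rdf:Description rdf:about="{subject}"/>\n')

    def drain(self):
        out = "".join(self._buffer)
        self._buffer.clear()
        return out

    def finish(self):
        self._closed = True
        self._buffer.append("</rdf:RDF>\n")  # footer
        return self.drain()


class SketchSerializer:
    """Streams header, entities, and footer to a sink, one chunk at a time."""

    def __init__(self, writer, sink):
        self._writer = writer
        self._sink = sink

    def start_document(self):
        # Emit only the header.
        self._sink.write(self._writer.drain())

    def serialize_entity(self, entity_id):
        # Emit one entity and flush it right away, so nothing accumulates.
        self._writer.about(entity_id)
        self._sink.write(self._writer.drain())

    def end_document(self):
        # Emit only the footer.
        self._sink.write(self._writer.finish())


def demo():
    sink = io.StringIO()
    serializer = SketchSerializer(TinyWriter(), sink)
    serializer.start_document()
    for entity_id in ("Q1", "Q2"):
        serializer.serialize_entity(entity_id)
    serializer.end_document()
    return sink.getvalue()
```

The point of the split is that the header and footer are each written exactly once, while the per-entity calls in between can be repeated for an arbitrarily large dump.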

Event Timeline

daniel created this task. Mar 27 2015, 8:15 PM
daniel raised the priority of this task from to Needs Triage.
daniel updated the task description.
daniel added a project: Wikidata.
daniel added subscribers: daniel, Smalyshev.
Restricted Application added a subscriber: Aklapper. Mar 27 2015, 8:15 PM
daniel updated the task description. Mar 27 2015, 8:16 PM
daniel set Security to None.

Change 199543 had a related patch set uploaded (by Smalyshev):
T94224: Refactor prefixes to be output only once, eliminate need to clean prefixes on dump

https://gerrit.wikimedia.org/r/199543

Smalyshev triaged this task as Normal priority.

I'm not sure a separate sub-writer is even needed. See the patch: it looks like just adding a FINISH state is enough. We could tweak it further if you want, but I'm not sure we need to complicate it with sub-writers; it looks pretty good as it is.

Change 199543 merged by jenkins-bot:
T94224: Refactor prefixes to be output only once, eliminate need to clean prefixes on dump

https://gerrit.wikimedia.org/r/199543

Smalyshev closed this task as Resolved. Mar 31 2015, 9:09 PM