MCR: Import all slots from XML dumps
Closed, Resolved (Public)

Description

Once we have T174031: MCR: Include all slots in XML dumps, we need to also be able to read/import slots other than the main slot from dumps. This means implementing support for XML schema version 0.11 in WikiImporter. To enable this, we'll probably want to turn WikiRevision into a wrapper for MutableRevisionRecord.
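
To make the goal concrete, here is a minimal sketch (in Python rather than WikiImporter's PHP) of what reading every slot of one revision could look like. It is not the WikiImporter implementation. The sample XML is hypothetical and leaves out the export namespace and most revision fields; the element names (`content`, `role`, `model`, `format`, `text`) are my assumption about the 0.11 layout and should be checked against the published schema. `read_slots` is an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed <revision> element. Real dumps put these
# elements in the export namespace and carry more fields (timestamp,
# contributor, sha1, ...). Element names are assumed, not verified here.
SAMPLE_REVISION = """
<revision>
  <id>1</id>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text>Main slot text</text>
  <content>
    <role>mediainfo</role>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text>{"type": "mediainfo"}</text>
  </content>
</revision>
"""


def read_slots(revision_xml):
    """Return a dict mapping slot role to its model, format and text."""
    rev = ET.fromstring(revision_xml)

    # The main slot is assumed to sit directly under <revision>,
    # as it does in the older single-slot format.
    slots = {
        "main": {
            "model": rev.findtext("model"),
            "format": rev.findtext("format"),
            "text": rev.findtext("text") or "",
        }
    }

    # Every additional slot is assumed to be wrapped in its own <content>.
    for content in rev.findall("content"):
        role = content.findtext("role")
        slots[role] = {
            "model": content.findtext("model"),
            "format": content.findtext("format"),
            "text": content.findtext("text") or "",
        }
    return slots


slots = read_slots(SAMPLE_REVISION)
for role, slot in slots.items():
    print(role, slot["model"], slot["format"])
```

An importer that only understands the single-slot layout would read the top-level model, format and text and silently ignore the `content` children, which is the gap this task is about.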

Event Timeline

I just realized we never did that. It seems kind of important ;)

From other tickets I gather that there was agreement on a format that would contain all slots; it seems this was version 0.11: https://www.mediawiki.org/wiki/Requests_for_comment/Schema_update_for_multiple_content_objects_per_revision_(MCR)_in_XML_dumps#Schema

But the current dumps on Commons are version="0.10". So are there any dumps that include the slots, which in turn include the structured data?

cc @ArielGlenn, who might know the answer to the question

@Nuria 0.10 is still the default format; we should probably change that. Maybe this could even still make it into 1.34. I suppose we just forgot to move it forward. @CCicalese_WMF, thoughts?
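
For anyone wanting to check which schema version a given dump file uses, a minimal sketch in Python follows. It assumes the version is carried in the `version` attribute of the root `<mediawiki>` element, as in the `version="0.10"` quoted above; `dump_schema_version` is just an illustrative helper name, and the sample bytes are made up.

```python
import io
import xml.etree.ElementTree as ET


def dump_schema_version(stream):
    """Return the version attribute of the root element of an XML dump.

    Stops at the first start tag, so the rest of the dump is not parsed.
    """
    for _event, elem in ET.iterparse(stream, events=("start",)):
        # The first "start" event always belongs to the root element.
        return elem.get("version")
    return None


# Made-up miniature dump; a real one would be opened from disk in binary
# mode (and decompressed first if it is a .bz2 or .gz file).
sample = (
    b'<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/" '
    b'version="0.10"><page/></mediawiki>'
)
print(dump_schema_version(io.BytesIO(sample)))
```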

@daniel: just so I understand, since I know little about all this: at this time the slots that contain the structured data items on, say, a page on Commons are NOT included in the dumps with the page itself. Correct?

Is that structured data being dumped elsewhere on its own?

Not yet; there's a task for that but it's blocked on a performance issue. See https://phabricator.wikimedia.org/T222497 the blocker, and https://phabricator.wikimedia.org/T221917 the dumps task.

Data in slots other than the main slot are not dumped anywhere right now. This was tagged as Not A Blocker (tm) for the MVP. Ask @Abit and @Ramsey-WMF about the reasoning.

To clarify - the blocker is for the RDF dumps. Including the MediaInfo slot in the XML dump is not blocked on anything, we could just do it. Or am I missing something?

That's right, this is an answer to the question "Is that structured data being dumped elsewhere on its own" (like the wikidata entity dumps).

Putting this on the CPT clinic duty board as a "small project".

Change 586316 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] WIP: MCR import

https://gerrit.wikimedia.org/r/586316

Change 612417 had a related patch set uploaded (by Cicalese; owner: Cicalese):
[mediawiki/extensions/FileImporter@master] Fix constructor invocation and content model

https://gerrit.wikimedia.org/r/612417

Change 612536 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@master] Add import/export round trip test.

https://gerrit.wikimedia.org/r/612536

Change 612417 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Fix constructor invocation and content model

https://gerrit.wikimedia.org/r/612417

Change 612536 merged by jenkins-bot:
[mediawiki/core@master] Add import/export round trip test.

https://gerrit.wikimedia.org/r/612536

Change 614327 had a related patch set uploaded (by Cicalese; owner: Cicalese):
[mediawiki/core@master] Check for unknown slot.

https://gerrit.wikimedia.org/r/614327

Change 586316 merged by jenkins-bot:
[mediawiki/core@master] MCR import

https://gerrit.wikimedia.org/r/586316

Change 614044 had a related patch set uploaded (by Daniel Kinzler; owner: Daniel Kinzler):
[mediawiki/core@REL1_35] Add import/export round trip test.

https://gerrit.wikimedia.org/r/614044

Change 614044 merged by jenkins-bot:
[mediawiki/core@REL1_35] Add import/export round trip test.

https://gerrit.wikimedia.org/r/614044

Change 614045 had a related patch set uploaded (by Cicalese; owner: Daniel Kinzler):
[mediawiki/core@REL1_35] MCR import

https://gerrit.wikimedia.org/r/614045

Change 614766 had a related patch set uploaded (by Cicalese; owner: Cicalese):
[mediawiki/extensions/FileImporter@REL1_35] Fix constructor invocation and content model

https://gerrit.wikimedia.org/r/614766

Change 614045 merged by jenkins-bot:
[mediawiki/core@REL1_35] MCR import

https://gerrit.wikimedia.org/r/614045

Change 614766 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@REL1_35] Fix constructor invocation and content model

https://gerrit.wikimedia.org/r/614766

Change 614762 had a related patch set uploaded (by Cicalese; owner: Cicalese):
[mediawiki/core@master] Remove backward compatibility code from ImportableOldRevisionImporter

https://gerrit.wikimedia.org/r/614762

Change 614752 had a related patch set uploaded (by Cicalese; owner: Cicalese):
[mediawiki/extensions/FileImporter@master] Add SlotRoleRegistry to ImportableOldRevisionImporter constructor

https://gerrit.wikimedia.org/r/614752

Change 614327 merged by jenkins-bot:
[mediawiki/core@master] Check for unknown slot.

https://gerrit.wikimedia.org/r/614327

Change 614769 had a related patch set uploaded (by Daniel Kinzler; owner: Cicalese):
[mediawiki/core@REL1_35] Check for unknown slot.

https://gerrit.wikimedia.org/r/614769

Change 614769 merged by jenkins-bot:
[mediawiki/core@REL1_35] Check for unknown slot.

https://gerrit.wikimedia.org/r/614769

Change 614752 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@master] Add SlotRoleRegistry to ImportableOldRevisionImporter constructor

https://gerrit.wikimedia.org/r/614752

Change 615444 had a related patch set uploaded (by Daniel Kinzler; owner: Cicalese):
[mediawiki/extensions/FileImporter@REL1_35] Add SlotRoleRegistry to ImportableOldRevisionImporter constructor

https://gerrit.wikimedia.org/r/615444

Change 615444 merged by jenkins-bot:
[mediawiki/extensions/FileImporter@REL1_35] Add SlotRoleRegistry to ImportableOldRevisionImporter constructor

https://gerrit.wikimedia.org/r/615444

Change 614762 merged by jenkins-bot:
[mediawiki/core@master] Remove backward compatibility code from ImportableOldRevisionImporter

https://gerrit.wikimedia.org/r/614762

eprodromou subscribed.

Congratulations and good job!