
switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots
Open, Medium, Public

Description

This needs to be scheduled and widely announced, and then the switch should be flipped between dump runs. I'd like to see it happen for the Feb 1 run, which gives us a little over two months to get the word out and let people update their dump processing scripts.

Event Timeline

I'm going to send an email announcement to wikitech and xmldatadumps-l. Someone on the research and wikidata lists should forward the announcement there. I'm adding the relevant projects (sorry if they aren't the right ones; please feel free to move this wherever it belongs).

https://lists.wikimedia.org/pipermail/wikitech-l/2019-November/092821.html Email sent to wikitech-l and xmldatadumps-l. @leila would you be willing to forward to the research mailing lists? @hoo are you on the wikidata mailing list and can you forward it there? Thanks in advance :)

Forwarded to wikidata-tech for now; I'm not sure whether it should also go to wikidata-l proper.

Johan subscribed.

I'll include this in Tech News too.

Example: a wikitext-only revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2748</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" sha1="basgq6oyo0kf51ykrohsumsutvpda86" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

– almost the same, but the <text> tag now carries a sha1 attribute, and the <origin> element is new.
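Scripts that want to use the new per-slot sha1 attribute to check content could do something like the following. This is a minimal sketch, assuming the attribute follows the same convention as the existing revision-level <sha1> (SHA-1 of the UTF-8 content, written in base 36 and left-padded with zeros to the 31 characters seen in the examples). The names base36_sha1 and slot_sha1_matches are hypothetical helpers, not part of any MediaWiki tool.

```python
import hashlib

BASE36_DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(data: bytes) -> str:
    """SHA-1 of `data`, rendered in base 36 and zero-padded to 31 characters.

    Assumption: this matches the encoding used by the dump's sha1 values.
    """
    n = int(hashlib.sha1(data).hexdigest(), 16)
    digits = []
    while n:
        n, r = divmod(n, 36)
        digits.append(BASE36_DIGITS[r])
    # A 160-bit value needs at most 31 base-36 digits; pad shorter ones with '0'.
    return "".join(reversed(digits)).rjust(31, "0")


def slot_sha1_matches(text: str, expected: str) -> bool:
    """Compare a slot's <text> content against its sha1 attribute."""
    # Hash the UTF-8 bytes, not the Python str, since dumps are UTF-8 encoded.
    return base36_sha1(text.encode("utf-8")) == expected
```

Keeping the comparison on the encoded bytes avoids false mismatches on non-ASCII content.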

Example: a WikibaseMediaInfo revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" xml:space="preserve" />
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2224</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />
  <content>
    <role>mediainfo</role>
    <origin>2590</origin>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text bytes="371" sha1="oropqlvv0q2n9spse1s6autcvay4vqz" xml:space="preserve">{"type":"mediainfo","id":"M902","labels":[],"descriptions":[],"statements":{"P25":[{"mainsnak":{"snaktype":"value","property":"P25","hash":"183074b9158e8b72cc95b7f6c16d5ba5ab5d9544","datavalue":{"value":{"entity-type":"item","numeric-id":503,"id":"Q503"},"type":"wikibase-entityid"}},"type":"statement","id":"M902$d8b8679f-4a1e-bfc4-499b-67ee67f9e155","rank":"normal"}]}}</text>
  </content>
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

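– the main-slot <text> is still empty (this is a file page with no wikitext), but the mediainfo entity content now appears in a new <content> element with its own role, origin, model, format, and text.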
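For scripts that need to be updated, here is one way to read both examples with a single code path: treat the <model>/<format>/<text> children of <revision> as the main slot, and each <content> element as an additional slot keyed by its <role>. This is a minimal sketch using Python's xml.etree.ElementTree. The helper names (local, child, slot_info, revision_slots) are hypothetical, and it assumes real dump files wrap elements in a namespace such as http://www.mediawiki.org/xml/export-0.11/, which it simply strips.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    # Drop a "{namespace}" prefix, e.g. {http://www.mediawiki.org/xml/export-0.11/}text -> text.
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    """First direct child with the given local name, or None."""
    for c in elem:
        if local(c.tag) == name:
            return c
    return None


def slot_info(elem, role):
    """Collect origin/model/text/sha1 from a <revision> (main slot) or <content> element."""
    text = child(elem, "text")
    model = child(elem, "model")
    origin = child(elem, "origin")
    return {
        "role": role,
        "origin": origin.text if origin is not None else None,  # absent in 0.10
        "model": model.text if model is not None else None,
        "text": (text.text or "") if text is not None else None,  # <text ... /> -> ""
        "sha1": text.get("sha1") if text is not None else None,  # absent in 0.10
    }


def revision_slots(revision) -> dict:
    """Map slot role -> slot info for one <revision> element (0.10 or 0.11)."""
    slots = {"main": slot_info(revision, "main")}
    # 0.11 adds one <content> element per non-main slot, each with its own <role>.
    for content in revision:
        if local(content.tag) == "content":
            role_elem = child(content, "role")
            role = role_elem.text if role_elem is not None else None
            slots[role] = slot_info(content, role)
    return slots
```

For a full dump file, the <revision> elements would typically come from xml.etree.ElementTree.iterparse (clearing each element after use) rather than from parsing the whole file into memory.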


I sent it to wiki-research-l and analytics lists. thanks!

This is pending https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/ and related patches, so we're looking at March 1 if all goes well.

I'm removing the Research tag. Please ping us if we can support in any way.

Aklapper subscribed.

Removing task assignee due to inactivity, as this open task has been assigned for more than two years. See the email sent to the task assignee on February 6, 2022 (and T295729).

Please assign this task to yourself again if you still realistically [plan to] work on this task - it would be welcome.

If this task has been resolved in the meantime, or should not be worked on ("declined"), please update its task status via "Add Action… 🡒 Change Status".

Also see https://www.mediawiki.org/wiki/Bug_management/Assignee_cleanup for tips on how to best manage your individual work in Phabricator.