
switch xml/sql (and adds-changes) dumps to use the 0.11 schema with content from multiple slots
Open, Medium, Public

Description

This needs to be scheduled and widely announced, and then the switch flipped between dump runs. I'd like to see it happen for the Feb 1 run, which gives us a little over two months to get the word out and let folks update their dump processing scripts.

Event Timeline

I'm going to send an email announcement to wikitech-l and xmldatadumps-l. Someone on the research and Wikidata lists should forward the announcement there. Adding the relevant projects (sorry if they aren't right; please feel free to move this wherever it belongs).

https://lists.wikimedia.org/pipermail/wikitech-l/2019-November/092821.html Email sent to wikitech-l and xmldatadumps-l. @leila would you be willing to forward to the research mailing lists? @hoo are you on the wikidata mailing list and can you forward it there? Thanks in advance :)

Forwarded to wikidata-tech for now; not sure whether it should also go to wikidata-l proper.


I'll include this in Tech News too.

Example: a wikitext-only revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2748</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" sha1="basgq6oyo0kf51ykrohsumsutvpda86" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

– almost the same, but the <text> tag now carries a sha1 attribute, and the <origin> element (the ID of the revision in which this slot's content originated) is new.
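For scripts that must accept both schema versions, here is a minimal Python sketch. It assumes the new sha1 attribute uses the same encoding as the existing <sha1> element (SHA-1 of the UTF-8 text, written in base 36 and zero-padded to 31 characters), and it treats a missing <origin> or sha1 attribute as a 0.10 dump. The helper names (sha1_base36, check_text, revision_origin) are hypothetical and not part of any existing dump tooling.

```python
import hashlib
import xml.etree.ElementTree as ET

BASE36 = "0123456789abcdefghijklmnopqrstuvwxyz"


def local_name(tag):
    # Real dumps put elements in the export-0.11 namespace: "{uri}origin" -> "origin".
    return tag.rsplit("}", 1)[-1]


def child(elem, name):
    # First direct child with the given local name, or None.
    for c in elem:
        if local_name(c.tag) == name:
            return c
    return None


def sha1_base36(text):
    # SHA-1 of the UTF-8 text as a zero-padded, 31-character base-36 string.
    n = int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = BASE36[r] + out
    return out.rjust(31, "0")


def check_text(text_elem):
    # True/False for a 0.11 <text> with a sha1 attribute; None for 0.10 (no attribute).
    expected = text_elem.get("sha1")
    if expected is None:
        return None
    return sha1_base36(text_elem.text or "") == expected


def revision_origin(rev):
    # Integer <origin> of a revision, or None when absent (0.10 dumps).
    origin = child(rev, "origin")
    return int(origin.text) if origin is not None else None


# The empty main-slot <text> from the MediaInfo example below.
rev = ET.fromstring(
    "<revision><origin>2224</origin><model>wikitext</model>"
    '<text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />'
    "</revision>"
)
print(revision_origin(rev), check_text(child(rev, "text")))
```

On a full dump these helpers would more likely be called from an xml.etree.ElementTree.iterparse loop than from fromstring, so the whole file never has to sit in memory.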

Example: a WikibaseMediaInfo revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" xml:space="preserve" />
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2224</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />
  <content>
    <role>mediainfo</role>
    <origin>2590</origin>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text bytes="371" sha1="oropqlvv0q2n9spse1s6autcvay4vqz" xml:space="preserve">{"type":"mediainfo","id":"M902","labels":[],"descriptions":[],"statements":{"P25":[{"mainsnak":{"snaktype":"value","property":"P25","hash":"183074b9158e8b72cc95b7f6c16d5ba5ab5d9544","datavalue":{"value":{"entity-type":"item","numeric-id":503,"id":"Q503"},"type":"wikibase-entityid"}},"type":"statement","id":"M902$d8b8679f-4a1e-bfc4-499b-67ee67f9e155","rank":"normal"}]}}</text>
  </content>
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

– the main <text> is still empty (it's a file page), but the new <content> element carries the mediainfo slot's entity content.
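A dump consumer that wants every slot might collect them into a role-to-content mapping. This Python sketch assumes, as the examples above suggest, that the revision-level <origin>, <model>, <format> and <text> describe the "main" slot and that each <content> element describes one additional slot named by its <role>. The Slot class and the revision_slots function are illustrative names only.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Slot:
    role: str
    model: Optional[str]
    fmt: Optional[str]
    text: str
    origin: Optional[int]


def _local(tag):
    # Drop any "{namespace}" prefix so namespaced and bare snippets both work.
    return tag.rsplit("}", 1)[-1]


def _child_text(elem, name):
    # Text of the first direct child with this local name, or None.
    for c in elem:
        if _local(c.tag) == name:
            return c.text
    return None


def _slot(elem, role):
    # Build a Slot from an element holding origin/model/format/text children.
    origin = _child_text(elem, "origin")
    text = _child_text(elem, "text")
    return Slot(
        role=role,
        model=_child_text(elem, "model"),
        fmt=_child_text(elem, "format"),
        text=text or "",
        origin=int(origin) if origin else None,
    )


def revision_slots(rev) -> Dict[str, Slot]:
    # Main slot from the revision-level fields; one extra slot per <content>.
    slots = {"main": _slot(rev, "main")}
    for content in rev:
        if _local(content.tag) == "content":
            role = _child_text(content, "role") or "unknown"
            slots[role] = _slot(content, role)
    return slots
```

For the MediaInfo revision above, json.loads(slots["mediainfo"].text) would give the entity JSON, while slots["main"].text is the empty string. A 0.10 revision simply yields a mapping with only the "main" entry.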


I sent it to the wiki-research-l and analytics lists. Thanks!

This is blocked on https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/ and related patches, so we're looking at March 1 if all goes well.

I'm removing the Research tag. Please ping us if we can help in any way.