
switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots
Open, Medium, Public

Description

This needs to be scheduled, widely announced, and then the switch flipped between dump runs. I'd like to see it happen for the Feb 1 run, which gives us a little over two months to get the word out and let folks update their dump processing scripts.

Event Timeline

ArielGlenn triaged this task as Medium priority. Sat, Nov 23, 8:29 AM
ArielGlenn created this task.

I'm going to send an email announcement to wikitech-l and xmldatadumps-l. Someone on the research and wikidata lists should forward the announcement there. Adding the relevant projects (sorry if they aren't right; please feel free to move this to wherever it belongs).

https://lists.wikimedia.org/pipermail/wikitech-l/2019-November/092821.html Email sent to wikitech-l and xmldatadumps-l. @leila would you be willing to forward to the research mailing lists? @hoo are you on the wikidata mailing list and can you forward it there? Thanks in advance :)

Forwarded to wikidata-tech for now; not sure whether it should also go to wikidata-l proper.

Johan added a subscriber: Johan.

I'll include this in Tech News too.

Example: a wikitext-only revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2748</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" sha1="basgq6oyo0kf51ykrohsumsutvpda86" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>

This is almost the same, but the <text> tag now carries a sha1 attribute, and the <origin> element is new.
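For anyone updating a dump processing script, here is a minimal sketch of reading the new 0.11 fields from the wikitext example above with Python's standard xml.etree.ElementTree. It assumes the fragment has no XML namespace; in a real dump the root element declares xmlns="http://www.mediawiki.org/xml/export-0.11/", so tag lookups would need that prefix. The names read_main_slot and base36_sha1 are hypothetical. The helper assumes MediaWiki's convention of writing SHA-1 digests in base 36, zero-padded to 31 characters; comparing its output with the new sha1 attribute is one possible way to check slot content.

```python
import hashlib
import xml.etree.ElementTree as ET

# 0.11-style <revision> copied from the wikitext example above.
# Namespace omitted for brevity (real dumps declare the export-0.11 namespace).
REVISION_XML = """<revision>
  <origin>2748</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="16" sha1="basgq6oyo0kf51ykrohsumsutvpda86" xml:space="preserve">Wikitext content</text>
  <sha1>basgq6oyo0kf51ykrohsumsutvpda86</sha1>
</revision>"""

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def base36_sha1(data: bytes) -> str:
    """Hypothetical helper: SHA-1 of data in base 36, zero-padded to 31 chars."""
    n = int.from_bytes(hashlib.sha1(data).digest(), "big")
    out = ""
    while n:
        n, r = divmod(n, 36)
        out = DIGITS[r] + out
    return out.rjust(31, "0")


def read_main_slot(revision_xml: str) -> dict:
    """Hypothetical reader for the main-slot fields of one <revision>."""
    rev = ET.fromstring(revision_xml)
    text = rev.find("text")
    return {
        "origin": rev.findtext("origin"),  # absent (None) in 0.10 dumps
        "model": rev.findtext("model"),
        "bytes": int(text.get("bytes")),
        "text_sha1": text.get("sha1"),     # attribute new in 0.11
        "text": text.text or "",
        "rev_sha1": rev.findtext("sha1"),
    }


info = read_main_slot(REVISION_XML)
print(info["origin"], info["text_sha1"])  # 2748 basgq6oyo0kf51ykrohsumsutvpda86
```

A script that must accept both schema versions can treat a missing origin or sha1 attribute as "0.10 input" rather than as an error.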

Example: a WikibaseMediaInfo revision might change from

0.10
<revision>
  <!-- ... -->
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" xml:space="preserve" />
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

to

0.11
<revision>
  <!-- ... -->
  <origin>2224</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />
  <content>
    <role>mediainfo</role>
    <origin>2590</origin>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text bytes="371" sha1="oropqlvv0q2n9spse1s6autcvay4vqz" xml:space="preserve">{"type":"mediainfo","id":"M902","labels":[],"descriptions":[],"statements":{"P25":[{"mainsnak":{"snaktype":"value","property":"P25","hash":"183074b9158e8b72cc95b7f6c16d5ba5ab5d9544","datavalue":{"value":{"entity-type":"item","numeric-id":503,"id":"Q503"},"type":"wikibase-entityid"}},"type":"statement","id":"M902$d8b8679f-4a1e-bfc4-499b-67ee67f9e155","rank":"normal"}]}}</text>
  </content>
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>

The main <text> is still empty (it's a file page), but the mediainfo entity content, in its own <content> element, is new.
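A dump consumer that wants every slot, not just the main one, might walk the top-level fields and then each <content> element. The sketch below does that for a trimmed copy of the WikibaseMediaInfo example (namespace omitted, JSON shortened, and the per-slot sha1 attribute dropped because the text changed). Since the main slot's role is not written out in the XML, the sketch labels it "main", which is MediaWiki's name for that role; iter_slots is a hypothetical name.

```python
import json
import xml.etree.ElementTree as ET

# Trimmed 0.11-style revision modelled on the WikibaseMediaInfo example above.
REVISION_XML = """<revision>
  <origin>2224</origin>
  <model>wikitext</model>
  <format>text/x-wiki</format>
  <text bytes="0" sha1="phoiac9h4m842xq45sp7s6u21eteeq1" xml:space="preserve" />
  <content>
    <role>mediainfo</role>
    <origin>2590</origin>
    <model>wikibase-mediainfo</model>
    <format>application/json</format>
    <text bytes="32" xml:space="preserve">{"type":"mediainfo","id":"M902"}</text>
  </content>
  <sha1>q27phnond5qrm8u8zpnwo17ll81tohw</sha1>
</revision>"""


def iter_slots(revision_xml: str):
    """Hypothetical helper: yield (role, model, text) for every slot."""
    rev = ET.fromstring(revision_xml)
    # The main slot stays at the top level of <revision>, as in 0.10.
    yield "main", rev.findtext("model"), rev.findtext("text") or ""
    # Each additional slot is wrapped in its own <content> element.
    for content in rev.findall("content"):
        yield (content.findtext("role"), content.findtext("model"),
               content.findtext("text") or "")


slots = {role: (model, text) for role, model, text in iter_slots(REVISION_XML)}
print(sorted(slots))                               # ['main', 'mediainfo']
print(json.loads(slots["mediainfo"][1])["id"])     # M902
```

Scripts that only ever read the top-level <text> keep working on 0.11 output, but they will silently skip any non-main slot content.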

binbot added a subscriber: binbot. Wed, Nov 27, 5:36 PM
leila added a comment. Wed, Nov 27, 6:36 PM


I sent it to wiki-research-l and the analytics lists. Thanks!

Thanks for the forwards!