Consult Dumps users and other community members about the future of Dumps
Open, High, Public

Description

What is the problem/what do you want to achieve?

We have multiple questions about our dumps as we work to improve them.

How can we help you?

We would like to work with the community to get answers to some questions and rough consensus on some directions we decide on. In this order:

  1. If we move from XML to JSON Lines formatting, would you love this, hate it, or need time to migrate (and if so, how much)?
  2. We can output more incremental dumps. How would you like to see this done? In time chunks? What are your use cases for more incremental updates to the dumps?
  3. Which of the many dump artifacts that are generated do you need, and which would be very helpful but with some changes?
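
To make question 1 concrete, here is a minimal sketch of what moving from XML to JSON Lines could mean for a consumer: each page becomes one self-contained JSON object on its own line. The XML fragment is a simplified stand-in (real dump files carry more fields and an XML namespace), and the JSON field names (`title`, `page_id`, `revision_id`, `timestamp`, `text`) are hypothetical, since no schema has been decided.

```python
import json
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment: real dump files have more fields
# and an XML namespace, both omitted here for brevity.
SAMPLE_XML = """<mediawiki>
  <page>
    <title>Example</title>
    <id>1</id>
    <revision>
      <id>100</id>
      <timestamp>2020-01-01T00:00:00Z</timestamp>
      <text>Hello world</text>
    </revision>
  </page>
  <page>
    <title>Another page</title>
    <id>2</id>
    <revision>
      <id>200</id>
      <timestamp>2020-01-02T00:00:00Z</timestamp>
      <text>Line one
Line two</text>
    </revision>
  </page>
</mediawiki>"""


def pages_to_json_lines(xml_text):
    """Yield one JSON string per <page> element (field names are assumptions)."""
    root = ET.fromstring(xml_text)
    for page in root.iter("page"):
        rev = page.find("revision")
        record = {
            "title": page.findtext("title"),
            "page_id": int(page.findtext("id")),
            "revision_id": int(rev.findtext("id")),
            "timestamp": rev.findtext("timestamp"),
            "text": rev.findtext("text"),
        }
        # json.dumps escapes embedded newlines, so each record stays on one line.
        yield json.dumps(record, ensure_ascii=False)


lines = list(pages_to_json_lines(SAMPLE_XML))
for line in lines:
    print(line)
```

A consumer of such a file could then read it line by line and call `json.loads` on each line, without needing a streaming XML parser.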

What does success look like?

We have decisions documented, with dissenting opinions noted, and consensus highlighted on the questions above (I think these three are good but we may add one or two more as we go).

What is your deadline?

The sooner the better for #1, but it's better to take our time and be thorough with #2 and #3.

Event Timeline

@Milimetric Thanks for filing this. While we're more than happy to help, we as a team have no idea who those users are, nor can we figure that out in any way other than, well, asking on generic fora, "are you a dump user?". Will you give us data about them?

PS: Also please give me an idea of a timeline. Are we talking "this month", "next FY", ...? TY.

@Elitre: good point. I think we're no more knowledgeable about dumps users either. Ariel had a presentation that explained why it's hard to even know who they are (because dumps are mirrored all over the world and we can't track that usage at all). I think if we go forward with the approach we took in the Wikistats 2 consultation, where Erik Zachte asked on the mailing lists and his personal blog and the existing Wikistats 1 website about it, we might be ok. I think ultimately we involved the right people there. For reference: https://www.mediawiki.org/wiki/Wikistats_2.0_Design_Project/RequestforFeedback

As for the timeline, for #1 we'd like to start right away, as soon as we figure out the right approach as per above ^. For #2 and #3, we can follow up after we start and see who engages and all that. If that answer's too wishy-washy, please do ping me and we can chat in more detail.

@Milimetric I'll come up with a plan and LYK when I can, thanks.

Elitre triaged this task as High priority.

We had our first meeting today and I think I have what I need to start working on this request. Given that my team's offsite is a bit in the way, this is not going to reach the public this month.

The team LMK that there's no rush. I started a doc for them to work on the language of the questions.

@Elitre is there any update on this task? I'd like to transfer the info to the Asana Mov Calendar.

Not really, we're progressing slowly. I added what seems to be a related task now.

AutoWikiBrowser has a database scanner that uses the dumps. Just pointing it out, as it seems relevant.

Are you planning on incorporating the Wikidata QID into this dump?
It would be great to discuss this, you can contact us at wikidata-integrations@wikimedia.de

This is following up from this ticket: https://phabricator.wikimedia.org/T197090

Hi @xcollazo
It would be great to collaborate on getting this field into the dump - if possible can you point us in the direction of who to talk to about that? Thanks so much
wikidata-integrations@wikimedia.de

Actually I already found out who to contact, thanks!

Sorry for the delay and all the hoops to contact us. Just in case copying here:

@SuzanneWood-WMDE thanks for the update, please reach out to the DPE / myself regarding anything Dumps 2 (and 1) related going forward.