
Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors
Closed, Resolved · Public

Description

Please provide the following information.

  • Provide a short summary of your proposed post for the Wikimedia Technical Blog:

Deep analysis of collaboration on Wikimedia projects is now possible with the public MediaWiki History dataset. Some exciting research is already underway. Read on for an introduction to the dataset and the kinds of questions it can help you answer.

(Blog post is mostly written: https://docs.google.com/document/d/1v0lOO2aF3wmlA6TYJsgXXSPuc-0iFf3CqGd66VvYLXY/edit)

Data Science, Data Engineering

  • What audience or audiences do you think your post is appropriate for:

Technical community, broader research community

  • Will you need assistance with writing your blog post?

No

  • Does your post need to be published by a certain date?

No

Event Timeline

Nuria created this task. Aug 3 2020, 9:28 PM
Restricted Application added a subscriber: Aklapper. Aug 3 2020, 9:28 PM

Heja @Nuria, thanks for this post! @srodlund is currently out, so I'm kindly asking for a bit more patience with this one. :)

Milimetric renamed this task from "Story idea for Blog: The best dataset to measure content and contributors to wikipedia" to "Story idea for Blog: The Best Dataset on Wikimedia Content and Contributors". Aug 5 2020, 2:51 PM
Milimetric updated the task description.

@Nuria I am back and happy to proceed! The note above says the post is "mostly" written. Let me know when you feel like it is done, and I will start editing.

Thanks for your patience with this!!!

@Milimetric <--I saw that you were editing this doc, too, so I thought I would check in and see if you feel it is ready to edit.

@srodlund yes, totally ready! Edit away, and feel free to ping me if you want to hack on it together. My main goal with it is to make that "historified edit" idea very approachable, and I'm very open to changing anything/everything to make that happen.

@Milimetric I will take a first pass at it (asynch), and then if we want to meet (synch) we can.

Should I pay any attention to the section at the end?

> @Milimetric I will take a first pass at it (asynch), and then if we want to meet (synch) we can.

Sounds good

> Should I pay any attention to the section at the end?

The "Old Ideas" one? Nono, sorry, I should've deleted that.

@Milimetric Okay, I did a first pass of this. Please take a look and let me know if you have questions or want to synch up!

Wow, thanks so much. I think I cleaned up most of my messes, can you take another look? And then maybe we can do a quick live sync and go over it?

@Milimetric Great! I will take a look at this today or tomorrow. We could set up a synch meeting for Thursday (Aug 27) or early next week if you like. Let me know if that would work for you.

@srodlund: both work, but maybe since you have Andrew's 3 to get through, let's do next week? I'll pencil something in.

Feel free to look at my calendar and invite me to a meeting. It is up to date :-)

Sarah has already reviewed mine. I'm mostly just waiting for reviews from you all (and also for a response from Confluent about a potential cross-post). Yours could probably go out before mine do if yours is ready.

@Milimetric Just checking in to see how you are progressing on this?

I'm sorry, I keep trying to bring it up at our team meetings, but we've been busy with end of quarter. I think if you have a lull and need to get a post out, you can go with it. It works perfectly well as an introduction and we can follow up on it with more in-depth examples of research.

Talked it over with the team, we think we should post this. So post away, anytime that's good for your schedule. We have some good ideas for follow-ups.

@Milimetric Great! Since we have 2 posts going up this week, I am going to publish this one early next week if that works for you.

@Milimetric Just an update. I had a number of posts this week, and plan on posting this one on Thursday, 1 Oct. Will let you know when it is published.

@Milimetric this is published, but can you please review it to make sure there are no errors before I announce widely? Particularly, I want to make sure the code snippets appear correctly.

Unfortunately, the code syntax highlight function in the blog's editor is not working right now, so this formatting is the best I can get at the moment. Once this is fixed, we can update with better formatting.

https://techblog.wikimedia.org/2020/10/01/mediawiki-history-the-best-dataset-on-wikimedia-content-and-contributors/

Looks great @srodlund. Do you think the formatting will be fixed in a week or so? Maybe I can help? If so, then let's wait to announce, otherwise do announce it. I hadn't read it in a bit and I like it now that I'm revisiting. I think it's gonna be ok :) And I want to do a talk about it somewhere soon.

srodlund added a subscriber: bd808. Edited Thu, Oct 1, 7:26 PM

@Milimetric I'm meeting w/@bd808 to help troubleshoot on Monday; it may be something we can fix then, but I'm not entirely sure. Let me know if you want me to unpublish it until then. Otherwise, I can leave it and just not announce until then.

@Milimetric -- @bd808 was able to fix the code syntax highlighter on the blog's editor, and I applied this to the two main blocks on your post. Can you take a look and let me know what you think? If it all looks correct, I'll send out tweets and announce today.

(I think it looks really nice!)

@srodlund looks awesome, thanks to you and Bryan :)

srodlund closed this task as Resolved. Fri, Oct 2, 4:31 PM
srodlund claimed this task.

Yay! Announced on Twitter and resolving this ticket! Thanks for all your work on this! It's a really good post!