
Story idea for Blog: Migration of the Analytics Hadoop infrastructure to Apache Bigtop
Open, Needs Triage, Public

Description

  • Provide a short summary of your proposed post for the Wikimedia Technical Blog. Blog readers will see this as the preview to your post:

Migration of the Wikimedia data infrastructure to Apache Bigtop.
The Analytics engineering team moved its infrastructure to Apache Bigtop (https://bigtop.apache.org/), a fully open source Hadoop package distribution. The process involved testing and preparing the migration in a separate environment, a 400TB data backup, and eventually a long upgrade day that involved almost one hundred hosts.

I'd say big picture. I'd love to highlight two things:

  1. The technical challenges migrating to the new distribution.
  2. The long-term benefits: we switched to a fully open source project that we can actively influence and contribute to (as opposed to before, when we were mostly just consuming upstream packages).
  • Which audience or audiences do you think your post is appropriate for?:

Anybody interested in the Hadoop ecosystem and in how to run a fully open source distribution of it. It will also be interesting as a use case for the Apache Bigtop upstream devs: they have been very helpful and interested in our use case, so this blog post will probably be shared on their blog as well eventually.

  • Will you need assistance with writing your blog post, or do you already have a draft? If you have a draft, please provide a link here:

Still no draft, but we can start one asap; I wanted to get the green light from the Tech blog's admins first. Since the migration was long, it might make sense to do two blog posts (one for the preparation/testing and one for the upgrade itself).

  • Does your post need to be published by a certain date?

No urgency.

Hadoop's logo is an elephant, so something like the following could be nice:
https://upload.wikimedia.org/wikipedia/commons/3/37/African_Bush_Elephant.jpg

  • Do you have any other questions or comments?

No questions :)

Once your request is received, a technical blog admin will review it and reach out to you through Phabricator.

Event Timeline

@elukey

Hey hey! This sounds like a great idea for a blog post (or two)! It is perfectly fine to plan for more than one post.

Let me know when your draft is ready. I will read it and share feedback and suggestions with you. Really looking forward to reading this!

Ottomata added a project: Analytics-Kanban.
Ottomata moved this task from Backlog to Q4 2020/2021 on the Analytics-Clusters board.

Hey @elukey I just wanted to touch base to see how you are coming along with this. Let me know how I can support you!

@srodlund hi! Thanks a lot. I am still writing the first draft; it got deprioritized due to the work to be done for the end of Q3. I hope to have something ready next week!

Sounds great! Looking forward to reading it!

@srodlund sorry for the lag, Joseph and I should have a draft this week :)

No problem! I'll keep an eye out for it!

@srodlund draft ready! I shared the gdoc with you and the Analytics team :)

@elukey Awesome!!! I will take a pass at this tomorrow!