Doing the needful
Wikimedia Release Engineering Team Blog

Investigate a PHP segmentation fault

Written by hashar on Jul 28 2023, 12:06 PM.

Summary

Read more...

CI: Get notified immediately when a job fails

Written by kostajh on Mar 7 2023, 9:59 AM.

If you've submitted patches for MediaWiki core, skins or extensions, you've seen this output in Gerrit:

Read more...

Shrinking H2 database files

Written by hashar on Dec 16 2022, 3:38 PM.

Our code review system, Gerrit, has several caches, the largest ones being backed by disk. The disk caches offload memory usage and persist the data between restarts. Since Gerrit is a Java application, the caches are stored in H2 database files, and I recently had to find out how to connect to them in order to inspect their content and reduce their size.

Read more...

scap backport Makes Deployments Easy

Written by jeena on Sep 26 2022, 10:47 PM.

MediaWiki developers, have you ever thought, “I wish I could deploy my own code for MediaWiki”? Now you can! More deploys! More fun!

Read more...

Production Excellence #46: July & August 2022

Written by Krinkle on Sep 8 2022, 11:02 PM.

How are we doing in our strive for operational excellence? Read on to find out!

Read more...

Production Excellence #45: June 2022

Written by Krinkle on Jul 30 2022, 12:14 AM.

How are we doing in our strive for operational excellence? Read on to find out!

Read more...

Production Excellence #44: May 2022

Written by Krinkle on Jun 16 2022, 12:07 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

GitLab-a-thon!

Written by brennen on May 31 2022, 6:30 PM.

Release Engineering's "GitLab-a-thon" sprint for May 10th-24th (roughly) focused on the mechanics of migrating a Wikimedia service to GitLab, setting up a CI pipeline, building container images from that service, and publishing images to the Wikimedia registry. We selected the Blubber project as a good candidate for experimentation:

Read more...

Production Excellence #43: April 2022

Written by Krinkle on May 12 2022, 9:00 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #42: March 2022

Written by Krinkle on Apr 21 2022, 9:29 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

What We Learned from Trainsperiment Week

Written by thcipriani on Apr 20 2022, 8:15 PM.

Developers should own the process of putting their code into production. They should decide when to deploy, monitor their deployment, and make decisions about rollback.

Read more...

A Trainsperiments Week Reflection

Written by dduvall on Apr 1 2022, 2:29 AM.

Over here in the Release-Engineering-Team, Train Deployment is usually a rotating duty. We've written about it before, so I won't go into the exact process, but I want to tell you something new about it.

Read more...

Production Excellence #41: February 2022

Written by Krinkle on Mar 15 2022, 12:59 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

GitLab: Rethinking how we handle access control

Written by brennen on Mar 4 2022, 10:44 PM.

I'll start with a bit of general administrivia. First, our migration of Wikimedia code review & CI to GitLab continues, and we're mindful that people could use regular updates on progress. Second, I need to think through some stuff about the project, and doing that in writing is helpful for all involved. I'm going to try writing occasional blog entries here for both purposes.

Read more...

Diving Into Our Deployment Data

Written by thcipriani on Feb 15 2022, 9:17 PM.

If you’ve ever experienced the pride of seeing your name on MediaWiki's contributor list, you've been involved in our deployment process (whether you knew it or not).

Read more...

Production Excellence #40: January 2022

Written by Krinkle on Feb 4 2022, 4:32 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #39: December 2021

Written by Krinkle on Jan 17 2022, 10:16 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #38: November 2021

Written by Krinkle on Dec 12 2021, 1:34 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #37: October 2021

Written by Krinkle on Nov 5 2021, 2:05 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Benchmarking MediaWiki with PHPBench

Written by kostajh on Oct 28 2021, 1:54 PM.

This post gives a quick introduction to a benchmarking tool, phpbench, ready for you to experiment with in core and skins/extensions.[1]

Read more...

Production Excellence #36: September 2021

Written by Krinkle on Oct 21 2021, 11:31 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

How we deploy code

Written by thcipriani on Sep 27 2021, 6:44 PM.

Last week I spoke to a few of my Wikimedia Foundation (WMF) colleagues about how we deploy code—I completely botched it. I got too complex too fast. It only hit me later—to explain deployments, I need to start with a lie.

Read more...

Production Excellence #35: August 2021

Written by Krinkle on Sep 8 2021, 3:53 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #34: July 2021

Written by Krinkle on Aug 19 2021, 3:49 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #33: June 2021

Written by Krinkle on Jul 14 2021, 3:34 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Shrinking the tasks backlog

Written by hashar on Jul 2 2021, 3:05 PM.

The Release Engineering team triages tasks flagged Release-Engineering-Team on a weekly basis. It is an all-hands-on-deck, one-hour meeting in which we pick tasks one by one and find out what to do with them. We started with more than a hundred of them and are now down to just a dozen or so, most filed since the last meeting.

Read more...

Production Excellence #32: May 2021

Written by Krinkle on Jun 21 2021, 1:31 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #31: April 2021

Written by Krinkle on May 13 2021, 3:49 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #30: March 2021

Written by Krinkle on Apr 3 2021, 12:20 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Tracking memory issue in a Java application

Written by hashar on Mar 12 2021, 9:38 AM.

One of the critical pieces of our infrastructure is Gerrit. It hosts most of our git repositories and is the primary code review interface. Gerrit is written in the Java programming language, which runs in the Java Virtual Machine (JVM). For a couple of years we have been struggling with memory issues which eventually led to an unresponsive service and unattended restarts. The symptoms were the usual ones: application responses getting slower and degrading until server-side errors rendered the service unusable. Eventually the JVM terminates with:

Read more...

Production Excellence #29: February 2021

Written by Krinkle on Mar 6 2021, 1:03 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #28: January 2021

Written by Krinkle on Feb 19 2021, 6:45 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #27: December 2020

Written by Krinkle on Feb 4 2021, 5:46 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #26: November 2020

Written by Krinkle on Dec 15 2020, 9:49 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Runnable runbooks

Written by mmodell on Dec 11 2020, 11:51 PM.

Recently there has been a small effort on the Release-Engineering-Team to encode some of our institutional knowledge as runbooks linked from a page in the team's wiki space.

Read more...

Production Excellence #25: October 2020

Written by Krinkle on Nov 24 2020, 5:13 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #24: September 2020

Written by Krinkle on Oct 23 2020, 11:51 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

CI now updates your deployment-charts

Written by jeena on Sep 24 2020, 5:34 PM.

If you're making changes to a service that is deployed to Kubernetes, it sure is annoying to have to update the Helm deployment-chart values with the newest image version before you deploy. At least, that's how I felt when developing on our Dockerfile-generating service, Blubber.

Read more...

Production Excellence #23: July & August 2020

Written by Krinkle on Sep 23 2020, 6:10 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #22: June 2020

Written by Krinkle on Jul 23 2020, 3:25 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Faster source code fetches thanks to git protocol version 2

Written by hashar on Jul 6 2020, 10:57 AM.

In 2015 I noticed that git fetches from our most active repositories were unreasonably slow, sometimes taking up to a minute, which hindered fast development and collaboration. You can read some of the debugging details I gathered at the time on T103990. Gerrit upstream was aware of the issue and a workaround was presented, though we never got around to implementing it.

Read more...

Production Excellence #21: May 2020

Written by Krinkle on Jun 24 2020, 8:00 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Celebrating 600,000 commits for Wikimedia

Written by Jdforrester-WMF on May 29 2020, 10:47 PM.

Earlier today, the 600,000th commit was pushed to Wikimedia's Gerrit server. We thought we'd take this moment to reflect on the developer services we offer and our community of developers, be they Wikimedia staff, third party workers, or volunteers.

Read more...

Production Excellence #20: April 2020

Written by Krinkle on May 14 2020, 4:10 PM.

How are we doing on that strive for operational excellence during these unprecedented times?

Read more...

Production Excellence #19: February 2020

Written by Krinkle on Mar 24 2020, 9:40 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #18: January 2020

Written by Krinkle on Feb 28 2020, 7:39 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #17: December 2019

Written by Krinkle on Jan 10 2020, 2:51 AM.

How’d we do in our strive for operational excellence in November and December? Read on to find out!

Read more...

Production Excellence #16: October 2019

Written by Krinkle on Nov 8 2019, 5:57 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #15: September 2019

Written by Krinkle on Oct 24 2019, 11:25 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Integrating code coverage metrics with your development workflow

Written by kostajh on Oct 9 2019, 10:04 AM.

In Changes and improvements to PHPUnit testing in MediaWiki, I wrote about efforts to help speed up PHPUnit code coverage generation for local development.[0] While this improves code coverage generation time for local development, it could be better.

Read more...

Introducing Phatality

Written by mmodell on Oct 7 2019, 12:36 AM.

This past week marks the release of a little tool that I've been working on for a while. In fact, it's something I've wanted to build for more than a year. But before I tell you about the solution, I need to describe the problem that I set out to solve.

Read more...

Production Excellence #14: August 2019

Written by Krinkle on Oct 3 2019, 4:27 AM.

How’d we do in our strive for operational excellence in August? Read on to find out!

Read more...

Production Excellence #13: July 2019

Written by Krinkle on Aug 30 2019, 8:08 PM.

How’re we doing on that strive for operational excellence? Read this first anniversary edition to find out!

Read more...

Production Excellence #12: June 2019

Written by Krinkle on Jul 31 2019, 6:44 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Changes and improvements to PHPUnit testing in MediaWiki

Written by kostajh on Jul 16 2019, 4:13 AM.

Building off the work done at the Prague Hackathon (T216260), we're happy to announce some significant changes and improvements to the PHP testing tools included with MediaWiki.

Read more...

Production Excellence #11: May 2019

Written by Krinkle on Jul 1 2019, 6:56 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #10: April 2019

Written by Krinkle on May 31 2019, 7:21 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Introducing the codehealth pipeline beta

Written by kostajh on May 14 2019, 8:29 PM.

After many months of discussion, work and consultation across teams and departments[0], and with much gratitude and appreciation to the hard work and patience of @thcipriani and @hashar, the Code-Health-Metrics group is pleased to announce the introduction of the code health pipeline. The pipeline is currently in beta and enabled for GrowthExperiments, soon to be followed by Notifications, PageTriage, and StructuredDiscussions. (If you'd like to enable the pipeline for an extension you maintain or contribute to, please reach out to us via the comments on this post.)

Read more...

Production Excellence #9: March 2019

Written by Krinkle on Apr 21 2019, 6:51 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Quibble hibernated, it is time to flourish

Written by hashar on Mar 28 2019, 11:48 AM.

Writing blog posts is neither my job nor something that I enjoy, so I am late with the Quibble updates. The last one, Blog Post: Quibble in summer, was written in September 2018 and I forgot to publish it until now. You might want to read it first to get a glimpse of some nice changes that were implemented last summer.

Read more...

Quibble in summer

Written by hashar on Mar 28 2019, 10:42 AM.

Note: this post was published on 03/28 but was originally written in September 2018, after Quibble 0.0.26, and was never published at the time.

Read more...

CI working group report, with recommendations of new tools to try

Written by LarsWirzenius on Mar 25 2019, 6:29 PM.

The working group to consider future CI tooling for Wikimedia has finished and produced a report. The report is at https://www.mediawiki.org/wiki/Wikimedia_Release_Engineering_Team/CI_Futures_WG/Report and the short summary is that the release engineering team should do prototype implementations of Argo, GitLab CI/CD, and Zuul v3.

Read more...

Production Excellence #8: February 2019

Written by Krinkle on Mar 21 2019, 7:11 PM.

How’d we do in our strive for operational excellence? Read on to find out!

Read more...

Help my CI job fails with exit status -11

Written by hashar on Mar 21 2019, 9:52 AM.

For a few weeks, a CI job had PHPUnit tests abruptly ending with:

Read more...

Work progresses on CI tool evaluation

Written by LarsWirzenius on Mar 8 2019, 4:59 PM.

The working group to consider future tooling for continuous integration is making progress (see previous blog post J148 for more information). We're looking at and evaluating alternatives and learning of new needs within WMF.

Read more...

Choosing tools for continuous integration

Written by LarsWirzenius on Feb 28 2019, 6:27 PM.

The Release Engineering team has started a working group to discuss and consider our future continuous integration tooling. Please help!

Read more...

Production Excellence #7: January 2019

Written by Krinkle on Feb 13 2019, 3:53 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #6: December 2018

Written by Krinkle on Jan 22 2019, 2:54 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Gerrit now automatically adds reviewers

Written by hashar on Jan 17 2019, 4:53 PM.
WARNING: 20210305 the reviewers-by-blame Gerrit plugin was disabled after it was announced in this blog post. It turns out the author of a change is not necessarily an adequate reviewer suggestion in our context, and some people were being added as reviewers for a whole lot more code than they would expect. The post still has some worthwhile information on how one can find reviewers.
Read more...

Code Health Metrics and SonarQube

Written by zeljkofilipin on Jan 10 2019, 2:54 PM.
  1. Code Health
Read more...

Production Excellence #5: November 2018

Written by Krinkle on Dec 12 2018, 4:40 AM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Production Excellence #4: October 2018

Written by Krinkle on Nov 28 2018, 5:47 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

Incident Documentation: An Unexpected Journey

Written by zeljkofilipin on Nov 22 2018, 6:06 PM.

The Release Engineering team wants to continually improve the quality of our software over time. One of the ways in which we hoped to do that this year is by creating more useful Selenium smoke tests. (From now on, test will be used instead of Selenium test.) This blog post is about how we determined where the tests should focus and the relative priority.

Read more...

Bring in 'da noise, bring in defunct. It's a zombie party!

Written by dduvall on Nov 16 2018, 7:22 PM.

Halloween is a full two weeks behind us here in the United States, but it's still on my mind. It happens to be my favorite holiday, and I receive it both gleefully and somberly.

Read more...

Wikimedia Release Engineering's 1st Annual Developer Satisfaction Survey

Written by zeljkofilipin on Nov 7 2018, 4:02 PM.
NOTE: The survey is now closed
Read more...

Production Excellence #3: September 2018

Written by Krinkle on Sep 25 2018, 6:41 PM.

How’d we do in our strive for operational excellence last month? Read on to find out!

Read more...

An introduction to Task Types in Phabricator

Written by mmodell on Sep 20 2018, 5:22 PM.

This blog post describes how we are using the "Task Types" feature in Phabricator to facilitate better tracking of work and to streamline workflows with custom fields. Additionally, I will be soliciting feedback about potential use cases which could take further advantage of this feature.

Read more...

mediawiki_selenium 1.8.1 Ruby Gem Released

Written by zeljkofilipin on Jun 14 2018, 3:05 PM.

It has been a while since the last mediawiki_selenium release! 💎

Read more...

Quibble in May

Written by hashar on Jun 1 2018, 8:36 PM.

Quibble is the new test runner for MediaWiki (see the intro Blog Post: Introducing Quibble). This post gives an update on what happened during May 2018.

Read more...

Technical Debt - The Contagion Effect

Written by Jrbranaa on May 24 2018, 11:16 PM.

One particularly interesting topic discussed during the Hackathon Technical Debt session (T194934) was that of the contagious aspect of technical debt. Although this makes sense in hindsight, it's not something that I had really given much thought to previously.

Read more...

Run Selenium tests using Quibble and Docker

Written by zeljkofilipin on May 2 2018, 1:46 PM.

Dependencies are Git, Python 3, and Docker Community Edition (CE).

Read more...

Introducing Quibble

Written by hashar on Apr 30 2018, 9:09 AM.

Running all tests for MediaWiki and matching what CI/Jenkins is running has been a constant challenge for everyone, myself included. Today I am introducing Quibble, a Python script that clones MediaWiki, sets it up and runs test commands.

Read more...

Selenium tests in Node.js project retrospective

Written by zeljkofilipin on Mar 26 2018, 2:28 PM.

I have been working on the project with more or less focus on it since 2015. Maybe the easiest way to follow the project is by taking a look at a few epic tasks:

Read more...

Phabricator Updates for February 2018

Written by mmodell on Feb 15 2018, 7:55 AM.

This is a digest of the updates from several weeks of changelogs which are published upstream. This is an incomplete list, as I've cherry-picked just the changes which I think will be of significant interest to end-users of Wikimedia's Phabricator. Please see the upstream changelogs for a detailed overview of everything that's changed recently.

Read more...

Selenium Ruby framework deprecated

Written by zeljkofilipin on Oct 30 2017, 1:44 PM.

This is your friendly but final warning that we are replacing Selenium tests written in Ruby with tests in Node.js. There will be no more reminders. The Ruby stack will no longer be maintained. For more information see T139740 and T173488.

Read more...

Tech talk: Selenium tests in Node.js

Written by zeljkofilipin on Oct 27 2017, 12:04 PM.

Željko Filipin, Engineer (Contractor) from Release Engineering team. That's me! 👋

Read more...

Selenium Ruby framework deprecation (September)

Written by zeljkofilipin on Sep 25 2017, 3:27 PM.

Originally an email sent on September 25 2017 to qa, engineering and wikitech-l mailing lists.

Read more...

Selenium Ruby framework deprecation

Written by zeljkofilipin on Sep 25 2017, 3:14 PM.

Originally an email sent on August 23 2017 to qa, engineering and wikitech-l mailing lists.

Read more...

Selenium tests in Node.js

Written by zeljkofilipin on Sep 25 2017, 2:57 PM.

Originally an email sent on April 3 2017 to qa, engineering and wikitech-l mailing lists.

Read more...

New feature: Embed videos from Commons into Phabricator markup

Written by mmodell on Jun 1 2017, 11:49 PM.

I just finished deploying an update to Phabricator which includes a simple but rather useful feature:

Read more...

Sponsored Phabricator Improvements

Written by mmodell on Jul 27 2016, 10:44 AM.

In T135327, the WMF Technical Collaboration team collected a list of Phabricator bugs and feature requests from the Wikimedia Developer Community. After identifying the most promising requests from the community, these were presented to Phacility (the organization that builds and maintains Phabricator) for sponsored prioritization.

Read more...

Code Review Office Hours

Written by mmodell on May 9 2016, 9:50 PM.

Starting Thursday May 12th, 13:00 PDT (20:00 GMT) we will be having the first weekly Code Review office hours on freenode IRC in the #wikimedia-codereview channel.

Read more...

What's new: Lots of improvements on phabricator.wikimedia.org

Written by mmodell on Feb 23 2016, 12:23 AM.

Not a lot has changed for Wikimedia's instance of Phabricator over the past few months. That's because a lot has been happening behind the scenes, as well as upstream at Phacility. Members of the Release-Engineering-Team and Team-Practices group have been working since December 2015 to integrate various upstream changes; however, nothing was released to our production instance because there were so many important features that were in progress and not yet fully usable. Additionally, we had to figure out exactly how these features would fit with the specific needs of our project and test a lot of functionality to be sure that we would not break anyone's workflows.

Read more...
About Doing the needful

Occasional updates from the Release-Engineering-Team