git.wikimedia.org is unstable
Closed, DeclinedPublic

Description

Since about Apr 3 2014, 18:00 UTC, git.wikimedia.org has been highly unstable: it
reaches high CPU usage (and a full heap) about 4 hours after being
restarted.
When this happens, the site is mostly unresponsive and basically useless.
Access logs do not show a lot of activity, but they may be missing something.
More investigation is needed, and alternative software may be an option.

Details

Reference
rt7204

Event Timeline

rtimport raised the priority of this task from to Normal.
rtimport set Reference to rt7204.
Joe created this task.Apr 4 2014, 4:05 PM

On Fri Apr 04 16:05:09 2014, glavagetto wrote:

Since about Apr 3 2014, 18:00 UTC, git.wikimedia.org has been highly unstable: it
reaches high CPU usage (and a full heap) about 4 hours after being
restarted.

When this happens, the site is mostly unresponsive and basically useless.

Access logs do not show a lot of activity, but they may be missing something.

More investigation is needed, and alternative software may be an option.

The alternative is moving to Phabricator. Another stop-gap between then
and now would be a waste of time imho.
-Chad

Status changed from 'new' to 'open' by RT_System

Dzahn added a comment.Apr 14 2014, 5:12 AM

for completeness, here was the gerrit link that tried to tune it a bit. it
seemed to help at first for a couple hours, then the issue came back
https://gerrit.wikimedia.org/r/#/c/123848/
should we at least/just add a cron job in puppet that restarts the service?
shrug

should we at least/just add a cron job in puppet that restarts the service?

Is this something actionable? Shall this be moved to public Wikimedia-Git-or-Gerrit project if this should remain open?

Dzahn changed the visibility from "WMF-NDA (Project)" to "Public (No Login Required)".Aug 5 2015, 6:03 AM
Dzahn changed the edit policy from "WMF-NDA (Project)" to "All Users".
Dzahn set Security to None.
Restricted Application added a subscriber: Matanya.Aug 5 2015, 6:03 AM
Dzahn closed this task as Declined.Aug 5 2015, 6:04 AM
Dzahn claimed this task.

@Joe what do you think, was "declined" justified since there is the workboard to deprecate gitblit, or do you want it reopened until the actual switch?

Dzahn reopened this task as Open.
Dzahn removed Dzahn as the assignee of this task.
Joe added a comment.Aug 5 2015, 6:35 AM

I think this ticket should stay open, to testify how much harm and inertia some abandonware might cause. We should have had the courage to admit that:

  1. Gitblit doesn't fit our needs in terms of stability
  2. We had no time in a year to really work on a transition
  3. We refused to surrender to the evidence that this dinosaur was left alive for no good reason, and has eaten a lot of ops man-hours in the process.

I'd like us all, the next time we say something is unstable and someone else says "we won't work on it, there is a plan for X", to ask for a decommission date, one that will hold regardless of whether anyone has prepared the promised alternative (X).

I personally feel ashamed we've left in production an application that probably has a 75% uptime without anyone caring (besides the occasional restart from ops) for more than a year.

It's down again. Perhaps some kind of monitoring could be implemented to detect this until the migration to a stable successor system is implemented.

Perhaps some kind of monitoring could be implemented to detect this until the migration to a stable successor system is implemented.

icinga does report it to #wikimedia-operations... but can't you use diffusion or github instead?

Kghbln added a comment.EditedAug 15 2015, 9:57 AM

Ah, nobody there at operations. Was not aware of this. :p Yeah, since Diffusion is kinda useless (it does not allow downloads and also has a pretty confusing interface), I will use GitHub from now on. Should have done so much earlier. However, this is another story.

Anybody aware of the fact that about every extension's page points to git.wikimedia.org?

@Glaisher Thanks for your suggestion!

Anybody aware of the fact that about every extension's page points to git.wikimedia.org?

Yes. See T108864: Update mediawiki.org templates to link to Diffusion, not gitblit.

Cool, it's in the making. :)

Dzahn added a comment.Sep 3 2015, 7:24 PM

Why did you remove that project? It seems the right one to me. This is about not using gitblit anymore, that would solve the unstable-ness.

greg added a subscriber: greg.Sep 3 2015, 7:30 PM

Why did you remove that project? It seems the right one to me. This is about not using gitblit anymore, that would solve the unstable-ness.

Because this isn't a blocker to doing the deprecation. There are lots of things that would benefit from not using gitblit, but tracking them in that project doesn't help.

Dzahn added a comment.Sep 3 2015, 7:57 PM

Isn't blocking tracked by the "blocked-by" feature rather than the project tag?

greg added a comment.Sep 3 2015, 9:17 PM

This is offtopic (sorry everyone else) but:

  1. what is being lost by not having the gitblit deprecate project? Nothing, is my contention. This (the wikimedia-git-or-gerrit) is the proper project for this task.
  2. the gitblit deprecate project says "Project for tracking the deprecation of Gitblit and use of Phabricator's Diffusion as its replacement". This task does not help us deprecate gitblit any faster. It'd be like adding every issue that we have with Gerrit to the Gerrit-deprecation project; not helpful.

hashar changed the task status from Open to Stalled.
hashar added a subscriber: hashar.

Stalling the ticket. The gitblit software powering git.wikimedia.org is in the process of being replaced by Phabricator Diffusion.

I have made T111465: [keyresult] Deprecate gitblit in favor of Diffusion a blocker.

See also project: Gitblit-Deprecate.

greg edited projects, added Gitblit; removed Gerrit.Sep 18 2015, 9:04 PM
MoritzMuehlenhoff closed this task as Declined.May 3 2016, 8:23 AM

gitblit is currently being migrated to Diffusion (T123718), closing this bug.