Since about Apr 3 2014, 18:00 UTC, git.wikimedia.org has been highly unstable and
reaches high CPU usage (and a full heap) about 4 hours after being
restarted.
When this happens, the site is mostly unresponsive and basically useless.
Access logs do not show a lot of activity, but they may miss something.
More investigation is needed, and alternative software may be an option.
Description
Details
- Reference
- rt7204
Status | Subtype | Assigned | Task
---|---|---|---
Declined | | None | T83702 git.wikimedia.org is unstable
Duplicate | | None | T104356 git.wikimedia.org sometimes not loading (e.g. showing a 503)
Event Timeline
On Fri Apr 04 16:05:09 2014, glavagetto wrote:
Since about Apr 3 2014, 18:00 UTC, git.wikimedia.org has been highly unstable
and reaches high CPU usage (and a full heap) about 4 hours after being
restarted. When this happens, the site is mostly unresponsive and basically
useless. Access logs do not show a lot of activity, but they may miss
something. More investigation is needed, and alternative software may be
an option.
The alternative is moving to Phabricator. Another stop-gap between then
and now would be a waste of time imho.
-Chad
For completeness, here is the Gerrit change that tried to tune it a bit. It
seemed to help at first for a couple of hours, then the issue came back:
https://gerrit.wikimedia.org/r/#/c/123848/
should we at least/just add a cron job in puppet that restarts the service?
shrug
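To make the restart idea concrete, here is a minimal sketch of the decision logic such a job could run. It is only an illustration under assumptions: the service name `gitblit`, the use of the `service` command, and the three-failure threshold are all hypothetical and not taken from the actual puppet setup.

```python
import subprocess


def should_restart(recent_results, threshold=3):
    """Return True when the last `threshold` health checks all failed.

    `recent_results` is a list of booleans, oldest first (True = healthy).
    """
    if len(recent_results) < threshold:
        return False
    return all(not ok for ok in recent_results[-threshold:])


def restart_service(name, runner=subprocess.run):
    """Restart a service through the `service` wrapper.

    The service name is an assumption; `runner` is injectable so the
    function can be exercised without touching a real host.
    """
    return runner(["service", name, "restart"], check=False)
```

Requiring several consecutive failures avoids restarting on a single slow response, though it would only paper over the underlying heap problem.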
should we at least/just add a cron job in puppet that restarts the service?
Is this something actionable? Shall this be moved to the public Wikimedia-Git-or-Gerrit project if it should remain open?
@Joe what do you think, was "declined" justified since there is the workboard to deprecate gitblit, or do you want it reopened until the actual switch?
I think this ticket should stay open, to testify how much harm and inertia some abandonware might cause. We should have had the courage to admit that:
- Gitblit doesn't fit our needs in terms of stability
- We had no time in a year to really work on a transition
- We refused to surrender to the evidence that this dinosaur was left alive for no good reason, and has eaten a lot of ops man-hours in the process.
I'd like us all, the next time we say something is unstable and someone else says "we won't work on it, there is a plan for X" to ask for a time of decommission, that will happen at that date independently of someone preparing the promised alternative (X).
I personally feel ashamed we've left in production an application that probably has a 75% uptime without anyone caring (besides the occasional restart from ops) for more than a year.
It's down again. Perhaps some kind of monitoring could be implemented to detect this until the migration to a stable successor system is implemented.
icinga does report it to #wikimedia-operations... but can't you use diffusion or github instead?
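For anyone who wants to check from their own side, a rough sketch of an HTTP health probe might look like the following. It is not how the Icinga check is implemented; the function names, the 10-second timeout, and the status labels are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request


def classify_status(status_code):
    """Map an HTTP status code (or None for no response) to a health label."""
    if status_code is None:
        return "down"
    if 200 <= status_code < 400:
        return "ok"
    if status_code >= 500:
        return "down"
    return "degraded"


def probe(url, timeout=10.0):
    """Return the HTTP status code for `url`, or None if no response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 503.
        return err.code
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, or timeout.
        return None
```

Calling `classify_status(probe("https://git.wikimedia.org/"))` would then yield one of the three labels; a 503 like the one reported in T104356 is classified as "down".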
Ah, nobody there at operations. Was not aware of this. :p Yeah, since Diffusion is kinda useless since it does not allow downloads and also has a pretty confusing interface I will use GitHub from now on. Should have done so much earlier. However, this is another story.
Anybody aware of the fact that about every extension's page points to git.wikimedia.org?
@Glaisher Thanks for your suggestion!
Why did you remove that project? It seems the right one to me. This is about not using gitblit anymore, that would solve the unstable-ness.
Because this isn't a blocker to doing the deprecation. There are lots of things that would benefit from not using gitblit, but tracking them in that project doesn't help.
This is offtopic (sorry everyone else) but:
- what is being lost by not having the gitblit deprecate project? Nothing, is my contention. This (the wikimedia-git-or-gerrit) is the proper project for this task.
- the gitblit deprecate project says "Project for tracking the deprecation of Gitblit and use of Phabricator's Diffusion as its replacement". This task does not help us deprecate gitblit any faster. It'd be like adding every issue that we have with Gerrit to the Gerrit-deprecation project; not helpful.
Stalling the ticket. The gitblit software powering git.wikimedia.org is in the process of being replaced by Phabricator Diffusion.
I have made T111465: [keyresult] Deprecate gitblit in favor of Diffusion a blocker.
See also project: Gitblit-Deprecate.