Fri, Nov 16
@RobH would you know if it's possible to add physical storage to this machine? If not we'll have to work out a different solution.
Two days later -- after adding better monitoring and blocking bots from indexing git blame for giant files with long histories -- we seem to be in a stable place.
Thu, Nov 15
What's the use case for a longer time on the canaries? Do you want to check something, or allow more time for traffic to hit those machines? If you want to check something, maybe we could prompt the deployer to move off of canaries and onto the rest of production rather than having a configurable time.
Wed, Nov 14
Now available for gerrit admins at: https://gerrit.wikimedia.org/r/monitoring
Seems like the metrics reporter plugin hasn't received any updates for 8 months now: https://gerrit.googlesource.com/plugins/metrics-reporter-jmx/
Fri, Nov 9
Thu, Nov 8
Done, let me know if anything is amiss.
Wed, Nov 7
After re-deploy of wmf.3 to labswiki, I no longer see this error. Thanks for the quick action on this!
Tue, Nov 6
Mon, Nov 5
@akosiaris helped to roll wmf.2 out today. It required us to use cumin to depool each appserver and restart hhvm as part of deployment:
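The exact cumin invocation wasn't captured here; a hypothetical sketch of the cycle (the `A:mw-appserver` alias and the `depool`/`pool` wrapper commands are assumptions, not the actual commands used):

```shell
# Hypothetical: roll through the appservers one at a time (-b 1),
# depooling each host, restarting hhvm, and repooling before moving on.
sudo cumin -b 1 'A:mw-appserver' 'depool && service hhvm restart && pool'
```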
Fri, Nov 2
Thu, Nov 1
Removed as a deployment blocker, but leaving open for further review per IRC.
Hrm, I'm no longer seeing these errors in logstash, and I see that a couple of patches are merged and at least one is backported to 1.33.0-wmf.2. Does anything remain to be resolved? Is this still blocking further train rollout?
Sounds like the general consensus is to start with integration/config and work from there, so I'll go ahead and close out this task.
Wed, Oct 31
I don't think this is related to partial blocks. I noticed it at a low volume when I rolled forward to 1.33.0-wmf.2 and filed T208468.
<volans|off> ok all confirmed, the only one without that field in the core DBs is only db2050 only for ruwikiquote
Changing priority after adding as train blocker
Tue, Oct 30
Added T207707 as a subtask since it is about getting additional storage space without specifying any movement of the storage driver.
Mon, Oct 29
A git grep for 10.68 yields a good amount of stuff related to beta-cluster.
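One caveat with that search: an unescaped dot is a regex wildcard, so a bare `git grep 10.68` would also match strings like `10x68`. Escaping the dot (or using `-F`) keeps the match literal:

```shell
# Escape the dot so only the literal prefix "10.68" matches,
# not an arbitrary character in that position (e.g. "10x68").
git grep -n '10\.68'

# Equivalent fixed-string form:
git grep -nF '10.68'
```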
So it looks like
>>> os.utime('blazegraph-service-0.3.1-SNAPSHOT.war', None)
>>> stat = os.lstat('blazegraph-service-0.3.1-SNAPSHOT.war')
>>> os.utime('blazegraph-service-0.3.1-SNAPSHOT.war', (stat.st_atime, stat.st_mtime + 1))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
OSError: [Errno 1] Operation not permitted: 'blazegraph-service-0.3.1-SNAPSHOT.war'
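That EPERM is consistent with the usual utimes(2) permission rules: `os.utime(path, None)` ("touch to now") needs only write access to the file, while passing explicit timestamps requires owning the file (or CAP_FOWNER), which the user here presumably doesn't for that .war. A minimal sketch on a scratch file we do own, where both forms succeed:

```python
import os
import tempfile

# Create a scratch file that we own, so both os.utime forms are permitted.
fd, path = tempfile.mkstemp()
os.close(fd)

# times=None ("touch to now") only needs write permission on the file.
os.utime(path, None)

# Explicit timestamps require ownership (or CAP_FOWNER); on a file owned
# by another user this raises EPERM, as in the paste above.
st = os.lstat(path)
os.utime(path, (st.st_atime, st.st_mtime + 1))
assert abs(os.lstat(path).st_mtime - (st.st_mtime + 1)) < 1e-3

os.remove(path)
```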