Joe (Giuseppe Lavagetto)
Spy

Projects (22)

User Details

User Since
Oct 3 2014, 5:57 AM (206 w, 5 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
GLavagetto (WMF)

Recent Activity

Thu, Sep 13

Joe added a comment to T203039: Storage of data for recommendation API.

don't have libraries and abstractions for accessing MySQL from our nodejs services. Is that correct?

That's the easy part, node has great support for MySQL with an abundance of drivers and frameworks.

I believe the issue is that we don't have access to MySQL from node services and, honestly, I don't think we should have access to the main DB from node services, at least not for this use-case. If we had some other MySQL cluster that would be the best option.

Another consideration is that here we're distributing a pre-learned AI model. I believe there should be industry standards or best practices on how to deploy such data, but it's not my area of expertise. @bmansurov are you aware of any?

Thu, Sep 13, 1:23 PM · Operations, DBA, Services (designing), Research
Joe added a comment to T203959: SRE quarterly goal: allow MediaWiki requests to be served by PHP7 alongside HHVM.

@Legoktm I'm ok delaying this into next quarter, or even the one after that; but I think php 7.2 is indeed a possibility; there are packages that should be easy to backport so if support in MediaWiki is there by next quarter, I'd be happy to work on this :)

Thu, Sep 13, 6:29 AM · Operations

Wed, Sep 12

Joe added a comment to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.".

Ok, I think I found the issue:

Wed, Sep 12, 6:43 AM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services

Tue, Sep 11

Joe added a comment to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.".

So the only config difference is that in codfw we call the MediaWiki API via HTTPS, while in eqiad we call it via HTTP, and I think there are some subtle differences in how we do it that might explain why wikitech would fail via HTTPS: it's not hosted on the main cluster.

Tue, Sep 11, 9:19 PM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services
Joe added a comment to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.".

RB or Parsoid mis-configuration?

Tue, Sep 11, 9:11 PM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services
Joe added a comment to T203039: Storage of data for recommendation API.

AIUI, the reason why we're not using MySQL (which would probably fit this storage model as well as, if not better than, Cassandra) is just that we don't have libraries and abstractions for accessing MySQL from our nodejs services. Is that correct?

Tue, Sep 11, 4:55 PM · Operations, DBA, Services (designing), Research

Mon, Sep 10

Joe added a comment to T203674: Debian package or files managed my puppet for pt-kill-wmf.

This is not true if a binary debian package is built, as proposed. In fact, you can consider a binary-only package (built with dpkg-deb) a glorified tarball.

I don't see how? I said to banyek that if a .deb was decided, we would source-control it in a new repo on operations/software/package name or operation/debs/package name (e.g. like we do with pybal), leaving the upstream control intact and pushing our changes on top. Doing that in puppet source control is quite messy: do you want to import all upstream changes into our puppet? Patch changes? Keeping it separate gives better control.

Mon, Sep 10, 4:35 PM · Puppet, Operations
Joe added a comment to T203959: SRE quarterly goal: allow MediaWiki requests to be served by PHP7 alongside HHVM.

We're probably not going to get to the stretch goals, but it should be noted that MediaWiki is still not ready to run on PHP 7.2 itself, so we don't really have an alternative and we need to stick to 7.0 for now.

Mon, Sep 10, 12:58 PM · Operations
Joe created T203959: SRE quarterly goal: allow MediaWiki requests to be served by PHP7 alongside HHVM.
Mon, Sep 10, 12:56 PM · Operations
Joe triaged T203944: Create a spicerack cookbook for restoring an etcd cluster from backups as Normal priority.
Mon, Sep 10, 9:26 AM · Operations-Software-Development, User-jijiki, User-Joe, Operations
Joe created T203943: Convert automation scripts to spicerack cookbooks.
Mon, Sep 10, 9:22 AM · Operations-Software-Development, User-jijiki, User-Joe, Operations
Joe created T203932: Old links to the donate page on wikimediafoundation.org get redirected weirdly.
Mon, Sep 10, 6:19 AM · fundraising-tech-ops, wikimediafoundation.org

Fri, Sep 7

Joe added a comment to T203674: Debian package or files managed my puppet for pt-kill-wmf.

So my initial suggestion was to create a debian package for the following reasons:

  • Source control patches on a separate repo so upgrades are easy (given the package is mostly a patched upstream source with extra patches that fix our bugs or add new functionality)
Fri, Sep 7, 8:48 AM · Puppet, Operations
Joe added a comment to T192370: Deploy mcrouter to production as a wancache backend.

Note that memcached-pecl (which uses Nutcracker) is still used in wmf-config in two places:

  1. On all wikis, for parser cache. [wmf-config/CommonSettings.php#mysql-multiwrite]
Fri, Sep 7, 6:43 AM · Patch-For-Review, Performance-Team (Radar), Availability (MediaWiki-MultiDC), Operations

Thu, Sep 6

Marostegui awarded T199124: Remove all usages of $::mw_primary on puppet a Mountain of Wealth token.
Thu, Sep 6, 9:42 AM · Patch-For-Review, Puppet, DBA, Operations
Joe closed T203479: labtestweb2001: Memcached error for key on server "127.0.0.1:11213": SERVER HAS FAILED as Resolved.
Thu, Sep 6, 9:16 AM · Patch-For-Review, wikitech.wikimedia.org, Wikimedia-production-error, cloud-services-team
Joe added a comment to T203479: labtestweb2001: Memcached error for key on server "127.0.0.1:11213": SERVER HAS FAILED.

The problem is that labswebtest machines are configured to use labstestwiki, and that we didn't configure those to use their local nutcracker, but the global mcrouter, which doesn't make any sense.

Thu, Sep 6, 8:50 AM · Patch-For-Review, wikitech.wikimedia.org, Wikimedia-production-error, cloud-services-team
Joe claimed T203479: labtestweb2001: Memcached error for key on server "127.0.0.1:11213": SERVER HAS FAILED.
Thu, Sep 6, 8:49 AM · Patch-For-Review, wikitech.wikimedia.org, Wikimedia-production-error, cloud-services-team
Joe added a comment to T203626: deploy1001 can't talk to memcached, breaking invalidation of RL localization cache.

Clearly I just forgot to merge a change at the time of the mcrouter rollout, sorry about that.

Thu, Sep 6, 6:21 AM · Performance-Team, Operations
Joe claimed T203626: deploy1001 can't talk to memcached, breaking invalidation of RL localization cache.
Thu, Sep 6, 6:04 AM · Performance-Team, Operations

Thu, Aug 23

Eevans awarded T201804: restbase2003 has a broken disk (at least) a Cookie token.
Thu, Aug 23, 4:51 PM · Services (watching), ops-codfw, Operations
Joe added a comment to T202476: Give thiemowmde permission to upload wikidiff2 releases (releasers-wikidiff2).
  • I don't know what "WMCS" stands for, despite working with Wikimedia infrastructure for about a decade.
  • I find it hard to tell what the "cloud" in "WMF cloud services" includes. Aren't the production servers also in some kind of "cloud"? Isn't everything in the "cloud" nowadays? What's the meaning of this word?
  • L3 states to use different keys for "production" and "labs", but does not explain what "labs" includes. Gerrit, for example, is not on wmflabs.org but on wikimedia.org. Does this still count as being part of "labs"?

    I would highly appreciate a language more people are able to understand, if that's possible. Avoiding abbreviations is always a good start. Thank you.

    And again, what's the benefit if the two are stored in the same key pass anyway? (Something I assume most people do.)
Thu, Aug 23, 9:22 AM · Patch-For-Review, Operations, SRE-Access-Requests, User-Addshore, wikidiff2

Wed, Aug 22

Joe created T202504: Evaluate VMWare's Harbour as a docker registry.
Wed, Aug 22, 10:24 AM · Continuous-Integration-Infrastructure (shipyard), Kubernetes, Operations

Aug 17 2018

Joe created T202149: Exception thrown for failure to save settings appears ~ 1000 times/day.
Aug 17 2018, 4:24 PM · MW-1.32-release-notes (WMF-deploy-2018-09-18 (1.32.0-wmf.22)), Patch-For-Review, BetaFeatures, Wikimedia-production-error, MediaWiki-Authentication-and-authorization

Aug 16 2018

Joe added a comment to T201986: cassandra-a instance on aqs1007 is not starting.

For the record: I removed the file (still on disk at /srv/cassandra-a/commitlog/CommitLog-5-1530620590775.log.bak) once I noticed it was all zeroes.

Aug 16 2018, 6:59 AM · Cassandra, Operations

Aug 14 2018

Joe merged T201855: onboarding Effie Mouzeli into T201816: Onboarding Effie Mouzeli.
Aug 14 2018, 12:46 PM · SRE-Access-Requests, Patch-For-Review, Operations
Joe merged task T201855: onboarding Effie Mouzeli into T201816: Onboarding Effie Mouzeli.
Aug 14 2018, 12:46 PM · Operations
Joe added a comment to T201409: Harmonise the identification of requests across our stack.

We also need internal requests to be traced, so I would assume we need all services to generate a request ID whenever they receive a request that has none.

Aug 14 2018, 4:44 AM · Performance-Team (Radar), Patch-For-Review, Operations, Services (designing), TechCom-RFC, User-mobrovac, Traffic

Aug 13 2018

Joe triaged T201849: Request production global root access for Effie Mouzeli as Normal priority.
Aug 13 2018, 3:19 PM · Patch-For-Review, SRE-Access-Requests, Operations
Joe claimed T201816: Onboarding Effie Mouzeli.
Aug 13 2018, 10:00 AM · SRE-Access-Requests, Patch-For-Review, Operations
Joe created T201816: Onboarding Effie Mouzeli.
Aug 13 2018, 10:00 AM · SRE-Access-Requests, Patch-For-Review, Operations
Joe added a comment to T110209: Maintenance scripts should fail on unknown parameters.

There was one case of failure - one of the dumps scripts failed (see T201772). I would call this a success!

Aug 13 2018, 7:34 AM · MW-1.32-release-notes (WMF-deploy-2018-08-07 (1.32.0-wmf.16)), Patch-For-Review, Core-Platform-Team (CPT-Q1-Jul-Sep-2018), Wikimedia-Incident, Incident-20150825-Redis, MediaWiki-Maintenance-scripts
Joe created T201804: restbase2003 has a broken disk (at least).
Aug 13 2018, 6:53 AM · Services (watching), ops-codfw, Operations

Aug 7 2018

Joe added a comment to T164609: Merge cache_misc into cache_text functionally.

Sometimes we get 503 peaks from a cache_misc application like phabricator or gerrit; knowing the origin of the 5xxs in broad categories ("public traffic for the sites" vs "miscellanea") was very useful IMHO; do we have a way to preserve such information?

Aug 7 2018, 7:16 AM · Patch-For-Review, Operations, Traffic

Aug 3 2018

Joe added a comment to T201103: Reconsider use of RESTBase k-r-v storage for mobileapps.

[ ... ]

  • What cache hit ratio we have at the restbase layer for MCS-related entities

Assuming I understand you correctly, the answer is essentially 100% here, because we pre-generate everything.

Aug 3 2018, 3:01 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, Services (designing), RESTBase, Cassandra, User-Eevans
Joe added a comment to T201103: Reconsider use of RESTBase k-r-v storage for mobileapps.

I have a few comments on this topic. Specifically:

Aug 3 2018, 2:47 PM · Patch-For-Review, Reading-Infrastructure-Team-Backlog, Services (designing), RESTBase, Cassandra, User-Eevans
Joe added a comment to T201140: Puppetize the installation of PHP-FPM on the MediaWiki hosts.

Looking at the modules tagged php on puppetforge:

Aug 3 2018, 10:06 AM · Patch-For-Review, User-Joe, User-ArielGlenn, Operations
Joe moved T201140: Puppetize the installation of PHP-FPM on the MediaWiki hosts from Backlog to Doing on the User-Joe board.
Aug 3 2018, 9:55 AM · Patch-For-Review, User-Joe, User-ArielGlenn, Operations
Joe moved T197126: Create tool to handle the state of database configuration in MediaWiki in etcd from Doing to Blocked on others on the User-Joe board.
Aug 3 2018, 9:55 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe moved T198220: Stop and remove old job runners from Doing to Blocking others on the User-Joe board.
Aug 3 2018, 9:54 AM · WMF-JobQueue, Patch-For-Review, User-Joe, Operations, Services (watching), EventBus, Analytics
Joe claimed T201140: Puppetize the installation of PHP-FPM on the MediaWiki hosts.
Aug 3 2018, 9:54 AM · Patch-For-Review, User-Joe, User-ArielGlenn, Operations
Joe updated the task description for T201139: Intermittent connectivity issues in eqiad's row C.
Aug 3 2018, 8:47 AM · netops, Operations
Joe triaged T201140: Puppetize the installation of PHP-FPM on the MediaWiki hosts as Normal priority.
Aug 3 2018, 8:31 AM · Patch-For-Review, User-Joe, User-ArielGlenn, Operations

Aug 2 2018

Joe closed T115899: Move scap target configuration to etcd, a subtask of T80395: Update dsh node groups from puppet, as Resolved.
Aug 2 2018, 1:34 PM · Operations
Joe closed T115899: Move scap target configuration to etcd as Resolved.
Aug 2 2018, 1:34 PM · Scap (Scap3-MediaWiki-MVP), scap2, Operations
Joe added a comment to T115899: Move scap target configuration to etcd.

We ended up generating the dsh lists in production from etcd, which is ok as a solution without asking scap to know about its details. I think we can close this ticket.

Aug 2 2018, 1:34 PM · Scap (Scap3-MediaWiki-MVP), scap2, Operations

Aug 1 2018

Joe closed T200799: Add email addresses for new techcom members to techcom@wikimedia.org as Resolved.
Aug 1 2018, 6:51 AM · Operations

Jul 31 2018

Joe claimed T200799: Add email addresses for new techcom members to techcom@wikimedia.org.
Jul 31 2018, 1:54 PM · Operations
Joe created T200799: Add email addresses for new techcom members to techcom@wikimedia.org.
Jul 31 2018, 1:53 PM · Operations
Joe added a comment to T196968: Re-organize the apache configuration for MediaWiki in puppet.

@Krenair I think I will just reproduce the patches I did to the mediawiki_test environment in the main one, that looks safer given we already know those patches are ok.

Jul 31 2018, 10:25 AM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Joe added a comment to T200720: docker-pkg should attempt to pull dependent images from the registry.

I guess we should add a command line switch to jump between the two behaviours.

Jul 31 2018, 10:22 AM · docker-pkg
Joe added a comment to T200722: releng/mediawiki-phpcs-dryrun fails to upload to docker-registry.wikimedia.org.

Hi, I'm not sure I understand what behaviour you would prefer.

Jul 31 2018, 10:08 AM · Operations, Patch-For-Review, docker-pkg

Jul 26 2018

Ladsgroup awarded T197126: Create tool to handle the state of database configuration in MediaWiki in etcd a Love token.
Jul 26 2018, 11:56 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA

Jul 17 2018

Joe added a comment to T199594: Exception "Job queue is read-only".

Thanks for unearthing this, @Krinkle. This is probably the last thing we forgot to change in the JobQueue switch. So, both the JobQueue and the DB layer in MW are controlled by the $wgReadOnly variable, which is set to a truthy value in codfw. It is read from etcd.

It seems that the way forward would be to decouple these into multiple settings, since now we have the semantics of "enqueuing a job" and "executing a job" which don't happen in the same place any longer. @Joe, @Pchelolo what do you think?

Jul 17 2018, 8:52 AM · Services (done), MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), User-Joe, Operations, Wikimedia-production-error, Core-Platform-Team, WMF-JobQueue
Joe edited P3855 etcd_recovery_generator.py.
Jul 17 2018, 6:06 AM · Operations
Joe edited P3855 etcd_recovery_generator.py.
Jul 17 2018, 6:05 AM · Operations

Jul 13 2018

Joe added a comment to T181208: Migrate translatewiki.net to PHP7.

@Joe I think you got a wrong task :)

Jul 13 2018, 8:11 AM · translatewiki.net
elukey awarded T196968: Re-organize the apache configuration for MediaWiki in puppet a Love token.
Jul 13 2018, 5:59 AM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Krinkle awarded T196968: Re-organize the apache configuration for MediaWiki in puppet an Orange Medal token.
Jul 13 2018, 12:07 AM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations

Jul 9 2018

Joe added a subtask for T176370: Migrate to PHP 7 in WMF production: T196968: Re-organize the apache configuration for MediaWiki in puppet.
Jul 9 2018, 2:52 PM · Core-Platform-Team, TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
Joe added parent tasks for T196968: Re-organize the apache configuration for MediaWiki in puppet: T181208: Migrate translatewiki.net to PHP7, T176370: Migrate to PHP 7 in WMF production.
Jul 9 2018, 2:52 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Joe added a subtask for T181208: Migrate translatewiki.net to PHP7: T196968: Re-organize the apache configuration for MediaWiki in puppet.
Jul 9 2018, 2:52 PM · translatewiki.net

Jul 6 2018

Joe added a comment to T118331: Alert when used_memory gets too high for redis queues.

Closing as declined as we've removed the redis-based jobqueue.

Jul 6 2018, 12:49 PM · Patch-For-Review, Operations
Joe closed T118331: Alert when used_memory gets too high for redis queues as Declined.
Jul 6 2018, 12:48 PM · Patch-For-Review, Operations

Jul 5 2018

Joe added a project to T198256: RFC: Modern Event Platform - Choose Schema Tech: Operations.
Jul 5 2018, 5:56 AM · TechCom-RFC (TechCom-Approved), Operations, Services (designing), Analytics-EventLogging, EventBus, Analytics
Joe added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

Yeah, both protobufs and thrift are options, but neither has the advantages that Avro does, yet both have many of the same disadvantages.

Jul 5 2018, 5:55 AM · TechCom-RFC (TechCom-Approved), Operations, Services (designing), Analytics-EventLogging, EventBus, Analytics

Jul 4 2018

Joe moved T196968: Re-organize the apache configuration for MediaWiki in puppet from Backlog to Doing on the User-Joe board.
Jul 4 2018, 1:54 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Joe moved T196685: rack/setup/install rdb10[09|10].eqiad.wmnet from Backlog to Blocking others on the User-Joe board.
Jul 4 2018, 8:42 AM · User-Joe, User-Elukey, Operations

Jul 2 2018

Joe claimed T198220: Stop and remove old job runners.
Jul 2 2018, 1:11 PM · WMF-JobQueue, Patch-For-Review, User-Joe, Operations, Services (watching), EventBus, Analytics
Joe moved T198220: Stop and remove old job runners from Backlog to Doing on the User-Joe board.
Jul 2 2018, 1:11 PM · WMF-JobQueue, Patch-For-Review, User-Joe, Operations, Services (watching), EventBus, Analytics
Joe moved T197126: Create tool to handle the state of database configuration in MediaWiki in etcd from Backlog to Doing on the User-Joe board.
Jul 2 2018, 1:03 PM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe added a comment to T198239: Rollout use of mcrouter for MediaWiki in production.

+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.

Jul 2 2018, 9:34 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Jul 1 2018

Joe renamed T197550: Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes from traaaaaaaa to Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes.
Jul 1 2018, 6:09 AM · Phabricator
Joe closed T197550: Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes as Resolved.
Jul 1 2018, 6:08 AM · Phabricator
Joe renamed T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff from evbaaaaaaa to php-memcached 3.0 (PHP 7) incompatible with BagOStuff.
Jul 1 2018, 5:51 AM · MW-1.30-release-notes, MW-1.31-release-notes, MW-1.29-release-notes, MW-1.27-release-notes, MW-1.31-release, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Performance-Team, PHP 7.0 support, MediaWiki-Platform-Team, Operations

Jun 27 2018

Joe added a comment to T184715: pybal's "can-depool" logic only takes downServers into account.

Reopened as this is still not fixed, see https://wikitech.wikimedia.org/wiki/Incident_documentation/20180626-LoadBalancers

Jun 27 2018, 2:00 PM · Patch-For-Review, Pybal, Traffic, Operations
Joe reopened T184715: pybal's "can-depool" logic only takes downServers into account as "Open".
Jun 27 2018, 1:58 PM · Patch-For-Review, Pybal, Traffic, Operations

Jun 25 2018

Joe added a comment to T103886: Translation cache exhaustion caused by changes to PHP code in file scope.

Does that actually still make sense at this point? We'll get rid of HHVM in 6-9 months and we don't have current issues with the TC cache, while enabling it in general could actually expose some subtle bugs. Not opposing it per se, but wondering whether the benefit warrants the potential risks.

Jun 25 2018, 6:33 AM · User-Joe, Performance-Team (Radar), Release-Engineering-Team (Watching / External), Operations, Deployments, HHVM

Jun 20 2018

Joe added a member for acl*sre-team: Vgutierrez.
Jun 20 2018, 7:42 AM
Joe triaged T115945: status.wikimedia.org should not load Google Analytics as Normal priority.
Jun 20 2018, 7:32 AM · Security-Core, Operations, Privacy, monitoring
Joe added a comment to T115945: status.wikimedia.org should not load Google Analytics.

Hello @Ottomata. Ping @Dzahn and @BBlack.

The fact that this site is hosted by a third party does not seem to me a good reason to reject the request.

I don't understand why this intermediary leaves his own code loading Google scripts on a subdomain of the foundation. What does that bring to us? Why can't we ask him to take this thing off? Wikimedia's policy is to respect users' privacy. On https://status.wikimedia.org/ nothing indicates that the site sends data to Google. It would also be nice to have confirmation that it is in agreement with https://wikimediafoundation.org/wiki/Privacy_policy.

Please reconsider this request.

Jun 20 2018, 7:32 AM · Security-Core, Operations, Privacy, monitoring
Joe triaged T197630: decommission samarium.frack.eqiad.wmnet as Normal priority.
Jun 20 2018, 7:28 AM · Patch-For-Review, Operations, ops-eqiad
Joe added a comment to T197237: Requesting access for mbsantos.

We also need @greg approval for adding people to deployers.

Jun 20 2018, 7:27 AM · Patch-For-Review, Analytics, Operations, SRE-Access-Requests
Joe triaged T192206: Remove wildcard vhost for *.wikimedia.org as Low priority.
Jun 20 2018, 7:23 AM · Patch-For-Review, Operations, Wikimedia-Apache-configuration, Traffic

Jun 19 2018

Joe closed T197676: Degraded RAID on ms-be1019 as Resolved.
Jun 19 2018, 12:53 PM · ops-eqiad, Operations
Joe added a comment to T197676: Degraded RAID on ms-be1019.

This seemed to be an issue with the smartarray controller; a simple hard reboot fixed the issue.

Jun 19 2018, 12:53 PM · ops-eqiad, Operations
Joe claimed T197126: Create tool to handle the state of database configuration in MediaWiki in etcd.
Jun 19 2018, 8:43 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe updated the task description for T197126: Create tool to handle the state of database configuration in MediaWiki in etcd.
Jun 19 2018, 8:43 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe closed T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28) as Resolved.
Jun 19 2018, 8:20 AM · Operations, Release-Engineering-Team
Joe added a comment to T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28).

So there isn't much I can do right now, as the situation recovered; I don't think it's reasonable to keep the old versions' caches indeed, but we can manage the situation.

Jun 19 2018, 8:20 AM · Operations, Release-Engineering-Team
Joe triaged T183546: .dockerignore is not used to filter the build context as High priority.
Jun 19 2018, 8:14 AM · Patch-For-Review, User-Joe, docker-pkg

Jun 18 2018

Joe triaged T196751: labvirt1019 IPMI alert as Low priority.
Jun 18 2018, 2:54 PM · cloud-services-team, ops-eqiad, DC-Ops, Operations
Joe triaged T197084: Report problems found in server's IPMI SEL as Normal priority.
Jun 18 2018, 2:53 PM · Operations, monitoring
Joe triaged T197086: Report problems found by mcelog as Normal priority.
Jun 18 2018, 2:52 PM · monitoring, Operations
Joe triaged T197172: Improve outbound mail service alerting as High priority.
Jun 18 2018, 2:51 PM · User-herron, monitoring, Mail, Wikimedia-Incident, Operations
Joe triaged T197606: Degraded RAID on db2052 as Normal priority.
Jun 18 2018, 2:50 PM · Operations, ops-codfw
Joe triaged T197173: Ship MX logs to ELK as Normal priority.
Jun 18 2018, 2:49 PM · User-herron, Wikimedia-Logstash, Mail, Operations
Joe closed T197219: Logstash started showing full serialized log entry as a message as Resolved.
Jun 18 2018, 2:20 PM · Services (watching), Wikimedia-Logstash, Operations
Joe added a comment to T197219: Logstash started showing full serialized log entry as a message.

So after some reasoning:

Jun 18 2018, 2:20 PM · Services (watching), Wikimedia-Logstash, Operations
Joe claimed T197219: Logstash started showing full serialized log entry as a message.
Jun 18 2018, 1:22 PM · Services (watching), Wikimedia-Logstash, Operations