Joe (Giuseppe Lavagetto)
Spy

Projects (22)

User Details

User Since
Oct 3 2014, 5:57 AM (198 w, 2 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
GLavagetto (WMF) [ Global Accounts ]

Recent Activity

Tue, Jul 17

Joe added a comment to T199594: Exception "Job queue is read-only".

Thanks for unearthing this, @Krinkle. This is probably the last thing we forgot to change in the JobQueue switch. Both the JobQueue and the DB layer in MW are controlled by the $wgReadOnly variable, which is set to a truthy value in codfw. It is read from etcd.

It seems that the way forward would be to decouple these into multiple settings, since now we have the semantics of "enqueuing a job" and "executing a job" which don't happen in the same place any longer. @Joe, @Pchelolo what do you think?

Tue, Jul 17, 8:52 AM · MW-1.32-release-notes (WMF-deploy-2018-07-24 (1.32.0-wmf.14)), Patch-For-Review, Services (designing), User-Joe, Operations, Wikimedia-log-errors, Core-Platform-Team, WMF-JobQueue
Joe edited P3855 etcd_recovery_generator.py.
Tue, Jul 17, 6:06 AM · Operations
Joe edited P3855 etcd_recovery_generator.py.
Tue, Jul 17, 6:05 AM · Operations

Fri, Jul 13

Joe added a comment to T181208: Migrate translatewiki.net to PHP7.

@Joe I think you got the wrong task :)

Fri, Jul 13, 8:11 AM · translatewiki.net
elukey awarded T196968: Re-organize the apache configuration for MediaWiki in puppet a Love token.
Fri, Jul 13, 5:59 AM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Krinkle awarded T196968: Re-organize the apache configuration for MediaWiki in puppet an Orange Medal token.
Fri, Jul 13, 12:07 AM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations

Mon, Jul 9

Joe added a subtask for T176370: Migrate to PHP 7 in WMF production: T196968: Re-organize the apache configuration for MediaWiki in puppet.
Mon, Jul 9, 2:52 PM · Core-Platform-Team, TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
Joe added parent tasks for T196968: Re-organize the apache configuration for MediaWiki in puppet: T181208: Migrate translatewiki.net to PHP7, T176370: Migrate to PHP 7 in WMF production.
Mon, Jul 9, 2:52 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Joe added a subtask for T181208: Migrate translatewiki.net to PHP7: T196968: Re-organize the apache configuration for MediaWiki in puppet.
Mon, Jul 9, 2:52 PM · translatewiki.net

Fri, Jul 6

Joe added a comment to T118331: Alert when used_memory gets too high for redis queues.

Closing as declined as we've removed the redis-based jobqueue.

Fri, Jul 6, 12:49 PM · Patch-For-Review, Operations
Joe closed T118331: Alert when used_memory gets too high for redis queues as Declined.
Fri, Jul 6, 12:48 PM · Patch-For-Review, Operations

Thu, Jul 5

Joe added a project to T198256: RFC: Modern Event Platform - Choose Schema Tech: Operations.
Thu, Jul 5, 5:56 AM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Joe added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

Yeah, both protobufs and thrift are options, but neither has the advantages that Avro does, yet both share many of the same disadvantages.

Thu, Jul 5, 5:55 AM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics

Wed, Jul 4

Joe moved T196968: Re-organize the apache configuration for MediaWiki in puppet from Backlog to Doing on the User-Joe board.
Wed, Jul 4, 1:54 PM · User-Joe, Patch-For-Review, Wikimedia-Apache-configuration, Operations
Joe moved T196685: rack/setup/install rdb10[09|10].eqiad.wmnet from Backlog to Blocking others on the User-Joe board.
Wed, Jul 4, 8:42 AM · ops-eqiad, User-Joe, User-Elukey, Operations

Mon, Jul 2

Joe claimed T198220: Stop and remove old job runners.
Mon, Jul 2, 1:11 PM · WMF-JobQueue, Patch-For-Review, User-Joe, Operations, Services (watching), EventBus, Analytics
Joe moved T198220: Stop and remove old job runners from Backlog to Doing on the User-Joe board.
Mon, Jul 2, 1:11 PM · WMF-JobQueue, Patch-For-Review, User-Joe, Operations, Services (watching), EventBus, Analytics
Joe moved T197126: Create tool to handle the state of database configuration in MediaWiki in etcd from Backlog to Doing on the User-Joe board.
Mon, Jul 2, 1:03 PM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe added a comment to T198239: Rollout use of mcrouter for MediaWiki in production.

+1 to the overall plan; I'd like to see dates attached to the various steps now, so that we can have a clear schedule.

Mon, Jul 2, 9:34 AM · MW-1.32-release-notes (WMF-deploy-2018-07-17 (1.32.0-wmf.13)), Patch-For-Review, Availability (MediaWiki-MultiDC), Performance-Team

Sun, Jul 1

Joe renamed T197550: Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes from traaaaaaaa to Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes.
Sun, Jul 1, 6:09 AM · Patch-For-Review, Phabricator
Joe closed T197550: Remove approval requirement for new accounts, or patch everything in Phabricator to allow unapproved users to be treated as logged out for permissions purposes as Resolved.
Sun, Jul 1, 6:08 AM · Patch-For-Review, Phabricator
Joe renamed T196125: php-memcached 3.0 (PHP 7) incompatible with BagOStuff from evbaaaaaaa to php-memcached 3.0 (PHP 7) incompatible with BagOStuff.
Sun, Jul 1, 5:51 AM · MW-1.30-release-notes, MW-1.31-release-notes, MW-1.29-release-notes, MW-1.27-release-notes, MW-1.31-release, MW-1.32-release-notes (WMF-deploy-2018-06-05 (1.32.0-wmf.7)), Performance-Team, PHP 7.0 support, MediaWiki-Platform-Team, Operations

Wed, Jun 27

Joe added a comment to T184715: pybal's "can-depool" logic only takes downServers into account.

Reopened as this is still not fixed, see https://wikitech.wikimedia.org/wiki/Incident_documentation/20180626-LoadBalancers

Wed, Jun 27, 2:00 PM · Patch-For-Review, Pybal, Traffic, Operations
Joe reopened T184715: pybal's "can-depool" logic only takes downServers into account as "Open".
Wed, Jun 27, 1:58 PM · Patch-For-Review, Pybal, Traffic, Operations

Mon, Jun 25

Joe added a comment to T103886: Translation cache exhaustion caused by changes to PHP code in file scope.

Does that actually still make sense at this point? We'll get rid of HHVM in 6-9 months and we don't have current issues with the TC cache, while enabling it in general could actually expose some subtle bugs. Not opposing it per se, but wondering whether the benefit warrants the potential risks.

Mon, Jun 25, 6:33 AM · User-Joe, Patch-For-Review, Performance-Team (Radar), Release-Engineering-Team (Watching / External), Operations, Deployments, HHVM

Jun 20 2018

Joe added a member for acl*operations-team: Vgutierrez.
Jun 20 2018, 7:42 AM
Joe triaged T115945: status.wikimedia.org should not load Google Analytics as Normal priority.
Jun 20 2018, 7:32 AM · Security-Core, Operations, Privacy, monitoring
Joe added a comment to T115945: status.wikimedia.org should not load Google Analytics.

Hello @Ottomata. Ping @Dzahn and @BBlack.

The fact that this site is hosted by a third party does not seem to me a good reason to reject the request.

I don't understand why this intermediary leaves its own code loading Google scripts on a subdomain of the Foundation. What does that bring us? Why can't we ask them to remove it? Wikimedia's policy is to respect users' privacy. Nothing on https://status.wikimedia.org/ indicates that the site sends data to Google. It would also be nice to have confirmation that it complies with https://wikimediafoundation.org/wiki/Privacy_policy.

Please reconsider this request.

Jun 20 2018, 7:32 AM · Security-Core, Operations, Privacy, monitoring
Joe triaged T197630: decommission samarium.frack.eqiad.wmnet as Normal priority.
Jun 20 2018, 7:28 AM · ops-eqiad, Operations
Joe added a comment to T197237: Requesting access for mbsantos.

We also need @greg's approval for adding people to deployers.

Jun 20 2018, 7:27 AM · Patch-For-Review, Analytics, SRE-Access-Requests, Operations
Joe triaged T192206: Remove wildcard vhost for *.wikimedia.org as Low priority.
Jun 20 2018, 7:23 AM · Patch-For-Review, Operations, Wikimedia-Apache-configuration, Traffic

Jun 19 2018

Joe closed T197676: Degraded RAID on ms-be1019 as Resolved.
Jun 19 2018, 12:53 PM · ops-eqiad, Operations
Joe added a comment to T197676: Degraded RAID on ms-be1019.

This seemed to be an issue with the Smart Array controller; a simple hard reboot fixed it.

Jun 19 2018, 12:53 PM · ops-eqiad, Operations
Joe claimed T197126: Create tool to handle the state of database configuration in MediaWiki in etcd.
Jun 19 2018, 8:43 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe updated the task description for T197126: Create tool to handle the state of database configuration in MediaWiki in etcd.
Jun 19 2018, 8:43 AM · Patch-For-Review, User-Joe, MediaWiki-Configuration, Operations, DBA
Joe closed T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28) as Resolved.
Jun 19 2018, 8:20 AM · Operations, Release-Engineering-Team
Joe added a comment to T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28).

There isn't much I can do right now; the situation has recovered. I don't think it's reasonable to keep the caches for old versions, but we can manage the situation.

Jun 19 2018, 8:20 AM · Operations, Release-Engineering-Team
Joe triaged T183546: .dockerignore is not used to filter the build context as High priority.
Jun 19 2018, 8:14 AM · Patch-For-Review, User-Joe, docker-pkg

Jun 18 2018

Joe triaged T196751: labvirt1019 IPMI alert as Low priority.
Jun 18 2018, 2:54 PM · cloud-services-team, ops-eqiad, Operations, DC-Ops
Joe triaged T197084: Report problems found in server's IPMI SEL as Normal priority.
Jun 18 2018, 2:53 PM · Operations, monitoring
Joe triaged T197086: Report problems found by mcelog as Normal priority.
Jun 18 2018, 2:52 PM · monitoring, Operations
Joe triaged T197172: Improve outbound mail service alerting as High priority.
Jun 18 2018, 2:51 PM · User-herron, monitoring, Mail, Wikimedia-Incident, Operations
Joe triaged T197606: Degraded RAID on db2052 as Normal priority.
Jun 18 2018, 2:50 PM · Operations, ops-codfw
Joe triaged T197173: Ship MX logs to ELK as Normal priority.
Jun 18 2018, 2:49 PM · User-herron, Wikimedia-Logstash, Mail, Operations
Joe closed T197219: Logstash started showing full serialized log entry as a message as Resolved.
Jun 18 2018, 2:20 PM · Services (watching), Wikimedia-Logstash, Operations
Joe added a comment to T197219: Logstash started showing full serialized log entry as a message.

So after some reasoning:

Jun 18 2018, 2:20 PM · Services (watching), Wikimedia-Logstash, Operations
Joe claimed T197219: Logstash started showing full serialized log entry as a message.
Jun 18 2018, 1:22 PM · Services (watching), Wikimedia-Logstash, Operations
Joe added a comment to T197219: Logstash started showing full serialized log entry as a message.

The problem comes from https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/437864/

Jun 18 2018, 1:22 PM · Services (watching), Wikimedia-Logstash, Operations
Joe added a comment to T197219: Logstash started showing full serialized log entry as a message.

At a quick glance, neither MediaWiki-generated logs nor syslog-generated ones show this issue. I can't find anything relevant in the SAL, but I'll try to dig deeper.

Jun 18 2018, 1:07 PM · Services (watching), Wikimedia-Logstash, Operations
Joe triaged T196474: Externalize tile storage for maps as Normal priority.
Jun 18 2018, 12:44 PM · Maps (Tilerator), Operations
Joe triaged T196417: Rack/Setup frbast2001.frack.codfw.wmnet as Normal priority.
Jun 18 2018, 12:43 PM · Patch-For-Review, ops-codfw, fundraising-tech-ops, Operations
Joe triaged T196252: Labservices1001 crashed as Normal priority.
Jun 18 2018, 12:42 PM · Patch-For-Review, ops-eqiad, cloud-services-team, Operations
Joe added a comment to T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28).

mwdebug2001 now has 8 GB free, but one must wonder why /srv/mediawiki needs 22 GB of space; I guess we're doing something wrong there.

Jun 18 2018, 12:42 PM · Operations, Release-Engineering-Team
Joe claimed T197275: Scap error from mwdebug2001.codfw.wmnet: sync: write failed on "/srv/mediawiki/wmf-config/InitialiseSettings.php": No space left on device (28).
Jun 18 2018, 12:39 PM · Operations, Release-Engineering-Team
Joe triaged T197554: Update wikitech-static mediawiki version as Low priority.
Jun 18 2018, 12:33 PM · Operations
Joe closed T196943: Add MSantos to `ldap/wmf` as Resolved.
Jun 18 2018, 12:31 PM · Patch-For-Review, Operations, LDAP-Access-Requests
Joe added a comment to T196943: Add MSantos to `ldap/wmf`.

Done. You should be able to access the corresponding resources.

Jun 18 2018, 12:31 PM · Patch-For-Review, Operations, LDAP-Access-Requests
Joe added a comment to T197237: Requesting access for mbsantos.

Specifically, it would be useful to use the permissions of another person in your team as a blueprint ("I need the same level of access as X" would help us specify better which permissions you need).

Jun 18 2018, 11:13 AM · Patch-For-Review, Analytics, SRE-Access-Requests, Operations
Joe added a comment to T197237: Requesting access for mbsantos.

@MSantos while we wait to understand the specific access you need, can you please read and sign the L3 document, so that I can proceed to create your user and add you to the LDAP group for WMF employees?

Jun 18 2018, 11:08 AM · Patch-For-Review, Analytics, SRE-Access-Requests, Operations
Joe claimed T196943: Add MSantos to `ldap/wmf`.
Jun 18 2018, 10:55 AM · Patch-For-Review, Operations, LDAP-Access-Requests
Joe updated the task description for T196886: Replace wtp1043's sda.
Jun 18 2018, 10:53 AM · DC-Ops, ops-eqiad, Operations
Joe edited projects for T196886: Replace wtp1043's sda, added: ops-eqiad, DC-Ops; removed: monitoring.
Jun 18 2018, 10:51 AM · DC-Ops, ops-eqiad, Operations
Joe triaged T196886: Replace wtp1043's sda as Normal priority.
Jun 18 2018, 10:50 AM · DC-Ops, ops-eqiad, Operations
Joe renamed T196886: Replace wtp1043's sda from SMART checks fail on wtp1043's sda to Replace wtp1043's sda.
Jun 18 2018, 10:50 AM · DC-Ops, ops-eqiad, Operations
Joe triaged T196901: Replace memory bank on scb1002 as Low priority.
Jun 18 2018, 10:49 AM · Operations, ops-eqiad, DC-Ops
Joe triaged T196916: Phabricator outbound email seems to have a SPOF of mx1001 as High priority.
Jun 18 2018, 10:48 AM · Release-Engineering-Team (Watching / External), User-herron, Patch-For-Review, Wikimedia-Incident, Phabricator, Mail, Operations
Joe triaged T196920: Add email queueing/failover to services currently using mail_smarthost[0] as High priority.
Jun 18 2018, 10:48 AM · User-herron, Patch-For-Review, Wikimedia-Incident, Operations
Joe triaged T196547: Extension:JADE scalability concerns due to creating a page per revision as Normal priority.
Jun 18 2018, 10:47 AM · TechCom-RFC, DBA, Scoring-platform-team (Current), User-Joe, Operations, JADE
Joe added a comment to T183381: Deploy JADE extension to production.

See also T196547, where the discussion should probably continue.

Jun 18 2018, 10:46 AM · Goal, Patch-For-Review, Services (watching), Operations, TechCom, Scoring-platform-team (Current), JADE
Joe updated subscribers of T183381: Deploy JADE extension to production.
Jun 18 2018, 10:44 AM · Goal, Patch-For-Review, Services (watching), Operations, TechCom, Scoring-platform-team (Current), JADE
Joe updated subscribers of T183381: Deploy JADE extension to production.

I would like this to wait for a review by the DBA and Traffic teams.

Jun 18 2018, 10:43 AM · Goal, Patch-For-Review, Services (watching), Operations, TechCom, Scoring-platform-team (Current), JADE
Joe closed T196654: Add CI namespace in staging k8s cluster as Resolved.
Jun 18 2018, 10:20 AM · Patch-For-Review, Release Pipeline, Operations, Release-Engineering-Team (Kanban)
Joe added a comment to T196654: Add CI namespace in staging k8s cluster.

I created a namespace called ci that you can deploy to with helm, as long as you use the kubeconfig /etc/kubernetes/ci-staging.config. That file is readable by contint-admins and by the jenkins-slave user, so the pipeline should be able to deploy to that namespace with helm.

Jun 18 2018, 10:20 AM · Patch-For-Review, Release Pipeline, Operations, Release-Engineering-Team (Kanban)
Joe claimed T196654: Add CI namespace in staging k8s cluster.
Jun 18 2018, 8:45 AM · Patch-For-Review, Release Pipeline, Operations, Release-Engineering-Team (Kanban)
Joe triaged T194855: Degraded RAID on labvirt1020 as Normal priority.
Jun 18 2018, 8:44 AM · ops-eqiad, Operations
Joe triaged T197470: find a way to systematically update the deployment server name across all repos as High priority.
Jun 18 2018, 8:41 AM · Release-Engineering-Team, Scap, Operations
Joe added a comment to T197503: Archive operations/puppet/varnishkafka repository.

@elukey since you did the work of removing the submodule, will you do the honours?

Jun 18 2018, 8:40 AM · Analytics, Operations, Cleanup
Joe triaged T197503: Archive operations/puppet/varnishkafka repository as Low priority.
Jun 18 2018, 8:40 AM · Analytics, Operations, Cleanup
Joe added a comment to T166937: Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002.

@elukey is this still ongoing? It's open with high priority.

Jun 18 2018, 8:33 AM · Analytics, Operations
Joe removed a project from T166937: Broken /a/refinery-source/guard/run_all_guards.sh script on stat1002: Patch-For-Review.
Jun 18 2018, 8:33 AM · Analytics, Operations
Joe assigned T160060: Icinga check for sysctl settings to herron.
Jun 18 2018, 8:32 AM · User-herron, Patch-For-Review, monitoring, Icinga, Operations
Joe added a comment to T160060: Icinga check for sysctl settings.

@herron any news on this? I am assigning the ticket to you as you have an open patch for this.

Jun 18 2018, 8:32 AM · User-herron, Patch-For-Review, monitoring, Icinga, Operations
Joe closed T134893: Unhandled pybal error causing services to be depooled in etcd but not in lvs as Resolved.
Jun 18 2018, 8:31 AM · Patch-For-Review, Operations-Software-Development, Pybal, Operations, Traffic
Joe updated subscribers of T134893: Unhandled pybal error causing services to be depooled in etcd but not in lvs.

@ema @Vgutierrez AIUI this bug has been resolved since we fixed the EtcdConfigObserver class. Resolving this; please feel free to reopen it if needed.

Jun 18 2018, 8:30 AM · Patch-For-Review, Operations-Software-Development, Pybal, Operations, Traffic
Joe added a project to T180183: Profiling for X-Wikimedia-Debug seems to start fairly late: User-Joe.
Jun 18 2018, 8:29 AM · User-Joe, Patch-For-Review, Performance-Team-notice, MediaWiki-Debug-Logger, Performance-Team
Joe added a comment to T180183: Profiling for X-Wikimedia-Debug seems to start fairly late.

@Krinkle I prepared a patch to use the auto prepend file on all appservers, not just the canaries. Should we deploy it once the deployment freeze is over?

Jun 18 2018, 8:29 AM · User-Joe, Patch-For-Review, Performance-Team-notice, MediaWiki-Debug-Logger, Performance-Team
Joe added a comment to T103886: Translation cache exhaustion caused by changes to PHP code in file scope.

I will merge this change once we're out of the deployment freeze.

Jun 18 2018, 8:27 AM · User-Joe, Patch-For-Review, Performance-Team (Radar), Release-Engineering-Team (Watching / External), Operations, Deployments, HHVM
Joe claimed T103886: Translation cache exhaustion caused by changes to PHP code in file scope.
Jun 18 2018, 8:27 AM · User-Joe, Patch-For-Review, Performance-Team (Radar), Release-Engineering-Team (Watching / External), Operations, Deployments, HHVM
Joe renamed T196434: Allow Tarrow access to kibana from Add Tarrow to the ldap/nda group to Allow Tarrow access to kibana.
Jun 18 2018, 7:14 AM · Patch-For-Review, LDAP-Access-Requests
Joe updated subscribers of T196434: Allow Tarrow access to kibana.
Jun 18 2018, 6:54 AM · Patch-For-Review, LDAP-Access-Requests
Joe added a comment to T196434: Allow Tarrow access to kibana.

Hi @Tarrow, you are indeed already part of the WMDE LDAP group, but not the NDA one, which is what you need.

Jun 18 2018, 6:50 AM · Patch-For-Review, LDAP-Access-Requests
Joe assigned T197237: Requesting access for mbsantos to herron.
Jun 18 2018, 6:31 AM · Patch-For-Review, Analytics, SRE-Access-Requests, Operations
Joe triaged T197564: cronspam for slow queries in PageAssessments as Low priority.
Jun 18 2018, 6:18 AM · Operations
Joe created T197564: cronspam for slow queries in PageAssessments.
Jun 18 2018, 6:18 AM · Operations
Joe changed the visibility for T150375: cronspam cleanup: Cron <www-data@terbium> /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null.
Jun 18 2018, 6:14 AM · Multimedia, MediaWiki-File-management, Commons, MediaWiki-Maintenance-scripts, Operations
Joe added a comment to T150375: cronspam cleanup: Cron <www-data@terbium> /usr/local/bin/foreachwiki maintenance/cleanupUploadStash.php > /dev/null.

I actually think the best way to handle this is to add a redirection of stderr and stdout to files, and to properly logrotate them.

Jun 18 2018, 6:08 AM · Multimedia, MediaWiki-File-management, Commons, MediaWiki-Maintenance-scripts, Operations
Joe updated subscribers of T197562: Replace disk on wasat.
Jun 18 2018, 6:03 AM · ops-codfw, DC-Ops, Operations
Joe triaged T197562: Replace disk on wasat as Normal priority.
Jun 18 2018, 6:03 AM · ops-codfw, DC-Ops, Operations
Joe created T197562: Replace disk on wasat.
Jun 18 2018, 6:03 AM · ops-codfw, DC-Ops, Operations
Joe assigned T196989: mailman listing unresponsive (fermium high latency) to herron.
Jun 18 2018, 5:41 AM · Patch-For-Review, Mail, Operations, Wikimedia-Mailing-lists