Joe (Giuseppe Lavagetto)
Spy

User Since
Oct 3 2014, 5:57 AM (124 w, 5 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
Unknown

Recent Activity

Today

Joe added a comment to T94239: Scap is lacking a license.

My preference for standalone tools is always the GPL v3, because there is no reason for people to use it in different contexts

Wed, Feb 22, 6:58 AM · Scap, Software-Licensing, WMF-Legal, Documentation

Mon, Feb 20

GitHub <noreply@github.com> committed rMSCP6091c057b77c: Merge pull request #161 from Ladsgroup/cp_disable_ores (authored by Joe).
Merge pull request #161 from Ladsgroup/cp_disable_ores
Mon, Feb 20, 5:19 PM

Fri, Feb 17

Joe added a comment to T125735: Warning: timed out after 0.2 seconds when connecting to rdb1001.eqiad.wmnet [110]: Connection timed out.

Hi! I'm the one who suggested most of those timeout changes. Some have different historical reasons, but I think we can safely raise the connect timeout for the jobrunners (NOT for the common appservers).

Fri, Feb 17, 7:26 AM · Operations, Wikimedia-log-errors

Wed, Feb 15

Joe moved T155823: Expand conftool to support multiple objects via a schema definition. from Blocked on others to Doing on the User-Joe board.
Wed, Feb 15, 4:40 PM · DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations
Joe added a comment to T156922: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc).

warmup is one of the things that bounds our read-only time during the switchover, in that case we could start warming up wikis sorted by e.g. their pageviews to further shorten the acceptable read-only time.

That would significantly complicate the script as well as the actual switchover process. You'd have to deploy many changes to mw-config during the switchover to gradually read-only more and more wikis. The warmup script, meanwhile, takes less than a minute to run. I doubt we'd be reasonably saving any time considering the gradual read-only switching would have to be done manually and is about saving a subset of 50 seconds time.

Indeed, it seems a whole lot of effort for small gains over 50s. Do you know if we could simulate a warmup (and a wipe beforehand) in codfw given how it is configured today in mediawiki?

Wed, Feb 15, 11:50 AM · Performance-Team, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T143349: Deprecate precise instances in Labs by 03/31/2017.

I think we should simply drop 5.3 from the CI tests, then. I wasn't aware that the PHP versions had to be co-installable, which makes a custom 5.3 build for trusty a far more complicated endeavour.

Wed, Feb 15, 6:23 AM · Patch-For-Review, Labs-Infrastructure, Labs

Tue, Feb 14

Joe added a comment to T156023: Check the size of every cluster in codfw to see if it matches eqiad's capacity.

Also note that while for videoscalers and jobrunners it is advisable to reimage, in the other cases a simple change of role in puppet is ok.

Tue, Feb 14, 11:22 AM · Patch-For-Review, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T156023: Check the size of every cluster in codfw to see if it matches eqiad's capacity.

If the above counts are consistent, I'd like to:

  1. reimage 3 appservers (40 cores) as api_appservers
  2. reimage 2 appservers (40 cores) as imagescalers
  3. reimage 1 appserver (40 cores) as jobrunner
  4. reimage 2 appservers (32 cores) as videoscalers

Seems sane to me to balance things a bit more in codfw

Tue, Feb 14, 11:22 AM · Patch-For-Review, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations

Fri, Feb 10

Joe added a comment to T127976: Graphite DC fail-over / per-DC setup.

So basically either the connection is kept open on the client side and the name is never looked up again, or the applications cache dns indefinitely.

Fri, Feb 10, 11:02 AM · Patch-For-Review, codfw-rollout, codfw-rollout-Jan-Mar-2016
Joe added a comment to T155098: Rework job queue usage for TimedMediaHandler (video scalers).

The prioritized queue is working well, but I'll probably raise the number of non-prioritized workers today as we're now underutilizing the systems.

Fri, Feb 10, 6:54 AM · WMF-deploy-2017-02-07_(1.29.0-wmf.11), WMF-deploy-2017-02-14_(1.29.0-wmf.12), Patch-For-Review, TimedMediaHandler-Transcode

Thu, Feb 9

Joe added a comment to T156009: Create an etcd cluster in codfw.

The codfw cluster is getting replicated data from eqiad under /eqiad.wmnet/conftool.

Thu, Feb 9, 7:36 AM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T156922: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc).

Another interesting possibility we might want to explore:

Thu, Feb 9, 7:23 AM · Performance-Team, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations

Mon, Feb 6

Joe closed T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30) as "Resolved".
Mon, Feb 6, 4:42 PM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe added a reverting commit for rMSCDac11ebec974e: ORES: reduce concurrency, disable various wikis: rMSCD5f932a398de5: Revert "ORES: reduce concurrency, disable various wikis".
Mon, Feb 6, 3:27 PM
Joe committed rMSCD5f932a398de5: Revert "ORES: reduce concurrency, disable various wikis" (authored by Joe).
Revert "ORES: reduce concurrency, disable various wikis"
Mon, Feb 6, 3:27 PM
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

Looking into it better, the api user wasn't a red herring after all; I am going to ban the use of oresscores from the mw api since:

Mon, Feb 6, 12:08 PM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

scratch what I said; the counter for etwiki is most likely broken.

Mon, Feb 6, 11:47 AM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

So, graphing ores.*.scores_request.*.count shows that most requests seem to come from etwiki; investigating this further. RecentChanges suggests this is not coming from any form of bot activity.

Mon, Feb 6, 11:41 AM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

From my further analysis of logs:

Mon, Feb 6, 11:21 AM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe committed rMSCDd917c2b92ec6: ORES: reduce concurrency, disable various wikis (authored by Joe).
ORES: reduce concurrency, disable various wikis
Mon, Feb 6, 9:36 AM
Joe committed rMSCDac11ebec974e: ORES: reduce concurrency, disable various wikis (authored by Joe).
ORES: reduce concurrency, disable various wikis
Mon, Feb 6, 9:36 AM
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

So after taking a quick look at ORES's logs: around 70% of requests come from changepropagation for "precaching". Also

Mon, Feb 6, 7:45 AM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES
Joe added a comment to T157206: ORES Overloaded (particularly 2017-02-05 02:25-02:30).

Before raising the number of workers for ORES:

Mon, Feb 6, 7:24 AM · WMF-deploy-2017-01-31_(1.29.0-wmf.10), WMF-deploy-2017-02-07_(1.29.0-wmf.11), Wikimedia-Incident, Patch-For-Review, Revision-Scoring-As-A-Service, Operations, Revision-Scoring-As-A-Service-Backlog, ORES

Thu, Feb 2

Joe added a comment to T156922: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc).

Correct me if I'm wrong, but I think the Main page call can be skipped for all non-standard-wiki-serving machines, so API, image/video scalers; also: do we really need to warm up APC for all of the wikis? Or could we target only the ones doing 99% of the traffic (which I guess are way less than that?).

Thu, Feb 2, 11:23 AM · Performance-Team, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations

Wed, Feb 1

Joe added a comment to T125069: Create a service location / discovery system for locating local/master resources easily across all WMF applications.

Duplicate of T149617

Wed, Feb 1, 5:26 PM · Services (next), User-Joe, Services-next, User-mobrovac, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016
Joe closed T125069: Create a service location / discovery system for locating local/master resources easily across all WMF applications as "Resolved".
Wed, Feb 1, 5:25 PM · Services (next), User-Joe, Services-next, User-mobrovac, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016
Joe created T156924: Allow integration of data from etcd into the MediaWiki configuration.
Wed, Feb 1, 4:45 PM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe created T156922: Prepare a reasonably performant warmup tool for MediaWiki caches (memcached/apc).
Wed, Feb 1, 4:11 PM · Performance-Team, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations

Mon, Jan 30

Joe added a comment to T156009: Create an etcd cluster in codfw.

The cluster in codfw is installed and tested to work correctly with conftool. The performance of the cluster using nginx as a TLS/proxy auth seems to be much better too.

Mon, Jan 30, 11:25 AM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T149617: Integrating MediaWiki (and other services) with dynamic configuration.

Yes, I am just unsure how / to whom I can attribute the template design. That's what is blocking me at the moment.

Mon, Jan 30, 8:02 AM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe closed T149408: Asynchronous processing in production: one queue to rule them all as "Resolved".
Mon, Jan 30, 7:47 AM · Analytics, User-mobrovac, EventBus, Services (watching), Performance-Team, MediaWiki-JobQueue, ChangeProp, Operations, Wikimedia-Developer-Summit (2017)
Joe closed T149408: Asynchronous processing in production: one queue to rule them all, a subtask of T147937: Facilitate Wikidev'17 main topic "How to manage our technical debt", as "Resolved".
Mon, Jan 30, 7:46 AM · User-greg, Release-Engineering-Team, Wikimedia-Developer-Summit
Joe added a comment to T149408: Asynchronous processing in production: one queue to rule them all.

https://commons.wikimedia.org/wiki/File:Asynchronous_processing_on_the_WMF_cluster.pdf is the uploaded file.

Mon, Jan 30, 7:40 AM · Analytics, User-mobrovac, EventBus, Services (watching), Performance-Team, MediaWiki-JobQueue, ChangeProp, Operations, Wikimedia-Developer-Summit (2017)
Joe added a comment to T149617: Integrating MediaWiki (and other services) with dynamic configuration.
Mon, Jan 30, 7:03 AM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)

Fri, Jan 27

Joe added a comment to T149408: Asynchronous processing in production: one queue to rule them all.

Slides for starting the discussion are available here https://docs.google.com/presentation/d/1DCofLYbP1dWnTb1JWNNnsb0Zp_da8sBhDzlwjCXRoq8/edit?usp=sharing

I'll upload those to commons after the Developer Summit.

Did this happen? If not, could you please do so? :)

I moved the notes to https://www.mediawiki.org/wiki/Wikimedia_Developer_Summit/2017/Asynchronous_processing

Fri, Jan 27, 11:32 AM · Analytics, User-mobrovac, EventBus, Services (watching), Performance-Team, MediaWiki-JobQueue, ChangeProp, Operations, Wikimedia-Developer-Summit (2017)

Thu, Jan 26

Joe updated subscribers of T156356: Pages with an &stable=1 in their URL could not be viewed or edited.

@hashar rolled back to wmf.8 and I can confirm the pages I was looking at now render correctly.

Thu, Jan 26, 11:13 AM · Patch-For-Review, Release-Engineering-Team, Operations
Joe added projects to T156356: Pages with an &stable=1 in their URL could not be viewed or edited: MediaWiki-Releasing, Release-Engineering-Team.
Thu, Jan 26, 10:43 AM · Patch-For-Review, Release-Engineering-Team, Operations
Joe added a comment to T156356: Pages with an &stable=1 in their URL could not be viewed or edited.

I can reproduce the problem. Any idea since when this has been happening?

It obviously isn't broken in wmf.8, otherwise de.wikipedia would be broken, too, which would not go unnoticed. So, it's a bug in wmf.9.

Thu, Jan 26, 10:41 AM · Patch-For-Review, Release-Engineering-Team, Operations
Joe added a comment to T156356: Pages with an &stable=1 in their URL could not be viewed or edited.

The error is the following:

Thu, Jan 26, 10:41 AM · Patch-For-Review, Release-Engineering-Team, Operations
Joe added a comment to T156356: Pages with an &stable=1 in their URL could not be viewed or edited.

I can reproduce the problem. Any idea since when this has been happening?

Thu, Jan 26, 10:33 AM · Patch-For-Review, Release-Engineering-Team, Operations

Tue, Jan 24

Joe claimed T156009: Create an etcd cluster in codfw.
Tue, Jan 24, 8:36 AM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe moved T125069: Create a service location / discovery system for locating local/master resources easily across all WMF applications from Doing to Blocked on others on the User-Joe board.
Tue, Jan 24, 8:35 AM · Services (next), User-Joe, Services-next, User-mobrovac, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016
Joe closed T147402: Investigate ways to deploy docker to production as "Resolved".
Tue, Jan 24, 8:35 AM · Kubernetes-production-experiment, Prod-Kubernetes, User-Joe, Operations
Joe closed T147402: Investigate ways to deploy docker to production, a subtask of T147181: Docker installation for production kubernetes, as "Resolved".
Tue, Jan 24, 8:35 AM · Patch-For-Review, Kubernetes-production-experiment, Prod-Kubernetes, User-Joe, Operations
Joe moved T155823: Expand conftool to support multiple objects via a schema definition. from Doing to Blocked on others on the User-Joe board.
Tue, Jan 24, 8:34 AM · DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations
Joe moved T149617: Integrating MediaWiki (and other services) with dynamic configuration from Doing to Blocked on others on the User-Joe board.
Tue, Jan 24, 8:34 AM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe moved T156009: Create an etcd cluster in codfw from Backlog to Doing on the User-Joe board.
Tue, Jan 24, 8:34 AM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations

Mon, Jan 23

Joe added a project to T154759: Pybal not happy with DNS delays: User-Joe.
Mon, Jan 23, 5:15 PM · User-Joe, Traffic, Pybal, Operations
Joe created T156023: Check the size of every cluster in codfw to see if it matches eqiad's capacity.
Mon, Jan 23, 4:51 PM · Patch-For-Review, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T154658: Prepare and improve the datacenter switchover procedure.

@Gilles will do today or tomorrow

Mon, Jan 23, 4:49 PM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a project to T155823: Expand conftool to support multiple objects via a schema definition.: DC-Switchover-Prep-Q3-2016-17.
Mon, Jan 23, 4:23 PM · DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations
Joe added a project to T156009: Create an etcd cluster in codfw: User-Joe.
Mon, Jan 23, 3:30 PM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe created T156009: Create an etcd cluster in codfw.
Mon, Jan 23, 3:29 PM · Patch-For-Review, User-Joe, DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T149185: Decommission mw1152.

@Cmjohnson any news on this?

Mon, Jan 23, 3:05 PM · Patch-For-Review, User-Joe, ops-eqiad, Operations

Jan 22 2017

Joe moved T125069: Create a service location / discovery system for locating local/master resources easily across all WMF applications from Backlog to Doing on the User-Joe board.
Jan 22 2017, 10:43 AM · Services (next), User-Joe, Services-next, User-mobrovac, Operations, codfw-rollout, codfw-rollout-Jan-Mar-2016
Joe moved T149617: Integrating MediaWiki (and other services) with dynamic configuration from Backlog to Doing on the User-Joe board.
Jan 22 2017, 10:43 AM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe moved T152977: conftool service removal bugs from Backlog to Blocking others on the User-Joe board.
Jan 22 2017, 10:42 AM · User-Joe, Operations-Software-Development, Operations
Joe closed T155618: Parsoid timing out or failing when trying to parse specific user page as "Resolved".
Jan 22 2017, 10:38 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe moved T155618: Parsoid timing out or failing when trying to parse specific user page from Backlog to Doing on the User-Joe board.
Jan 22 2017, 10:38 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe moved T155823: Expand conftool to support multiple objects via a schema definition. from Backlog to Doing on the User-Joe board.
Jan 22 2017, 10:38 AM · DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations
Joe moved T97192: HHVM request timeouts not working; support lowering the API request timeout per request from Doing to Blocking others on the User-Joe board.
Jan 22 2017, 10:38 AM · User-Joe, Operations, Services (watching), Wikimedia-Incident, Incident-20150423-Commons, HHVM, RESTBase, Parsoid, Availability, Performance, MediaWiki-API
Joe closed T152074: Separate clusters for asynchronous processing from the ones for public consumption, a subtask of T151702: API cluster failure / OOM, as "Resolved".
Jan 22 2017, 10:37 AM · Patch-For-Review, Parsing-Team, Wikimedia-Incident, HHVM, Release-Engineering-Team, WMF-NDA, Operations
Joe closed T152074: Separate clusters for asynchronous processing from the ones for public consumption as "Resolved".
Jan 22 2017, 10:37 AM · Mobile-Content-Service, Services (doing), Wikimedia-Incident, User-Joe, User-mobrovac, RESTBase, ChangeProp, Parsoid, Parsing-Team, HHVM, Operations
Joe added a comment to T149589: Puppet tab in Horizon unusably slow.

I would suggest, a few things:

Jan 22 2017, 8:04 AM · Patch-For-Review, Horizon, Operations, Puppet, Labs

Jan 21 2017

Joe awarded T149589: Puppet tab in Horizon unusably slow a Burninate token.
Jan 21 2017, 12:05 AM · Patch-For-Review, Horizon, Operations, Puppet, Labs

Jan 20 2017

Joe added projects to T149589: Puppet tab in Horizon unusably slow: Puppet, Operations.
Jan 20 2017, 2:51 PM · Patch-For-Review, Horizon, Operations, Puppet, Labs
Joe triaged T149589: Puppet tab in Horizon unusably slow as "Unbreak Now!" priority.
Jan 20 2017, 2:50 PM · Patch-For-Review, Horizon, Operations, Puppet, Labs
Joe added a comment to T149589: Puppet tab in Horizon unusably slow.

Today I wanted to go around horizon to check and refactor hiera keys before merging https://gerrit.wikimedia.org/r/#/c/332355/.

Jan 20 2017, 2:50 PM · Patch-For-Review, Horizon, Operations, Puppet, Labs
Joe created T155823: Expand conftool to support multiple objects via a schema definition..
Jan 20 2017, 12:18 PM · DC-Switchover-Prep-Q3-2016-17, Patch-For-Review, Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations
Joe closed T155768: Parsoid: fix logrotate as "Resolved".
Jan 20 2017, 10:39 AM · Patch-For-Review, User-mobrovac, Operations, Parsoid
Joe added a comment to T155768: Parsoid: fix logrotate.

Problem is now fixed and not just for parsoid.

Jan 20 2017, 10:39 AM · Patch-For-Review, User-mobrovac, Operations, Parsoid
Joe added a comment to T149617: Integrating MediaWiki (and other services) with dynamic configuration.

Extracting from the session outcomes:

Jan 20 2017, 10:37 AM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe added a comment to T155768: Parsoid: fix logrotate.

so, mystery solved.

Jan 20 2017, 7:19 AM · Patch-For-Review, User-mobrovac, Operations, Parsoid
Joe added a comment to T155768: Parsoid: fix logrotate.

@mobrovac when I read the task I was as surprised as you, given I remember we did create those rules correctly (although I think the copytruncate is on purpose).

Jan 20 2017, 6:54 AM · Patch-For-Review, User-mobrovac, Operations, Parsoid

Jan 19 2017

Joe added a comment to T155209: Increase $wgHTTPImportTimeout to a higher value on WMF wikis.

@Nemo_bis a blank page usually means something different than a timeout has happened. Probably a memory limit was hit; if we want to be able to import tens of thousands of revisions we might want to transform that into an async job instead, too.

Jan 19 2017, 9:36 AM · Operations, Wikimedia-General-or-Unknown
Joe added a comment to T155618: Parsoid timing out or failing when trying to parse specific user page.

@elukey apparently this needs a code deploy, which means accepting a pull request on github (sic) where not everyone from ops has the ability to merge a PR (I do as I'm an admin of the wikimedia github org, but YMMV), then you need to check that into the gerrit-based deploy repo, then restbase uses some ansible recipe (sic, again) to be deployed instead of scap3 or trebuchet.

Jan 19 2017, 9:24 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe added a comment to T155098: Rework job queue usage for TimedMediaHandler (video scalers).

@brion before the change to TMH goes into production, we also need to tweak the jobrunner setup in operations/puppet.

Jan 19 2017, 7:19 AM · WMF-deploy-2017-02-07_(1.29.0-wmf.11), WMF-deploy-2017-02-14_(1.29.0-wmf.12), Patch-For-Review, TimedMediaHandler-Transcode
Joe added a comment to T155098: Rework job queue usage for TimedMediaHandler (video scalers).

joe informs me the timeout problem is solved, which should help reduce the need to check for thresholds. :D

I don't see the timeout problem as solved. All big transcodes (720p, 1080p, and long ones) still fail.
And we have 360,000+ failed transcodes, and growing.

Jan 19 2017, 7:17 AM · WMF-deploy-2017-02-07_(1.29.0-wmf.11), WMF-deploy-2017-02-14_(1.29.0-wmf.12), Patch-For-Review, TimedMediaHandler-Transcode

Jan 18 2017

Joe added a subtask for T154658: Prepare and improve the datacenter switchover procedure: T149617: Integrating MediaWiki (and other services) with dynamic configuration.
Jan 18 2017, 5:11 PM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a parent task for T149617: Integrating MediaWiki (and other services) with dynamic configuration: T154658: Prepare and improve the datacenter switchover procedure.
Jan 18 2017, 5:11 PM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe added a comment to T155618: Parsoid timing out or failing when trying to parse specific user page.

Strace gives little more information, besides the fact that for each of these pages parsoid does hundreds of preprocessing requests to the MW API. Maybe some recursion limit is reached?

Jan 18 2017, 9:30 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe added a comment to T155618: Parsoid timing out or failing when trying to parse specific user page.

Isolating a single request, I see that most of the time is spent in executing

Jan 18 2017, 9:04 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe claimed T155618: Parsoid timing out or failing when trying to parse specific user page.
Jan 18 2017, 8:54 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe renamed T155618: Parsoid timing out or failing when trying to parse specific user page from "Parsoid unable to parse specific user page" to "Parsoid timing out or failing when trying to parse specific user page".
Jan 18 2017, 8:53 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe created T155618: Parsoid timing out or failing when trying to parse specific user page.
Jan 18 2017, 8:45 AM · User-mobrovac, User-Joe, Parsoid, Operations
Joe added a subtask for T154658: Prepare and improve the datacenter switchover procedure: T132076: TTMServer should support multi-dc configuration.
Jan 18 2017, 7:19 AM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a parent task for T132076: TTMServer should support multi-dc configuration: T154658: Prepare and improve the datacenter switchover procedure.
Jan 18 2017, 7:19 AM · WMF-deploy-2017-02-28_(1.29.0-wmf.14), Patch-For-Review, Discovery-Search (Current work), Language-Engineering April-June 2016, Elasticsearch, Discovery, MediaWiki-extensions-Translate

Jan 17 2017

Joe added a subtask for T154658: Prepare and improve the datacenter switchover procedure: T139372: Set up oresrdb redis node in codfw.
Jan 17 2017, 8:45 AM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a parent task for T139372: Set up oresrdb redis node in codfw: T154658: Prepare and improve the datacenter switchover procedure.
Jan 17 2017, 8:45 AM · Operations, Revision-Scoring-As-A-Service-Backlog

Jan 11 2017

Joe triaged T149617: Integrating MediaWiki (and other services) with dynamic configuration as "Normal" priority.
Jan 11 2017, 7:40 PM · Wikimedia-Multiple-active-datacenters, Services (watching), Performance-Team, discovery-system, User-Joe, User-mobrovac, MediaWiki-Configuration, Operations, Wikimedia-Developer-Summit (2017)
Joe added a project to T154658: Prepare and improve the datacenter switchover procedure: DC-Switchover-Prep-Q3-2016-17.
Jan 11 2017, 5:32 PM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe edited Description on DC-Switchover-Prep-Q3-2016-17.
Jan 11 2017, 5:31 PM
Joe created DC-Switchover-Prep-Q3-2016-17.
Jan 11 2017, 5:30 PM

Jan 9 2017

Joe added a comment to T149408: Asynchronous processing in production: one queue to rule them all.

Slides for starting the discussion are available here https://docs.google.com/presentation/d/1DCofLYbP1dWnTb1JWNNnsb0Zp_da8sBhDzlwjCXRoq8/edit?usp=sharing

Jan 9 2017, 5:38 PM · Analytics, User-mobrovac, EventBus, Services (watching), Performance-Team, MediaWiki-JobQueue, ChangeProp, Operations, Wikimedia-Developer-Summit (2017)

Jan 7 2017

Joe added a comment to T154841: OTRS error (back up, now monitoring).

De-assigning from me as I'm going to hop on a plane in a few hours from now and I won't be able to follow through on Monday.

Jan 7 2017, 11:54 AM · Wikimedia-Incident, Operations, OTRS
Joe placed T154841: OTRS error (back up, now monitoring) up for grabs.
Jan 7 2017, 11:53 AM · Wikimedia-Incident, Operations, OTRS
Joe added a comment to T154841: OTRS error (back up, now monitoring).

There was a huge error log for apache caused by an error in inserting a ticket in the history; I stopped apache, removed the file that was filling up the root filesystem, and started apache/otrs back again. Things look healthier from a server-side perspective, but I'm no expert on the application, so some error messages I see in the logs don't really make sense to me.

Jan 7 2017, 11:38 AM · Wikimedia-Incident, Operations, OTRS

Jan 5 2017

Joe created T154658: Prepare and improve the datacenter switchover procedure.
Jan 5 2017, 11:51 AM · DC-Switchover-Prep-Q3-2016-17, Epic, Wikimedia-Multiple-active-datacenters, Operations
Joe added a comment to T132076: TTMServer should support multi-dc configuration.

@Gehel any updates on this? I guess it's going to impact our switchover this time as well?

Jan 5 2017, 11:19 AM · WMF-deploy-2017-02-28_(1.29.0-wmf.14), Patch-For-Review, Discovery-Search (Current work), Language-Engineering April-June 2016, Elasticsearch, MediaWiki-extensions-Translate, Discovery
Joe closed T147718: RFC: New puppet code organization paradigm/coding standards as "Resolved".
Jan 5 2017, 11:10 AM · Patch-For-Review, RfC, Puppet, Operations