Joe (Giuseppe Lavagetto)
Spy

Projects (23)

User Details

User Since
Oct 3 2014, 5:57 AM (224 w, 2 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
GLavagetto (WMF)

Recent Activity

Today

Joe closed T200720: docker-pkg should attempt to pull dependent images from the registry as Resolved.
Sun, Jan 20, 12:04 PM · Release-Engineering-Team (Kanban), Patch-For-Review, docker-pkg

Fri, Jan 18

Joe added a comment to T207703: Pruning docker-pkg images.

Definitely. I'd also recommend keeping the previous version as well just in case we have to revert.

Fri, Jan 18, 10:42 AM · Patch-For-Review, docker-pkg, Continuous-Integration-Infrastructure
Joe closed T214149: Backport pygerrit2 to Debian Stretch as Resolved.
Fri, Jan 18, 9:31 AM · Operations, Release-Engineering-Team, Performance-Team
Joe closed T214149: Backport pygerrit2 to Debian Stretch, a subtask of T214015: Create gerrit bot for git pushes to specific repo from prod machines, as Resolved.
Fri, Jan 18, 9:31 AM · Release-Engineering-Team, Performance-Team
Joe claimed T214149: Backport pygerrit2 to Debian Stretch.
Fri, Jan 18, 9:21 AM · Operations, Release-Engineering-Team, Performance-Team

Thu, Jan 17

Joe added a comment to T213934: Set up a beta feature offering the use of PHP7.
Please [[phab:|report bugs]] if you see them.

[...] Small nitpick: Please avoid linking to the generic Phab front page. :) [https://phabricator.wikimedia.org/maniphest/task/edit/form/1/?projects=projecttag1,projecttag2 foo] could be more helpful (full URL as ? breaks stuff when using [[internal|links]]).

That's what the old entry did, except it linked directly to Bugzilla.

Is there any specific task/project Tech News should link to?

Thu, Jan 17, 10:00 AM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Joe updated subscribers of T213934: Set up a beta feature offering the use of PHP7.

@Krinkle did mention he saw a couple fatal errors that looked worrisome, so I'd wait for him to comment before backporting the beta feature and announcing it.

Thu, Jan 17, 9:54 AM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Joe added a comment to T213934: Set up a beta feature offering the use of PHP7.

Very good point @Mainframe98 - in fact I was planning to write an email to wikitech-l once the beta feature is set up and I have the green light from everyone involved in the project before activating it.

Thu, Jan 17, 9:39 AM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Joe added a comment to T213318: Wikibase Front-End Architecture.

So, in conclusion, Wikidata has a lot of edits, but several orders of magnitude fewer views than a Wikipedia of comparable size. So, while MediaWiki generally optimizes for heavy read loads, the Wikidata UI should be optimized for frequent edits, but doesn't have to worry about performance of reads too terribly much. It may for instance be feasible to entirely bypass or ignore varnish caching for Wikidata.

Thu, Jan 17, 6:22 AM · Wikidata, TechCom-RFC
Ladsgroup awarded T213934: Set up a beta feature offering the use of PHP7 a Love token.
Thu, Jan 17, 4:39 AM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations

Wed, Jan 16

Joe added a comment to T213963: Include git in our alpine docker image on docker-registry.wikimedia.org.

Do not use alpine as a base for your containers if you want to execute them in production. That is strictly limited to debian-based images, for which we can create a reliable upgrade pipeline. Also, we don't want a distro proliferation in production.

Wed, Jan 16, 11:48 PM · serviceops, EventBus, Analytics
Joe added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

So, we use caching in MediaWiki for a ton of different things: parser cache, revision cache, counters, rate limiting, and so on.

By default since 1.27, sessions are stored in the same object cache as anything else, but we can specifically cut out sessions to their own storage class with a configuration variable $wgSessionCacheType.

Is a reasonable path forward to:

  1. Move sessions only to the new object store @Eevans and team are working on, while the rest of the cached objects stay on the current Redis infrastructure
  2. Make a decision whether and how to move other object types to the same or different object store later?
Wed, Jan 16, 5:37 PM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops
Joe added a comment to T213934: Set up a beta feature offering the use of PHP7.

Do we want MW to tag edits etc like we did for HHVM?

Wed, Jan 16, 5:10 PM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Mainframe98 awarded T213934: Set up a beta feature offering the use of PHP7 a Love token.
Wed, Jan 16, 4:38 PM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Addshore awarded T213934: Set up a beta feature offering the use of PHP7 a Like token.
Wed, Jan 16, 4:20 PM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations
Joe triaged T213934: Set up a beta feature offering the use of PHP7 as Normal priority.
Wed, Jan 16, 3:33 PM · MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Patch-For-Review, Performance-Team, Core Platform Team, User-Joe, serviceops, Operations

Tue, Jan 15

Joe added a comment to T213561: Discovery for Kafka cluster brokers.

Might I suggest that you use a SRV dns record instead? It's more appropriate for enumerating members in a cluster. We use those for etcd discovery.
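
To illustrate the SRV suggestion above, here is a minimal Python sketch of how a client might turn SRV answers into an ordered list of brokers. The record name `_kafka._tcp.example.org`, the host names, and the helper `parse_srv` are hypothetical and not taken from the task. Only priority ordering is shown; the weight-based selection described in RFC 2782 is omitted.

```python
from typing import List, NamedTuple


class SrvRecord(NamedTuple):
    priority: int
    weight: int
    port: int
    target: str


# Hypothetical answers in zone-file form: name TTL class SRV priority weight port target
SAMPLE_ANSWERS = [
    "_kafka._tcp.example.org. 300 IN SRV 20 0 9092 kafka-b.example.org.",
    "_kafka._tcp.example.org. 300 IN SRV 10 0 9092 kafka-a.example.org.",
    "_kafka._tcp.example.org. 300 IN A 192.0.2.1",  # not an SRV record, skipped
]


def parse_srv(lines: List[str]) -> List[SrvRecord]:
    """Parse SRV lines and return them ordered by ascending priority."""
    records = []
    for line in lines:
        fields = line.split()
        # Skip anything that is not an 8-field SRV line.
        if len(fields) != 8 or fields[3] != "SRV":
            continue
        priority, weight, port = (int(f) for f in fields[4:7])
        records.append(SrvRecord(priority, weight, port, fields[7].rstrip(".")))
    # Lower priority values are tried first.
    return sorted(records, key=lambda r: r.priority)


brokers = parse_srv(SAMPLE_ANSWERS)
for record in brokers:
    print(f"{record.target}:{record.port}")
```

A single SRV name lets clients discover every member and its port in one lookup, which is what makes it a natural fit for enumerating a cluster.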

Tue, Jan 15, 3:26 PM · Patch-For-Review, Operations, Services (watching), EventBus, Analytics
Joe added a comment to T213561: Discovery for Kafka cluster brokers.

Sorry, I need some more specifics:

Tue, Jan 15, 3:11 PM · Patch-For-Review, Operations, Services (watching), EventBus, Analytics
Joe updated the diff for D1135: Invalidate PHP7's opcache when needed.
Tue, Jan 15, 12:39 PM · Release-Engineering-Team
Joe added a comment to T213371: Document and possibly fine-tune how Proton interacts with Varnish.

@Jhernandez I'm happy to explain to you whatever you might want to know about our load-balancing infrastructure, and how it interacts with proton.

Tue, Jan 15, 11:31 AM · Services (watching), serviceops, Readers-Web-Backlog, Traffic, Reading-Infrastructure-Team-Backlog, Operations, Proton
Joe added a project to T213371: Document and possibly fine-tune how Proton interacts with Varnish: serviceops.
Tue, Jan 15, 11:30 AM · Services (watching), serviceops, Readers-Web-Backlog, Traffic, Reading-Infrastructure-Team-Backlog, Operations, Proton
Joe added a comment to T213318: Wikibase Front-End Architecture.

Moving (even part of) the presentation layer outside of MediaWiki raises quite a few questions we have to make important decisions about.

Tue, Jan 15, 10:18 AM · Wikidata, TechCom-RFC

Fri, Jan 11

Joe added a comment to T210717: Find an alternative to HHVM curl connection pooling for PHP 7.

I did some *very* lame benchmarking of the response of the banner url for elasticsearch (/), with the following code:

Fri, Jan 11, 11:15 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations

Thu, Jan 10

Joe added a comment to T208524: RfC: Standards for external services that integrate with MediaWiki.

I think the current version of the RfC is reasonably well structured, to the point I think we should move the discussion here.

Thu, Jan 10, 3:30 PM · TechCom, TechCom-RFC
Joe updated the task description for T208524: RfC: Standards for external services that integrate with MediaWiki.
Thu, Jan 10, 3:29 PM · TechCom, TechCom-RFC
Joe added a comment to T199004: RFC: Add a frontend build step to skins/extensions to our deploy process.

I suspect the only feasible way forward that upholds those values is to make this dependent on the on-going cross-departmental project to deploy MediaWiki using immutable container images that contain all relevant source code within (e.g. using Docker containers, or similar). Those images would need to be securely built from a continuous integration/delivery pipeline, inspectable, testable, and stored somewhere. See T170453 (RelEng/SRE goal) and its many sub tasks.

Thu, Jan 10, 12:34 PM · TechCom-RFC, Proposal, User-Jdlrobson
Joe added a comment to T213131: New ORES model relies on translatewiki.net API, which is not hosted on WMF production.

Another detail I kinda assumed was a given, but it's better to reiterate it:

Thu, Jan 10, 11:58 AM · Security, Scoring-platform-team, translatewiki.net, ORES
Joe added a comment to T199004: RFC: Add a frontend build step to skins/extensions to our deploy process.

Please note that the statement

Thu, Jan 10, 11:51 AM · TechCom-RFC, Proposal, User-Jdlrobson
Joe added a comment to T213131: New ORES model relies on translatewiki.net API, which is not hosted on WMF production.

Here's a possible variation on the "new ORES cluster" proposal. If @Nikerabbit and translatewiki.net wished to host their own ORES, with the understanding (an MoU, even) that WMF can advise but not provide operational support, then we might start to look at our predicament as a good opportunity to build precedent for other so-called third-party installations.

If others agree, I'll add that to the list above.

Thu, Jan 10, 11:40 AM · Security, Scoring-platform-team, translatewiki.net, ORES
Joe closed T212757: Please import php-xdebug to apt.wm.o thirdparty/php72 as Resolved.
Thu, Jan 10, 10:08 AM · Patch-For-Review, Operations
Joe closed T212757: Please import php-xdebug to apt.wm.o thirdparty/php72, a subtask of T212045: Vagrant has Xdebug no longer enabled, preventing debugging and code coverage generation, as Resolved.
Thu, Jan 10, 10:08 AM · MediaWiki-Vagrant
Joe added a comment to T212757: Please import php-xdebug to apt.wm.o thirdparty/php72.

php7.2-xdebug is now available in our repository:

Thu, Jan 10, 9:54 AM · Patch-For-Review, Operations

Wed, Jan 9

Joe added a comment to D1135: Invalidate PHP7's opcache when needed.

This change is still not ready to ship sadly, I need to fix the logic as we should:

Wed, Jan 9, 3:48 PM · Release-Engineering-Team

Tue, Jan 8

Joe added a comment to T210717: Find an alternative to HHVM curl connection pooling for PHP 7.

I think this explains why we've seen a +15ms when I broke connection pooling (T212768).

Tue, Jan 8, 8:26 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations
Joe added a comment to T210717: Find an alternative to HHVM curl connection pooling for PHP 7.

I think I have a decent idea of how to implement a basic version of what we want via nginx. I'll work on it this week hopefully.

Tue, Jan 8, 6:48 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations
Joe added a comment to T210717: Find an alternative to HHVM curl connection pooling for PHP 7.

Even more generally, if we install a reverse proxy for local TLS connection pooling on the application servers, does mediawiki even need to know directly? If we set https_proxy env var for the php application curl will pick that up and proxy all https requests through it. This would catch more than just elasticsearch requests, it would apply to anything making https requests from the application servers. In general it seems desirable that other requests would also get connection pooling?
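
The mechanism described in the comment above, where an HTTP client picks up its proxy from the environment without any application change, can be sketched in Python. This is an analogy using Python's standard library rather than PHP's curl bindings, and the proxy address `http://127.0.0.1:3128` is a hypothetical placeholder, not a value from the task.

```python
import os
import urllib.request

# Hypothetical address of a local connection-pooling proxy (an assumption).
os.environ["https_proxy"] = "http://127.0.0.1:3128"

# Env-aware clients read *_proxy variables; the application code is unchanged.
proxies = urllib.request.getproxies_environment()
https_proxy = proxies.get("https")
print(https_proxy)
```

Because the setting lives in the process environment, it would apply to every HTTPS request made through an env-aware client, not only to one service's calls.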

Tue, Jan 8, 6:47 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations
Joe removed a project from T212830: Fawiki article cannot be edited: "Service Temporarily Unavailable" timeout upon saving at API execution limit (200 seconds): HHVM.
Tue, Jan 8, 6:31 AM · Wikimedia-General-or-Unknown, MediaWiki-Page-editing
Joe added a comment to T212830: Fawiki article cannot be edited: "Service Temporarily Unavailable" timeout upon saving at API execution limit (200 seconds).

FTR, I tried to save that page with php7 and failed as well. So I doubt this has to do with HHVM itself.

Tue, Jan 8, 6:31 AM · Wikimedia-General-or-Unknown, MediaWiki-Page-editing

Fri, Jan 4

Joe added a comment to T203625: mwdebug1001 and mwdebug1002 are reliably the last two hosts to finish scap-cdb-rebuild .

Yeah, that seems sensible unless there's some significant reason (e.g. hardware cost) not to.

Fri, Jan 4, 6:49 AM · Release-Engineering-Team (Watching / External), Scap, Operations

Thu, Jan 3

Joe updated the diff for D1135: Invalidate PHP7's opcache when needed.

Incorporated @thcipriani's comments

Thu, Jan 3, 7:34 AM · Release-Engineering-Team
Joe added projects to T212828: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 : Operations, serviceops, User-Joe.
Thu, Jan 3, 7:00 AM · User-Joe, serviceops, Operations
Joe created T212828: SRE FY2019 Q3 goal: Ramp-up serving traffic to PHP 7 .
Thu, Jan 3, 6:59 AM · User-Joe, serviceops, Operations
Joe added a project to T211964: Make scap and opcache work consistently together: User-Joe.
Thu, Jan 3, 6:57 AM · User-Joe, Patch-For-Review, Scap, User-ArielGlenn, Operations
Joe added a comment to D1135: Invalidate PHP7's opcache when needed.

Thanks for the comments @thcipriani, amending now. See my comments inline.

Thu, Jan 3, 5:54 AM · Release-Engineering-Team
Joe added a comment to T194724: Deprecate `base::service_unit` in puppet.

@Dzahn yup it's unused and useless as things stand, we should remove it.

Thu, Jan 3, 5:30 AM · Cloud-Services, Patch-For-Review, cloud-services-team, User-Joe, Traffic, Operations, Puppet

Wed, Jan 2

Joe added a comment to D1135: Invalidate PHP7's opcache when needed.

I frankly have no idea why these unit test issues pop up; I'm pretty sure I didn't touch anything remotely related.

Wed, Jan 2, 11:49 AM · Release-Engineering-Team
Joe requested review of D1135: Invalidate PHP7's opcache when needed.
Wed, Jan 2, 11:46 AM · Release-Engineering-Team
Joe added a comment to D1117: Only check logstash for canaries in the active datacenter.

We decided to use a different approach for this patch, containing it in puppet rather than here.

Wed, Jan 2, 11:46 AM · Release-Engineering-Team
Joe claimed T211964: Make scap and opcache work consistently together.
Wed, Jan 2, 11:25 AM · User-Joe, Patch-For-Review, Scap, User-ArielGlenn, Operations
Joe closed T211184: Correctly collect logs from php-fpm pools as Resolved.
Wed, Jan 2, 11:24 AM · Performance-Team (Radar), Patch-For-Review, User-Joe, Core Platform Team Backlog (Watching / External), User-ArielGlenn, HHVM, Operations
Joe closed T211184: Correctly collect logs from php-fpm pools, a subtask of T176370: Migrate to PHP 7 in WMF production, as Resolved.
Wed, Jan 2, 11:24 AM · Core Platform Team Kanban (Doing), Core Platform Team (PHP7 (TEC4)), Patch-For-Review, TechCom-RFC (TechCom-Approved), User-ArielGlenn, HHVM, Operations
Joe claimed T212757: Please import php-xdebug to apt.wm.o thirdparty/php72.
Wed, Jan 2, 6:26 AM · Patch-For-Review, Operations

Dec 21 2018

Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

To avoid misunderstandings: I was not questioning MediaWiki's action API being performant. By "lightweight" I was referring to "PHP has high startup time" point @daniel made above as one of the reason why no service should call MW API.

Dec 21 2018, 11:13 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.
Dec 21 2018, 10:48 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

Also, if we're going to build microservices, I'd like to not see applications that "grow", at least in terms of what they can do. A microservice should do one thing and do it well. In this case, it's using data from mediawiki to render an HTML fragment; unless you want to make it do something different, the thing that might change is what data it needs to use.

Dec 21 2018, 10:40 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

The "termbox" is more of an application than a template.
Only it knows which data it needs - actively "sending" data to it requires knowledge of which information is needed.
While seemingly trivial in the beginning this will, as the application grows, become a burden in maintenance - and potentially in performance if data that has become obsolete is sent "just to be sure".

Dec 21 2018, 10:36 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

Let me state it again: the SSR service should not need to call the mediawiki api. It should accept all the information needed to render the termbox in the call from mediawiki.

Dec 21 2018, 10:31 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations

Dec 19 2018

Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

I agree with Joe that it would be better to have the service be internal, and be called from MW. It doesn't have to be that way, but it's preferable because:

  • we would not expose a new endpoint
  • we should in general avoid (more) services calling MediaWiki, because:
    • PHP has high startup time, and also for reasons of general hygiene of the architecture
    • we don't want MW and external services calling each other, back and forth
  • "pure functional" services that do not interact with storage are easier to reason about, and easier to run and maintain.

    However, these are general considerations. I see nothing that would totally block the architecture as proposed, if there are good reasons for doing it this way.
Dec 19 2018, 8:55 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe updated the task description for T211964: Make scap and opcache work consistently together.
Dec 19 2018, 8:40 AM · User-Joe, Patch-For-Review, Scap, User-ArielGlenn, Operations
Joe added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

Looking at live data, we have at least one shard that's doing evictions (150k of them) and all shards have 10M+ expired keys.

Dec 19 2018, 7:18 AM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops

Dec 18 2018

Joe added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

Well that discussion was limited to Session storage, and I stand by the idea that service, and its datastore, shouldn't be concerned with anything else than sessions, unless we come to a further agreement.

Dec 18 2018, 9:10 PM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops
Joe added a comment to T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..

We need persistence and replication. The plan is to use the same store as session for the rest of the object stash usage (probably Cassandra). Flags like WRITE_SYNC might be used in a few callers, and should use appropriate backend requests (e.g. QUORUM_* settings in Cassandra). The callers of the main object stash all need persistence and replication though (callers have already been migrated to stash vs WAN cache and such).

Dec 18 2018, 8:38 PM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

@mobrovac Please note that the term box is shown based on user preferences (languages spoken), the initially served DOM however needs to be the same for all users, so it can be cached. Also note that the language specific data that goes into the term box has to be loaded from the wikibase entity. So the only way to make it work as you suggested would be to always send all the terms in all languages, which, for some items, would be quite a bit of data.

Oh right, the languages. For caching we can vary on the Accept-Language header, so that at least the most requested languages can be served from cache. But you are correct, shipping all the data may become unwieldy. I would still prefer the client making the extra request, though, since it still keeps the architecture saner and faster than doing it prematurely internally.

Dec 18 2018, 8:11 PM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

Also: it is stated in https://wikitech.wikimedia.org/wiki/WMDE/Wikidata/SSR_Service that "In case of no configured server-side rendering service or a malfunctioning of it, the client-side code will act as a fallback". This is a bit the other way around with respect to what is usually done with workers, but I see two advantages from our point of view:

Dec 18 2018, 11:15 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe added a comment to T212189: New Service Request: Wikidata Termbox SSR.

Looking at the attached diagrams, it seems that the flow of a request is as follows:

Dec 18 2018, 11:11 AM · Core Platform Team Backlog (Later), User-Addshore, serviceops, Services (next), Wikidata-Termbox-Hike, Wikidata, Service-deployment-requests, Operations
Joe renamed T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed. from Use a multi-dc aware store for `wgMainStash` if needed. to Use a multi-dc aware store for ObjectCache's MainStash if needed..
Dec 18 2018, 8:32 AM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops
Joe added a comment to T212147: Allow scap sync to deploy gradually .

From our meeting yesterday:

Dec 18 2018, 7:36 AM · User-jijiki, Release-Engineering-Team (Watching / External), serviceops, Scap

Dec 17 2018

Joe created T212129: Use a multi-dc aware store for ObjectCache's MainStash if needed..
Dec 17 2018, 2:55 PM · Core Platform Team Kanban (Doing), Core Platform Team (Security, stability, performance and scalability (TEC1)), Performance-Team (Radar), Operations, MediaWiki-Cache, serviceops
Joe added a comment to T210717: Find an alternative to HHVM curl connection pooling for PHP 7.

I see another problem here:

Dec 17 2018, 9:18 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations
Joe added a project to T205059: Excimer: new profiler for PHP: serviceops.
Dec 17 2018, 9:04 AM · serviceops, Core Platform Team (PHP7 (TEC4)), Excimer, Core Platform Team Kanban (Doing), Performance-Team (Radar), PHP 7.1 support
Joe added a comment to T205059: Excimer: new profiler for PHP.

We're going to rebuild the extension package today from the latest code version and install it across the fleet.

Dec 17 2018, 9:04 AM · serviceops, Core Platform Team (PHP7 (TEC4)), Excimer, Core Platform Team Kanban (Doing), Performance-Team (Radar), PHP 7.1 support
Joe created T212102: Add `supervised` option to redis configuration .
Dec 17 2018, 8:06 AM · User-jijiki, Operations, serviceops
Joe added a project to T210717: Find an alternative to HHVM curl connection pooling for PHP 7: serviceops.
Dec 17 2018, 7:58 AM · Patch-For-Review, serviceops, Discovery-Search, CirrusSearch, Operations
Joe added a comment to T211721: Establish an SLA for session storage.

I guess we should create a new ticket to talk about how MainObjectStash is used and where to migrate it.

Dec 17 2018, 7:35 AM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans
Joe added a project to T206015: Plan/design a session storage service: serviceops.
Dec 17 2018, 7:28 AM · serviceops, User-Clarakosi, Core Platform Team Kanban (Doing), Core Platform Team (Session Management Service (CDP2)), User-Eevans
Joe added projects to T210567: Create a way to intentionally trigger fatal errors in MediaWiki: User-Joe, serviceops.
Dec 17 2018, 7:27 AM · serviceops, User-Joe, Core Platform Team Kanban (Done with CPT), Patch-For-Review, Core Platform Team (PHP7 (TEC4)), PHP 7.2 support
Joe added a project to T210411: Applayer services without TLS: serviceops.
Dec 17 2018, 7:26 AM · serviceops, Operations, Traffic
Joe added a project to T211668: mw1272 crashed: Bad page map in process hhvm: serviceops.
Dec 17 2018, 7:25 AM · serviceops, ops-eqiad, Operations, HHVM
Joe added a project to T125976: Run mediawiki::maintenance scripts in Beta Cluster: serviceops.
Dec 17 2018, 7:24 AM · serviceops, Patch-For-Review, Wikidata, wikidata-tech-focus, User-Joe, Operations, User-Addshore, Beta-Cluster-Infrastructure
Joe added a comment to T211580: blubber template for nodejs should allow defining configuration files to copy to the container.

I think adding an ENV variable to service-runner is probably the way to go.

Dec 17 2018, 7:14 AM · Release Pipeline (Blubber), Operations

Dec 14 2018

Joe triaged T211964: Make scap and opcache work consistently together as Normal priority.
Dec 14 2018, 9:52 AM · User-Joe, Patch-For-Review, Scap, User-ArielGlenn, Operations
Joe added a comment to T211906: Expose PHP7/HHVM to NavTiming in a header, send with navtiming beacon so we can use it as a dimension.

The application server layer https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/mediawiki/templates/apache/mediawiki-vhost.conf.erb#14, and the caching layer https://gerrit.wikimedia.org/r/c/operations/puppet/+/478680/6/modules/varnish/templates/text-frontend.inc.vcl.erb are set to recognize the cookie PHP_ENGINE and send traffic to php-fpm if the value is php7.
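
The routing rule described above can be restated as a small Python sketch. The real logic lives in the Apache vhost template and the Varnish VCL linked in the comment; the function `choose_backend` and the `"default"` label for all other traffic are hypothetical names used only for illustration.

```python
from http.cookies import SimpleCookie


def choose_backend(cookie_header: str) -> str:
    """Return "php-fpm" when the PHP_ENGINE cookie is php7, else "default"."""
    cookies = SimpleCookie()
    cookies.load(cookie_header)
    morsel = cookies.get("PHP_ENGINE")
    if morsel is not None and morsel.value == "php7":
        return "php-fpm"
    # Any other value, or no cookie at all, keeps the existing backend.
    return "default"


print(choose_backend("foo=bar; PHP_ENGINE=php7"))
print(choose_backend("foo=bar"))
```

Keying the decision on a cookie means both layers can make the same choice for a given request without sharing any other state.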

Dec 14 2018, 9:07 AM · MediaWiki-extensions-NavigationTiming, Performance-Team
Joe added a comment to T211721: Establish an SLA for session storage.

To add to what @Tgr found, we have to search for usage of MediaWikiServices::getInstance()->getMainObjectStash(); as that's what that method uses under the hood.

Dec 14 2018, 7:20 AM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans
Joe added a comment to T211721: Establish an SLA for session storage.

Sessions are currently stored in Redis, a highly-optimized in-memory store with request latency reportedly on the order of ~1 ms.

Aren't we using Redis through nutcracker? That can't be 1ms, right?

Dec 14 2018, 7:12 AM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans

Dec 13 2018

Joe updated subscribers of T211721: Establish an SLA for session storage.

I was asking because looking at what's currently stored in the "service", I see both mwsession objects (that are created I guess by the user session), and objects that have the form $wiki:echo:(alert|seen|message) which seem to be created by... Flow?

There is a huge number of such objects, which is a problem in itself - we have 2M echo objects compared to just 16k session objects.

Is this part of redis_sessions? IOW, is it included in the rates shown here: https://grafana.wikimedia.org/d/000000174/redis?orgId=1&panelId=8&fullscreen ?

Dec 13 2018, 6:03 AM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans

Dec 12 2018

Joe updated subscribers of T211721: Establish an SLA for session storage.

I was asking because looking at what's currently stored in the "service", I see both mwsession objects (that are created I guess by the user session), and objects that have the form $wiki:echo:(alert|seen|message) which seem to be created by... Flow?

Dec 12 2018, 9:30 PM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans
Joe added a comment to T211721: Establish an SLA for session storage.

I don't think we need to overthink this, but knowing what kind of latency increase we can expect might drive us to choices of implementation technologies different from the ones we currently picked.

Dec 12 2018, 1:48 PM · Core Platform Team Backlog (Later), Performance-Team (Radar), TechCom, Services (next), Operations, User-Clarakosi, Core Platform Team (Session Management Service (CDP2)), User-Eevans

Dec 11 2018

Joe added a comment to T210528: PHP/HHVM serialization incompatibility in some situations when using Serializable.

Is this still unresolved? If so, it should be marked as a blocker to the php7 transition. @Anomie do you think there is anything else left to do?

Dec 11 2018, 8:29 AM · Core Platform Team Kanban (Done with CPT), MW-1.33-notes (1.33.0-wmf.9; 2018-12-18), Core Platform Team (PHP7 (TEC4))
Joe added a comment to T207292: Review prometheus_nodes params.

I don't think this is the pattern widely adopted in our codebase.

Dec 11 2018, 8:26 AM · User-fgiunchedi, monitoring, Operations
Joe added a comment to T211547: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler).

Since we are arguably due for another puppet upgrade, and puppet 5 will be in buster, should we remove puppet_major_version and puppetdb_major_version completely? These would be useful for future updates, and in cases where labs versions may be lagging behind prod. What if we adjusted these to default to the desired version (and remove puppet 3 cruft) but kept the options available?

Dec 11 2018, 7:44 AM · puppet-compiler, Puppet, Operations
Joe added a comment to T211580: blubber template for nodejs should allow defining configuration files to copy to the container.

FYI, we can also have an alternative mount point for the config file, and can tell service-runner where to look for it by setting the APP_BASE_PATH environment variable. This approach would make building images more flexible, probably.

Dec 11 2018, 6:44 AM · Release Pipeline (Blubber), Operations

Dec 10 2018

Joe added a comment to T206152: Set up request profiling for PHP 7.

I ran some benchmarks, using the same setup I used for T206341, with tideways enabled and disabled. I could not see any clear trend beyond a slight variance, probably due to external factors.

Dec 10 2018, 1:38 PM · Performance-Team, MediaWiki-Debug-Logger, Operations
Joe created T211580: blubber template for nodejs should allow defining configuration files to copy to the container.
Dec 10 2018, 1:28 PM · Release Pipeline (Blubber), Operations
Joe created T211547: Cleanup the puppetmaster module so that we stop breaking expectations (and the puppet compiler).
Dec 10 2018, 7:52 AM · puppet-compiler, Puppet, Operations

Dec 7 2018

Joe added a comment to T206152: Set up request profiling for PHP 7.

Please install tideways, but it should only be enabled in php.ini on the debug servers, since it will cause a performance degradation even without being used. Also, please install php-mongodb, the PHP driver for MongoDB, since this is recommended for XHGui saving on PHP 7. I am working on the mediawiki-config patch which will use these extensions.

Dec 7 2018, 7:15 AM · Performance-Team, MediaWiki-Debug-Logger, Operations

Dec 6 2018

Joe added a project to T211184: Correctly collect logs from php-fpm pools: User-Joe.
Dec 6 2018, 12:31 PM · Performance-Team (Radar), Patch-For-Review, User-Joe, Core Platform Team Backlog (Watching / External), User-ArielGlenn, HHVM, Operations
Joe closed T206341: Evaluate scalability and performance of PHP7 compared to HHVM as Resolved.
Dec 6 2018, 12:28 PM · Patch-For-Review, Performance-Team (Radar), Operations
Joe closed T206341: Evaluate scalability and performance of PHP7 compared to HHVM, a subtask of T206336: SRE quarterly goal: Ability to serve a fraction of the production traffic from PHP7, as Resolved.
Dec 6 2018, 12:28 PM · Operations
Joe added a comment to T211250: Create a mediawiki::cronjob define.

I would add another requirement:

Dec 6 2018, 11:36 AM · Patch-For-Review, serviceops, User-jijiki, Operations
Joe removed a project from T211184: Correctly collect logs from php-fpm pools: TechCom-RFC (TechCom-Approved).
Dec 6 2018, 6:33 AM · Performance-Team (Radar), Patch-For-Review, User-Joe, Core Platform Team Backlog (Watching / External), User-ArielGlenn, HHVM, Operations