
thcipriani (Tyler Cipriani)
¯\_(ツ)_/¯ Administrator

Projects (16)

User Details

User Since
Feb 9 2015, 10:04 PM (223 w, 2 d)
Roles
Administrator
Availability
Available
IRC Nick
thcipriani
LDAP User
Unknown
MediaWiki User
TCipriani (WMF) [ Global Accounts ]

Recent Activity

Today

thcipriani added a comment to T224239: All outgoing HTTP requests are throwing 504s.

hrm. I can recreate the test failures in a container on the CI infra.

Thu, May 23, 8:20 PM · Patch-For-Review, Page Content Service, Mobile-Content-Service, Reading-Infrastructure-Team-Backlog, Services
mmodell awarded T221026: Gerrit thread use GC thrashing an Orange Medal token.
Thu, May 23, 3:56 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani closed T221026: Gerrit thread use GC thrashing as Resolved.

Regarding the SendEmail thread, it took a while to remember about it but T131189 was about SendEmail having stuck tcp connections eventually blocking the task and thus the thread pool. Alexandros used gdb to close the sockets "manually" :D The task is worth reading.

Thu, May 23, 3:19 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Yesterday

thcipriani placed T223978: 1.34.0-wmf.3 generating lots of temporary tables on MySQL slaves up for grabs.

hrm, well I can say nothing changed with that specific function:

Wed, May 22, 12:42 PM · Patch-For-Review, Release-Engineering-Team (Watching / External), Performance-Team, User-Marostegui, MW-1.34-release, MediaWiki-Database, Regression

Tue, May 21

thcipriani triaged T224041: Kask integration testing with Cassandra via the Deployment Pipeline as Normal priority.

It seems that the cassandra subchart already exists for kask (via https://gerrit.wikimedia.org/r/#/c/operations/deployment-charts/+/509102/ ); however, to use that in the pipeline we would have to override some values at deployment time.

Tue, May 21, 4:49 PM · Core Platform Team Backlog (Next), Core Platform Team (Session Management Service (CDP2)), Services (next), User-Eevans, Release Pipeline, Operations, serviceops, Release-Engineering-Team
thcipriani created T224041: Kask integration testing with Cassandra via the Deployment Pipeline.
Tue, May 21, 4:48 PM · Core Platform Team Backlog (Next), Core Platform Team (Session Management Service (CDP2)), Services (next), User-Eevans, Release Pipeline, Operations, serviceops, Release-Engineering-Team
thcipriani triaged T224035: Create service-pipeline job aware of .pipeline/config.yaml as Normal priority.
Tue, May 21, 4:03 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Release Pipeline
thcipriani created T224035: Create service-pipeline job aware of .pipeline/config.yaml.
Tue, May 21, 4:03 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Release Pipeline
thcipriani committed rGERRITDEPLOY2de900145492: Gerrit v2.15.13 (authored by thcipriani).
Gerrit v2.15.13
Tue, May 21, 3:28 PM

Mon, May 20

thcipriani committed rGERRITDEPLOY64ba16dd9f82: Merge "Merge tag 'v2.15.13' into wmf/stable-2.15" into wmf/stable-2.15 (authored by thcipriani).
Merge "Merge tag 'v2.15.13' into wmf/stable-2.15" into wmf/stable-2.15
Mon, May 20, 9:48 PM

Sat, May 18

thcipriani closed Restricted Task, a subtask of T218750: Re-enable use of Gerrit HTTP token to push patchsets, as Invalid.
Sat, May 18, 1:50 PM · VPS-project-libraryupgrader, Release-Engineering-Team, Gerrit

Wed, May 15

thcipriani added a comment to T222539: Scap deployments are not purging MessageBlobStore (was: Stale localized messages).

Change 508488 merged by jenkins-bot:
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

Merged in scap master, will have to create a new version of the scap deb and upload it to get it to production. I can do that next week if no one beats me to it.

Wed, May 15, 5:56 PM · Performance-Team, Patch-For-Review, Release-Engineering-Team, Scap, Regression, MediaWiki-ResourceLoader

Tue, May 14

thcipriani moved T221709: scap service restarts for WDQS are inconsistent from Needs triage to External/Watching on the Scap board.
Tue, May 14, 4:58 PM · Wikidata, Scap, Wikidata-Query-Service
thcipriani moved T222372: scap: look at removing scap/sh.py from Needs triage to Debt on the Scap board.
Tue, May 14, 4:57 PM · Release-Engineering-Team (Backlog), Scap
thcipriani moved T223287: Investigate scap-cdb-rebuild idling until pressing ENTER repeatedly from Needs triage to Debt on the Scap board.
Tue, May 14, 4:57 PM · Scap

Mon, May 13

thcipriani added a comment to T181833: Figure out why HHVM kept running stale code for hours.

I think I have a reasonable explanation for this over in: T217830#5009234

Mon, May 13, 5:44 PM · Performance-Team (Radar), Release-Engineering-Team (Backlog), Wikimedia-Incident, Deployments, HHVM
thcipriani added a comment to T222015: Add abi to l10n-watchers group in Gerrit.

I am not sure I am allowed to do that per the new privilege policy.

Mon, May 13, 4:24 PM · Gerrit-Privilege-Requests

Thu, May 9

thcipriani moved T222921: extensions/CirrusSearch/includes/Sanity/Checker.php:369 Cannot fetch ids from index from Untriaged to Found during 1.34-wmf.4 on the Wikimedia-production-error board.
Thu, May 9, 8:25 PM · CirrusSearch, Discovery-Search, Wikimedia-production-error
thcipriani created T222921: extensions/CirrusSearch/includes/Sanity/Checker.php:369 Cannot fetch ids from index.
Thu, May 9, 8:24 PM · CirrusSearch, Discovery-Search, Wikimedia-production-error
thcipriani added a comment to T222539: Scap deployments are not purging MessageBlobStore (was: Stale localized messages).

Change 508488 merged by jenkins-bot:
[mediawiki/tools/scap@master] Clear MessageBlobStore after syncing i18n data

https://gerrit.wikimedia.org/r/508488

Thu, May 9, 12:27 AM · Performance-Team, Patch-For-Review, Release-Engineering-Team, Scap, Regression, MediaWiki-ResourceLoader

Wed, May 8

thcipriani closed T222792: Gerrit: Cannot assign user name "msyn" to account 7123; name already in use. as Resolved.

@MisterSynergy sorry I missed your username on T220867 :(

Wed, May 8, 7:05 PM · LDAP, Gerrit
thcipriani added a comment to T141324: Look into shoving gerrit logs into logstash.

Deployed the change above and restarted Gerrit. The new file /var/log/gerrit/gerrit.json has been created now and logs are written to it. Works! Thanks for the change, Paladox.

Wed, May 8, 6:31 PM · observability, Patch-For-Review, Release-Engineering-Team (Backlog), Technical-Debt, Wikimedia-Logstash, Gerrit
thcipriani assigned T222829: merge branch.py and make-wmf-branch to mmodell.

@mmodell and I talked about this a bit during our pairing, assigning to him to work on.

Wed, May 8, 5:48 PM · MediaWiki-Release-Tools, Release-Engineering-Team (Kanban)
thcipriani created T222829: merge branch.py and make-wmf-branch.
Wed, May 8, 5:47 PM · MediaWiki-Release-Tools, Release-Engineering-Team (Kanban)
thcipriani closed T196516: Automate updating deployment notes, a subtask of T196515: Automate the Train, as Resolved.
Wed, May 8, 5:43 PM · Epic, Release-Engineering-Team, Goal, Scap
thcipriani closed T196516: Automate updating deployment notes as Resolved.

As of 1.34.0-wmf.4 this is working! https://www.mediawiki.org/wiki/MediaWiki_1.34/wmf.4/Changelog was generated by jenkins with no manual steps other than cutting the branch.

Wed, May 8, 5:43 PM · Release-Engineering-Team (Kanban), Scap
thcipriani assigned T222820: Experiment with hosted kubernetes solutions for Beta to dduvall.

assigning to @dduvall based on hangout discussion

Wed, May 8, 4:52 PM · Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure, Release Pipeline
thcipriani added a subtask for T220235: Migrate Beta cluster services to use Kubernetes: T222820: Experiment with hosted kubernetes solutions for Beta.
Wed, May 8, 4:51 PM · Patch-For-Review, Editing-team, Core Platform Team Backlog (Next), Services (next), Kubernetes, Release Pipeline, serviceops, Beta-Cluster-Infrastructure
thcipriani added a parent task for T222820: Experiment with hosted kubernetes solutions for Beta: T220235: Migrate Beta cluster services to use Kubernetes.
Wed, May 8, 4:51 PM · Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure, Release Pipeline
thcipriani created T222820: Experiment with hosted kubernetes solutions for Beta.
Wed, May 8, 4:50 PM · Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure, Release Pipeline

Tue, May 7

thcipriani added a comment to T222767: integration/docroot error: unable to unlink old 'org/wikimedia/doc/default.html': Permission denied.

I did update the file by manually replacing it with the version in master.

Tue, May 7, 10:42 PM · Continuous-Integration-Infrastructure
thcipriani created T222767: integration/docroot error: unable to unlink old 'org/wikimedia/doc/default.html': Permission denied.
Tue, May 7, 10:41 PM · Continuous-Integration-Infrastructure
thcipriani closed T221440: Gerrit: cannot assign username "aldnonymous" to account XXX; name already in use as Resolved.

Updated your account in the DB, please reopen if that does not resolve your issue.

Tue, May 7, 4:55 PM · Gerrit
thcipriani closed T220867: Gerrit: Cannot assign user name "vladi2016" to account XXXX; name already in use. as Resolved.

Updated your account in the DB, please reopen if that does not resolve your issue.

Tue, May 7, 4:55 PM · Release-Engineering-Team (Kanban), LDAP, Gerrit
thcipriani closed T222186: Gerrit login failure for user tk-999 as Resolved.

Updated your account in the DB, please reopen if that does not resolve your issue.

Tue, May 7, 4:55 PM · Gerrit, LDAP
thcipriani added a comment to T222621: Not possible to edit items via wbeditentity if they have same label and description.

@thcipriani yes it has been reverted already

Tue, May 7, 3:25 PM · MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, User-Ladsgroup, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata
thcipriani added a comment to T222621: Not possible to edit items via wbeditentity if they have same label and description.

I notice that this is still blocking the train, but no one has addressed this yet.

Tue, May 7, 1:50 PM · MW-1.34-notes (1.34.0-wmf.5; 2019-05-14), Patch-For-Review, User-Ladsgroup, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata

Mon, May 6

thcipriani added a comment to T141324: Look into shoving gerrit logs into logstash.
  • structured logging from log4j can be exposed in a number of ways. The easiest is probably to continue logging to file, use syslog for transport, but have messages in a structured format. A json layout could be used for that (here is one from Jetbrains, which I haven't tested, but I tend to trust Jetbrains). This would allow outputting the full logging context and not just a few basic fields.

This sounds like a solid option taking into account the ongoing migration towards shipping to logstash via the local rsyslogd, which produces to the Kafka logging pipeline. Can this output json formatted messages to syslog prefixed by an @cee cookie?

Mon, May 6, 7:34 PM · observability, Patch-For-Review, Release-Engineering-Team (Backlog), Technical-Debt, Wikimedia-Logstash, Gerrit
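
The question in the comment above is about emitting JSON log messages prefixed by an @cee cookie. A minimal sketch of that message shape, assuming Python's standard logging module rather than Gerrit's log4j setup; the field names (message, level, logger) and the render helper are hypothetical and only illustrate the format:

```python
import json
import logging


class CeeJsonFormatter(logging.Formatter):
    """Render each record as one JSON object prefixed by the @cee cookie."""

    COOKIE = "@cee: "

    def format(self, record: logging.LogRecord) -> str:
        # Hypothetical field names; a real layout would carry more context.
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
        }
        return self.COOKIE + json.dumps(payload, sort_keys=True)


def render(message: str, level: int = logging.INFO, name: str = "gerrit") -> str:
    """Format a single message the way the formatter would hand it to syslog."""
    record = logging.LogRecord(name, level, "example.py", 0, message, None, None)
    return CeeJsonFormatter().format(record)


if __name__ == "__main__":
    print(render("change merged"))
```

A syslog receiver that understands the cookie can strip the prefix and parse the remainder as JSON, so the structured fields survive transport.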

Fri, May 3

thcipriani renamed T222472: Investigate gerrit session expiration from Investigate missing gerrit sessions to Investigate gerrit session expiration.
Fri, May 3, 7:05 PM · Release-Engineering-Team, Gerrit
thcipriani triaged T222472: Investigate gerrit session expiration as Normal priority.
Fri, May 3, 7:01 PM · Release-Engineering-Team, Gerrit
thcipriani created T222472: Investigate gerrit session expiration.
Fri, May 3, 7:01 PM · Release-Engineering-Team, Gerrit

Thu, May 2

thcipriani closed T220728: 1.34.0-wmf.3 deployment blockers as Resolved.
Thu, May 2, 10:14 PM · Release-Engineering-Team (Kanban), Release, Train Deployments
thcipriani added a project to T222229: [Regression wmf.3] Cannot save edit after switching to the source editor from mobile VE if no other edits are made on source editor mode: MobileFrontend.

Someone whose local environment is running node 6.11.0 will need to regenerate this, as I can't.

Thu, May 2, 8:30 PM · Verified, User-Ryasmeen, MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), MobileFrontend, VisualEditor-MediaWiki-Mobile, VisualEditor (Current work)
thcipriani edited parent tasks for T222347: wbsearchentities now returns an error with type=lexeme, added: T220729: 1.34.0-wmf.4 deployment blockers; removed: T220728: 1.34.0-wmf.3 deployment blockers.
Thu, May 2, 7:52 PM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), Discovery-Search (Current work), Patch-For-Review, CirrusSearch, Wikidata
thcipriani removed a subtask for T220728: 1.34.0-wmf.3 deployment blockers: T222347: wbsearchentities now returns an error with type=lexeme.
Thu, May 2, 7:51 PM · Release-Engineering-Team (Kanban), Release, Train Deployments
thcipriani added a subtask for T220729: 1.34.0-wmf.4 deployment blockers: T222347: wbsearchentities now returns an error with type=lexeme.
Thu, May 2, 7:51 PM · Release-Engineering-Team (Kanban), Release, Train Deployments
thcipriani created T222391: Gerrit Hardware Upgrade.
Thu, May 2, 6:57 PM · Release-Engineering-Team (Watching / External), serviceops, ops-eqiad, Operations, Gerrit
thcipriani created E1019: thcipriani afk.
Thu, May 2, 4:46 PM · Release-Engineering-Team, events
thcipriani created E1018: thcipriani afk.
Thu, May 2, 4:46 PM · Release-Engineering-Team, events
thcipriani added a comment to T222324: Unable to perform revision deletion on Commons.

@thcipriani - these are the two security patches that were deployed on Tuesday: T222036#5142596, T222038#5142604 (though not the -formatter patch.) These should only affect granular view permissions for certain revdel logs.

Thu, May 2, 4:21 PM · MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), Performance, MediaWiki-Revision-deletion, Security
thcipriani added a comment to T222347: wbsearchentities now returns an error with type=lexeme.

Looking at logstash, I'm seeing this happen on mwdebug, so maybe folks are already looking into it. Anyway, here's a stacktrace if it's helpful

Thu, May 2, 4:02 PM · Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), MW-1.34-notes (1.34.0-wmf.3; 2019-04-30), Discovery-Search (Current work), Patch-For-Review, CirrusSearch, Wikidata
thcipriani added a comment to T222324: Unable to perform revision deletion on Commons.

Note security did some minor adjustments to the revdel process on Tuesday but nothing that should cause this.

Thu, May 2, 3:45 PM · MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), Performance, MediaWiki-Revision-deletion, Security
thcipriani triaged T222329: Special:Search generates "TypeError: this.pushPending is not a function" as Unbreak Now! priority.

Tentatively setting T220728 as parent task. Please feel free to correct me.

Thu, May 2, 3:32 PM · MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), Discovery-Search, MediaWiki-Search, Regression, OOUI
thcipriani awarded T204762: On deployment-prep scap cache_git_info takes 12 minutes (that is too slow) a Yellow Medal token.
Thu, May 2, 3:19 PM · Patch-For-Review, Release-Engineering-Team (Kanban), Scap, Beta-Cluster-Infrastructure

Wed, May 1

thcipriani created T222307: Received cirrusSearchElasticaWrite job for an unwritable cluster cloudelastic..
Wed, May 1, 8:19 PM · Discovery-Search (Current work), CirrusSearch, Wikimedia-production-error
thcipriani added a comment to T222195: CAS update failed on user_touched. The version of the user to be saved is older than the current version..

@thcipriani the patch above should fix it; if you create a separate task please comment here and I'll update the patch.

Wed, May 1, 7:47 PM · GrowthExperiments, Patch-For-Review, Growth-Team (Current Sprint), Wikimedia-production-error
thcipriani triaged T222300: UniversalLanguageSelector: CAS update failed on user_touched. The version of the user to be saved is older than the current version. as Normal priority.
Wed, May 1, 7:45 PM · Language-Team (Language-2019-April-June), MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), UniversalLanguageSelector, Wikimedia-production-error
thcipriani created T222300: UniversalLanguageSelector: CAS update failed on user_touched. The version of the user to be saved is older than the current version..
Wed, May 1, 7:45 PM · Language-Team (Language-2019-April-June), MW-1.34-notes (1.34.0-wmf.4; 2019-05-07), UniversalLanguageSelector, Wikimedia-production-error
thcipriani added a comment to T222195: CAS update failed on user_touched. The version of the user to be saved is older than the current version..

Seems as though this is still happening in 1.34.0-wmf.3 -- stacktrace is a little different though, so maybe a different path to the same problem? This seems to be coming from api.php

Wed, May 1, 7:33 PM · GrowthExperiments, Patch-For-Review, Growth-Team (Current Sprint), Wikimedia-production-error
thcipriani created P8466 (An Untitled Masterwork).
Wed, May 1, 3:15 PM

Tue, Apr 30

thcipriani added a comment to T187153: Special:Abuselog throws when viewing details or examining (BadMethodCallException: Call get getId() on null).

This is currently the largest producer of errors in logstash. It doesn't sound like this is actually anything to be concerned about, but fixing logging would be nice since it looks like 3 separate train deploys have been worried enough to comment on this task.

Tue, Apr 30, 6:56 PM · MW-1.34-notes (1.34.0-wmf.6; 2019-05-21), User-zeljkofilipin, MW-1.33-notes (1.33.0-wmf.12; 2019-01-08), Patch-For-Review, User-Daimona, Regression, Multi-Content-Revisions, User-Addshore, Wikimedia-production-error, Chinese-Sites, AbuseFilter
thcipriani reopened T221437: Archive "JADE" extension repository, a subtask of T211046: Rename "JADE" extension to "Jade", as Open.
Tue, Apr 30, 6:34 PM · Scoring-platform-team (Current), MW-1.33-notes (1.33.0-wmf.14; 2019-01-22), Jade
thcipriani reopened T221437: Archive "JADE" extension repository as "Open".

This is not done until it has been removed from make-wmf-branch. I don't know the status of JADE so I don't know what the consequences are of renaming it in terms of localization updates for new branch deployments. I, unfortunately, discovered this in the middle of cutting a branch and consequently don't have the cycles to look more deeply at this space. Release-Engineering-Team can look at this in the coming week.

Tue, Apr 30, 6:34 PM · Scoring-platform-team, Cleanup, Jade
thcipriani assigned T222199: Post generated docs for pipelinelib to dduvall.
Tue, Apr 30, 4:20 PM · Patch-For-Review, Continuous-Integration-Config, Release-Engineering-Team (Kanban), Release Pipeline
thcipriani created T222199: Post generated docs for pipelinelib.
Tue, Apr 30, 4:20 PM · Patch-For-Review, Continuous-Integration-Config, Release-Engineering-Team (Kanban), Release Pipeline
thcipriani renamed T210267: Execution of the deployment pipeline should be configurable via .pipeline/config.yaml from The continuous release pipeline should support more than one service per repo to Execution of the deployment pipeline should be configurable via .pipeline/config.yaml.
Tue, Apr 30, 4:16 PM · Release Pipeline, Release-Engineering-Team (Backlog), Operations, ORES, Scoring-platform-team
thcipriani removed a parent task for T210268: Build blubber file for ORES: T212801: TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production CD Pipeline.
Tue, Apr 30, 4:14 PM · Release Pipeline (Blubber), Operations, ORES, Scoring-platform-team
thcipriani removed a subtask for T212801: TEC3:O3:O3.1:Q3 Goal - Move cxserver, citoid, changeprop, eventgate (new service) and ORES (partially) through the production CD Pipeline: T210268: Build blubber file for ORES.
Tue, Apr 30, 4:14 PM · Core Platform Team Backlog (Watching / External), Services (watching), Release Pipeline, serviceops, Release-Engineering-Team
thcipriani added a parent task for T182331: [Epic] Deploy ORES in kubernetes cluster: T220398: TEC3:O3:O3.1:Q4 Goal - Move cpjobqueue, Wikidata Termbox SSR (new service), Kask (session storage service) and ORES (partially) through the production CD Pipeline.
Tue, Apr 30, 4:12 PM · Operations, ORES, Scoring-platform-team
thcipriani added a subtask for T220398: TEC3:O3:O3.1:Q4 Goal - Move cpjobqueue, Wikidata Termbox SSR (new service), Kask (session storage service) and ORES (partially) through the production CD Pipeline: T182331: [Epic] Deploy ORES in kubernetes cluster.
Tue, Apr 30, 4:12 PM · Core Platform Team Backlog (Watching / External), Services (watching), Release Pipeline, Operations, serviceops, Release-Engineering-Team
thcipriani merged T220400: Migrate ORES to kubernetes into T182331: [Epic] Deploy ORES in kubernetes cluster.
Tue, Apr 30, 4:12 PM · Operations, ORES, Scoring-platform-team
thcipriani merged task T220400: Migrate ORES to kubernetes into T182331: [Epic] Deploy ORES in kubernetes cluster.
Tue, Apr 30, 4:11 PM · Release Pipeline, Operations, serviceops, Release-Engineering-Team
thcipriani added a subtask for T220403: TEC3:Q4 Tracking task: T205923: TEC3:O1:O1.2 Goal – Formalize the collection of CI infrastructure and tooling metrics.
Tue, Apr 30, 4:08 PM · Operations, serviceops
thcipriani added a parent task for T205923: TEC3:O1:O1.2 Goal – Formalize the collection of CI infrastructure and tooling metrics: T220403: TEC3:Q4 Tracking task.
Tue, Apr 30, 4:08 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure

Mon, Apr 29

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

I have noticed 2 problems:

Mon, Apr 29, 9:30 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

It seems like we've tuned a good number of gerrit parameters at this point and we're still experiencing GC thrashing (although less than previously), which means we ought to start looking at JVM GC tuning.

Mon, Apr 29, 1:56 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Wed, Apr 24

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

I managed to capture a threaddump from the moment tasks started piling up: https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDQvMTcvLS1qc3RhY2stMTktMDQtMTctMjAtNTgtMDIuZHVtcC0tMjEtNDctNA==

I'll post this upstream to see if it reveals any insights.

[0]. https://groups.google.com/d/msg/repo-discuss/pBMh09-XJsw/vuhDiuTWAAAJ

Wed, Apr 24, 11:58 AM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 23 2019

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

changeid_projects cache is, today, looking like it's in poor shape:

Apr 23 2019, 5:31 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 19 2019

thcipriani added a comment to T218750: Re-enable use of Gerrit HTTP token to push patchsets.

Before we move forward and enable this, let's make sure we have understood the security repercussions and have mitigated them (and if we find it impossible to do so, avoid it).

Apr 19 2019, 7:51 PM · VPS-project-libraryupgrader, Release-Engineering-Team, Gerrit
thcipriani closed T221428: Scap should only sync built CDB files to production appserver hosts, not the build files as well as Invalid.

We don't actually sync the cdb files https://gerrit.wikimedia.org/r/plugins/gitiles/mediawiki/tools/scap/+/master/scap/tasks.py#58

Apr 19 2019, 12:25 PM · Scap

Apr 18 2019

thcipriani committed rGERRITDEPLOYe3c340f9ea5b: Add barricade v1.0 (authored by thcipriani).
Add barricade v1.0
Apr 18 2019, 7:58 PM
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

Other thing to note in gerrit show-caches output:

Apr 18 2019, 3:31 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 17 2019

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

I upgraded Gerrit to 2.15.12 in preparation for plugins still in development. I'd like to not change too many things at once, but I am a bit stuck

Apr 17 2019, 10:23 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani committed rGERRITDEPLOY4dcb85164566: Gerrit 2.15.12 (authored by thcipriani).
Gerrit 2.15.12
Apr 17 2019, 8:43 PM
thcipriani committed rGERRITDEPLOY606a5d50c33b: Bump lfs for v2.15.12 (authored by thcipriani).
Bump lfs for v2.15.12
Apr 17 2019, 8:01 PM
thcipriani committed rGERRITDEPLOY17d23167891b: Merge tag 'v2.15.12' into wmf/stable-2.15 (authored by thcipriani).
Merge tag 'v2.15.12' into wmf/stable-2.15
Apr 17 2019, 8:01 PM
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.
Requests  IP                          DNS PTR
69921     2620:0:861:102:10:64:16:8   phab1001.eqiad.wmnet.
Apr 17 2019, 7:43 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani committed rGBLBR4c44e3d0d25d: Edit Project Config (authored by thcipriani).
Edit Project Config
Apr 17 2019, 6:06 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rGERRITDEPLOY3b002f494aa4: Modify access rules (authored by thcipriani).
Modify access rules
Apr 17 2019, 5:19 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rGERRITDEPLOY57be7f446a1a: Modify access rules (authored by thcipriani).
Modify access rules
Apr 17 2019, 5:19 PM
thcipriani updated subscribers of T221026: Gerrit thread use GC thrashing.

Here's the GCEasy report from around the time gerrit started thrashing today:
https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTkvMDQvMTYvLS1qdm1fZ2MuZ2Vycml0LmxvZy4xLS0yMC01NS0zMQ==&channel=WEB

A threaddump from right before and right after the first GC at 10UTC:
https://fastthread.io/my-thread-report.jsp?p=c2hhcmVkLzIwMTkvMDQvMTYvLS1qc3RhY2stMTktMDQtMTYtMDktNTAtMDMuZHVtcC0tMjItMjUtMjE7Oy0tanN0YWNrLTE5LTA0LTE2LTEwLTAwLTAyLmR1bXAtLTIyLTI1LTIx

The notable change I see in the threaddumps is that there is some blocking on reading a packfile in the second dump. That lock clears in subsequent dumps. That kind of BLOCKING doesn't seem uncommon -- it appears in 13% of the threaddump files I have (currently about 550 files from 5 or so days).

Apr 17 2019, 1:19 AM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
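
The comment above reports what share of the collected threaddump files show blocking. One way such a share could be computed is sketched below, assuming jstack-style dumps in which blocked threads carry the line "java.lang.Thread.State: BLOCKED". The function names, the directory argument, and the "*.dump" file pattern are hypothetical, and this counts any blocked thread, not only those waiting on a packfile:

```python
import re
from pathlib import Path
from typing import Iterable

# jstack prints this state line for threads waiting on a monitor.
BLOCKED = re.compile(r"java\.lang\.Thread\.State: BLOCKED")


def has_blocked_thread(dump_text: str) -> bool:
    """Return True if the dump contains at least one BLOCKED thread."""
    return BLOCKED.search(dump_text) is not None


def blocked_fraction(dump_texts: Iterable[str]) -> float:
    """Fraction of dumps that contain at least one BLOCKED thread."""
    texts = list(dump_texts)
    if not texts:
        return 0.0
    return sum(has_blocked_thread(t) for t in texts) / len(texts)


def blocked_fraction_in_dir(directory: str, pattern: str = "*.dump") -> float:
    """Same measure over files on disk; the file pattern is an assumption."""
    paths = sorted(Path(directory).glob(pattern))
    return blocked_fraction(p.read_text(errors="replace") for p in paths)
```

Calling blocked_fraction_in_dir on a directory of periodic dumps gives a number comparable to the percentage quoted in the comment, under the assumptions stated above.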
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

From looking at http requests per minute in javamelody, over 1 year, I see that traffic has increased a lot:

	https://gerrit.wikimedia.org/r/monitoring?part=graph&graph=httpHitsRate

Each simultaneous request allocates a significant chunk of ram. I think we need to increase the heap size a bit and possibly reduce the size of the thread pool to limit memory use.

Apr 17 2019, 1:13 AM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 16 2019

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

Here's the GCEasy report from around the time gerrit started thrashing today:
https://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTkvMDQvMTYvLS1qdm1fZ2MuZ2Vycml0LmxvZy4xLS0yMC01NS0zMQ==&channel=WEB

Apr 16 2019, 11:26 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

Once again, around 10UTC JVM GC started kicking in and thread use is up and has not dropped back down:

Apr 16 2019, 3:43 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani reopened T218515: Upgrade Gerrit to 2.15.12 as "Open".

Reopening following rollback

Apr 16 2019, 1:29 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Gerrit
thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

From the "Scaling Gerrit" link above:

Apr 16 2019, 3:32 AM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 15 2019

thcipriani added a comment to T221026: Gerrit thread use GC thrashing.

Just to clarify, when you say GC, do you mean Java virtual machine garbage collection or git repository object garbage collection?

Apr 15 2019, 6:25 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit
thcipriani created T221026: Gerrit thread use GC thrashing.
Apr 15 2019, 6:20 PM · VPS-project-codesearch, Patch-For-Review, Release-Engineering-Team, Gerrit

Apr 10 2019

thcipriani updated subscribers of T219086: Add legoktm to gerritadmin LDAP group (restoring previously held access).

Can we add back both @Legoktm and @QChris to Gerrit Administrators? Seems like they were doing work that needs Administrator privilege and exercising their privilege judiciously. This was discussed in a higher-bandwidth convo with me, @greg, and @hashar, who I think are the folks who needed to agree for this ticket to be resolved.

Apr 10 2019, 3:25 PM · Release-Engineering-Team (Kanban), User-greg, LDAP-Access-Requests
thcipriani added a comment to T218783: `scap clean` failure.

Since these are vms, can their disk be expanded easily? I ask because they're the oddballs in the mw fleet :) I know Tyler wants to fix this issue this week, though.

Apr 10 2019, 1:42 PM · Patch-For-Review, Gerrit, Scap, Release-Engineering-Team (Kanban), User-zeljkofilipin