hashar (Antoine "hashar" Musso (WMF))
WMF Software developer - Release Engineering

User Details

User Since
Oct 3 2014, 2:31 PM (259 w, 2 d)
Availability
Available
IRC Nick
hashar
LDAP User
Hashar
MediaWiki User
Unknown

https://www.mediawiki.org/wiki/User:Hashar

Based in Nantes, France CET/CEST (UTC+1, UTC+2)

Main IRC channel is #wikimedia-releng

Recent Activity

Fri, Sep 20

hashar closed T233391: zuul-server should not start on spare server when the Debian package is upgraded, a subtask of T233390: zuul-merger fails to fetch from Gerrit, as Resolved.
Fri, Sep 20, 9:04 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar closed T233391: zuul-server should not start on spare server when the Debian package is upgraded as Resolved.

Solved with the assistance of @Dzahn

Fri, Sep 20, 9:04 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar created T233440: Raise quota for integration project.
Fri, Sep 20, 5:06 PM · Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Cloud-VPS (Quota-requests)
hashar added a comment to T233430: selenium-QuickSurveys browser test went from 20 seconds to 1+ minute.

I filed that task solely because the run time went up. That indicates there is most probably an issue somewhere on the beta cluster rather than in the framework itself. So I would like to reproduce the run locally, see whether it is slow there as well, and find the root cause.

Fri, Sep 20, 4:42 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (Unit & Int & System Tooling), Ruby, Browser-Tests, QuickSurveys
hashar added a comment to T229110: Upgrade Gerrit to 2.15.17.

Upstream has released 2.15.17.

Fri, Sep 20, 4:39 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (Development services), Gerrit
hashar created T233430: selenium-QuickSurveys browser test went from 20 seconds to 1+ minute.
Fri, Sep 20, 3:47 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (Unit & Int & System Tooling), Ruby, Browser-Tests, QuickSurveys
hashar updated the task description for T224591: Migrate contint* hosts to Buster.
Fri, Sep 20, 1:49 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie), Operations
hashar set Due Date to Mar 29 2020, 10:00 PM on T224591: Migrate contint* hosts to Buster.
Fri, Sep 20, 1:47 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie), Operations
hashar renamed T224591: Migrate contint* hosts to Buster from Migrate contint* hosts to Stretch/Buster to Migrate contint* hosts to Buster.
Fri, Sep 20, 1:46 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie), Operations
hashar added projects to T224591: Migrate contint* hosts to Buster: Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO.

From a quick chat with @MoritzMuehlenhoff :

Fri, Sep 20, 1:46 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie), Operations
hashar closed T153856: Add lint/CI to all wikimedia/discovery analytics repositories as Declined.

Re declining: we did a quick experiment two years ago, but it never materialized. Maybe later we can revisit using Docker containers and pairing with people knowledgeable about R and its test/package infrastructure.

Fri, Sep 20, 1:25 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Product-Analytics, Patch-For-Review, Discovery-Analysis (Current work), Discovery, Continuous-Integration-Config
hashar added a comment to T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch.

Eventually I wanted to reuse the exact same Docker package on Stretch (T226236) which got rejected. After some madness that seems to work (so far). The Stretch instances would receive Docker 18.09.7 from thirdparty/ci instead of 18.06.2 on Jessie.

Fri, Sep 20, 1:14 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar updated the task description for T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch.
Fri, Sep 20, 1:13 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar updated the task description for T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch.
Fri, Sep 20, 1:03 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar renamed T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch from Rebuild integration-slave-docker-* instances to use less RAM, new name to Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch.
Fri, Sep 20, 1:02 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar claimed T233391: zuul-server should not start on spare server when the Debian package is upgraded.
Fri, Sep 20, 10:21 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar added a comment to T233390: zuul-merger fails to fetch from Gerrit.

@hashar I guess the CI servers should have more relaxed thresholds? Is it even possible to configure gerrit to whitelist some host?

Fri, Sep 20, 8:40 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar created T233391: zuul-server should not start on spare server when the Debian package is upgraded.
Fri, Sep 20, 8:35 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar closed T233390: zuul-merger fails to fetch from Gerrit as Resolved.

I have upgraded zuul on contint2001 (T203846), which eventually got the zuul-server to start and establish two connections to the Gerrit server. I have stopped the service, freeing the extra connections.

Fri, Sep 20, 8:30 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar added a comment to T233390: zuul-merger fails to fetch from Gerrit.

Gerrit has two connections from each contint server, for a total of four connections. Gerrit restricts us to a total of four ssh connections, hence the zuul-merger can no longer fetch:

Session    User            Remote Host
--------------------------------------------------------------
9bb66493   jenkins-bot     contint2001.wikimedia.org
7b67f043   jenkins-bot     contint2001.wikimedia.org
836929be   jenkins-bot     contint1001.wikimedia.org
e36ee520   jenkins-bot     contint1001.wikimedia.org
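
The arithmetic behind that listing can be checked with a small sketch. It parses the session table pasted above and compares the count against the limit of 4 quoted in the Gerrit error message. The function name `count_sessions` and the assumption that the limit applies to the jenkins-bot account as a whole are illustrative only, not taken from the Gerrit configuration.

```python
from collections import Counter

# Session listing as pasted in the comment above.
SESSIONS = """\
Session    User            Remote Host
--------------------------------------------------------------
9bb66493   jenkins-bot     contint2001.wikimedia.org
7b67f043   jenkins-bot     contint2001.wikimedia.org
836929be   jenkins-bot     contint1001.wikimedia.org
e36ee520   jenkins-bot     contint1001.wikimedia.org
"""

# Limit quoted in the error message; assumed here to apply per account.
MAX_CONNECTIONS = 4


def count_sessions(listing):
    """Return (per-user, per-host) connection counts from a session listing."""
    per_user, per_host = Counter(), Counter()
    for line in listing.splitlines()[2:]:  # skip header and separator rows
        fields = line.split()
        if len(fields) != 3:
            continue
        _session, user, host = fields
        per_user[user] += 1
        per_host[host] += 1
    return per_user, per_host


per_user, per_host = count_sessions(SESSIONS)
for user, count in per_user.items():
    state = "at limit" if count >= MAX_CONNECTIONS else "ok"
    print(f"{user}: {count}/{MAX_CONNECTIONS} ({state})")
```

With two sessions per contint host, the account already sits at the limit, so any extra fetch from the zuul-merger is refused.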
Fri, Sep 20, 8:26 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar added a comment to T233390: zuul-merger fails to fetch from Gerrit.

On contint1001

$ sudo su - zuul
$ cd /srv/zuul/git/operations/puppet
$ git fetch -v
Received disconnect from 2620:0:861:3:208:80:154:85: 12: Too many concurrent connections (4) - max. allowed: 4
fatal: Could not read from remote repository.
Fri, Sep 20, 8:24 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar triaged T233390: zuul-merger fails to fetch from Gerrit as Unbreak Now! priority.
Fri, Sep 20, 8:23 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure
hashar created T233390: zuul-merger fails to fetch from Gerrit.
Fri, Sep 20, 8:23 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Zuul, Continuous-Integration-Infrastructure

Thu, Sep 19

Andrew awarded T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002 an Orange Medal token.
Thu, Sep 19, 8:54 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar added a comment to T232612: WikibaseCirrusSearch emits cirrussearch-too-busy-error errors.

Thank you for the explanation and triage :]

Thu, Sep 19, 7:49 PM · Discovery-Search, Elasticsearch, Wikidata, Wikimedia-production-error
hashar added a project to T232646: Move integration-castor03.integration.eqiad.wmflabs to a newer cloudvirt machine: Cloud-VPS.

I think last time I synced with @aborrero to have the instance moved.

Thu, Sep 19, 6:19 PM · Cloud-VPS, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar added a parent task for T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002: T188375: castor rsync's taking 3-5 minutes for mwgate-npm jobs.
Thu, Sep 19, 6:18 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar added a parent task for T232646: Move integration-castor03.integration.eqiad.wmflabs to a newer cloudvirt machine: T188375: castor rsync's taking 3-5 minutes for mwgate-npm jobs.
Thu, Sep 19, 6:18 PM · Cloud-VPS, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar added subtasks for T188375: castor rsync's taking 3-5 minutes for mwgate-npm jobs: T232646: Move integration-castor03.integration.eqiad.wmflabs to a newer cloudvirt machine, T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002.
Thu, Sep 19, 6:18 PM · Continuous-Integration-Infrastructure
hashar added a comment to T188375: castor rsync's taking 3-5 minutes for mwgate-npm jobs.

The network is barely capped anymore, it got bumped to 800Mbits for egress traffic.

Thu, Sep 19, 6:17 PM · Continuous-Integration-Infrastructure
hashar closed T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002 as Resolved.

Solved by applying labstore::traffic_shaping::egress: 100mbps to the instance hiera configuration.

Thu, Sep 19, 6:15 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar triaged T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git" as Normal priority.
Thu, Sep 19, 1:19 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".

The jobs have been corrected. Quibble would then need to fatal out as soon as a repository can not be cloned/fetched etc.
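
As a rough illustration of what "fatal out" could look like when repositories are cloned in parallel, here is a minimal sketch. It is not Quibble's code: `clone`, `clone_all` and `CloneError` are hypothetical names, and the clone itself is a stand-in that fails for one project. The only point shown is that calling `result()` on each future re-raises a worker's exception in the caller.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


class CloneError(Exception):
    """Hypothetical error raised when a repository cannot be cloned."""


def clone(project):
    # Hypothetical stand-in for a real clone/fetch; fails for one project.
    if project == "npm-test":
        raise CloneError(f"Unable to initialize repo for {project}")
    return project


def clone_all(projects, workers=2):
    """Clone projects in parallel and raise on the first failure seen."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = {executor.submit(clone, p): p for p in projects}
        cloned = []
        for future in as_completed(futures):
            # result() re-raises any exception from the worker thread;
            # if it is never called, the failure goes unnoticed.
            cloned.append(future.result())
    return sorted(cloned)
```

A caller can then let `CloneError` propagate and exit non-zero instead of continuing with a partially populated workspace.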

Thu, Sep 19, 1:19 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar edited projects for T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git", added: Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services); removed Release-Engineering-Team.
Thu, Sep 19, 1:18 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar claimed T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".
Thu, Sep 19, 1:13 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".
quibble/zuul.py
with ThreadPoolExecutor(max_workers=workers) as executor:
    for project, dest in dests.items():
        # Copy and hijack the logger
        project_cloner = copy.copy(zuul_cloner)
        project_cloner.log = project_cloner.log.getChild(project)
Thu, Sep 19, 10:15 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".

I think that is due to Support to clone repositories in parallel (5f58fd252e499a37f19da753c064b7e34fc35028) released with 0.0.30. Passing `

Thu, Sep 19, 10:12 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar renamed T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git" from Quibble jobs error (non-fatal) "ERROR:zuul.Repo:Unable to initialize repo for npm-test.git" to Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".
Thu, Sep 19, 10:05 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar updated subscribers of T233291: Set up CI for the deployment-charts repository.

@Jdforrester-WMF has added an experimental helm-lint job to the repository: T216049. It runs helm lint --strict charts/*/ :]

Thu, Sep 19, 10:03 AM · Release-Engineering-Team-TODO (201909), Continuous-Integration-Config, Kubernetes, local-charts, Release Pipeline, serviceops, Operations
hashar added a comment to T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch.

Will do the Stretch upgrade later on, when I can also handle the upgrade to a more recent Docker daemon. Let's stick to the current stack for now.

Thu, Sep 19, 9:52 AM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar renamed T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch from Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch to Rebuild integration-slave-docker-* instances to use less RAM, new name.
Thu, Sep 19, 9:51 AM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch, a subtask of T224591: Migrate contint* hosts to Buster, as Declined.
Thu, Sep 19, 9:51 AM · Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie), Operations
hashar closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch as Declined.

It is not about downgrading Docker, but rather about keeping the same version we are currently using on the Jessie instances. My primary intent was just to migrate to Stretch, not to have to deal with a Docker migration and more puppet work. containerd, for example, is no longer managed by Docker but by systemd, and the 18.09 Docker package is no longer provided for Jessie. It is just too risky and too long to migrate both the OS and the Docker engine at the same time.

Thu, Sep 19, 9:51 AM · serviceops, Operations, Continuous-Integration-Infrastructure (phase-out-jessie)
hashar closed T226236: Upload docker-ce 18.06.3 upstream package for Stretch, a subtask of T226233: Rebuild integration-slave-docker-* instances to use less RAM, new name and Stretch, as Declined.
Thu, Sep 19, 9:51 AM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure (phase-out-jessie)
hashar awarded Blog Post: Wikipedia's JavaScript initialisation on a budget a 100 token.
Thu, Sep 19, 9:42 AM
hashar closed T233264: Jenkins jobs failing with Composer TransportException: 404 Not Found (September 2019) as Resolved.
Thu, Sep 19, 8:54 AM · Upstream, Wikimedia-production-error (Shared Build Failure), Continuous-Integration-Infrastructure, Release-Engineering-Team, Composer
hashar added a project to T233264: Jenkins jobs failing with Composer TransportException: 404 Not Found (September 2019): Upstream.

Seldaek comment Sep 19th 2019 - 08:05 UTC:

@Krinkle did it get resolved? I messed something up and we had delays in processing metadata updates last night.

Thu, Sep 19, 8:53 AM · Upstream, Wikimedia-production-error (Shared Build Failure), Continuous-Integration-Infrastructure, Release-Engineering-Team, Composer

Wed, Sep 18

hashar updated subscribers of T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".

@Legoktm did the optimizations for mediawiki/core. I guess we can revisit what should be run for mediawiki/core and maybe drop some of the optimizations that have been made.

Wed, Sep 18, 7:49 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".
00:00:31.608 ERROR:zuul.Repo:Unable to initialize repo for https://gerrit.wikimedia.org/r/npm-test
00:00:31.609 Traceback (most recent call last):
00:00:31.609   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 51, in __init__
00:00:31.609     self._ensure_cloned()
00:00:31.610   File "/usr/local/lib/python3.5/dist-packages/zuul/merger/merger.py", line 63, in _ensure_cloned
00:00:31.610     git.Repo.clone_from(self.remote_url, self.local_path)
00:00:31.610   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 925, in clone_from
00:00:31.611     return cls._clone(git, url, to_path, GitCmdObjectDB, progress, **kwargs)
00:00:31.611   File "/usr/lib/python3/dist-packages/git/repo/base.py", line 880, in _clone
00:00:31.611     finalize_process(proc, stderr=stderr)
00:00:31.611   File "/usr/lib/python3/dist-packages/git/util.py", line 341, in finalize_process
00:00:31.612     proc.wait(**kwargs)
00:00:31.612   File "/usr/lib/python3/dist-packages/git/cmd.py", line 291, in wait
00:00:31.612     raise GitCommandError(self.args, status, errstr)
00:00:31.612 git.exc.GitCommandError: Cmd('git') failed due to: exit code(128)
00:00:31.613   cmdline: git clone -v https://gerrit.wikimedia.org/r/npm-test /workspace/src/npm-test
00:00:31.613   stderr: 'Cloning into '/workspace/src/npm-test'...
00:00:31.613 fatal: remote error: npm-test unavailable
00:00:31.613 '
Wed, Sep 18, 7:40 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T233143: Quibble should fatal out on clone/fetch failure"ERROR:zuul.Repo:Unable to initialize repo for npm-test.git".

The build was for https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/537445/2 and there is no extension dependencies injected to it.

Wed, Sep 18, 7:32 PM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO (201909), Quibble, Continuous-Integration-Config
hashar added a comment to T232104: Sprint: Prepare MediaWiki generated docs for easier writing of Markdown and misc clean up (Sept 2019).

Thank you @Krinkle :-]

Wed, Sep 18, 6:30 PM · MediaWiki-Documentation, Core Platform Team, Performance-Team
hashar updated subscribers of T233215: ConfirmEdit seemingly erroneously enabled for some users on wikitech.

@Reedy might know more about the magic of the Fancy captchas.

Wed, Sep 18, 1:27 PM · wikitech.wikimedia.org, Wikimedia-production-error, ConfirmEdit (CAPTCHA extension), Operations
hashar added a project to T233215: ConfirmEdit seemingly erroneously enabled for some users on wikitech: ConfirmEdit (CAPTCHA extension).
Wed, Sep 18, 1:26 PM · wikitech.wikimedia.org, Wikimedia-production-error, ConfirmEdit (CAPTCHA extension), Operations
hashar edited projects for T233089: Export zuul metrics to Prometheus, added: Release-Engineering-Team-TODO, Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure; removed Release-Engineering-Team.
Wed, Sep 18, 9:38 AM · Patch-For-Review, Continuous-Integration-Infrastructure, Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, observability, Operations
hashar updated subscribers of T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org.

Hi @herron, others have pointed me to you for this task since you are on ops clinic duty. The package can't really be rebuilt on the SRE box since it requires network access and would vary due to the installation of python dependencies from https://pypi..org/ . It is legacy and a bad practice, but it predates a lot of changes we have made since (such as using scap, components in apt.wikimedia.org etc).

Wed, Sep 18, 9:33 AM · Operations, Zuul, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
hashar updated the task description for T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org.
Wed, Sep 18, 9:25 AM · Operations, Zuul, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
hashar closed T140912: Write / update tutorial for Zuul Debian packaging as Resolved.

That has been done by @Paladox on https://www.mediawiki.org/wiki/Continuous_integration/Zuul#new_package and I updated the doc.

Wed, Sep 18, 9:22 AM · Release-Engineering-Team (CI & Testing services), Release-Engineering-Team-TODO, Zuul, Continuous-Integration-Infrastructure
hashar added a subtask for T203846: Zuul cancels all changes when a change is manually merged: T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org.
Wed, Sep 18, 8:49 AM · Release-Engineering-Team-TODO (201909), Continuous-Integration-Infrastructure, Gerrit, Zuul
hashar added a parent task for T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org: T203846: Zuul cancels all changes when a change is manually merged.
Wed, Sep 18, 8:49 AM · Operations, Zuul, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
hashar added a comment to T233134: logstash-beta.wmflabs.org does not receive any mediawiki events.

I had not checked which instance was used for logstash-beta.wmflabs.org. One thing is sure: they are both broken in the same way, due to the rsyslog / logstash udp inputs conflicting on port 11514 :]

Wed, Sep 18, 8:42 AM · observability, Wikimedia-Logstash, Release-Engineering-Team-TODO (201909), Beta-Cluster-Infrastructure
hashar added a comment to T231862: Selenium tests for Wikibase are being ran twice.

Looking at wmf-quibble-core-vendor-mysql-php72-docker build timing ( https://integration.wikimedia.org/ci/job/wmf-quibble-core-vendor-mysql-php72-docker/buildTimeTrend ):

Wed, Sep 18, 8:32 AM · Quibble, Continuous-Integration-Config, ci-test-error
hashar closed T233117: MediaWiki with sqlite lacks a CACHE_DB as Resolved.
Wed, Sep 18, 8:16 AM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, MW-1.34-release, SQLite, MediaWiki-Cache, MediaWiki-Installer
hashar added a comment to T219694: Enable compression for MW web responses in Jenkins jobs (e.g. Quibble, Fresnel).

Also confirmed from mw-debug-www.log:

wfClientAcceptsGzip: client accepts gzip.
MediaWiki\OutputHandler::handleGzip() is compressing output
Wed, Sep 18, 8:03 AM · Performance-Team-publish, User-Ladsgroup, Patch-For-Review, patch-welcome, Quibble, Performance-Team, Fresnel
hashar added a comment to T225730: Reduce runtime of MW shared gate Jenkins jobs to 5 min.

Change 528933 had a related patch set uploaded (by Ladsgroup; owner: Amir Sarabadani):
[integration/quibble@master] Set cache directory
https://gerrit.wikimedia.org/r/528933

Wed, Sep 18, 7:53 AM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.17; 2019-08-06), Release-Engineering-Team (Unit & Int & System Tooling), Release-Engineering-Team-TODO, Code-Health, Performance-Team (Radar), Epic, MediaWiki-Core-Testing, Continuous-Integration-Config
hashar added a comment to T231862: Selenium tests for Wikibase are being ran twice.

Released and deployed on September 17th with Quibble 0.0.35.

Wed, Sep 18, 7:51 AM · Quibble, Continuous-Integration-Config, ci-test-error
hashar added a comment to T219694: Enable compression for MW web responses in Jenkins jobs (e.g. Quibble, Fresnel).

Well done, and thank you for the verification.

Wed, Sep 18, 7:49 AM · Performance-Team-publish, User-Ladsgroup, Patch-For-Review, patch-welcome, Quibble, Performance-Team, Fresnel
hashar added a comment to T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002.

I'm sorry I didn't get to this! It sounds like you are (probably) all set.

Wed, Sep 18, 7:40 AM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team

Tue, Sep 17

hashar updated subscribers of T232796: [betalabs] Cannot create a new user account .

I would recommend pushing 1.34.0-wmf.23 to testwiki and attempting to reproduce there. If that works in production, then beta is probably to blame somehow.

Tue, Sep 17, 8:37 PM · Release-Engineering-Team-TODO (201909), User-zeljkofilipin, Beta-Cluster-Infrastructure, MediaWiki-extensions-CentralAuth
hashar added a comment to T223287: Investigate scap-cdb-rebuild idling until pressing ENTER repeatedly.

Most probably related to the imported scap/sh.py, which should probably be removed (T222372).

Tue, Sep 17, 8:35 PM · Scap
hashar added a comment to F30381842: Spam of Index change for mediawiki/core.

@Zoranzoki21 just a trace coming from Gerrit. That represents a queue of tasks the server has to do, in this case indexing changes (whatever indexing means, I have no idea). :)

Tue, Sep 17, 7:38 PM
hashar added a comment to F30381842: Spam of Index change for mediawiki/core.

Spotted on Sep. 17 at 19:10. Roughly 1700 of them producing a nice spike of threads. But that is probably harmless.

Tue, Sep 17, 7:31 PM
hashar updated subscribers of T233134: logstash-beta.wmflabs.org does not receive any mediawiki events.
Tue, Sep 17, 5:38 PM · observability, Wikimedia-Logstash, Release-Engineering-Team-TODO (201909), Beta-Cluster-Infrastructure
hashar added projects to T233134: logstash-beta.wmflabs.org does not receive any mediawiki events: Wikimedia-Logstash, observability.

So the logstash input for udp tries to bind on 11514 but rsyslogd is already listening there.
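
The conflict described above can be reproduced in isolation. The sketch below is only an illustration of the failure mode, not the logstash or rsyslog setup: it binds one UDP socket to an OS-chosen port on 127.0.0.1 (rather than 11514, so it does not need a free well-known port), then tries to bind a second socket to the same port. The function name `udp_port_conflict` is hypothetical.

```python
import errno
import socket


def udp_port_conflict():
    """Bind two UDP sockets to one port; return True if the second bind fails."""
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        first.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        port = first.getsockname()[1]
        try:
            # Same address and port, no SO_REUSEADDR/SO_REUSEPORT set.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        first.close()
        second.close()
```

Whichever process binds second gets "address already in use", which matches the symptom of the logstash udp input failing while rsyslogd holds the port.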

Tue, Sep 17, 5:38 PM · observability, Wikimedia-Logstash, Release-Engineering-Team-TODO (201909), Beta-Cluster-Infrastructure
hashar created T233134: logstash-beta.wmflabs.org does not receive any mediawiki events.
Tue, Sep 17, 5:26 PM · observability, Wikimedia-Logstash, Release-Engineering-Team-TODO (201909), Beta-Cluster-Infrastructure
hashar added a comment to T230729: Cypress testing framework evaluation .

A note: from https://opensource.intuit.com/ I found cyphfell, which converts from wdio to cypress. https://github.com/intuit/cyphfell . Quote:

Tue, Sep 17, 4:44 PM · Release-Engineering-Team-TODO, Release-Engineering-Team (Unit & Int & System Tooling), Patch-For-Review, User-zeljkofilipin
hashar created T233117: MediaWiki with sqlite lacks a CACHE_DB.
Tue, Sep 17, 3:27 PM · MW-1.34-notes (1.34.0-wmf.24; 2019-09-24), Performance-Team, MW-1.34-release, SQLite, MediaWiki-Cache, MediaWiki-Installer
hashar added a comment to T219694: Enable compression for MW web responses in Jenkins jobs (e.g. Quibble, Fresnel).

Ok I gave it a try creating a phpinfo file at the root of the mediawiki/core checkout and running something like:

rm src/LocalSettings.php
quibble --skip-zuul --skip-deps --db sqlite -c 'xdg-open http://127.0.0.1:9412/phpinfo.php'
Tue, Sep 17, 3:00 PM · Performance-Team-publish, User-Ladsgroup, Patch-For-Review, patch-welcome, Quibble, Performance-Team, Fresnel
hashar added a comment to T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002.

We will see the effect on https://graphite-labs.wikimedia.org/render/?width=1280&height=720&target=integration.integration-castor03.network.eth0.tx_bit

Tue, Sep 17, 2:30 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar updated the task description for T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002.
Tue, Sep 17, 2:29 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar claimed T232644: Check bandwidth limitation on integration-castor03.integration.eqiad.wmflabs / cloudvirt1002.

I eventually dug into the puppet log. I found out that all wmcs instances have an nfsclient puppet class applied, which ends up invoking labstore::traffic_shaping. That class creates a file /usr/local/sbin/tc-setup which has various shaping parameters.

Tue, Sep 17, 2:22 PM · Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure, cloud-services-team
hashar changed the status of T221969: Puppet catalog compiler - increasing max concurrent jobs from Stalled to Open.

To work around the insanely slow puppet run from T228056, I commented out base::resolving::labs_additional_domains in hiera and the provisioning has been super fast.

Tue, Sep 17, 1:48 PM · Release-Engineering-Team, puppet-compiler, Continuous-Integration-Infrastructure
hashar added a parent task for T228056: Puppet times out on newly created instance in the puppet-diffs project: T221969: Puppet catalog compiler - increasing max concurrent jobs.
Tue, Sep 17, 1:11 PM · Cloud-VPS, cloud-services-team
hashar added a subtask for T221969: Puppet catalog compiler - increasing max concurrent jobs: T228056: Puppet times out on newly created instance in the puppet-diffs project.
Tue, Sep 17, 1:11 PM · Release-Engineering-Team, puppet-compiler, Continuous-Integration-Infrastructure
hashar closed T228056: Puppet times out on newly created instance in the puppet-diffs project as Declined.

I could not figure it out, so I guess we just have to wait for a while on the initial puppet run. After that the instance seems to behave properly.

Tue, Sep 17, 1:11 PM · Cloud-VPS, cloud-services-team
hashar moved T233020: gbp buildpackage with GIT_PBUILDER_AUTOCONF=no causes DIST to be ignored from Backlog to Reported Upstream on the Upstream board.
Tue, Sep 17, 9:50 AM · Upstream, Release-Engineering-Team-TODO, Packaging
hashar added a project to T233020: gbp buildpackage with GIT_PBUILDER_AUTOCONF=no causes DIST to be ignored: Upstream.
Tue, Sep 17, 9:49 AM · Upstream, Release-Engineering-Team-TODO, Packaging
hashar updated the task description for T233020: gbp buildpackage with GIT_PBUILDER_AUTOCONF=no causes DIST to be ignored.
Tue, Sep 17, 9:49 AM · Upstream, Release-Engineering-Team-TODO, Packaging
hashar placed T231089: WikibaseClient.php: PHP Notice: Undefined index: up for grabs.
Tue, Sep 17, 8:30 AM · PHP 7.2 support, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, Wikimedia-production-error
hashar closed T231089: WikibaseClient.php: PHP Notice: Undefined index: as Resolved.

I have checked in logstash: the error has been gone since we deployed the fix for T232613#5494695 (we had to upgrade php-memcached).

Tue, Sep 17, 8:30 AM · PHP 7.2 support, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), Wikidata, Wikimedia-production-error
hashar added a comment to T225628: On CI, stop testing MediaWiki with php7.0 ahead of dropping support.

The PHP 7.0 and 7.1 jobs have been removed from the pipelines for the master branch! T216165 T216166

Tue, Sep 17, 8:23 AM · TechCom, Continuous-Integration-Config

Mon, Sep 16

hashar added a comment to T232495: selenium-daily-beta-CirrusSearch is broken.

From my review on https://gerrit.wikimedia.org/r/#/c/mediawiki/extensions/CirrusSearch/+/537149/:

Mon, Sep 16, 7:20 PM · Discovery-Search (Current work), Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), User-zeljkofilipin, Continuous-Integration-Infrastructure, CirrusSearch
hashar added a comment to T232903: Standardise `.mw-infobox` by relying on `.warningbox`.

Given we now have .messagebox and .warningbox, if the shared.css style is applied to the installer pages, I see no reason not to use them. So the latter option I guess: migrate to the new standard and deprecate the dated, barely used method.

Mon, Sep 16, 7:09 PM · Patch-For-Review, MediaWiki-Installer, Readers-Web-Backlog, UI-Standardization
TK-999 awarded T232613: LBFactoryMulti.php: PHP Notice: Undefined index: a Party Time token.
Mon, Sep 16, 3:47 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms, PHP 7.2 support, Wikimedia-production-error
hashar added a project to T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org: Operations.
Mon, Sep 16, 3:09 PM · Operations, Zuul, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
hashar created T233025: Upload zuul_2.5.1-wmf10 to apt.wikimedia.org.
Mon, Sep 16, 3:09 PM · Operations, Zuul, Release-Engineering-Team-TODO (201909), Release-Engineering-Team (CI & Testing services), Continuous-Integration-Infrastructure
hashar triaged T233020: gbp buildpackage with GIT_PBUILDER_AUTOCONF=no causes DIST to be ignored as Low priority.

Maybe someday. I have just logged this task for later, so I can look it up whenever I encounter the issue again. The hot fix is trivial (hack git-pbuilder to keep the environment variables).

Mon, Sep 16, 2:41 PM · Upstream, Release-Engineering-Team-TODO, Packaging
hashar created T233020: gbp buildpackage with GIT_PBUILDER_AUTOCONF=no causes DIST to be ignored.
Mon, Sep 16, 2:38 PM · Upstream, Release-Engineering-Team-TODO, Packaging
Krinkle awarded T232613: LBFactoryMulti.php: PHP Notice: Undefined index: an Orange Medal token.
Mon, Sep 16, 2:30 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms, PHP 7.2 support, Wikimedia-production-error
kostajh awarded T232613: LBFactoryMulti.php: PHP Notice: Undefined index: a Yellow Medal token.
Mon, Sep 16, 2:27 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.22; 2019-09-10), Core Platform Team Workboards (Clinic Duty Team), Wikimedia-Rdbms, PHP 7.2 support, Wikimedia-production-error
hashar closed T220747: 1.34.0-wmf.22 deployment blockers as Resolved.

Seems all fine!

Mon, Sep 16, 1:39 PM · Patch-For-Review, Release-Engineering-Team (Deployment services), Release, Train Deployments