hashar (Antoine "hashar" Musso (WMF))
WMF Software developer - Release Engineering

User Details

User Since
Oct 3 2014, 2:31 PM (146 w, 4 d)
Availability
Available
IRC Nick
hashar
LDAP User
Hashar
MediaWiki User
Unknown

https://www.mediawiki.org/wiki/User:Hashar

Based in Nantes, France CET/CEST (UTC+1, UTC+2)

Main IRC channel is #wikimedia-releng

Recent Activity

Today

hashar added a comment to T171724: wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/workspace/wikimedia-fundraising-civicrm/src/wikimedia/fundraising/civicrm-buildkit/bin/amp/src/Amp/Database/MySQL.php on line 58.

I have rebuilt the job, which happened to run on 1002. It failed for the exact same reason, so the failure does not seem to be due to differences between slaves :-(

Wed, Jul 26, 12:07 PM · Continuous-Integration-Infrastructure, Wikimedia-Fundraising-CiviCRM, Release-Engineering-Team (Kanban)
hashar created T171724: wikimedia-fundraising-civicrm fails with Call to a member function getDriver() on null in phar:///srv/jenkins-workspace/workspace/wikimedia-fundraising-civicrm/src/wikimedia/fundraising/civicrm-buildkit/bin/amp/src/Amp/Database/MySQL.php on line 58.
Wed, Jul 26, 11:17 AM · Continuous-Integration-Infrastructure, Wikimedia-Fundraising-CiviCRM, Release-Engineering-Team (Kanban)
hashar updated the task description for T150623: Upgrade CI emulator to API 25.
Wed, Jul 26, 11:06 AM · Release-Engineering-Team (Kanban), Jenkins, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Wikipedia-Android-App-Backlog
hashar moved T94684: Browser test jobs should use xUnit publisher instead of Junit from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.
Wed, Jul 26, 10:20 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Config, Browser-Tests-Infrastructure
hashar triaged T166756: Where to trigger WebPageTest jobs? as Normal priority.
Wed, Jul 26, 10:20 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar moved T166756: Where to trigger WebPageTest jobs? from In-progress to Done (within RelEng) on the Release-Engineering-Team (Kanban) board.
Wed, Jul 26, 10:20 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar closed T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51 as Resolved.
Wed, Jul 26, 9:43 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar placed T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51 up for grabs.
Wed, Jul 26, 9:12 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar added a comment to T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51.
$ ./hiera_lookup -v --fqdn=`hostname -f` discovery::app_routes
...
DEBUG: 2017-07-26 09:09:10 +0000: Fetching https://wikitech.wikimedia.org/w/api.php?action=query&prop=revisions&format=json&rvprop=content&titles=Hiera:Integration/host/integration-puppetmaster01
/var/lib/git/operations/puppet/modules/wmflib/lib/hiera/mwcache.rb:40:in `rescue in read': Reading data from Integration/host/integration-puppetmaster01 failed: TypeError: Data retrieved from Integration/host/integration-puppetmaster01 is String, not Hash or nil (RuntimeError)
	from /var/lib/git/operations/puppet/modules/wmflib/lib/hiera/mwcache.rb:32:in `read'
	from /var/lib/git/operations/puppet/modules/wmflib/lib/hiera/backend/mwyaml_backend.rb:28:in `block in lookup'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:76:in `block in datasources'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:74:in `map'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:74:in `datasources'
	from /var/lib/git/operations/puppet/modules/wmflib/lib/hiera/backend/mwyaml_backend.rb:14:in `lookup'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:206:in `block in lookup'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:203:in `each'
	from /usr/lib/ruby/vendor_ruby/hiera/backend.rb:203:in `lookup'
	from /usr/lib/ruby/vendor_ruby/hiera.rb:60:in `lookup'
	from ./hiera_lookup:122:in `<main>'
Wed, Jul 26, 9:09 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar claimed T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51.
Wed, Jul 26, 9:04 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar created T171712: integration puppetmaster yield String, not Hash or nil at /etc/puppet/manifests/realm.pp:51.
Wed, Jul 26, 9:04 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure

Yesterday

hashar added a comment to T171160: Move wikiba.se repository from github to gerrit.

Well done @Ladsgroup

Tue, Jul 25, 9:52 PM · wikiba.se, Wikidata-Sprint, User-Ladsgroup, Wikidata
hashar added a comment to T171562: Android SDK is suddenly failing to auto-install, blocking tests from being executed.

@hashar , has any configuration around these machines (integration-slave-jessie-1001, integration-slave-jessie-1002) changed in the past 24 hours? I can't think of any change on our end that could be causing this.

Tue, Jul 25, 9:22 PM · Patch-For-Review, Continuous-Integration-Config, Wikipedia-Android-App-Backlog
hashar added a comment to T170995: Setup a mirror for R language dependencies (CRAN).

@mpopov using twitter to get the size was a smart move :-]

Tue, Jul 25, 9:11 PM · Discovery-Analysis, Continuous-Integration-Infrastructure, Operations, Release-Engineering-Team (Watching / External), Discovery
hashar added a comment to T171173: puppet dependency loop on deployment-sca hosts.

Thank you @thcipriani for the analysis and the patch!

Tue, Jul 25, 8:56 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar awarded T171173: puppet dependency loop on deployment-sca hosts a Party Time token.
Tue, Jul 25, 8:55 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar added a comment to T171632: Fix or remove Blubber's node_modules optimization.

Could it be that /tmp is a tmpfs and thus moving files under / is actually a copy+delete?

Tue, Jul 25, 8:52 PM · Release Pipeline (Blubber), Release-Engineering-Team (Kanban)
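
The question raised in the comment above (whether a move out of /tmp is really a copy plus delete) can be checked by comparing device IDs: a rename is only a cheap metadata operation when source and destination sit on the same filesystem. The following is a minimal sketch, not Blubber's code; the helper names `same_filesystem` and `move_strategy` are hypothetical, and comparing `st_dev` is only an approximation of what the kernel decides.

```python
import os


def same_filesystem(path_a: str, path_b: str) -> bool:
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


def move_strategy(src: str, dst_dir: str) -> str:
    """Describe how a move of src into dst_dir would likely be carried out.

    Hypothetical helper: a move within one filesystem can be a plain
    rename, while a move across filesystems (for example from a tmpfs
    /tmp to /) has to fall back to copying the data and deleting the source.
    """
    return "rename" if same_filesystem(src, dst_dir) else "copy+delete"
```

Calling `move_strategy("/tmp", "/")` on the affected host would indicate whether the two paths are on different devices.
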
hashar added a comment to T170880: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink.

The fix is https://gerrit.wikimedia.org/r/#/c/366496/ by @Legoktm

Tue, Jul 25, 1:36 PM · MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)), Release-Engineering-Team (Watching / External), Parsing-Team, MediaWiki-Parser, MediaWiki-Core-Tests, Mobile App Sprint 52 - Android, Reading-Web-Backlog, Continuous-Integration-Config
hashar closed T162235: enwiki file "Lock_icon_blue.gif" in sites CSS has to be switched to commons wiki as Resolved.

They all have been edited:

$ mwgrep 'wikipedia/en/0/00/Lock_icon_blue.gif'
Tue, Jul 25, 9:50 AM · Release-Engineering-Team (Watching / External), Wikimedia-General-or-Unknown
hashar added a comment to T166756: Where to trigger WebPageTest jobs?.

@Krinkle regarding the bad slave being connected, that is entirely my fault. I created the new slave in Jenkins by copying saucelabs02. Apparently Jenkins connected to that host and kept that host's SSH key. That sounds like a bug in Jenkins.

Tue, Jul 25, 9:44 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar added a comment to T170599: Wikibase: $idSerialization must match /^Q[1-9]\d{0,9}\z/i.

In logstash, the last trace matching idSerialization is 2017-07-20T16:56:19 and had:

/w/api.php?action=wbcheckconstraints&format=json&uselang=en&id=P3212   InvalidArgumentException from line 39 of /srv/mediawiki/php-1.30.0-wmf.9/extensions/Wikidata/vendor/wikibase/data-model/src/Entity/ItemId.php: $idSerialization must match /^Q[1-9]\d{0,9}\z/i
Tue, Jul 25, 8:32 AM · Patch-For-Review, Wikimedia-log-errors, Wikidata, Wikibase-DataModel
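
To illustrate why the request in the trace above is rejected, here is a small sketch that ports the pattern from the exception message to Python. It is not the Wikibase implementation; the function name `is_item_id` is hypothetical, and `re.ASCII` is added on the assumption that the PHP pattern only matches ASCII digits.

```python
import re

# Port of /^Q[1-9]\d{0,9}\z/i; fullmatch() plays the role of ^ and \z.
ITEM_ID = re.compile(r"Q[1-9]\d{0,9}", re.IGNORECASE | re.ASCII)


def is_item_id(serialization: str) -> bool:
    """Return True when the string matches the item ID pattern."""
    return ITEM_ID.fullmatch(serialization) is not None
```

With this check, `is_item_id("Q42")` is true while `is_item_id("P3212")` is false: the ID in the logged request starts with P rather than Q, so it cannot match the pattern.
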
hashar updated the task description for T170599: Wikibase: $idSerialization must match /^Q[1-9]\d{0,9}\z/i.
Tue, Jul 25, 8:29 AM · Patch-For-Review, Wikimedia-log-errors, Wikidata, Wikibase-DataModel
hashar updated the task description for T170599: Wikibase: $idSerialization must match /^Q[1-9]\d{0,9}\z/i.
Tue, Jul 25, 8:26 AM · Patch-For-Review, Wikimedia-log-errors, Wikidata, Wikibase-DataModel

Mon, Jul 24

hashar added a comment to T170880: Parser tests fail if default Skin for unit tests makes use of doEditSectionLink.

I will visit this task later this week. I came here to mention that I have hit a similar wall with the thumbnailBeforeProduceHTML hook, which changes the output expected by core or other extensions. T69302

Mon, Jul 24, 7:22 PM · MW-1.30-release-notes (WMF-deploy-2017-08-01_(1.30.0-wmf.12)), Release-Engineering-Team (Watching / External), Parsing-Team, MediaWiki-Parser, MediaWiki-Core-Tests, Mobile App Sprint 52 - Android, Reading-Web-Backlog, Continuous-Integration-Config
hashar updated subscribers of T78342: Create a basic RSpec unit test for operations/puppet.

@Joe proposed a rewrite of the Puppet Rakefile as part of T166888. The patch is https://gerrit.wikimedia.org/r/#/c/366591/ and implements the logic described in the previous comment.

Mon, Jul 24, 4:44 PM · Ruby, User-zeljkofilipin, Release-Engineering-Team (Kanban), Continuous-Integration-Config, Patch-For-Review, Operations
hashar edited projects for T50002: Jenkins: Assert no PHP errors (notices, warnings) were raised or exceptions were thrown, added: Release-Engineering-Team (Kanban); removed Release-Engineering-Team.

@Krinkle definitely. Will probably want to limit it to the master branch for a while.

Mon, Jul 24, 4:26 PM · Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar added a comment to T171482: Programmatic generation of grafana dashboards.

The OpenStack infrastructure team has a Python-based utility that generates Grafana dashboards from a YAML DSL. It is similar to their Jenkins Job Builder, which is used to generate jobs.

Mon, Jul 24, 3:43 PM · monitoring, Operations
hashar moved T166756: Where to trigger WebPageTest jobs? from Backlog to In-progress on the Release-Engineering-Team (Kanban) board.
Mon, Jul 24, 3:21 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar added a comment to T166756: Where to trigger WebPageTest jobs?.
webperformance:~$ nodejs --version
v6.11.0
webperformance:~$ npm -version
2.15.2
Mon, Jul 24, 3:08 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar claimed T166756: Where to trigger WebPageTest jobs?.
Mon, Jul 24, 2:52 PM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure, Performance-Team, WebPageTest
hashar closed T171174: a lot of beta cluster instances are not reachable over SSH as Resolved.

I have removed the faulty puppet classes, run puppet, restarted nslcd and reapplied the puppet classes.

Mon, Jul 24, 11:10 AM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar committed rEPSO0861e773d0f2: build: add json-lint / banana i18n checker (authored by hashar).
build: add json-lint / banana i18n checker
Mon, Jul 24, 10:52 AM
hashar added a subtask for T171174: a lot of beta cluster instances are not reachable over SSH: T171454: deployment-ms-beXX Duplicate declaration: Exec[swift_udev_reload].
Mon, Jul 24, 9:59 AM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a parent task for T171454: deployment-ms-beXX Duplicate declaration: Exec[swift_udev_reload]: T171174: a lot of beta cluster instances are not reachable over SSH.
Mon, Jul 24, 9:59 AM · User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
hashar created T171454: deployment-ms-beXX Duplicate declaration: Exec[swift_udev_reload].
Mon, Jul 24, 9:49 AM · User-fgiunchedi, media-storage, Beta-Cluster-Infrastructure
hashar created T171441: Create Phabricator project for mediawiki/extensions/Wigo3.
Mon, Jul 24, 8:51 AM · Project-Admins
hashar added a comment to T167452: Undeploy and archive Cards extension.

I have marked Cards read-only in Gerrit

Mon, Jul 24, 6:59 AM · Patch-For-Review, MW-1.30-release-notes (WMF-deploy-2017-06-27_(1.30.0-wmf.7)), Unplanned-Sprint-Work, Reading-Web-Kanban-Board, MediaWiki-extensions-Cards, Wikimedia-Site-requests, Reading-Web-Backlog

Sun, Jul 23

hashar added a comment to T170963: Create code health mailing list .

Potentially you could reuse the QA list, it is very low traffic nowadays https://lists.wikimedia.org/pipermail/qa/ :]

Sun, Jul 23, 9:22 PM · Wikimedia-Mailing-lists, Release-Engineering-Team (Kanban)

Fri, Jul 21

hashar triaged T171352: recheck is ignored if there are also inline comments as Normal priority.

The detection is based on a regular expression in integration/config zuul/layout.yaml:

Fri, Jul 21, 10:15 PM · Patch-For-Review, Release-Engineering-Team (Backlog), Continuous-Integration-Config, Zuul
hashar changed the status of T158014: Investigate how to improve Android CI performance and stability from Declined to Resolved.

So that is partly resolved thanks to @Mholloway . He made patches that made it possible to enable QEMU2 and thus let the emulator use multiple CPUs. The run time went from 45 minutes to 11 minutes, largely because retrieving screenshots is way faster.

Fri, Jul 21, 9:50 PM · Continuous-Integration-Config, Release-Engineering-Team, Spike, Technical-Debt, Wikipedia-Android-App-Backlog
hashar added a comment to T150623: Upgrade CI emulator to API 25.

status update

Fri, Jul 21, 9:48 PM · Release-Engineering-Team (Kanban), Jenkins, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Wikipedia-Android-App-Backlog
hashar added a comment to T169918: Limit apps-android-wikipedia-periodic-test to only the screenshot tests.

Note after some magic happened, the job runs in ~ 11 minutes (was 45 minutes) \o/ ( T150623 )

Fri, Jul 21, 9:47 PM · Wikipedia-Android-App-Backlog
hashar added a comment to T157750: mw-tools-codesniffer-mwcore-testrun test fails with php is not hhvm.

Definitely yes. Looks like I have messed it up. Paladox did submit the same change https://gerrit.wikimedia.org/r/#/c/337225/ which I have abandoned, but I forgot to abandon mine.

Fri, Jul 21, 2:44 PM · Release-Engineering-Team (Kanban), Patch-For-Review, MediaWiki-Codesniffer, Continuous-Integration-Config, Continuous-Integration-Infrastructure
hashar added a comment to T171280: wikitech api list=novainstances not returning list of instances.

Yup because I have added novaadmin as a member of the deployment-prep tenant. But for tools it is still empty:

$ curl 'https://wikitech.wikimedia.org/w/api.php?action=query&list=novainstances&niregion=eqiad&format=json&niproject=deployment-prep' 
{"batchcomplete":"","query":{"novainstances":[]}}
Fri, Jul 21, 2:30 PM · Operations, Cloud-Services
hashar added a comment to T171280: wikitech api list=novainstances not returning list of instances.

The Icinga alert should probably be more noisy. Left to figure out is whether novaadmin should actually be a member.

Fri, Jul 21, 2:18 PM · Operations, Cloud-Services
hashar updated the task description for T171160: Move wikiba.se repository from github to gerrit.
Fri, Jul 21, 2:10 PM · wikiba.se, Wikidata-Sprint, User-Ladsgroup, Wikidata
Gerrit Code Review <gerrit@wikimedia.org> committed rWBbd8363843091: Modify access rules (authored by hashar).
Modify access rules
Fri, Jul 21, 1:41 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rWB015ce0813987: Modify access rules (authored by hashar).
Modify access rules
Fri, Jul 21, 1:41 PM
Gerrit Code Review <gerrit@wikimedia.org> committed rWB1cf33afafdfd: Rename group wikibase to obsolete-wikibase (authored by hashar).
Rename group wikibase to obsolete-wikibase
Fri, Jul 21, 1:41 PM
hashar added a comment to T171160: Move wikiba.se repository from github to gerrit.

I created the repo in Gerrit with the wrong name, thus I have marked the repo hidden and recreated it as wikibase/wikiba.se.git

Fri, Jul 21, 11:47 AM · wikiba.se, Wikidata-Sprint, User-Ladsgroup, Wikidata
hashar added a comment to T171160: Move wikiba.se repository from github to gerrit.

I have created the repo and imported all references from github (including pull requests refs, though they are not recognized as changes by gerrit):

Fri, Jul 21, 10:56 AM · wikiba.se, Wikidata-Sprint, User-Ladsgroup, Wikidata
hashar added a comment to T171280: wikitech api list=novainstances not returning list of instances.

And in the nova logs, I also see 401 for the tools project for requests from Silver

Fri, Jul 21, 9:46 AM · Operations, Cloud-Services
hashar added a comment to T171280: wikitech api list=novainstances not returning list of instances.

Code is in api/ApiListNovaInstances.php. Replaying it on silver:

$ mwscript eval.php --wiki=labswiki
> global $wgOpenStackManagerLDAPUsername;
> global $wgOpenStackManagerLDAPUserPassword;
> $user = new OpenStackNovaUser( $wgOpenStackManagerLDAPUsername );
Fri, Jul 21, 9:42 AM · Operations, Cloud-Services
hashar added a comment to T171160: Move wikiba.se repository from github to gerrit.

There is already a /wikibase project, so maybe /wikibase/www or /wikibase/wikibase.se It is all up to you :]

Fri, Jul 21, 9:10 AM · wikiba.se, Wikidata-Sprint, User-Ladsgroup, Wikidata

Thu, Jul 20

hashar added a comment to T162235: enwiki file "Lock_icon_blue.gif" in sites CSS has to be switched to commons wiki.

Thank you @Samtar. I guess I was assuming that a lot of bot operators already have the global edit interface right, which is why I suggested reaching out to them to automate the edits. But your idea is way easier and, as you said, it is "only" 66 edits :-}

Thu, Jul 20, 9:13 PM · Release-Engineering-Team (Watching / External), Wikimedia-General-or-Unknown
hashar closed T134863: Reflected XSS in GlobalGroupPermissions as Resolved.

@Bawolff sorry, I failed to notice that the fix made it to master ages ago and effectively made it to REL1_29 when we branched. Thanks :-}

Thu, Jul 20, 9:06 PM · Patch-For-Review, Security-Team, MediaWiki-extensions-CentralAuth, Vuln-XSS, Security-Extensions, Security
hashar edited projects for T67478: Graph User::pingLimiter() actions in Grafana, added: Performance-Team; removed Patch-For-Review.

I filed this task in the hope someone could figure out the links to be added in Gdash. Nowadays that can be done via Grafana, hence I reopened this task and rephrased the topic.

Thu, Jul 20, 8:29 PM · monitoring, Wikimedia-Incident, Performance-Team
hashar lowered the priority of T67478: Graph User::pingLimiter() actions in Grafana from Normal to Low.
Thu, Jul 20, 8:21 PM · monitoring, Wikimedia-Incident, Performance-Team
hashar renamed T67478: Graph User::pingLimiter() actions in Grafana from Graph User::pingLimiter() actions in gdash to Graph User::pingLimiter() actions in Graphana.
Thu, Jul 20, 8:21 PM · monitoring, Wikimedia-Incident, Performance-Team
hashar reopened T67478: Graph User::pingLimiter() actions in Grafana as "Open".
Thu, Jul 20, 8:20 PM · monitoring, Wikimedia-Incident, Performance-Team
hashar closed T171177: New instance in deployment prep can't run puppet for the first time as Resolved.

Andrew has deleted the instance and created a new one. The instance had been created while LDAP, certificates, etc. were broken, which certainly explains the issue.

Thu, Jul 20, 8:19 PM · Services (watching), VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar closed T171177: New instance in deployment prep can't run puppet for the first time, a subtask of T171174: a lot of beta cluster instances are not reachable over SSH, as Resolved.
Thu, Jul 20, 8:19 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a comment to T171173: puppet dependency loop on deployment-sca hosts.

I did remove profile::recommendation_api on deployment-sca01 earlier but was hitting another puppet issue. That got fixed meanwhile.

Thu, Jul 20, 8:15 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar added a comment to T171174: a lot of beta cluster instances are not reachable over SSH.

Announced on the QA list pointing back to this task

Thu, Jul 20, 4:41 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar triaged T171174: a lot of beta cluster instances are not reachable over SSH as High priority.
Thu, Jul 20, 4:35 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a comment to T171174: a lot of beta cluster instances are not reachable over SSH.

So the state as I understand it right now:

Thu, Jul 20, 4:35 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a project to T171174: a lot of beta cluster instances are not reachable over SSH: Wikimedia-Incident.

https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2Fbeta

Thu, Jul 20, 4:30 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added projects to T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded: Wikimedia-Incident, Release-Engineering-Team (Kanban).

https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2Fbeta

Thu, Jul 20, 4:30 PM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar edited projects for T171148: CI jobs are blocked because castor is unreachable, added: Wikimedia-Incident; removed Patch-For-Review.

https://wikitech.wikimedia.org/wiki/Incident_documentation/20170719-ldap#CI.2Fbeta

Thu, Jul 20, 4:29 PM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar added a comment to T171177: New instance in deployment prep can't run puppet for the first time.

Seems the initial puppet run refuses to process for whatever reason. The instance is not even in salt so we can't access it at all. Trying to ssh into it yields Password:

Thu, Jul 20, 3:20 PM · Services (watching), VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar updated the task description for T171177: New instance in deployment prep can't run puppet for the first time.
Thu, Jul 20, 3:18 PM · Services (watching), VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a comment to T171173: puppet dependency loop on deployment-sca hosts.

I have added profile::recommendation_api back on deployment-sca01.

Thu, Jul 20, 3:17 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar added a comment to T171173: puppet dependency loop on deployment-sca hosts.

On deployment-sca01 I have removed profile::recommendation_api; puppet then fails with:

Error: Failed to apply catalog:
  Could not find dependent Service[eventlogging/init]
  for File[/usr/local/lib/eventlogging/filters.py]
  at /etc/puppet/modules/eventlogging/manifests/plugin.pp:49
Thu, Jul 20, 3:10 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar added a comment to T171173: puppet dependency loop on deployment-sca hosts.

deployment-trending01.deployment-prep.eqiad.wmflabs has a similar issue:

(Exec[trendingedits config deploy] => Service::Node::Config::Scap3[trendingedits] => Scap::Target[trending-edits/deploy] => User[deploy-service] => Exec[trendingedits config deploy])
Thu, Jul 20, 3:00 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar updated the task description for T171174: a lot of beta cluster instances are not reachable over SSH.
Thu, Jul 20, 2:55 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar added a comment to T171148: CI jobs are blocked because castor is unreachable.

Beta cluster instances have the exact same issue. Filed as T171174

Thu, Jul 20, 2:54 PM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar created T171174: a lot of beta cluster instances are not reachable over SSH.
Thu, Jul 20, 2:53 PM · Services (watching), Wikimedia-Incident, VPS-Projects, Operations, Release-Engineering-Team (Kanban), Beta-Cluster-Infrastructure
hashar created T171173: puppet dependency loop on deployment-sca hosts.
Thu, Jul 20, 2:42 PM · User-Joe, Services (next), Release-Engineering-Team, Beta-Cluster-Infrastructure
hashar added a comment to T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.

I can confirm that resolved the issue completely. Thank you!

Thu, Jul 20, 1:57 PM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar added a comment to T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.

Seems the nova database is on m5-master.eqiad.wmnet db name nova.

Thu, Jul 20, 11:37 AM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar added a comment to T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.

The Nodepool launch errors https://grafana.wikimedia.org/dashboard/db/nodepool?panelId=12&fullscreen&orgId=1&from=now-20h&to=now

Thu, Jul 20, 11:25 AM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar added a comment to T158350: contintcloud project thinks it is using 206 fixed-ip quota errantly.

That is happening again after something got restarted yesterday. Filed as T171158

Thu, Jul 20, 11:16 AM · Patch-For-Review, Cloud-Services, Operations, Release-Engineering-Team
hashar updated subscribers of T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.

labnet1001.eqiad.wmnet has a lot of such errors in /var/log/nova/nova-network.log*

Thu, Jul 20, 11:15 AM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar updated the task description for T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.
Thu, Jul 20, 11:01 AM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar created T171158: contintcloud instance refuses to launch due to "Maximum number of fixed ips exceeded.
Thu, Jul 20, 11:01 AM · Release-Engineering-Team (Kanban), Wikimedia-Incident, Continuous-Integration-Infrastructure, Cloud-VPS
hashar created P5771 Puppet run for https://gerrit.wikimedia.org/r/#/c/365416/.
Thu, Jul 20, 9:32 AM
hashar added a comment to T152941: Make changing puppetmasters for Labs instances more easy.

I have updated the workaround using the one I originally wrote on T148929. The proposed one did not work for me on CI instances with a self puppet master.

Thu, Jul 20, 9:24 AM · Puppet, Cloud-Services
hashar updated the task description for T152941: Make changing puppetmasters for Labs instances more easy.
Thu, Jul 20, 9:23 AM · Puppet, Cloud-Services
hashar added a comment to T150502: Set up experimental Docker CI slave.

I have removed integration-slave-docker-1000 since puppet is completely broken on it.

Thu, Jul 20, 9:15 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Continuous-Integration-Infrastructure
hashar closed T171148: CI jobs are blocked because castor is unreachable as Resolved.
Thu, Jul 20, 9:13 AM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar added a comment to T171148: CI jobs are blocked because castor is unreachable.

I have manually repopulated the cache for operations/puppet.git by triggering https://integration.wikimedia.org/ci/job/operations-puppet-cache-update-jessie/

Thu, Jul 20, 9:13 AM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar claimed T171148: CI jobs are blocked because castor is unreachable.
Thu, Jul 20, 8:59 AM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar updated the task description for T152941: Make changing puppetmasters for Labs instances more easy.
Thu, Jul 20, 8:46 AM · Puppet, Cloud-Services
hashar closed T168511: Labs Jessie images come with puppet 3.7.2, should be 3.8.5 as Resolved.

I have booted a Jessie instance with the latest labs image and it comes with puppet 3.8.5:

apt-cache policy puppet
puppet:
  Installed: 3.8.5-2~bpo8+2
  Candidate: 3.8.5-2~bpo8+2
  Version table:
     4.8.2-5~bpo8+1 0
        100 http://mirrors.wikimedia.org/debian/ jessie-backports/main amd64 Packages
 *** 3.8.5-2~bpo8+2 0
       1001 http://apt.wikimedia.org/wikimedia/ jessie-wikimedia/backports amd64 Packages
        100 /var/lib/dpkg/status
     3.7.2-4+deb8u1 0
        500 http://security.debian.org/ jessie/updates/main amd64 Packages
     3.7.2-4 0
        500 http://httpredir.debian.org/debian/ jessie/main amd64 Packages
Thu, Jul 20, 8:44 AM · Release-Engineering-Team (Kanban), Cloud-VPS
hashar added a comment to T171148: CI jobs are blocked because castor is unreachable.

From the console log, puppet-agent on boot reports:

SSL_connect returned=1 errno=0 state=error: certificate verify failed: [self signed certificate in certificate chain for /CN=Puppet CA: virt1000.wikimedia.org]
Thu, Jul 20, 8:38 AM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure
hashar created T171148: CI jobs are blocked because castor is unreachable.
Thu, Jul 20, 7:54 AM · Wikimedia-Incident, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure

Wed, Jul 19

hashar added a comment to T150623: Upgrade CI emulator to API 25.

gdb wait-for-devices complained with:

Your emulator is out of date, please update by launching Android Studio:
 - Start Android Studio
 - Select menu "Tools > Android > SDK Manager"
 - Click "SDK Tools" tab
 - Check "Android SDK Tools" checkbox
 - Click "OK"
Wed, Jul 19, 3:24 PM · Release-Engineering-Team (Kanban), Jenkins, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Wikipedia-Android-App-Backlog
hashar added projects to T150623: Upgrade CI emulator to API 25: Continuous-Integration-Infrastructure, Jenkins, Release-Engineering-Team (Kanban).
Wed, Jul 19, 2:12 PM · Release-Engineering-Team (Kanban), Jenkins, Continuous-Integration-Infrastructure, Patch-For-Review, Technical-Debt, Wikipedia-Android-App-Backlog
hashar added a comment to T162235: enwiki file "Lock_icon_blue.gif" in sites CSS has to be switched to commons wiki.

I have updated the list. There are 66 entries.

Wed, Jul 19, 1:46 PM · Release-Engineering-Team (Watching / External), Wikimedia-General-or-Unknown
hashar updated the task description for T162235: enwiki file "Lock_icon_blue.gif" in sites CSS has to be switched to commons wiki.
Wed, Jul 19, 1:45 PM · Release-Engineering-Team (Watching / External), Wikimedia-General-or-Unknown