Joe (Giuseppe Lavagetto)
Spy

Projects (22)

User Details

User Since
Oct 3 2014, 5:57 AM (163 w, 6 d)
Availability
Available
LDAP User
Giuseppe Lavagetto
MediaWiki User
Unknown

Recent Activity

Wed, Nov 22

Joe removed a project from T181029: Upgrade dump hosts to a newer distribution: TechCom-RfC.
Wed, Nov 22, 9:26 PM · User-ArielGlenn, Dumps-Generation, MediaWiki-General-or-Unknown
Joe moved T181029: Upgrade dump hosts to a newer distribution from Inbox to Backlog (blocked or draft) on the TechCom-RfC board.
Wed, Nov 22, 9:26 PM · User-ArielGlenn, Dumps-Generation, MediaWiki-General-or-Unknown
Joe removed a project from T181029: Upgrade dump hosts to a newer distribution: RfC.
Wed, Nov 22, 9:25 PM · User-ArielGlenn, Dumps-Generation, MediaWiki-General-or-Unknown
Joe added a subtask for T172165: Require either PHP 7.0+ or HHVM in MW 1.31: T168470: Setup wikitech, horizon, and striker on new labweb hardware.
Wed, Nov 22, 5:36 PM · RfC, TechCom-RfC, MediaWiki-General-or-Unknown
Joe added a parent task for T168470: Setup wikitech, horizon, and striker on new labweb hardware: T172165: Require either PHP 7.0+ or HHVM in MW 1.31.
Wed, Nov 22, 5:36 PM · cloud-services-team (Kanban), Cloud-Services
Joe updated the task description for T180023: [DRAFT][RfC] Deployment of python applications in production.
Wed, Nov 22, 8:08 AM · User-Joe, Operations

Tue, Nov 21

Joe closed T178799: Revisit Pybal depool thresholds for app servers as Resolved.
Tue, Nov 21, 4:01 PM · Patch-For-Review, Operations
Joe created T181029: Upgrade dump hosts to a newer distribution.
Tue, Nov 21, 9:55 AM · User-ArielGlenn, Dumps-Generation, MediaWiki-General-or-Unknown
Joe claimed T181027: Create email alias for the TechCom.
Tue, Nov 21, 7:27 AM · TechCom, Operations
Joe created T181027: Create email alias for the TechCom.
Tue, Nov 21, 7:27 AM · TechCom, Operations

Thu, Nov 16

Joe added a comment to T178799: Revisit Pybal depool thresholds for app servers.

To summarize the historical reasons for those values:

Thu, Nov 16, 10:58 AM · Patch-For-Review, Operations
Joe added a comment to T180037: [Spike] Can the new render service run on Debian Stretch?.

@bmansurov heh, I thought about the building process, but of course locally you can just use a stretch container with the packages fetched from nodesource.com (this is only important if your dependencies use any sort of shared library, which is sometimes the case).

Thu, Nov 16, 10:50 AM · Readers-Web-Kanban-Board, Proton, Spike, Readers-Web-Backlog
Joe added a comment to T177276: Unify production and CI docker image build process.

Thanks @thcipriani for converting the use of ops/puppet already!

Thu, Nov 16, 10:45 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a project to T180671: puppet compiler fail compilation on manifests using puppetdb: Operations.
Thu, Nov 16, 10:14 AM · Operations, puppet-compiler
Joe added a comment to T180671: puppet compiler fail compilation on manifests using puppetdb.

This looks like a problem in the specific puppetdb instance you're hitting, not really a compiler problem. Adding the Operations tag as this is not a software bug.

Thu, Nov 16, 10:14 AM · Operations, puppet-compiler

Wed, Nov 15

Joe added a comment to T180524: Upgrade latest docker-registry.wikimedia.org/nodejs-devel to stretch.

@MoritzMuehlenhoff did you ever take a look at the deb packages that are officially distributed by node?

Wed, Nov 15, 6:58 PM · Release-Engineering-Team (Kanban), Operations, Release Pipeline
Joe added a comment to T180037: [Spike] Can the new render service run on Debian Stretch?.

Hi all, thank you for the great research work on this!

Wed, Nov 15, 6:55 PM · Readers-Web-Kanban-Board, Proton, Spike, Readers-Web-Backlog
Joe updated subscribers of T180524: Upgrade latest docker-registry.wikimedia.org/nodejs-devel to stretch.

@dduvall the reason this is happening right now is that stretch doesn't have a package for npm and I didn't get around to tackling that issue properly.

Wed, Nov 15, 8:47 AM · Release-Engineering-Team (Kanban), Operations, Release Pipeline
Joe added a comment to T180023: [DRAFT][RfC] Deployment of python applications in production.

Which deployment method to choose

I would also mention cases in which the upstream package or its dependencies are released quite often, for example web apps.

Wed, Nov 15, 7:05 AM · User-Joe, Operations
Joe added a comment to T180023: [DRAFT][RfC] Deployment of python applications in production.

I would argue that including the source of the software as a submodule should be optional. The specific use case I have in mind is the deployment of mapzen, where all sources are external and all modules are downloaded.

Wed, Nov 15, 6:51 AM · User-Joe, Operations

Tue, Nov 14

Joe added a comment to T180462: puppet-compiler issue with CloudVPS instances.

This issue happens because the VM you're using has an outdated version of the compiler. I just tagged version 0.3.5, you can use it by just setting puppet_compiler::version: 0.3.5 in your project's hiera.

Tue, Nov 14, 2:59 PM · Cloud-VPS, cloud-services-team
Joe added a comment to T180384: Turn off Trending Service.

An undeployment procedure would be:

Tue, Nov 14, 12:30 PM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
Joe added a comment to T180384: Turn off Trending Service.

In T180384#3758434, @Pchelolo wrote:
I went to hive to check the external traffic to the endpoint from web request logs. For a random day there were just 300 requests PER DAY to the endpoint. Most of the external requests are done with the node-fetch user-agent and only about 50 req/day with a browser. So there is some real traffic on the endpoint, but the numbers are really really low.

That seems low enough not to need further work than just removing the service from the scb cluster, and lvs.

Tue, Nov 14, 12:26 PM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
Joe added a comment to T180384: Turn off Trending Service.

Really the concept needs more testing for product viability. Unfortunately, we were unable to test in a non-production environment due to Kafka not being available outside of production.

Is it actually impossible to use that in Labs or is it just that whoever put it into production didn't properly mirror it in beta?

Tue, Nov 14, 11:55 AM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
Joe added a comment to T180384: Turn off Trending Service.

This whole event brings forward a larger question about microservices and their cost.

Tue, Nov 14, 10:49 AM · Operations, Services (designing), Reading-Infrastructure-Team-Backlog (Kanban), Trending-Service
Joe added a comment to T173129: Prove helm as a potential k8s deployment tool.

I have been playing with helm quite a bit in the last couple weeks, I think it is, in the end, the best tool for the job we want to accomplish.

Tue, Nov 14, 8:21 AM · User-Joe, Release-Engineering-Team (Next), Release Pipeline
Joe moved T147204: Update confd package from Doing to Blocking others on the User-Joe board.
Tue, Nov 14, 8:07 AM · User-Joe, Beta-Cluster-reproducible, Operations
Joe moved T180023: [DRAFT][RfC] Deployment of python applications in production from Backlog to Doing on the User-Joe board.
Tue, Nov 14, 8:07 AM · User-Joe, Operations
Joe moved T177276: Unify production and CI docker image build process from Doing to Blocked on others on the User-Joe board.
Tue, Nov 14, 8:07 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe closed T162013: etcd cluster in codfw has raft consensus issues as Resolved.
Tue, Nov 14, 8:06 AM · Patch-For-Review, User-Joe, Operations
Joe added a comment to T162013: etcd cluster in codfw has raft consensus issues.

Since we had no more alarms in real-world situations, I think we can safely close this ticket now.

Tue, Nov 14, 8:06 AM · Patch-For-Review, User-Joe, Operations

Fri, Nov 10

Joe added projects to T179786: Update trending-edits' node-rdkafka to v1.x: Operations, User-Joe.
Fri, Nov 10, 11:59 AM · Patch-For-Review, User-Joe, Operations, Wikimedia-Incident, Reading-Infrastructure-Team-Backlog (Kanban), User-Jdlrobson, Trending-Service, Services (watching)
Joe added a comment to T179786: Update trending-edits' node-rdkafka to v1.x.

I would be tempted to raise the priority of this task to UBN since this has cost debug sessions twice in a week to multiple people. I'm also adding operations to the task as we're clearly impacted.

Fri, Nov 10, 11:59 AM · Patch-For-Review, User-Joe, Operations, Wikimedia-Incident, Reading-Infrastructure-Team-Backlog (Kanban), User-Jdlrobson, Trending-Service, Services (watching)

Thu, Nov 9

Joe added a comment to T179099: puppetmaster hostcert and hostprivkey point to nonexistent files.

In all this, some random person revoked puppetmaster1001's own certificate, which is used to access the ca_server, as far as I understand, which cannot be good.

Thu, Nov 9, 11:40 AM · Patch-For-Review, Puppet, User-Joe, Operations
Joe added a comment to T179099: puppetmaster hostcert and hostprivkey point to nonexistent files.

Mystery solved: in the method @ssh_host.certificate calls, that is Puppet::SSL::Host.certificate, we have

Thu, Nov 9, 11:25 AM · Patch-For-Review, Puppet, User-Joe, Operations
Joe added a comment to T179099: puppetmaster hostcert and hostprivkey point to nonexistent files.

I was able to extract a semi-meaningful backtrace from rhodium:

Thu, Nov 9, 8:53 AM · Patch-For-Review, Puppet, User-Joe, Operations

Wed, Nov 8

Joe updated the task description for T180023: [DRAFT][RfC] Deployment of python applications in production.
Wed, Nov 8, 12:06 PM · User-Joe, Operations
Joe created T180023: [DRAFT][RfC] Deployment of python applications in production.
Wed, Nov 8, 12:05 PM · User-Joe, Operations

Fri, Nov 3

Joe added a comment to T178570: How should we get Chromium for use in puppeteer?.

Please note that my biggest concern here is the security one. Citing myself:

Fri, Nov 3, 3:07 PM · Spike, Release-Engineering-Team (Watching / External), Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
Joe added a comment to T178570: How should we get Chromium for use in puppeteer?.

@phuedx no, getting the headers while you are downloading would not be enough, you would need to supply your script with a non-tamperable checksum (so I'd say at least sha256) of the file, at the very least.

Fri, Nov 3, 2:54 PM · Spike, Release-Engineering-Team (Watching / External), Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
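
The checksum requirement raised in the T178570 comment above can be illustrated with a minimal sketch. It assumes the expected SHA-256 digest is supplied out of band (for example, pinned in the deployment repository); the function names `sha256_of` and `verify_download` are hypothetical and not part of puppeteer or any existing tooling.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_sha256):
    """Return True only if the file matches the pinned digest.

    expected_sha256 is assumed to come from a trusted source that is
    separate from the download itself (hypothetical setup).
    """
    actual = sha256_of(path)
    return hmac.compare_digest(actual, expected_sha256.lower())
```

A caller would refuse to unpack the archive whenever `verify_download` returns False. Response headers from the same download cannot play this role, since they travel over the same channel as the file.
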
Joe added a comment to T179395: Cluster puppet variable and ganglia decommission.

Also, don't forget our code needs to work within labs.

Fri, Nov 3, 8:01 AM · Patch-For-Review, monitoring, Operations
Joe added a comment to T179395: Cluster puppet variable and ganglia decommission.

All you write would be great, if only

Given that we're going to have a single Puppet role per host

Fri, Nov 3, 7:59 AM · Patch-For-Review, monitoring, Operations

Thu, Nov 2

Joe moved T177387: Decomission mw1161-69 from Doing to Blocked on others on the User-Joe board.
Thu, Nov 2, 11:43 AM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad
Joe triaged T179562: Create jenkins job for creating deployment artifacts for `docker-pkg-deploy` as High priority.
Thu, Nov 2, 8:53 AM · Release-Engineering-Team, User-Joe, Operations
Joe created T179562: Create jenkins job for creating deployment artifacts for `docker-pkg-deploy`.
Thu, Nov 2, 8:53 AM · Release-Engineering-Team, User-Joe, Operations

Thu, Oct 26

Joe closed T179023: Puppet: Use of 'import' has been discontinued in favor of a manifest directory. as Invalid.
Thu, Oct 26, 2:27 PM · Puppet, User-Joe, Operations
Joe closed T179023: Puppet: Use of 'import' has been discontinued in favor of a manifest directory., a subtask of T177254: Upgrade to puppet 4 (4.8 or newer), as Invalid.
Thu, Oct 26, 2:27 PM · Patch-For-Review, cloud-services-team (FY2017-18), Puppet, User-Joe, Operations
Joe added a comment to T179023: Puppet: Use of 'import' has been discontinued in favor of a manifest directory..

Everything that moves to puppet 4 is "environment future" now until WMCS moves at least to the future parser.

Thu, Oct 26, 2:27 PM · Puppet, User-Joe, Operations
Joe moved T179033: Puppet: Error: Evaluation Error: Error while evaluating a Function Call, undefined local variable or method `known_resource_types' from Backlog to Doing on the User-Joe board.
Thu, Oct 26, 1:30 PM · Puppet, User-Joe, Operations
Joe claimed T179033: Puppet: Error: Evaluation Error: Error while evaluating a Function Call, undefined local variable or method `known_resource_types'.
Thu, Oct 26, 11:09 AM · Puppet, User-Joe, Operations
Joe added a comment to T179033: Puppet: Error: Evaluation Error: Error while evaluating a Function Call, undefined local variable or method `known_resource_types'.

I feared this would happen.

Thu, Oct 26, 11:09 AM · Puppet, User-Joe, Operations
Joe added a comment to T179019: deployment-prep statsd hiera does not have port.

yeah we just need to add the port, it's obviously an error.

Thu, Oct 26, 11:03 AM · Patch-For-Review, Services (watching), MediaWiki-JobQueue

Wed, Oct 25

Joe added a comment to T177387: Decomission mw1161-69.

I did all the steps in decom up to the uninterruptible tasks. @Cmjohnson the servers are yours to fully decom.

Wed, Oct 25, 8:08 AM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad
Joe updated the task description for T177387: Decomission mw1161-69.
Wed, Oct 25, 8:07 AM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad
Joe moved T177387: Decomission mw1161-69 from Backlog to Doing on the User-Joe board.
Wed, Oct 25, 8:06 AM · Patch-For-Review, User-Elukey, User-Joe, Operations, ops-eqiad

Oct 24 2017

Joe closed T172498: Switch databases to the future parser as Resolved.
Oct 24 2017, 6:33 AM · DBA, Puppet, Operations
Joe closed T172498: Switch databases to the future parser, a subtask of T171704: Switch all hosts to the future parser, as Resolved.
Oct 24 2017, 6:33 AM · Patch-For-Review, User-Joe, Puppet, Operations
Joe added a comment to T172498: Switch databases to the future parser.

Since we switched all of production to the future parser almost 2 months ago, we clearly fixed these issues as part of the more general ticket about the future parser.

Oct 24 2017, 6:33 AM · DBA, Puppet, Operations

Oct 23 2017

Joe added a project to T178810: Wikibase: Increase batch size for HTMLCacheUpdateJobs triggered by repo changes.: User-Joe.
Oct 23 2017, 4:44 PM · User-Joe, Operations, JobRunner-Service, MediaWiki-extensions-WikibaseClient, Wikidata
Joe added a comment to T175297: Define new Jenkins pipeline for container build phase.

As a suggestion: I would host your own registry under the CI project in labs for testing/managing local builds you might need to retrieve.

Oct 23 2017, 5:25 AM · Release-Engineering-Team (Kanban), Patch-For-Review, Release Pipeline

Oct 20 2017

Joe added a comment to T178570: How should we get Chromium for use in puppeteer?.

OTOH there's nothing to stop us from launching a Chromium process ourselves and using command line switches to make it save the page as a PDF: https://peter.sh/experiments/chromium-command-line-switches/#print-to-pdf (this list is linked to from https://www.chromium.org/developers/how-tos/run-chromium-with-flags).

Before we go any further investigating how we can best support using the puppeteer library, we should first revalidate whether we should use it in light of all of this recent (both productive and enlightening!) discussion.

If that's feature-equivalent to puppeteer, that sounds like the best solution to me. Also in terms of updates, that's far more light-weight; Debian would release the updates and all we'd need to do after new releases is to ensure that no regressions (or intentional changes in the headless mode) happened.

Oct 20 2017, 2:15 PM · Spike, Release-Engineering-Team (Watching / External), Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
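
The idea quoted in the T178570 comment above, launching Chromium directly with command-line switches instead of going through puppeteer, could look roughly like the sketch below. The `--headless` and `--print-to-pdf` switches are the ones listed on the linked page; the binary path, the extra `--disable-gpu` switch, the timeout, and the helper names are assumptions made for illustration.

```python
import subprocess


def build_print_to_pdf_command(chromium_path, url, output_pdf):
    """Build the argv list for a headless Chromium PDF render."""
    return [
        chromium_path,                    # e.g. the distribution's binary (assumed path)
        "--headless",                     # run without a display
        "--disable-gpu",                  # assumption: commonly paired with headless mode
        f"--print-to-pdf={output_pdf}",   # write the rendered page to this file
        url,
    ]


def render_pdf(chromium_path, url, output_pdf, timeout=60):
    """Run Chromium once and raise if it fails or exceeds the timeout."""
    cmd = build_print_to_pdf_command(chromium_path, url, output_pdf)
    subprocess.run(cmd, check=True, timeout=timeout)
```

Whether this covers the same features as puppeteer is exactly the open question in the thread; the sketch only shows the mechanics of the call.
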
Joe claimed T178606: Auth fails for `docker-pusher` script on `contint1001`.
Oct 20 2017, 6:27 AM · Patch-For-Review, Release Pipeline
Joe claimed T177055: Update docker image docker-registry.wikimedia.org/wikimedia-jessie.
Oct 20 2017, 6:14 AM · Release Pipeline, Continuous-Integration-Infrastructure (shipyard), Operations
Joe added a comment to T178606: Auth fails for `docker-pusher` script on `contint1001`.

The reason why you're not currently able to upload to the registry is that it whitelists the clients that can upload images. I will need to add the contint machines, and maybe even create separate credentials for different namespaces. For now, I'll work with @hashar to fix this.

Oct 20 2017, 6:09 AM · Patch-For-Review, Release Pipeline
Joe updated subscribers of T178570: How should we get Chromium for use in puppeteer?.

I think there are a few things at play here:

  • How do we distribute chromium to the servers in the cluster efficiently?
  • How do we ensure security upgrades happen in a timely manner for this component? The team maintaining it will need to set up a process for this (following puppeteer releases, and upgrade in a timely manner)
  • How do we download chromium in the first place in a verifiable way? I've checked puppeteer and it doesn't do any form of checksum of the downloaded zip file or anything, and I can't find checksums of the zip files on the chromium releases website
Oct 20 2017, 6:02 AM · Spike, Release-Engineering-Team (Watching / External), Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
Joe added a project to T178570: How should we get Chromium for use in puppeteer?: Operations.
Oct 20 2017, 5:53 AM · Spike, Release-Engineering-Team (Watching / External), Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Readers-Web-Backlog, Proton, Electron-PDFs
Joe added a comment to T178189: [spike] Temporarily allow pushing large objects.

I'm not sure why we need another task or to tag RelEng since I've been following along since the beginning.

Oct 20 2017, 5:52 AM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit

Oct 19 2017

Joe added a comment to T178189: [spike] Temporarily allow pushing large objects.

@Paladox actually I'd open a new ticket describing the problem instead of requesting a change of configuration to gerrit.

Oct 19 2017, 6:43 AM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit
Joe added a comment to T177276: Unify production and CI docker image build process.

1 more thing to throw into the mix.

Right now we have a mediawiki-phan image, and I want to be able to create multiple versions of this image for multiple versions of phan (phan 0.8, 0.9) etc.

Is there a way that we can also make this work?
In my head these should all be the same image just with different labels at least.

Oct 19 2017, 6:41 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a comment to T178189: [spike] Temporarily allow pushing large objects.

I took a look at how puppeteer downloads chromium and it's underwhelming to be honest: they plainly download a zip file and unpack it. As far as I can see, no verification of the downloaded package is performed; what is worse, I cannot see any page on the chromium downloads website reporting the checksums for the archives we'd need to download.

Oct 19 2017, 6:36 AM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit

Oct 18 2017

Joe added a comment to T178189: [spike] Temporarily allow pushing large objects.

Puppeteer's documentation warns against using versions of Chromium that don't come with puppeteer:

NOTE Puppeteer works best with the version of Chromium it is bundled with. There is no guarantee it will work with any other version. Use executablePath option with extreme caution. If Google Chrome (rather than Chromium) is preferred, a Chrome Canary or Dev Channel build is suggested.

https://github.com/GoogleChrome/puppeteer/blob/v0.11.0/docs/api.md#puppeteerlaunchoptions

I wonder whether this is a good reason not to use the Debian version of Chromium.

Also, the latest Debian Jessie has Chromium version 57.0.2987.98-1~deb8u1, and headless Chromium first appeared in version 59. Does that mean we should compile our own version of Chromium? Wouldn't it defeat the purpose of getting free security fixes from the Debian package maintainers?

Also, I created a proof of concept patch that uses the distribution's Chromium, except the patch doesn't work and puppeteer warns against using non-bundled Chromium: https://gerrit.wikimedia.org/r/385044.

Given the above, would it make sense to stick to the version of Chromium provided by puppeteer?

Oct 18 2017, 8:56 PM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit
Joe updated subscribers of T178189: [spike] Temporarily allow pushing large objects.

I would still advise distributing such a large binary (and the corresponding libraries) as a deb package, or as an archive somehow. We do have an artifacts repository in archiva, but I fear that only works for jars.

Oct 18 2017, 8:53 PM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit
Joe added a comment to T178189: [spike] Temporarily allow pushing large objects.

Hi! I'm not sure I understand the details or the requirements, in fact last time I looked at your project, you were planning on working with python, while I see puppeteer is javascript, so I can assume any information I have about the project is now outdated.

Oct 18 2017, 6:20 AM · Spike, Operations, Unplanned-Sprint-Work, Readers-Web-Kanban-Board, Patch-For-Review, Readers-Web-Backlog, Gerrit

Oct 17 2017

Joe added a comment to T177276: Unify production and CI docker image build process.

I have a proposal: what about controlling semantic versioning via the changelog but allowing people to specify a --nightly CLI switch to inject the date in the version number?

Oct 17 2017, 6:58 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
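
The proposal in the T177276 comment above (version taken from the changelog, with an optional `--nightly` switch that injects the date) could be sketched as follows. The version format with a `-YYYYMMDD` suffix and the function names are assumptions; the comment does not specify how the date would be embedded.

```python
import argparse
import datetime


def build_version(changelog_version, nightly=False, today=None):
    """Return the image version, appending the date for nightly builds."""
    if not nightly:
        return changelog_version
    today = today or datetime.date.today()
    # Assumed format: <changelog version>-<YYYYMMDD>
    return f"{changelog_version}-{today:%Y%m%d}"


def parse_args(argv=None):
    """Parse the hypothetical --nightly CLI switch."""
    parser = argparse.ArgumentParser(description="Image build sketch")
    parser.add_argument("--nightly", action="store_true",
                        help="inject the build date into the version")
    return parser.parse_args(argv)
```

With this shape, `build_version("0.1.0", parse_args(["--nightly"]).nightly)` yields a dated version, while a plain build keeps the changelog version untouched.
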

Oct 13 2017

Joe added a comment to T177276: Unify production and CI docker image build process.

With a quick skim at the CI repo, the following things done there are not supported by the current build system:

Oct 13 2017, 9:26 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a comment to T177276: Unify production and CI docker image build process.

Status update: I extracted the build script from operations/docker-images/production-images and it is able to build the docker containers in that directory. A first public commit will be ready once I'm done writing tests/documentation.

Oct 13 2017, 9:04 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe claimed T177276: Unify production and CI docker image build process.
Oct 13 2017, 8:58 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a comment to T176370: Migrate to PHP 7 in WMF production.

Probably. This task would block that; we can't change MW core until WMF production is migrated and stable on PHP7 instead of HHVM

MW 1.31 is going out in June 2018. All the potential timelines I've seen so far did not plan to finish the wmf-prod migration this fiscal year (that is, by end of June 2018). If this task is really a blocker for bumping the mw version requirement, either the migration has to happen faster, mw1.31 has to be delayed or we'll have to support PHP5 until 2021. In that sense, I agree with what Tgr wrote on the other task: This task should not be a blocker for T172165.

Oct 13 2017, 6:26 AM · User-ArielGlenn, NewPHP, HHVM, TechCom-RfC, MediaWiki-Platform-Team, Operations

Oct 11 2017

Joe updated the task description for T177958: Decommission ocg1001-3.
Oct 11 2017, 3:51 PM · ops-eqiad, hardware-requests, Operations
Joe placed T177958: Decommission ocg1001-3 up for grabs.
Oct 11 2017, 3:51 PM · ops-eqiad, hardware-requests, Operations
Joe updated the task description for T177931: Decommission OCG from production.
Oct 11 2017, 3:51 PM · Patch-For-Review, Services (watching), OCG-General, Operations
Joe created T177958: Decommission ocg1001-3.
Oct 11 2017, 3:48 PM · ops-eqiad, hardware-requests, Operations
Joe updated the task description for T177931: Decommission OCG from production.
Oct 11 2017, 3:22 PM · Patch-For-Review, Services (watching), OCG-General, Operations
Joe added a comment to T177276: Unify production and CI docker image build process.
  • There is no need for cache busters as we ignore cache at image build time. That is actually the only way around the broken cache model docker employs.

I was thinking more along the lines of "we have this image, are there new commits to the git repo that was built/included in it?" I was thinking of something similar to the proposed debmonitor but for included git repos.

Also for actual builds that will be used I agree with disabling caching, but for local testing we still need the option to use caching otherwise builds could be pretty slow.

Oct 11 2017, 7:58 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)

Oct 10 2017

Joe moved T177276: Unify production and CI docker image build process from Backlog to Doing on the User-Joe board.
Oct 10 2017, 3:21 PM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a comment to T177276: Unify production and CI docker image build process.

Some requirements of this build process:

Oct 10 2017, 7:50 AM · Patch-For-Review, User-Joe, Operations, Continuous-Integration-Infrastructure (shipyard)
Joe added a comment to T177815: Alerts on LVS services with one single realserver.

I would suggest we need to add a condition to the alert so that it gets skipped when the pool size is one backend only.

Oct 10 2017, 6:58 AM · Patch-For-Review, Operations, Pybal, Traffic
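
The condition suggested in the T177815 comment above, skipping the alert when the pool has a single backend, can be sketched as a guard around whatever check the alert performs. The function name, the return values, and the shape of the `check` callable are hypothetical; the actual alert logic is not described in the comment.

```python
def check_pool(backends, check):
    """Evaluate a pool-level check, skipping pools with one backend.

    backends: list of realserver names in the pool.
    check: callable taking the list and returning True when healthy
           (stands in for the real alert logic, which is assumed).
    """
    if len(backends) <= 1:
        # A single-backend pool cannot satisfy redundancy-style checks,
        # so the alert is skipped rather than raised.
        return "skipped"
    return "ok" if check(backends) else "alert"
```

For example, `check_pool(["host1"], lambda b: False)` returns `"skipped"` even though the inner check would otherwise fail.
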

Oct 9 2017

Joe updated the task description for T177254: Upgrade to puppet 4 (4.8 or newer).
Oct 9 2017, 10:07 AM · Patch-For-Review, cloud-services-team (FY2017-18), Puppet, User-Joe, Operations
Joe updated the task description for T177254: Upgrade to puppet 4 (4.8 or newer).
Oct 9 2017, 9:47 AM · Patch-For-Review, cloud-services-team (FY2017-18), Puppet, User-Joe, Operations
Joe added projects to T177254: Upgrade to puppet 4 (4.8 or newer): User-Joe, Puppet.
Oct 9 2017, 9:44 AM · Patch-For-Review, cloud-services-team (FY2017-18), Puppet, User-Joe, Operations
Joe closed T172498: Switch databases to the future parser, a subtask of T171704: Switch all hosts to the future parser, as Resolved.
Oct 9 2017, 9:43 AM · Patch-For-Review, User-Joe, Puppet, Operations
Joe closed T172498: Switch databases to the future parser as Resolved.
Oct 9 2017, 9:43 AM · DBA, Puppet, Operations

Oct 5 2017

Joe added a subtask for T177397: Create scaffolding of services templates for deployment in production/staging: T173129: Prove helm as a potential k8s deployment tool.
Oct 5 2017, 8:54 AM · Patch-For-Review, Prod-Kubernetes, User-Joe, Operations, Kubernetes
Joe added a parent task for T173129: Prove helm as a potential k8s deployment tool: T177397: Create scaffolding of services templates for deployment in production/staging.
Oct 5 2017, 8:54 AM · User-Joe, Release-Engineering-Team (Next), Release Pipeline
Joe added projects to T177397: Create scaffolding of services templates for deployment in production/staging: User-Joe, Prod-Kubernetes.
Oct 5 2017, 8:53 AM · Patch-For-Review, Prod-Kubernetes, User-Joe, Operations, Kubernetes
Joe added projects to T177396: Design pod-level monitoring and service-level alerting: User-Joe, Prod-Kubernetes.
Oct 5 2017, 8:52 AM · Prod-Kubernetes, User-Joe, Kubernetes, Operations
Joe added projects to T177395: Improve monitoring of the Kubernetes clusters: User-Joe, Prod-Kubernetes.
Oct 5 2017, 8:49 AM · Patch-For-Review, monitoring, User-Joe, Prod-Kubernetes, User-fgiunchedi, Operations, Kubernetes
Joe added projects to T177394: Experiment with a TLS proxy/router for pods: Prod-Kubernetes, User-Joe.
Oct 5 2017, 8:48 AM · User-Joe, Prod-Kubernetes, Operations, Kubernetes
Joe added a project to T177393: Implement authentication/authorization in Kubernetes clusters: Prod-Kubernetes.
Oct 5 2017, 8:48 AM · Patch-For-Review, Prod-Kubernetes, Operations, Kubernetes
Joe added a comment to T173415: puppet-compiler should display newly introduced resources entirely.

I agree that would be preferable; I just reproduced what other catalog differs do, which is admittedly a bit lame.

Oct 5 2017, 8:46 AM · User-Joe, puppet-compiler