akosiaris (Alexandros Kosiaris)
Senior Operations Engineer

Projects (19)

User Details

User Since
Oct 3 2014, 8:40 AM (185 w, 1 d)
Availability
Available
IRC Nick
akosiaris
LDAP User
Alexandros Kosiaris
MediaWiki User
AKosiaris (WMF)

Recent Activity

Thu, Apr 19

akosiaris added a comment to T150532: Upgrade qemu on ganeti clusters to 2.7.

Or we could upgrade the Ganeti cluster to stretch? It provides qemu 2.8 out of the box.

Thu, Apr 19, 3:32 PM · Operations
akosiaris changed the status of T150532: Upgrade qemu on ganeti clusters to 2.7 from Stalled to Open.

With cache=none being set in all clusters for unrelated reasons, this is now unblocked. In the meantime jessie-backports has upgraded to 2.8. Fortunately the changelog[1] does not have any worrying items in it. The upgrade will require a round of VM reboots, but otherwise looks OK. I'll empty an eqiad ganeti host, upgrade it to 2.8 and move a few VMs to it for testing.

Thu, Apr 19, 12:21 PM · Operations
akosiaris closed T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O as Resolved.

All VMs have been migrated to using cache=none. I'll resolve this; hopefully we will not meet this issue again.

Thu, Apr 19, 12:14 PM · Operations
akosiaris updated the task description for T192531: puppetdb does not start up on reboot.
Thu, Apr 19, 10:07 AM · Patch-For-Review, Operations, Puppet
akosiaris updated the task description for T192531: puppetdb does not start up on reboot.
Thu, Apr 19, 9:49 AM · Patch-For-Review, Operations, Puppet
akosiaris triaged T192531: puppetdb does not start up on reboot as High priority.
Thu, Apr 19, 9:49 AM · Patch-For-Review, Operations, Puppet
akosiaris created T192531: puppetdb does not start up on reboot.
Thu, Apr 19, 9:49 AM · Patch-For-Review, Operations, Puppet
akosiaris changed the status of T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O from Stalled to Open.

1 month with no incident. I'll proceed with rebooting all ganeti VMs on row_C and then move on to codfw.

Thu, Apr 19, 7:16 AM · Operations

Wed, Apr 18

akosiaris changed the status of T164376: [Discuss] Split ORES scores in datacenters based on wiki from Declined to Resolved.

Since this was a [Discuss] task, resolved was conceptually correct.

Wed, Apr 18, 2:06 PM · User-Ladsgroup, Traffic, ORES, Scoring-platform-team, Operations, ChangeProp
akosiaris changed the status of T164376: [Discuss] Split ORES scores in datacenters based on wiki from Resolved to Declined.

Declined actually.

Wed, Apr 18, 2:00 PM · User-Ladsgroup, Traffic, ORES, Scoring-platform-team, Operations, ChangeProp
akosiaris closed T191767: Important critical Etherpad release – 1.6.4 as Resolved.
Wed, Apr 18, 7:52 AM · Wikimedia-Etherpad, Operations, Security
akosiaris changed the visibility for T191767: Important critical Etherpad release – 1.6.4.
Wed, Apr 18, 7:51 AM · Wikimedia-Etherpad, Operations, Security
akosiaris added a comment to T191767: Important critical Etherpad release – 1.6.4.

https://github.com/ether/etherpad-lite/commit/9daade0b95bbc5443637977652d3cd0dbc44e112 fixes this but it's not yet in a release. I've imported it locally and have been giving it some testing, but I'll hold off on the upgrade for a bit more time.

Wed, Apr 18, 7:51 AM · Wikimedia-Etherpad, Operations, Security

Tue, Apr 17

akosiaris committed rDEPLOYCHARTSb39e36b0a5c3: mathoid: Refresh deployment on config changes (authored by akosiaris).
mathoid: Refresh deployment on config changes
Tue, Apr 17, 2:09 PM

Mon, Apr 16

akosiaris moved T170150: Evaluate Grafana's LDAP group options and deprecate grafana-admin if possible from Up next to In progress on the monitoring board.
Mon, Apr 16, 3:24 PM · Patch-For-Review, monitoring, Operations
akosiaris committed rDEPLOYCHARTS0a239a802cd2: Update the helm charts repo index (authored by akosiaris).
Update the helm charts repo index
Mon, Apr 16, 10:41 AM
akosiaris committed rDEPLOYCHARTS9b87f7f5c9a9: Add a NOTES.txt template for mathoid (authored by akosiaris).
Add a NOTES.txt template for mathoid
Mon, Apr 16, 10:41 AM
akosiaris committed rDEPLOYCHARTSb243d2ce2ea4: Add a simple NOTES.txt template to scaffolding (authored by akosiaris).
Add a simple NOTES.txt template to scaffolding
Mon, Apr 16, 10:41 AM
akosiaris added a comment to T191767: Important critical Etherpad release – 1.6.4.

https://github.com/ether/etherpad-lite/commit/9daade0b95bbc5443637977652d3cd0dbc44e112 fixes this but it's not yet in a release. I've imported it locally and have been giving it some testing, but I'll hold off on the upgrade for a bit more time.

Mon, Apr 16, 10:38 AM · Wikimedia-Etherpad, Operations, Security

Fri, Apr 13

akosiaris committed rDEPLOYCHARTS587a7f022127: Update the helm charts repo index (authored by akosiaris).
Update the helm charts repo index
Fri, Apr 13, 4:21 PM
akosiaris committed rDEPLOYCHARTS114f5e32f022: Add a NOTES.txt template for mathoid (authored by akosiaris).
Add a NOTES.txt template for mathoid
Fri, Apr 13, 4:21 PM
akosiaris committed rDEPLOYCHARTS81c08c19a407: Add a simple NOTES.txt template to scaffolding (authored by akosiaris).
Add a simple NOTES.txt template to scaffolding
Fri, Apr 13, 4:21 PM
akosiaris committed rDEPLOYCHARTSb6625765099d: mathoid: Dump all namespace definitions from manifests (authored by akosiaris).
mathoid: Dump all namespace definitions from manifests
Fri, Apr 13, 4:21 PM
akosiaris committed rDEPLOYCHARTS84476d58a248: Remove all namespace directives (authored by akosiaris).
Remove all namespace directives
Fri, Apr 13, 4:21 PM
akosiaris added a comment to T189524: Add Reading Infrastructure engineers to contacts for mobileapps.

That worked! Thank you! I added a comment to the eqiad one. Not sure how to view or remove it but I made it not persistent so hopefully it should go away automatically after some time.

Fri, Apr 13, 3:16 PM · Patch-For-Review, Services (watching), monitoring, Operations
akosiaris updated the task description for T192102: deprecate and remove --autoload in uwsgi puppet class.
Fri, Apr 13, 12:24 PM · Patch-For-Review, Operations, Puppet
akosiaris added a comment to T189524: Add Reading Infrastructure engineers to contacts for mobileapps.

Same here. Tried with bearND but same result.

Fri, Apr 13, 10:23 AM · Patch-For-Review, Services (watching), monitoring, Operations
akosiaris added a comment to T191821: Host packaged helm charts at https://releases.wikimedia.org/charts.

Oh! That's great. Now I can do the following locally on my minikube instance

Fri, Apr 13, 9:26 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Release Pipeline
akosiaris added a comment to T191767: Important critical Etherpad release – 1.6.4.

I went ahead and pulled 1.6.4 and 1.6.5, but they suffer from https://github.com/ether/etherpad-lite/issues/3378, so given that we are not currently vulnerable I'll refrain from upgrading.

Fri, Apr 13, 8:49 AM · Wikimedia-Etherpad, Operations, Security

Thu, Apr 12

akosiaris added a comment to T191648: uwsgi::app sorts config keys, but the .ini file behavior depends on order.

Fixing this looks to be as easy as passing $service_settings => '--die-on-term' in openstack::puppet::master::encapi

Indeed, that seems to have fixed this particular bug. Should we be worried about other users of the uwsgi class breaking with the same version update that broke this one?

Thu, Apr 12, 10:06 AM · Patch-For-Review, Operations, Puppet

Wed, Apr 11

akosiaris added a comment to T191648: uwsgi::app sorts config keys, but the .ini file behavior depends on order.

I wonder if the specific ordering issue is the callable and plugins lines?

I thought that too, but the behavior is the same (and correct) if the plugins line is directly before or after the callable line.

Wed, Apr 11, 2:59 PM · Patch-For-Review, Operations, Puppet

Wed, Apr 4

akosiaris closed T186786: Upload new zuul and jenkins-debian-glue packages to apt.wikimedia.org, a subtask of T186381: Exception while launching job: TypeError: 'int' object has no attribute '__getitem__', as Resolved.
Wed, Apr 4, 8:27 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Zuul, Continuous-Integration-Infrastructure
akosiaris closed T186786: Upload new zuul and jenkins-debian-glue packages to apt.wikimedia.org as Resolved.

Done.

Wed, Apr 4, 8:27 AM · Release-Engineering-Team (Kanban), Packaging, Zuul, Continuous-Integration-Infrastructure, Operations
akosiaris closed T186786: Upload new zuul and jenkins-debian-glue packages to apt.wikimedia.org, a subtask of T186494: jenkins-debian-glue should run the lintian version from cowbuilder instead of from host, as Resolved.
Wed, Apr 4, 8:27 AM · Upstream, Release-Engineering-Team (Kanban), Continuous-Integration-Infrastructure, Packaging
akosiaris closed T186786: Upload new zuul and jenkins-debian-glue packages to apt.wikimedia.org, a subtask of T189859: Zuul coverage pipeline is no more processing mwext-phpunit-coverage-patch jobs, as Resolved.
Wed, Apr 4, 8:27 AM · Release-Engineering-Team (Kanban), Upstream, Zuul, Continuous-Integration-Infrastructure
akosiaris added a comment to T188933: install kubectl on integration agents.

I think @akosiaris told me once we should aim at not using kubectl

Wed, Apr 4, 7:59 AM · Release-Engineering-Team (Kanban), Release Pipeline
akosiaris closed T187910: Define a special range in constants.pp for the LVS hosts as Declined.

I am gonna close this as declined. Feel free to reopen though.

Wed, Apr 4, 7:53 AM · Operations
akosiaris added a comment to T187910: Define a special range in constants.pp for the LVS hosts.

For what it's worth, I don't like the idea of adding anything like that in network::constants. I don't even like the current $special_hosts construct (it has gotten out of hand) and I am the one who started it. ferm rules should not be defined the macro way, since it is not immediately clear how the macro is constructed, which makes it difficult to reason about. The macro is only populated on the hosts, using ERB; it's uppercase, and git grepping for it in our repo only reveals the uses, not the definition. Making the mental jump from that to network::constants is not something we should be forcing ourselves to do. Instead we should be using role-specific hiera lookups.

Wed, Apr 4, 7:53 AM · Operations

Mon, Apr 2

akosiaris triaged T191199: Page allocation stalls on scb1001, scb1002 as High priority.
Mon, Apr 2, 2:08 PM · SCB, Services (watching), Operations
akosiaris created T191199: Page allocation stalls on scb1001, scb1002.
Mon, Apr 2, 1:51 PM · SCB, Services (watching), Operations
akosiaris added a comment to T190238: Create "loading" schema for loading external data.

I don't even see a database ct on any of the maps100X, maps200X or maps-test200X clusters. I am thinking the database should be created before we even delve into this task? When is @Gehel coming back? I'd rather refer back to him on this one after all; I now feel I am missing crucial context.

pnorman@maps-test2004:~$ psql -h localhost -U tilerator -l
                                  List of databases
   Name    |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------+----------+----------+-------------+-------------+-----------------------
 ct        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 gis       | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 postgres  | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres
 template1 | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
           |          |          |             |             | postgres=CTc/postgres

I thought I had made it clear that maps-test2004 was the only one that needed this, but I see that it was just in the IRC messages, which didn't get a response over multiple days.

Mon, Apr 2, 11:50 AM · Maps-Sprint

Fri, Mar 30

akosiaris added a comment to T190238: Create "loading" schema for loading external data.

Maybe I can help, but I'll need a bit more information as to what the problem is. Which tool fails, with what invocation, and what is the error?

/srv/deployment/tilerator/deploy/node_modules/@kartotherian/meddo/get-external-data.py -c /srv/deployment/tilerator/deploy/node_modules/@kartotherian/meddo/external-data.yml

Note that PGHOST, PGUSER, etc will need to be set to run that.

It fails with the error psycopg2.ProgrammingError: permission denied for database ct
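
The invocation described in this comment can be sketched as follows. This is only an illustration: the script and config paths are copied from the comment, but the PGHOST and PGUSER values are assumptions (borrowed from the psql call quoted elsewhere in this feed), and `build_invocation` is a hypothetical helper, not part of meddo.

```python
import os

# Paths copied verbatim from the comment above.
SCRIPT = ("/srv/deployment/tilerator/deploy/node_modules/"
          "@kartotherian/meddo/get-external-data.py")
CONFIG = ("/srv/deployment/tilerator/deploy/node_modules/"
          "@kartotherian/meddo/external-data.yml")


def build_invocation(base_env=None):
    """Return (argv, env) for running get-external-data.py.

    The PG* values below are assumptions for illustration only;
    the real settings depend on the host and the database role.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update({
        "PGHOST": "localhost",  # assumed value
        "PGUSER": "tilerator",  # assumed value
    })
    argv = [SCRIPT, "-c", CONFIG]
    return argv, env


if __name__ == "__main__":
    argv, env = build_invocation()
    print(" ".join(argv))
    print({k: v for k, v in env.items() if k in ("PGHOST", "PGUSER")})
```

On a host where the script exists, the pair could be handed to `subprocess.run(argv, env=env, check=True)`; the "permission denied" error reported above would still occur until the database role is granted the needed rights.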

Fri, Mar 30, 5:54 PM · Maps-Sprint

Thu, Mar 29

akosiaris added a comment to T140539: Update translation-server, citoid.

"Wasted" is a little strong here. It provides a real and immediate current benefit for however long it's running. And, Zotero is a project that is used in many other contexts so worse comes to worse you're helping the open source community at large ;).

Thu, Mar 29, 12:10 PM · Services (watching), Patch-For-Review, Citoid, VisualEditor
akosiaris added a comment to T140539: Update translation-server, citoid.

@Lokal_Profil and I discussed how we can move forward with making new translators and our proposal is:

  1. Write translators for the latest version of Zotero, add them to Zotero's translator repo and sync Wikimedia's translator repo, as usual.
  2. Push patch to Wikimedia's translator repo with code that is compatible with the version of translation-server that Citoid is using.
  3. Future step: once Wikimedia's translation-server is updated, revert patches for older version.
Thu, Mar 29, 8:26 AM · Services (watching), Patch-For-Review, Citoid, VisualEditor

Wed, Mar 28

akosiaris added a comment to T140539: Update translation-server, citoid.

As an FYI, T187194 was filed in February in the context of https://www.mediawiki.org/wiki/Code_stewardship_reviews. At this point in time it remains unclear if and when an upgrade can/will happen.

Wed, Mar 28, 8:27 PM · Services (watching), Patch-For-Review, Citoid, VisualEditor
akosiaris added a comment to T190238: Create "loading" schema for loading external data.

Maybe I can help, but I'll need a bit more information as to what the problem is. Which tool fails, with what invocation, and what is the error?

Wed, Mar 28, 8:21 PM · Maps-Sprint

Tue, Mar 27

akosiaris closed T184923: Validate whether the (implemented) standardized application environment works as expected as Resolved.
  • Network policy has been validated
  • statsd_prometheus_exporter has been validated and prometheus is scraping each pod and collecting data
  • The logging approach has not been validated; it turns out we need to upgrade components for this to work and reevaluate the sidecar approach. However, logging works just fine for the mathoid service, with each pod sending logs directly to logstash in the gelf format as well as logging to stdout, making the logs accessible to kubectl logs.
Tue, Mar 27, 8:07 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, Operations
akosiaris closed T184923: Validate whether the (implemented) standardized application environment works as expected, a subtask of T184462: Serve one production service via Kubernetes, as Resolved.
Tue, Mar 27, 8:07 AM · Prod-Kubernetes, Kubernetes, Operations
akosiaris updated the task description for T184462: Serve one production service via Kubernetes.
Tue, Mar 27, 8:05 AM · Prod-Kubernetes, Kubernetes, Operations

Mon, Mar 26

akosiaris changed the status of T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O from Open to Stalled.

~2 weeks with no incident yet. That's very encouraging, but we've been in that position before, around the New Year holidays. Given that the Easter holidays are approaching I am reluctant to make any changes, so I think this should stay in the waiting state for another ~2 weeks.

Mon, Mar 26, 4:43 PM · Operations
akosiaris added a comment to T184924: Utilize the deployment pipeline (stretch).

https://gerrit.wikimedia.org/r/#/c/421935/ for allowing access to staging related clients

Mon, Mar 26, 4:20 PM · Patch-For-Review, Prod-Kubernetes, Kubernetes, Operations
akosiaris closed T161031: Fully document process for building a new version of Kubernetes debs as Resolved.

This is old enough and I've recently upgraded to 1.7.10 following the documentation at https://wikitech.wikimedia.org/wiki/Tools_Kubernetes#Building_debian_packages. The comment above was using the wrong version of the tar file by the kubernetes project, IIRC. I'll resolve this, feel free to reopen.

Mon, Mar 26, 4:18 PM · Toolforge, Prod-Kubernetes, Kubernetes, Cloud-Services, Tools-Kubernetes
akosiaris closed T161031: Fully document process for building a new version of Kubernetes debs, a subtask of T153943: Coordinate Kubernetes efforts between Tool Labs and Production, as Resolved.
Mon, Mar 26, 4:17 PM · Epic, Toolforge, Prod-Kubernetes, Tools-Kubernetes, Cloud-Services, Kubernetes
akosiaris closed T190551: Update Debian package of Blubber as Resolved.

Package built and uploaded to stretch-wikimedia and jessie-wikimedia. Resolving this, feel free to reopen.

Mon, Mar 26, 3:10 PM · Operations, Release Pipeline, Release-Engineering-Team (Watching / External)
akosiaris closed T190586: Package minikube for stretch as Resolved.

I've just added minikube to the thirdparty/ci component for stretch-wikimedia. I'll resolve this, feel free to reopen.

Mon, Mar 26, 2:27 PM · Release Pipeline, Continuous-Integration-Infrastructure
akosiaris closed T190586: Package minikube for stretch, a subtask of T190584: Update integration docker agents to stretch, as Resolved.
Mon, Mar 26, 2:27 PM · Release Pipeline, Continuous-Integration-Infrastructure
TheDJ awarded T184919: Serve at least 50% of Mathoid via kubernetes a 100 token.
Mon, Mar 26, 2:20 PM · User-mobrovac, Services (next), Mathoid, Patch-For-Review, Prod-Kubernetes, Operations, Kubernetes
akosiaris closed T190585: Package docker-ce for stretch as Resolved.

I've just added docker-ce under the thirdparty/ci component for stretch as well. Resolving, feel free to reopen.

Mon, Mar 26, 2:19 PM · Release Pipeline, Continuous-Integration-Infrastructure
akosiaris closed T190585: Package docker-ce for stretch, a subtask of T190584: Update integration docker agents to stretch, as Resolved.
Mon, Mar 26, 2:19 PM · Release Pipeline, Continuous-Integration-Infrastructure
akosiaris closed T184919: Serve at least 50% of Mathoid via kubernetes as Resolved.

This has been achieved successfully and even surpassed the goal by achieving 100%. I'll happily resolve this.

Mon, Mar 26, 2:16 PM · User-mobrovac, Services (next), Mathoid, Patch-For-Review, Prod-Kubernetes, Operations, Kubernetes
akosiaris closed T184919: Serve at least 50% of Mathoid via kubernetes, a subtask of T184462: Serve one production service via Kubernetes, as Resolved.
Mon, Mar 26, 2:16 PM · Prod-Kubernetes, Kubernetes, Operations
akosiaris committed rDEPLOYCHARTSc302a1585349: Update helm repository with version 0.0.2 of mathoid (authored by akosiaris).
Update helm repository with version 0.0.2 of mathoid
Mon, Mar 26, 2:12 PM
akosiaris added a comment to T190364: eqiad 10G ports needs.
  • Backup servers (heze/helium in the current incarnation) will definitely have 10G (we've already budgeted for it).
  • Ganeti hosts are not so clear. Per grafana eqiad [1] and grafana codfw [2] we still don't need 10G there. codfw's traffic is the actual representative one since the latest large spikes/plateaus in eqiad are probably due to me doing many very heavy IO tests for T181121. Since this is long term planning and T181121 has probably been resolved, we should wait a few weeks and see if that is true. Of course we can only do simple projections and can't really predict the future, so it's difficult to say for sure. My hunch is that for now we don't need 10G and we probably won't need 10G on ganeti hosts for another 1-2 years. After that, I don't know.
  • Kubernetes hosts have just gone into production, are handling very minimal traffic and the entire idea of that infrastructure is to scale out, not scale up, so even if we end up running kafka stream processing (or anything for that matter) in kubernetes, to me it seems that 10G will be a waste of money, so I agree on the "pretty sure we won't need 10G".
Mon, Mar 26, 11:12 AM · netops, Operations
akosiaris committed rDEPLOYCHARTS154a5fc4d9c7: Add network policy objects to the helm charts (authored by akosiaris).
Add network policy objects to the helm charts
Mon, Mar 26, 10:33 AM
akosiaris added a comment to T188753: WMCZ want to use its own mail system instead of OTRS queue wm-cz@wikimedia.org.

Well... when I was talking about an export I was thinking of something like an eml/mbox file containing the queue, like an inbox. As it seems it won't be easy, I would like to point back to the task's description, quoted below.

If easy enough: Export all existing tickets and send the archive to me so we can import them

But again, if this isn't "click at export button in the admin interface", let's make this resolved as the redirect itself is working (thank you @eross).

Mon, Mar 26, 9:57 AM · Office-IT, Mail, WMF-Legal, OTRS, User-Urbanecm

Sat, Mar 24

akosiaris added a comment to T190589: Packaged helm points to non-existent tiller.

I had a deeper look. The original build process uses versioning.mk to do the work of figuring out the version. We don't use that in the debian package we ship and our package has version v2.8+unreleased. I am not sure we need to go down the road of actually using versioning.mk during our build, or whether that actually adds value, given that we intend to use our own compatible tiller image.

Sat, Mar 24, 12:12 AM · Release Pipeline

Fri, Mar 23

akosiaris added a comment to T190589: Packaged helm points to non-existent tiller.

Just pass --tiller-image=docker-registry.discovery.wmnet/tiller:latest to helm init

Fri, Mar 23, 11:03 PM · Release Pipeline
akosiaris committed rDEPLOYCHARTS616f2829ae94: WIP Add network policy objects to the helm charts (authored by akosiaris).
WIP Add network policy objects to the helm charts
Fri, Mar 23, 9:18 AM
akosiaris committed rDEPLOYCHARTS7581008dd266: Annotate namespace with a default deny policy (authored by akosiaris).
Annotate namespace with a default deny policy
Fri, Mar 23, 9:18 AM
akosiaris committed rDEPLOYCHARTS3b7ac23de6e5: Give tiller the right to manage network policies (authored by akosiaris).
Give tiller the right to manage network policies
Fri, Mar 23, 9:18 AM

Mar 22 2018

akosiaris reopened Unknown Object (Task), a subtask of T186808: Non-redundant power supply on helium, as Open.
Mar 22 2018, 2:48 PM · Operations, ops-eqiad
akosiaris added a comment to T189801: setup backup1001.eqiad.wmnet.

Unfortunately wmf4750 will not do after all. After we powered off and unracked helium we figured out the raid card was too big for the space available in the R430. We need either a different server from the spares or a new server :-(

Mar 22 2018, 2:44 PM · Patch-For-Review, Operations, ops-eqiad
akosiaris closed T190110: Increase osm2pgsql import cache as Resolved.

This caught my eye and I merged it just now. With the number of nodes at 4401630061 40GB should be enough indeed to cache old node positions for a full planet import. As a side note, maps boxes have 128GB so we are well below the 75% mark osm devs suggest. maps-test boxes vary between 64GB and 92GB, but those are to be reclaimed and put out of service anyway.
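
The sizing argument in this comment can be checked with a few lines of arithmetic. This is a rough sketch under one stated assumption: that the node cache needs about 8 bytes per node. The node count, the 40GB cache, the 128GB of RAM and the 75% mark all come from the comment itself.

```python
NODE_COUNT = 4_401_630_061  # node count quoted in the comment
BYTES_PER_NODE = 8          # assumption: one 8-byte cache slot per node
CACHE_GB = 40               # cache size quoted in the comment
RAM_GB = 128                # RAM of the maps boxes
RAM_FRACTION = 0.75         # the 75% mark quoted in the comment


def node_cache_gb(nodes, bytes_per_node=BYTES_PER_NODE):
    """Approximate memory needed to cache every node position, in GB."""
    return nodes * bytes_per_node / 1e9


needed = node_cache_gb(NODE_COUNT)
ceiling = RAM_GB * RAM_FRACTION

print(f"node cache needed: {needed:.1f} GB")
print(f"configured cache:  {CACHE_GB} GB")
print(f"75% of RAM:        {ceiling:.0f} GB")
```

Under that assumption the full planet needs roughly 35 GB of node cache, which fits in the 40GB setting with some headroom and stays well below the 96 GB that 75% of 128GB would allow.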

Mar 22 2018, 12:44 PM · Patch-For-Review, Maps-Sprint
akosiaris added a comment to T184923: Validate whether the (implemented) standardized application environment works as expected.

The metrics part has been validated. In fact https://grafana.wikimedia.org/dashboard/db/service-mathoid?orgId=1 currently has graphs being generated using prometheus and showing the same data that is being generated directly from statsd

Mar 22 2018, 11:38 AM · Patch-For-Review, Prod-Kubernetes, Kubernetes, Operations
akosiaris committed rDEPLOYCHARTS0acf7ee98ed0: Switch mathoid to use the local statsd_prometheus (authored by akosiaris).
Switch mathoid to use the local statsd_prometheus
Mar 22 2018, 11:03 AM
akosiaris committed rDEPLOYCHARTSc5437cfa2659: Annotate pods for prometheus statsd scraping (authored by akosiaris).
Annotate pods for prometheus statsd scraping
Mar 22 2018, 11:03 AM
akosiaris committed rDEPLOYCHARTSd017a32459c9: Annotate pods for prometheus statsd scraping (authored by akosiaris).
Annotate pods for prometheus statsd scraping
Mar 22 2018, 10:59 AM

Mar 21 2018

akosiaris committed rDEPLOYCHARTS32d336642875: Expose K8S_NODE_IP to the fluent-bit sidecar (authored by akosiaris).
Expose K8S_NODE_IP to the fluent-bit sidecar
Mar 21 2018, 2:15 PM
akosiaris committed rDEPLOYCHARTS68e779ae11b3: Update mathoid chart to resemble current production (authored by akosiaris).
Update mathoid chart to resemble current production
Mar 21 2018, 2:15 PM
akosiaris closed T189781: Reboot oresrdb as Resolved.

Indeed. Here it is https://wikitech.wikimedia.org/wiki/Incident_documentation/20180314-ORES.

Mar 21 2018, 10:29 AM · Operations, ORES, Scoring-platform-team (Current)

Mar 20 2018

akosiaris closed T184457: Installation method for Minikube on CI for k8s testing as Resolved.

Aside from having to tag the release locally with v0.25.0 so that gbp could generate the source, and using buster to build this, everything else worked out fine. Being Go, it even worked on jessie, so I've uploaded it already to thirdparty/ci. I'll resolve this, feel free to reopen.

Mar 20 2018, 10:04 AM · Release-Engineering-Team (Kanban), Release Pipeline
akosiaris closed T184457: Installation method for Minikube on CI for k8s testing, a subtask of T183165: Verify functionality of the 'production' image in the context of an isolated k8s deployment, as Resolved.
Mar 20 2018, 10:04 AM · Release-Engineering-Team (Kanban), Release Pipeline
akosiaris closed T184220: Build service-checker image for use with helm test as Resolved.

This has been done. Resolving

Mar 20 2018, 9:44 AM · Patch-For-Review, Release-Engineering-Team (Kanban), Release Pipeline
akosiaris closed T184220: Build service-checker image for use with helm test, a subtask of T184219: Method for running e2e/smoke tests on deployments, as Resolved.
Mar 20 2018, 9:44 AM · Release-Engineering-Team (Kanban), Release Pipeline

Mar 19 2018

akosiaris committed rDEPLOYCHARTS2fe6843ab2c0: Update mathoid chart to resemble current production (authored by akosiaris).
Update mathoid chart to resemble current production
Mar 19 2018, 2:20 PM
akosiaris committed rDEPLOYCHARTS24117338d34f: Fix wrongly indented externalIPs field (authored by akosiaris).
Fix wrongly indented externalIPs field
Mar 19 2018, 2:20 PM
akosiaris added a comment to T189655: Switchover m1 master from db1016 to db1063.

Yes, that's fine.

Mar 19 2018, 12:17 PM · Patch-For-Review, Operations, DBA
akosiaris committed rDEPLOYCHARTSe25b2f9312ec: Update mathoid chart to resemble current production (authored by akosiaris).
Update mathoid chart to resemble current production
Mar 19 2018, 11:30 AM

Mar 16 2018

akosiaris added a comment to T189790: Reimage deployment-ores01 as Stretch.

I am guessing this was resolved and I am no longer needed.

Mar 16 2018, 9:10 AM · ORES, Scoring-platform-team

Mar 15 2018

akosiaris added a comment to T188446: [Blocked] Package word2vec binaries.

I see we have git-lfs on ores*.eqiad.wmnet, so we're almost ready to give this a try.

What I'm missing is the scoring/ores/assets repo (request submitted for its creation), and @akosiaris I would like to deploy this repo onto ores*, separately from the current ores deployment so that we can test LFS in isolation.

Mar 15 2018, 7:39 AM · Patch-For-Review, Packaging, Scoring-platform-team (Current)
akosiaris committed rDEPLOYCHARTS4a36ea843539: Pass --tiller-image to helm init (authored by akosiaris).
Pass --tiller-image to helm init
Mar 15 2018, 7:35 AM

Mar 13 2018

akosiaris committed rDEPLOYCHARTSfe1e7a078808: Create the basic structure of a helm chart repo (authored by akosiaris).
Create the basic structure of a helm chart repo
Mar 13 2018, 9:50 PM
akosiaris committed rDEPLOYCHARTS1a4a9c5cc6d4: Add apiVersion attribute to deploy ClusterRole (authored by akosiaris).
Add apiVersion attribute to deploy ClusterRole
Mar 13 2018, 3:40 PM
akosiaris committed rDEPLOYCHARTSb3a7f350f4d1: Add tiller/deploy RBAC clusterroles (authored by akosiaris).
Add tiller/deploy RBAC clusterroles
Mar 13 2018, 3:40 PM
akosiaris added a comment to T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O.

All row_A eqiad VMs have been rebooted with cache=none. We are now again in a waiting period.

Mar 13 2018, 1:37 PM · Operations
akosiaris committed rDEPLOYCHARTSea520c4e2151: Add initialize_namespace.sh (authored by akosiaris).
Add initialize_namespace.sh
Mar 13 2018, 11:08 AM
akosiaris added a comment to T180628: Install git-lfs client (at least on scap targets & masters).

@akosiaris: I think it's needed on masters, at least to enable deployers to issue git-lfs commands. I'm unsure if scap itself does git-lfs commands on the master, I believe it's only on targets.

Mar 13 2018, 10:07 AM · Patch-For-Review, Packaging, Operations, Scap

Mar 12 2018

akosiaris added a comment to T180628: Install git-lfs client (at least on scap targets & masters).

The scap targets that would benefit from this (namely ores* boxes) now have git-lfs installed. @mmodell do we also need this on the scap masters? I am not fully clear about the workflow that is going to be used here and where the git lfs related files are going to be fetched from.

Mar 12 2018, 4:42 PM · Patch-For-Review, Packaging, Operations, Scap
akosiaris added a comment to T188985: https://meta.wikimedia.org/wiki/Special:Contact/Stewards is being abused by spammers.

No complaints in 6 days, so I consider the problem resolved. I'll keep this open for a few more days so that any problems reported find their way into this, and then I'll resolve the task as well.

Mar 12 2018, 4:21 PM · Stewards-and-global-tools, Operations, OTRS
akosiaris added a comment to T181121: Kernels errors on ganeti1005- ganeti1008 under high I/O.

cache=none tests during the weekend showed no problems. I'll find a quiet point in time during the day and restart all VMs in the cluster with that setting set. Then we wait for a while.

Mar 12 2018, 8:44 AM · Operations