akosiaris (Alexandros Kosiaris)
Senior Site Reliability Engineer

User Details

User Since
Oct 3 2014, 8:40 AM (211 w, 4 d)
Availability
Available
IRC Nick
akosiaris
LDAP User
Alexandros Kosiaris
MediaWiki User
AKosiaris (WMF) [ Global Accounts ]

Recent Activity

Yesterday

akosiaris awarded T152012: Silence or address E_WOULDBLOCK warning a Yellow Medal token.
Mon, Oct 22, 9:38 PM · Patch-For-Review, User-Ladsgroup, Scoring-platform-team (Current), ORES
akosiaris triaged T207693: Evaluate (and potentially implement) upgrade of docker-engine to docker-ce 17+ for production (kubernetes) as Normal priority.
Mon, Oct 22, 8:02 PM · Kubernetes, Operations
akosiaris renamed T207693: Evaluate (and potentially implement) upgrade of docker-engine to docker-ce 17+ for production (kubernetes) from Evaluate (and potentially implement) upgrade of docker-engine to docker-engine 17+ to Evaluate (and potentially implement) upgrade of docker-engine to docker-ce 17+ for production (kubernetes).
Mon, Oct 22, 8:02 PM · Kubernetes, Operations
akosiaris created T207693: Evaluate (and potentially implement) upgrade of docker-engine to docker-ce 17+ for production (kubernetes).
Mon, Oct 22, 7:31 PM · Kubernetes, Operations
akosiaris closed T153416: docker-engine pulled into our repositories only keeps the latest version as Resolved.

Per T158583, our reprepro now supports multiple components. docker-engine is now moved to thirdparty/k8s for production. This allows having multiple versions of docker-engine around (the package is now known as docker-ce, as pointed out in T153416#3076211). I think this partially solves the issues discussed above, so I am resolving this.

Mon, Oct 22, 7:12 PM · Kubernetes, Operations, Cloud-Services

Thu, Oct 18

akosiaris added a comment to T207263: Scap not restarting Proton.

Oh dammit, I've never killed that commit. Sorry about that.

Thu, Oct 18, 9:04 PM · Proton, Core Platform Team Backlog (Watching / External), Services (watching), Scap

Tue, Oct 16

akosiaris updated the task description for T207200: Revisit the logging work done on Q1 2017-2018 for the standard pod setup.
Tue, Oct 16, 6:32 PM · Core Platform Team Backlog (Watching / External), Services (watching), Release Pipeline, Operations, Release-Engineering-Team
akosiaris triaged T207200: Revisit the logging work done on Q1 2017-2018 for the standard pod setup as Normal priority.
Tue, Oct 16, 6:27 PM · Core Platform Team Backlog (Watching / External), Services (watching), Release Pipeline, Operations, Release-Engineering-Team
akosiaris closed T207091: Parsoid no longer active-active as Resolved.

This is now fixed per https://grafana.wikimedia.org/dashboard/db/prometheus-cluster-breakdown?from=now-15m&to=now&cluster=parsoid&orgId=1&var-datasource=codfw%20prometheus%2Fops&var-cluster=parsoid&var-instance=All&panelId=87&fullscreen as we are back to where we were before the switchover.

Tue, Oct 16, 4:38 PM · Datacenter-Switchover-2018, Operations, Parsoid
akosiaris closed T206766: Update Debian package of Blubber (0.6.0-1) as Resolved.

Package built and uploaded to both jessie-wikimedia and stretch-wikimedia

Tue, Oct 16, 9:54 AM · Release Pipeline (Blubber), Operations, Release-Engineering-Team (Watching / External)
akosiaris added a comment to T207091: Parsoid no longer active-active.

Yes, this has indeed happened and it's true for all services. We meant to return to the normal state on Monday (yesterday), but we didn't, for reasons unrelated to this. We will be doing so today.

Tue, Oct 16, 6:25 AM · Datacenter-Switchover-2018, Operations, Parsoid

Mon, Oct 15

akosiaris awarded T206841: Evaluate the consequences of the parsercache being empty post-switchover a Love token.
Mon, Oct 15, 3:35 PM · User-Joe, Datacenter-Switchover-2018, DBA, Operations
akosiaris added a comment to T206841: Evaluate the consequences of the parsercache being empty post-switchover.

All this seems pretty correct to me and explains what we've experienced pretty well.

Mon, Oct 15, 3:35 PM · User-Joe, Datacenter-Switchover-2018, DBA, Operations

Sat, Oct 13

akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

I can reproduce it with the version downloaded from wikipedia:

$ wget "https://es.wikipedia.org/w/index.php?title=Usuario:Danielalfredo/Taller&oldid=111186880&action=raw" -O 111186880.txt
 

Plus the code of P7674 which is only a tiny variation from the original P7672.

Maybe when moving the wikitext from the wikipedia revision to a local file something 'fixed' it?

$ md5sum 111186880.txt 
e306383a70dae20a0d6451a619ee5af2  111186880.txt
Sat, Oct 13, 7:50 AM · ORES, Scoring-platform-team (Current)

Fri, Oct 12

akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

The dumped memory when converted back to utf8 has the exact same MD5 as https://es.wikipedia.org/w/index.php?title=Usuario:Danielalfredo/Taller&oldid=111186880

Fri, Oct 12, 1:29 PM · ORES, Scoring-platform-team (Current)
akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

I think I've managed to reproduce the problem. After fetching and trying the last 500 revisions from https://es.wikipedia.org/wiki/Usuario:Danielalfredo/Taller and failing to reproduce it, I returned to the gdb process. After a very long fight to understand Tokenizer and how the data is stored internally in Tokenizer values, I've managed to obtain the following

Fri, Oct 12, 1:21 PM · ORES, Scoring-platform-team (Current)
akosiaris updated the task description for T206841: Evaluate the consequences of the parsercache being empty post-switchover.
Fri, Oct 12, 9:47 AM · User-Joe, Datacenter-Switchover-2018, DBA, Operations

Thu, Oct 11

akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

I'm adding some extra gdb info in P7669

Thu, Oct 11, 8:48 PM · ORES, Scoring-platform-team (Current)
akosiaris updated the title for P7669 some gdb frames for T206654 from untitled to some gdb frames for T206654.
Thu, Oct 11, 8:48 PM
akosiaris created P7669 some gdb frames for T206654.
Thu, Oct 11, 8:46 PM
akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

I've demonstrated that ORES timeout mechanism works for an old mwparserfromhell recursion bug in version 0.4.4 based on this report: https://github.com/earwig/mwparserfromhell/issues/190

See my work here: https://gist.github.com/halfak/281a692873d538e77c6096f5042e8be2

Thu, Oct 11, 6:17 PM · ORES, Scoring-platform-team (Current)
akosiaris added a comment to T206740: parsercache used disk space increase.

Forgetting the codfw -> eqiad replication was the most likely cause of overload on the application servers (and on External storage hosts).

Thu, Oct 11, 6:15 PM · MediaWiki-Cache, Performance-Team (Radar), User-Banyek, Performance-Team-notice, Datacenter-Switchover-2018, Operations, DBA
akosiaris added a comment to T201343: rack/setup/install mwmaint1002.eqiad.wmnet.

@Dzahn, anything left here?

Thu, Oct 11, 5:13 PM · Patch-For-Review, Datacenter-Switchover-2018, ops-eqiad, Operations
akosiaris closed T199073: Perform a datacenter switchover (2018-19 Q1) as Resolved.

Successfully switched (with some aftermath and actionables but successfully nevertheless) to codfw and back per the subtasks, I am resolving this.

Thu, Oct 11, 4:11 PM · Patch-For-Review, Operations, Goal
akosiaris updated the task description for T199073: Perform a datacenter switchover (2018-19 Q1).
Thu, Oct 11, 4:10 PM · Patch-For-Review, Operations, Goal
akosiaris closed T203777: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from eqiad as Resolved.

MediaWiki and traffic were successfully switched yesterday, swift and services today. I'll close this as resolved.

Thu, Oct 11, 3:38 PM · Patch-For-Review, Operations, Goal
akosiaris closed T203777: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from eqiad, a subtask of T199073: Perform a datacenter switchover (2018-19 Q1), as Resolved.
Thu, Oct 11, 3:38 PM · Patch-For-Review, Operations, Goal
akosiaris awarded P7661 pool_services.sh a Like token.
Thu, Oct 11, 11:15 AM · Datacenter-Switchover-2018, Operations
akosiaris committed rDEPLOYCHARTS70ec6627350d: First draft of a zotero helm chart (authored by akosiaris).
Thu, Oct 11, 10:46 AM
akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

I found this config on production, so in theory our workers should be restarting themselves:

CELERYD_MAX_TASKS_PER_CHILD: 100
Thu, Oct 11, 10:30 AM · ORES, Scoring-platform-team (Current)
akosiaris updated the task description for T206740: parsercache used disk space increase.
Thu, Oct 11, 9:34 AM · MediaWiki-Cache, Performance-Team (Radar), User-Banyek, Performance-Team-notice, Datacenter-Switchover-2018, Operations, DBA

Wed, Oct 10

akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

Unfortunately, because of the venv, I could not (yet) get the niceties of py-bt and py-list working, but there is already an indication that this is busy-looping while parsing text. The ltrace output reinforces that idea.

Wed, Oct 10, 9:09 PM · ORES, Scoring-platform-team (Current)
akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.

And gdb bt output

Wed, Oct 10, 9:08 PM · ORES, Scoring-platform-team (Current)
akosiaris added a comment to T206654: ORES workers using dramatically higher CPU, increasing linearly with time.
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000
munmap(0x7fbfd8867000, 262144)          = 0
munmap(0x7fbfd88e7000, 262144)          = 0
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
munmap(0x7fbfd89a7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
munmap(0x7fbfd88e7000, 262144)          = 0
mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd88e7000
Wed, Oct 10, 9:07 PM · ORES, Scoring-platform-team (Current)
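
A trace like the one above is easier to reason about once it is tallied. The following is a rough sketch under stated assumptions, not part of the original investigation: `summarize_trace` and `CALL_RE` are hypothetical names, and the parser only understands the two line shapes that appear in the pasted output. It counts the `mmap` and `munmap` calls and collects the distinct addresses and sizes. On the excerpt above it would report a single size (262144 bytes) and just three addresses being mapped and unmapped over and over, which fits the busy-loop reading.

```python
import re
from collections import Counter

# Matches the two line shapes seen in the pasted trace, for example:
#   mmap(NULL, 262144, PROT_READ|PROT_WRITE, ..., -1, 0) = 0x7fbfd8867000
#   munmap(0x7fbfd8867000, 262144)          = 0
CALL_RE = re.compile(r"^(mmap|munmap)\((.*)\)\s*=\s*(\S+)$")


def summarize_trace(lines):
    """Count mmap/munmap calls and collect distinct addresses and sizes."""
    calls = Counter()
    addresses = set()
    sizes = set()
    for line in lines:
        match = CALL_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not an mmap/munmap line
        name, args, result = match.groups()
        fields = [field.strip() for field in args.split(",")]
        calls[name] += 1
        sizes.add(int(fields[1]))  # the length is the second argument in both calls
        if name == "mmap":
            addresses.add(result)     # mmap returns the mapped address
        else:
            addresses.add(fields[0])  # munmap takes the address as first argument
    return calls, addresses, sizes


sample = [
    "mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd8867000",
    "munmap(0x7fbfd8867000, 262144)          = 0",
    "mmap(NULL, 262144, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7fbfd89a7000",
    "munmap(0x7fbfd89a7000, 262144)          = 0",
]
calls, addresses, sizes = summarize_trace(sample)
print(dict(calls), sorted(addresses), sorted(sizes))
```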

Mon, Oct 8

akosiaris closed T205364: helium (bacula) - Device not healthy -SMART- as Resolved.

And now we got

Mon, Oct 8, 9:05 AM · ops-eqiad, Operations

Thu, Oct 4

akosiaris added a subtask for T196478: rack/setup/install backup1001: Unknown Object (Task).
Thu, Oct 4, 3:26 PM · Patch-For-Review, Operations, ops-eqiad
akosiaris changed the status of T196478: rack/setup/install backup1001 from Open to Stalled.
Thu, Oct 4, 3:26 PM · Patch-For-Review, Operations, ops-eqiad
akosiaris added a comment to T205364: helium (bacula) - Device not healthy -SMART-.

Maybe it makes sense to prioritize T196478 instead?

Thu, Oct 4, 3:25 PM · ops-eqiad, Operations
akosiaris committed rDEPLOYCHARTS0c2d1673a47d: mathoid: Bump chart version to 0.0.12 (authored by akosiaris).
Thu, Oct 4, 2:49 PM
akosiaris committed rDEPLOYCHARTS25c8767b8139: mathoid: Bump num_workers to 1 (authored by akosiaris).
Thu, Oct 4, 2:49 PM
akosiaris committed rDEPLOYCHARTSd77e63d28727: scaffold: Add some sample requests (authored by akosiaris).
Thu, Oct 4, 2:49 PM
akosiaris committed rDEPLOYCHARTS820957f71866: Set the scaffolding's livenessProbe to tcpSocket (authored by akosiaris).
Thu, Oct 4, 2:42 PM
akosiaris committed rDEPLOYCHARTS6728220e7985: mathoid: Switch liveness probe into tcpSocket (authored by akosiaris).
Thu, Oct 4, 2:42 PM
akosiaris committed rDEPLOYCHARTSa500f265521b: mathoid: Add nominal resource requests (authored by akosiaris).
Thu, Oct 4, 2:42 PM
akosiaris reopened T205364: helium (bacula) - Device not healthy -SMART- as "Open".

I followed http://erikimh.com/megacli-cheatsheet/ to do so

Thu, Oct 4, 1:40 PM · ops-eqiad, Operations
akosiaris committed rDEPLOYCHARTS172b28d7d9aa: Set the scaffolding's livenessProbe to tcpSocket (authored by akosiaris).
Thu, Oct 4, 10:10 AM
akosiaris committed rDEPLOYCHARTS6cb5c79bf0f8: mathoid: Switch liveness probe into tcpSocket (authored by akosiaris).
Thu, Oct 4, 10:10 AM
akosiaris committed rDEPLOYCHARTS4de97d2a988c: mathoid: Add nomial resource requests (authored by akosiaris).
Thu, Oct 4, 6:48 AM
akosiaris committed rDEPLOYCHARTS65fbfd34a851: mathoid: Add nominal resource requests (authored by akosiaris).
Thu, Oct 4, 6:48 AM

Wed, Oct 3

akosiaris moved T205559: Scap canary warning monitoring URL is hard-coded with eqiad servers, so isn't useful when codfw is primary from Backlog to Done on the Datacenter-Switchover-2018 board.
Wed, Oct 3, 2:48 PM · Datacenter-Switchover-2018, Scap
akosiaris committed rDEPLOYCHARTS23b920aa65b9: Specify policyTypes in Network Policies (authored by akosiaris).
Wed, Oct 3, 12:54 PM

Tue, Oct 2

akosiaris added a comment to T205256: ORES uwsgi logs in logstash are useless.

The reason that everything is INFO is that our uwsgi encoder is designed to do so: https://github.com/wikimedia/puppet/blob/production/modules/service/manifests/uwsgi.pp#L166

Tue, Oct 2, 11:33 AM · Patch-For-Review, Scoring-platform-team (Current), User-Ladsgroup, ORES
akosiaris added a comment to T195710: OTRS interface loads "web bugs" in emails without warning.

This bug was publicly divulged in https://community.otrs.com/security-advisory-2018-05-security-update-for-otrs-framework/.

Can this resolved task be made public?

Tue, Oct 2, 8:48 AM · Security, Upstream, OTRS
akosiaris removed a project from T195710: OTRS interface loads "web bugs" in emails without warning: Security.
Tue, Oct 2, 8:48 AM · Security, Upstream, OTRS
akosiaris added a comment to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.".

Yup, https://gerrit.wikimedia.org/r/463733 is meant to fix it

Tue, Oct 2, 8:42 AM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services

Mon, Oct 1

akosiaris created P7606 etherpad stacktrace.
Mon, Oct 1, 12:21 PM

Fri, Sep 28

akosiaris added a comment to T204907: Scap is checking canary servers in dormant instead of active-dc .

As for solving the logstash URL issue, I think the best approach would be to just have the entire list in a dashboard. I've updated the "scap canary" dashboard. However, for sharing purposes this generated a new URL: https://logstash.wikimedia.org/goto/db09a36be5ed3e81155041f7d46ad040. I'll update scap.cfg with it.

Fri, Sep 28, 11:09 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Wikimedia-Incident, Operations, Datacenter-Switchover-2018, Scap
akosiaris merged T205559: Scap canary warning monitoring URL is hard-coded with eqiad servers, so isn't useful when codfw is primary into T204907: Scap is checking canary servers in dormant instead of active-dc .
Fri, Sep 28, 11:02 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Wikimedia-Incident, Operations, Datacenter-Switchover-2018, Scap
akosiaris merged task T205559: Scap canary warning monitoring URL is hard-coded with eqiad servers, so isn't useful when codfw is primary into T204907: Scap is checking canary servers in dormant instead of active-dc .
Fri, Sep 28, 11:02 AM · Datacenter-Switchover-2018, Scap
akosiaris claimed T204907: Scap is checking canary servers in dormant instead of active-dc .
Fri, Sep 28, 9:45 AM · Patch-For-Review, Release-Engineering-Team (Watching / External), Wikimedia-Incident, Operations, Datacenter-Switchover-2018, Scap

Thu, Sep 27

akosiaris added a comment to T201611: Deploy translation-server-v2.

https://gerrit.wikimedia.org/g/mediawiki/services/zotero/+/refs/heads/master would be the repository, @Mvolz. I just created it and gave it the same permissions as the zotero/translators repo in gerrit. Let me know if anything else is required.

Thu, Sep 27, 3:25 PM · Patch-For-Review, Services, User-mobrovac, Service-deployment-requests, VisualEditor (Current work), Citoid, Operations

Wed, Sep 26

akosiaris closed T195710: OTRS interface loads "web bugs" in emails without warning as Resolved.

Nice finding!

Wed, Sep 26, 1:07 PM · Security, Upstream, OTRS
akosiaris closed T205540: Upgrade to OTRS version 5.0.30 as Resolved.

Upgrade completed successfully

Wed, Sep 26, 1:06 PM · OTRS, Operations
akosiaris added a comment to T205540: Upgrade to OTRS version 5.0.30.

https://community.otrs.com/security-advisory-2018-05-security-update-for-otrs-framework/ is also relevant and gets fixed by 5.0.30 as well.

Wed, Sep 26, 1:06 PM · OTRS, Operations
akosiaris created T205540: Upgrade to OTRS version 5.0.30.
Wed, Sep 26, 1:04 PM · OTRS, Operations

Tue, Sep 25

akosiaris closed T189801: setup backup1001.eqiad.wmnet as Invalid.

This could not happen; a new box was procured in T196478.

Tue, Sep 25, 7:58 AM · Patch-For-Review, Operations, ops-eqiad
akosiaris closed T189801: setup backup1001.eqiad.wmnet, a subtask of T201165: Review Bacula home backups set for stat100[56], as Invalid.
Tue, Sep 25, 7:58 AM · Patch-For-Review, Analytics

Mon, Sep 24

akosiaris added a comment to T201611: Deploy translation-server-v2.

@Mvolz, SRE has a question about this migration. Assuming this gets deployed successfully next quarter, this will allow us to migrate off the current zotero infrastructure and thus remove it from WMF, right? Does this sound plausible?

Mon, Sep 24, 4:45 PM · Patch-For-Review, Services, User-mobrovac, Service-deployment-requests, VisualEditor (Current work), Citoid, Operations
akosiaris added a comment to T201611: Deploy translation-server-v2.

@thcipriani, @dduvall, nodejs 10 image built and uploaded. It's the "slim" variant, I was wondering if it is enough or if we also require the "devel" variant

We'll need npm to install the test dependencies and run the tests in the pipeline, so we'll need a -devel variant for that.

Mon, Sep 24, 4:05 PM · Patch-For-Review, Services, User-mobrovac, Service-deployment-requests, VisualEditor (Current work), Citoid, Operations

Sep 21 2018

akosiaris updated subscribers of T201611: Deploy translation-server-v2.

@thcipriani, @dduvall, nodejs 10 image built and uploaded. It's the "slim" variant, I was wondering if it is enough or if we also require the "devel" variant

Sep 21 2018, 3:31 PM · Patch-For-Review, Services, User-mobrovac, Service-deployment-requests, VisualEditor (Current work), Citoid, Operations

Sep 17 2018

akosiaris moved T204127: Reclone db2054 and db2068 from Backlog to Done on the Datacenter-Switchover-2018 board.
Sep 17 2018, 4:06 PM · Patch-For-Review, Datacenter-Switchover-2018, DBA
akosiaris moved T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500." from Backlog to Done on the Datacenter-Switchover-2018 board.
Sep 17 2018, 4:06 PM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services
akosiaris moved T204163: wtp2020.codfw.wmnet not pooled from Backlog to Done on the Datacenter-Switchover-2018 board.
Sep 17 2018, 4:05 PM · Datacenter-Switchover-2018, Parsoid
akosiaris added a comment to T204421: Phabricator is slow.

Can't reproduce this. https://grafana.wikimedia.org/dashboard/db/phabricator?orgId=1&from=1536592054293&to=1537196854293 is pointing out that apache was restarted today, so this is probably why. If this is indeed T182832, we should probably merge this task into it.

Sep 17 2018, 3:08 PM · Operations, Phabricator

Sep 13 2018

akosiaris closed T203121: Update Debian package of Blubber (0.5.0-1) as Resolved.

Packages built and uploaded to both stretch-wikimedia and jessie-wikimedia. Resolving, feel free to reopen.

Sep 13 2018, 3:03 PM · Release Pipeline, Operations, Release-Engineering-Team (Watching / External)
akosiaris closed T202963: eqiad (1) - VM request for Piwik/Matomo as Resolved.

@elukey VM is up and running. No role assigned in puppet so you probably want to handle that. Resolving this.

Sep 13 2018, 2:55 PM · Patch-For-Review, Operations, vm-requests, Analytics
akosiaris closed T202963: eqiad (1) - VM request for Piwik/Matomo, a subtask of T202962: Upgrade bohrium (piwik/matomo) to Debian Stretch, as Resolved.
Sep 13 2018, 2:55 PM · Patch-For-Review, Analytics-Kanban, Analytics

Sep 12 2018

akosiaris added a comment to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.".

VE is functional on wikitech once more.

Sep 12 2018, 6:50 PM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services
Marostegui awarded T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw a Party Time token.
Sep 12 2018, 4:50 PM · Patch-For-Review, Operations, Goal
akosiaris closed T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw as Resolved.

The switchover has happened successfully, I am gonna happily resolve this. For any issues that might have crept up because of the switchover please tag them with Datacenter-Switchover-2018

Sep 12 2018, 4:49 PM · Patch-For-Review, Operations, Goal
akosiaris closed T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw, a subtask of T189107: DB meta task for next DC failover issues, as Resolved.
Sep 12 2018, 4:49 PM · Patch-For-Review, Epic, Operations, DBA
akosiaris closed T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw, a subtask of T199073: Perform a datacenter switchover (2018-19 Q1), as Resolved.
Sep 12 2018, 4:49 PM · Patch-For-Review, Operations, Goal
akosiaris added a watcher for Datacenter-Switchover-2018: akosiaris.
Sep 12 2018, 4:27 PM
akosiaris added a comment to T196886: Replace wtp1043's sda.

https://icinga.wikimedia.org/cgi-bin/icinga/extinfo.cgi?type=2&host=wtp1043&service=MD+RAID still complains btw.

Sep 12 2018, 4:25 PM · Parsing-Team, DC-Ops, ops-eqiad, Operations
akosiaris added a comment to T203087: Decommission Ganeti vm meitnerium.wikimedia.org (old Archiva host).

The steps listed in the description look correct and sufficient to me. There is one thing to add and it would be the removal from DebMonitor per https://wikitech.wikimedia.org/wiki/Server_Lifecycle

Sep 12 2018, 11:31 AM · hardware-requests, Operations, Analytics
akosiaris updated the task description for T203087: Decommission Ganeti vm meitnerium.wikimedia.org (old Archiva host).
Sep 12 2018, 11:25 AM · hardware-requests, Operations, Analytics

Sep 11 2018

akosiaris added a project to T163438: VisualEditor broken on wikitech when codfw is primary: "Error loading data from server: apierror-visualeditor-docserver-http: HTTP 500.": Datacenter-Switchover-2018.
Sep 11 2018, 9:17 PM · Operations, Patch-For-Review, Datacenter-Switchover-2018, Parsing-Team, codfw-rollout, Cloud-Services
akosiaris added a comment to T204083: wikibase_shared/<current_train_version>-wikidatawiki-hhvm:CacheAwarePropertyInfoStore memcached key not well distributed, causing excessive traffic.

https://grafana.wikimedia.org/dashboard/db/t204083?orgId=1 shows the excessive traffic moving around the various memcached hosts for the last 1 year.

Sep 11 2018, 8:31 PM · wikidata-tech-focus, Performance-Team, Operations, wikiba.se, Wikidata
akosiaris triaged T204083: wikibase_shared/<current_train_version>-wikidatawiki-hhvm:CacheAwarePropertyInfoStore memcached key not well distributed, causing excessive traffic as High priority.
Sep 11 2018, 8:17 PM · wikidata-tech-focus, Performance-Team, Operations, wikiba.se, Wikidata
akosiaris created T204083: wikibase_shared/<current_train_version>-wikidatawiki-hhvm:CacheAwarePropertyInfoStore memcached key not well distributed, causing excessive traffic.
Sep 11 2018, 8:16 PM · wikidata-tech-focus, Performance-Team, Operations, wikiba.se, Wikidata
akosiaris added a comment to T203121: Update Debian package of Blubber (0.5.0-1).

Poking this for ETA.

This one should unblock us on graphoid as well as add a builder to help support generic jobs in CI via blubber – we're eager to get it out.

Sep 11 2018, 6:20 PM · Release Pipeline, Operations, Release-Engineering-Team (Watching / External)
akosiaris added a comment to T204033: Request creation of k8splay VPS project.

Have you talked to @Joe and @akosiaris about using the existing kubernetes-testing project for this?

Sep 11 2018, 4:00 PM · cloud-services-team (Kanban), User-jijiki, Cloud-VPS (Project-requests)

Sep 10 2018

akosiaris renamed T203963: Convert makevm to spicerack cookbook from Convert makevm το spicerack cookbook to Convert makevm to spicerack cookbook.
Sep 10 2018, 3:54 PM · Operations-Software-Development, User-jijiki, User-Joe, Operations
akosiaris created T203964: Create a spicerack cookbook to empty a ganeti node from VMs.
Sep 10 2018, 2:49 PM · Operations-Software-Development, User-jijiki, User-Joe, Operations
akosiaris created T203963: Convert makevm to spicerack cookbook.
Sep 10 2018, 2:43 PM · Operations-Software-Development, User-jijiki, User-Joe, Operations
akosiaris closed T199124: Remove all usages of $::mw_primary on puppet as Resolved.

$::mw_primary is removed from puppet now. Resolving this.

Sep 10 2018, 9:48 AM · Patch-For-Review, Puppet, DBA, Operations
akosiaris closed T199124: Remove all usages of $::mw_primary on puppet, a subtask of T199073: Perform a datacenter switchover (2018-19 Q1), as Resolved.
Sep 10 2018, 9:48 AM · Patch-For-Review, Operations, Goal

Sep 7 2018

akosiaris added a subtask for T189107: DB meta task for next DC failover issues: T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw.
Sep 7 2018, 10:29 AM · Patch-For-Review, Epic, Operations, DBA
akosiaris added a parent task for T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw: T189107: DB meta task for next DC failover issues.
Sep 7 2018, 10:29 AM · Patch-For-Review, Operations, Goal
akosiaris removed a parent task for T189107: DB meta task for next DC failover issues: T203776: Successfully switch backend traffic (MediaWiki, Swift, RESTBase, Parsoid and services) to be served from codfw.
Sep 7 2018, 10:29 AM · Patch-For-Review, Epic, Operations, DBA