elukey (Luca Toscano)
User

User Details

User Since
Jan 5 2016, 9:54 PM (153 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Today

elukey updated subscribers of T211883: Move oxygen to weblog1001.

@herron decided to proceed to unblock the oxygen's decom process, from now on we can decide how to proceed with logstash/webrequest-503 (it will likely take a bit of time so better to nuke oxygen in the meantime :). Hope that it is ok!

Fri, Dec 14, 8:58 AM · Patch-For-Review, User-Elukey, Operations
elukey added a comment to T211883: Move oxygen to weblog1001.

Current status:

Fri, Dec 14, 8:57 AM · Patch-For-Review, User-Elukey, Operations

Yesterday

elukey added a comment to T207760: setup/install weblog1001/WMF4750 as oxygen replacement.

Created https://phabricator.wikimedia.org/T211883 :)

Thu, Dec 13, 2:14 PM · Operations, Analytics
elukey triaged T211883: Move oxygen to weblog1001 as Normal priority.
Thu, Dec 13, 2:14 PM · Patch-For-Review, User-Elukey, Operations
elukey added a comment to T211606: As a user of Superset I would like it to be up-to-date so I'm not blocked by bugs that have already been fixed.

@mpopov I added some comments to T211706, let me know what you think about the plan :)

Thu, Dec 13, 12:59 PM · Analytics, Product-Analytics
elukey updated the task description for T211706: Superset Updates .
Thu, Dec 13, 12:57 PM · Analytics-Kanban, Product-Analytics
elukey triaged T211706: Superset Updates as Normal priority.
Thu, Dec 13, 12:57 PM · Analytics-Kanban, Product-Analytics
elukey added a comment to T210687: Bug: can't make a YoY time series chart in Superset.

@fdans and I deployed 0.28.1 this morning, and we had to apply a hot fix for an outstanding upstream bug (see T211605#4820128 for more juicy info). Please check if everything is ok; we did a quick check and didn't notice anything significant (except what was referenced before, of course :)).

Thu, Dec 13, 12:48 PM · Analytics-Kanban, Product-Analytics, Analytics
elukey added a comment to T211605: Upgrade Superset to 0.28.1.

After deploying, the Charts panel was broken; fixed manually by following https://github.com/apache/incubator-superset/issues/6347#issuecomment-442178847

Thu, Dec 13, 12:40 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T211605: Upgrade Superset to 0.28.1.

In prod

Thu, Dec 13, 12:39 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T211605: Upgrade Superset to 0.28.1.

While testing the superset db upgrade command I got:

Thu, Dec 13, 11:31 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T179192: Check analytics1037 power supply status.

This server is going to be decommed very soon (OOW), I've acked the alarm a long time ago to avoid it spamming us. Good to close in my opinion, +1

Thu, Dec 13, 7:10 AM · ops-eqiad, Operations, User-Elukey, Analytics

Wed, Dec 12

elukey added a comment to T211605: Upgrade Superset to 0.28.1.

The first breaking change that I can see (use of f-strings) happened in commit https://github.com/apache/incubator-superset/commit/cc3a625a4bb6b0e581b30f3112315ff5a8ab6807 that should be in the upcoming release, not in 0.28.1, so in theory reverting https://github.com/lyft/incubator-superset/commit/174ee13b512f8aaa311fe0980276ac970930f4e6 and building with Python 3.5 should be enough for this upgrade.

Wed, Dec 12, 9:28 AM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Dec 11

elukey updated subscribers of T211605: Upgrade Superset to 0.28.1.

@MoritzMuehlenhoff I think that you have a good first candidate for buster testing :D

Tue, Dec 11, 4:14 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T211606: As a user of Superset I would like it to be up-to-date so I'm not blocked by bugs that have already been fixed.

@mpopov please also keep in mind that things like T211605#4814020 could happen with a project that is still developing fast and does not care much about breaking existing users, so upgrades might not be easy :D

Tue, Dec 11, 3:52 PM · Analytics, Product-Analytics
elukey added a comment to T211605: Upgrade Superset to 0.28.1.

Very nice issue just found: https://github.com/apache/incubator-superset/pull/5985

Tue, Dec 11, 3:37 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey updated the task description for T205846: Move users from stat1005 to stat1007.
Tue, Dec 11, 2:01 PM · Analytics-Kanban, Patch-For-Review, Analytics
elukey updated the task description for T205846: Move users from stat1005 to stat1007.
Tue, Dec 11, 2:00 PM · Analytics-Kanban, Patch-For-Review, Analytics
elukey added a comment to T148843: GPU upgrade for stats machine.

Very good news: stat1005 is finally ready for experiments with GPU drivers etc. I am completely ignorant about the subject, so if anybody has time/patience please come forward :)

Tue, Dec 11, 2:00 PM · User-Elukey, Operations, Analytics, Research-management
elukey added a comment to T211606: As a user of Superset I would like it to be up-to-date so I'm not blocked by bugs that have already been fixed.

We discussed this during the Analytics standup and we have a proposal: we could start by creating a tracking task for Superset/Turnilo upgrade schedules that everybody can bookmark easily, and then start with one update every quarter (if upstream has released a new version, of course). If this turns out not to be enough, the same tracking task will also be used to request new versions for specific reasons (like solving a bug etc.). How does it sound?

Tue, Dec 11, 1:55 PM · Analytics, Product-Analytics
elukey added a comment to T203669: Return to real time banner impressions in Druid.

@AndyRussG ping :)

Tue, Dec 11, 1:12 PM · Analytics-Kanban, User-Elukey, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

The timeouts/tkos happen only on the API appservers, and I have noticed the following POST when checking httpd's access logs:

https://meta.wikimedia.org/w/index.php?title=Special:Translate&group=agg-Affiliates&language=th&filter=%21translated&action=translate

Could this be the trigger of cache refreshes?

It is unclear to me how that request could be POST in legitimate use. It should always be GET if I am not mistaken. It's not an API request either.

Tue, Dec 11, 11:16 AM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

I checked again tcpdump traffic and the "new" peaks of mc1022's usage are due to CAS commands, as it is visible in https://grafana.wikimedia.org/d/000000614/memcache-elukey?orgId=1&panelId=10&fullscreen

Tue, Dec 11, 10:11 AM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T172410: Phase out and replace analytics-store (multisource).

To follow up what I wrote (after a chat with the data persistence team):

  • the proposal in T210478#4794536 would move sX sections (so the database groupings listed in s1.dblist, s2.dblist etc..) to their own mysql instance on an assigned dbstore node. For example, all wikis in S5 will be available (i.e. replicated) to a mysql instance on dbstore1003 (with an assigned port that we don't know yet). So joins between schemas belonging to different sX sections will not be possible anymore (we already knew this).
  • the staging database will likely be assigned to a separate mysql instance, so people will be able to keep using its data. It will still be possible to create tables etc.., but importing data from various wiki databases will need some extra work (dump the data, import it, etc..).

    Would the above points be ok for everybody? Any special need or use case that is not taken into consideration?

Unfortunately, this would be a big problem for me and likely others in the Product Analytics team. I discussed some of the reasons already in T172410#4005226 and T172410#4272291, although that was more focused on what I'd miss if we deprecated the wiki replicas altogether.

The Data Lake still lacks a lot of data we need frequently, like edit tags (T161149), user email addresses (needed to prepare lists for surveys), user preferences (where, for example, the Growth team is storing users' answers to a [new onboarding questionnaire](https://www.mediawiki.org/wiki/Growth/Personalized_first_day)), and relatively real-time edit data. For most of these, we're not just interested in one wiki, so we have to query all the wiki databases on the replicas and aggregate the results. This is common enough that many of us researchers have written our own scripts to do this (e.g. @halfak's [`multiquery`](https://github.com/halfak/multiquery) and my [`wmfdata.mariadb.multirun`](https://github.com/neilpquinn/wmfdata/blob/master/wmfdata/mariadb.py#L118)).

For analysis that only requires retrieving an aggregate value from each database, with enough effort we could modify our scripts to do this across multiple databases. But for analysis where we build large intermediate tables using insert...select queries (like my editor month table), we could run into even more difficulty trying to dump and then reload multiple large datasets.
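
To make the "aggregate value from each database" case concrete, here is a minimal sketch of running one query against several separate database connections and combining the results client-side. It is not the API of `multiquery` or `wmfdata`; in-memory SQLite databases stand in for the per-section MariaDB instances, and the table contents, wiki names and the `run_on_all` helper are assumptions made only for illustration.

```python
import sqlite3


def run_on_all(databases, query):
    """Run the same query on every connection; return {wiki name: rows}."""
    return {name: conn.execute(query).fetchall() for name, conn in databases.items()}


def make_demo_db(users):
    """Build a throwaway in-memory database (stand-in for one wiki replica)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE revision (rev_user TEXT)")
    conn.executemany("INSERT INTO revision VALUES (?)", [(u,) for u in users])
    return conn


# Hypothetical wikis living on different instances, so no cross-database JOIN.
databases = {
    "enwiki": make_demo_db(["a", "b", "a"]),
    "thwiki": make_demo_db(["c"]),
}

per_wiki = run_on_all(databases, "SELECT COUNT(*) FROM revision")
# The aggregation step happens in the client instead of in SQL.
total_edits = sum(rows[0][0] for rows in per_wiki.values())
```

This pattern covers per-wiki aggregates only; the insert...select workflows mentioned above would still need the dump-and-reload step.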

So splitting up these databases will cause us a major headache by removing a stable older tool when we're still trying to rebuild our process around and secure necessary improvements to newer, often still experimental tools like mediawiki_history and Superset.

I understand this is a reaction to very real performance issues, but could we please keep the current setup until we have a full replacement?

Tue, Dec 11, 9:52 AM · Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research

Mon, Dec 10

elukey added a comment to T172410: Phase out and replace analytics-store (multisource).

@bmansurov:

  • log is no longer on dbstore1002/analytics-store, but you can find it in analytics-slave (db1108)
  • centralauth should be s7
  • wikishared no idea (@Banyek can you help?)
Mon, Dec 10, 6:54 PM · Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey renamed T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions from Mcrouter periodically reports soft TKOs for mc[1,2]035 leading to MW Memcached exceptions to Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.
Mon, Dec 10, 3:45 PM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey removed projects from T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions: MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Patch-For-Review, Gadgets.
Mon, Dec 10, 3:45 PM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T211250: Create a mediawiki::cronjob define.

We (analytics) have been trying to move away from crons in favor of systemd timers, adding some automation in profile::analytics::systemd_timer. It shouldn't need too much work to be generalized and adapted to the mediawiki use case, I can help/work on it if you think it is good!

Mon, Dec 10, 8:52 AM · serviceops, User-jijiki, Operations

Fri, Dec 7

elukey added a comment to T210939: Varnishkafka error to investigate: Required feature not supported by broker.

It happened on the 30th on various cp hosts, and in most of the Jumbo brokers I can see something like the following (repeated multiple times and for different brokers):

Fri, Dec 7, 3:13 PM · Patch-For-Review, User-Elukey, Analytics
elukey moved T164243: Alarms on pageview API latency increase from Analytics Backlog to Backlog on the User-Elukey board.
Fri, Dec 7, 2:53 PM · Analytics, User-Elukey
elukey moved T142073: Improve user management for AQS Cassandra from Analytics Backlog to Backlog on the User-Elukey board.
Fri, Dec 7, 2:53 PM · User-Elukey, Pageviews-API, Analytics
elukey moved T210749: Hardware for cloud db replicas for analytics usage from Backlog to In Progress on the User-Elukey board.
Fri, Dec 7, 2:53 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from In Progress to Done on the User-Elukey board.
Fri, Dec 7, 2:52 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

Today another mediawiki alert from ~12:24 to ~12:27 UTC. @Nikerabbit, @aaron - do you think that we can narrow down specific events (besides TTL expiring) that may cause this?

Fri, Dec 7, 2:18 PM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T210749: Hardware for cloud db replicas for analytics usage .

@Nuria do you think we should close this, as it is decided we'll go for a host with the same specs and config as the rest of the cloud replicas (T211135)?

Fri, Dec 7, 7:02 AM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics

Thu, Dec 6

elukey added a comment to T172410: Phase out and replace analytics-store (multisource).

To follow up what I wrote (after a chat with the data persistence team):

Thu, Dec 6, 6:07 PM · Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey moved T206943: JVM pauses cause Yarn master to failover from In Progress to Paused on the Analytics-Kanban board.
Thu, Dec 6, 5:42 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey reassigned T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms from elukey to JAllemandou.
Thu, Dec 6, 5:41 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T211330: cron job rsyncing dumps webserver logs to stat1005 is broken from Next Up to In Progress on the Analytics-Kanban board.
Thu, Dec 6, 5:03 PM · Analytics-Kanban, Patch-For-Review, Analytics, Datasets-General-or-Unknown
elukey added a project to T211330: cron job rsyncing dumps webserver logs to stat1005 is broken: Analytics-Kanban.
Thu, Dec 6, 4:06 PM · Analytics-Kanban, Patch-For-Review, Analytics, Datasets-General-or-Unknown
elukey added a comment to T210705: Move turnilo to nodejs 10.

Turnilo is now running on nodejs 10!

Thu, Dec 6, 3:01 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from Ready to Deploy to Done on the Analytics-Kanban board.
Thu, Dec 6, 3:00 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a project to T211330: cron job rsyncing dumps webserver logs to stat1005 is broken: Analytics.
Thu, Dec 6, 1:26 PM · Analytics-Kanban, Patch-For-Review, Analytics, Datasets-General-or-Unknown

Wed, Dec 5

elukey added a comment to T209929: Decommission old Hadoop worker nodes and add newer ones.

The plan is:

Wed, Dec 5, 2:26 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T209929: Decommission old Hadoop worker nodes and add newer ones from Next Up to In Progress on the Analytics-Kanban board.
Wed, Dec 5, 2:14 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T209929: Decommission old Hadoop worker nodes and add newer ones from Backlog to In Progress on the User-Elukey board.
Wed, Dec 5, 2:10 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T203693: Update to CDH 6 or other up-to-date Hadoop distribution from Backlog to Stalled on the User-Elukey board.
Wed, Dec 5, 2:10 PM · User-Elukey, Analytics-Cluster, Analytics
elukey moved T208934: mcrouter does not remove a memcached shard from consistent hashing when timeouts happen from Backlog to Stalled on the User-Elukey board.
Wed, Dec 5, 2:10 PM · Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey moved T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions from In Progress to Stalled on the User-Elukey board.
Wed, Dec 5, 2:10 PM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey moved T172532: Refactor analytics cronjobs to alarm on failure reliably from In Progress to Stalled on the User-Elukey board.
Wed, Dec 5, 2:10 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
elukey moved T206943: JVM pauses cause Yarn master to failover from In Progress to Stalled on the User-Elukey board.
Wed, Dec 5, 2:09 PM · Patch-For-Review, User-Elukey, Analytics-Kanban, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from Backlog to In Progress on the User-Elukey board.
Wed, Dec 5, 2:09 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].

This is a very good point, I'll bring it up to my team's standup today and I'll let you know. It has been used, as far as I know, for two purposes:

  • join tables from different databases in tmp tables to work on them freely (something that is no longer possible)
  • use it as a holding area for various scripts/analytics-reporting/etc.
Wed, Dec 5, 1:58 PM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
elukey added a comment to T210447: codfw row A recable and add QFX.

No mcrouter proxies on A4, all good.

Wed, Dec 5, 10:46 AM · ops-codfw, netops, Operations
elukey added a comment to T210456: codfw row B recable and add QFX.

No mcrouter codfw proxies present in B4, all good.

Wed, Dec 5, 10:44 AM · Patch-For-Review, ops-codfw, netops, Operations
elukey updated the task description for T196489: upgrade all codfw switch stacks to include additional 10G switch per row.
Wed, Dec 5, 10:40 AM · ops-codfw, netops, Operations
elukey added a comment to T210705: Move turnilo to nodejs 10.

Before doing this, we need to probably run npm install for turnilo with the nodejs10... Just realized it

Wed, Dec 5, 10:02 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from In Progress to Ready to Deploy on the Analytics-Kanban board.
Wed, Dec 5, 10:01 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from Ready to Deploy to In Progress on the Analytics-Kanban board.
Wed, Dec 5, 9:23 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey moved T209808: Upgrade Matomo to 3.6.1 or 3.7.0 from Next Up to Done on the Analytics-Kanban board.
Wed, Dec 5, 9:23 AM · Analytics-Kanban, Analytics
elukey set the point value for T209808: Upgrade Matomo to 3.6.1 or 3.7.0 to 5.
Wed, Dec 5, 9:22 AM · Analytics-Kanban, Analytics
elukey updated subscribers of T209808: Upgrade Matomo to 3.6.1 or 3.7.0.

Piwik/Matomo upgraded, but while testing the users I noticed that the piwik user outlined in https://wikitech.wikimedia.org/wiki/Analytics/Systems/Piwik#Access seems to have a different password. @Nuria: I tried to change the pass and it seems that it needs more than 6 chars, so it must be another one. Shall we update wikitech?

Wed, Dec 5, 9:22 AM · Analytics-Kanban, Analytics
elukey added a comment to T209808: Upgrade Matomo to 3.6.1 or 3.7.0.

Database Upgrade Required

Wed, Dec 5, 8:57 AM · Analytics-Kanban, Analytics

Tue, Dec 4

elukey added a comment to T210749: Hardware for cloud db replicas for analytics usage .

Opened a procurement task for 1 Cloudb replica in T211135. We are not planning to buy two hosts with the following assumption:

Tue, Dec 4, 6:06 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey added a comment to T210705: Move turnilo to nodejs 10.

Before doing this, we need to probably run npm install for turnilo with the nodejs10... Just realized it

Tue, Dec 4, 2:08 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms.

As FYI ema told me that https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/477424/ reverted https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/476311, an experiment to disable N-hit-wonder for some days. This caused issues while loading images from commons - https://phabricator.wikimedia.org/T210890 (hence upload).

Tue, Dec 4, 1:48 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms from Next Up to In Progress on the Analytics-Kanban board.
Tue, Dec 4, 9:57 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T210705: Move turnilo to nodejs 10 from Next Up to Ready to Deploy on the Analytics-Kanban board.
Tue, Dec 4, 9:57 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T210705: Move turnilo to nodejs 10.

Upgraded turnilo in labs (turnilo.eqiad.wmflabs), if anybody wants to test it: ssh -N turnilo.eqiad.wmflabs -L 9091:turnilo.eqiad.wmflabs:9091

Tue, Dec 4, 9:57 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey claimed T210705: Move turnilo to nodejs 10.
Tue, Dec 4, 9:56 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

Between 8:10 and 9 UTC this morning there were enough TKOs to trigger logstash exception alarm, from https://grafana.wikimedia.org/dashboard/db/memcache-elukey?orgId=1&from=1543910817973&to=1543914647084 it matches nicely..

Tue, Dec 4, 9:13 AM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey updated subscribers of T210704: Migrate node-based services in production to node10.

As mentioned in https://phabricator.wikimedia.org/T209711#4788954 I am looping in @hashar to also allow Releng to test NodeJS 10 :)

Tue, Dec 4, 7:41 AM · Patch-For-Review, Core Platform Team Backlog (Next), Services (next), Operations
elukey added a comment to T210467: codfw row D recable and add QFX.

Need to check with Joe but I'd do the following:

Tue, Dec 4, 7:29 AM · User-jijiki, Patch-For-Review, ops-codfw, netops, Operations

Mon, Dec 3

elukey added a comment to T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms.

Since the refined data should now be there, lowering the priority to High :)

Mon, Dec 3, 7:16 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey lowered the priority of T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms from Unbreak Now! to High.
Mon, Dec 3, 7:15 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

To clarify, the $useMutex logic in WAN cache never triggers due to minAsOf=INF, resulting in stampedes when someone invalidates the cache. Instead, this should be treated like a regular TTL expiration and have one thread at a time doing regeneration.
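
The "one thread at a time doing regeneration" idea can be sketched as follows. This is a toy illustration in Python, not MediaWiki's WAN cache code: the `MutexCache` class and its method names are hypothetical, and for simplicity callers that lose the race block until the value is ready instead of being served a stale copy.

```python
import threading


class MutexCache:
    """Toy cache where only one thread regenerates a missing key at a time."""

    def __init__(self):
        self._values = {}
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, key):
        # One lock per key, created lazily under a global guard.
        with self._guard:
            return self._locks.setdefault(key, threading.Lock())

    def get(self, key, regenerate):
        if key in self._values:
            return self._values[key]
        with self._lock_for(key):
            # Re-check: another thread may have filled the key while we waited.
            if key not in self._values:
                self._values[key] = regenerate()
            return self._values[key]

    def invalidate(self, key):
        # Treated like an expiry: the next get() regenerates once.
        self._values.pop(key, None)


stats = {"regenerations": 0}


def expensive_regeneration():
    stats["regenerations"] += 1
    return "message groups"


cache = MutexCache()
threads = [
    threading.Thread(target=cache.get, args=("groups", expensive_regeneration))
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the per-key lock, every thread that misses would call `expensive_regeneration` itself, which is the stampede described above.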

Mon, Dec 3, 7:14 PM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T172410: Phase out and replace analytics-store (multisource).

Hi everybody,

Mon, Dec 3, 6:53 PM · Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey raised the priority of T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms from Normal to High.
Mon, Dec 3, 11:15 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey triaged T211000: Failure while refining webrequest upload 2018-12-01-14. Upgrade alarms as Normal priority.
Mon, Dec 3, 10:55 AM · Patch-For-Review, Analytics-Kanban, Analytics

Sun, Dec 2

elukey added a project to T210944: librdkafka 1.0.0 deprecates functions used by varnishkafka: User-Elukey.
Sun, Dec 2, 1:52 PM · User-Elukey, Analytics
elukey created T210944: librdkafka 1.0.0 deprecates functions used by varnishkafka.
Sun, Dec 2, 1:51 PM · User-Elukey, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

Is the value so big that every time it is SET it causes a TKO? Besides the 24h TTL, the cache value is updated when there are changes to the underlying data. In other words: when people create or remove translatable pages, aggregate message groups or other similar stuff.

Sun, Dec 2, 11:19 AM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a project to T210939: Varnishkafka error to investigate: Required feature not supported by broker: User-Elukey.
Sun, Dec 2, 9:09 AM · Patch-For-Review, User-Elukey, Analytics
elukey renamed T210939: Varnishkafka error to investigate: Required feature not supported by broker from Varnishkafka error to investigate: to Varnishkafka error to investigate: Required feature not supported by broker.
Sun, Dec 2, 9:08 AM · Patch-For-Review, User-Elukey, Analytics
elukey created T210939: Varnishkafka error to investigate: Required feature not supported by broker.
Sun, Dec 2, 9:08 AM · Patch-For-Review, User-Elukey, Analytics

Fri, Nov 30

elukey added a comment to T210749: Hardware for cloud db replicas for analytics usage .

A possible solution, instead of ordering new hardware, would be to reuse one/two of the new Hadoop nodes racked in T207192 for this use case: they have 12x3.6TB disks and 128GB of RAM, so I'd say that they could do the job (I only don't know if 128GB of RAM would be enough for our use case, but I'll defer to Manuel/Balazs/Jaime judgement).

Are those SSDs?

They are not, forgot to mention :(

Fri, Nov 30, 6:28 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey added a comment to T210667: Can exfat be used in WMF production?.

Reading the backlog only now, this was a good learning lesson for me too (I was aware of what Chase did as mentioned, and didn't think that it would have been flagged as an issue to review). Thanks a lot to all that contributed with their thoughts and suggestions :)

Fri, Nov 30, 6:24 PM · Security-Team, Analytics, Software-Licensing, WMF-Legal, Operations
elukey added a comment to T210749: Hardware for cloud db replicas for analytics usage .

A possible solution, instead of ordering new hardware, would be to reuse one/two of the new Hadoop nodes racked in T207192 for this use case: they have 12x3.6TB disks and 128GB of RAM, so I'd say that they could do the job (I only don't know if 128GB of RAM would be enough for our use case, but I'll defer to Manuel/Balazs/Jaime judgement).
In case this option is viable, we'll need to get also the green light for the repurpose from Faidon or Mark.

Fri, Nov 30, 6:15 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey updated subscribers of T209711: Set up CI system on AQS.

@hashar We have a component in Stretch for this (component/node10), and @MoritzMuehlenhoff is currently leading an effort to migrate to Nodejs 10 in https://phabricator.wikimedia.org/T210704 :)

Fri, Nov 30, 2:33 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T209711: Set up CI system on AQS.

@hashar quick question - we are about to migrate AQS to NodeJS 10, will it be easy to migrate npm test to it when needed?

Fri, Nov 30, 2:13 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].

Thanks a lot for all the inputs, I'd say that we don't need proxies for the moment, we'll probably just need some automation around the map between replicated wiki - mariadb host/instance (that the Analytics team can do of course) to ease the job of connecting to a specific instance for every user.
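
A minimal sketch of what such a map could look like, assuming a two-step lookup (wiki to section, section to host/port). Every host and port assignment below is a placeholder, since the real port numbers are not known yet; `examplewiki` and the `instance_for` helper are hypothetical too.

```python
# Wiki -> section. "examplewiki" is a made-up entry for illustration.
SECTION_OF_WIKI = {
    "enwiki": "s1",
    "centralauth": "s7",
    "examplewiki": "s5",
}

# Section -> (host, port). Hosts and ports are PLACEHOLDERS, not real assignments.
INSTANCE_OF_SECTION = {
    "s1": ("dbstore1003.eqiad.wmnet", 3311),
    "s5": ("dbstore1003.eqiad.wmnet", 3315),
    "s7": ("dbstore1004.eqiad.wmnet", 3317),
}


def instance_for(wiki):
    """Return the (host, port) pair of the instance replicating the given wiki."""
    section = SECTION_OF_WIKI[wiki]  # raises KeyError for unknown wikis
    return INSTANCE_OF_SECTION[section]


host, port = instance_for("centralauth")
```

With a helper like this, users would not need to remember which section a wiki belongs to before connecting.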

Fri, Nov 30, 2:00 PM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

Update after the mediawiki train deployment:

Fri, Nov 30, 9:02 AM · Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations

Thu, Nov 29

elukey added a project to T210749: Hardware for cloud db replicas for analytics usage : User-Elukey.
Thu, Nov 29, 6:12 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey added a comment to T210749: Hardware for cloud db replicas for analytics usage .

We don't have any spare host similar to the labsdb ones.
Those are very specific hardware, as they need to contain all the wikis on the same host, they have lots of disks (16x1.6TB on a RAID10) and 512GB memory T135529

Thu, Nov 29, 6:09 PM · User-Banyek, Data-Services, User-Elukey, DBA, Analytics
elukey added a comment to T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].

As far as I know we have to go multi-instance, but I don't have a lot of context on whether multi-source is possible or still needed (I guess not, but I prefer to ask :)

Thu, Nov 29, 5:07 PM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
elukey triaged T210706: Move AQS to nodejs 10 as Normal priority.
Thu, Nov 29, 8:58 AM · Analytics
elukey triaged T210705: Move turnilo to nodejs 10 as Normal priority.
Thu, Nov 29, 8:57 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Thu, Nov 29, 8:45 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics

Wed, Nov 28

elukey updated the task description for T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5].
Wed, Nov 28, 5:52 PM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
elukey closed T209620: rack/setup/install dbstore100[3-5].eqiad.wmnet as Resolved.

Thanks a lot! We are going to follow up in https://phabricator.wikimedia.org/T210478

Wed, Nov 28, 5:52 PM · User-Elukey, Analytics, Operations