elukey (Luca Toscano)
User

User Details

User Since
Jan 5 2016, 9:54 PM (166 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Yesterday

elukey added a comment to T213802: Investigate ways to reduce the size of translate-groups cache key.

@abi_ , @Nikerabbit Any plans to merge/deploy this? :)

Mon, Mar 18, 3:53 PM · Patch-For-Review, User-abi_, User-Nikerabbit, MediaWiki-extensions-Translate
elukey added a comment to T217359: Possibly expand Kafka main-{eqiad,codfw} clusters in Q4 2019..

In the SRE spreadsheet I can see that the suggested replacement FY is 20/21, not the upcoming one. Just adding the info; I'm not sure whether these servers are eligible for a refresh before reaching 5 years of usage.

Mon, Mar 18, 2:25 PM · User-herron, Core Platform Team (Modern Event Platform (TEC2)), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Operations
elukey added a comment to T217731: Consider raising Memcached MWObject cache memory size limit.

I have reread https://github.com/memcached/memcached/blob/master/doc/protocol.txt and came up with some new graphs, all added to https://grafana.wikimedia.org/d/000000317/memcache-slabs.

Mon, Mar 18, 9:35 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations

Thu, Mar 14

elukey moved T210706: Move AQS to nodejs 10 from In Progress to Done on the Analytics-Kanban board.
Thu, Mar 14, 6:08 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey set the point value for T210706: Move AQS to nodejs 10 to 5.
Thu, Mar 14, 6:07 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T210706: Move AQS to nodejs 10.

AQS migrated to nodejs 10!

Thu, Mar 14, 6:07 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T216528: confirm gpu form factor in stat1005.

@Cmjohnson do you have time today/tomorrow to answer Rob's question? It would unblock us to order the new GPU :) (sorry for the hassle with stat1005, we hope this is the last request)

Thu, Mar 14, 9:15 AM · ops-eqiad, Analytics-Kanban, Analytics, Operations
elukey added a comment to T213089: Upgrade memcached for Debian Stretch/Buster.

Leaving here also a reference of https://github.com/memcached/memcached/issues/359:

Thu, Mar 14, 8:44 AM · User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
elukey added a comment to T213089: Upgrade memcached for Debian Stretch/Buster.

EDIT: after a chat with upstream it was suggested to me to follow up with Debian to avoid shipping 1.5.6 since it contains some bugs resolved in later versions. I'll try to follow up with Debian upstream asap!

Thu, Mar 14, 8:37 AM · User-jijiki, serviceops, Performance-Team (Radar), Operations, User-Elukey
elukey added a comment to T217731: Consider raising Memcached MWObject cache memory size limit.

Can the (extra) space be dedicated more towards the larger slabs, where we have more problems AFAIK?

Thu, Mar 14, 8:30 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey added a project to T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern: User-Elukey.
Thu, Mar 14, 8:12 AM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata

Wed, Mar 13

elukey added a comment to T217731: Consider raising Memcached MWObject cache memory size limit.

I was wrong: evictions started happening, albeit at a slower pace. The extra 10G of space allowed mc1019 to store 53M objects rather than 46.8M, lowering evictions by roughly 100 ops/s. Interestingly, the reclaim rate (which should be the expired objects cleaned up to free space in the slab) grew at the same time, so I suppose that keeping more things in the LRU eventually translates into having more expired items to evict. Not sure if this is the right way to read these graphs, will think more about it over the next few days :)
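
As a rough back-of-the-envelope check of the figures quoted above, the object counts can be turned into a growth percentage and an average footprint per additional object. This is only a sketch: reading "10G" as 10 GiB is an assumption, and the variable names are illustrative rather than taken from any dashboard or tool.

```python
# Figures quoted in the comment above.
# Assumption: "10G" is read as 10 GiB (10 * 1024**3 bytes).
EXTRA_BYTES = 10 * 1024**3
OBJECTS_BEFORE = 46_800_000
OBJECTS_AFTER = 53_000_000

# How many more objects fit after the memory increase.
extra_objects = OBJECTS_AFTER - OBJECTS_BEFORE

# Relative growth in stored objects, as a percentage.
growth_pct = 100 * extra_objects / OBJECTS_BEFORE

# Average memory consumed per additional object (includes slab overhead).
bytes_per_extra_object = EXTRA_BYTES / extra_objects

print(f"extra objects: {extra_objects:,}")
print(f"growth: {growth_pct:.1f}%")
print(f"avg bytes per extra object: {bytes_per_extra_object:.0f}")
```

Under that assumption the extra memory holds about 6.2M more objects, roughly a 13% increase.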

Wed, Mar 13, 1:53 PM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T218197: CacheAwarePropertyInfoStore & CachingPropertyInfoLookup should use WANObjectCache instead of BagOStuff.

Also adding @aaron, who knows better what is best (and can hopefully confirm what I am saying).

Wed, Mar 13, 12:10 PM · Wikidata, Wikidata-Campsite
elukey updated subscribers of T218197: CacheAwarePropertyInfoStore & CachingPropertyInfoLookup should use WANObjectCache instead of BagOStuff.
Wed, Mar 13, 12:10 PM · Wikidata, Wikidata-Campsite
elukey added a comment to T217731: Consider raising Memcached MWObject cache memory size limit.

Very interesting results for mc1019 after a day of metrics.

Wed, Mar 13, 7:41 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey claimed T217731: Consider raising Memcached MWObject cache memory size limit.
Wed, Mar 13, 7:35 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey updated subscribers of T210706: Move AQS to nodejs 10.

@Milimetric if the test went fine I'd say to proceed with production :)

Wed, Mar 13, 7:09 AM · Patch-For-Review, Analytics-Kanban, Analytics

Tue, Mar 12

elukey added a comment to T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern.

To summarize what I am currently seeing:

Tue, Mar 12, 5:42 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata
elukey updated subscribers of T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern.

Another big burst of SETs to mc1022 caused timeouts for the MW appservers. I don't mean to pressure you, but is there any plan to fix this?

Tue, Mar 12, 3:42 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata
elukey moved T212243: Staging environment for upgrades of superset from Backlog to In Progress on the User-Elukey board.
Tue, Mar 12, 3:13 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey moved T217731: Consider raising Memcached MWObject cache memory size limit from Backlog to In Progress on the User-Elukey board.
Tue, Mar 12, 3:13 PM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T217412: Enable encryption and authentication for TLS-based Hadoop services.

Ah I see! Is that a problem? Can a CA not create multiple certificates with the same CN?

Tue, Mar 12, 2:57 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T217412: Enable encryption and authentication for TLS-based Hadoop services.

use a self signed CA

Another option would be to use the Puppet CA to sign cergen-created certificates. The truststore doesn't need the private key of the CA, so this shouldn't cause any security problems.

Tue, Mar 12, 2:23 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

I tried to force pandas to 0.22, but this requires numpy 1.13.2, which is (apparently) not compatible with Python 3.7, leading to compilation errors while building numpy's C code.

Tue, Mar 12, 10:55 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey closed T215589: Migrate users to dbstore100[3-5], a subtask of T210478: Migrate dbstore1002 to a multi instance setup on dbstore100[3-5], as Resolved.
Tue, Mar 12, 7:09 AM · Patch-For-Review, User-Banyek, Analytics-Kanban, DBA, Analytics
elukey closed T215589: Migrate users to dbstore100[3-5] as Resolved.
Tue, Mar 12, 7:08 AM · User-Marostegui, Analytics-Kanban, Analytics
elukey set the point value for T215589: Migrate users to dbstore100[3-5] to 0.
Tue, Mar 12, 7:08 AM · User-Marostegui, Analytics-Kanban, Analytics
elukey updated the task description for T215589: Migrate users to dbstore100[3-5].
Tue, Mar 12, 7:07 AM · User-Marostegui, Analytics-Kanban, Analytics

Mon, Mar 11

elukey added a comment to T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern.

I found another occurrence of timeouts in mcrouter with SETs for this key, it might be due to TTL expiring or similar. Reducing the traffic volume to memcached is surely a good thing.

Mon, Mar 11, 6:37 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata
elukey updated the task description for T212243: Staging environment for upgrades of superset .
Mon, Mar 11, 6:25 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

Created https://github.com/apache/incubator-superset/issues/7006

Mon, Mar 11, 5:42 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

Turned out that simply re-creating the world maps from scratch worked fine!

Mon, Mar 11, 5:20 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey closed T187102: Vagrant's /var/log/daemon.log filling up with kafka errors as Resolved.
Mon, Mar 11, 4:04 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, MediaWiki-Vagrant
elukey added a comment to T187102: Vagrant's /var/log/daemon.log filling up with kafka errors.

@DLynch did you manage to solve the issue?

Mon, Mar 11, 4:02 PM · Core Platform Team Backlog (Watching / External), Services (watching), Analytics, MediaWiki-Vagrant
elukey added a comment to T215775: Check home leftovers of ISI researchers.

ping @leila :)

Mon, Mar 11, 3:47 PM · Research, Analytics
elukey added a project to T218037: Upgrade matomo1001 to latest upstream: User-Elukey.
Mon, Mar 11, 3:28 PM · User-Elukey, Analytics
elukey created T218037: Upgrade matomo1001 to latest upstream.
Mon, Mar 11, 3:27 PM · User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

I added some logging on the main superset instance. This is what df[metric] looks like for the pageview dashboard:

Mon, Mar 11, 2:37 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey updated subscribers of T217412: Enable encryption and authentication for TLS-based Hadoop services.

The next step is to figure out how to deploy the Java truststore (where the TLS CA's certificate is) and the keystore (where the host's TLS public/private keys are stored). An important note to remember is the following (it concerns TLS communication between Hadoop daemons/hosts):

Mon, Mar 11, 2:14 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

To quickly test superset: ssh -L 9080:analytics-tool1004.eqiad.wmnet:80 analytics-tool1004.eqiad.wmnet and then open localhost:9080
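
The tunnel command above follows the usual `ssh -L local_port:target_host:target_port jump_host` pattern. A minimal sketch of building it programmatically, assuming a hypothetical helper named `tunnel_command` (not part of any Wikimedia tooling), could look like this:

```python
import shlex


def tunnel_command(host, remote_port=80, local_port=9080):
    """Return the argv for an SSH local port forward to `host`.

    Hypothetical helper for illustration: traffic sent to
    localhost:<local_port> is forwarded to <host>:<remote_port>
    over the SSH connection to <host>.
    """
    return ["ssh", "-L", f"{local_port}:{host}:{remote_port}", host]


cmd = tunnel_command("analytics-tool1004.eqiad.wmnet")
# Print the command as a single shell-safe string.
print(shlex.join(cmd))
```

Returning an argument list rather than a string keeps it safe to pass to `subprocess.run` without shell quoting concerns.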

Mon, Mar 11, 11:19 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T212243: Staging environment for upgrades of superset .

0.29rc7 is deployed on analytics-tool1004; so far the only issue that I found is that graphs showing data on a World Map are broken. To reproduce, it is sufficient to check the "Pageviews Overview" dashboard and see that one graph fails with: "Too many indexers"

Mon, Mar 11, 11:16 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics

Sun, Mar 10

elukey added a comment to T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern.

Correction: the latest bursts of mcrouter timeouts seem to match a high SET rate for slab 140 on mc1022, which is exactly where the key is:

Sun, Mar 10, 7:03 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata
elukey reopened T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern as "Open".
Sun, Mar 10, 6:57 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata
elukey added a comment to T97368: Fix inefficient CacheAwarePropertyInfoStore memcached access pattern.

Quick comment: I noticed while checking some memcached metrics that wikibase_shared/1_33_0-wmf_20-wikidatawiki-hhvm:CacheAwarePropertyInfoStore generates 30~60MB/s of GET traffic since the last deployment of mediawiki:

Sun, Mar 10, 6:57 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), User-Elukey, Wikidata-Campsite, MW-1.32-notes (WMF-deploy-2018-10-02 (1.32.0-wmf.24)), User-Addshore, Performance-Team (Radar), Patch-For-Review, Wikimedia-Incident, Operations, wikidata-tech-focus, MediaWiki-extensions-WikibaseClient, MediaWiki-extensions-WikibaseRepository, Wikidata

Fri, Mar 8

elukey added a comment to T210706: Move AQS to nodejs 10.

Confirmed deployment-aqs servers are behaving normally with Node 10.4

Next step: deploy to prod. I pushed a change here to bump up the node version in package.json. We could build with that and test again if we want to be extra cautious. Let me know what you think @elukey

Fri, Mar 8, 5:04 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey added a comment to T204742: Unable to store parser output in StashEdit (Memcached error: ITEM TOO BIG).

I'd like the limit to be bumped to around $wgMaxArticleSize after upgrade (2MB).

Following the procedure at https://github.com/facebook/mcrouter/issues/26 , I'd be OK with experimenting with --big-value-split-threshold . It would only be enabled for things that would fail otherwise anyway. It still puts everything in one server. One thing I don't see is a way to enforce, for sanity, the size limit once you do that.

Fri, Mar 8, 3:09 PM · MW-1.33-notes (1.33.0-wmf.22; 2019-03-19), Performance-Team, MediaWiki-Page-editing, Patch-For-Review, User-Elukey, Regression, Wikimedia-production-error

Thu, Mar 7

elukey added a comment to T205507: Decommission analytics100[1,2].

Proposal for fix:

Thu, Mar 7, 8:52 AM · Operations, ops-eqiad, decommission, User-Elukey, Analytics

Wed, Mar 6

elukey added a comment to T215231: rack/setup/install labsdb1012.eqiad.wmnet.

@jcrespo @Marostegui thoughts? What would be best in your opinion? I'd prefer another dbproxy-based domain, but I'm not sure how complicated it would be for you to create/maintain.

Wed, Mar 6, 7:16 PM · DBA, Patch-For-Review, ops-eqiad, Analytics, User-Elukey, Operations
elukey added a project to T215231: rack/setup/install labsdb1012.eqiad.wmnet: DBA.
Wed, Mar 6, 7:15 PM · DBA, Patch-For-Review, ops-eqiad, Analytics, User-Elukey, Operations
elukey added a comment to T215231: rack/setup/install labsdb1012.eqiad.wmnet.

I added the following bit on cr1/cr2:

Wed, Mar 6, 7:13 PM · DBA, Patch-For-Review, ops-eqiad, Analytics, User-Elukey, Operations
elukey reopened T215231: rack/setup/install labsdb1012.eqiad.wmnet as "Open".
Wed, Mar 6, 7:11 PM · DBA, Patch-For-Review, ops-eqiad, Analytics, User-Elukey, Operations
elukey set the point value for T213488: Superset's rolling average feature results in error message to 3.
Wed, Mar 6, 4:19 PM · Analytics-Kanban, Product-Analytics, Analytics
elukey moved T213488: Superset's rolling average feature results in error message from Next Up to Done on the Analytics-Kanban board.
Wed, Mar 6, 4:19 PM · Analytics-Kanban, Product-Analytics, Analytics
elukey closed T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster as Resolved.

analytics-tool1004 is up and running; I will close this task now and open another one when we are ready to decom 1003.

Wed, Mar 6, 4:03 PM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey closed T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster, a subtask of T212243: Staging environment for upgrades of superset , as Resolved.
Wed, Mar 6, 4:03 PM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey renamed T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster from Replace analytics-tool1003 ganeti VM with another VM with Buster to Create a ganeti VM identical to analytics-tool1003 with Debian Buster.
Wed, Mar 6, 4:02 PM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey added a comment to T203498: Upgrade Hive to ≥ 2.0.
I see migrating to CDH6 as long term goal/plan.

CDH6 might be the goal, but if it ends up being a brand new install (no clear upgrade path), I think it would still be worth considering other distributions, e.g. Bigtop or Hortonworks or even that cool new Hadoop distribution with better security primitives that I can't remember the name of.

Anyway ya, this is a humongo project indeed :)

Wed, Mar 6, 2:24 PM · Analytics-Cluster, Analytics
elukey added a project to T215550: Test sqooping from the new dedicated labsdb host: Analytics-Kanban.
Wed, Mar 6, 12:33 PM · Patch-For-Review, Analytics-Kanban, Analytics
elukey closed T200312: Remove data from Hadoop's HDFS as part of the user offboard workflow as Resolved.
Wed, Mar 6, 11:46 AM · Patch-For-Review, Documentation, Operations, Analytics
elukey updated the task description for T211706: Superset Updates .
Wed, Mar 6, 11:41 AM · Analytics-Kanban, Product-Analytics
elukey added a comment to T213488: Superset's rolling average feature results in error message.

@jlinehan I am currently halfway through the plan for the staging environment; one thing that we could try is to test https://pypi.org/project/superset/0.29.0rc7/ as a first use case (rc8 is the newest tag but they haven't released it via pypi). There is still little traction on https://github.com/apache/incubator-superset/issues/6785 for their first Apache release, so we'll probably have to wait longer for the first 0.29 stable. Thoughts? I believe that this task can be closed, let me know if there are other steps to do/missing.

Wed, Mar 6, 11:38 AM · Analytics-Kanban, Product-Analytics, Analytics
elukey moved T217412: Enable encryption and authentication for TLS-based Hadoop services from Next Up to In Progress on the Analytics-Kanban board.
Wed, Mar 6, 11:23 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a comment to T210706: Move AQS to nodejs 10.

Restarted aqs in deployment-prep, we are using nodejs 10 in there. @Milimetric let's chat about the next steps whenever you have time, I think we are really close to deploying!

Wed, Mar 6, 10:29 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T212243: Staging environment for upgrades of superset from Next Up to In Progress on the Analytics-Kanban board.
Wed, Mar 6, 10:25 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a project to T212243: Staging environment for upgrades of superset : Analytics-Kanban.
Wed, Mar 6, 10:25 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey updated the task description for T212243: Staging environment for upgrades of superset .
Wed, Mar 6, 10:25 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey updated the task description for T212243: Staging environment for upgrades of superset .
Wed, Mar 6, 10:12 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey updated subscribers of T217738: Create an-tool1005 (Staging environment for Superset).

To keep archives happy: @MoritzMuehlenhoff is currently testing the Buster debian installer for Ganeti VMs, so please do not install anything on this VM yet.

Wed, Mar 6, 10:05 AM · Analytics-Kanban, Patch-For-Review, vm-requests, Operations, User-Elukey, Analytics
elukey added a comment to T217738: Create an-tool1005 (Staging environment for Superset).
If you need a private IP, do you need it to be inside the Analytics VLAN? (y/n)
y
Wed, Mar 6, 10:01 AM · Analytics-Kanban, Patch-For-Review, vm-requests, Operations, User-Elukey, Analytics
elukey triaged T217738: Create an-tool1005 (Staging environment for Superset) as Normal priority.
Wed, Mar 6, 9:37 AM · Analytics-Kanban, Patch-For-Review, vm-requests, Operations, User-Elukey, Analytics
elukey updated the task description for T215589: Migrate users to dbstore100[3-5].
Wed, Mar 6, 8:42 AM · User-Marostegui, Analytics-Kanban, Analytics
elukey closed T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard as Resolved.
Wed, Mar 6, 8:42 AM · Product-Analytics, Patch-For-Review, Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey closed T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard, a subtask of T215589: Migrate users to dbstore100[3-5], as Resolved.
Wed, Mar 6, 8:41 AM · User-Marostegui, Analytics-Kanban, Analytics
elukey added a comment to T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard.

All good if we close this task? Or is there anything else pending?

Wed, Mar 6, 8:41 AM · Product-Analytics, Patch-For-Review, Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey set the point value for T212386: Provide tools for querying MediaWiki replica databases without having to specify the shard to 13.
Wed, Mar 6, 8:41 AM · Product-Analytics, Patch-For-Review, Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
elukey added a comment to T203498: Upgrade Hive to ≥ 2.0.

It looks like CDH 6.1, which includes Hive 2.1.1, was released in December.

@elukey, what's the current thinking about deploying this? I'm sure there are many complexities: going from CDH 5 to 6 sounds like a tricky upgrade, I know there's been discussion of switching from CDH to Hortonworks or BigTop, and I've heard the larger plan is to move away from Hive and towards Presto anyway.

Wed, Mar 6, 8:39 AM · Analytics-Cluster, Analytics
elukey placed T217731: Consider raising Memcached MWObject cache memory size limit up for grabs.
Wed, Mar 6, 8:19 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey updated the task description for T217731: Consider raising Memcached MWObject cache memory size limit.
Wed, Mar 6, 8:18 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey triaged T217731: Consider raising Memcached MWObject cache memory size limit as Normal priority.
Wed, Mar 6, 8:17 AM · Patch-For-Review, Performance-Team (Radar), User-Elukey, MediaWiki-Cache, Operations
elukey awarded T213802: Investigate ways to reduce the size of translate-groups cache key a Love token.
Wed, Mar 6, 7:57 AM · Patch-For-Review, User-abi_, User-Nikerabbit, MediaWiki-extensions-Translate
elukey added a comment to T216491: Decommission dbstore1002.

No complaints or outages after the shutdown of dbstore1002, I think that we are good to keep going with the decom.

Wed, Mar 6, 7:17 AM · Patch-For-Review, decommission, ops-eqiad, Operations, Analytics
elukey updated the task description for T216491: Decommission dbstore1002.
Wed, Mar 6, 7:16 AM · Patch-For-Review, decommission, ops-eqiad, Operations, Analytics
elukey closed T215231: rack/setup/install labsdb1012.eqiad.wmnet as Resolved.
Wed, Mar 6, 7:13 AM · DBA, Patch-For-Review, ops-eqiad, Analytics, User-Elukey, Operations

Tue, Mar 5

elukey added a comment to T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.

To keep archives happy: @MoritzMuehlenhoff is currently testing the Buster debian installer for Ganeti VMs, so please do not install anything on this VM yet.

Tue, Mar 5, 5:30 PM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey added a comment to T216226: GPU upgrade for stat1005.

Those cards afaics are only 8G, whereas we'd need 16G (if possible). The only model that would suit us that I found is:

Tue, Mar 5, 4:06 PM · Analytics, hardware-requests, Operations
elukey added a comment to T216226: GPU upgrade for stat1005.

@RobH What do you think? Would it be feasible for you to check with our vendors if we can get a RX Vega 64?

Tue, Mar 5, 3:57 PM · Analytics, hardware-requests, Operations
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

@aaron if you have time can you explain what https://gerrit.wikimedia.org/r/493148 does? I didn't get the "spam" reference in the description of the change: does it mean removing actual ADD/GET commands sent to Memcached?

Tue, Mar 5, 2:41 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations
elukey added a comment to T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.

I just realized that in https://wikitech.wikimedia.org/wiki/Infrastructure_naming_conventions we have 'an-tool' as host prefix, not 'analytics-tool' (new vs current scheme), so I should have named this instance an-tool1004. Since a lot of work has been done I think it is better to keep the name and establish that the next VMs (should be created soon) will all have the new naming scheme. Apologies! :(

Tue, Mar 5, 2:34 PM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey changed the status of T212243: Staging environment for upgrades of superset from Stalled to Open.
Tue, Mar 5, 11:54 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey changed the status of T212243: Staging environment for upgrades of superset , a subtask of T211706: Superset Updates , from Stalled to Open.
Tue, Mar 5, 11:54 AM · Analytics-Kanban, Product-Analytics
elukey added a comment to T187960: Rack/cable/configure asw2-a-eqiad switch stack.
ge-6/0/25 - mc1019
ge-6/0/26 - mc1020
ge-6/0/27 - mc1021
ge-6/0/28 - mc1022
ge-6/0/29 - mc1023
Tue, Mar 5, 11:47 AM · Wikidata, wikidata-tech-focus, Reading-Infrastructure-Team-Backlog, Cognate, Language-Team, Growth-Team, Patch-For-Review, Operations, ops-eqiad, netops
elukey added a comment to T187960: Rack/cable/configure asw2-a-eqiad switch stack.

About Analytics nodes:

Tue, Mar 5, 11:40 AM · Wikidata, wikidata-tech-focus, Reading-Infrastructure-Team-Backlog, Cognate, Language-Team, Growth-Team, Patch-For-Review, Operations, ops-eqiad, netops
elukey added a comment to T216226: GPU upgrade for stat1005.

@RobH, @EBernhardson - while we wait for a response from AMD, I'd also like to understand if T216528 gave us more info about the possibility of ordering a GPU like the RX Vega 64 via regular vendors (I tried to understand it by myself by reading tasks/infos/etc., but I didn't come up with a solid answer :)

Tue, Mar 5, 11:31 AM · Analytics, hardware-requests, Operations
elukey added a comment to T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.

Worked nicely!

Tue, Mar 5, 10:48 AM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey added a comment to T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.

@akosiaris nope I don't feel adventurous today :D I added a change to makevm to support this use case, not sure if it makes sense and/or too tedious for the regular use case :)

Tue, Mar 5, 10:36 AM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey updated subscribers of T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.

@akosiaris IIRC there is a bridge + interface for the Analytics VLAN on the Ganeti host, which takes care of proper VLAN tagging etc. So I guess that I cannot use makevm in this use case, since I'd need to add something like --net 0:link=analytics, right?

Tue, Mar 5, 10:19 AM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey added a comment to T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster.
elukey@ganeti1003:~$  sudo gnt-group list
Group Nodes Instances AllocPolicy NDParams
row_A     4        33 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
row_C     4        29 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
Tue, Mar 5, 9:45 AM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey triaged T217640: Create a ganeti VM identical to analytics-tool1003 with Debian Buster as Normal priority.
Tue, Mar 5, 9:45 AM · Patch-For-Review, Operations, vm-requests, User-Elukey, Analytics
elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

Some TKOs also happened at around 3:33 UTC this morning (March 5th), so I took a look at some of the Apache httpd logs of the appservers getting timeouts and I found something interesting (that I also brought up a while ago IIRC). When the rise of GETs happens, I can clearly see a ton of POSTs to meta.wikimedia.org interleaved with GETs following this format:

Tue, Mar 5, 9:19 AM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations

Mon, Mar 4

elukey added a comment to T203786: Mcrouter periodically reports soft TKOs for mc1022 (was mc1035) leading to MW Memcached exceptions.

The spike in GETs (generating several MBs of traffic) happened again, adding some links to follow up:

Mon, Mar 4, 6:19 PM · MW-1.33-notes (1.33.0-wmf.21; 2019-03-12), Patch-For-Review, Performance-Team (Radar), Wikimedia-production-error, User-Elukey, MediaWiki-Cache, Operations