
elukey (Luca Toscano)
User

User Details

User Since
Jan 5 2016, 9:54 PM (251 w, 3 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
LToscano (WMF) [ Global Accounts ]

Recent Activity

Yesterday

elukey added a comment to T236327: replace onboard NIC in kafka-jumbo100[1-6].

We had to roll back the NIC on 1006; we need to install firmware-bnx2x on all nodes before doing any work (checked with Faidon since it is a non-free package). The drivers are usually added at d-i/install time, but since we are not reimaging, we need to do it manually.

Fri, Oct 30, 4:41 PM · Patch-For-Review, Analytics-Clusters, ops-eqiad, Operations, User-Elukey
elukey added a comment to T236327: replace onboard NIC in kafka-jumbo100[1-6].

After booting kafka-jumbo1006 with the 10g nic:

Fri, Oct 30, 3:09 PM · Patch-For-Review, Analytics-Clusters, ops-eqiad, Operations, User-Elukey
elukey moved T255139: Create the new Hadoop test cluster from In Progress to Done on the Analytics-Kanban board.
Fri, Oct 30, 9:15 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey added a comment to T255139: Create the new Hadoop test cluster.

Created https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Hadoop/Test

Fri, Oct 30, 8:46 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Thu, Oct 29

elukey added a comment to T253438: an-presto1004 down.

I am a little bit disappointed; Dell seems to be lagging a lot. Again, this host has been down since May.

Thu, Oct 29, 5:03 PM · Analytics-Radar, Operations, ops-eqiad
elukey closed T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet as Resolved.
Thu, Oct 29, 4:45 PM · vm-requests, Operations, Analytics-Clusters
elukey added a comment to T266709: an-coord1001 ram upgrade.

@Cmjohnson deal then, thanks!

Thu, Oct 29, 4:34 PM · Reading Epics (Analytics), Operations, ops-eqiad
elukey added a comment to T266709: an-coord1001 ram upgrade.

@Cmjohnson yep, I'd definitely need to schedule this; Tuesday is ok! What time would you be able to start? (I'd need an hour of drain time before that)

Thu, Oct 29, 4:02 PM · Reading Epics (Analytics), Operations, ops-eqiad
elukey moved T264176: Switch Zookeeper to profile::java from In Progress to Done on the Analytics-Kanban board.
Thu, Oct 29, 2:09 PM · Analytics-Kanban, Analytics-Clusters, Operations
elukey added a comment to T264176: Switch Zookeeper to profile::java.

Change rolled out!

Thu, Oct 29, 2:09 PM · Analytics-Kanban, Analytics-Clusters, Operations
elukey added a comment to T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet.
elukey@cumin1001:~$ sudo cookbook sre.ganeti.makevm eqiad_B an-test-ui1001.eqiad.wmnet --vcpus 2 --memory 4 --disk 20 --network analytics
START - Cookbook sre.ganeti.makevm
Ready to create Ganeti VM an-test-ui1001.eqiad.wmnet in the ganeti01.svc.eqiad.wmnet cluster on row B with 2 vCPUs, 4GB of RAM, 20GB of disk in the analytics network.
Is this correct?
Type "done" to proceed
> done
Allocated IPv4 10.64.21.6/24
Set DNS name of IP 10.64.21.6/24 to an-test-ui1001.eqiad.wmnet
Allocated IPv6 2620:0:861:105:10:64:21:6/64 with DNS name an-test-ui1001.eqiad.wmnet
Generating the DNS records from Netbox data. It will take a couple of minutes.
2020-10-29 10:03:37,265 [INFO] Gathering devices, interfaces, addresses and prefixes from Netbox
2020-10-29 10:05:30,292 [INFO] Gathered 2181 devices from Netbox
2020-10-29 10:05:30,292 [INFO] Generating DNS records
2020-10-29 10:05:37,566 [INFO] Generated 12028 direct and reverse records (6014 each) in 26 direct zones and 168 reverse zones
2020-10-29 10:05:37,567 [INFO] Cloning /srv/netbox-exports/dns.git/ to /tmp/dns-c25pcHBldHM-h8v1a9h2 ...
2020-10-29 10:05:38,135 [INFO] Generating zonefile snippets to directory /tmp/dns-c25pcHBldHM-h8v1a9h2
2020-10-29 10:05:38,924 [INFO] Nothing to commit!
METADATA: {"no_changes": true}
2020-10-29 10:05:39,137 [INFO] Temporary directory /tmp/dns-c25pcHBldHM-h8v1a9h2 removed.
No changes to deploy.
The Ganeti's command output will be printed at the end.
Creating VM an-test-ui1001.eqiad.wmnet in cluster ganeti01.svc.eqiad.wmnet with row=B vcpus=2 memory=4GB disk=20GB link=analytics. This may take a few minutes.
Thu Oct 29 10:05:40 2020  - INFO: No-installation mode selected, disabling startup
Thu Oct 29 10:05:45 2020  - INFO: Selected nodes for instance an-test-ui1001.eqiad.wmnet via iallocator hail: ganeti1015.eqiad.wmnet, ganeti1016.eqiad.wmnet
Thu Oct 29 10:05:46 2020 * creating instance disks...
Thu Oct 29 10:05:51 2020 adding instance an-test-ui1001.eqiad.wmnet to cluster config
Thu Oct 29 10:05:51 2020 adding disks to cluster config
Thu Oct 29 10:05:51 2020  - INFO: Waiting for instance an-test-ui1001.eqiad.wmnet to sync disks
Thu Oct 29 10:05:51 2020  - INFO: - device disk/0:  0.10% done, 41m 35s remaining (estimated)
Thu Oct 29 10:06:51 2020  - INFO: - device disk/0: 10.90% done, 8m 17s remaining (estimated)
Thu Oct 29 10:07:51 2020  - INFO: - device disk/0: 21.60% done, 7m 2s remaining (estimated)
Thu Oct 29 10:08:52 2020  - INFO: - device disk/0: 32.40% done, 6m 9s remaining (estimated)
Thu Oct 29 10:09:52 2020  - INFO: - device disk/0: 43.10% done, 5m 15s remaining (estimated)
Thu Oct 29 10:10:52 2020  - INFO: - device disk/0: 53.90% done, 4m 7s remaining (estimated)
Thu Oct 29 10:11:52 2020  - INFO: - device disk/0: 64.60% done, 3m 12s remaining (estimated)
Thu Oct 29 10:12:52 2020  - INFO: - device disk/0: 75.40% done, 2m 15s remaining (estimated)
Thu Oct 29 10:13:52 2020  - INFO: - device disk/0: 86.20% done, 1m 13s remaining (estimated)
Thu Oct 29 10:14:53 2020  - INFO: - device disk/0: 96.90% done, 17s remaining (estimated)
Thu Oct 29 10:15:10 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Thu Oct 29 10:15:10 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Thu Oct 29 10:15:10 2020  - INFO: - device disk/0: 100.00% done, 0s remaining (estimated)
Thu Oct 29 10:15:10 2020  - INFO: Instance an-test-ui1001.eqiad.wmnet's disks are in sync
Thu Oct 29 10:15:10 2020  - INFO: Waiting for instance an-test-ui1001.eqiad.wmnet to sync disks
Thu Oct 29 10:15:10 2020  - INFO: Instance an-test-ui1001.eqiad.wmnet's disks are in sync
MAC address for an-test-ui1001.eqiad.wmnet is: aa:00:00:3b:5a:aa
Syncing VMs in DC eqiad to Netbox
Failed to call 'cookbooks.sre.ganeti.makevm.get_vm' [1/20, retrying in 3.00s]:
Failed to call 'cookbooks.sre.ganeti.makevm.get_vm' [2/20, retrying in 6.00s]:
Created interface ##PRIMARY## on VM an-test-ui1001
Attached IPv4 10.64.21.6/24 and IPv6 2620:0:861:105:10:64:21:6/64 to VM an-test-ui1001 and marked as primary IPs
END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0)
Thu, Oct 29, 10:17 AM · vm-requests, Operations, Analytics-Clusters
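
The cookbook output above pairs IPv4 10.64.21.6/24 with IPv6 2620:0:861:105:10:64:21:6/64: the four IPv4 octets appear verbatim as the last four groups of the IPv6 address. A minimal Python sketch of that mapping follows. The function name `embed_ipv4` is hypothetical, and treating this as a general allocation rule is an assumption drawn from this single example, not something the cookbook output states.

```python
import ipaddress


def embed_ipv4(prefix: str, ipv4: str) -> ipaddress.IPv6Address:
    """Build an IPv6 address whose last four groups are the IPv4 octets.

    Hypothetical helper: assumes a /64 prefix and that each IPv4 octet is
    written as-is (its decimal digits are reused as a hex group).
    """
    network = ipaddress.IPv6Network(prefix)
    if network.prefixlen != 64:
        raise ValueError("this sketch only handles /64 prefixes")
    # Validate the IPv4 address, then split it into its four octets.
    octets = str(ipaddress.IPv4Address(ipv4)).split(".")
    # First four 16-bit groups of the network address, e.g. 2620:0000:0861:0105
    high_groups = network.network_address.exploded.split(":")[:4]
    return ipaddress.IPv6Address(":".join(high_groups + octets))


if __name__ == "__main__":
    # Values taken from the cookbook output above.
    print(embed_ipv4("2620:0:861:105::/64", "10.64.21.6"))
```

Run against the values from the output, the sketch reproduces the IPv6 address that was allocated for an-test-ui1001.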
elukey added a comment to T266746: TCP traffic increase for DNS over TLS breached a low limit for max open files on authdns1001/2001.

Current status:

Thu, Oct 29, 9:55 AM · Traffic, Operations
elukey triaged T266746: TCP traffic increase for DNS over TLS breached a low limit for max open files on authdns1001/2001 as High priority.
Thu, Oct 29, 8:24 AM · Traffic, Operations

Wed, Oct 28

elukey added a comment to T265971: Check data currently stored on thorium and drop what it is not needed anymore.

@Milimetric the data to review (if you have time) is the one under https://analytics.wikimedia.org/published/datasets/archive/public-datasets/, especially:

Wed, Oct 28, 8:15 PM · Analytics
elukey added a watcher for SRE-swift-storage: elukey.
Wed, Oct 28, 3:42 PM
elukey added a comment to T266495: Create Debian Package for Flink.

I see https://issues.apache.org/jira/browse/BIGTOP-3382 to add 1.11.1, but it seems that the work will be done after Bigtop 1.5. We could think about contributing upstream if we feel the need to have that included in Bigtop 1.5!

Wed, Oct 28, 3:42 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
elukey added a comment to T266495: Create Debian Package for Flink.
root@apt1001:/srv/wikimedia# reprepro lsbycomponent flink
flink | 1.6.4-1 | stretch-wikimedia | thirdparty/bigtop14 | amd64
flink | 1.6.4-1 |  buster-wikimedia | thirdparty/bigtop14 | amd64
Wed, Oct 28, 3:37 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
elukey added a comment to T266467: Check home/HDFS leftovers of rodolfovalentim.

@diego green light to drop then?

Wed, Oct 28, 3:26 PM · Analytics
elukey added a comment to T260411: Create a temporary hadoop backup cluster.

analytics1044 seems to keep PXE booting, so it endlessly reinstalls the OS. I checked the system's setup (reboot + f2), but the hard disk step is configured before the NIC (as expected), so I am not sure what's wrong.

Wed, Oct 28, 10:52 AM · Patch-For-Review, Analytics-Clusters
elukey added a comment to T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet.
elukey@ganeti1011:~$   sudo gnt-group list
Group Nodes Instances AllocPolicy NDParams
row_A     4        36 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
row_B     6         9 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
row_C     4        37 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
row_D     4        11 preferred   ovs=False, ssh_port=22, ovs_link=, spindle_count=1, exclusive_storage=False, cpu_speed=1, ovs_name=switch1, oob_program=
Wed, Oct 28, 10:19 AM · vm-requests, Operations, Analytics-Clusters
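
The `gnt-group list` output above can be compared by instances per node. A minimal Python sketch follows, with the table hard-coded from that output. The helper names `instances_per_node` and `least_loaded` are hypothetical, and using this ratio to pick a row is an assumption: the task does not state how the row for the new VM was chosen.

```python
# (group, nodes, instances) copied from the gnt-group list output above
GROUPS = [
    ("row_A", 4, 36),
    ("row_B", 6, 9),
    ("row_C", 4, 37),
    ("row_D", 4, 11),
]


def instances_per_node(groups):
    """Return a mapping of group name to instances per node."""
    return {name: instances / nodes for name, nodes, instances in groups}


def least_loaded(groups):
    """Return the group name with the fewest instances per node."""
    ratios = instances_per_node(groups)
    return min(ratios, key=ratios.get)


if __name__ == "__main__":
    for name, ratio in sorted(instances_per_node(GROUPS).items()):
        print(f"{name}: {ratio:.2f} instances per node")
    print("least loaded:", least_loaded(GROUPS))
```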
elukey updated subscribers of T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet.
Wed, Oct 28, 10:19 AM · vm-requests, Operations, Analytics-Clusters
elukey triaged T266648: Create a ganeti VM in eqiad: an-test-ui1001.eqiad.wmnet as Medium priority.
Wed, Oct 28, 10:14 AM · vm-requests, Operations, Analytics-Clusters
elukey updated subscribers of T266644: Power supply lost for analytics1072.
Wed, Oct 28, 9:24 AM · ops-eqiad, Operations
elukey added a comment to T264176: Switch Zookeeper to profile::java.

One gotcha: conf1* is still on jessie (and consequently Java 7), and I don't think anything accounts for Java 7 yet

Wed, Oct 28, 9:24 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey created T266644: Power supply lost for analytics1072.
Wed, Oct 28, 9:23 AM · ops-eqiad, Operations
elukey added a parent task for T256108: Co-locate Presto with Hadoop worker nodes: T266639: Analytics Presto improvements.
Wed, Oct 28, 9:10 AM · Analytics-Clusters
elukey added a subtask for T266639: Analytics Presto improvements: T256108: Co-locate Presto with Hadoop worker nodes.
Wed, Oct 28, 9:10 AM · Analytics-Kanban, Analytics
elukey changed the status of T256108: Co-locate Presto with Hadoop worker nodes from Open to Stalled.

Pending T266641

Wed, Oct 28, 9:10 AM · Analytics-Clusters
elukey created T266641: Test Alluxio as cache layer for Presto.
Wed, Oct 28, 9:10 AM · Analytics
elukey created T266640: Decide to move or not to PrestoSQL.
Wed, Oct 28, 9:06 AM · Analytics
elukey set Final Story Points to 0 on T266639: Analytics Presto improvements.
Wed, Oct 28, 9:02 AM · Analytics-Kanban, Analytics
elukey triaged T266639: Analytics Presto improvements as Medium priority.
Wed, Oct 28, 9:02 AM · Analytics-Kanban, Analytics
elukey moved T264176: Switch Zookeeper to profile::java from Next Up to In Progress on the Analytics-Kanban board.
Wed, Oct 28, 8:40 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey added a project to T264176: Switch Zookeeper to profile::java: Analytics-Kanban.
Wed, Oct 28, 8:40 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey claimed T264176: Switch Zookeeper to profile::java.

Going to take over ownership of the task since I need to refactor some code that I have written :)

Wed, Oct 28, 8:40 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey placed T264176: Switch Zookeeper to profile::java up for grabs.
Wed, Oct 28, 8:39 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey moved T236740: Remove postal code and longitude / latitude from geocoded data object on webrequest data from Next Up to Done on the Analytics-Kanban board.
Wed, Oct 28, 7:08 AM · Analytics-Kanban, Product-Analytics, Analytics
elukey added a comment to T236740: Remove postal code and longitude / latitude from geocoded data object on webrequest data.

Moving this to done since everything seems already deployed.

Wed, Oct 28, 7:08 AM · Analytics-Kanban, Product-Analytics, Analytics
elukey moved T262660: Review and improve Oozie authorization permissions from Next Up to Paused on the Analytics-Kanban board.
Wed, Oct 28, 7:07 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey moved T264152: Fix Maxmind geoip database archive from Next Up to In Progress on the Analytics-Kanban board.
Wed, Oct 28, 7:07 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey moved T240439: Move https termination from nginx to envoy (if possible) from Next Up to In Progress on the Analytics-Kanban board.
Wed, Oct 28, 7:07 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey added a project to T240439: Move https termination from nginx to envoy (if possible): Analytics-Kanban.
Wed, Oct 28, 7:07 AM · Analytics-Kanban, Patch-For-Review, User-Elukey, Analytics
elukey closed T215171: Archival of home directories on servers with very large homes as Declined.

Declining this since we have been following another path over the past year and it worked well, will re-open if necessary.

Wed, Oct 28, 7:06 AM · Analytics, Patch-For-Review, User-Elukey, Operations
elukey added a comment to T216294: Hive log4j logging is misconfigured.

Hello Neil, sorry for this lag in following up. I have tested two things:

Wed, Oct 28, 7:00 AM · Analytics

Tue, Oct 27

elukey moved T255138: Put 6 GPU-based Hadoop worker in service from Q2 2020/2021 to Done on the Analytics-Clusters board.
Tue, Oct 27, 4:40 PM · Analytics-Kanban, Patch-For-Review, Analytics-Clusters
elukey moved T255140: Refresh 16 nodes in the Hadoop Analytics cluster from Q2 2020/2021 to Done on the Analytics-Clusters board.
Tue, Oct 27, 4:40 PM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey moved T255028: Move the stat1004-6-7 hosts to Debian Buster from Q2 2020/2021 to Done on the Analytics-Clusters board.
Tue, Oct 27, 4:40 PM · Analytics-Kanban, Analytics-Clusters
elukey closed T241192: Purchase of GPUs to help support the open source software stack on top of AMD GPUs (donation to Debian), a subtask of T241190: New Hadoop hardware. Refreshes and hosts with space for GPUs, as Declined.
Tue, Oct 27, 4:07 PM · Analytics-Kanban, User-Elukey, Analytics
elukey closed T241192: Purchase of GPUs to help support the open source software stack on top of AMD GPUs (donation to Debian) as Declined.

As far as I understand, this is not needed anymore; please reopen if necessary!

Tue, Oct 27, 4:07 PM · Analytics
elukey closed T264994: Check home/HDFS leftovers of leila, a subtask of T264472: Requesting access to researchers and analytics-privatedata-users for Leila Zia, as Resolved.
Tue, Oct 27, 3:59 PM · Operations, SRE-Access-Requests
elukey closed T264994: Check home/HDFS leftovers of leila as Resolved.

Tables dropped by Francisco today, this task is completed :)

Tue, Oct 27, 3:59 PM · Analytics
elukey updated subscribers of T266467: Check home/HDFS leftovers of rodolfovalentim.
====== stat1004 ======
total 0
ls: cannot access '/var/userarchive/rodolfovalentim.tar.bz2': No such file or directory
Tue, Oct 27, 3:57 PM · Analytics
elukey placed T204734: Deprecate Python 2 software from the Analytics infrastructure up for grabs.
Tue, Oct 27, 3:46 PM · Analytics-Kanban
elukey closed T258612: Performance Issues when running Spark/Hive jobs via Jupyter Notebooks as Resolved.

Closing this task since there seem to be no action items left; please re-open if needed.

Tue, Oct 27, 1:52 PM · Analytics, Research-collaborations, Research
elukey reassigned T260411: Create a temporary hadoop backup cluster from elukey to razzi.
Tue, Oct 27, 1:22 PM · Patch-For-Review, Analytics-Clusters
elukey updated subscribers of T260411: Create a temporary hadoop backup cluster.

All the nodes (analytics1042 -> 1057) have new ext4 partitions for /var/lib/hadoop/data/$letter.

Tue, Oct 27, 1:22 PM · Patch-For-Review, Analytics-Clusters
elukey closed T266064: Site: 1 VM request for Analytics test cluster as Resolved.

This is done!

Tue, Oct 27, 8:04 AM · vm-requests, Operations
elukey closed T266064: Site: 1 VM request for Analytics test cluster, a subtask of T255139: Create the new Hadoop test cluster, as Resolved.
Tue, Oct 27, 8:04 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Mon, Oct 26

elukey added a comment to T265487: Review recurrent Hadoop worker disk saturation events.

As far as I can see from iotop's logs, a lot of the following are present when disks are saturated:

Mon, Oct 26, 4:28 PM · Analytics-Clusters
elukey moved T264896: Fix the remaining bugs open on for Hue next from Ready to Deploy to In Progress on the Analytics-Kanban board.
Mon, Oct 26, 10:24 AM · Analytics
elukey closed T262427: Add more metrics to prometheus-amd-rocm-stats Python script as Resolved.
Mon, Oct 26, 10:08 AM · Analytics-Kanban, Analytics-Clusters
elukey moved T264176: Switch Zookeeper to profile::java from Backlog to Q2 2020/2021 on the Analytics-Clusters board.
Mon, Oct 26, 10:07 AM · Analytics-Kanban, Analytics-Clusters, Operations
elukey moved T265126: Improve logging for HDFS Namenodes from Backlog to Q2 2020/2021 on the Analytics-Clusters board.
Mon, Oct 26, 10:07 AM · Analytics-Clusters
JAllemandou awarded T265487: Review recurrent Hadoop worker disk saturation events a Hungry Hippo token.
Mon, Oct 26, 8:45 AM · Analytics-Clusters
elukey updated the task description for T265487: Review recurrent Hadoop worker disk saturation events.
Mon, Oct 26, 8:36 AM · Analytics-Clusters
elukey added a comment to T265487: Review recurrent Hadoop worker disk saturation events.

There was an error on https://grafana.wikimedia.org/d/VSyI1AWMk/cluster-overview-thanos and the other one (user cluster overview), namely read/write metrics were supposed to be -/+ but in reality they were all +. I modified the dashboard, and I'll fix the description of the task as well.

Mon, Oct 26, 8:31 AM · Analytics-Clusters
JAllemandou awarded T257412: Review an-coord1001's usage and failover plans a Mountain of Wealth token.
Mon, Oct 26, 8:23 AM · Patch-For-Review, Analytics-Clusters
elukey updated the task description for T265487: Review recurrent Hadoop worker disk saturation events.
Mon, Oct 26, 7:57 AM · Analytics-Clusters
elukey added a comment to T265487: Review recurrent Hadoop worker disk saturation events.

I added some Datanode metrics to the Hadoop grafana dashboard, and started 3 iotop sessions (dumping to a file) on an-worker108[1-3] to get some idea about what processes are hammering the disks periodically.

Mon, Oct 26, 7:48 AM · Analytics-Clusters
elukey closed T265620: Rename an-scheduler1001 to an-coord1002 as Resolved.
Mon, Oct 26, 7:46 AM · Operations, Analytics-Clusters

Fri, Oct 23

elukey added a comment to T264994: Check home/HDFS leftovers of leila.

stat100x homes done (content moved under /home/leizi)

Fri, Oct 23, 2:23 PM · Analytics
elukey added a comment to T257412: Review an-coord1001's usage and failover plans.

Summary of actions done:

  • created a dns CNAME analytics-test-hive.eqiad.wmnet -> an-test-coord1001.eqiad.wmnet
  • created the kerberos principal hive/analytics-test-hive.eqiad.wmnet@WIKIMEDIA on krb1001
  • executed the following on krb1001:
kadmin.local ktadd -norandkey -k /srv/kerberos/keytabs/an-test-coord1001.eqiad.wmnet/hive/hive.keytab hive/analytics-test-hive.eqiad.wmnet@WIKIMEDIA
Fri, Oct 23, 1:45 PM · Patch-For-Review, Analytics-Clusters
elukey closed T265121: Check home/HDFS leftovers of rush, a subtask of T265147: Offboard Chase Pettet from Security Team, as Resolved.
Fri, Oct 23, 1:17 PM · Operations, Security-Team
elukey closed T265121: Check home/HDFS leftovers of rush as Resolved.

All stat100x homes cleaned up, HDFS home also cleaned up!

Fri, Oct 23, 1:17 PM · Analytics
elukey renamed T266322: Possible issue between Maxmind and Hive 2.x libs in Refinery source from Possible between Maxmind and Hive 2.x libs in Refinery source to Possible issue between Maxmind and Hive 2.x libs in Refinery source.
Fri, Oct 23, 10:23 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey created T266322: Possible issue between Maxmind and Hive 2.x libs in Refinery source.
Fri, Oct 23, 10:23 AM · Patch-For-Review, Analytics-Kanban, Analytics
elukey closed T265447: Check home/HDFS leftovers of joewalsh as Resolved.

All stat100x home dirs removed!

Fri, Oct 23, 7:05 AM · Analytics
elukey added a comment to T265121: Check home/HDFS leftovers of rush.

Sent an email to John to get a final confirmation.

Fri, Oct 23, 7:03 AM · Analytics
elukey added a comment to T264268: Check home/HDFS leftovers of nathante.

All stat100x home dirs purged, only hdfs/hive left!

Fri, Oct 23, 7:00 AM · Analytics
elukey updated the task description for T255145: Analytics Hardware for Fiscal Year 2020/2021.
Fri, Oct 23, 6:46 AM · Analytics-Kanban
elukey added a subtask for T255145: Analytics Hardware for Fiscal Year 2020/2021: Unknown Object (Task).
Fri, Oct 23, 6:46 AM · Analytics-Kanban
elukey added subtasks for T255145: Analytics Hardware for Fiscal Year 2020/2021: Unknown Object (Task), Unknown Object (Task), Unknown Object (Task), Unknown Object (Task).
Fri, Oct 23, 6:45 AM · Analytics-Kanban
elukey closed T264081: Increase in usage of /var/lib/mysql on an-coord1001 after Sept 21st as Resolved.

It seems way more stable now, closing for the moment :)

Fri, Oct 23, 6:43 AM · Analytics
elukey added a comment to T264269: Check home/HDFS leftovers of shiladsen.

Deleted all the home dirs on stat100x, only hdfs files are left :)

Fri, Oct 23, 6:41 AM · Analytics
elukey closed T263715: Check home/HDFS leftovers of jkumarah as Resolved.

homes deleted.

Fri, Oct 23, 6:38 AM · Analytics
elukey closed T243521: Hadoop Hardware Orders FY2019-2020, a subtask of T244211: Analytics Hardware for Fiscal Year 2019/2020, as Resolved.
Fri, Oct 23, 6:11 AM · Analytics
elukey closed T243521: Hadoop Hardware Orders FY2019-2020 as Resolved.
Fri, Oct 23, 6:11 AM · Analytics-Kanban, Analytics
elukey moved T258970: Set up environment for Product Analytics system user from In Progress to Paused on the Analytics-Kanban board.
Fri, Oct 23, 6:11 AM · Analytics-Kanban, Analytics, Product-Analytics
elukey moved T255140: Refresh 16 nodes in the Hadoop Analytics cluster from In Progress to Done on the Analytics-Kanban board.
Fri, Oct 23, 6:11 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey added a comment to T255140: Refresh 16 nodes in the Hadoop Analytics cluster.

All old nodes were removed from Hadoop!

Fri, Oct 23, 6:11 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters

Thu, Oct 22

elukey removed projects from T265971: Check data currently stored on thorium and drop what it is not needed anymore: Operations, procurement.
Thu, Oct 22, 4:30 PM · Analytics
elukey shifted T265971: Check data currently stored on thorium and drop what it is not needed anymore from the Restricted Space space to the S1 Public space.
Thu, Oct 22, 4:27 PM · Analytics
elukey closed T265969: Add sbisson to analytics-privatedata-users and create a kerberos identity as Resolved.
elukey@krb1001:~$ sudo manage_principals.py create sbisson --email_address=sbisson@wikimedia.org
Principal successfully created. Make sure to update data.yaml in Puppet.
Successfully sent email to sbisson@wikimedia.org
Thu, Oct 22, 2:19 PM · Operations, Analytics, SRE-Access-Requests
elukey added a comment to T257412: Review an-coord1001's usage and failover plans.

LVS inside the analytics vlan is problematic :(

Oh right we wanted to do that for druid long ago but couldn't. But why should it be! I don't remember why it didn't work but maybe we can work with traffic and fix it.

Thu, Oct 22, 1:00 PM · Patch-For-Review, Analytics-Clusters
elukey added a comment to T257412: Review an-coord1001's usage and failover plans.

@Ottomata LVS inside the analytics vlan is problematic :(

Thu, Oct 22, 12:41 PM · Patch-For-Review, Analytics-Clusters
elukey added a comment to T257412: Review an-coord1001's usage and failover plans.

After reading https://docs.cloudera.com/documentation/enterprise/latest/topics/admin_ha_hiveserver2.html I had an idea about a possible way forward, that may also be applied to other services like oozie.

Thu, Oct 22, 9:45 AM · Patch-For-Review, Analytics-Clusters
elukey added a comment to T255139: Create the new Hadoop test cluster.

Remaining steps:

Thu, Oct 22, 9:00 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey added a comment to T266064: Site: 1 VM request for Analytics test cluster.

@razzi yes this is expected, in the hiera config we have

Thu, Oct 22, 8:55 AM · vm-requests, Operations
elukey added a subtask for T255139: Create the new Hadoop test cluster: T266064: Site: 1 VM request for Analytics test cluster.
Thu, Oct 22, 8:52 AM · Patch-For-Review, Analytics-Kanban, Analytics-Clusters
elukey added a parent task for T266064: Site: 1 VM request for Analytics test cluster: T255139: Create the new Hadoop test cluster.
Thu, Oct 22, 8:52 AM · vm-requests, Operations