Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (171 w, 3 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Unknown

Recent Activity

Fri, Jan 19

Ottomata added a comment to T181036: Pull netflow data in realtime from Kafka via Tranquillity/Spark.

HMMM. If this is JSON data, and the schema is consistent, we could use JSONRefine to build the table, rather than doing all those Hive table/oozie job steps.

Fri, Jan 19, 5:53 PM · Analytics-Kanban, User-Elukey, monitoring, netops, Operations
Ottomata added a comment to T185291: Verify duplicate entry warnings logged by the m4 mysql consumer.

Hm, this sounds right to me. The question now is why is the processor process restarting?

Fri, Jan 19, 2:00 PM · Analytics-Kanban, User-Elukey, Analytics-EventLogging
Ottomata added a comment to T166248: Upgrade Analytics Cluster to Java 8.

Wow nice etherpad plan, <3

Fri, Jan 19, 1:58 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster

Thu, Jan 18

Ottomata triaged T185262: Add IPv6 addresses for kafka-jumbo hosts as Normal priority.
Thu, Jan 18, 9:58 PM · Analytics-Kanban, Analytics-Cluster
Ottomata added a comment to T180105: Set up a statsv-like endpoint for Prometheus.

Hm, could we possibly use EventLogging (or similar?) system for this? Incoming valid EventLogging data goes to a Kafka topic anyway. The data would then be available in Hive/Hadoop/Spark for historical querying (although we'd have to whitelist it from purging). The Kafka topic could then be consumed by some process that would then somehow emit to (or be pulled from) Prometheus. Perhaps a streaming aggregator of some kind? The proposed Stream Data Platform program (final name still TBD) next year might make this kinda stuff way easier.
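A rough sketch of the "streaming aggregator" idea from this comment, not an existing service. Kafka consumption is left out and events arrive as plain dicts; the `schema` and `wiki` fields, the metric name `eventlogging_events_total`, and both function names are assumptions made for illustration. The output follows the Prometheus text exposition format, which a scrape endpoint could serve.

```python
from collections import Counter
from typing import Dict, Iterable


def aggregate_events(events: Iterable[Dict[str, str]]) -> Counter:
    """Count events per (schema, wiki) pair; both field names are hypothetical."""
    counts: Counter = Counter()
    for event in events:
        counts[(event["schema"], event["wiki"])] += 1
    return counts


def render_prometheus(counts: Counter,
                      metric: str = "eventlogging_events_total") -> str:
    """Render the counts as Prometheus text exposition lines.

    Label values are written unescaped, so this sketch assumes they contain
    no quotes, backslashes, or newlines.
    """
    lines = [f"# TYPE {metric} counter"]
    for (schema, wiki), value in sorted(counts.items()):
        lines.append(f'{metric}{{schema="{schema}",wiki="{wiki}"}} {value}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        {"schema": "NavigationTiming", "wiki": "enwiki"},
        {"schema": "NavigationTiming", "wiki": "enwiki"},
        {"schema": "Popups", "wiki": "dewiki"},
    ]
    print(render_prometheus(aggregate_events(sample)), end="")
```

In a real deployment the dicts would come from a Kafka consumer and the rendered text would be exposed over HTTP for Prometheus to pull.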

Thu, Jan 18, 9:06 PM · Performance-Team
Ottomata added a comment to T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster..

Another question, does kafka use a different port for TLS service?

Yes, :9093.

Thu, Jan 18, 8:58 PM · Analytics-Kanban, Analytics-Cluster
Ottomata claimed T185237: Lookout for duplicates in EL refine .
Thu, Jan 18, 6:19 PM · Analytics, Analytics-EventLogging
Ottomata added a comment to T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster..

first step on the frack side is to whitelist the new hosts at the firewalls, can you point me to the list and I'll add a phabricator task?

kafka-jumbo100[1-6].eqiad.wmnet
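For anyone who needs the individual host names, for example to build a firewall whitelist, the bracket shorthand above can be expanded with a small helper. This is only a sketch: `expand_hosts` is a hypothetical name, and it handles a single `[start-end]` digit range without zero-padding.

```python
import re
from typing import List


def expand_hosts(pattern: str) -> List[str]:
    """Expand one bracketed digit range into individual host names."""
    match = re.search(r"\[(\d+)-(\d+)\]", pattern)
    if match is None:
        # No range in the pattern: treat it as a single host name.
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    prefix, suffix = pattern[:match.start()], pattern[match.end():]
    return [f"{prefix}{n}{suffix}" for n in range(start, end + 1)]


if __name__ == "__main__":
    for host in expand_hosts("kafka-jumbo100[1-6].eqiad.wmnet"):
        print(host)
```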

Thu, Jan 18, 4:44 PM · Analytics-Kanban, Analytics-Cluster
Ottomata triaged T185225: Move EventStreams to new jumbo cluster. as Normal priority.
Thu, Jan 18, 4:15 PM · Analytics-Kanban, Patch-For-Review, Analytics-Cluster
Ottomata closed T174742: deployment-kafka01 - disk is full as Resolved.

deployment-kafka01 has been deleted.

Thu, Jan 18, 4:09 PM · Analytics-Kanban, Beta-Cluster-Infrastructure
Ottomata closed T184235: Puppet broken on deployment-kafka03 due to full disk as Resolved.

deployment-kafka03 has been deleted.

Thu, Jan 18, 4:08 PM · Analytics, Puppet, Beta-Cluster-Infrastructure
Ottomata closed T184235: Puppet broken on deployment-kafka03 due to full disk, a subtask of T132259: Deployment-prep hosts with puppet errors (tracking), as Resolved.
Thu, Jan 18, 4:08 PM · Puppet, Tracking, Beta-Cluster-Infrastructure
Ottomata closed T184235: Puppet broken on deployment-kafka03 due to full disk, a subtask of T152015: Provision new Kafka cluster(s) with security features, as Resolved.
Thu, Jan 18, 4:08 PM · Patch-For-Review, Analytics-Kanban, Analytics-Cluster
Ottomata added a comment to T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster..

I'm guessing we're talking about a new pool of kafka hosts

Yup! Mostly just changing settings and bouncing the kafkatee instances, but we'll have to coordinate it. If y'all use any offset storage features of kafkatee, we'll have to wipe those and start with new offsets.

Thu, Jan 18, 3:46 PM · Analytics-Kanban, Analytics-Cluster
Ottomata added a comment to T184482: analytics VPS project puppet errors.

j1 deleted!

Thu, Jan 18, 3:41 PM · Analytics-Kanban, User-Elukey, Puppet

Wed, Jan 17

Ottomata updated subscribers of T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster..

@Jgreen FYI, we'll need to coordinate this soon :)

Wed, Jan 17, 9:08 PM · Analytics-Kanban, Analytics-Cluster
Ottomata renamed T126494: Send Mediawiki Kafka logs to Kafka jumbo cluster with TLS encryption from Look into encrypting logs sent between mediawiki app servers and kafka to Send Mediawiki Kafka logs to Kafka jumbo cluster with TLS encryption.
Wed, Jan 17, 9:00 PM · Patch-For-Review, Discovery, Analytics
Ottomata claimed T182993: TLS security review of the Kafka stack.
Wed, Jan 17, 8:56 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata updated the task description for T182993: TLS security review of the Kafka stack.
Wed, Jan 17, 8:56 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata added a comment to T175461: Port Kafka clients to new jumbo cluster.

If all is still well tomorrow, I will delete the analytics instances in deployment-prep.

Wed, Jan 17, 8:55 PM · Patch-For-Review, Analytics-Kanban, Analytics-Cluster
Ottomata added a comment to T175461: Port Kafka clients to new jumbo cluster.

Yeehaw, FYI, all Kafka clients have been ported from analytics to jumbo in deployment-prep in Cloud VPS. EventLogging was a breeze there.

Wed, Jan 17, 8:55 PM · Patch-For-Review, Analytics-Kanban, Analytics-Cluster
Ottomata updated the task description for T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster..
Wed, Jan 17, 8:47 PM · Analytics-Kanban, Analytics-Cluster
Ottomata triaged T185136: Move webrequest varnishkafka and consumers to Kafka jumbo cluster. as Normal priority.
Wed, Jan 17, 8:42 PM · Analytics-Kanban, Analytics-Cluster
Ottomata updated subscribers of T170878: Audit users and account expiry dates for stat boxes.

FYI: @Samwalton9 and @Samtar, your access expired on 2018-01-01 and your accounts have been removed. Thanks! :)

Wed, Jan 17, 6:04 PM · User-Elukey, Patch-For-Review, Analytics-Kanban, Analytics-Cluster
Ottomata added a comment to T176126: Update node-rdkafka version to v2.x.

I'd prefer to pin the version in puppet, then restrict it everywhere for SCB. If we fix up that patch to something more acceptable, would you be ok with that @mobrovac?

Wed, Jan 17, 2:50 PM · Patch-For-Review, Services (blocked), Analytics, EventBus, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp

Tue, Jan 16

Ottomata merged task T157977: Upgrade druid into T164008: Update druid to latest release (0.10).
Tue, Jan 16, 9:04 PM · Analytics-Kanban
Ottomata merged T157977: Upgrade druid into T164008: Update druid to latest release (0.10).
Tue, Jan 16, 9:04 PM · Analytics, Patch-For-Review
Ottomata added a comment to T166341: SSDs for main Kafka clusters.

Do we still want to do this?

Tue, Jan 16, 9:03 PM · Services (watching), hardware-requests, User-mobrovac, Analytics, Operations, EventBus
Ottomata added a comment to T176126: Update node-rdkafka version to v2.x.

It is also installed on cp1008.wikimedia.org (cache canary) and used by varnishkafka there.

Tue, Jan 16, 9:01 PM · Patch-For-Review, Services (blocked), Analytics, EventBus, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp
Ottomata added a comment to T176126: Update node-rdkafka version to v2.x.

We've also got librdkafka 0.11 backported for Jessie in our apt repo now too. I don't see any blockers to using it. I've tested EventStreams with it locally.

Tue, Jan 16, 9:00 PM · Patch-For-Review, Services (blocked), Analytics, EventBus, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp
Ottomata updated the task description for T176126: Update node-rdkafka version to v2.x.
Tue, Jan 16, 8:59 PM · Patch-For-Review, Services (blocked), Analytics, EventBus, Trending-Service, Reading-Infrastructure-Team-Backlog, ChangeProp
Ottomata added a comment to T171482: Programmatic generation of grafana dashboards.

BTW, +1 for this. It'd be especially cool if we applied the same puppet profile in labs and got the same grafana dashboards there.

Tue, Jan 16, 8:15 PM · Graphite, User-fgiunchedi, monitoring, Operations
Ottomata triaged T171482: Programmatic generation of grafana dashboards as Normal priority.
Tue, Jan 16, 8:14 PM · Graphite, User-fgiunchedi, monitoring, Operations
Ottomata triaged T174596: dmz_cidr only includes some wikimedia public IP ranges, leading to some very strange behaviour as Low priority.
Tue, Jan 16, 8:14 PM · netops, Cloud-VPS, Operations
Ottomata triaged T178445: flapping monitoring for recommendation_api on scb as Normal priority.
Tue, Jan 16, 8:13 PM · Discovery, Recommendation-API, Wikidata, Services (watching), Operations, monitoring
Ottomata triaged T178628: Improve puppet alerting as Normal priority.
Tue, Jan 16, 8:11 PM · Puppet, Operations
Ottomata triaged T179787: upload.wikimedia.org reports wrong mimetype for svg as Normal priority.
Tue, Jan 16, 8:10 PM · Operations, media-storage
Ottomata triaged T180023: [DRAFT][RfC] Deployment of python applications in production as Normal priority.
Tue, Jan 16, 8:10 PM · Release-Engineering-Team (Watching / External), User-Joe, Operations
Ottomata triaged T180330: Add CI to all operations/* repositories and archive obsolete ones as Normal priority.
Tue, Jan 16, 8:10 PM · Patch-For-Review, Operations, Continuous-Integration-Config
Ottomata triaged T180628: Install git-lfs client (at least on scap targets & masters) as Normal priority.
Tue, Jan 16, 8:09 PM · Operations, Scap
Ottomata triaged T180921: Referrer policy for browsers which only support the old spec as Normal priority.
Tue, Jan 16, 8:09 PM · MW-1.31-release-notes (WMF-deploy-2018-01-02 (1.31.0-wmf.15)), Patch-For-Review, Browser-Support-Microsoft-Edge, Browser-Support-Apple-Safari, Research, Privacy, Security-General, Analytics, Operations, Traffic
Ottomata triaged T181205: let quarry use the mariadb module as Normal priority.
Tue, Jan 16, 8:09 PM · cloud-services-team (Kanban), Operations, Quarry
Ottomata triaged T181546: Let the ORES application set log severity, not uWSGI as Normal priority.
Tue, Jan 16, 8:08 PM · Operations, Scoring-platform-team
Ottomata triaged T181559: Investigate redis-cluster or other techniques for making Redis not a single point of failure. as Normal priority.
Tue, Jan 16, 8:08 PM · Wikimedia-Incident, Operations, Scoring-platform-team
Ottomata triaged T181621: What is causing ORES celery workers to suddenly require more CPU? as Normal priority.
Tue, Jan 16, 8:08 PM · Wikimedia-Incident, Operations, Scoring-platform-team
Ottomata triaged T181630: Send celery and wsgi service logs to logstash as Normal priority.
Tue, Jan 16, 8:07 PM · Wikimedia-Logstash, monitoring, Wikimedia-Incident, Operations, Scoring-platform-team
Ottomata triaged T181632: Celery manager implodes horribly if Redis goes down as Normal priority.
Tue, Jan 16, 8:07 PM · Wikimedia-Incident, Operations, Scoring-platform-team
Ottomata triaged T183902: Swift invalid range requests causing 501s as Normal priority.
Tue, Jan 16, 8:07 PM · User-fgiunchedi, Traffic, media-storage, Operations
Ottomata triaged T184186: Fix unknown variables warning that occur with puppet 4.x as Normal priority.
Tue, Jan 16, 8:06 PM · Operations, Puppet-infrastructure-modernization, Puppet
Ottomata added a comment to T184473: Requesting access to Production Shell for cy534.

Ah, the doc was incorrect, analytics-users gives access to both stat1004 and stat1005. Just updated the doc.

Tue, Jan 16, 8:06 PM · Operations, Ops-Access-Requests
Ottomata added a comment to T181855: scap support for git-lfs.

K cool, sounds good :)

Tue, Jan 16, 8:02 PM · Release-Engineering-Team (Next), Operations, Scap, ORES, Scoring-platform-team
Ottomata triaged T184245: Create some mechanism for instances in projects to modify the project Designate records as Normal priority.
Tue, Jan 16, 7:43 PM · Operations, DNS, Beta-Cluster-reproducible, Cloud-VPS
Ottomata triaged T184444: Puppet hosts with their cert revoked can still run puppet as High priority.
Tue, Jan 16, 7:43 PM · Patch-For-Review, Puppet, Operations
Ottomata triaged T184522: To purchase for next esams visit as Normal priority.
Tue, Jan 16, 7:42 PM · ops-esams, Operations
Ottomata triaged T184655: logstash group1 dashboard incorrectly shows testwikidatawiki as Normal priority.
Tue, Jan 16, 7:42 PM · Operations, Wikimedia-Logstash
Ottomata triaged T184714: Puppet fail to properly refresh Icinga as Normal priority.

I suppose a restart without a configcheck would be dangerous, right? So just changing the subscribe behavior of the puppet service isn't quite right. Should we add an exec that does something like configtest && restart with refreshonly, and then notify the exec on config change?
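The control flow being proposed (act only on a config change, validate first, restart only if validation passes) can be sketched outside Puppet as follows. This is a Python illustration of the logic, not Puppet code and not the actual fix; `refresh_service`, the injected `runner`, and the example commands are all hypothetical.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def refresh_service(config_changed: bool,
                    check_cmd: Sequence[str],
                    restart_cmd: Sequence[str],
                    runner: Runner = run) -> str:
    """Mimic 'configtest && restart', triggered only by a config change."""
    if not config_changed:
        # Like refreshonly: with no notification, nothing runs.
        return "unchanged"
    if runner(check_cmd) != 0:
        # A failed config check must never lead to a restart.
        return "check-failed"
    if runner(restart_cmd) != 0:
        return "restart-failed"
    return "restarted"


if __name__ == "__main__":
    # Placeholder commands; substitute the real check and restart commands.
    print(refresh_service(True, ["true"], ["true"]))
```

The point of the ordering is that a broken config leaves the running service untouched instead of taking it down on restart.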

Tue, Jan 16, 7:42 PM · monitoring, Operations
Ottomata triaged T184797: Move mariadb_maintenance away from terbium/wasat (mediawiki_maintenance) as Normal priority.
Tue, Jan 16, 7:40 PM · Patch-For-Review, Puppet, Operations, DBA
Ottomata triaged T184805: Move some wikis to s5 as Normal priority.
Tue, Jan 16, 7:39 PM · wikitech.wikimedia.org, cloud-services-team, Release-Engineering-Team, DBA, Operations
Ottomata triaged T184833: Inconsistent behavior when fetching redirected pages with Cache-Control header as Normal priority.
Tue, Jan 16, 7:39 PM · Services (done), Traffic, Operations, Reading-Infrastructure-Team-Backlog, RESTBase, Page Content Service, Wikipedia-Android-App-Backlog (Android-app-release-v2.7.23x-G)
Ottomata triaged T185024: HHVM 3.18.5+dfsg-1+wmf3 changes parse_url causing unit tests to fail as Normal priority.
Tue, Jan 16, 7:39 PM · Operations, Continuous-Integration-Infrastructure, HHVM
Ottomata triaged T181801: "Error: 404, Requested domainname does not exist" when accessing Commons categories/images; works on mobile page as Normal priority.
Tue, Jan 16, 7:39 PM · Operations, Traffic, media-storage
Ottomata triaged T181855: scap support for git-lfs as Normal priority.

Just curious, why not use git fat? We have a git-fat store available already, and it can be used by scap:

Tue, Jan 16, 7:39 PM · Release-Engineering-Team (Next), Operations, Scap, ORES, Scoring-platform-team
Ottomata triaged T181967: Update puppet code to conform to puppet 4.x and later standards as Normal priority.
Tue, Jan 16, 7:36 PM · User-Joe, Puppet, Operations
Ottomata triaged T181988: Investigate and improve memory allocation rates of WDQS as Normal priority.
Tue, Jan 16, 7:36 PM · Discovery, Wikidata, Operations, Discovery-Wikidata-Query-Service-Sprint, Wikidata-Query-Service
Ottomata triaged T182028: DNS repo: add CI checks for obvious configuration errors as Normal priority.
Tue, Jan 16, 7:35 PM · Operations-Software-Development, Operations
Ottomata triaged T182085: Requesting access to swift for Phabricator's git-lfs storage as Normal priority.
Tue, Jan 16, 7:35 PM · Operations, media-storage
Ottomata triaged T182171: rack/setup/install lvs500[123] as Normal priority.
Tue, Jan 16, 7:35 PM · ops-eqsin, Operations
Ottomata triaged T182203: Tuning profile::ores::celery parameters should cause a Celery service restart as Normal priority.
Tue, Jan 16, 7:35 PM · ORES, Operations, Scoring-platform-team
Ottomata triaged T182331: [Epic] Deploy ORES in kubernetes cluster as Low priority.
Tue, Jan 16, 7:34 PM · Operations, ORES, Scoring-platform-team
Ottomata triaged T182699: Use firmware-enriched Debian installation images as Normal priority.
Tue, Jan 16, 7:34 PM · Operations
Ottomata triaged T182812: Forward security@tools.wmflabs.org to security@wikimedia.org as Normal priority.
Tue, Jan 16, 7:34 PM · Toolforge, Security, Mail, Operations
Ottomata triaged T182822: Generate a list of files that are supposed to exist but 404s as Normal priority.
Tue, Jan 16, 7:33 PM · Operations, Multimedia, media-storage, Commons
Ottomata triaged T182955: Decommission kafka1018 as Normal priority.
Tue, Jan 16, 7:32 PM · Analytics, Operations, ops-eqiad
Ottomata triaged T183390: unrack/decom pfw1-eqiad and pfw2-eqiad as Normal priority.
Tue, Jan 16, 7:32 PM · hardware-requests, netops, Operations, ops-eqiad
Ottomata triaged T183454: Deprovision Diamond collectors no longer in use as Normal priority.
Tue, Jan 16, 7:32 PM · User-fgiunchedi, monitoring, Operations
Ottomata triaged T183814: Degraded RAID on bast3002 as Normal priority.
Tue, Jan 16, 7:31 PM · ops-esams, Operations
Ottomata triaged T176437: puppet ca_server confusion as Normal priority.
Tue, Jan 16, 7:31 PM · cloud-services-team (Kanban), Operations
Ottomata triaged T177099: Large number of "A page you created was linked on Wikidata" emails to one recipient in short period of time as Low priority.
Tue, Jan 16, 7:29 PM · Wikidata, Operations, Mail
Ottomata triaged T174172: unused grafana-dashboard indices on elasticsearch / logstash as Low priority.
Tue, Jan 16, 7:28 PM · Graphite, Operations
Ottomata triaged T174269: Two cases of local-multiwrite storage backend failure as Normal priority.
Tue, Jan 16, 7:27 PM · Operations, media-storage
Ottomata triaged T126295: Spike: What do we have to package to run the Programs and Events dashboard on production? as Normal priority.
Tue, Jan 16, 7:26 PM · Spike, Programs-and-Events-Dashboard-Sprint 2, Operations, Education-Program-Dashboard
Ottomata removed a project from T178778: Parsoid, VisualEditor not working with SSL / HTTPS: Operations.
Tue, Jan 16, 7:24 PM · HTTPS, Parsoid, VisualEditor
Ottomata triaged T128821: reclaim and return all cisco servers as Normal priority.
Tue, Jan 16, 7:23 PM · DBA, Goal, hardware-requests, Operations
Ottomata added a comment to T184473: Requesting access to Production Shell for cy534.

Just to confirm, since this specifically mentions 'pageviews' not 'webrequests', it is likely that analytics-users will be sufficient. Aggregated pageviews are generally public data.

Tue, Jan 16, 5:03 PM · Operations, Ops-Access-Requests
Ottomata added a comment to T184582: Request access to analytics cluster for bawolff.

I believe just analytics-privatedata-users would be appropriate for this access.

Tue, Jan 16, 5:01 PM · Patch-For-Review, Operations, Ops-Access-Requests
Ottomata added a comment to T184838: Requesting access to stat1004, stat1005, stat1006 for mneisler.

analytics-privatedata-users and researchers are probably appropriate here.

Tue, Jan 16, 5:00 PM · Ops-Access-Requests, Operations

Mon, Jan 15

Ottomata added a comment to T184501: What to do with deployment-sca03?.

+1

Mon, Jan 15, 6:31 PM · Release-Engineering-Team, Recommendation-API, Beta-Cluster-Infrastructure, Scoring-platform-team (Current)

Thu, Jan 11

Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Current status:

Thu, Jan 11, 10:40 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Oook, I've set this [restricted certpath algorithms] on all jumbo Kafka brokers.

Thu, Jan 11, 7:47 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata reassigned T184551: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002. from Ottomata to faidon.

Great, that'll do just fine! Assigned to @faidon for approval.

Thu, Jan 11, 6:23 PM · Analytics, hardware-requests, Operations
Ottomata added a comment to T166248: Upgrade Analytics Cluster to Java 8.

/usr/lib/bigtop-utils/bigtop-detect-javahome, that seems to favor java7 over java8.

Strange that it favors Java 7 even if update-java-alternatives chooses Java 8. Hm.

Thu, Jan 11, 5:05 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics-Cluster
Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Here's a Q:

Thu, Jan 11, 5:00 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata added a comment to T184713: EventStreams doesnt find any messages anymore.

event.data in python 2 is an instance of unicode

Hm, you are right.

Thu, Jan 11, 4:28 PM · Analytics-Kanban, EventBus, Pywikibot-core
Ottomata added a comment to T184713: EventStreams doesnt find any messages anymore.

Ah, I did deploy EventStreams yesterday for T171011. I don't know exactly what caused this change, but I think the event.data is now utf-8 encoded. I don't know why it wouldn't have been before but is now. All tests in JavaScript pass; I think I'll need to add a Python-based one to the EventStreams test suite.
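A client can guard against this kind of change by accepting the payload either as text or as UTF-8 encoded bytes before parsing it. The sketch below is written for Python 3, where the Python 2 unicode/str split corresponds to str/bytes; `parse_event_data` is a hypothetical helper, not part of EventStreams or Pywikibot.

```python
import json
from typing import Any, Union


def parse_event_data(data: Union[bytes, str]) -> Any:
    """Parse an event payload that may arrive as text or as UTF-8 bytes."""
    if isinstance(data, bytes):
        # Decode explicitly rather than relying on the default encoding.
        data = data.decode("utf-8")
    return json.loads(data)


if __name__ == "__main__":
    raw = '{"title": "Zürich"}'
    print(parse_event_data(raw))
    print(parse_event_data(raw.encode("utf-8")))
```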

Thu, Jan 11, 2:54 PM · Analytics-Kanban, EventBus, Pywikibot-core

Wed, Jan 10

Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Oook, I've set this on all jumbo Kafka brokers. @BBlack anything else?

Wed, Jan 10, 4:33 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Does that mean SHA1 is disabled, except in the cases that it's the root cert of a chain stored in the jdkCA's default store (e.g. list of public CAs)?

Wed, Jan 10, 2:53 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations

Tue, Jan 9

Ottomata added a comment to T182993: TLS security review of the Kafka stack.

OO I have done some research!

Tue, Jan 9, 9:49 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations
Ottomata moved T182688: Make superset more scalable from In Progress to Done on the Analytics-Kanban board.
Tue, Jan 9, 7:32 PM · Analytics-Kanban, Patch-For-Review
Ottomata added a comment to T182688: Make superset more scalable.

If we do celery workers, it will be as a different task.

Tue, Jan 9, 7:32 PM · Analytics-Kanban, Patch-For-Review
Ottomata added a project to T184551: EQIAD: (1) hardware request for eventlog1001 replacement - eventlog1002.: Analytics.
Tue, Jan 9, 6:57 PM · Analytics, hardware-requests, Operations