Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (248 w, 4 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Ottomata [ Global Accounts ]

Recent Activity

Yesterday

Ottomata added a subtask for T205319: Modern Event Platform: Stream Configuration Service: T227906: [WIP] RFC: Stream Configuration Service .
Mon, Jul 15, 5:04 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics
Ottomata added a parent task for T227906: [WIP] RFC: Stream Configuration Service : T205319: Modern Event Platform: Stream Configuration Service.
Mon, Jul 15, 5:04 PM · EventBus, Analytics
Ottomata moved T227132: issues with artifact cache in an-coord1001 from In Progress to Done on the Analytics-Kanban board.
Mon, Jul 15, 2:24 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata added a comment to T227132: issues with artifact cache in an-coord1001.

I've manually removed a bunch of old refinery artifact jar versions from the refinery deploy on notebook hosts to free up space.

Mon, Jul 15, 2:23 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata added a comment to T227132: issues with artifact cache in an-coord1001.

Docs updated: https://wikitech.wikimedia.org/wiki/Analytics/Systems/Cluster/Deploy/Refinery#Deploying_to_notebook*_hosts

Mon, Jul 15, 1:54 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata added a comment to T191231: RFC: Abstract schemas and schema changes.

Just came across this ticket after reading the TechCom radar email.

Mon, Jul 15, 1:45 PM · Patch-For-Review, User-Addshore, Core Platform Team (Code Health (TEC13)), TechCom-RFC, SQLite, Oracle Database, MSSQL, PostgreSQL, MediaWiki-Database, Epic
Ottomata added a comment to T206824: Make it possible to use $ref in JSONSchemas.

For posterity, this is being done in https://github.com/wikimedia/jsonschema-tools with json-schema-ref-parser.

Mon, Jul 15, 1:40 PM · Core Platform Team Backlog (Designing), Services (designing), Core Platform Team (Modern Event Platform (TEC2)), Analytics-EventLogging, EventBus, Analytics

Fri, Jul 12

Ottomata updated the task description for T227906: [WIP] RFC: Stream Configuration Service .
Fri, Jul 12, 8:50 PM · EventBus, Analytics
Ottomata updated the task description for T227906: [WIP] RFC: Stream Configuration Service .
Fri, Jul 12, 7:13 PM · EventBus, Analytics
Ottomata updated the task description for T227906: [WIP] RFC: Stream Configuration Service .
Fri, Jul 12, 7:12 PM · EventBus, Analytics
Ottomata updated the task description for T227906: [WIP] RFC: Stream Configuration Service .
Fri, Jul 12, 6:57 PM · EventBus, Analytics
Ottomata added a project to T227906: [WIP] RFC: Stream Configuration Service : EventBus.
Fri, Jul 12, 6:12 PM · EventBus, Analytics
Ottomata created T227906: [WIP] RFC: Stream Configuration Service .
Fri, Jul 12, 6:11 PM · EventBus, Analytics
Ottomata created T227896: Make oozie swift upload emit event to Kafka about swift object upload complete.
Fri, Jul 12, 3:39 PM · Research-Backlog, Operations, Discovery, Analytics
Ottomata added a comment to T215976: Data Dictionary for Core Metrics.

FYI, tools like Apache Atlas have Glossary features that allow for defining and linking fields in different datasets.

Fri, Jul 12, 1:26 PM · Product-Analytics, Better Use Of Data

Thu, Jul 11

Ottomata moved T227088: Make JSONSchema aware Refine merge in existing Hive schema to read data from In Progress to Ready to Deploy on the Analytics-Kanban board.
Thu, Jul 11, 10:13 PM · Analytics-Kanban, Analytics
Ottomata moved T227484: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data from In Code Review to Ready to Deploy on the Analytics-Kanban board.
Thu, Jul 11, 10:13 PM · Analytics-Kanban, Analytics
Ottomata moved T227132: issues with artifact cache in an-coord1001 from Next Up to Done on the Analytics-Kanban board.
Thu, Jul 11, 8:36 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata added a comment to T227132: issues with artifact cache in an-coord1001.

I believe this happens on an-coord1001 and notebook* hosts because their /srv partitions are relatively small. When the disk fills up during scap deploy, the scap deploy aborts, and does not remove old cached deploys. Upon future successful deploys it does.

Thu, Jul 11, 8:36 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata closed T227018: EventLogging Schema errors have increased ~6x as Resolved.

agree

Thu, Jul 11, 8:11 PM · User-Ryasmeen, VisualEditor, Readers-Web-Backlog (Tracking), Analytics, Mobile
Ottomata updated subscribers of T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main.

@DStrine, I need some help from someone who knows how to make changes to centralnotice campaigns to test that these events work after migration. Who should I ask?

Thu, Jul 11, 5:19 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata added a comment to T227018: EventLogging Schema errors have increased ~6x.

It's weird that you can't send a null value for a non-required field (and it complicates the instrumentation code a bit too).

I don't think it is weird, but I understand that it might not be obvious. null is a JSON datatype, just like string and number. If you wanted to allow a field to be either null or a string, you'd have to use a union type. Please don't do that though! We don't support union types!
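
To illustrate the point about null being its own type, here is a minimal sketch in Python. It is a simplified stand-in for a JSON Schema type check, not the EventLogging validator; `json_type`, `check_event`, and the field names `action` and `label` are hypothetical and chosen only for this example.

```python
import json


def json_type(value):
    """Return the JSON type name of a decoded JSON value (simplified)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    return "object"


def check_event(event, properties):
    """Return a list of type errors, assuming one declared type per field."""
    errors = []
    for name, value in event.items():
        declared = properties.get(name, {}).get("type")
        if declared is None:
            continue  # field is not described in this sketch's schema
        actual = json_type(value)
        # An integer is also acceptable where "number" is declared.
        if actual == declared or (declared == "number" and actual == "integer"):
            continue
        errors.append(f"{name}: expected {declared}, got {actual}")
    return errors


# Hypothetical schema: both fields are strings, neither allows null.
properties = {"action": {"type": "string"}, "label": {"type": "string"}}

omitted = check_event(json.loads('{"action": "click"}'), properties)
nulled = check_event(json.loads('{"action": "click", "label": null}'), properties)
print(omitted)
print(nulled)
```

Under these assumptions, leaving the optional field out passes the check, while sending it as null is reported as a type mismatch. Instrumentation code would therefore omit the key rather than set it to null.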

Thu, Jul 11, 5:14 PM · User-Ryasmeen, VisualEditor, Readers-Web-Backlog (Tracking), Analytics, Mobile
Ottomata updated subscribers of T227018: EventLogging Schema errors have increased ~6x.

Here are the top 10 offenders:

Thu, Jul 11, 3:10 PM · User-Ryasmeen, VisualEditor, Readers-Web-Backlog (Tracking), Analytics, Mobile
Ottomata added a comment to T219544: Make hadoop cluster able to push to swift .

@EBernhardson analytics-search user should now be able to access the auth file

Thu, Jul 11, 12:48 PM · Patch-For-Review, Analytics-Kanban, Research, Operations, Discovery, Analytics

Wed, Jul 10

Ottomata updated the task description for T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main.
Wed, Jul 10, 7:35 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata added a comment to T227164: Archive zookeeper puppet submodule.

single tear

Wed, Jul 10, 2:04 PM · Patch-For-Review, Analytics-Kanban, Operations, Cleanup, Analytics
Ottomata added a comment to T227164: Archive zookeeper puppet submodule.

Done

Wed, Jul 10, 2:04 PM · Patch-For-Review, Analytics-Kanban, Operations, Cleanup, Analytics

Tue, Jul 9

Ottomata triaged T227611: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN as High priority.
Tue, Jul 9, 8:25 PM · Analytics-Kanban, Analytics
Ottomata created T227611: LDAP ldap-ro.eqiad.wikimedia.org not reachable from Analytics VLAN.
Tue, Jul 9, 8:20 PM · Analytics-Kanban, Analytics
Ottomata moved T227088: Make JSONSchema aware Refine merge in existing Hive schema to read data from Next Up to In Progress on the Analytics-Kanban board.
Tue, Jul 9, 6:33 PM · Analytics-Kanban, Analytics
Ottomata moved T227484: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data from Next Up to In Code Review on the Analytics-Kanban board.
Tue, Jul 9, 6:33 PM · Analytics-Kanban, Analytics
Ottomata added a project to T227088: Make JSONSchema aware Refine merge in existing Hive schema to read data: Analytics-Kanban.
Tue, Jul 9, 6:32 PM · Analytics-Kanban, Analytics
Ottomata added a project to T227484: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data: Analytics-Kanban.
Tue, Jul 9, 5:56 PM · Analytics-Kanban, Analytics
Ottomata added a comment to T219544: Make hadoop cluster able to push to swift .

Eric needs the analytics-search user to be able to access the swift auth file so his Oozie jobs can upload to swift.

Tue, Jul 9, 5:05 PM · Patch-For-Review, Analytics-Kanban, Research, Operations, Discovery, Analytics
Ottomata added a comment to T176875: Allow access to wdqs.svc.eqiad.wmnet on port 8888.

Ah, hm ok.

Tue, Jul 9, 3:09 PM · Traffic, Wikidata-Query-Service, Operations, WMDE-Analytics-Engineering, User-Addshore, Wikidata, Discovery
Ottomata moved T227065: Move icinga alarm for the EventStreams external endpoint to SRE from Next Up to Done on the Analytics-Kanban board.
Tue, Jul 9, 2:40 PM · Analytics-Kanban, Wikimedia-Incident, Analytics, Operations
Ottomata added a comment to T176875: Allow access to wdqs.svc.eqiad.wmnet on port 8888.

@Addshore, just saw T218710 and clicked through to here. If you use https://wikitech.wikimedia.org/wiki/HTTP_proxy, you can access wdqs.svc.eqiad.wmnet over HTTP from the analytics VLAN.

Tue, Jul 9, 1:42 PM · Traffic, Wikidata-Query-Service, Operations, WMDE-Analytics-Engineering, User-Addshore, Wikidata, Discovery

Mon, Jul 8

Ottomata added a comment to T220615: Add ottomata to gerrit-managers group in Gerrit.

Oops I duplicated. Thanks for closing. :)

Mon, Jul 8, 8:33 PM · Gerrit-Privilege-Requests
Ottomata updated the task description for T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main.
Mon, Jul 8, 6:46 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata added a comment to T227217: Enable widgets on Jupyter Labs on SWAP.

We need to upgrade the OSes to Debian Buster.

Mon, Jul 8, 3:33 PM · Analytics-SWAP, Analytics, Product-Analytics
Ottomata renamed T227484: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data from Refine JsonSchemaLoader uses should use JsonParser instead of YAMLParser to load JSON data to Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data.
Mon, Jul 8, 2:34 PM · Analytics-Kanban, Analytics
Ottomata created T227484: Refine JsonSchemaLoader should use JsonParser instead of YAMLParser to load JSON data.
Mon, Jul 8, 2:33 PM · Analytics-Kanban, Analytics
Ottomata added a comment to T227288: eqiad: 1 misc node for the Kerberos KDC service.

+1 for 1 eqiad and 1 codfw

Mon, Jul 8, 1:45 PM · hardware-requests, Operations, User-Elukey, Analytics

Fri, Jul 5

mpopov awarded T170826: Enable base::firewall on stat boxes after restricting Spark REPL ports. a Like token.
Fri, Jul 5, 1:57 PM · Analytics-Kanban, User-Elukey, Analytics-Cluster, Analytics

Wed, Jul 3

Ottomata added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

Please decommission the current servers to spare role

Ok will do. I'll downtime the hostnames in icinga when I do.

Wed, Jul 3, 8:38 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata updated the task description for T205319: Modern Event Platform: Stream Configuration Service.
Wed, Jul 3, 7:34 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics
Ottomata added a comment to T226698: Allow all Analytics tools to work with Kerberos auth.

Also, do we still need to rsync that data?

Ya, I believe so: https://dumps.wikimedia.org/other/pageviews/2019/2019-07/

Wed, Jul 3, 3:49 PM · Patch-For-Review, Analytics-Kanban, User-Elukey, Analytics
Ottomata added a comment to T170826: Enable base::firewall on stat boxes after restricting Spark REPL ports..

Also related: T111433

Wed, Jul 3, 3:44 PM · Analytics-Kanban, User-Elukey, Analytics-Cluster, Analytics
Ottomata added a comment to T205319: Modern Event Platform: Stream Configuration Service.

@Nuria, see comment https://phabricator.wikimedia.org/T205319#5300239. I'm trying to isolate stream config use cases from the larger problem of data governance. Part of the upcoming projects will include use cases from this as well as T201063: Modern Event Platform: Schema Registry, including things like schema UIs. Atlas has a 'schema' UI and a search engine for schema and dataset discovery. There's overlap with stream config, but I'm not sure if stream config itself fits into something like Atlas...maybe we could use it for the UI components of stream config? Really not sure.

Wed, Jul 3, 12:55 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics

Tue, Jul 2

Ottomata added a comment to T205319: Modern Event Platform: Stream Configuration Service.

Check out:

Tue, Jul 2, 10:01 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics
Ottomata added a comment to T205319: Modern Event Platform: Stream Configuration Service.

FYI, I wanted to know more about Apache Atlas, so I set up a standalone instance on stat1004 and ran the Hive import process for the wmf and event databases. I added some glossary terms for 'user_agent' and 'ip', classified them as PII, tagged related fields, etc.

Tue, Jul 2, 9:58 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics
Ottomata added a comment to T227132: issues with artifact cache in an-coord1001.

This also often affects other hosts with relatively small /srv partitions, like notebook* hosts.

Tue, Jul 2, 6:55 PM · Analytics-Kanban, Release-Engineering-Team, Analytics
Ottomata moved T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. from Next Up to Paused on the Analytics-Kanban board.
Tue, Jul 2, 6:16 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata added a project to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN.: Analytics-Kanban.
Tue, Jul 2, 6:16 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata added a comment to T227025: (Need By: August 31) rack/setup/install (3) new zookeeper nodes.

I like an-conf. Also gives us the option to colocate something else on them if we need to one day.

Tue, Jul 2, 6:01 PM · Operations, ops-eqiad
Ottomata assigned T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN. to Cmjohnson.

Feel free to reassign

Tue, Jul 2, 4:09 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata added a comment to T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..

@Cmjohnson bump!

Tue, Jul 2, 4:08 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata moved T226668: Factor out eventgate-wikimedia factory into its own gerrit repo and use it for deployment pipeline from Next Up to In Progress on the Analytics-Kanban board.
Tue, Jul 2, 4:06 PM · Analytics-Kanban, Analytics, Core Platform Team Backlog (Watching / External), Services (watching), EventBus
Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

Ah still wrong. Full details.

Tue, Jul 2, 3:32 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata updated subscribers of T205319: Modern Event Platform: Stream Configuration Service.
  • As a product manager/analyst/engineer, I want to set the privacy whitelist settings of stream's event fields so that I can retain non-PII data for longer than 90 days.
  • As a product manager/analyst/engineer, I want to set and discover the ownership of schemas and streams so I can track governance over time and know when a stream can be decommissioned.
Tue, Jul 2, 2:39 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics
Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

Hm, I just noticed there are more eventstreams processors than I had thought. There are 6 scb nodes in codfw, and 4 nodes in eqiad, for a total of 208 processors between them. eventstreams is configured to spawn one worker per processor. As is, this won't help keep the varnish connection pool from filling up.

Tue, Jul 2, 2:15 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata created T227088: Make JSONSchema aware Refine merge in existing Hive schema to read data.
Tue, Jul 2, 1:54 PM · Analytics-Kanban, Analytics
Ottomata added a comment to T226219: [BUG] Logging error of MobileWikiAppDailyStats for the iOS app.

when we refine we get the schema for the 1st record we find and we assume always backwards compatibility of schemas.

For EventLogging Hive, we actually use the latest schema. The latest schema is used to read the data, so if the data has fields that are not in the latest schema, they will not be read. Removing fields is a backwards incompatible change.
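
A minimal sketch of why removing a field is backwards incompatible when data is read with the latest schema. This is plain Python, not the Refine code; `read_with_schema` and the field names are hypothetical.

```python
def read_with_schema(record, schema_fields):
    """Project a stored record onto the fields of the schema used for reading.

    Fields missing from the record come back as None (NULL in Hive terms);
    fields missing from the schema are not read at all.
    """
    return {field: record.get(field) for field in schema_fields}


# A record written under an older schema that still had 'is_anon'.
old_record = {"app_version": "6.2", "languages": 3, "is_anon": True}

# Hypothetical latest schema: 'is_anon' was removed, 'theme' was added.
latest_schema_fields = ["app_version", "languages", "theme"]

row = read_with_schema(old_record, latest_schema_fields)
print(row)
```

In this sketch the added field is simply empty for old records, but the removed field's data is no longer visible even though it is still stored.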

Tue, Jul 2, 1:47 PM · Product-Analytics, Analytics
Ottomata added a comment to T126989: MediaWiki logging & encryption.

Hm, I don't think T183303 affected any encryption status of logs. The Avro logs we migrated to EventGate just do an HTTP POST to EventGate, and EventGate produces to Kafka unencrypted.

Tue, Jul 2, 1:43 PM · MW-1.33-notes (1.33.0-wmf.24; 2019-04-02), Patch-For-Review, observability, Wikimedia-Logstash, MediaWiki-Debug-Logger, Operations
Ottomata added a comment to T227065: Move icinga alarm for the EventStreams external endpoint to SRE.

+1 I think this alarm should alert SRE.

Tue, Jul 2, 1:19 PM · Analytics-Kanban, Wikimedia-Incident, Analytics, Operations

Mon, Jul 1

Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

Ok, all patches ready to go. Deployed in beta and looks good there. It is near the end of my day now, so I'll wait until tomorrow to deploy to production eventstreams.

Mon, Jul 1, 8:32 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata updated the task description for T185233: Modern Event Platform (TEC2).
Mon, Jul 1, 6:31 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata updated the task description for T185233: Modern Event Platform (TEC2).
Mon, Jul 1, 6:31 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata updated the task description for T201063: Modern Event Platform: Schema Registry.
Mon, Jul 1, 5:59 PM · Analytics, Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus
Ottomata updated the task description for T201063: Modern Event Platform: Schema Registry.
Mon, Jul 1, 5:59 PM · Analytics, Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus
Ottomata added a comment to T226724: Gerrit manager rights for Ottomata.

Thank you! I think that should be fine.

Mon, Jul 1, 5:09 PM · Release-Engineering-Team, Gerrit-Privilege-Requests
Ottomata added a comment to T226724: Gerrit manager rights for Ottomata.

BTW, I don't need 'admin' rights, I think just 'manager' rights. I want to be able to create repositories.

Mon, Jul 1, 5:01 PM · Release-Engineering-Team, Gerrit-Privilege-Requests
Ottomata added a project to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap: Wikimedia-Incident.
Mon, Jul 1, 3:47 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata added a comment to T217142: [WIP] [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors.

I think that having the client send as simple and immediately useful a message as possible should be a goal

Indeed, as we will likely also import these events into Hadoop/Hive for longer term querying and analysis. The fewer transformations we have to do, the better.

Mon, Jul 1, 2:59 PM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Patch-For-Review, User-herron, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash
Ottomata added a comment to T220542: Update R from 3.3.3 to 3.6.0 on stat and notebook machines.

Not sure, but we are also waiting for buster to upgrade Spark. When I asked Moritz before, he said Buster would be ready in a monthish time.

Mon, Jul 1, 2:26 PM · Analytics, Product-Analytics
Ottomata added a comment to T217041: Use Z UTC suffix in EventBus emitted events rather than +00:00.

As each event is migrated to the new Event Platform format in T211248, the timestamps will use the 'Z' suffix.
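
For reference, a small Python sketch of the difference between the two UTC suffixes. It only illustrates the formats; it is not the EventBus code, and `to_z_timestamp` is a hypothetical helper.

```python
from datetime import datetime, timezone


def to_z_timestamp(dt):
    """Format a timezone-aware datetime as ISO 8601 UTC with a 'Z' suffix."""
    # Note: astimezone() on a naive datetime would assume local time,
    # so this sketch expects an aware datetime.
    utc = dt.astimezone(timezone.utc)
    return utc.isoformat().replace("+00:00", "Z")


example = datetime(2019, 7, 1, 13, 52, 0, tzinfo=timezone.utc)

offset_style = example.isoformat()   # '2019-07-01T13:52:00+00:00'
z_style = to_z_timestamp(example)    # '2019-07-01T13:52:00Z'
print(offset_style)
print(z_style)
```

Both strings denote the same instant; only the spelling of the UTC offset differs.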

Mon, Jul 1, 1:52 PM · Analytics-Kanban, MW-1.34-notes (1.34.0-wmf.7; 2019-05-28), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Product-Analytics
Ottomata added a subtask for T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main: T217041: Use Z UTC suffix in EventBus emitted events rather than +00:00.
Mon, Jul 1, 1:52 PM · Patch-For-Review, MW-1.34-notes (1.34.0-wmf.10; 2019-06-18), Core Platform Team Backlog (Watching / External), Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata added a parent task for T217041: Use Z UTC suffix in EventBus emitted events rather than +00:00: T211248: Modern Event Platform: Stream Intake Service: Migrate Mediawiki Eventbus events to eventgate-main.
Mon, Jul 1, 1:52 PM · Analytics-Kanban, MW-1.34-notes (1.34.0-wmf.7; 2019-05-28), Core Platform Team Backlog (Watching / External), Services (watching), EventBus, Analytics, Product-Analytics
Ottomata moved T217040: Add UTC 'Z' suffix to webrequest `dt` field. from Next Up to Done on the Analytics-Kanban board.
Mon, Jul 1, 1:43 PM · Analytics-Kanban, Analytics, Product-Analytics
Ottomata added a project to T217040: Add UTC 'Z' suffix to webrequest `dt` field.: Analytics-Kanban.
Mon, Jul 1, 1:43 PM · Analytics-Kanban, Analytics, Product-Analytics
Ottomata added a comment to T226986: Client side error logging production launch.

Looks good, I can take the k8s task. I can start the schema but we'll need to bikeshed that one together.

Mon, Jul 1, 1:31 PM · Reading-Infrastructure-Team-Backlog, Epic, Analytics
Ottomata added a comment to T217142: [WIP] [Proposal] Use the Kafka-Logstash logging infrastructure to log client-side errors.

Not sure what's the right place for that: client, EventGate, logstash filter?

I think client is the right place. We need to bikeshed the actual error schema we will use. EventGate will only validate that the incoming events conform to the schema.

Mon, Jul 1, 1:29 PM · MW-1.34-notes (1.34.0-wmf.11; 2019-06-26), Patch-For-Review, User-herron, Reading-Infrastructure-Team-Backlog, Wikimedia-Logstash

Fri, Jun 28

Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

To hold us over on the weekend, I've manually blacklisted the offending IP in EventStreams code and deployed. We'll work on a better solution next week.

Fri, Jun 28, 9:22 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata added a project to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap: Security.
Fri, Jun 28, 8:58 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

Collected some info about which IPs were connecting on scb1001. Over a period of about 40 minutes:

Fri, Jun 28, 6:35 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata updated the task description for T185233: Modern Event Platform (TEC2).
Fri, Jun 28, 5:30 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata moved T226522: Modern Event Platform: Stream Intake Service: Migrate change-prop events to new (EventGate) style schemas from Backlog to In Progress on the EventBus board.
Fri, Jun 28, 5:26 PM · Services (later), Analytics, Core Platform Team Backlog (Watching / External), EventBus
Ottomata updated the task description for T185233: Modern Event Platform (TEC2).
Fri, Jun 28, 4:59 PM · Core Platform Team Backlog (Watching / External), Core Platform Team (Modern Event Platform (TEC2)), Goal, Services (watching), Analytics-EventLogging, EventBus, Analytics-Kanban
Ottomata added a comment to T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap.

EventStreams is hitting its concurrent connection limits of about 200 connections. We think this is probably due to a single client starting many connections, but aren't yet 100% sure about that. We are looking into it!

Fri, Jun 28, 4:39 PM · Wikimedia-Incident, Security, Services (watching), Analytics, Operations
Ottomata added a comment to T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production .

Oh ok, will do!

Fri, Jun 28, 3:49 PM · Patch-For-Review, Research-Backlog, Operations, Discovery, Analytics
Ottomata added a comment to T213976: Workflow to be able to move data files computed in jobs from analytics cluster to production .

Great! @fgiunchedi you said 'that is something we'd have to deploy first'. Can I use this now?

Fri, Jun 28, 3:22 PM · Patch-For-Review, Research-Backlog, Operations, Discovery, Analytics
Ottomata added a comment to T225005: Replace and expand codfw kafka main hosts (kafka200[123]) with kafka-main200[12345].

Hm @herron, today we experienced T226808: Eventstreams in codfw down for several hours due to kafka2001 -> kafka-main2001 swap, which I think is caused by the fact that the eventstreams service has service::node auto_refresh => false. I forgot about this. eventstreams should be depooled, puppet run, and restarted for each new server. Same goes for change-prop, and possibly change-prop-job-queue. Sorry for not catching this when I reviewed the migration plan.

Fri, Jun 28, 1:31 PM · Patch-For-Review, Services (watching), Core Platform Team Backlog (Watching / External), Analytics, EventBus, User-herron, Operations

Thu, Jun 27

Ottomata renamed T226724: Gerrit manager rights for Ottomata from Gerrit admin permissions for Ottomata to Gerrit manager rights for Ottomata.
Thu, Jun 27, 5:58 PM · Release-Engineering-Team, Gerrit-Privilege-Requests
Ottomata assigned T223414: Move reportupdater reports that pull data from eventlogging mysql to pull data from hadoop to fdans.
Thu, Jun 27, 5:47 PM · Analytics, Analytics-EventLogging
Ottomata claimed T159170: Sunset MySQL data store for eventlogging.
Thu, Jun 27, 5:47 PM · Analytics, Analytics-EventLogging
Ottomata added a parent task for T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN.: T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users.
Thu, Jun 27, 5:26 PM · Analytics-Kanban, ops-eqiad, Operations, netops, Analytics
Ottomata added a subtask for T204950: Public Edit Data Lake: Mediawiki history snapshots available in SQL data store to cloud (labs) users: T225128: Move cloudvirtan* hardware out of CloudVPS back into production Analytics VLAN..
Thu, Jun 27, 5:26 PM · Cloud-Services, Analytics-Kanban
Ottomata added a comment to T226724: Gerrit manager rights for Ottomata.

Gerrit manager sounds fine!

Thu, Jun 27, 4:27 PM · Release-Engineering-Team, Gerrit-Privilege-Requests
Ottomata created T226724: Gerrit manager rights for Ottomata.
Thu, Jun 27, 2:58 PM · Release-Engineering-Team, Gerrit-Privilege-Requests
Ottomata moved T226668: Factor out eventgate-wikimedia factory into its own gerrit repo and use it for deployment pipeline from Backlog to Next Up on the EventBus board.
Thu, Jun 27, 2:53 PM · Analytics-Kanban, Analytics, Core Platform Team Backlog (Watching / External), Services (watching), EventBus