Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (196 w, 6 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Unknown

Recent Activity

Tue, Jul 10

Ottomata added a comment to T190443: Spark Jupyter Notebook integration.

@diego, joal and I talked today, and we indeed decided to ditch Toree for PySpark, and just go with the ipython kernel + spark integration.
I just installed this on both notebook1003 and 1004, replacing the Toree PySpark kernels. They seem to work better now. Try em out!

Tue, Jul 10, 8:48 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

I just published an incident report for this: https://wikitech.wikimedia.org/wiki/Incident_documentation/20180705-EventLogging-in-Hive

Tue, Jul 10, 6:19 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a project to T198908: Alarms on throughput on refined data : Wikimedia-Incident.
Tue, Jul 10, 6:18 PM · Wikimedia-Incident, cloud-services-team (Kanban), Analytics
Ottomata added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

I've updated the task description with some recent comments from Marko and others. I've mentioned other binary type formats, but made an argument that given the goals of Modern Event Platform, we shouldn't consider options other than JSONSchema and Avro.

Tue, Jul 10, 4:50 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata updated the task description for T198256: RFC: Modern Event Platform - Choose Schema Tech.
Tue, Jul 10, 4:45 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics

Mon, Jul 9

Ottomata raised the priority of T190443: Spark Jupyter Notebook integration from Low to Normal.
Mon, Jul 9, 8:42 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata renamed T190443: Spark Jupyter Notebook integration from Spark notebook integration to Spark Jupyter Notebook integration.
Mon, Jul 9, 8:42 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata moved T190443: Spark Jupyter Notebook integration from In Code Review to In Progress on the Analytics-Kanban board.
Mon, Jul 9, 8:42 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata updated the task description for T190443: Spark Jupyter Notebook integration.
Mon, Jul 9, 8:41 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T198909: Errors with the new SWAP notebooks.

I think we got this working! I just merged this task into T190443: Spark Jupyter Notebook integration so we can keep tracking more issues there.

Mon, Jul 9, 8:41 PM · Patch-For-Review, Analytics
Ottomata merged task T198909: Errors with the new SWAP notebooks into T190443: Spark Jupyter Notebook integration.
Mon, Jul 9, 8:40 PM · Patch-For-Review, Analytics
Ottomata merged T198909: Errors with the new SWAP notebooks into T190443: Spark Jupyter Notebook integration.
Mon, Jul 9, 8:40 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T199131: Please install Text::CSV_XS at stat1005 .

Done!

Mon, Jul 9, 8:14 PM · Analytics-Kanban, Analytics, Patch-For-Review, Operations
Ottomata moved T199131: Please install Text::CSV_XS at stat1005 from Next Up to Done on the Analytics-Kanban board.
Mon, Jul 9, 8:14 PM · Analytics-Kanban, Analytics, Patch-For-Review, Operations
Ottomata claimed T199131: Please install Text::CSV_XS at stat1005 .
Mon, Jul 9, 8:14 PM · Analytics-Kanban, Analytics, Patch-For-Review, Operations
Ottomata added a comment to T199131: Please install Text::CSV_XS at stat1005 .

Ah a deb package! thanks @Reedy!

Mon, Jul 9, 7:47 PM · Analytics-Kanban, Analytics, Patch-For-Review, Operations
Ottomata added a comment to T190443: Spark Jupyter Notebook integration.

BTW, the PySpark on YARN notebook needs PYSPARK_PYTHON manually set to the same value as PYTHON_EXEC (e.g. /usr/bin/python3) to avoid version errors. Here's a working Toree jupyter kernel.json for PySpark in YARN:

Mon, Jul 9, 7:45 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T198909: Errors with the new SWAP notebooks.

K, we're looking into it.

Mon, Jul 9, 7:25 PM · Patch-For-Review, Analytics
Ottomata added a comment to T198909: Errors with the new SWAP notebooks.

Alright, I think I got this. Since we set the python executable for pyspark on yarn, we also need to explicitly set the PYTHONPATH for executors too:

Mon, Jul 9, 7:02 PM · Patch-For-Review, Analytics
Ottomata added a comment to T199131: Please install Text::CSV_XS at stat1005 .

More info please! What is Text/CSV_XS ?

Mon, Jul 9, 5:45 PM · Analytics-Kanban, Analytics, Patch-For-Review, Operations
Ottomata added a comment to T185233: Modern Event Platform (with EventLogging of the Future (EoF)).

T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change just made me think of another need: auditing. At the very least, we should have a stream processing job that simply counts the number of messages per topic (or event-topic grouping) per hour, and emits the counts to another topic. This would make it easy to write a verification/monitoring job that alerts if events don't show up in expected places.
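
A minimal sketch of the counting idea above, in plain Python rather than a real stream processing framework. The function names, the (topic, timestamp) input shape, and the audit event fields are all assumptions made for illustration; nothing here reflects an existing job.

```python
from collections import Counter
from datetime import datetime, timezone


def hour_bucket(ts_ms):
    """Truncate an epoch-millisecond timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


def count_per_topic_hour(messages):
    """Count messages per (topic, hour). `messages` yields (topic, ts_ms) pairs."""
    counts = Counter()
    for topic, ts_ms in messages:
        counts[(topic, hour_bucket(ts_ms))] += 1
    return counts


def to_audit_events(counts):
    """Turn the counts into records that could be emitted to an audit topic."""
    return [
        {"topic": topic, "hour": hour, "count": n}
        for (topic, hour), n in sorted(counts.items())
    ]


def missing_topics(expected_topics, counts, hour):
    """Return expected topics that have no messages counted in the given hour."""
    seen = {topic for (topic, h) in counts if h == hour}
    return sorted(set(expected_topics) - seen)
```

A monitoring job could then call missing_topics for each completed hour and alert on a non-empty result.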

Mon, Jul 9, 2:45 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Ottomata added a comment to T198070: Varnishkafka eventlogging instances delivery failures.

Hm, I didn't quite realize we weren't already compressing with snappy. I think it'd be wise to just enable this for all producers by default whenever we can.

Mon, Jul 9, 1:17 PM · Patch-For-Review, Analytics, User-Elukey, Analytics-Kanban

Sun, Jul 8

Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
18/07/07 03:43:45 INFO Refine: Successfully refined 151 of 151 dataset partitions into table `event`.`MobileWikiAppProtectedEditAttempt` (total # refined records: 9932)
18/07/07 03:43:45 INFO Refine: Successfully refined 107 of 107 dataset partitions into table `event`.`TranslationRecommendationAPIRequests` (total # refined records: 1760)
18/07/07 03:43:45 INFO Refine: Successfully refined 235 of 235 dataset partitions into table `event`.`PrefUpdate` (total # refined records: 664903)
18/07/07 03:43:45 INFO Refine: Successfully refined 219 of 219 dataset partitions into table `event`.`EchoInteraction` (total # refined records: 413902)
18/07/07 03:43:45 INFO Refine: Successfully refined 178 of 178 dataset partitions into table `event`.`MobileWikiAppFeedConfigure` (total # refined records: 26518)
18/07/07 03:43:45 INFO Refine: Successfully refined 223 of 223 dataset partitions into table `event`.`WikipediaPortal` (total # refined records: 162468)
18/07/07 03:43:45 INFO Refine: Successfully refined 227 of 227 dataset partitions into table `event`.`UniversalLanguageSelector` (total # refined records: 433052)
18/07/07 03:43:45 INFO Refine: Successfully refined 79 of 79 dataset partitions into table `event`.`MobileWikiAppWiktionaryPopup` (total # refined records: 632)
18/07/07 03:43:45 INFO Refine: Successfully refined 225 of 225 dataset partitions into table `event`.`ChangesListClickTracking` (total # refined records: 191044)
18/07/07 03:43:45 INFO Refine: Successfully refined 334 of 334 dataset partitions into table `event`.`GuidedTourGuiderHidden` (total # refined records: 9512)
18/07/07 03:43:45 INFO Refine: Successfully refined 121 of 121 dataset partitions into table `event`.`GuidedTourExited` (total # refined records: 2455)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`MobileWikiAppIntents` (total # refined records: 253770)
18/07/07 03:43:45 INFO Refine: Successfully refined 430 of 430 dataset partitions into table `event`.`MobileWikiAppiOSSessions` (total # refined records: 4755660)
18/07/07 03:43:45 INFO Refine: Successfully refined 201 of 201 dataset partitions into table `event`.`MobileWikiAppOnThisDay` (total # refined records: 50026)
18/07/07 03:43:45 INFO Refine: Successfully refined 98 of 98 dataset partitions into table `event`.`AdvancedSearchRequest` (total # refined records: 1447)
18/07/07 03:43:45 INFO Refine: Successfully refined 185 of 185 dataset partitions into table `event`.`WikimediaBlogVisit` (total # refined records: 21640)
18/07/07 03:43:45 INFO Refine: Successfully refined 222 of 222 dataset partitions into table `event`.`MultimediaViewerDuration` (total # refined records: 86836)
18/07/07 03:43:45 INFO Refine: Successfully refined 126 of 126 dataset partitions into table `event`.`MobileWikiAppNavMenu` (total # refined records: 4089)
18/07/07 03:43:45 INFO Refine: Successfully refined 152 of 152 dataset partitions into table `event`.`ContentTranslation` (total # refined records: 9997)
18/07/07 03:43:45 INFO Refine: Successfully refined 10 of 10 dataset partitions into table `event`.`MobileWikiAppStuffHappens` (total # refined records: 26)
18/07/07 03:43:45 INFO Refine: Successfully refined 169 of 169 dataset partitions into table `event`.`GuidedTourButtonClick` (total # refined records: 19174)
18/07/07 03:43:45 INFO Refine: Successfully refined 222 of 222 dataset partitions into table `event`.`ChangesListFilterGrouping` (total # refined records: 101491)
18/07/07 03:43:45 INFO Refine: Successfully refined 237 of 237 dataset partitions into table `event`.`MobileWikiAppArticleSuggestions` (total # refined records: 140021)
18/07/07 03:43:45 INFO Refine: Successfully refined 228 of 228 dataset partitions into table `event`.`MultimediaViewerAttribution` (total # refined records: 144630)
18/07/07 03:43:45 INFO Refine: Successfully refined 225 of 225 dataset partitions into table `event`.`SaveTiming` (total # refined records: 1819327)
18/07/07 03:43:45 INFO Refine: Successfully refined 224 of 224 dataset partitions into table `event`.`MobileWikiAppLogin` (total # refined records: 158343)
18/07/07 03:43:45 INFO Refine: Successfully refined 200 of 200 dataset partitions into table `event`.`MobileWikiAppRandomizer` (total # refined records: 35022)
18/07/07 03:43:45 INFO Refine: Successfully refined 223 of 223 dataset partitions into table `event`.`MobileWikiAppFeed` (total # refined records: 707981)
18/07/07 03:43:45 INFO Refine: Successfully refined 218 of 218 dataset partitions into table `event`.`UploadWizardTutorialActions` (total # refined records: 64383)
18/07/07 03:43:45 INFO Refine: Successfully refined 152 of 152 dataset partitions into table `event`.`RecentChangesTopLinks` (total # refined records: 9202)
18/07/07 03:43:45 INFO Refine: Successfully refined 126 of 126 dataset partitions into table `event`.`QuickSurveysResponses` (total # refined records: 2102)
18/07/07 03:43:45 INFO Refine: Successfully refined 237 of 237 dataset partitions into table `event`.`MobileWikiAppEdit` (total # refined records: 274822)
18/07/07 03:43:45 INFO Refine: Successfully refined 179 of 179 dataset partitions into table `event`.`GuidedTourGuiderImpression` (total # refined records: 32858)
18/07/07 03:43:45 INFO Refine: Successfully refined 222 of 222 dataset partitions into table `event`.`GeoFeatures` (total # refined records: 493660)
18/07/07 03:43:45 INFO Refine: Successfully refined 181 of 181 dataset partitions into table `event`.`UploadWizardUploadFlowEvent` (total # refined records: 30762)
18/07/07 03:43:45 INFO Refine: Successfully refined 231 of 231 dataset partitions into table `event`.`MobileWikiAppInstallReferrer` (total # refined records: 189423)
18/07/07 03:43:45 INFO Refine: Successfully refined 229 of 229 dataset partitions into table `event`.`ContentTranslationCTA` (total # refined records: 200153)
18/07/07 03:43:45 INFO Refine: Successfully refined 118 of 118 dataset partitions into table `event`.`UploadWizardErrorFlowEvent` (total # refined records: 13685)
18/07/07 03:43:45 INFO Refine: Successfully refined 208 of 208 dataset partitions into table `event`.`MobileWikiAppMediaGallery` (total # refined records: 77156)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`MultimediaViewerDimensions` (total # refined records: 994199)
18/07/07 03:43:45 INFO Refine: Successfully refined 203 of 203 dataset partitions into table `event`.`ServerSideAccountCreation` (total # refined records: 86476)
18/07/07 03:43:45 INFO Refine: Successfully refined 217 of 217 dataset partitions into table `event`.`Kartographer` (total # refined records: 164287)
18/07/07 03:43:45 INFO Refine: Successfully refined 232 of 232 dataset partitions into table `event`.`MobileWikiAppToCInteraction` (total # refined records: 1394848)
18/07/07 03:43:45 INFO Refine: Successfully refined 123 of 123 dataset partitions into table `event`.`UploadWizardExceptionFlowEvent` (total # refined records: 3174)
18/07/07 03:43:45 INFO Refine: Successfully refined 92 of 92 dataset partitions into table `event`.`ContentTranslationSuggestion` (total # refined records: 15415)
18/07/07 03:43:45 INFO Refine: Successfully refined 230 of 230 dataset partitions into table `event`.`MobileWebMainMenuClickTracking` (total # refined records: 911336)
18/07/07 03:43:45 INFO Refine: Successfully refined 386 of 386 dataset partitions into table `event`.`UploadWizardFlowEvent` (total # refined records: 2199)
18/07/07 03:43:45 INFO Refine: Successfully refined 153 of 153 dataset partitions into table `event`.`EditorActivation` (total # refined records: 15713)
18/07/07 03:43:45 INFO Refine: Successfully refined 154 of 154 dataset partitions into table `event`.`EditConflict` (total # refined records: 11095)
18/07/07 03:43:45 INFO Refine: Successfully refined 169 of 169 dataset partitions into table `event`.`UploadWizardStep` (total # refined records: 114169)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`GettingStartedRedirectImpression` (total # refined records: 71091)
18/07/07 03:43:45 INFO Refine: Successfully refined 216 of 216 dataset partitions into table `event`.`MobileWikiAppAppearanceSettings` (total # refined records: 103624)
18/07/07 03:43:45 INFO Refine: Successfully refined 32 of 32 dataset partitions into table `event`.`NavigationTiming` (total # refined records: 1324711)
18/07/07 03:43:45 INFO Refine: Successfully refined 159 of 159 dataset partitions into table `event`.`MediaWikiPingback` (total # refined records: 15516)
18/07/07 03:43:45 INFO Refine: Successfully refined 169 of 169 dataset partitions into table `event`.`EchoMail` (total # refined records: 60464)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`MobileWikiAppSavedPages` (total # refined records: 115713)
18/07/07 03:43:45 INFO Refine: Successfully refined 236 of 236 dataset partitions into table `event`.`CentralAuth` (total # refined records: 654521)
18/07/07 03:43:45 INFO Refine: Successfully refined 212 of 212 dataset partitions into table `event`.`QuickSurveyInitiation` (total # refined records: 174814)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`MobileWebSearch` (total # refined records: 497442)
18/07/07 03:43:45 INFO Refine: Successfully refined 427 of 427 dataset partitions into table `event`.`SearchSatisfactionErrors` (total # refined records: 168836)
18/07/07 03:43:45 INFO Refine: Successfully refined 106 of 106 dataset partitions into table `event`.`MobileWikiAppLangSelect` (total # refined records: 791)
18/07/07 03:43:45 INFO Refine: Successfully refined 223 of 223 dataset partitions into table `event`.`MobileWikiAppFindInPage` (total # refined records: 85011)
18/07/07 03:43:45 INFO Refine: Successfully refined 186 of 186 dataset partitions into table `event`.`MobileWikiAppCreateAccount` (total # refined records: 35711)
18/07/07 03:43:45 INFO Refine: Successfully refined 200 of 200 dataset partitions into table `event`.`MobileWikiAppTabs` (total # refined records: 193744)
18/07/07 03:43:45 INFO Refine: Successfully refined 241 of 241 dataset partitions into table `event`.`MultimediaViewerNetworkPerformance` (total # refined records: 447410)
18/07/07 03:43:45 INFO Refine: Successfully refined 222 of 222 dataset partitions into table `event`.`MobileWikiAppDailyStats` (total # refined records: 668769)
18/07/07 03:43:45 INFO Refine: Successfully refined 209 of 209 dataset partitions into table `event`.`MobileWikiAppReadingLists` (total # refined records: 650348)
Sun, Jul 8, 4:29 AM · Patch-For-Review, Analytics-Kanban, Analytics

Fri, Jul 6

Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

Scratch that previous command, I got that out of an old, now-unpuppetized wrapper script for the old 'json' based Refine jobs. (I deleted the wrapper scripts.)

Fri, Jul 6, 7:20 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

Running refine to backfill the last 552 hours. Camus imported hours with new data will be re-refined.

Fri, Jul 6, 6:28 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata updated the task description for T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Fri, Jul 6, 4:55 AM · Patch-For-Review, Analytics-Kanban, Analytics

Thu, Jul 5

Ottomata updated subscribers of T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Thu, Jul 5, 9:28 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

Hm, I have a theory! This only affected low volume (EventLogging) topics. Assume that the log.message.timestamp.type change was the culprit, and that there is some timestamp related log retention bug when a Kafka log segment file has messages that were created when log.message.timestamp.type had different values. This bug could cause Kafka to miscalculate when an individual log segment needs to be deleted. Our log.segment.size is 512MB. For the low volume topics, there might not be > 512MB worth of data in a 7 day period. This would mean that a single segment file would span all 7 days. If there is a bug like this, Kafka might decide that this file needs to be deleted, even if there is data in it that is newer than 7 days old.
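
To make the theory above concrete, here is a toy model of the arithmetic. It does not reproduce Kafka's actual retention code: the two checks, the 10 MiB/day rate, and the message ages are assumptions chosen only to show why a single long-lived segment is vulnerable to a retention miscalculation.

```python
SEGMENT_BYTES = 512 * 1024 * 1024  # the 512MB segment size from the comment
RETENTION_DAYS = 7                 # the 7 day retention from the comment


def days_to_fill_segment(bytes_per_day, segment_bytes=SEGMENT_BYTES):
    """How many days a topic at this rate writes into one segment before it fills."""
    return segment_bytes / bytes_per_day


def expired_by_newest(message_ages_days, retention_days=RETENTION_DAYS):
    """Expected behavior: delete only when even the newest message is too old."""
    return min(message_ages_days) > retention_days


def expired_by_oldest(message_ages_days, retention_days=RETENTION_DAYS):
    """Hypothetical miscalculation: judge the whole segment by its oldest message."""
    return max(message_ages_days) > retention_days


# A low volume topic (assumed 10 MiB/day) keeps writing into one segment for weeks.
low_volume_days = days_to_fill_segment(10 * 1024 * 1024)

# One segment holding messages that are 8, 5 and 1 days old.
segment = [8, 5, 1]
lost_if_miscalculated = [age for age in segment if age <= RETENTION_DAYS]
```

With these numbers the expected check keeps the segment, while the hypothetical miscalculation deletes it along with the messages that are 5 and 1 days old. A high volume topic rolls segments far more often, so each segment spans a much shorter time range and less recent data is exposed.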

Thu, Jul 5, 9:27 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

Replaying these logs into Kafka via an eventlogging-consumer:

Thu, Jul 5, 9:14 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

Grepping for missing data in all-events.log in time range:

Thu, Jul 5, 8:48 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.

This is the list of affected eventlogging topics:

Thu, Jul 5, 8:25 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a subtask for T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change: T198908: Alarms on throughput on refined data .
Thu, Jul 5, 7:44 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a parent task for T198908: Alarms on throughput on refined data : T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Thu, Jul 5, 7:44 PM · Wikimedia-Incident, cloud-services-team (Kanban), Analytics
Ottomata updated the task description for T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Thu, Jul 5, 7:36 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata renamed T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change from EventLogging in Hive Data Loss due to Camus and Kafka timestamp.type=CreateTime change to EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Thu, Jul 5, 7:17 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata created T198906: EventLogging in Hive data loss due to Camus and Kafka timestamp.type=CreateTime change.
Thu, Jul 5, 7:16 PM · Patch-For-Review, Analytics-Kanban, Analytics
Ottomata added a comment to T190443: Spark Jupyter Notebook integration.

Alright, to install for existing users and installations, we first need to do two things. First, reinstall the jupyterhub instance venv at /srv/jupyterhub/venv:

Thu, Jul 5, 6:26 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata moved T190443: Spark Jupyter Notebook integration from In Progress to In Code Review on the Analytics-Kanban board.
Thu, Jul 5, 6:14 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata moved T198738: Upgrade SWAP's JupyterLab from beta 1 to beta 2 from Next Up to Done on the Analytics-Kanban board.
Thu, Jul 5, 6:14 PM · Analytics-Kanban, Patch-For-Review, Analytics, Product-Analytics
Ottomata set the point value for T198738: Upgrade SWAP's JupyterLab from beta 1 to beta 2 to 2.
Thu, Jul 5, 6:14 PM · Analytics-Kanban, Patch-For-Review, Analytics, Product-Analytics
Ottomata added a comment to T198738: Upgrade SWAP's JupyterLab from beta 1 to beta 2.

Heya, as part of T190443 I've gone ahead and updated the JupyterLab dependency to 0.32.1 and installed this into all user venvs. Users will have to restart their jupyterhub process as you discovered, but it is there!

Thu, Jul 5, 6:14 PM · Analytics-Kanban, Patch-For-Review, Analytics, Product-Analytics
Ottomata added a comment to T196032: Huge messages in eqiad.mediawiki.job.cirrusSearchElasticaWrite (and other?) topics.

So, we've recently re-enabled job queue topic mirroring between main-eqiad and main-codfw, including this cirrusSearchElasticaWrite topic. I haven't seen any of these errors in those MirrorMaker instances, which means either that there haven't been any of these huge messages since we started mirroring again, or that there is something specific to the main-eqiad -> jumbo-eqiad instance configuration that causes the message request size when producing to jumbo-eqiad to be too large.

Thu, Jul 5, 5:23 PM · Discovery-Search (Current work), Analytics-Kanban, EventBus, MediaWiki-JobQueue, Services (designing), Analytics
Ottomata claimed T190443: Spark Jupyter Notebook integration.
Thu, Jul 5, 4:06 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata moved T190443: Spark Jupyter Notebook integration from Next Up to In Progress on the Analytics-Kanban board.
Thu, Jul 5, 4:06 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

For example, one of the cool tools that Confluent provides is Kafka Connect

Kafka Connect is an API and service that is part of Apache Kafka; it is not provided by Confluent.

Thu, Jul 5, 1:33 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics

Tue, Jul 3

Ottomata triaged T198764: Users should be able to read their jupyter instance logs as Normal priority.
Tue, Jul 3, 9:12 PM · Analytics
Ottomata added a comment to T198738: Upgrade SWAP's JupyterLab from beta 1 to beta 2.

Since the dependencies are installed in each user's personal virtualenv, you can actually do this yourself in the Jupyter CLI. Launch a terminal in Jupyter and then do

Tue, Jul 3, 6:28 PM · Analytics-Kanban, Patch-For-Review, Analytics, Product-Analytics
Ottomata added a project to T190443: Spark Jupyter Notebook integration: Analytics-Kanban.
Tue, Jul 3, 5:05 PM · Analytics-Kanban, Patch-For-Review, Analytics
Ottomata added a comment to T198424: Order Data Lake Hardware .

I'm pretty sure that whatever we end up using for this, the more memory we have the better.

Tue, Jul 3, 3:10 PM · Analytics-Kanban, Analytics

Mon, Jul 2

Ottomata updated subscribers of T188275: Jupyter Notebooks TLC 2018-2019.
Mon, Jul 2, 6:19 PM · Analytics
Ottomata added a comment to T188275: Jupyter Notebooks TLC 2018-2019.

@joal, would you be willing / have time to collaborate with me on this project over the next year? I think there are larger architectural decisions to think about.

Mon, Jul 2, 6:18 PM · Analytics
Ottomata added a comment to T188275: Jupyter Notebooks TLC 2018-2019.

Other cool things to try:

Mon, Jul 2, 6:17 PM · Analytics
Ottomata updated the task description for T185233: Modern Event Platform (with EventLogging of the Future (EoF)).
Mon, Jul 2, 2:28 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban

Fri, Jun 29

Ottomata updated the task description for T185233: Modern Event Platform (with EventLogging of the Future (EoF)).
Fri, Jun 29, 8:10 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Ottomata updated subscribers of T185233: Modern Event Platform (with EventLogging of the Future (EoF)).
Fri, Jun 29, 7:38 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Ottomata added a comment to T185233: Modern Event Platform (with EventLogging of the Future (EoF)).

Ok! Q4 is over, and we've completed the interview process. Marshall, Dan and I spoke with analysts, product managers, and product and tech department engineers. My messy notes are here. I'll try and summarize some of the important and interesting bits that we should make sure we consider as we work on this program over the next year.

Fri, Jun 29, 7:35 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Ottomata added a project to T120242: Reliable (atomic) MediaWiki event production: EventBus.
Fri, Jun 29, 7:21 PM · EventBus, Analytics, Services (later)
Ottomata added a comment to T185170: Enable EventBus on all wikis.

Can this be closed?

Fri, Jun 29, 7:18 PM · Services (done), MediaWiki-JobQueue, EventBus, Analytics
Ottomata added a comment to T198490: Use kafka for communication from analytics cluster to elasticsearch.

wait for kubernetes and stream processing? :D

Fri, Jun 29, 4:49 PM · Patch-For-Review, Analytics, Discovery-Search (Current work)
Ottomata added a comment to T198490: Use kafka for communication from analytics cluster to elasticsearch.

I am curious, what would the data look like? Would it be possible to use something like https://github.com/confluentinc/kafka-connect-elasticsearch ? I don't know exactly how that works, perhaps it only allows appending new documents, not updating indexes.

Fri, Jun 29, 3:24 PM · Patch-For-Review, Analytics, Discovery-Search (Current work)
Ottomata updated subscribers of T198256: RFC: Modern Event Platform - Choose Schema Tech.
Fri, Jun 29, 2:37 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

AFAICT JavaScript, PHP, Python, and Java all have libraries for JSONSchema draft 6.

Fri, Jun 29, 1:24 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata added a comment to T198256: RFC: Modern Event Platform - Choose Schema Tech.

Yeah, both protobufs and thrift are options, but neither has the advantages that Avro does, and both share many of the same disadvantages.

Fri, Jun 29, 2:54 AM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics

Thu, Jun 28

Ottomata added a comment to T182993: TLS security review of the Kafka stack.

Woo hoo!

Thu, Jun 28, 1:24 PM · Patch-For-Review, Traffic, User-Elukey, Analytics-Kanban, Analytics-Cluster, Operations

Wed, Jun 27

Ottomata added a project to T192641: Reimage thorium to Debian Stretch: Analytics-Kanban.
Wed, Jun 27, 3:53 PM · Analytics-Kanban, Analytics
Ottomata added a comment to T192641: Reimage thorium to Debian Stretch.

Luca and I just discussed, and decided that we should upgrade thorium to stretch anyway, and then later think about moving sites elsewhere.

Wed, Jun 27, 3:53 PM · Analytics-Kanban, Analytics
Ottomata added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

Ya fine with me too. A lot of these bits will likely get refactored as part of T185233, but the /beacon endpoint is unlikely to change.

Wed, Jun 27, 3:27 PM · User-Ryasmeen, Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging
Ottomata added a comment to T196904: Some VirtualPageView are too long and fail EventLogging processing.

@Jdrewniak, I think it'd be fine to modify the eventlogging codebase to take a parameter to support chomping particular fields... although I see how that could get messy fast.

Wed, Jun 27, 3:13 PM · User-Ryasmeen, Readers-Web-Kanbanana-Board, Page-Previews, Analytics, Readers-Web-Backlog, Analytics-EventLogging
Ottomata moved T197254: Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS from In Code Review to Done on the Analytics-Kanban board.
Wed, Jun 27, 2:34 PM · Patch-For-Review, Analytics-Kanban, Services (watching), Analytics

Tue, Jun 26

Ottomata updated subscribers of T198256: RFC: Modern Event Platform - Choose Schema Tech.
Tue, Jun 26, 8:10 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata renamed T198256: RFC: Modern Event Platform - Choose Schema Tech from RFC: Modern Event Platform - Choose Schema Tech1 to RFC: Modern Event Platform - Choose Schema Tech.
Tue, Jun 26, 8:03 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata added a subtask for T185233: Modern Event Platform (with EventLogging of the Future (EoF)): T198256: RFC: Modern Event Platform - Choose Schema Tech.
Tue, Jun 26, 8:03 PM · Services (watching), Analytics-EventLogging, EventBus, Analytics, Analytics-Kanban
Ottomata added a parent task for T198256: RFC: Modern Event Platform - Choose Schema Tech: T185233: Modern Event Platform (with EventLogging of the Future (EoF)).
Tue, Jun 26, 8:03 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata created T198256: RFC: Modern Event Platform - Choose Schema Tech.
Tue, Jun 26, 8:02 PM · Operations, Services (designing), Analytics-EventLogging, EventBus, TechCom-RFC, Analytics
Ottomata added a comment to T149239: Ensure consistency of secondary data for external consumers.

There should be a mechanism for external tools to be notified when an edit has been fully processed.

Tue, Jun 26, 1:51 PM · TechCom-RFC, User-Daniel, Services (watching), MediaWiki-API, MediaWiki-Recent-changes, MediaWiki-Page-editing

Mon, Jun 25

Ottomata added a comment to T161731: Create reliable change stream for specific wiki.

OO yes @Smalyshev and in case you didn't see, we also increased retention of mediawiki topics to 31 days in the main kafka clusters.

Mon, Jun 25, 11:30 PM · Patch-For-Review, User-Smalyshev, EventBus, Wikimedia-Stream, Analytics, Epic, Wikidata-Query-Service, Discovery, Wikidata
Ottomata closed T152731: Implement server side filtering (if we should) as Declined.
Mon, Jun 25, 9:14 PM · Analytics, Patch-For-Review, EventBus, Wikimedia-Stream
Ottomata closed T152731: Implement server side filtering (if we should), a subtask of T130651: EventStreams, as Declined.
Mon, Jun 25, 9:14 PM · Services (watching), User-mobrovac, Analytics-Kanban, EventBus, Wikimedia-Stream
Ottomata added a comment to T197896: Make various auth libraries available on stat* machines.

oauth2client and oauthlib were easy because they already have .deb packages in Debian. We'll have to make a .deb package for google-auth.

Mon, Jun 25, 8:55 PM · Patch-For-Review, Product-Analytics, Analytics, SEO
Ottomata updated the task description for T197896: Make various auth libraries available on stat* machines.
Mon, Jun 25, 8:54 PM · Patch-For-Review, Product-Analytics, Analytics, SEO
Ottomata added a comment to T197254: Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS.

codfw -> eqiad re-enabled. If all is well, I'll do eqiad -> codfw tomorrow.

Mon, Jun 25, 5:43 PM · Patch-For-Review, Analytics-Kanban, Services (watching), Analytics
Ottomata added a comment to T196345: eqiad: (1) new stat box to offload users from stat1005.

We have budget for a new stat box next FY. We'd like to use that budget to order the new box, move stat1005 users to it, and then use stat1005 for what the new stat box was going to be (dedicated researcher host). This will minimize moving users around. We have to order new hardware anyway, so we might as well do it this way.

Mon, Jun 25, 3:43 PM · hardware-requests, Operations, Analytics
Ottomata added a comment to T198093: Add a safe failover for analytics1003.

Hm! interesting.

Mon, Jun 25, 2:03 PM · User-Elukey, Analytics
Ottomata added a comment to T198093: Add a safe failover for analytics1003.

there might be a chance that the snapshot used in a restore emergency operation leads to a corrupted database

Is there? We don't stop MariaDB, but mylvmbackup locks the tables (and flushes writes?) before taking the snapshot.

Mon, Jun 25, 1:59 PM · User-Elukey, Analytics

Thu, Jun 21

Ottomata added a comment to T186559: Provide data dumps in the Analytics Data Lake.

Q: How does Elasticsearch get the text for indexing?

Thu, Jun 21, 5:47 PM · Research, Analytics
Ottomata moved T197503: Archive operations/puppet/varnishkafka repository from Incoming to Operational Excellence on the Analytics board.
Thu, Jun 21, 4:43 PM · Analytics, Operations, Cleanup
Ottomata changed the status of T70477: Story: WikimetricsUser runs report against all wikis from Invalid to Resolved.
Thu, Jun 21, 4:42 PM · Analytics, Story, Analytics-Wikimetrics
Ottomata closed T70477: Story: WikimetricsUser runs report against all wikis as Invalid.
Thu, Jun 21, 4:42 PM · Analytics, Story, Analytics-Wikimetrics
Ottomata closed T70478: EEVSUser selects ALL wikis, a subtask of T70477: Story: WikimetricsUser runs report against all wikis, as Invalid.
Thu, Jun 21, 4:41 PM · Analytics, Story, Analytics-Wikimetrics
Ottomata closed T70478: EEVSUser selects ALL wikis as Invalid.
Thu, Jun 21, 4:41 PM · Analytics, Story, Analytics-Dashiki
Ottomata moved T197237: Requesting access for mbsantos from Incoming to Radar on the Analytics board.
Thu, Jun 21, 4:41 PM · Patch-For-Review, Analytics, Operations, SRE-Access-Requests
Ottomata moved T197254: Re-enable cross DC mirroring of job and change-prop Kafka topics over TLS from Incoming to Kafka Work on the Analytics board.
Thu, Jun 21, 4:40 PM · Patch-For-Review, Analytics-Kanban, Services (watching), Analytics
Ottomata triaged T197707: Degraded RAID on dbstore1002 as High priority.
Thu, Jun 21, 4:40 PM · User-Elukey, Analytics, ops-eqiad, Operations
Ottomata raised the priority of T197707: Degraded RAID on dbstore1002 from High to Needs Triage.
Thu, Jun 21, 4:40 PM · User-Elukey, Analytics, ops-eqiad, Operations
Ottomata moved T197828: Fix "score_schema" -- invalid JSON Schema from Incoming to Data Quality on the Analytics board.
Thu, Jun 21, 4:40 PM · Analytics-Kanban, User-Ladsgroup, Scoring-platform-team (Current), Services (watching), ORES, Analytics, EventBus
Ottomata triaged T197276: turnilo x axis improperly labeled as Normal priority.
Thu, Jun 21, 4:39 PM · Analytics
Ottomata added a comment to T197276: turnilo x axis improperly labeled.

HMmm https://github.com/allegro/turnilo/issues/105

Thu, Jun 21, 4:38 PM · Analytics
Ottomata triaged T197277: Pageviews agent=bot is always 0 as Normal priority.
Thu, Jun 21, 4:36 PM · Analytics, Pageviews-API, Tool-Pageviews

Wed, Jun 20

Ottomata added a comment to T167180: Emit revision-score event to EventBus and expose in EventStreams.

Due to the very hairy problems in T195979 and T197000, I'm considering removing revision-score from EventStreams and reopening this task as a parent of those two. We shouldn't expose the stream until we are sure of the schema we will use. YARRGHH

Wed, Jun 20, 9:37 PM · Patch-For-Review, Services (done), Analytics-Kanban, Scoring-platform-team, Reading-Infrastructure-Team-Backlog, Trending-Service, Analytics, EventBus, ORES
Ottomata added a comment to T197000: Modify revision-score schema so that model probabilities won't conflict.

Oh, this is not easy. The schemas have a few incompatible field names, as noted in T197000.

Wed, Jun 20, 9:33 PM · Scoring-platform-team, Analytics-Kanban, Patch-For-Review, User-Ladsgroup, Services (watching), ORES, Analytics, EventBus