
Ottomata (Andrew Otto)
User

User Details

User Since
Oct 9 2014, 4:50 PM (379 w, 5 d)
Availability
Available
IRC Nick
ottomata
LDAP User
Ottomata
MediaWiki User
Ottomata

Recent Activity

Today

Ottomata added a comment to T292699: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation.

I believe restarting Jupyter servers will be necessary, but that should be all.

Tue, Jan 18, 5:56 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
Ottomata claimed T292699: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation.
Tue, Jan 18, 5:26 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
Ottomata moved T292699: Conda's CPPFLAGS may not be correct when pip installing a package that needs c/cpp compilation from Next Up to In Progress on the Data-Engineering-Kanban board.
Tue, Jan 18, 5:25 PM · Data-Engineering-Kanban, Data-Engineering, Analytics-Kanban
Ottomata moved T299398: Migrate AQS hourly job from Next Up to In Progress on the Data-Engineering-Kanban board.
Tue, Jan 18, 5:23 PM · Airflow, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T291120: MediaWiki Event Carried State Transfer - Problem Statement.

Ok, going with 'MediaWiki Event Carried State Transfer' as title.

Tue, Jan 18, 3:37 PM · Analytics, Platform Engineering, Data-Engineering, Event-Platform, tech-decision-forum
Ottomata renamed T291120: MediaWiki Event Carried State Transfer - Problem Statement from MediaWiki Events as a Source of Truth - Problem Statement to MediaWiki Event Carried State Transfer - Problem Statement.
Tue, Jan 18, 3:37 PM · Analytics, Platform Engineering, Data-Engineering, Event-Platform, tech-decision-forum
Ottomata added a comment to T299343: Requesting access to analytics clients for mfossati.

Approved!

Tue, Jan 18, 2:28 PM · Patch-For-Review, SRE, SRE-Access-Requests
Ottomata closed T296945: Deploy research_poc Swift credidentials to Hadoop, a subtask of T294380: Storage request for datasets published by research team, as Resolved.
Tue, Jan 18, 2:18 PM · SRE-swift-storage
Ottomata closed T296945: Deploy research_poc Swift credidentials to Hadoop as Resolved.
Tue, Jan 18, 2:18 PM · Data-Engineering-Kanban, Data-Engineering, SRE-swift-storage
Ottomata added a comment to T296945: Deploy research_poc Swift credidentials to Hadoop.

Hm, perhaps, although I'm not sure where. This is sort of a one-off. We'd love to have more first-class support for exporting to swift, but atm it's a little hacky.

Tue, Jan 18, 2:18 PM · Data-Engineering-Kanban, Data-Engineering, SRE-swift-storage
Ottomata added a comment to T268027: Automate EventGate validation error reporting.

Just checked, and some of the dashboard links had changed, so I updated them in https://wikitech.wikimedia.org/wiki/Event_Platform/Instrumentation_How_To#Viewing_schema_validation_errors

Tue, Jan 18, 2:13 PM · Analytics, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Event-Platform
Ottomata added a comment to T268027: Automate EventGate validation error reporting.

Not specifically, but there is this eventgate validation error dashboard, which allows you to add a filter per stream name if you like.

Tue, Jan 18, 2:11 PM · Analytics, Data-Engineering, Better Use Of Data, Product-Data-Infrastructure, Event-Platform
Ottomata added a comment to T299186: Requesting access to analytics-privatedata-users for Nick Ray.

Approved

Tue, Jan 18, 2:06 PM · Patch-For-Review, SRE, SRE-Access-Requests
Ottomata closed T296699: Pool eventgate-main in both datacenters (active/active) as Resolved.

Yup should be!

Tue, Jan 18, 3:13 AM · Analytics, Data-Engineering, Sustainability (Incident Followup), Event-Platform, SRE

Fri, Jan 14

Ottomata added a comment to T299166: Run Atlas on cloud services cluster.

Nice

Fri, Jan 14, 5:32 PM · Data-Catalog, Data-Engineering-Kanban, Epic, Data-Engineering

Thu, Jan 13

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Been experimenting with how we might create Airflow + Skein + Spark submit integration:
Here's an example SkeinSparkSubmitOperator that leverages the upstream provided (currently hacked) SparkSubmitHook to get the proper spark-submit command to run.

Thu, Jan 13, 10:22 PM · Data-Engineering-Kanban, Airflow

Tue, Jan 11

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Okay, got some tests in and created a merge request:

Tue, Jan 11, 9:10 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T298721: MobileWikiAppiOSUserHistory sending incompatible data.

Ok, thank you!

Tue, Jan 11, 8:39 PM · Product-Analytics (Kanban), Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics, Data-Engineering
Ottomata added a comment to T298981: Create Kerberos login for Brian King (bking).

Approved.

Tue, Jan 11, 4:19 PM · SRE, Data-Engineering, LDAP-Access-Requests, SRE-Access-Requests

Mon, Jan 10

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Alright, figured some stuff out (thanks Marcel and Joseph for the brain bounces).

Mon, Jan 10, 9:17 PM · Data-Engineering-Kanban, Airflow

Fri, Jan 7

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

End of day update:

Fri, Jan 7, 10:37 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Example of launching a custom spark version from a packed conda env using skein:

Fri, Jan 7, 9:00 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T298721: MobileWikiAppiOSUserHistory sending incompatible data.

Manually alter the Hive table event.device_level_enabled field to a string. This will likely cause any old data in Hive to be unreadable or corrupted.

Done.

Fri, Jan 7, 8:19 PM · Product-Analytics (Kanban), Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics, Data-Engineering
Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

  • Have been improving and experimenting with conda-dist built envs.
  • I can successfully build conda dist envs with and without python, with and without pyspark (different versions) and run them in yarn cluster mode via call.py, with *almost* no local files needed.
  • However, to run different spark versions, the launcher (e.g. airflow scheduler) needs to have a pyspark local, e.g. to be able to shell out to spark-submit.
    • I don't see how it will be possible to support different versions of spark without those dependencies available and unpacked on the airflow scheduler.
    • Unless...we always use skein.

Fri, Jan 7, 8:14 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T298786: Requesting access to Data Engineering team resources for Sandra Ebele Nwachukwu.

Approved.

Fri, Jan 7, 5:41 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T296470: Initialize WCQS production servers.

Ran on main-eqiad and main-codfw kafka:

Fri, Jan 7, 3:09 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
Ottomata added a comment to T298657: Requesting access to the data engineering team resources for Antoine Qu'hen.

No I think any SRE can do the work; IIUC clinic duty exists to make sure things like this don't fall through the cracks. Proceed!

Fri, Jan 7, 2:05 PM · Data-Engineering-Kanban, SRE, SRE-Access-Requests

Thu, Jan 6

Ottomata added a comment to T298721: MobileWikiAppiOSUserHistory sending incompatible data.

Hm, something must have sent this data with a boolean value initially. That's the only way the table would have been created this way.

Thu, Jan 6, 9:36 PM · Product-Analytics (Kanban), Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics, Data-Engineering
Ottomata updated the task description for T298657: Requesting access to the data engineering team resources for Antoine Qu'hen.
Thu, Jan 6, 7:18 PM · Data-Engineering-Kanban, SRE, SRE-Access-Requests
Ottomata added a comment to T298657: Requesting access to the data engineering team resources for Antoine Qu'hen.

Approved.

Thu, Jan 6, 7:17 PM · Data-Engineering-Kanban, SRE, SRE-Access-Requests
Ottomata created T298721: MobileWikiAppiOSUserHistory sending incompatible data.
Thu, Jan 6, 7:16 PM · Product-Analytics (Kanban), Wikipedia-iOS-App-Backlog, iOS-app-feature-Analytics, Data-Engineering

Wed, Dec 22

Ottomata added a comment to T293700: Decide whether and how to consolidate Wmfdata-Python and Refinery's Python modules.

More related stuff: In T296543: Tooling for Deploying Conda Environments we are working on some shared python libs related to airflow, and along the way I've copied and modified some conda-related code from wmfdata.

Wed, Dec 22, 3:42 AM · Data-Engineering, Product-Analytics, wmfdata-python

Tue, Dec 21

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Update. Stuff not quite ready for review, but I've done a lot of bootstrapping work in https://gitlab.wikimedia.org/otto/workflow_utils/.

Tue, Dec 21, 11:36 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T298124: Requesting access to analytics-platform-eng-admins for lbowmaker.

Should Luke be added as a member to Platform Engineering just like any other existing member there? That would cover this request and possibly more.

Tue, Dec 21, 9:30 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T298124: Requesting access to analytics-platform-eng-admins for lbowmaker.

Approved.

Tue, Dec 21, 7:16 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T297734: Hive query failure in Jupyter notebook on stat1005.

Right, and it isn't a race condition, since the existent parquet-1.log was created Nov 2 or before.

Tue, Dec 21, 2:03 PM · Data-Engineering-Kanban, Data-Engineering

Mon, Dec 20

Ottomata added a comment to T294911: Apparent latency warning in 90th centile of eventgate-logging-external.

 BadRequestError: request aborted at IncomingMessage.onAborted

https://github.com/expressjs/body-parser/blob/master/README.md#request-aborted

Mon, Dec 20, 2:35 PM · Analytics, Data-Engineering, Event-Platform, Observability-Alerting
Ottomata added a comment to T294911: Apparent latency warning in 90th centile of eventgate-logging-external.

As we know, the primary use case for eventgate-logging-external is that of Network Error Logging (aka NEL).

Just a clarification, the original use case for eventgate-logging-external was for mediawiki.client-error logging. Until I looked just now, I had assumed that this was also the majority of events here, but you are right, NEL is larger.

Mon, Dec 20, 2:33 PM · Analytics, Data-Engineering, Event-Platform, Observability-Alerting
Ottomata added a comment to T297927: Requesting access to Superset for Spatel.

Approved!

Mon, Dec 20, 2:27 PM · SRE, Product-Analytics, SRE-Access-Requests

Dec 17 2021

Ottomata added a comment to T222795: Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats.

Oh, is that not related? Anyway, I'm not aware of any alerts on GC stats, and at the very worst we'll have to adjust the grafana panel next time we look at it.

Dec 17 2021, 7:22 PM · observability, serviceops-radar, Services (later), service-runner, SRE
Ottomata added a comment to T222795: Re-evaluate service-runner's (ab)use of statsd timing metric for nodejs GC stats.

I had to go back and check, but both eventgate and eventstreams are using service-runner prometheus directly, and no longer using prometheus-statsd-exporter.

Dec 17 2021, 7:21 PM · observability, serviceops-radar, Services (later), service-runner, SRE

Dec 16 2021

Ottomata updated subscribers of T297908: Requesting access to analytics-privatedata-users for ryankemper.
Dec 16 2021, 9:10 PM · SRE, SRE-Access-Requests
Ottomata updated subscribers of T297910: Requesting shell access for Brian King.

Approved!

Dec 16 2021, 9:10 PM · SRE, LDAP-Access-Requests, SRE-Access-Requests
Ottomata added a comment to T297908: Requesting access to analytics-privatedata-users for ryankemper.

Approved!

Dec 16 2021, 9:08 PM · SRE, SRE-Access-Requests
Ottomata reassigned T293938: (Need By: TBD) rack/setup/install an-test-coord1002 from Ottomata to BTullis.

I'm not familiar with what is going on with this node atm, pinging @BTullis!

Dec 16 2021, 5:04 PM · Analytics-Clusters, SRE, ops-eqiad, DC-Ops
Ottomata added a comment to T297842: Requesting wmf LDAP and analytics-private-data access for Mary Munyoki.

Approved!

Dec 16 2021, 2:02 PM · SRE, SRE-Access-Requests
Ottomata added a comment to T292586: Sticky Header: Create schema to track returning to the top of the page.

Then what is the request referring to in the common fragment?

Dec 16 2021, 2:00 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), QTE-TestingOverview, Patch-For-Review, MediaWiki-extensions-WikimediaEvents, Desktop Improvements, Readers-Web-Backlog (Kanbanana-FY-2021-22)
Ottomata added a comment to T297841: Apache atlas build fails due to expired certificate (https://maven.restlet.com).

You might be able to work around this by manually downloading and then uploading this dependency to our archiva, and then making sure your own maven repo settings.xml is configured to use our archiva.

Dec 16 2021, 1:55 PM · User-razzi, Data-Engineering-Kanban, Data-Engineering

Dec 15 2021

Ottomata added a comment to T296670: Run Atlas on test cluster.

We should punch a hole to test-zookeeper. However!

Dec 15 2021, 4:24 PM · User-razzi, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

Yeahh...but then we have to manage and maintain another custom ingestion job. We're trying to reduce the number of those.

Dec 15 2021, 2:43 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

I'd prefer to avoid scheduling another special job for this if we can. Can we make the NetflowTransform functions smart enough to know to do the right thing based on the input data somehow?

Dec 15 2021, 2:34 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

Would it hurt to keep the same augmentations? If the schema is the sameish (it sounds like it is), we can just apply the exact same pipeline on netflow internal with no code changes.

Dec 15 2021, 2:24 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE

Dec 14 2021

Ottomata added a comment to T297734: Hive query failure in Jupyter notebook on stat1005.

Hm, strange. In /etc/hive/conf/parquet-logging.properties:

Dec 14 2021, 6:29 PM · Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T296699: Pool eventgate-main in both datacenters (active/active).

root@puppetmaster1001:~# confctl --object-type discovery select 'dnsdisc=eventgate-main,name=codfw' set/pooled=true
eventgate-main/codfw: pooled changed False => True

Dec 14 2021, 6:26 PM · Analytics, Data-Engineering, Sustainability (Incident Followup), Event-Platform, SRE
Ottomata added a comment to T297114: Requesting Kerberos Identity.

This was originally requested and approved in T295552: Requesting access to analytics-privatedata-users for SCherukuwada but not completed. I think you mean to say that your email was 'scherukuwada@wikimedia.org'.

Dec 14 2021, 5:02 PM · Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T295485: [SPIKE] Investigate Approach for Shipping Airflow/Data Pipeline Metrics.

Oh! This is kind of cool:

Dec 14 2021, 2:39 PM · Spike, Generated Data Platform
Ottomata added a comment to T291120: MediaWiki Event Carried State Transfer - Problem Statement.

Perhaps a better title would be "Event Carried State Transfer of MediaWiki State"?

Dec 14 2021, 2:36 PM · Analytics, Platform Engineering, Data-Engineering, Event-Platform, tech-decision-forum
Ottomata added a comment to T120242: Consistent MediaWiki state change events | MediaWiki events as source of truth.

This is the kind of thing we need to have a way to reconcile: https://wikitech.wikimedia.org/wiki/Incident_documentation/2021-11-25_eventgate-main_outage

Dec 14 2021, 2:34 PM · Analytics, Data-Engineering, DBA, WMF-Architecture-Team, Platform Team Legacy (Later), Event-Platform, Services (later)

Dec 13 2021

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

I've been able to build a packed and stacked (stack 'N' pack?) conda env on top of anaconda-wmf. This allows us to build the conda env without including dependencies already available on all of the hadoop workers.

Dec 13 2021, 5:27 PM · Data-Engineering-Kanban, Airflow
Ottomata moved T294024: [Airflow] Automate sync'ing archiva packages to HDFS from In Progress to Done on the Airflow board.
Dec 13 2021, 4:41 PM · Data-Engineering-Kanban, Airflow, Data-Engineering
Ottomata moved T295380: [Airflow] Set up scap deployment from In Progress to Done on the Airflow board.
Dec 13 2021, 4:41 PM · Patch-For-Review, Data-Engineering-Kanban, Airflow, Data-Engineering
Ottomata moved T296543: Tooling for Deploying Conda Environments from Next Up to In Progress on the Data-Engineering-Kanban board.
Dec 13 2021, 4:03 PM · Data-Engineering-Kanban, Airflow
Ottomata edited projects for T296543: Tooling for Deploying Conda Environments, added: Data-Engineering-Kanban; removed Analytics-Kanban.
Dec 13 2021, 4:03 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T269832: Add a presto query logger.

Okay, sounds good! Just FYI: T291645: Integrate Event Platform and ECS logs

Dec 13 2021, 2:17 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T297604: cergen should include the cert's name in SAN too.

I have some vague feeling that there was a reason not to do this, but I can't recall why and I can't find any docs by my past self to indicate that it was (which I usually do if intentional).

Dec 13 2021, 2:14 PM · SRE Observability (FY2021/2022-Q3), User-fgiunchedi

Dec 10 2021

Ottomata updated subscribers of T296543: Tooling for Deploying Conda Environments.
Dec 10 2021, 8:39 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Ok, more findings, this time about how to run a python function in a packaged conda environment as a Spark job without having that conda env and python function locally on the Airflow worker.

Dec 10 2021, 8:38 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T269832: Add a presto query logger.

If you end up wanting to produce this as an event on Event Platform, https://gerrit.wikimedia.org/r/plugins/gitiles/wikimedia-event-utilities/#generate-an-event-prior-sending-it-to-kafka will help you do so in Java.

Dec 10 2021, 7:02 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T269832: Add a presto query logger.

Very cool!

Dec 10 2021, 7:01 PM · Patch-For-Review, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T297400: '.search_result_page_id' should be integer.

Okay, thanks!

Dec 10 2021, 5:12 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), SDAW-MediaSearch, Patch-For-Review, Structured-Data-Backlog (Current Work), Wikimedia-production-error
Ottomata added a comment to T296670: Run Atlas on test cluster.

Failed to execute goal com.github.eirslett:frontend-maven-plugin:1.4:npm (npm install) on project atlas-dashboardv2: Failed to run task: 'npm install' failed. org.apache.commons.exec.ExecuteException: Process exited with an error: 254 (Exit value: 254) -> [Help 1]

Dec 10 2021, 2:33 PM · User-razzi, Data-Engineering-Kanban, Data-Engineering
Ottomata added a comment to T297400: '.search_result_page_id' should be integer.

wmf.12 is currently already out to test and commons, and MediaSearch isn't deployed on any group 2 wikis, so not a train blocker.

@Seddon you sure? Since yesterday the validation error rate jumped a ton and is causing more alerts

Dec 10 2021, 2:29 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), SDAW-MediaSearch, Patch-For-Review, Structured-Data-Backlog (Current Work), Wikimedia-production-error

Dec 9 2021

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Did you consider unifying package management and expressing pip+conda deps in the same environment.yml file?

Dec 9 2021, 8:10 PM · Data-Engineering-Kanban, Airflow
Ottomata added a subtask for T294258: Data Catalog Requirements: T296848: [SPIKE] Collect backlog of data security / privacy design choices for future data governance.
Dec 9 2021, 5:39 PM · Data-Catalog, Epic, Data-Engineering-Kanban, Data-Engineering
Ottomata added a parent task for T296848: [SPIKE] Collect backlog of data security / privacy design choices for future data governance: T294258: Data Catalog Requirements.
Dec 9 2021, 5:39 PM · Metrics-Platform
Ottomata added a subtask for T296848: [SPIKE] Collect backlog of data security / privacy design choices for future data governance: T276955: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization.
Dec 9 2021, 5:38 PM · Metrics-Platform
Ottomata added a parent task for T276955: Develop comprehensive process, guidelines, and roles for Event Platform stream sanitization: T296848: [SPIKE] Collect backlog of data security / privacy design choices for future data governance.
Dec 9 2021, 5:38 PM · Analytics, Data-Engineering, Product-Analytics, Event-Platform, Better Use Of Data
Ottomata updated subscribers of T297400: '.search_result_page_id' should be integer.

I'm not sure if this should be a train blocker, this is an analytics instrumentation event.

Dec 9 2021, 4:01 PM · MW-1.38-notes (1.38.0-wmf.12; 2021-12-06), SDAW-MediaSearch, Patch-For-Review, Structured-Data-Backlog (Current Work), Wikimedia-production-error
Ottomata closed T295397: Herald rule for Data-Engineering as Resolved.
Dec 9 2021, 3:57 PM · Data-Engineering-Kanban, Data-Engineering, Phabricator
Ottomata added a comment to T295397: Herald rule for Data-Engineering.

Ah, it works!
https://phabricator.wikimedia.org/T297399

Dec 9 2021, 3:56 PM · Data-Engineering-Kanban, Data-Engineering, Phabricator
Ottomata closed T297399: dummy test task rule T295397 as Invalid.
Dec 9 2021, 3:56 PM · Data-Engineering, Event-Platform, Analytics
Ottomata created T297399: dummy test task rule T295397.
Dec 9 2021, 3:49 PM · Data-Engineering, Event-Platform, Analytics
Ottomata edited projects for T295427: Problem with delay caused by intake-analytics.wikimedia.org, added: Data-Engineering; removed Analytics.
Dec 9 2021, 3:35 PM · Analytics, Data-Engineering, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge
Ottomata added a comment to T295397: Herald rule for Data-Engineering.

if tagged with Analytics-Radar remove tag Data-Engineering

Tested, this works!
https://phabricator.wikimedia.org/herald/transcript/4545650/
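
For illustration only, the logic of the rule quoted above can be sketched in Python. Herald rules are configured in Phabricator itself, not written as code, so the function name apply_herald_rule and the sample tag sets are hypothetical; only the two tag names come from the rule text.

```python
def apply_herald_rule(tags):
    """Return a task's tags after applying the rule quoted above.

    Hypothetical helper: models only the condition and action of the rule,
    not Herald's actual implementation.
    """
    result = set(tags)
    # Condition: the task is tagged with Analytics-Radar.
    if "Analytics-Radar" in result:
        # Action: remove the Data-Engineering tag (a no-op if it is absent).
        result.discard("Data-Engineering")
    return result


if __name__ == "__main__":
    # Sample tag set, invented for this example.
    print(sorted(apply_herald_rule({"Analytics-Radar", "Data-Engineering", "Event-Platform"})))
```

A task without Analytics-Radar keeps its Data-Engineering tag unchanged.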

Dec 9 2021, 3:34 PM · Data-Engineering-Kanban, Data-Engineering, Phabricator
Ottomata added a project to T295427: Problem with delay caused by intake-analytics.wikimedia.org: Event-Platform.
Dec 9 2021, 3:31 PM · Analytics, Data-Engineering, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge
Ottomata removed projects from T295427: Problem with delay caused by intake-analytics.wikimedia.org: Analytics-Radar, Event-Platform.
Dec 9 2021, 3:30 PM · Analytics, Data-Engineering, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge
Ottomata edited projects for T295427: Problem with delay caused by intake-analytics.wikimedia.org, added: Analytics-Radar; removed Analytics.
Dec 9 2021, 3:30 PM · Analytics, Data-Engineering, Event-Platform, Metrics-Platform, Browser-Support-Microsoft-Edge
Ottomata reassigned T295397: Herald rule for Data-Engineering from Ottomata to Milimetric.
Dec 9 2021, 3:24 PM · Data-Engineering-Kanban, Data-Engineering, Phabricator
Ottomata added a comment to T295397: Herald rule for Data-Engineering.

My PR to have Data-Engineering added has been merged. @Urbanecm can we make a Herald rule to do

Dec 9 2021, 3:24 PM · Data-Engineering-Kanban, Data-Engineering, Phabricator
Ottomata added a comment to T297231: Sending Apache Spark metrics to PushGateway.

"spark_app_id"

Might get confusing if this is not the YARN application_id when in YARN...wait or is it?

Dec 9 2021, 2:12 PM · Observability-Metrics

Dec 8 2021

Ottomata added a project to T296543: Tooling for Deploying Conda Environments: Analytics-Kanban.
Dec 8 2021, 3:16 PM · Data-Engineering-Kanban, Airflow
Ottomata claimed T296543: Tooling for Deploying Conda Environments.
Dec 8 2021, 3:16 PM · Data-Engineering-Kanban, Airflow
Ottomata moved T296543: Tooling for Deploying Conda Environments from Backlog to In Progress on the Airflow board.
Dec 8 2021, 3:15 PM · Data-Engineering-Kanban, Airflow

Dec 7 2021

Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Experimental Dockerfile that does this here:

Dec 7 2021, 10:41 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

Or, hm, if we use pip for all other python dependencies, then a basic python only conda env could just use a requirements.txt file instead of a conda environment.yml file. Oh ho, yes, then conda environment.yml files would not specify much more than the python version to use, and other requirements would just be pip installed as usual.
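
To illustrate the idea, here is a minimal Python sketch that renders such a slim environment.yml from a Python version plus a list of pip requirements. The function name render_environment_yml, the environment name example-env, the Python version, and the listed packages are all hypothetical and are not taken from the tooling discussed in this task; the sketch assumes conda's usual nested pip: section in environment.yml.

```python
def render_environment_yml(name, python_version, pip_requirements):
    """Render a minimal conda environment.yml as text.

    Conda itself only pins the Python version (and provides pip);
    everything else is listed under the nested pip: section.
    """
    # Drop blank lines and comments, as found in a typical requirements.txt.
    reqs = [r.strip() for r in pip_requirements]
    reqs = [r for r in reqs if r and not r.startswith("#")]

    lines = [
        f"name: {name}",
        "dependencies:",
        f"  - python={python_version}",
        "  - pip",
    ]
    if reqs:
        lines.append("  - pip:")
        lines.extend(f"    - {r}" for r in reqs)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical inputs, chosen only for this example.
    requirements = ["requests==2.25.1", "# a comment line", "", "pyyaml"]
    print(render_environment_yml("example-env", "3.7", requirements), end="")
```

Keeping the conda side this small means the same requirements list could be shared with a plain pip workflow, which is the trade-off the comment above is weighing.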

Dec 7 2021, 8:48 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T296543: Tooling for Deploying Conda Environments.

So, I'm looking into how to automate generating these conda envs.

Dec 7 2021, 8:47 PM · Data-Engineering-Kanban, Airflow
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

So yeah, unless we can at least control the event format, we can't really make an Event Platform compatible event.

Dec 7 2021, 7:42 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE
Ottomata moved T296945: Deploy research_poc Swift credidentials to Hadoop from Ready to Deploy to Done on the Data-Engineering-Kanban board.
Dec 7 2021, 5:07 PM · Data-Engineering-Kanban, Data-Engineering, SRE-swift-storage
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

can I start this anytime, or we need to create the kafka topic somewhere?

Not really needed, unless you need to set special topic settings (like the # of partitions). The topic will be auto created the first time it is produced to.

Dec 7 2021, 2:40 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE
Ottomata added a comment to T263277: Collect netflow data for internal traffic.

Agree on the name of the flow :

Some guidelines: https://wikitech.wikimedia.org/wiki/Event_Platform/Schemas/Guidelines#Event_Data_Modeling_and_Schema_Naming

Dec 7 2021, 2:29 PM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Traffic-Icebox, Infrastructure-Foundations, netops, SRE