Phabricator

gmodena (GModena (WMF))
User

User Details

User Since
Nov 2 2020, 1:15 PM (181 w, 5 d)
Availability
Available
IRC Nick
gmodena
LDAP User
Gmodena
MediaWiki User
GModena (WMF)

Recent Activity

Thu, Apr 25

gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

The haproxy_id field has been added to messages.

Thu, Apr 25, 2:06 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena moved T353940: We should provide DQ integration with Python from In Review to Done on the Data-Engineering (Q4 2024 April 1st - June 30th) board.
Thu, Apr 25, 9:07 AM · Data-Engineering (Q4 2024 April 1st - June 30th)

Fri, Apr 19

gmodena closed T351117: Move analytics log from Varnish to HAProxy as Resolved.

I'm afraid mixing varnishkafka and Benthos payloads would break ingestion pipelines, since old and new events have different schemas. We could reuse the current topics, but we'd have to drain them first.

We can do both; for us it's just a matter of changing a string in Puppet. I think the decision is more on your side: choose the easiest/best option for you and we'll implement it!

Fri, Apr 19, 7:00 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena changed the point value for T362780: [DQ] Add support for distribution metrics in data quality exporters from 2 to 3.
Fri, Apr 19, 6:34 AM · Data-Engineering
gmodena set the point value for T362782: [DQ][NEEDS GROOMING] Add support for deequ's RowLevelSchemaValidator in refinery to 3.
Fri, Apr 19, 6:33 AM · Data-Engineering
gmodena set the point value for T362780: [DQ] Add support for distribution metrics in data quality exporters to 2.
Fri, Apr 19, 6:33 AM · Data-Engineering

Thu, Apr 18

gmodena set the point value for T362783: Add instrumentation for actor signatures to 1.
Thu, Apr 18, 1:46 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena set the point value for T362785: Add host level instrumentation on webrequest to 1.
Thu, Apr 18, 1:46 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T361853: [Datasets Config][Spike] Understand and document the details and conflicts between Datasets Config, Refine refactor, Dynamic EventStreamConfig, and Metrics Platform Instrumentation Configurator from Next Up to In progress on the Data-Engineering (Q4 2024 April 1st - June 30th) board.
Thu, Apr 18, 1:35 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

About the sequence issue, that's the most plausible hypothesis. We could append (or prepend) other pieces of information to the sequence number (like the HAProxy process id) to avoid duplicates, but we couldn't guarantee a monotonic increase (or even any increase) in this case. I suggest using the current approach for the moment and eventually reworking it later.
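A minimal sketch of the trade-off described above, assuming the HAProxy process id is prepended by shifting it into the high bits of a single integer. The 40-bit counter width, the 64-bit signed target, and the function name are hypothetical choices for illustration, not part of the actual pipeline.

```python
SEQ_BITS = 40  # hypothetical width reserved for the per-process counter
MAX_BITS = 63  # assume the result must fit a signed 64-bit column


def composite_sequence(process_id: int, local_seq: int) -> int:
    """Prepend a process id to a per-process sequence number (hypothetical encoding)."""
    if not 0 <= local_seq < (1 << SEQ_BITS):
        raise ValueError("local_seq does not fit in SEQ_BITS")
    if not 0 <= process_id < (1 << (MAX_BITS - SEQ_BITS)):
        raise ValueError("process_id does not fit in the remaining bits")
    return (process_id << SEQ_BITS) | local_seq


# Two processes emitting the same local sequence number no longer collide.
assert composite_sequence(1, 7) != composite_sequence(2, 7)

# Ordering across processes is lost: an event from process 2 that arrives first
# gets a larger value than a later event from process 1.
first = composite_sequence(2, 0)
second = composite_sequence(1, 1)
assert first > second  # not monotonically increasing in arrival order
```

Uniqueness comes from the process id in the high bits, while monotonicity only holds within a single process, which is the limitation noted in the comment.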

Thu, Apr 18, 1:12 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

Next steps: now that we are starting to collect more logs, we can start comparing current / new webrequest records.
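One possible shape for such a comparison, sketched under the assumption that a current record and a new record for the same request have already been paired (how to pair them is not covered here). The function name and the sample records are hypothetical; `haproxy_id` is borrowed from the field mentioned earlier on this task.

```python
from typing import Any, Dict


def compare_records(current: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, Any]:
    """Summarise how a new (HAProxy/Benthos) webrequest record differs from the current one."""
    shared = current.keys() & new.keys()
    return {
        "only_current": sorted(current.keys() - new.keys()),  # fields missing from the new record
        "only_new": sorted(new.keys() - current.keys()),  # fields only the new record has
        "changed": {k: (current[k], new[k]) for k in sorted(shared) if current[k] != new[k]},
    }


# Hypothetical pair of records already matched to the same request.
current = {"uri_host": "en.wikipedia.org", "http_status": "200", "sequence": 1}
new = {"uri_host": "en.wikipedia.org", "http_status": "200", "sequence": 2, "haproxy_id": 3}
diff = compare_records(current, new)
```

Aggregating these per-record summaries over a sample would show which fields are consistently added, dropped, or different between the two pipelines.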

Thu, Apr 18, 11:04 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Wed, Apr 17

gmodena moved T362783: Add instrumentation for actor signatures from Next Up to In Review on the Data-Engineering (Q4 2024 April 1st - June 30th) board.
Wed, Apr 17, 5:57 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T362785: Add host level instrumentation on webrequest from Next Up to In Review on the Data-Engineering (Q4 2024 April 1st - June 30th) board.
Wed, Apr 17, 5:57 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena created T362785: Add host level instrumentation on webrequest.
Wed, Apr 17, 3:18 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena created T362783: Add instrumentation for actor signatures.
Wed, Apr 17, 3:15 PM · Patch-For-Review, Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena created T362782: [DQ][NEEDS GROOMING] Add support for deequ's RowLevelSchemaValidator in refinery.
Wed, Apr 17, 3:08 PM · Data-Engineering
gmodena created T362780: [DQ] Add support for distribution metrics in data quality exporters.
Wed, Apr 17, 3:03 PM · Data-Engineering

Fri, Apr 5

gmodena added a parent task for T361017: [SPIKE] Can we express Event Platform configs in Datasets Config?: T361853: [Datasets Config][Spike] Understand and document the details and conflicts between Datasets Config, Refine refactor, Dynamic EventStreamConfig, and Metrics Platform Instrumentation Configurator.
Fri, Apr 5, 6:42 AM · Data-Engineering (Q4 2024 April 1st - June 30th), Spike, Event-Platform
gmodena added a subtask for T361853: [Datasets Config][Spike] Understand and document the details and conflicts between Datasets Config, Refine refactor, Dynamic EventStreamConfig, and Metrics Platform Instrumentation Configurator: T361017: [SPIKE] Can we express Event Platform configs in Datasets Config?.
Fri, Apr 5, 6:42 AM · Data-Engineering (Q4 2024 April 1st - June 30th)

Thu, Apr 4

gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

@gmodena you should have some more data to play with now, while I work on the performance optimization and on Benthos internal metrics...

Thu, Apr 4, 1:16 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mar 27 2024

gmodena created T361094: [NEEDS GROOMING] Orchestrate gobblin ingestion task with Airflow.
Mar 27 2024, 11:50 AM · Event-Platform, Data-Engineering

Mar 26 2024

gmodena moved T359051: eventstreams: change default num_workers to 0 from Ready to Deploy to Done on the Data-Engineering (Sprint 9) board.
Mar 26 2024, 3:37 PM · Data-Engineering (Sprint 9)
gmodena moved T359051: eventstreams: change default num_workers to 0 from In Review to Ready to Deploy on the Data-Engineering (Sprint 9) board.
Mar 26 2024, 3:37 PM · Data-Engineering (Sprint 9)
gmodena created T361017: [SPIKE] Can we express Event Platform configs in Datasets Config?.
Mar 26 2024, 1:57 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Spike, Event-Platform

Mar 25 2024

gmodena updated the task description for T353940: We should provide DQ integration with Python.
Mar 25 2024, 8:13 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena added a comment to T353940: We should provide DQ integration with Python.

I need to add a wrapper to the Alert generation SerDe.

Mar 25 2024, 8:04 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T353940: We should provide DQ integration with Python from In progress to In Review on the Data-Engineering (Sprint 9) board.
Mar 25 2024, 8:01 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 22 2024

gmodena added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

Tagging T360642: Remove extra fields currently sent to Kafka

Mar 22 2024, 8:14 AM · Patch-For-Review, Data-Engineering, Event-Platform
gmodena added a comment to T360642: Remove extra fields currently sent to Kafka.

These are the fields sent from Benthos that aren't present in the current webrequest stream:

Mar 22 2024, 8:12 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena updated subscribers of T360642: Remove extra fields currently sent to Kafka.
Mar 22 2024, 8:08 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a project to T360642: Remove extra fields currently sent to Kafka: Event-Platform.
Mar 22 2024, 7:58 AM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mar 21 2024

gmodena added a comment to T353940: We should provide DQ integration with Python.

Let's maybe pair on it?

I'd love to hack on this at the offsite!!

Mar 21 2024, 2:10 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

See https://github.com/wikimedia/service-runner/commit/b9c98eab5398413c16df2317562745f6ffe74439

Mar 21 2024, 11:39 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner

Mar 19 2024

gmodena added a project to T360450: Add $schema key to Benthos payload: Event-Platform.
Mar 19 2024, 4:37 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena updated subscribers of T360450: Add $schema key to Benthos payload.

For context: this is the approach we follow with other producers, e.g. Java.

Mar 19 2024, 4:33 PM · Event-Platform, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Mar 8 2024

gmodena added a comment to T353940: We should provide DQ integration with Python.

IIUC, the need for py4j is tied only to the fact that we developed helper code (e.g. HivePartition and DeequAnalyzersToDataQualityMetrics) that we'd like to reuse, correct?
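For context, reusing JVM helpers from Python usually means walking PySpark's py4j view down to a class by its fully qualified name. The sketch below assumes `spark.sparkContext._jvm` as the entry point (a private but widely used PySpark attribute); the function name is hypothetical, and since the source does not give the helpers' packages, a stand-in object replaces the real JVM so the example runs without Spark.

```python
from types import SimpleNamespace
from typing import Any


def resolve_jvm_class(jvm: Any, fqcn: str) -> Any:
    """Walk a py4j JVM view down to a fully qualified class name.

    With PySpark, `jvm` would typically be `spark.sparkContext._jvm`.
    """
    obj = jvm
    for part in fqcn.split("."):
        obj = getattr(obj, part)  # with py4j, each step resolves a package or class
    return obj


# Stand-in for a JVM view so the sketch runs without Spark; the class name is made up.
fake_jvm = SimpleNamespace(org=SimpleNamespace(example=SimpleNamespace(Helper="<JavaClass Helper>")))
helper = resolve_jvm_class(fake_jvm, "org.example.Helper")
```

With a live session, the same call would take the real package of a helper such as HivePartition. Python DataFrames are typically handed to JVM code through their private `_jdf` attribute.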

Mar 8 2024, 2:40 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 7 2024

gmodena created T359561: Add user fabfur to analytics-privatedata-users.
Mar 7 2024, 4:19 PM · Patch-For-Review, Data-Platform-SRE (2024.03.25 - 2024.04.14), SRE, SRE-Access-Requests
gmodena moved T353940: We should provide DQ integration with Python from Next Up to In progress on the Data-Engineering (Sprint 9) board.
Mar 7 2024, 10:39 AM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena updated subscribers of T353940: We should provide DQ integration with Python.

We can integrate our DQ framework with Python by piggybacking on PySpark's py4j gateway. The following is a rudimentary example that produces metrics in the data_quality_metrics table format:

Mar 7 2024, 10:36 AM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics from In progress to In Review on the Data-Engineering (Sprint 9) board.
Mar 7 2024, 8:50 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena renamed T353940: We should provide DQ integration with Python from [NEEDS GROOMING] we should provide DQ integration with Python to We should provide DQ integration with Python.
Mar 7 2024, 8:36 AM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 6 2024

gmodena moved T353940: We should provide DQ integration with Python from SDS3.3 - Data Quality to Sprint 9 on the Data-Engineering board.
Mar 6 2024, 3:28 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 5 2024

gmodena claimed T353940: We should provide DQ integration with Python.
Mar 5 2024, 1:42 PM · Data-Engineering (Q4 2024 April 1st - June 30th)

Mar 4 2024

gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

@Ottomata @Jdforrester-WMF there's a caveat with using collectDefaultMetrics: the method call does not allow setting custom labels. If I understand the docs correctly, we can still define them at the registry level, but this would clash with the current implementation. I'm not super keen on refactoring the current behaviour given the state of the codebase, so I'd lean towards avoiding custom labels if possible. Are they used at all?

Mar 4 2024, 8:05 PM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

I spent some time learning this codebase and am touching base to validate the direction. If this makes sense, I'll open a PR. My proposal would be to add a new collect_default option to the Prometheus metrics options block.

Mar 4 2024, 2:28 PM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena claimed T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables.
Mar 4 2024, 2:03 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena set the point value for T359051: eventstreams: change default num_workers to 0 to 1.
Mar 4 2024, 1:06 PM · Data-Engineering (Sprint 9)
gmodena moved T359051: eventstreams: change default num_workers to 0 from In progress to In Review on the Data-Engineering (Sprint 9) board.
Mar 4 2024, 1:05 PM · Data-Engineering (Sprint 9)
gmodena created T359051: eventstreams: change default num_workers to 0.
Mar 4 2024, 1:05 PM · Data-Engineering (Sprint 9)
gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

Happy to pair / code review if needed.

If you could implement this on top of my PR, that'd be great.

Mar 4 2024, 12:33 PM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner

Feb 28 2024

gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

FYI: this is a list of metrics reported with a local run:

Feb 28 2024, 3:00 PM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena added a comment to T249745: Could not enqueue jobs: "Unable to deliver all events: 503: Service Unavailable".

I was on PTO last week and am trying to piece together what happened and how the UBN was mitigated.

Feb 28 2024, 12:50 PM · MediaWiki-Engineering, Data-Engineering, Unstewarded-production-error, User-brennen, serviceops, WMF-JobQueue, Wikimedia-production-error
gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

Update: We've got access back, and v4.0.0 is finally released. Still happy to break whatever we need to, and help people migrate.

Feb 28 2024, 10:52 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner

Feb 27 2024

gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

@Jdforrester-WMF FWIW I saw you started deprecation work in https://github.com/wikimedia/service-runner/pull/249/files.

Feb 27 2024, 11:14 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

I am taking a stab at this task because we need GC and memory info to help track T357005: eventstreams regularly uses more than 95% of its memory limit.

Feb 27 2024, 11:12 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner
gmodena moved T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics from Next Up to In progress on the Data-Engineering (Sprint 9) board.
Feb 27 2024, 10:27 AM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner

Feb 26 2024

gmodena claimed T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.
Feb 26 2024, 7:12 PM · Data-Engineering, observability, ChangeProp, Event-Platform, service-runner

Feb 15 2024

gmodena moved T347586: [Maintenance] Delete sanitized events removed from sanitization list from In progress to Done on the Data-Engineering (Sprint 8) board.
Feb 15 2024, 6:14 PM · Data-Engineering (Sprint 8)
gmodena added a comment to T347586: [Maintenance] Delete sanitized events removed from sanitization list.

Data has been deleted from HDFS. It will be quarantined in hdfs://analytics-hadoop/user/hdfs/.Trash/Current/wmf/data/event_sanitized for a period longer than the one-week grace period required by this task.
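The quarantine location follows HDFS's trash convention: a path removed with `hdfs dfs -rm` (without `-skipTrash`) is moved under the deleting user's `/user/<user>/.Trash/Current/`, keeping its original path, and is purged later according to `fs.trash.interval`. A minimal sketch of that mapping, with a hypothetical helper name:

```python
import posixpath


def hdfs_trash_uri(nameservice: str, user: str, deleted_path: str) -> str:
    """Return where HDFS moves `deleted_path` when `user` deletes it via the trash."""
    if not deleted_path.startswith("/"):
        raise ValueError("deleted_path must be absolute")
    trash_root = posixpath.join("/user", user, ".Trash", "Current")
    return f"hdfs://{nameservice}{trash_root}{deleted_path}"


uri = hdfs_trash_uri("analytics-hadoop", "hdfs", "/wmf/data/event_sanitized")
```

For the deletion above this gives the same location as quoted in the comment.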

@JAllemandou could you ack whether it's OK to move ahead and delete the related tables from event?

Feb 15 2024, 6:14 PM · Data-Engineering (Sprint 8)
gmodena updated subscribers of T356866: [Data Quality] Update data_quality schemas to be compatible with Iceberg tables.

Spoke a bit about this with @xcollazo.

Feb 15 2024, 1:42 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

Open question: do we want webrequest.frontend (or whatever we settle on) to be a versioned stream? https://wikitech.wikimedia.org/wiki/Event_Platform/Stream_Configuration#Stream_versioning

Feb 15 2024, 11:49 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

The currently suggested one is webrequest.frontend. @gmodena, the idea there is to group all webrequest topics into the same stream by setting topics manually in the stream config. Gobblin will ingest the topics configured in the stream config.
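A rough sketch of how an ingestion job could resolve topics for such a stream. It assumes a stream config shaped as a plain dict with `stream` and optional `topics` keys, and that without explicit topics the default is datacenter-prefixed stream names (e.g. `eqiad.<stream>`). Both the dict layout and that fallback are assumptions for illustration, and the topic names in the example are illustrative too.

```python
from typing import Dict, List, Sequence


def resolve_topics(stream_config: Dict, datacenters: Sequence[str] = ("eqiad", "codfw")) -> List[str]:
    """Return the Kafka topics an ingestion job should consume for one stream."""
    explicit = stream_config.get("topics")
    if explicit:
        # Topics set manually in the stream config take precedence.
        return list(explicit)
    # Assumed fallback: one datacenter-prefixed topic per datacenter.
    return [f"{dc}.{stream_config['stream']}" for dc in datacenters]


frontend = {"stream": "webrequest.frontend", "topics": ["webrequest_text", "webrequest_upload"]}
topics = resolve_topics(frontend)
```

Setting `topics` explicitly is what lets several existing topics be grouped under a single stream name.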

Feb 15 2024, 11:32 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a comment to T347586: [Maintenance] Delete sanitized events removed from sanitization list.

Data has been deleted from HDFS. It will be quarantined in hdfs://analytics-hadoop/user/hdfs/.Trash/Current/wmf/data/event_sanitized for a period longer than the one-week grace period required by this task.

Feb 15 2024, 11:08 AM · Data-Engineering (Sprint 8)
gmodena updated the task description for T347586: [Maintenance] Delete sanitized events removed from sanitization list.
Feb 15 2024, 11:06 AM · Data-Engineering (Sprint 8)

Feb 14 2024

gmodena moved T347586: [Maintenance] Delete sanitized events removed from sanitization list from Next Up to In progress on the Data-Engineering (Sprint 8) board.
Feb 14 2024, 11:07 AM · Data-Engineering (Sprint 8)
gmodena added a comment to T347586: [Maintenance] Delete sanitized events removed from sanitization list.

May I proceed with deleting the tables from the Hive metastore for the impacted datasets?

Feb 14 2024, 11:07 AM · Data-Engineering (Sprint 8)
gmodena claimed T347586: [Maintenance] Delete sanitized events removed from sanitization list.
Feb 14 2024, 10:12 AM · Data-Engineering (Sprint 8)

Feb 13 2024

gmodena added a comment to T314956: [Event Platform] Declare webrequest as an Event Platform stream.

@Fabfur and I would like to start some integration tests in the short term. I moved the webrequest schema from GA to development in the primary repo. This follows the same process we adopted with page_change and should allow faster iteration without having to juggle schema versions.

Feb 13 2024, 7:58 PM · Patch-For-Review, Data-Engineering, Event-Platform

Feb 12 2024

gmodena added a comment to T357005: eventstreams regularly uses more than 95% of its memory limit.

Looking at the logs, this seems to coincide with the redaction patch to eventstreams, but looking at the code I'm having a hard time finding where a memory leak could have been introduced. It's even more confusing that only one or two pods are hitting the limit.

Feb 12 2024, 1:57 PM · Data-Engineering, Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

TBD on final stream name in T314956: [Event Platform] Declare webrequest as an Event Platform stream, but the currently suggested one is webrequest.frontend

Feb 12 2024, 11:22 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Feb 9 2024

gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

Both approaches are feasible (even both at the same time, if we accept a slightly larger payload)...

Feb 9 2024, 12:34 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena updated subscribers of T351117: Move analytics log from Varnish to HAProxy.

@Fabfur here is an example payload with the added meta field, as we'd expect to receive it according to the WIP webrequest event schema (the # comments are annotations, not part of the payload).

{
  "meta": {
    "dt": "2023-11-23T16:04:17Z", # value set by Benthos
    "stream": "webrequest_text", # value set by Benthos
    "domain": "en.wikipedia.org", # can we get this from HAProxy?
    "request_id": "request-uuid", # can we get this from HAProxy?
    "id": "event-uuid" # value set by Benthos?
  },
  "accept": "application/json; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/Summary/1.2.0\"",
  "accept_language": "en",
  "backend": "ATS/9.1.4",
  "cache_status": "hit-front",
  "content_type": "application/json; charset=utf-8; profile=\"https://www.mediawiki.org/wiki/Specs/Summary/1.5.0\"",
  "dt": "2023-11-23T16:04:17Z", # value recorded by HAProxy
  "hostname": "cp3067.esams.wmnet",
  "http_method": "GET",
  "http_status": "200",
  "ip": "<REDACTED>",
  "range": "-",
  "referer": "https://en.wikipedia.org/w/index.php?title=Category:Films_based_on_non-fiction_books&pagefrom=Power+Play+%281978+film%29%0APower+Play+%281978+film%29",
  "response_size": 987,
  "sequence": 10558502962,
  "time_firstbyte": 0.000201,
  "tls": "vers=TLSv1.3;keyx=UNKNOWN;auth=ECDSA;ciph=AES-256-GCM-SHA384;prot=h2;sess=new",
  "uri_host": "en.wikipedia.org",
  "uri_path": "/api/rest_v1/page/summary/Secretariat_(film)",
  "uri_query": "",
  "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36 Edg/119.0.0.0",
  "x_analytics": "WMF-Last-Access=23-Nov-2023;WMF-Last-Access-Global=23-Nov-2023;include_pv=0;https=1;client_port=33126",
  "x_cache": "cp3067 miss, cp3067 hit/5"
}
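To make the intended enrichment concrete, here is a Python sketch of wrapping an HAProxy record with a meta block like the one above. In the real pipeline these values are set by Benthos as annotated; the function name is hypothetical, and `domain`/`request_id` are only set when a caller supplies them, since whether HAProxy can provide them is still an open question.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def add_meta(record: Dict[str, Any], stream: str,
             domain: Optional[str] = None,
             request_id: Optional[str] = None) -> Dict[str, Any]:
    """Return a copy of an HAProxy record with an Event Platform-style meta block."""
    meta: Dict[str, Any] = {
        "dt": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),  # enrichment time
        "stream": stream,
        "id": str(uuid.uuid4()),  # unique event id
    }
    # Open questions in the payload above: only set these when they are known.
    if domain is not None:
        meta["domain"] = domain
    if request_id is not None:
        meta["request_id"] = request_id
    return {**record, "meta": meta}  # the record's own top-level dt is left untouched


enriched = add_meta({"dt": "2023-11-23T16:04:17Z", "uri_host": "en.wikipedia.org"}, "webrequest_text")
```

Keeping both timestamps separates the time HAProxy recorded the request (top-level `dt`) from the time the event was produced (`meta.dt`).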
Feb 9 2024, 8:18 AM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic

Feb 8 2024

gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

@Fabfur nice!

Feb 8 2024, 7:13 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a comment to T351117: Move analytics log from Varnish to HAProxy.

Some updates about the ongoing work:

Feb 8 2024, 1:53 PM · Data Products, Patch-For-Review, Data-Engineering, Observability-Logging, Traffic
gmodena added a comment to T349763: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics.

tl;dr: our approach to address this spike is currently documented at https://wikitech.wikimedia.org/wiki/Data_Engineering/Data_Quality.

Feb 8 2024, 1:20 PM · Data-Engineering (Sprint 8), Patch-For-Review
gmodena moved T349763: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics from Blocked/Paused to Ready to Deploy on the Data-Engineering (Sprint 8) board.
Feb 8 2024, 1:18 PM · Data-Engineering (Sprint 8), Patch-For-Review
gmodena moved T356401: [BUG] webrequest analyzer DQ jobs fails to store data from Ready to Deploy to Done on the Data-Engineering (Sprint 8) board.
Feb 8 2024, 1:18 PM · Data-Engineering (Sprint 8)
gmodena moved T356628: [Data quality] Create database and tables for DQ backend from Ready to Deploy to Done on the Data-Engineering (Sprint 8) board.
Feb 8 2024, 8:56 AM · Data-Engineering (Sprint 8)
gmodena moved T356628: [Data quality] Create database and tables for DQ backend from In Review to Ready to Deploy on the Data-Engineering (Sprint 8) board.
Feb 8 2024, 8:56 AM · Data-Engineering (Sprint 8)
gmodena updated the task description for T356628: [Data quality] Create database and tables for DQ backend.
Feb 8 2024, 8:56 AM · Data-Engineering (Sprint 8)
gmodena added a comment to T356401: [BUG] webrequest analyzer DQ jobs fails to store data.

The database and tables have been created:

spark-sql (default)> use wmf_data_ops;
Response code
Time taken: 2.698 seconds
spark-sql (default)> show tables;
database	tableName	isTemporary
wmf_data_ops	data_quality_alerts	false
wmf_data_ops	data_quality_metrics	false
Time taken: 0.569 seconds, Fetched 2 row(s)
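The same check can be automated, for example as a post-deployment sanity check. This sketch assumes the tab-separated spark-sql output format shown above, and the helper name is hypothetical. Inside a Spark job, `spark.catalog.listTables("wmf_data_ops")` would be the more direct route.

```python
from typing import Set


def parse_show_tables(output: str) -> Set[str]:
    """Extract `database.table` names from tab-separated spark-sql SHOW TABLES output."""
    tables = set()
    for line in output.splitlines():
        cols = line.split("\t")
        # Data rows have three columns (database, tableName, isTemporary); skip the header.
        if len(cols) == 3 and cols[0] != "database":
            tables.add(f"{cols[0]}.{cols[1]}")
    return tables


EXPECTED = {"wmf_data_ops.data_quality_alerts", "wmf_data_ops.data_quality_metrics"}
output = (
    "database\ttableName\tisTemporary\n"
    "wmf_data_ops\tdata_quality_alerts\tfalse\n"
    "wmf_data_ops\tdata_quality_metrics\tfalse\n"
    "Time taken: 0.569 seconds, Fetched 2 row(s)\n"
)
missing = EXPECTED - parse_show_tables(output)
```

An empty `missing` set means both DQ backend tables are present.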
Feb 8 2024, 8:55 AM · Data-Engineering (Sprint 8)

Feb 6 2024

gmodena updated subscribers of T356762: [Refine refactoring] Extract refine schema management into a dedicated tool.
Feb 6 2024, 1:46 PM · Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review
gmodena created T356762: [Refine refactoring] Extract refine schema management into a dedicated tool.
Feb 6 2024, 11:55 AM · Data-Engineering (Q4 2024 April 1st - June 30th), Patch-For-Review

Feb 5 2024

gmodena added a parent task for T314956: [Event Platform] Declare webrequest as an Event Platform stream: T354694: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition.
Feb 5 2024, 8:05 PM · Patch-For-Review, Data-Engineering, Event-Platform
gmodena added a subtask for T354694: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition: T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Feb 5 2024, 8:04 PM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T356628: [Data quality] Create database and tables for DQ backend from In progress to In Review on the Data-Engineering (Sprint 8) board.
Feb 5 2024, 8:04 PM · Data-Engineering (Sprint 8)
gmodena claimed T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Feb 5 2024, 8:01 PM · Patch-For-Review, Data-Engineering, Event-Platform
gmodena updated subscribers of T314956: [Event Platform] Declare webrequest as an Event Platform stream.
Feb 5 2024, 6:34 PM · Patch-For-Review, Data-Engineering, Event-Platform
gmodena updated subscribers of T356628: [Data quality] Create database and tables for DQ backend.
Feb 5 2024, 1:58 PM · Data-Engineering (Sprint 8)
gmodena renamed T356628: [Data quality] Create database and tables for DQ backend from [NEEDS GROOMING][Data quality] Create database and tables for DQ backend to [Data quality] Create database and tables for DQ backend.
Feb 5 2024, 1:57 PM · Data-Engineering (Sprint 8)
gmodena moved T349763: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics from Ready to Deploy to Blocked/Paused on the Data-Engineering (Sprint 8) board.
Feb 5 2024, 11:34 AM · Data-Engineering (Sprint 8), Patch-For-Review
gmodena set the point value for T356628: [Data quality] Create database and tables for DQ backend to 1.
Feb 5 2024, 11:34 AM · Data-Engineering (Sprint 8)
gmodena created T356628: [Data quality] Create database and tables for DQ backend.
Feb 5 2024, 11:34 AM · Data-Engineering (Sprint 8)
gmodena moved T349763: [Data Quality] Develop Airflow post processing instrumentation to collect and log configurable data metrics from In Review to Ready to Deploy on the Data-Engineering (Sprint 8) board.
Feb 5 2024, 10:44 AM · Data-Engineering (Sprint 8), Patch-For-Review
gmodena moved T354694: [Maintenance] Safeguard VarnishKafka to HAProxy analytics transition from Blocked/Paused to In progress on the Data-Engineering (Sprint 8) board.
Feb 5 2024, 10:43 AM · Data-Engineering (Q4 2024 April 1st - June 30th)
gmodena moved T356401: [BUG] webrequest analyzer DQ jobs fails to store data from In progress to Ready to Deploy on the Data-Engineering (Sprint 8) board.
Feb 5 2024, 10:43 AM · Data-Engineering (Sprint 8)

Feb 1 2024

gmodena added a comment to T356401: [BUG] webrequest analyzer DQ jobs fails to store data.

On prod (an-launcher1002, job submitted with user analytics) missing databases are not created.

Feb 1 2024, 8:04 PM · Data-Engineering (Sprint 8)
gmodena updated subscribers of T356401: [BUG] webrequest analyzer DQ jobs fails to store data.
Feb 1 2024, 12:51 PM · Data-Engineering (Sprint 8)
gmodena added a comment to T356401: [BUG] webrequest analyzer DQ jobs fails to store data.

Investigating.
...

  • On dev environments (stat1005, job submitted with user gmodena) missing databases are created.
  • On prod (an-launcher1002, job submitted with user analytics) missing databases are not created.
Feb 1 2024, 12:50 PM · Data-Engineering (Sprint 8)
gmodena claimed T356401: [BUG] webrequest analyzer DQ jobs fails to store data.
Feb 1 2024, 12:46 PM · Data-Engineering (Sprint 8)