
tchin (Thomas)
Software Engineer

User Details

User Since
Jun 21 2021, 2:34 PM (141 w, 10 h)
Availability
Available
LDAP User
TChin
MediaWiki User
TChin (WMF)

Recent Activity

Tue, Feb 27

tchin added a comment to T350180: Upgrade prom-client in NodeJS service-runner and enable collectDefaultMetrics.

If it's to a point where we even need to use a new name, might as well break everything. I'd love to join in on the fun

Tue, Feb 27, 2:09 PM · Data-Engineering (Sprint 9), observability, ChangeProp, Event-Platform, service-runner

Sun, Feb 11

tchin moved T357005: eventstreams regularly uses more than 95% of its memory limit from Next Up to Radar (External Teams) on the Data-Engineering (Sprint 8) board.
Sun, Feb 11, 3:04 AM · Data-Engineering (Sprint 9), Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes
tchin edited projects for T357005: eventstreams regularly uses more than 95% of its memory limit, added: Data-Engineering (Sprint 8); removed Data-Engineering.
Sun, Feb 11, 3:03 AM · Data-Engineering (Sprint 9), Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes
tchin added a comment to T357005: eventstreams regularly uses more than 95% of its memory limit.

Looking at the logs, this seems to coincide with the redaction patch to eventstreams, but looking at the code I'm having a hard time finding where a memory leak could have happened. It's even more confusing that only 1 or 2 pods are hitting the limit.

Sun, Feb 11, 3:01 AM · Data-Engineering (Sprint 9), Event-Platform, EventStreams, serviceops, Prod-Kubernetes, Kubernetes

Jan 30 2024

tchin moved T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg from Blocked/Paused to Ready to Deploy on the Data-Engineering (Sprint 8) board.
Jan 30 2024, 2:17 PM · Data-Engineering (Sprint 8)

Jan 22 2024

tchin added a comment to T352671: [Iceberg Migration] Migrate interlanguage tables to Iceberg.

Using lz4 compression works, but checking it with parquet-tools doesn't. I see something like compression: UNKNOWN (space_saved: -25%). Seems like a known issue.

Jan 22 2024, 1:51 PM · Data-Engineering (Sprint 7), Patch-For-Review

Jan 5 2024

tchin added a comment to T352671: [Iceberg Migration] Migrate interlanguage tables to Iceberg.

INSERT OVERWRITE with PARTITION also doesn't work anymore because Iceberg uses hidden partitioning, so I had to enable Spark's dynamic overwrite mode.
https://iceberg.apache.org/docs/latest/spark-writes/#insert-overwrite
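
A minimal sketch of what enabling dynamic overwrite can look like, assuming the standard Spark setting spark.sql.sources.partitionOverwriteMode is the one meant here. The helper name and the example table names are hypothetical; the function only builds the SQL text and does not connect to Spark.

```python
from typing import List


def dynamic_overwrite_statements(destination_table: str, select_sql: str) -> List[str]:
    """Return the Spark SQL statements for a dynamic partition overwrite.

    Hypothetical helper for illustration: it only assembles strings.
    No PARTITION clause is used; with dynamic mode, the partitions to
    replace are determined by the rows the SELECT produces.
    """
    return [
        # Standard Spark SQL configuration key for dynamic overwrite.
        "SET spark.sql.sources.partitionOverwriteMode=dynamic",
        f"INSERT OVERWRITE {destination_table} {select_sql}",
    ]


if __name__ == "__main__":
    # Table names below are placeholders, not the real job's tables.
    for statement in dynamic_overwrite_statements(
        "some_db.some_iceberg_table", "SELECT * FROM some_db.some_source"
    ):
        print(statement + ";")
```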

Jan 5 2024, 6:32 PM · Data-Engineering (Sprint 7), Patch-For-Review
tchin added a comment to T352671: [Iceberg Migration] Migrate interlanguage tables to Iceberg.

TIL when setting the compression codec to snappy, Iceberg doesn't end the files in hdfs with .snappy.parquet. I had to check if the format was correct using parquet-tools.

Jan 5 2024, 6:23 PM · Data-Engineering (Sprint 7), Patch-For-Review
tchin moved T352671: [Iceberg Migration] Migrate interlanguage tables to Iceberg from Next Up to In progress on the Data-Engineering (Sprint 6) board.
Jan 5 2024, 5:58 PM · Data-Engineering (Sprint 7), Patch-For-Review
tchin claimed T352671: [Iceberg Migration] Migrate interlanguage tables to Iceberg.
Jan 5 2024, 5:58 PM · Data-Engineering (Sprint 7), Patch-For-Review

Dec 19 2023

tchin added a comment to T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg.

Tested to see if the COALESCE hints still work in Iceberg by creating 2 tables and filling them with/without the hint. It still seems to work.

Dec 19 2023, 7:30 AM · Data-Engineering (Sprint 8)

Dec 18 2023

tchin awarded T336739: Post Oozie -> Airflow migration refactorings a Barnstar token.
Dec 18 2023, 3:12 PM · Patch-For-Review, Data-Engineering, Epic, Data Pipelines
tchin moved T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg from Next Up to In progress on the Data-Engineering (Sprint 6) board.
Dec 18 2023, 2:53 PM · Data-Engineering (Sprint 8)

Dec 16 2023

tchin added a comment to T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg.

Tested on a stat machine with

CREATE EXTERNAL TABLE IF NOT EXISTS `aqs_hourly`(  
    `cache_status`      string     COMMENT 'Cache status',  
    `http_status`       string     COMMENT 'HTTP status of response',  
    `http_method`       string     COMMENT 'HTTP method of request',  
    `response_size`     bigint     COMMENT 'Response size',  
    `uri_host`          string     COMMENT 'Host of request',  
    `uri_path`          string     COMMENT 'Path of request',  
    `request_count`     bigint     COMMENT 'Number of requests',  
    `hour`              timestamp  COMMENT 'The aggregated hour. Covers from minute 00 to 59'  
)  
USING ICEBERG
PARTITIONED BY (days(hour))
;

And

spark3-sql --master yarn --executor-memory 8G --executor-cores 4 --driver-memory 2G --conf spark.dynamicAllocation.maxExecutors=64 \
-f aqs_hourly_iceberg.hql  \
-d source_table=wmf.webrequest \
-d webrequest_source=text \
-d destination_table=tchin.aqs_hourly \
-d coalesce_partitions=1 \
-d year=2023 \
-d month=12 \
-d day=3 \
-d hour=0
Dec 16 2023, 4:23 AM · Data-Engineering (Sprint 8)

Dec 14 2023

tchin changed the status of T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg, a subtask of T333013: [Iceberg Migration] Apache Iceberg Migration, from Open to In Progress.
Dec 14 2023, 6:16 AM · Data-Engineering, Epic
tchin changed the status of T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg from Open to In Progress.
Dec 14 2023, 6:16 AM · Data-Engineering (Sprint 8)

Dec 11 2023

tchin awarded T311866: Migrate Database::select usages to SelectQueryBuilder a Barnstar token.
Dec 11 2023, 3:10 PM · MW-1.41-notes (1.41.0-wmf.25; 2023-09-05), MW-1.40-notes (1.40.0-wmf.26; 2023-03-06), MW-1.39-notes (1.39.0-wmf.26; 2022-08-22), Patch-For-Review, Data-Persistence (work done), Platform Engineering
tchin claimed T352669: [Iceberg Migration] Migrate aqs hourly tables to Iceberg.
Dec 11 2023, 2:45 PM · Data-Engineering (Sprint 8)

Dec 2 2023

tchin awarded T347347: Make "Quick" MW install a thing a Love token.
Dec 2 2023, 11:21 PM · MW-1.42-notes (1.42.0-wmf.12; 2024-01-02), User-zeljkofilipin, MediaWiki-Platform-Team, MediaWiki-Documentation

Nov 14 2023

tchin added a comment to T351092: [tbs] Improve Harbor quota handling and docs.

I think the per-image quota should probably be increased. I tested building a few projects locally and a project with NodeJS and 0 dependencies results in a built image that's 805.58 MB. One with only VueJS as a dependency bumps it up to 858.13 MB. I'm probably not going to be the last one who needs more than 200 MB of working space :/

Nov 14 2023, 4:40 AM · Toolforge Build Service, Documentation

Nov 13 2023

tchin added a comment to T351092: [tbs] Improve Harbor quota handling and docs.

Example error:

step-export: 2023-11-13T05:41:56.835942824Z ERROR: failed to export: failed to write image to the following tags: [tools-harbor.wmcloud.org/tool-dpe-alerts-dashboard/tool-dpe-alerts-dashboard:latest: PATCH https://tools-harbor.wmcloud.org/v2/tool-dpe-alerts-dashboard/tool-dpe-alerts-dashboard/blobs/uploads/b62dd944-4fad-4ee8-b900-8409f7860d6c?_state=REDACTED: unexpected status code 413 Request Entity Too Large: <html>
step-export: 2023-11-13T05:41:56.835973012Z <head><title>413 Request Entity Too Large</title></head>
step-export: 2023-11-13T05:41:56.835976984Z <body>
step-export: 2023-11-13T05:41:56.835979969Z <center><h1>413 Request Entity Too Large</h1></center>
step-export: 2023-11-13T05:41:56.835983468Z <hr><center>nginx/1.18.0</center>
step-export: 2023-11-13T05:41:56.836002364Z </body>
step-export: 2023-11-13T05:41:56.836005027Z </html>
step-export: 2023-11-13T05:41:56.836008032Z ]
step-export: 
step-results: 2023-11-13T05:41:57.433667715Z 2023/11/13 05:41:57 Skipping step because a previous step failed
Nov 13 2023, 2:57 PM · Toolforge Build Service, Documentation

Oct 26 2023

tchin added a comment to T347706: [Data Quality] [SPIKE] Document Current Logging, Monitoring and Data Quality Checks for Unique Devices.

Current version of the writeup is here

Oct 26 2023, 3:53 PM · Data Engineering and Event Platform Team (Sprint 4)

Oct 11 2023

tchin added a comment to T345389: [SPIKE] Should we introduce static typing to Event Platform nodejs codebases?.

If we do introduce something, we should use JSDoc3 and follow what's happening on this ticket T138401

Oct 11 2023, 2:28 PM · Data-Engineering, Event-Platform

Oct 3 2023

tchin moved T347706: [Data Quality] [SPIKE] Document Current Logging, Monitoring and Data Quality Checks for Unique Devices from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 3) board.
Oct 3 2023, 5:39 PM · Data Engineering and Event Platform Team (Sprint 4)

Sep 29 2023

tchin added a comment to T347676: Partition reassignment on kafka-jumbo negatively impacting mw-page-content-change-enrich.

DeliveryGuarantee.AT_LEAST_ONCE: The sink will wait for all outstanding records in the Kafka buffers to be acknowledged by the Kafka producer on a checkpoint. No messages will be lost in case of any issue with the Kafka brokers but messages may be duplicated when Flink restarts because Flink reprocesses old input records.

https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/connectors/datastream/kafka/#fault-tolerance

Sep 29 2023, 12:11 PM · Event-Platform, Data Engineering and Event Platform Team, Data-Engineering, Data-Platform-SRE
tchin merged T347615: mw-page-content-change-enrich not checkpointing into T347676: Partition reassignment on kafka-jumbo negatively impacting mw-page-content-change-enrich.
Sep 29 2023, 11:57 AM · Event-Platform, Data Engineering and Event Platform Team, Data-Engineering, Data-Platform-SRE
tchin merged task T347615: mw-page-content-change-enrich not checkpointing into T347676: Partition reassignment on kafka-jumbo negatively impacting mw-page-content-change-enrich.
Sep 29 2023, 11:57 AM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform

Sep 28 2023

tchin added a comment to T347615: mw-page-content-change-enrich not checkpointing.

Unaligned checkpoints didn't work. Maybe it's because of data being moved around to new brokers and Kafka is too overloaded.

Sep 28 2023, 6:04 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin updated subscribers of T347615: mw-page-content-change-enrich not checkpointing.
Sep 28 2023, 6:00 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin moved T347615: mw-page-content-change-enrich not checkpointing from Data Eng Backlog to Sprint 2 on the Data Engineering and Event Platform Team board.
Sep 28 2023, 5:59 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin moved T347615: mw-page-content-change-enrich not checkpointing from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 2) board.
Sep 28 2023, 5:59 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin renamed T347615: mw-page-content-change-enrich not checkpointing from mw-page-content-change-enrich not checkpoint to mw-page-content-change-enrich not checkpointing.
Sep 28 2023, 5:27 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin created T347615: mw-page-content-change-enrich not checkpointing.
Sep 28 2023, 5:26 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform
tchin added a comment to T347521: Troubleshoot mw-page-content-change-enrich and flink-operator.

@bking Gabriele is currently on sick leave but yes let's try incrementing the helm chart version

Sep 28 2023, 1:29 PM · Data-Platform-SRE

Sep 19 2023

tchin placed T287405: Refactor ILocalizedException to be DI-friendly. up for grabs.
Sep 19 2023, 6:59 AM · MediaWiki-General, MW-1.41-notes (1.41.0-wmf.30; 2023-10-10), Patch-For-Review, User-thiemowmde, WMDE-TechWish-Maintenance, Move-Files-To-Commons, MW-1.37-notes (1.37.0-wmf.23; 2021-09-13), Dependency injection, User-DannyS712, Platform Team Workboards (MW Expedition)
tchin placed T291009: LoadExtensionSchemaUpdates hook needs to have access to Config up for grabs.
Sep 19 2023, 6:58 AM · MediaWiki-Core-Hooks, Platform Team Workboards (MW Expedition)

Aug 31 2023

tchin added a comment to T344511: Enum with an entry of `null` should fail jsonschema-tools validation.

Associated GitHub PR: https://github.com/wikimedia/jsonschema-tools/pull/48

Aug 31 2023, 5:21 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review

Aug 29 2023

tchin moved T344511: Enum with an entry of `null` should fail jsonschema-tools validation from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 1) board.
Aug 29 2023, 5:34 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review
tchin edited projects for T344511: Enum with an entry of `null` should fail jsonschema-tools validation, added: Data Engineering and Event Platform Team (Sprint 1); removed Data Engineering and Event Platform Team.
Aug 29 2023, 5:34 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review
tchin claimed T344511: Enum with an entry of `null` should fail jsonschema-tools validation.
Aug 29 2023, 5:19 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review
tchin added a comment to T344511: Enum with an entry of `null` should fail jsonschema-tools validation.

Seems like in jsonschema-tools the enums are only validated through ajv, and its strict union type checking allows null, so we will have to implement the check ourselves.
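
As an illustration of the kind of check described, here is a minimal Python sketch that walks a schema and reports every enum containing null. It is not the jsonschema-tools implementation (that project is NodeJS); the function name and the path format are assumptions made for this example.

```python
from typing import Any, Iterator


def find_null_enums(schema: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every `enum` list that contains null (None).

    Hypothetical helper: it walks every nested dict and list, so it
    would also inspect keywords such as `examples` or `default`.
    """
    if isinstance(schema, dict):
        enum = schema.get("enum")
        if isinstance(enum, list) and any(value is None for value in enum):
            yield f"{path}.enum"
        for key, value in schema.items():
            yield from find_null_enums(value, f"{path}.{key}")
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            yield from find_null_enums(item, f"{path}[{index}]")


if __name__ == "__main__":
    example = {
        "properties": {
            "data_type": {"type": "string", "enum": ["number", None]},
            "status": {"type": "string", "enum": ["ok", "error"]},
        }
    }
    for offending_path in find_null_enums(example):
        print("enum contains null:", offending_path)
```

A test built on this would fail whenever the generator yields at least one path.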

Aug 29 2023, 5:18 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review

Aug 28 2023

tchin added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

While adding a workaround to T344235, I noticed that additionalProperties isn't very well represented in DataHub.

"custom_data": {
    "additionalProperties": {
        "properties": {
            "data_type": {
                "type": "string",
                "enum": ["number", "string", "boolean", "null"]
            }
        }
    },
    "propertyNames": {
        "maxLength": 255,
        "minLength": 1,
        "pattern": "^[$a-z]+[a-z0-9_]*$"
    }
},

Just shows up in DataHub as a Struct with no defined nested fields (which I guess makes sense, but is not helpful).

Aug 28 2023, 5:52 AM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Aug 22 2023

tchin added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

From the recent meeting:

  • Event Streams will be the name of the platform
  • Streams are upstream to Kafka topics
Aug 22 2023, 6:30 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform
tchin added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

After experimenting a lot, I have a DataHub transformer for Kafka that generates an Event Streams platform, adds description, schema, and path. However, I don't know if it should be a transformer since it's doing a bit more than just transforming.

Aug 22 2023, 5:23 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Aug 18 2023

tchin created T344511: Enum with an entry of `null` should fail jsonschema-tools validation.
Aug 18 2023, 5:55 PM · Data Engineering and Event Platform Team (Sprint 2), Data-Engineering, Event-Platform, Patch-For-Review

Aug 16 2023

tchin added a comment to T318863: [Event Platform] Event Platform and DataHub Integration.

Since DataHub has the concept of platforms, I think the best way forward is to have a separate platform called Event Streams where the datasets under it are the streams defined in the stream config. We can then keep the Kafka platform for all the individual Kafka topics. Then what we can do is have a transform attached to the current Kafka ingestion recipe that will attach the schemas to the individual topics when supported but also at the same time insert the streams into the Event Streams platform. This way we can have the schemas on both the stream and its topics.

Aug 16 2023, 6:27 PM · Data Engineering and Event Platform Team (Sprint 3), Data-Engineering, Data-Catalog, Event-Platform

Jul 28 2023

tchin claimed T341277: mediawiki page_content_change should generate new meta.id field.
Jul 28 2023, 5:54 AM · Data Engineering and Event Platform Team (Sprint 1), Data-Engineering, Event-Platform
tchin moved T341277: mediawiki page_content_change should generate new meta.id field from Next Up to In Review on the Data Engineering and Event Platform Team (Sprint 1) board.
Jul 28 2023, 5:54 AM · Data Engineering and Event Platform Team (Sprint 1), Data-Engineering, Event-Platform

Jul 12 2023

tchin added a comment to T340765: jsonschema-tools test should fail if fields are removed in new (non major) version.

On the wiki for schema guidelines there's a blanket statement that all modifications should be backwards compatible - I assume this doesn't apply to major version changes so will note that
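
A rough Python sketch of the rule under discussion: fields may only disappear when the major version changes. This is not how jsonschema-tools implements it; the function names, the dotted field paths, and the assumption that versions look like "1.2.0" are all illustrative.

```python
from typing import Any, Dict, List, Set


def field_paths(schema: Dict[str, Any], prefix: str = "") -> Set[str]:
    """Collect dotted paths of all fields declared under `properties`."""
    paths: Set[str] = set()
    for name, subschema in schema.get("properties", {}).items():
        path = f"{prefix}{name}"
        paths.add(path)
        if isinstance(subschema, dict):
            paths |= field_paths(subschema, f"{path}.")
    return paths


def is_major_bump(old_version: str, new_version: str) -> bool:
    """True when the leading (major) version number increases."""
    return int(new_version.split(".")[0]) > int(old_version.split(".")[0])


def disallowed_removals(
    old: Dict[str, Any], new: Dict[str, Any], old_version: str, new_version: str
) -> List[str]:
    """Return removed field paths, or an empty list for a major version bump."""
    if is_major_bump(old_version, new_version):
        return []
    return sorted(field_paths(old) - field_paths(new))
```

A non-empty result from disallowed_removals would be the condition that makes the test fail.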

Jul 12 2023, 4:12 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering

Jul 10 2023

tchin moved T300404: jsonschema-tools tests should fail if schema $id does not match title or path from In progress to In Review on the Event-Platform (Sprint 14 B) board.
Jul 10 2023, 12:44 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T300404: jsonschema-tools tests should fail if schema $id does not match title or path from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 10 2023, 12:44 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T340765: jsonschema-tools test should fail if fields are removed in new (non major) version from Next Up to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 10 2023, 12:43 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering
tchin edited projects for T340765: jsonschema-tools test should fail if fields are removed in new (non major) version, added: Data Engineering and Event Platform Team (Sprint 0); removed Data Engineering and Event Platform Team.
Jul 10 2023, 12:43 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering

Jul 6 2023

tchin moved T300404: jsonschema-tools tests should fail if schema $id does not match title or path from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 6 2023, 5:35 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T333795: Event Catalog: Standardize Options Handling from In Review to Done on the Event-Platform (Sprint 14 B) board.
Jul 6 2023, 5:45 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T340765: jsonschema-tools test should fail if fields are removed in new (non major) version from In progress to In Review on the Event-Platform (Sprint 14 B) board.
Jul 6 2023, 5:03 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering
tchin moved T340746: mw-page-content-change-enrich should bump page_change schema from Next Up to In Review on the Event-Platform (Sprint 14 B) board.
Jul 6 2023, 4:34 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T300404: jsonschema-tools tests should fail if schema $id does not match title or path from Next Up to In progress on the Event-Platform (Sprint 14 B) board.
Jul 6 2023, 4:33 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin moved T340765: jsonschema-tools test should fail if fields are removed in new (non major) version from Next Up to In progress on the Event-Platform (Sprint 14 B) board.
Jul 6 2023, 4:33 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering
tchin edited projects for T340765: jsonschema-tools test should fail if fields are removed in new (non major) version, added: Event-Platform (Sprint 14 B); removed Event-Platform.
Jul 6 2023, 4:33 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B), Data-Engineering

Jun 21 2023

tchin added a comment to T335024: Update eventgate and eventstreams helm chart to use automatic kafka egress networkpolicies and envoy service mesh.

I could try taking a crack at it

Jun 21 2023, 6:02 PM · Event-Platform (Sprint 14 B), Data-Engineering
tchin moved T333795: Event Catalog: Standardize Options Handling from Next Up to In progress on the Event-Platform (Sprint 14 B) board.
Jun 21 2023, 4:57 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)

Jun 20 2023

tchin moved T337855: jsonschema-tools deterministic schema test should fail if a schema uses oneOf with different types from In Review to Done on the Event-Platform (Sprint 14 B) board.
Jun 20 2023, 8:07 AM · Event-Platform (Sprint 14 B), Data-Engineering
tchin moved T338228: jsonschema-tools deterministic schema test should fail if a object field does not have schema from In Review to Done on the Event-Platform (Sprint 14 B) board.
Jun 20 2023, 8:07 AM · Event-Platform (Sprint 14 B), Data-Engineering
tchin updated the task description for T338228: jsonschema-tools deterministic schema test should fail if a object field does not have schema.
Jun 20 2023, 8:07 AM · Event-Platform (Sprint 14 B), Data-Engineering

Jun 14 2023

tchin updated the task description for T338228: jsonschema-tools deterministic schema test should fail if a object field does not have schema.
Jun 14 2023, 7:47 PM · Event-Platform (Sprint 14 B), Data-Engineering

Jun 13 2023

tchin awarded T337964: Temporarily replace the Phabricator logo for Pride Month a Love token.
Jun 13 2023, 3:21 AM · Release-Engineering-Team, User-brennen, Phabricator

Jun 8 2023

tchin moved T337400: Get coverage artifacts from Kokkuri from Next Up to In progress on the Event-Platform (Sprint 14 B) board.
Jun 8 2023, 12:54 PM · Event-Platform (Sprint 14 B), Data-Engineering

Jun 6 2023

tchin added a comment to T337400: Get coverage artifacts from Kokkuri.

Can you point me to the job output showing the hang?

Jun 6 2023, 7:56 PM · Event-Platform (Sprint 14 B), Data-Engineering
tchin moved T337395: Remove user is_registered field from mediawiki/page/change schema from In progress to Blocked/Paused on the Event-Platform (Sprint 14 B) board.
Jun 6 2023, 1:26 PM · MW-1.41-notes (1.41.0-wmf.12; 2023-06-06), Event-Platform (Sprint 14 B), Data-Engineering
tchin moved T337855: jsonschema-tools deterministic schema test should fail if a schema uses oneOf with different types from Next Up to In progress on the Event-Platform (Sprint 14 B) board.
Jun 6 2023, 1:17 PM · Event-Platform (Sprint 14 B), Data-Engineering

Jun 5 2023

tchin closed T324980: Event Driven Enrichment Pipelines repositories should be generated from a template as Resolved.
Jun 5 2023, 7:22 PM · Event-Platform (Sprint 12), Data-Engineering-Planning
tchin added a comment to T335045: mediawiki-event-enrichment and event enrichment job repo templating should bundle schema repos.

Is there a benefit to doing this in blubber though?

Jun 5 2023, 1:29 PM · Data-Engineering, Event-Platform
tchin added a comment to T335045: mediawiki-event-enrichment and event enrichment job repo templating should bundle schema repos.

The cookiecutter template also does this via a post-generation hook
https://gitlab.wikimedia.org/repos/data-engineering/eventutilities-python/-/blob/main/cookiecutter-event-pipeline/hooks/post_gen_project.py

Jun 5 2023, 1:25 PM · Data-Engineering, Event-Platform
tchin updated subscribers of T337400: Get coverage artifacts from Kokkuri.

Ok so just recounting my experiments:

Jun 5 2023, 8:12 AM · Event-Platform (Sprint 14 B), Data-Engineering
tchin updated the task description for T337400: Get coverage artifacts from Kokkuri.
Jun 5 2023, 8:09 AM · Event-Platform (Sprint 14 B), Data-Engineering

May 25 2023

tchin moved T337395: Remove user is_registered field from mediawiki/page/change schema from Next Up to In progress on the Event-Platform (Sprint 14 A) board.
May 25 2023, 1:11 PM · MW-1.41-notes (1.41.0-wmf.12; 2023-06-06), Event-Platform (Sprint 14 B), Data-Engineering
tchin edited projects for T337395: Remove user is_registered field from mediawiki/page/change schema, added: Event-Platform (Sprint 14 A); removed Event-Platform.
May 25 2023, 1:09 PM · MW-1.41-notes (1.41.0-wmf.12; 2023-06-06), Event-Platform (Sprint 14 B), Data-Engineering

May 24 2023

tchin added a subtask for T328013: Improve mediawiki-event-enrichment test suite: T337400: Get coverage artifacts from Kokkuri.
May 24 2023, 2:30 PM · Event-Platform (Sprint 14 B), Data-Engineering-Planning
tchin added a parent task for T337400: Get coverage artifacts from Kokkuri: T328013: Improve mediawiki-event-enrichment test suite.
May 24 2023, 2:30 PM · Event-Platform (Sprint 14 B), Data-Engineering
tchin created T337400: Get coverage artifacts from Kokkuri.
May 24 2023, 2:29 PM · Event-Platform (Sprint 14 B), Data-Engineering
tchin claimed T337395: Remove user is_registered field from mediawiki/page/change schema.
May 24 2023, 1:39 PM · MW-1.41-notes (1.41.0-wmf.12; 2023-06-06), Event-Platform (Sprint 14 B), Data-Engineering

May 8 2023

tchin moved T328013: Improve mediawiki-event-enrichment test suite from Next Up to In progress on the Event-Platform (Sprint 12) board.
May 8 2023, 1:00 PM · Event-Platform (Sprint 14 B), Data-Engineering-Planning
tchin claimed T328013: Improve mediawiki-event-enrichment test suite.
May 8 2023, 1:00 PM · Event-Platform (Sprint 14 B), Data-Engineering-Planning

May 7 2023

tchin added a comment to T328013: Improve mediawiki-event-enrichment test suite.

Oof, was looking at how to potentially mock the http session and response object, but turns out mocks don't work when pickled/multiprocessed. I guess the only option is to spin up a web server during testing and hit that instead
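
A minimal sketch of the "spin up a web server during testing" approach, using only the Python standard library. The handler, helper names, and response shape are made up for this example and are not taken from the mediawiki-event-enrichment test suite.

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class StubHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small JSON body echoing the request path."""

    def do_GET(self):
        body = json.dumps({"path": self.path}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        # Keep test output quiet.
        pass


def start_stub_server() -> ThreadingHTTPServer:
    """Start the stub on a free local port in a background thread."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), StubHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_json(host: str, port: int, path: str):
    """Issue a real HTTP GET against the stub and decode the JSON reply."""
    connection = http.client.HTTPConnection(host, port, timeout=5)
    try:
        connection.request("GET", path)
        return json.loads(connection.getresponse().read().decode("utf-8"))
    finally:
        connection.close()


if __name__ == "__main__":
    server = start_stub_server()
    try:
        host, port = server.server_address[:2]
        print(fetch_json(host, port, "/page/1"))
    finally:
        server.shutdown()
        server.server_close()
```

Because the code under test only needs a host and port, those plain values can be passed to worker processes, which avoids pickling a mock object.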

May 7 2023, 9:08 PM · Event-Platform (Sprint 14 B), Data-Engineering-Planning

May 5 2023

tchin added a comment to T335802: eventutilities-python manager should set up python logging with ECS format.

Do we know what's turning them into ecs format in the first place?

May 5 2023, 6:53 AM · Event-Platform (Sprint 14 A), Data-Engineering

May 4 2023

tchin moved T335802: eventutilities-python manager should set up python logging with ECS format from Next Up to In progress on the Event-Platform (Sprint 12) board.
May 4 2023, 1:28 PM · Event-Platform (Sprint 14 A), Data-Engineering
tchin moved T335802: eventutilities-python manager should set up python logging with ECS format from Backlog to Sprint 12 on the Event-Platform board.
May 4 2023, 1:18 PM · Event-Platform (Sprint 14 A), Data-Engineering
tchin claimed T335802: eventutilities-python manager should set up python logging with ECS format.
May 4 2023, 1:13 PM · Event-Platform (Sprint 14 A), Data-Engineering

May 2 2023

tchin moved T324980: Event Driven Enrichment Pipelines repositories should be generated from a template from In progress to In Review on the Event-Platform (Sprint 12) board.
May 2 2023, 1:03 PM · Event-Platform (Sprint 12), Data-Engineering-Planning

Apr 19 2023

tchin moved T327251: Q4 eventutilities-python should bundle java deps. from In Review to Done on the Event-Platform (Sprint 11) board.
Apr 19 2023, 12:45 PM · Event-Platform (Sprint 11), Data-Engineering-Planning

Apr 18 2023

tchin moved T327251: Q4 eventutilities-python should bundle java deps. from In Progress to In Review on the Event-Platform (Sprint 11) board.
Apr 18 2023, 7:02 AM · Event-Platform (Sprint 11), Data-Engineering-Planning

Apr 17 2023

tchin added a comment to T327251: Q4 eventutilities-python should bundle java deps..

Don't know how to connect gitlab merge requests to phab but here's the link for posterity's sake:
Bundle Java jars when building wheel

Apr 17 2023, 1:09 PM · Event-Platform (Sprint 11), Data-Engineering-Planning

Apr 10 2023

tchin added a comment to T327251: Q4 eventutilities-python should bundle java deps..

A less opaque place for inspiration is that the py4j library bundles the jar with its python wheel as well. They leave adding it to the classpath to the user though. I actually don't see how we'd include the jars in the classpath without injecting them at runtime. Does something in pyflink automatically find it?

Apr 10 2023, 7:59 PM · Event-Platform (Sprint 11), Data-Engineering-Planning

Apr 3 2023

tchin claimed T333795: Event Catalog: Standardize Options Handling.
Apr 3 2023, 3:20 PM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin updated subscribers of T333795: Event Catalog: Standardize Options Handling.
Apr 3 2023, 6:45 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)
tchin created T333795: Event Catalog: Standardize Options Handling.
Apr 3 2023, 6:44 AM · Data Engineering and Event Platform Team (Sprint 0), Event-Platform (Sprint 14 B)

Mar 29 2023

tchin claimed T331542: EventStreamCatalog should not remove user specified options in CREATE TABLE statements.
Mar 29 2023, 1:05 PM · Data-Engineering, Event-Platform

Mar 22 2023

tchin moved T330441: Flink EventStreamCatalog should add watermark from In Review to Done on the Event-Platform (Sprint 10) board.
Mar 22 2023, 2:07 PM · Event-Platform (Sprint 10), Data-Engineering-Planning

Mar 16 2023

tchin moved T330703: Flink EventStreamCatalog should not prevent creation of VIEWs from In Review to Done on the Event-Platform (Sprint 10) board.
Mar 16 2023, 9:21 PM · Event-Platform (Sprint 10), Data-Engineering-Planning