xcollazo (Xabriel J. Collazo Mojica)
User

Projects

User does not belong to any projects.

User Details

User Since
Jun 9 2022, 6:42 PM (10 w, 10 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
XCollazo-WMF [ Global Accounts ]

Recent Activity

Today

xcollazo added a comment to T311860: Set up the section topics data pipeline Spark code base.

I wonder whether a deploy token would make more sense, see https://gitlab.wikimedia.org/help/user/project/deploy_keys/index and https://gitlab.wikimedia.org/help/user/project/deploy_tokens/index#gitlab-deploy-token

Looks like deploy tokens can only read, and there is no option for them to commit, which is what we use the ssh keypair for.

Fri, Aug 19, 3:32 AM · Structured-Data-Backlog (Current Work), Section-Topics
xcollazo updated the task description for T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Fri, Aug 19, 3:13 AM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo updated the task description for T315633: Decomission an-airflow1003 (legacy platform_eng instance).
Fri, Aug 19, 3:12 AM · Data Pipelines, Data Engineering Planning
xcollazo created T315633: Decomission an-airflow1003 (legacy platform_eng instance).
Fri, Aug 19, 3:11 AM · Data Pipelines, Data Engineering Planning

Yesterday

xcollazo closed T312858: New airflow instance related to Image Suggestion Jobs as Resolved.

All right! Just verified that the image_suggestions job is running smoothly on the new an-airflow1004.eqiad.wmnet Airflow instance.

Thu, Aug 18, 11:10 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo closed T312858: New airflow instance related to Image Suggestion Jobs, a subtask of T311417: Migration of Image Suggestion Job to DE Data Pipeline, as Resolved.
Thu, Aug 18, 11:10 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo closed T314591: Couple fixes for image-suggestions repo as Resolved.

Applied (1) via https://gitlab.wikimedia.org/repos/generated-data-platform/image-suggestions/-/merge_requests/1. Closing.

Thu, Aug 18, 11:05 PM · Data Engineering Planning, Image-Suggestions
xcollazo closed T314591: Couple fixes for image-suggestions repo, a subtask of T311417: Migration of Image Suggestion Job to DE Data Pipeline, as Resolved.
Thu, Aug 18, 11:05 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

I have updated the Wikitech documentation for that

Thanks @KCVelaga_WMF !

Thu, Aug 18, 7:18 PM · Data Pipelines (Sprint 00), Data Engineering Planning

Wed, Aug 17

xcollazo created T315486: Add xcollazo@wikimedia.org to the analytics-alerts mailing list.
Wed, Aug 17, 8:05 PM · Mail, Infrastructure-Foundations, SRE
xcollazo closed T314147: Data Pipeline for Unique Editor metrics by Country as Resolved.

Verified that the table content and airflow job are working as designed. Closing.

Wed, Aug 17, 7:16 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo closed T314147: Data Pipeline for Unique Editor metrics by Country, a subtask of T310224: New Data Pipeline for Unique Editor metrics by Geo (Country & Region), as Resolved.
Wed, Aug 17, 7:16 PM · Data Pipelines, Data Engineering Planning, Foundational Technology Requests
xcollazo updated the task description for T314147: Data Pipeline for Unique Editor metrics by Country.
Wed, Aug 17, 7:10 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314389: [SPIKE] Decide on technical solution for page state stream backfill process.

If we can do this with Flink, we should, since then we don't have to maintain 2 codebases that do the same thing. But, it also might prove too difficult, and in that case we'd use Spark.

Wed, Aug 17, 2:13 PM · Data-Engineering, Event-Platform Value Stream (Sprint 00), Spike

Tue, Aug 16

xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

Successfully deployed the migration script and Airflow job today.

Tue, Aug 16, 9:24 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

Updated migration notes to use Spark3.

Tue, Aug 16, 4:22 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

Copying some notes I sent through email here for completeness:

Tue, Aug 16, 1:32 AM · Data Pipelines (Sprint 00), Data Engineering Planning

Mon, Aug 15

xcollazo added a comment to T311860: Set up the section topics data pipeline Spark code base.

Just wanted to confirm if there's any specific requirement for the CI user credentials that will commit the release

Mon, Aug 15, 2:35 PM · Structured-Data-Backlog (Current Work), Section-Topics

Fri, Aug 5

xcollazo moved T314147: Data Pipeline for Unique Editor metrics by Country from Ready to deploy to In code review on the Data Engineering Planning (Sprint 02) board.
Fri, Aug 5, 5:30 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo updated subscribers of T314147: Data Pipeline for Unique Editor metrics by Country.

Status update:

Fri, Aug 5, 5:29 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo moved T314147: Data Pipeline for Unique Editor metrics by Country from In code review to Ready to deploy on the Data Engineering Planning (Sprint 02) board.
Fri, Aug 5, 5:25 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo moved T312858: New airflow instance related to Image Suggestion Jobs from In code review to Ready to deploy on the Data Engineering Planning (Sprint 02) board.
Fri, Aug 5, 5:25 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo updated the task description for T314592: Requesting membership of the analytics group in gerrit for 'snwachukwu' and 'nokafor'.
Fri, Aug 5, 5:13 PM · Gerrit-Privilege-Requests, Data-Engineering-Radar, Release-Engineering-Team
xcollazo added a comment to T312858: New airflow instance related to Image Suggestion Jobs.

Status update:

Fri, Aug 5, 4:52 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo added a comment to T314591: Couple fixes for image-suggestions repo.

Fixed (2) above via https://gitlab.wikimedia.org/repos/generated-data-platform/image-suggestions/-/commit/02d0ba2070a61bcac37f41a09b50224eec1c97ae

Fri, Aug 5, 12:21 PM · Data Engineering Planning, Image-Suggestions

Thu, Aug 4

xcollazo added a project to T314591: Couple fixes for image-suggestions repo: Data Engineering Planning.
Thu, Aug 4, 6:23 PM · Data Engineering Planning, Image-Suggestions
xcollazo updated subscribers of T314147: Data Pipeline for Unique Editor metrics by Country.

(After a conversation with @KCVelaga, we decided to keep things simple for now, so the monthly table unique_editors_per_country_monthly will not use GROUPING SETS.)

Thu, Aug 4, 4:27 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

Merge request for the new pipeline https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/114

Thu, Aug 4, 4:21 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T314147: Data Pipeline for Unique Editor metrics by Country.

Notes for doing the migration are contained here:

Thu, Aug 4, 4:17 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a subtask for T311417: Migration of Image Suggestion Job to DE Data Pipeline: T314591: Couple fixes for image-suggestions repo.
Thu, Aug 4, 3:59 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a parent task for T314591: Couple fixes for image-suggestions repo: T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Thu, Aug 4, 3:59 PM · Data Engineering Planning, Image-Suggestions
xcollazo created T314591: Couple fixes for image-suggestions repo.
Thu, Aug 4, 3:58 PM · Data Engineering Planning, Image-Suggestions

Wed, Aug 3

xcollazo added a comment to T312858: New airflow instance related to Image Suggestion Jobs.

Synced up with Ben over chat, copying here:

Wed, Aug 3, 5:43 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo closed T313734: Try to reproduce the silently failing airflow sensor error. , a subtask of T311976: Investigate why airflow sensor tasks fail without sending errors, as Resolved.
Wed, Aug 3, 5:16 PM · Data Engineering Planning, Data Pipelines, Data-Engineering-Kanban
xcollazo closed T313734: Try to reproduce the silently failing airflow sensor error. as Resolved.

Closing this, as @mforns already found the root cause and advised on solutions in T311976.

Wed, Aug 3, 5:16 PM · Data Engineering Planning (Sprint 02), Data Pipelines

Mon, Aug 1

xcollazo added a comment to T314181: Airflow does not send SLA emails nor update SLA misses in the db.

Thank you for looking into this, @mforns. As we briefly discussed over Slack, regardless of whether we upgrade Airflow to pick up the fix, we should also implement a log rotation mechanism, since many other issues could make the logs balloon. We discussed this in today's standup, and I believe @EChetty is going to open a task for it.

Mon, Aug 1, 3:43 PM · Data Engineering Planning

Fri, Jul 29

xcollazo closed T311176: Add xcollazo to analytics-admins as Resolved.
Fri, Jul 29, 1:27 PM · SRE-Access-Requests, SRE, Data Engineering Planning
xcollazo added a comment to T311176: Add xcollazo to analytics-admins.

Confirmed sudo access:

Fri, Jul 29, 1:27 PM · SRE-Access-Requests, SRE, Data Engineering Planning
xcollazo added a comment to T311176: Add xcollazo to analytics-admins.

Thank you all for taking care of this.

Fri, Jul 29, 1:22 PM · SRE-Access-Requests, SRE, Data Engineering Planning

Thu, Jul 28

xcollazo closed T313734: Try to reproduce the silently failing airflow sensor error. , a subtask of T311976: Investigate why airflow sensor tasks fail without sending errors, as Resolved.
Thu, Jul 28, 7:16 PM · Data Engineering Planning, Data Pipelines, Data-Engineering-Kanban
xcollazo closed T313734: Try to reproduce the silently failing airflow sensor error. as Resolved.

Blocked until T311176 is resolved (lack of privileges).

Thu, Jul 28, 7:16 PM · Data Engineering Planning (Sprint 02), Data Pipelines
xcollazo reopened T311176: Add xcollazo to analytics-admins as "Open".

Re-opening this ticket. As part of the work I am doing to fix T311976, I now need to be able to sudo into the analytics user. @mforns informs me that this is part of the analytics-admins group.

Thu, Jul 28, 4:34 PM · SRE-Access-Requests, SRE, Data Engineering Planning
xcollazo changed the status of T313734: Try to reproduce the silently failing airflow sensor error. from Open to In Progress.
Thu, Jul 28, 3:47 PM · Data Engineering Planning (Sprint 02), Data Pipelines
xcollazo changed the status of T313734: Try to reproduce the silently failing airflow sensor error. , a subtask of T311976: Investigate why airflow sensor tasks fail without sending errors, from Open to In Progress.
Thu, Jul 28, 3:47 PM · Data Engineering Planning, Data Pipelines, Data-Engineering-Kanban

Wed, Jul 27

xcollazo added a comment to T311417: Migration of Image Suggestion Job to DE Data Pipeline.

Fixed a sensor correctness issue via https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/108.

Wed, Jul 27, 7:44 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T313734: Try to reproduce the silently failing airflow sensor error. .

@EChetty can you add to the description, or point me to the right person to ask for details? Thanks!

Wed, Jul 27, 4:32 PM · Data Engineering Planning (Sprint 02), Data Pipelines
xcollazo added a project to T311417: Migration of Image Suggestion Job to DE Data Pipeline: Data Pipelines.
Wed, Jul 27, 4:29 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T312858: New airflow instance related to Image Suggestion Jobs.

(Not confident about the patch above, but I still wanted to have something for review.)

Wed, Jul 27, 1:27 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review

Tue, Jul 26

xcollazo changed the status of T312858: New airflow instance related to Image Suggestion Jobs, a subtask of T311417: Migration of Image Suggestion Job to DE Data Pipeline, from Open to Stalled.
Tue, Jul 26, 8:12 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo changed the status of T312858: New airflow instance related to Image Suggestion Jobs from Open to Stalled.

While following the instructions to make the Puppet changes for https://wikitech.wikimedia.org/wiki/Analytics/Systems/Airflow#Create_a_scap_deployment_source, I hit a wall. It seems @Ottomata had set it up so that converting the current platform_eng Airflow instance would be a simple config change, as seen here: https://github.com/wikimedia/puppet/blob/production/hieradata/role/common/analytics_cluster/airflow/platform_eng.yaml#L53-L57. However, since the prod run of the image_suggestions DAG is on the original server, I believe going forward with this will nuke it.

Tue, Jul 26, 8:12 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo added a comment to T312858: New airflow instance related to Image Suggestion Jobs.

Put together what I think the correct scap configuration is at https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags-scap-platform_eng.

Tue, Jul 26, 6:26 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo updated subscribers of T312858: New airflow instance related to Image Suggestion Jobs.

Synced up with @mforns on this task. We will attempt to move it forward as much as we can until we get an SRE to help.

Tue, Jul 26, 4:08 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo updated the task description for T312858: New airflow instance related to Image Suggestion Jobs.
Tue, Jul 26, 2:37 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review

Mon, Jul 25

xcollazo added a comment to T311860: Set up the section topics data pipeline Spark code base.

@mfossati, here are a couple pointers:

Mon, Jul 25, 6:54 PM · Structured-Data-Backlog (Current Work), Section-Topics
xcollazo closed T313410: [SPIKE] Investigate and Design Solution for Data Pipelines to Create Diffs, a subtask of T312798: Section Topics - Data Platform Tasks, as Resolved.
Mon, Jul 25, 5:03 PM · Epic, Generated Data Platform
xcollazo closed T313410: [SPIKE] Investigate and Design Solution for Data Pipelines to Create Diffs as Resolved.

After today's architectural discussion, we decided that we will not be pursuing making the data available in MariaDB for now. We can still do the diffs in the future with the approach above. For now, closing.

Mon, Jul 25, 5:03 PM · Spike, Generated Data Platform
xcollazo added a comment to T312900: [M] Design database model for section topics pipeline.

Following today's architecture discussion, we will not be pursuing making the data available in MariaDB to be consumed publicly for now. But if we want to keep that option open, let's make sure that we identify a primary key for all tables in Hive (that is, a column or set of columns that make the row unique).

Mon, Jul 25, 5:01 PM · Structured-Data-Backlog (Current Work), Generated Data Platform, Section-Topics
xcollazo added a comment to T312858: New airflow instance related to Image Suggestion Jobs.

Some more context:

Mon, Jul 25, 3:10 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review
xcollazo changed the status of T313410: [SPIKE] Investigate and Design Solution for Data Pipelines to Create Diffs, a subtask of T312798: Section Topics - Data Platform Tasks, from Open to In Progress.
Mon, Jul 25, 2:54 PM · Epic, Generated Data Platform
xcollazo changed the status of T313410: [SPIKE] Investigate and Design Solution for Data Pipelines to Create Diffs from Open to In Progress.

Thought about this a bit last Friday. I think it will be straightforward as long as we can identify a primary key for all the tables that we want to sync up.
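
To illustrate why a primary key makes the diff straightforward, here is a minimal sketch in plain Python. It is not the pipeline's actual code: the function name `diff_by_key`, the row layout, and the column names are hypothetical, and a real job would most likely express the same idea as a join in Spark or Hive.

```python
from typing import Dict, Iterable, List, Mapping, Sequence, Tuple

Row = Mapping[str, object]


def diff_by_key(
    old_rows: Iterable[Row],
    new_rows: Iterable[Row],
    key_columns: Sequence[str],
) -> Dict[str, List[Row]]:
    """Compare two snapshots of a table, matching rows on the primary key."""

    def key_of(row: Row) -> Tuple[object, ...]:
        # The primary key is the tuple of values in the key columns.
        return tuple(row[c] for c in key_columns)

    old = {key_of(r): r for r in old_rows}
    new = {key_of(r): r for r in new_rows}

    return {
        # Keys present only in the new snapshot.
        "insert": [new[k] for k in new if k not in old],
        # Keys present only in the old snapshot.
        "delete": [old[k] for k in old if k not in new],
        # Keys present in both snapshots whose non-key values changed.
        "update": [new[k] for k in new if k in old and dict(new[k]) != dict(old[k])],
    }
```

Without a column (or set of columns) that uniquely identifies a row, the two dictionaries above could not be built, and there would be no way to tell an updated row from a delete followed by an insert.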

Mon, Jul 25, 2:54 PM · Spike, Generated Data Platform
xcollazo added a project to T312858: New airflow instance related to Image Suggestion Jobs: Data Pipelines.
Mon, Jul 25, 2:17 PM · Data Pipelines (Sprint 00), Data Engineering Planning, Patch-For-Review

Wed, Jul 20

xcollazo updated the task description for T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Wed, Jul 20, 2:23 PM · Data Pipelines (Sprint 00), Data Engineering Planning

Jul 19 2022

xcollazo added a comment to T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster.

We're thinking of doing deltas for section topics, so we'd need to be able to do those too

@Cparle If you really wanted to do that step in MariaDB, you could use HiveToMariaDB to import the new data into a temp table, and then use MySqlOperator to run your INSERT/DELETE transformations against the old data.
I'd like to understand your use case, where can I go read about it?
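
The temp-table approach described above could look roughly like the following sketch. It uses Python's built-in `sqlite3` module purely as a stand-in for MariaDB so the example is runnable; the function name `apply_delta` and the table names are hypothetical, and the loading step (HiveToMariaDB) and the scheduling step (MySqlOperator) are not shown.

```python
import sqlite3
from typing import Sequence, Tuple


def apply_delta(
    conn: sqlite3.Connection,
    target: str,
    staging: str,
    key_columns: Sequence[str],
) -> Tuple[int, int]:
    """Make `target` match `staging` with one DELETE and one INSERT.

    Table and column names are interpolated into the SQL, so they must
    come from trusted configuration, never from user input.
    Rows whose key exists in both tables are left untouched.
    """
    match = " AND ".join(f"{staging}.{c} = {target}.{c}" for c in key_columns)
    with conn:  # commit both statements together, or roll back on error
        deleted = conn.execute(
            f"DELETE FROM {target} "
            f"WHERE NOT EXISTS (SELECT 1 FROM {staging} WHERE {match})"
        ).rowcount
        inserted = conn.execute(
            f"INSERT INTO {target} SELECT * FROM {staging} "
            f"WHERE NOT EXISTS (SELECT 1 FROM {target} WHERE {match})"
        ).rowcount
    return deleted, inserted
```

Running both statements in one transaction means readers of the target table never see a half-applied delta. The exact SQL dialect for MariaDB may differ slightly from what SQLite accepts here.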

Jul 19 2022, 3:26 PM · Section-Topics, Spike, Generated Data Platform

Jul 18 2022

xcollazo closed T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster, a subtask of T312798: Section Topics - Data Platform Tasks, as Resolved.
Jul 18 2022, 8:10 PM · Epic, Generated Data Platform
xcollazo closed T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster as Resolved.
  • We need a non-test use case for the MariaDB DBA folks to let us connect to their servers.
  • The prerequisites for making MariaDB connections to the misc-clusters both from the Airflow cluster as well as from the Hadoop cluster are there.
  • We need to provide tooling (the aforementioned HiveToMariaDB refine helper class) for moving data from Hive tables into MariaDB.
Jul 18 2022, 8:10 PM · Section-Topics, Spike, Generated Data Platform
xcollazo added a comment to T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster.

As a complement to the notes above, I also tried to make sure that connections are possible from within the Hadoop cluster. I manually ran the following, which runs a simple test on one of the Spark worker nodes:

Jul 18 2022, 8:05 PM · Section-Topics, Spike, Generated Data Platform

Jul 15 2022

xcollazo added a comment to T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster.

Some notes regarding this POC:

Jul 15 2022, 8:54 PM · Section-Topics, Spike, Generated Data Platform
xcollazo added a comment to T312968: Create MariaDB schema "image_suggestions", and grant permissions to user xcollazo.

Connection is successful:

Jul 15 2022, 1:56 PM · DBA

Jul 14 2022

xcollazo added a comment to T312968: Create MariaDB schema "image_suggestions", and grant permissions to user xcollazo.

If that's all you'd need you can just try a telnet (or netcat) to one of the proxies to the port 3306.

Sorry, I misspoke. I want my workflow system (Airflow) to connect to this MariaDB instance, as there may be helper functions we'd need to create to make this sort of connection easy for our customers, so it's not just making sure the connection is possible.
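
For reference, the plain reachability check suggested in the quoted reply (a telnet or netcat to port 3306) can also be sketched in Python with the standard `socket` module. The function name `port_is_open` is hypothetical and this is not the test that was actually run; as noted above, a successful TCP connect only shows that the network path is open, not that Airflow can authenticate and run queries.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

A call such as `port_is_open("mariadb-proxy.example.org", 3306)` (hostname made up for illustration) would then tell you whether a firewall is blocking the default MariaDB port.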

Jul 14 2022, 7:17 PM · DBA
xcollazo added a comment to T312968: Create MariaDB schema "image_suggestions", and grant permissions to user xcollazo.

What do you need to test the connection apart from that?

We want to make sure there are no connectivity/firewall issues between our Airflow and Analytics clusters and the MariaDB instances.

Jul 14 2022, 5:23 PM · DBA

Jul 13 2022

xcollazo created T312968: Create MariaDB schema "image_suggestions", and grant permissions to user xcollazo.
Jul 13 2022, 5:10 PM · DBA

Jul 12 2022

xcollazo updated the task description for T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster.
Jul 12 2022, 7:23 PM · Section-Topics, Spike, Generated Data Platform
xcollazo changed the status of T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster, a subtask of T312798: Section Topics - Data Platform Tasks, from Open to In Progress.
Jul 12 2022, 7:20 PM · Epic, Generated Data Platform
xcollazo changed the status of T312799: [SPIKE] POC Airflow Job to Read/Write Data to the misc-cluster from Open to In Progress.
Jul 12 2022, 7:20 PM · Section-Topics, Spike, Generated Data Platform
xcollazo updated subscribers of T311417: Migration of Image Suggestion Job to DE Data Pipeline.

CC @WDoranWMF.

Jul 12 2022, 1:37 PM · Data Pipelines (Sprint 00), Data Engineering Planning

Jul 11 2022

xcollazo closed T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors as Resolved.

Closing this ticket for now, since there has been no reply or activity on my request upstream, and there is the yarn logs ... workaround.

Jul 11 2022, 4:39 PM · Generated Data Platform

Jul 1 2022

xcollazo updated the task description for T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Jul 1 2022, 7:29 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors.

In general, the skein project seems to be... not dead, but certainly dormant. This particular log bug PR has been open since 2020!

Jul 1 2022, 5:34 PM · Generated Data Platform
xcollazo updated the task description for T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Jul 1 2022, 5:32 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a comment to T311417: Migration of Image Suggestion Job to DE Data Pipeline.

Making good progress here.

Jul 1 2022, 5:31 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo updated subscribers of T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors.
Jul 1 2022, 4:26 PM · Generated Data Platform
xcollazo updated subscribers of T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors.

@Ottomata @mforns , FYI.

Jul 1 2022, 4:26 PM · Generated Data Platform
xcollazo added a comment to T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors.

There is a discussion of this issue in a skein PR from 2020 (https://github.com/jcrist/skein/pull/212). Unfortunately, it was never merged. The stopgap change is quite simple: https://github.com/jcrist/skein/pull/212/commits/fe906f746e0a3b8b3cb89ce61140271e02601699.

Jul 1 2022, 4:24 PM · Generated Data Platform
xcollazo added a comment to T311772: Delete account xcollazo.

Thank you @Aklapper!

Jul 1 2022, 12:43 PM · Phabricator
Aklapper renamed xcollazo from XCollazo-WMF to xcollazo.
Jul 1 2022, 8:17 AM

Jun 30 2022

xcollazo created T311772: Delete account xcollazo.
Jun 30 2022, 8:46 PM · Phabricator

Jun 29 2022

xcollazo closed T311657: Add XCollazo-WMF maintainer to generated-data-platform GitLab group as Resolved.

Thanks for creating this @gmodena. @Eevans took care of it. Closing.

Jun 29 2022, 6:58 PM · Release-Engineering-Team, GitLab
xcollazo added a comment to T311417: Migration of Image Suggestion Job to DE Data Pipeline.

(For the GitHub repo containing the business logic, I am following the git instructions at https://stackoverflow.com/questions/1365541/how-to-move-some-files-from-one-git-repo-to-another-not-a-clone-preserving-hi to avoid losing the commit history.)

Jun 29 2022, 5:19 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo added a project to T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors: Generated Data Platform.
Jun 29 2022, 4:17 PM · Generated Data Platform
xcollazo created T311646: Figure how to bump gRPC message size max on Skein to avoid logging errors.
Jun 29 2022, 4:16 PM · Generated Data Platform

Jun 28 2022

xcollazo added a comment to T311525: Upgrade to latest PrestoDB and enable iceberg support.

I'm worried that the Presto Iceberg connector might not have kerberos support?

Typically, you can pass these details down with a Hadoop Configuration object. Is that not the case with Presto?

Jun 28 2022, 7:15 PM · Data Engineering Planning (Sprint 01), Patch-For-Review, Data-Engineering-Kanban

Jun 27 2022

xcollazo moved T311417: Migration of Image Suggestion Job to DE Data Pipeline from Backlog to Work in Progress ⚙️ on the Generated Data Platform board.
Jun 27 2022, 2:22 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo closed T310692: Investigate Migration of Image Suggestion Job to DE Data Pipeline as Resolved.

Created T311417 to track the implementation work.

Jun 27 2022, 2:20 PM · Generated Data Platform
xcollazo created T311417: Migration of Image Suggestion Job to DE Data Pipeline.
Jun 27 2022, 2:19 PM · Data Pipelines (Sprint 00), Data Engineering Planning
xcollazo renamed T310692: Investigate Migration of Image Suggestion Job to DE Data Pipeline from [NEEDS GROOMING] Investigate Migration of Image Suggestion Job to DE Data Pipeline to Investigate Migration of Image Suggestion Job to DE Data Pipeline.
Jun 27 2022, 2:17 PM · Generated Data Platform
xcollazo updated subscribers of T310692: Investigate Migration of Image Suggestion Job to DE Data Pipeline.

I had meetings with the stakeholders for this effort.

Jun 27 2022, 2:16 PM · Generated Data Platform

Jun 22 2022

xcollazo created T311176: Add xcollazo to analytics-admins.
Jun 22 2022, 8:18 PM · SRE-Access-Requests, SRE, Data Engineering Planning
xcollazo added a comment to T311085: [Shared Event Platform] [SPIKE] Decide on page state change storing and backfill approach.

@lbowmaker shared with me the following Slack thread with @JAllemandou's rationale: https://wikimedia.slack.com/archives/C02BB8L2S5R/p1654174524991399?thread_ts=1654106678.906859&cid=C02BB8L2S5R

Jun 22 2022, 2:49 PM · Event-Platform Value Stream (Sprint 00), Data-Engineering, Epic
xcollazo added a comment to T311085: [Shared Event Platform] [SPIKE] Decide on page state change storing and backfill approach.

The page content will be too large to efficiently store and query as parquet, so needs a special case to be stored in avro

Jun 22 2022, 1:10 PM · Event-Platform Value Stream (Sprint 00), Data-Engineering, Epic

Jun 13 2022

xcollazo created T310555: Requesting access to Analytics for xcollazo.
Jun 13 2022, 9:49 PM · SRE, SRE-Access-Requests