
JEbe-WMF (Jennifer)
User

User Details

User Since
Jan 16 2023, 7:16 PM (45 w, 2 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
JEbe-WMF [ Global Accounts ]

Recent Activity

Today

JEbe-WMF moved T346278: Implement an Airflow job that runs and publishes the XML dumps from Sprint Backlog to In Process on the Data Products (Data Product Sprint 04) board.
Thu, Nov 30, 9:43 AM · Data Products (Data Product Sprint 04), Dumps 2.0
JEbe-WMF moved T347958: Geo Analytics: Add project validation to manage the 'invalid characters' error (400 Bad Request) from In Process to Code Review / Tech Input on the Data Products (Data Product Sprint 04) board.
Thu, Nov 30, 9:43 AM · Patch-For-Review, Data Products (Data Product Sprint 04), AQS2.0
JEbe-WMF edited projects for T346278: Implement an Airflow job that runs and publishes the XML dumps, added: Data Products (Data Product Sprint 04); removed Data Products.
Thu, Nov 30, 8:24 AM · Data Products (Data Product Sprint 04), Dumps 2.0
JEbe-WMF updated Other Assignee for T346278: Implement an Airflow job that runs and publishes the XML dumps, added: JEbe-WMF.
Thu, Nov 30, 8:23 AM · Data Products (Data Product Sprint 04), Dumps 2.0
JEbe-WMF closed T347524: Add integration tests to the PySpark jobs, a subtask of T330296: Make MediaWiki XML content dump available for external consumption, as Resolved.
Thu, Nov 30, 8:22 AM · Data Products (Epics Timeline), Data Pipelines, Epic
JEbe-WMF closed T347524: Add integration tests to the PySpark jobs as Resolved.
Thu, Nov 30, 8:22 AM · Data Products (Data Product Sprint 04), Dumps 2.0

Wed, Nov 22

JEbe-WMF moved T347958: Geo Analytics: Add project validation to manage the 'invalid characters' error (400 Bad Request) from Sprint Backlog to In Process on the Data Products (Data Product Sprint 04) board.
Wed, Nov 22, 1:19 PM · Patch-For-Review, Data Products (Data Product Sprint 04), AQS2.0
JEbe-WMF moved T347961: Geo Analytics: Remove indentation from response (minify response) from Sprint Backlog to Code Review / Tech Input on the Data Products (Data Product Sprint 04) board.
Wed, Nov 22, 1:13 PM · Data Products (Data Product Sprint 04), AQS2.0
JEbe-WMF claimed T347961: Geo Analytics: Remove indentation from response (minify response).
Wed, Nov 22, 12:23 PM · Data Products (Data Product Sprint 04), AQS2.0

Mon, Nov 20

JEbe-WMF moved T347524: Add integration tests to the PySpark jobs from Code Review / Tech Input to Done on the Data Products (Data Product Sprint 04) board.
Mon, Nov 20, 9:15 AM · Data Products (Data Product Sprint 04), Dumps 2.0

Wed, Nov 15

JEbe-WMF moved T347524: Add integration tests to the PySpark jobs from Ready for Code Review to In code review / Tech Input on the Data Products (Data Products (Sprint 03)) board.
Wed, Nov 15, 1:02 PM · Data Products (Data Product Sprint 04), Dumps 2.0

Thu, Nov 9

JEbe-WMF moved T347524: Add integration tests to the PySpark jobs from In Process to Ready for Code Review on the Data Products (Data Products (Sprint 03)) board.
Thu, Nov 9, 12:56 PM · Data Products (Data Product Sprint 04), Dumps 2.0

Thu, Nov 2

JEbe-WMF moved T347524: Add integration tests to the PySpark jobs from In Process to In code review / Tech Input on the Data Products (Data Products (Sprint 03)) board.
Thu, Nov 2, 8:31 AM · Data Products (Data Product Sprint 04), Dumps 2.0

Oct 12 2023

JEbe-WMF set the point value for T347524: Add integration tests to the PySpark jobs to 5.
Oct 12 2023, 12:25 PM · Data Products (Data Product Sprint 04), Dumps 2.0
JEbe-WMF moved T348567: Estimate Sprint 02 tasks - Jennifer from Sprint Backlog to In Process on the Data Products (Sprint 02) board.
Oct 12 2023, 12:24 PM · Data Products (Sprint 02)
JEbe-WMF moved T336415: Editor analytics service: Configure routing in staging and production from In Process to Paused on the Data Products (Sprint 02) board.
Oct 12 2023, 12:20 PM · Data Products (Sprint 02)
JEbe-WMF moved T346287: [Javascript] Create Metrics Platform API for Submitting Core Interaction Events from Ready for Code Review to In code review / Tech Input on the Data Products (Sprint 02) board.
Oct 12 2023, 12:16 PM · Data Products (Sprint 02)
JEbe-WMF moved T348635: Async Q2 planning contributions - Santi from Sign Off to Done on the Data Products (Sprint 02) board.
Oct 12 2023, 12:09 PM · Data Products (Sprint 02)
JEbe-WMF moved T346300: [Spike] Surbhi's review of Knowledge Gaps for porting to AQS 2.0 from Sign Off to Done on the Data Products (Sprint 02) board.
Oct 12 2023, 12:08 PM · Data Products (Sprint 02)
JEbe-WMF moved T344867: [SPIKE] Commons Impact Metrics preliminary technical review from Sign Off to Done on the Data Products (Sprint 02) board.
Oct 12 2023, 12:07 PM · Data Products (Sprint 02)
JEbe-WMF moved T327840: AQS 2.0: Consider mediawiki_history_reduced snapshot handling from Sign Off to Done on the Data Products (Sprint 02) board.
Oct 12 2023, 12:06 PM · Data Products (Data Products (Sprint 03)), Spike, AQS2.0
JEbe-WMF moved T348639: Async Q2 planning contributions - Jennifer from Sprint Backlog to In Process on the Data Products (Sprint 02) board.
Oct 12 2023, 11:15 AM · Data Products (Sprint 02)

Oct 4 2023

JEbe-WMF added a comment to T347524: Add integration tests to the PySpark jobs.
  • Step 1: Figure out which tests are useful for PySpark jobs
  • Step 2: Decide on acceptable parameters for the tests
  • Step 3: Create/script the tests locally
  • Step 4: Move the tests to production (GitLab)
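As a rough sketch of what Step 3 could look like, the example below runs a job end to end against a small fixture file in a temporary directory and then checks the output the job wrote. The job `run_revision_count_job` is hypothetical, and plain Python stands in for PySpark so the example stays self-contained. A real PySpark integration test would typically create a local session (for example `SparkSession.builder.master("local[1]").getOrCreate()`) and run the actual job code against it.

```python
import json
import os
import tempfile
import unittest
from collections import Counter


def run_revision_count_job(input_path, output_path):
    """Hypothetical stand-in for a PySpark job: count revisions per page.

    Reads JSON lines that each carry a "page_id" field and writes a single
    JSON object mapping page_id (as a string) to its revision count.
    """
    counts = Counter()
    with open(input_path, encoding="utf-8") as src:
        for line in src:
            if line.strip():
                counts[str(json.loads(line)["page_id"])] += 1
    with open(output_path, "w", encoding="utf-8") as dst:
        json.dump(dict(counts), dst)


class RevisionCountJobIntegrationTest(unittest.TestCase):
    def setUp(self):
        # Small, hand-written fixture: the "acceptable parameters" of Step 2.
        self.tmp = tempfile.TemporaryDirectory()
        self.input_path = os.path.join(self.tmp.name, "revisions.jsonl")
        self.output_path = os.path.join(self.tmp.name, "counts.json")
        rows = [{"page_id": 1}, {"page_id": 1}, {"page_id": 2}]
        with open(self.input_path, "w", encoding="utf-8") as f:
            f.write("\n".join(json.dumps(r) for r in rows) + "\n")

    def tearDown(self):
        self.tmp.cleanup()

    def test_job_writes_expected_counts(self):
        # Run the whole job against files on disk, then inspect its output.
        run_revision_count_job(self.input_path, self.output_path)
        with open(self.output_path, encoding="utf-8") as f:
            self.assertEqual(json.load(f), {"1": 2, "2": 1})
```

It can be run with `python -m unittest <module>`; pytest also collects `unittest.TestCase` classes, which makes it easy to wire into a GitLab CI job for Step 4.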
Oct 4 2023, 4:45 PM · Data Products (Data Product Sprint 04), Dumps 2.0
JEbe-WMF claimed T347524: Add integration tests to the PySpark jobs.
Oct 4 2023, 4:41 PM · Data Products (Data Product Sprint 04), Dumps 2.0

Oct 3 2023

JEbe-WMF moved T346279: [Spike] Figure out what are good indicators for dumps data quality from Wormhole To Sprint 02 to In code review / Tech Input on the Data Products (Sprint 02) board.
Oct 3 2023, 1:44 PM · Data Products (Sprint 02), Dumps 2.0
JEbe-WMF moved T346279: [Spike] Figure out what are good indicators for dumps data quality from In code review / Tech Input to Wormhole To Sprint 02 on the Data Products (Sprint 02) board.
Oct 3 2023, 1:43 PM · Data Products (Sprint 02), Dumps 2.0

Oct 2 2023

JEbe-WMF moved T346279: [Spike] Figure out what are good indicators for dumps data quality from In Process to In code review / Tech Input on the Data Products (Sprint 01) board.
Oct 2 2023, 5:50 PM · Data Products (Sprint 02), Dumps 2.0
JEbe-WMF added a comment to T346279: [Spike] Figure out what are good indicators for dumps data quality.

XML Schema Validation (Dan is already doing this using IntelliJ):

  • If your XML files adhere to a predefined XML schema (XSD), you can validate them against the schema to identify structural differences.
  • Any non-conformance with the schema will be flagged as a difference.

Size and Visual Comparison of the XML:

  • Open the two XML files in a text editor or XML viewer that supports syntax highlighting for easier readability.
  • Manually review the size of the files side by side.
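To make the side-by-side size check repeatable, a small helper could report both sizes and their relative difference. This is a minimal sketch: the stand-in files and the 5% tolerance used below are assumptions for illustration, and the acceptable gap between two dump outputs would still need to be agreed on.

```python
import os
import tempfile


def compare_file_sizes(path_a, path_b, tolerance_pct=1.0):
    """Report both file sizes and their relative difference in percent.

    The difference is measured against the larger file, so the result does
    not depend on argument order. tolerance_pct is an assumed threshold.
    """
    size_a = os.path.getsize(path_a)
    size_b = os.path.getsize(path_b)
    larger = max(size_a, size_b)
    diff_pct = 0.0 if larger == 0 else abs(size_a - size_b) / larger * 100
    return {
        "size_a": size_a,
        "size_b": size_b,
        "diff_pct": diff_pct,
        "within_tolerance": diff_pct <= tolerance_pct,
    }


# Usage: two small stand-in files instead of real dump outputs.
with tempfile.TemporaryDirectory() as tmp:
    old_dump = os.path.join(tmp, "old.xml")
    new_dump = os.path.join(tmp, "new.xml")
    with open(old_dump, "wb") as f:
        f.write(b"x" * 100)
    with open(new_dump, "wb") as f:
        f.write(b"x" * 98)
    report = compare_file_sizes(old_dump, new_dump, tolerance_pct=5.0)
```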

Size and Visual Random Spot Comparison of the tables in HDFS:

  • Use a diff or a MINUS query to compare the Hive table (MediaWiki wikitext history) with the Iceberg table (wikitext_raw_rc1)
  • Manually review the sizes of matching partitions (if they exist)
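The set-difference comparison can be run in both directions to find rows present in only one of the two tables. In Spark SQL this would be `EXCEPT` (for which `MINUS` is an alias); the sketch below uses the standard library's `sqlite3` as a stand-in engine so it runs anywhere. The table names `hive_wikitext` and `iceberg_wikitext`, their columns, and the sample rows are hypothetical placeholders for the real tables.

```python
import sqlite3
from typing import Sequence


def table_differences(conn, left, right, columns: Sequence[str]):
    """Return (rows only in `left`, rows only in `right`), each sorted.

    Table and column names are interpolated into the SQL text (identifiers
    cannot be bound as parameters), so they must come from trusted code.
    EXCEPT compares distinct rows, so duplicated rows are not reported.
    """
    cols = ", ".join(columns)
    only_left = conn.execute(
        f"SELECT {cols} FROM {left} EXCEPT SELECT {cols} FROM {right}"
    ).fetchall()
    only_right = conn.execute(
        f"SELECT {cols} FROM {right} EXCEPT SELECT {cols} FROM {left}"
    ).fetchall()
    return sorted(only_left), sorted(only_right)


# Hypothetical sample data standing in for the Hive and Iceberg tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hive_wikitext (wiki_db TEXT, revision_id INTEGER, text_bytes INTEGER);
    CREATE TABLE iceberg_wikitext (wiki_db TEXT, revision_id INTEGER, text_bytes INTEGER);
    INSERT INTO hive_wikitext VALUES ('enwiki', 1, 100), ('enwiki', 2, 250), ('dewiki', 3, 80);
    INSERT INTO iceberg_wikitext VALUES ('enwiki', 1, 100), ('enwiki', 2, 251), ('frwiki', 4, 60);
""")
columns = ["wiki_db", "revision_id", "text_bytes"]
only_hive, only_iceberg = table_differences(conn, "hive_wikitext", "iceberg_wikitext", columns)
```

Comparing a few value columns (rather than full revision text) keeps the spot check cheap while still surfacing revisions that are missing or changed on either side.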

Stream Parsing:

  • Parse and compare both files with a streaming XML parser
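A minimal sketch of the stream-parsing idea, using the standard library's `xml.etree.ElementTree.iterparse` so neither file has to be loaded into memory at once. It assumes MediaWiki-style dumps in which each `<page>` contains `<revision>` elements whose first direct `<id>` child is the revision ID. Namespaces are stripped, so the export schema version does not matter. The two inline documents are made up for illustration; for real dumps, a file path or an open binary stream such as `bz2.open(path, "rb")` could be passed instead.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def summarize_dump(source):
    """Stream a dump and return its page count, revision count and revision IDs."""
    pages = 0
    revisions = 0
    revision_ids = set()
    for _event, elem in ET.iterparse(source, events=("end",)):
        name = local_name(elem.tag)
        if name == "revision":
            revisions += 1
            # Only direct children: the contributor's nested <id> is skipped.
            for child in elem:
                if local_name(child.tag) == "id":
                    revision_ids.add((child.text or "").strip())
                    break
        elif name == "page":
            pages += 1
            elem.clear()  # free the page's revisions once they are counted
    return {"pages": pages, "revisions": revisions, "revision_ids": revision_ids}


def compare_dumps(source_a, source_b):
    """Compare page/revision counts and report revision IDs missing on either side."""
    a = summarize_dump(source_a)
    b = summarize_dump(source_b)
    return {
        "pages": (a["pages"], b["pages"]),
        "revisions": (a["revisions"], b["revisions"]),
        "missing_in_b": sorted(a["revision_ids"] - b["revision_ids"]),
        "missing_in_a": sorted(b["revision_ids"] - a["revision_ids"]),
    }


# Made-up sample documents standing in for two dump files.
DUMP_A = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Example</title><id>1</id>
    <revision><id>10</id><contributor><id>7</id></contributor><text>foo</text></revision>
    <revision><id>11</id><text>bar</text></revision>
  </page>
</mediawiki>"""
DUMP_B = DUMP_A.replace(b"<id>11</id>", b"<id>12</id>")

result = compare_dumps(io.BytesIO(DUMP_A), io.BytesIO(DUMP_B))
```

Clearing each `<page>` keeps memory roughly bounded. The root element still holds the emptied page elements, so very large dumps may also need those references dropped.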
Oct 2 2023, 5:50 PM · Data Products (Sprint 02), Dumps 2.0

Sep 13 2023

JEbe-WMF moved T344698: Change revision visibility code to conform to the new schema from In Process to Done on the Data Products (Sprint 00) board.
Sep 13 2023, 8:39 AM · Data Products (Sprint 00)
JEbe-WMF moved T340880: Merge visibility changes into hourly target table from In Process to Done on the Data Products (Sprint 00) board.
Sep 13 2023, 8:38 AM · Data Products (Sprint 00)

Sep 12 2023

JEbe-WMF moved T336744: Harmonize tags across Airflow dags from In code review / Tech Input to Done on the Data Products (Sprint 00) board.
Sep 12 2023, 1:41 PM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)

Sep 6 2023

JEbe-WMF moved T343325: Develop Dumps Triage Runbook from In Process to Sign Off on the Data Products (Sprint 00) board.
Sep 6 2023, 3:34 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering

Sep 5 2023

JEbe-WMF moved T340880: Merge visibility changes into hourly target table from BLOCKED to In Process on the Data Products (Sprint 00) board.
Sep 5 2023, 12:22 PM · Data Products (Sprint 00)
JEbe-WMF moved T344698: Change revision visibility code to conform to the new schema from Sprint Backlog to In Process on the Data Products (Sprint 00) board.
Sep 5 2023, 12:03 PM · Data Products (Sprint 00)
JEbe-WMF moved T336744: Harmonize tags across Airflow dags from Ready for Code Review to In code review / Tech Input on the Data Products (Sprint 00) board.
Sep 5 2023, 11:01 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
JEbe-WMF moved T336744: Harmonize tags across Airflow dags from In Process to Ready for Code Review on the Data Products (Sprint 00) board.
Sep 5 2023, 11:01 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)

Sep 4 2023

JEbe-WMF moved T336744: Harmonize tags across Airflow dags from Sprint Backlog to In Process on the Data Products (Sprint 00) board.
Sep 4 2023, 9:16 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
JEbe-WMF moved T336744: Harmonize tags across Airflow dags from Incoming to Sprint 00 on the Data Products board.
Sep 4 2023, 9:16 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)
JEbe-WMF added a project to T336744: Harmonize tags across Airflow dags: Data Products.
Sep 4 2023, 9:15 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)

Aug 22 2023

JEbe-WMF claimed T344698: Change revision visibility code to conform to the new schema.
Aug 22 2023, 1:36 PM · Data Products (Sprint 00)
JEbe-WMF created T344698: Change revision visibility code to conform to the new schema.
Aug 22 2023, 1:36 PM · Data Products (Sprint 00)
JEbe-WMF added a comment to T343325: Develop Dumps Triage Runbook.

In general, I think the document contains valuable knowledge for a newcomer to Dumps 1.0.

From a runbook perspective, I think we should include a list of common issues and their resolutions. As it stands today, the document seems to contain only one such listing. Granted, as we are mostly Dumps newbies here, we would need to actually hit those issues in order to solve them and document them properly. So I guess we need time and experience to add more?

Aug 22 2023, 12:51 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering

Aug 7 2023

JEbe-WMF moved T343325: Develop Dumps Triage Runbook from In Progress to Ready for Code Review/ Ready for Tech input on the Data Products (Sprint 0) board.
Aug 7 2023, 9:16 AM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
JEbe-WMF moved T340880: Merge visibility changes into hourly target table from In Progress to Ready for Code Review/ Ready for Tech input on the Data Products (Sprint 0) board.
Aug 7 2023, 9:16 AM · Data Products (Sprint 00)

Aug 2 2023

JEbe-WMF added a project to T343325: Develop Dumps Triage Runbook: Dumps 2.0.
Aug 2 2023, 3:00 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
JEbe-WMF added a project to T343325: Develop Dumps Triage Runbook: Data-Engineering.
Aug 2 2023, 3:00 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
JEbe-WMF claimed T343325: Develop Dumps Triage Runbook.
Aug 2 2023, 1:08 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
JEbe-WMF created T343328: Develop Dumps Triage Runbook.
Aug 2 2023, 1:02 PM · Data Products (Sprint 0)
JEbe-WMF created T343325: Develop Dumps Triage Runbook.
Aug 2 2023, 1:01 PM · Data Products (Sprint 00), Dumps 2.0, Data-Engineering
JEbe-WMF created T343324: Develop Dumps Triage Runbook.
Aug 2 2023, 1:00 PM · Data Products (Sprint 0)

Jul 25 2023

JEbe-WMF moved T340880: Merge visibility changes into hourly target table from Next Up to In Progress on the Data Products (Sprint 0) board.
Jul 25 2023, 2:06 PM · Data Products (Sprint 00)

Jul 24 2023

JEbe-WMF added a comment to T341559: Deployment Training Request for jebe.

7am UTC on Thursday (2023-07-27) works for me.

Jul 24 2023, 8:22 AM · Release-Engineering-Team (Deployment Training Requests)

Jul 19 2023

JEbe-WMF edited projects for T340880: Merge visibility changes into hourly target table, added: Data Products; removed Dumps 2.0.
Jul 19 2023, 2:01 PM · Data Products (Sprint 00)
JEbe-WMF added a project to T340880: Merge visibility changes into hourly target table: Dumps 2.0.
Jul 19 2023, 1:37 PM · Data Products (Sprint 00)
JEbe-WMF moved T336744: Harmonize tags across Airflow dags from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 19 2023, 12:21 PM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)

Jul 14 2023

JEbe-WMF updated the task description for T341559: Deployment Training Request for jebe.
Jul 14 2023, 12:55 PM · Release-Engineering-Team (Deployment Training Requests)

Jul 12 2023

JEbe-WMF added a member for Dumps-Generation: JEbe-WMF.
Jul 12 2023, 12:09 PM

Jul 11 2023

JEbe-WMF added a watcher for Dumps-Generation: JEbe-WMF.
Jul 11 2023, 1:14 PM
JEbe-WMF created T341559: Deployment Training Request for jebe.
Jul 11 2023, 9:28 AM · Release-Engineering-Team (Deployment Training Requests)
JEbe-WMF moved T340880: Merge visibility changes into hourly target table from In progress to In Review on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 11 2023, 9:27 AM · Data Products (Sprint 00)
JEbe-WMF moved T340880: Merge visibility changes into hourly target table from Next Up to In progress on the Data Engineering and Event Platform Team (Sprint 0) board.
Jul 11 2023, 9:27 AM · Data Products (Sprint 00)
JEbe-WMF created T341557: Grant Access to wmf for Jennifer Ebe.
Jul 11 2023, 9:24 AM · SRE, LDAP-Access-Requests

Jul 6 2023

JEbe-WMF added a comment to T341045: Get Data Engineering folks access to hosts and systems needed for maintenance of the existing dumps system.

@ArielGlenn I used my Wikimedia email.

Jul 6 2023, 2:07 PM · Data-Engineering, Dumps-Generation

Jun 30 2023

JEbe-WMF added a comment to T336744: Harmonize tags across Airflow dags.

The Data Engineering team had a meeting and concluded that tags should capture:

  • Frequency
  • Ownership
  • Criticality
  • Whether the DAG requires a certain table (e.g. Webrequest)
  • Destination of the data (e.g. Iceberg, Hive)

Tags that do not meet these criteria should be removed.
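Airflow's `DAG` constructor accepts a `tags` list, so criteria like these could be checked mechanically. The sketch below assumes a hypothetical `category:value` naming convention (for example `frequency:daily` or `destination:iceberg`); the convention and the category names are illustrative, not the scheme the team agreed on. It keeps tags in an allowed category, flags the rest for removal, and reports which mandatory categories are missing. The required-table category is treated as optional because not every DAG reads a specific source table.

```python
# Hypothetical tag categories mirroring the agreed criteria.
ALLOWED_CATEGORIES = {"frequency", "owner", "criticality", "requires", "destination"}
# Categories assumed to be mandatory on every DAG; "requires" is optional.
REQUIRED_CATEGORIES = {"frequency", "owner", "criticality", "destination"}


def check_dag_tags(tags):
    """Split DAG tags into kept/removed and list missing mandatory categories."""
    kept, removed = [], []
    present = set()
    for tag in tags:
        category, sep, value = tag.partition(":")
        if sep and value and category in ALLOWED_CATEGORIES:
            kept.append(tag)
            present.add(category)
        else:
            removed.append(tag)  # does not fit any agreed criterion
    return {
        "kept": kept,
        "removed": removed,
        "missing": sorted(REQUIRED_CATEGORIES - present),
    }


report = check_dag_tags([
    "frequency:daily",
    "owner:data-engineering",
    "criticality:high",
    "requires:webrequest",
    "destination:iceberg",
    "legacy",
    "oozie-migration",
])
```

A check like this could run in CI over every DAG's `tags` so that non-conforming tags are caught at review time rather than cleaned up later.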
Jun 30 2023, 9:53 AM · Data Engineering and Event Platform Team (Sprint 2), Data Products (Sprint 00), Data Pipelines (Sprint 14)

May 8 2023

JEbe-WMF moved T330202: mediawiki-history-reduced job migration from In Progress to In Review on the Data Pipelines (Sprint 12) board.
May 8 2023, 4:03 PM · Data Pipelines (Sprint 14)

Apr 13 2023

JEbe-WMF claimed T330202: mediawiki-history-reduced job migration.
Apr 13 2023, 1:21 PM · Data Pipelines (Sprint 14)

Mar 27 2023

JEbe-WMF moved T330199: Migrate virtual page view from Oozie to Airflow from Next Up to In Progress on the Data Pipelines (sprint 10) board.
Mar 27 2023, 4:23 PM · Data Pipelines (sprint 10)

Mar 16 2023

JEbe-WMF placed T330199: Migrate virtual page view from Oozie to Airflow up for grabs.
Mar 16 2023, 3:02 PM · Data Pipelines (sprint 10)
JEbe-WMF moved T305842: Migrate the referrer job from Next Up to In Progress on the Data Pipelines (sprint 10) board.
Mar 16 2023, 3:02 PM · Data Pipelines (sprint 10)
JEbe-WMF claimed T305842: Migrate the referrer job.
Mar 16 2023, 3:01 PM · Data Pipelines (sprint 10)
JEbe-WMF claimed T330199: Migrate virtual page view from Oozie to Airflow.
Mar 16 2023, 3:01 PM · Data Pipelines (sprint 10)
JEbe-WMF placed T330199: Migrate virtual page view from Oozie to Airflow up for grabs.
Mar 16 2023, 3:01 PM · Data Pipelines (sprint 10)
JEbe-WMF moved T330199: Migrate virtual page view from Oozie to Airflow from In Progress to Next Up on the Data Pipelines (sprint 10) board.
Mar 16 2023, 3:01 PM · Data Pipelines (sprint 10)
JEbe-WMF moved T330199: Migrate virtual page view from Oozie to Airflow from Next Up to In Progress on the Data Pipelines (sprint 10) board.
Mar 16 2023, 2:28 PM · Data Pipelines (sprint 10)
JEbe-WMF claimed T330199: Migrate virtual page view from Oozie to Airflow.
Mar 16 2023, 2:28 PM · Data Pipelines (sprint 10)

Mar 1 2023

JEbe-WMF moved T330206: [Airflow] Migrate mediacounts load Oozie job from Next Up to In Progress on the Data Pipelines (Sprint 11) board.
Mar 1 2023, 5:04 PM · Data Pipelines (sprint 10)
JEbe-WMF claimed T330206: [Airflow] Migrate mediacounts load Oozie job.
Mar 1 2023, 3:11 PM · Data Pipelines (sprint 10)

Feb 9 2023

JEbe-WMF added a comment to T327458: Document Traffic Datasets in Datahub.

Documented the following datasets and added Wikitech links where applicable:

  • unique devices
  • banner-activity (druid)
  • mobile_apps_session_metrics_by_os
Feb 9 2023, 3:19 PM · Data Pipelines (Sprint 11), Data-Catalog

Feb 8 2023

JEbe-WMF added a comment to T327458: Document Traffic Datasets in Datahub.

I have documented the following datasets and am awaiting feedback:

  • mediawiki_api_request
  • mobile apps session metrics
  • mobile apps uniques
Feb 8 2023, 5:02 PM · Data Pipelines (Sprint 11), Data-Catalog

Jan 23 2023

JEbe-WMF updated the task description for T327406: Requesting access to Data Engineering team resources for Jennifer Ebe.
Jan 23 2023, 10:23 AM · Data-Engineering, SRE, SRE-Access-Requests
JEbe-WMF updated the task description for T327406: Requesting access to Data Engineering team resources for Jennifer Ebe.
Jan 23 2023, 10:17 AM · Data-Engineering, SRE, SRE-Access-Requests

Jan 20 2023

JEbe-WMF added a comment to T327255: Grant Access to wmf and ops for Jennifer Ebe.

[ ... ]

I am not exactly certain. Because I am new, I am not sure what I need and what I don't, but my onboarding task states that I should be added to the ops group.

Do feel free to get my manager's approval. cc @odimitrijevic

I went ahead and added you to the wmf group for now (along with the WMF-NDA Phabricator group). If you do need ops membership, feel free to re-open this issue (or better yet, @odimitrijevic can do so with their approval).

Welcome aboard!

Jan 20 2023, 7:12 AM · SRE, LDAP-Access-Requests

Jan 19 2023

JEbe-WMF updated subscribers of T327406: Requesting access to Data Engineering team resources for Jennifer Ebe.

I will also need a Kerberos principal. cc @odimitrijevic @Snwachukwu

Jan 19 2023, 2:50 PM · Data-Engineering, SRE, SRE-Access-Requests
JEbe-WMF created T327406: Requesting access to Data Engineering team resources for Jennifer Ebe.
Jan 19 2023, 2:48 PM · Data-Engineering, SRE, SRE-Access-Requests
JEbe-WMF updated subscribers of T327255: Grant Access to wmf and ops for Jennifer Ebe.

Hi Jennifer,

The specific LDAP group that you want to be added to (optional): WMF and OPS

In order to add you to group ops, I would need your manager's approval. Are you sure you need ops? Judging from the other Data Engineers on your team, I think group wmf might suffice.

Jan 19 2023, 8:09 AM · SRE, LDAP-Access-Requests

Jan 18 2023

JEbe-WMF added a comment to T327255: Grant Access to wmf and ops for Jennifer Ebe.

Ticket to grant Jennifer Ebe LDAP access for onboarding.

Jan 18 2023, 11:43 AM · SRE, LDAP-Access-Requests
JEbe-WMF updated the task description for T327255: Grant Access to wmf and ops for Jennifer Ebe.
Jan 18 2023, 11:42 AM · SRE, LDAP-Access-Requests
JEbe-WMF updated the task description for T327255: Grant Access to wmf and ops for Jennifer Ebe.
Jan 18 2023, 11:40 AM · SRE, LDAP-Access-Requests
JEbe-WMF created T327255: Grant Access to wmf and ops for Jennifer Ebe.
Jan 18 2023, 11:40 AM · SRE, LDAP-Access-Requests