
nshahquinn-wmf (Neil Shah-Quinn)
senior data scientist, Product Analytics, Wikimedia Foundation

Projects (7)

User Details

User Since
Apr 16 2015, 4:17 PM (406 w, 2 d)
Availability
Available
IRC Nick
nshahquinn
LDAP User
Neil P. Quinn-WMF
MediaWiki User
Neil Shah-Quinn (WMF) [ Global Accounts ]

Recent Activity

Fri, Jan 27

nshahquinn-wmf renamed T327983: Wmfdata-Python's CSV loading cannot handle standard quoted CSV values from Wmfdata-Python's CSV loading cannot handle standard quoted CSV fields to Wmfdata-Python's CSV loading cannot handle standard quoted CSV values.
Fri, Jan 27, 12:26 AM · Wmfdata-Python, Data-Engineering, Product-Analytics
nshahquinn-wmf moved T323427: Plan success metrics for the incident reporting system from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Fri, Jan 27, 12:04 AM · Incident-Reporting-System, Product-Analytics (Kanban)

Thu, Jan 26

nshahquinn-wmf renamed T324135: Wmfdata-Python triggers a Pandas warning during mariadb.run and hive.run from Wmfdata-Python triggers a Pandas warning during mariadb.run to Wmfdata-Python triggers a Pandas warning during mariadb.run and hive.run.
Thu, Jan 26, 11:41 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf moved T327221: Update the wiki comparison tool (2022) from Doing to Next 2 weeks on the Product-Analytics (Kanban) board.

Cool! Moving to "next 2 weeks" since I will have to wait a week or so for the Jan 2023 mediawiki_history snapshot.

Thu, Jan 26, 11:40 PM · Product-Analytics (Kanban)
nshahquinn-wmf renamed T315024: Creating a Spark session causes a torrent of log spam from PySpark warning messages to Creating a Spark session causes a torrent of log spam.
Thu, Jan 26, 11:38 PM · Data Pipelines, Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf added a comment to T327221: Update the wiki comparison tool (2022).

I think the neatest way to deal with the pageview data loss is to wait a few more days until the start of February. Then we can do a Jan 2023 snapshot which will look at data going back to Feb 2022. The data loss ended on 27 Jan, so this will avoid it entirely without the need for any special casing.

Thu, Jan 26, 9:42 PM · Product-Analytics (Kanban)
nshahquinn-wmf moved T324995: Include EU Registered Country in the canonical country database from Doing to Blocked on the Product-Analytics (Kanban) board.
Thu, Jan 26, 9:38 PM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning
nshahquinn-wmf added a comment to T317171: Most common Wikipedia Preview clickthrough path on computers not instrumented.

@nshahquinn-wmf are you suggesting that we completely replace wppw1 and wppw1t with wppw2 and wppw2t everywhere or we just use wppw2 for that special case where we alter the url of the link on the page?

Thu, Jan 26, 7:51 PM · Inuka-Team (Kanban), Wikipedia-Preview
nshahquinn-wmf added a comment to T327221: Update the wiki comparison tool (2022).

Hmm, no, on reflection, I don't think it's worth making these improvements now. I'm planning to just update it as-is.

Thu, Jan 26, 7:11 PM · Product-Analytics (Kanban)
nshahquinn-wmf added a comment to T315024: Creating a Spark session causes a torrent of log spam.

@Mayakp.wiki oh, we already have a task for that: T324135. I think that's useful since the source and potential responses to the warnings are different.

Thu, Jan 26, 6:49 PM · Data Pipelines, Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf added a comment to T327221: Update the wiki comparison tool (2022).

Just updated the canonical wiki dataset, which added 4 new wikis.

Thu, Jan 26, 3:42 AM · Product-Analytics (Kanban)
nshahquinn-wmf moved T327221: Update the wiki comparison tool (2022) from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Thu, Jan 26, 3:42 AM · Product-Analytics (Kanban)
nshahquinn-wmf added a comment to T327221: Update the wiki comparison tool (2022).

Some improvements I could potentially make in this round:

  • Fix the content page count to be based on AQS or mediawiki_history so it's actually the value at the snapshot time rather than at query time.
  • Add external referrer pageviews proportion
  • Add Global South traffic percentage
  • Add monthly new content pages
  • Add whether the project uses language variants
Thu, Jan 26, 3:32 AM · Product-Analytics (Kanban)
nshahquinn-wmf updated the task description for T327983: Wmfdata-Python's CSV loading cannot handle standard quoted CSV values.
Thu, Jan 26, 3:09 AM · Wmfdata-Python, Data-Engineering, Product-Analytics
nshahquinn-wmf created T327983: Wmfdata-Python's CSV loading cannot handle standard quoted CSV values.
Thu, Jan 26, 3:07 AM · Wmfdata-Python, Data-Engineering, Product-Analytics
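
The task title above does not say how the loading fails, so the following is only an illustration of what "standard quoted CSV values" usually means: fields wrapped in double quotes so they can contain commas, with embedded quotes doubled. It uses Python's standard-library csv module, not Wmfdata-Python's loader, and the sample data is made up.

```python
import csv
import io

# Made-up sample: one field contains a comma, another contains doubled quotes.
raw = 'id,comment\n1,"Hello, world"\n2,"She said ""hi"""\n'

# csv.reader applies the conventional quoting rules by default.
rows = list(csv.reader(io.StringIO(raw)))

for row in rows:
    print(row)
```

A loader that splits on every comma, or that does not unescape doubled quotes, would return different values for the second column.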

Wed, Jan 25

nshahquinn-wmf added a comment to T317171: Most common Wikipedia Preview clickthrough path on computers not instrumented.

@SBisson I'm confident it doesn't need approval. It's just a minor tweak to our instrumentation and doesn't change the scope of our data collection.

Wed, Jan 25, 10:57 PM · Inuka-Team (Kanban), Wikipedia-Preview
nshahquinn-wmf added a comment to T327221: Update the wiki comparison tool (2022).

Some improvements I could potentially make in this round:

  • Fix the content page count to be based on AQS or mediawiki_history so it's actually the value at the snapshot time rather than at query time.
  • Add external referrer pageviews proportion
  • Add Global South traffic percentage
  • Add whether the project uses language variants
Wed, Jan 25, 12:51 AM · Product-Analytics (Kanban)

Tue, Jan 24

nshahquinn-wmf updated the task description for T317171: Most common Wikipedia Preview clickthrough path on computers not instrumented.
Tue, Jan 24, 11:04 PM · Inuka-Team (Kanban), Wikipedia-Preview
nshahquinn-wmf updated the task description for T317171: Most common Wikipedia Preview clickthrough path on computers not instrumented.
Tue, Jan 24, 11:02 PM · Inuka-Team (Kanban), Wikipedia-Preview
nshahquinn-wmf closed T324726: Prepare metrics and product learnings presentation for Inuka offsite as Resolved.
Tue, Jan 24, 9:43 PM · Product-Analytics (Kanban), Inuka-Team
nshahquinn-wmf placed T324995: Include EU Registered Country in the canonical country database up for grabs.

The PR to review is here: https://github.com/wikimedia-research/canonical-data/pull/3.

Tue, Jan 24, 12:08 AM · Product-Analytics (Kanban), Data Pipelines (Sprint 07), Data-Engineering-Planning

Wed, Jan 11

nshahquinn-wmf updated the task description for T316972: Expand visibility into Wikipedia Preview.
Wed, Jan 11, 3:06 AM · Product-Analytics, Wikipedia-Preview, Inuka-Team

Dec 23 2022

nshahquinn-wmf moved T316972: Expand visibility into Wikipedia Preview from Kanban to Upcoming Quarter on the Product-Analytics board.
Dec 23 2022, 1:18 AM · Product-Analytics, Wikipedia-Preview, Inuka-Team
nshahquinn-wmf closed T316971: Review literature and collect open questions on community conflict as Resolved.

This review led me to focus on the idea of detecting interpersonal conflict on-wiki by looking at signals such as mutual reverts. If successful, that could lead to applications like:

  • quantifying the incidence of on-wiki conflict for use in high-level metrics and comparison with the number of conflict reports
  • early detection and automated alerting of user conflict.
Dec 23 2022, 1:18 AM · Trust and Safety Tools Team Backlog, Product-Analytics (Kanban)
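
The "mutual reverts" signal mentioned in the comment above could be sketched roughly as follows. This is a minimal illustration, not the method used in the review: the input format (a list of reverter/reverted pairs), the function name mutual_revert_pairs, and the min_each threshold are all assumptions made for the example, and the user names are invented.

```python
from collections import Counter


def mutual_revert_pairs(reverts, min_each=1):
    """Return sorted user pairs in which each user reverted the other.

    `reverts` is an iterable of (reverter, reverted) tuples (hypothetical
    input format). A pair qualifies only if each direction occurs at least
    `min_each` times.
    """
    # Count directed reverts, ignoring self-reverts.
    counts = Counter((a, b) for a, b in reverts if a != b)
    pairs = set()
    for (a, b), n in counts.items():
        if n >= min_each and counts.get((b, a), 0) >= min_each:
            pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Invented sample data.
sample = [
    ("Alice", "Bob"),
    ("Bob", "Alice"),
    ("Carol", "Alice"),
    ("Alice", "Bob"),
]
print(mutual_revert_pairs(sample))
```

Raising min_each makes the signal stricter; with the sample above, a threshold of 2 yields no pairs because one direction occurs only once.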
nshahquinn-wmf placed T272220: Remove "master" terminology from wmfdata-python up for grabs.

This is stalled and it's not clear when we'll be able to finish it, let alone who will do it at that point.

Dec 23 2022, 1:08 AM · Product-Analytics, Data-Engineering, Wmfdata-Python
nshahquinn-wmf placed T292479: wmfdata.mariadb relies on analytics-mysql being available up for grabs.

This shouldn't be assigned to me; I've never had a concrete plan to work on it.

Dec 23 2022, 1:08 AM · Data-Engineering, Product-Analytics, Analytics-Kanban, Wmfdata-Python
nshahquinn-wmf closed T316727: Create Wikistories ETL job, a subtask of T314594: Create initial Wikistories dashboard, as Declined.
Dec 23 2022, 1:02 AM · Product-Analytics, Inuka-Team, Wikistories
nshahquinn-wmf closed T316727: Create Wikistories ETL job as Declined.
Dec 23 2022, 1:02 AM · Product-Analytics, Inuka-Team, Wikistories
nshahquinn-wmf added a comment to T316727: Create Wikistories ETL job.

Apart from being blocked on T316049, Wikistories is still in its infancy, so we should avoid making a major investment like an ETL job. For now, I am manually running and sharing metrics using the Wikistories dashboard.

Dec 23 2022, 1:01 AM · Product-Analytics, Inuka-Team, Wikistories
nshahquinn-wmf updated the task description for T316972: Expand visibility into Wikipedia Preview.
Dec 23 2022, 12:40 AM · Product-Analytics, Wikipedia-Preview, Inuka-Team
nshahquinn-wmf moved T324726: Prepare metrics and product learnings presentation for Inuka offsite from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Dec 23 2022, 12:38 AM · Product-Analytics (Kanban), Inuka-Team
nshahquinn-wmf moved T316971: Review literature and collect open questions on community conflict from Doing to Next 2 weeks on the Product-Analytics (Kanban) board.
Dec 23 2022, 12:38 AM · Trust and Safety Tools Team Backlog, Product-Analytics (Kanban)
nshahquinn-wmf moved T323427: Plan success metrics for the incident reporting system from Doing to Next 2 weeks on the Product-Analytics (Kanban) board.
Dec 23 2022, 12:38 AM · Incident-Reporting-System, Product-Analytics (Kanban)
nshahquinn-wmf closed T314594: Create initial Wikistories dashboard as Resolved.

I've consolidated my ad-hoc reporting into a spreadsheet dashboard [WMF only]. I think that's enough to count as an initial dashboard!

Dec 23 2022, 12:36 AM · Product-Analytics, Inuka-Team, Wikistories
nshahquinn-wmf awarded T309769: Expanding External Referrer Tracking a 100 token.
Dec 23 2022, 12:15 AM · Data Pipelines (Sprint 07), Patch-For-Review, Metrics-Platform-Planning, Foundational Technology Requests
nshahquinn-wmf added a project to T325838: Publish Edit Check measurement plan: Product-Analytics.
Dec 23 2022, 12:14 AM · EditCheck, Product-Analytics (Kanban), Editing-team (Kanban Board), VisualEditor

Dec 22 2022

nshahquinn-wmf added a comment to T323427: Plan success metrics for the incident reporting system.

Drafting ongoing in this Google doc [WMF only].

Dec 22 2022, 10:57 PM · Incident-Reporting-System, Product-Analytics (Kanban)
nshahquinn-wmf updated the task description for T316972: Expand visibility into Wikipedia Preview.
Dec 22 2022, 9:56 PM · Product-Analytics, Wikipedia-Preview, Inuka-Team
nshahquinn-wmf updated the task description for T316972: Expand visibility into Wikipedia Preview.
Dec 22 2022, 9:43 PM · Product-Analytics, Wikipedia-Preview, Inuka-Team

Dec 21 2022

nshahquinn-wmf added a comment to T318850: Provide recommendations for Regional data.

@ntsako, @JAnstee_WMF: here's an offhand thought, in case it's not too late.

Dec 21 2022, 12:46 AM · Product-Analytics

Dec 20 2022

nshahquinn-wmf closed T324376: Detecting in-app browser traffic as Resolved.

I've filed T325611: Add TikTok's in-app browser to ua-parser library and consolidated a bunch of the information we have about referrers (the large majority of it from @Isaac! 😄) onto Research:Referrer on Meta-Wiki.

Dec 20 2022, 1:43 AM · Product-Analytics (Kanban)
nshahquinn-wmf closed T324376: Detecting in-app browser traffic, a subtask of T324230: Update of TikTok referrals, as Resolved.
Dec 20 2022, 1:43 AM · Product-Analytics (Kanban)
nshahquinn-wmf created T325611: Add TikTok's in-app browser to ua-parser library.
Dec 20 2022, 1:14 AM · Data Pipelines, Data-Engineering-Planning, Product-Analytics

Dec 15 2022

nshahquinn-wmf closed T324230: Update of TikTok referrals as Resolved.

The results are available in this Google doc (WMF only). Some of the results are publicly available in my research notebook on GitHub.

Dec 15 2022, 3:06 AM · Product-Analytics (Kanban)

Dec 8 2022

nshahquinn-wmf moved T324376: Detecting in-app browser traffic from Next 2 weeks to Needs Investigation on the Product-Analytics (Kanban) board.
Dec 8 2022, 2:42 AM · Product-Analytics (Kanban)
nshahquinn-wmf edited projects for T324726: Prepare metrics and product learnings presentation for Inuka offsite, added: Product-Analytics (Kanban); removed Product-Analytics.
Dec 8 2022, 2:42 AM · Product-Analytics (Kanban), Inuka-Team
nshahquinn-wmf moved T324726: Prepare metrics and product learnings presentation for Inuka offsite from Backlog to Analyst on the Inuka-Team board.
Dec 8 2022, 2:42 AM · Product-Analytics (Kanban), Inuka-Team
nshahquinn-wmf triaged T324726: Prepare metrics and product learnings presentation for Inuka offsite as Medium priority.
Dec 8 2022, 2:42 AM · Product-Analytics (Kanban), Inuka-Team
nshahquinn-wmf moved T323427: Plan success metrics for the incident reporting system from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Dec 8 2022, 2:36 AM · Incident-Reporting-System, Product-Analytics (Kanban)
nshahquinn-wmf moved T324230: Update of TikTok referrals from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Dec 8 2022, 2:36 AM · Product-Analytics (Kanban)

Dec 5 2022

nshahquinn-wmf added a parent task for T304544: Move Wmfdata-Python from Github to Gitlab: T305039: Migrate active Wikimedia repositories in GitHub to GitLab.
Dec 5 2022, 7:40 PM · Data-Engineering-Kanban, GitLab (Project Migration), Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a subtask for T305039: Migrate active Wikimedia repositories in GitHub to GitLab: T304544: Move Wmfdata-Python from Github to Gitlab.
Dec 5 2022, 7:40 PM · User-AKlapper, Epic, GitLab (Project Migration), Wikimedia-GitHub

Dec 1 2022

nshahquinn-wmf triaged T324230: Update of TikTok referrals as High priority.
Dec 1 2022, 9:57 PM · Product-Analytics (Kanban)
nshahquinn-wmf updated the task description for T324135: Wmfdata-Python triggers a Pandas warning during mariadb.run and hive.run.
Dec 1 2022, 6:12 AM · Data-Engineering, Product-Analytics, Wmfdata-Python

Nov 30 2022

nshahquinn-wmf renamed T322533: MVP for Notebook Scheduler from MVP for Notebook Schedular to MVP for Notebook Scheduler.
Nov 30 2022, 8:17 PM · Data Pipelines
nshahquinn-wmf renamed T322532: Notebook Scheduler for Product Analytics from Notebook Schedular for Product Analytics to Notebook Scheduler for Product Analytics.
Nov 30 2022, 8:17 PM · Epic, Data Pipelines
nshahquinn-wmf added a comment to T324126: Investigate whether admin privileges on Jupyter are correct.

Adding steps for a non-admin user to verify that they do not see the 'Admin' tab:

  1. Connect to JupyterHub in the usual way, say: ssh -N stat1007.eqiad.wmnet -L 8880:127.0.0.1:8880
  2. On a browser, go to http://localhost:8880/hub/home
  3. Confirm whether the 'Admin' tab is there or not
Nov 30 2022, 7:24 PM · Data Pipelines (Sprint 05-06)
nshahquinn-wmf closed T248739: Allow query results to be cached in the filesystem or HDFS as Declined.

I still kind of like this idea, but it would be a significant amount of work for a pretty marginal benefit.

Nov 30 2022, 6:08 PM · Data-Engineering, Wmfdata-Python, Product-Analytics
nshahquinn-wmf closed T301734: conda-create-stacked breaks wmfdata.presto as Declined.

The simpler base environment is definitely real now, and in any case I've created a lot of new stacked environments in the past several months without encountering this issue.

Nov 30 2022, 6:02 PM · Wmfdata-Python, Data-Engineering-Kanban, Data-Engineering, Product-Analytics
nshahquinn-wmf closed T294668: Create a script that installs Wmfdata-Python in development mode as Declined.

In a conda-analytics environment, pip install -e . works just fine, so there's no need for an install script.

Nov 30 2022, 5:58 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf created T324135: Wmfdata-Python triggers a Pandas warning during mariadb.run and hive.run.
Nov 30 2022, 5:48 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a comment to T321960: Presto returns incorrect data for an added field.

Here is the same query after the configuration change has been deployed.

presto> SELECT contribution_attempt_id, COUNT(*) AS frequency FROM event.mediawiki_wikistories_contribution_event WHERE year = 2022 AND ( month < 10 OR month = 10 AND day < 17 ) GROUP BY contribution_attempt_id; 
 contribution_attempt_id | frequency 
-------------------------+-----------
 NULL                    |      2205 
(1 row)
Nov 30 2022, 5:18 PM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
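
The partition filter in the query above relies on AND binding more tightly than OR, so, combined with year = 2022, it keeps partitions dated before 17 October 2022. The sketch below mirrors that predicate in Python (where `and` likewise binds more tightly than `or`) and checks it against a plain date comparison. The function name before_cutoff is made up for the example.

```python
from datetime import date, timedelta


def before_cutoff(month, day):
    # Mirrors the SQL: month < 10 OR month = 10 AND day < 17
    return month < 10 or month == 10 and day < 17


# Compare the predicate with a direct date comparison for every day of 2022.
cutoff = date(2022, 10, 17)
current = date(2022, 1, 1)
mismatches = []
while current.year == 2022:
    if before_cutoff(current.month, current.day) != (current < cutoff):
        mismatches.append(current)
    current += timedelta(days=1)

print(len(mismatches))
```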

Nov 29 2022

nshahquinn-wmf added a comment to T292479: wmfdata.mariadb relies on analytics-mysql being available.

Updated the description to note:

In addition, analytics-mysql is not available on an-test-client1001, which complicates the process of testing Wmfdata.

Nov 29 2022, 8:37 PM · Data-Engineering, Product-Analytics, Analytics-Kanban, Wmfdata-Python
nshahquinn-wmf updated the task description for T292479: wmfdata.mariadb relies on analytics-mysql being available.
Nov 29 2022, 8:36 PM · Data-Engineering, Product-Analytics, Analytics-Kanban, Wmfdata-Python
nshahquinn-wmf triaged T324053: Remove Matplotlib as a Wmfdata-Python dependency as Low priority.

For the most part, the dependency doesn't matter.

Nov 29 2022, 8:01 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf created T324053: Remove Matplotlib as a Wmfdata-Python dependency.
Nov 29 2022, 7:52 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf awarded T324025: Improve docs around JupyterLab and conda-analytics a Doubloon token.
Nov 29 2022, 3:36 PM · Data Pipelines
nshahquinn-wmf moved T316970: Neil gains familiarity with R for data science from Triage to Upcoming Quarter on the Product-Analytics board.
Nov 29 2022, 6:35 AM · Product-Analytics
nshahquinn-wmf edited projects for T316970: Neil gains familiarity with R for data science, added: Product-Analytics; removed Product-Analytics (Kanban).
Nov 29 2022, 6:34 AM · Product-Analytics

Nov 28 2022

nshahquinn-wmf closed T245713: wmfdata cannot recover from a crashed Spark session, a subtask of T245891: Analysts cannot reliably use wmfdata to run SQL queries against Hive databases, as Resolved.
Nov 28 2022, 7:01 PM · Product-Analytics, Data-Engineering, Analytics-Radar, Wmfdata-Python, Epic
nshahquinn-wmf closed T245713: wmfdata cannot recover from a crashed Spark session as Resolved.

Thanks to T273210, Wmfdata now has the ability to recreate Spark sessions in the same notebook, which should give it the ability to easily recover from a crashed Spark session.

Nov 28 2022, 7:01 PM · Data-Engineering, Analytics-Radar, Product-Analytics, Wmfdata-Python

Nov 23 2022

nshahquinn-wmf added a comment to T321088: Add support for jupyterhub on conda-analytics.

Cool, thank you @xcollazo! 🎉

Nov 23 2022, 2:38 AM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics
nshahquinn-wmf reassigned T300442: Release Wmfdata-Python 2.0 from nshahquinn-wmf to xcollazo.
Nov 23 2022, 2:38 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated subscribers of T300442: Release Wmfdata-Python 2.0.

Okay, I've merged the documentation improvements and version 2.0.0 changes to main and sent a pre-announcement to several Slack channels and analytics-announce@lists.wikimedia.org.

Nov 23 2022, 2:33 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T323426: Update Wmfdata-Python quickstart notebook, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Nov 23 2022, 1:48 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T323426: Update Wmfdata-Python quickstart notebook as Resolved.

Merged in PR40.

Nov 23 2022, 1:48 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated the task description for T298178: Create end-user documentation for Wmfdata-Python.
Nov 23 2022, 1:47 AM · Data-Engineering, Documentation, Product-Analytics, Wmfdata-Python

Nov 21 2022

nshahquinn-wmf added a comment to T321088: Add support for jupyterhub on conda-analytics.

I could add the following for you on the global condarc:

# With strict channel priority, packages in lower priority channels are not considered
# if a package with the same name appears in a higher priority channel.
channel_priority: strict

channels:
  - conda-forge
  - defaults
Nov 21 2022, 9:13 PM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics
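
As a rough illustration of the rule described in the condarc comments above, the toy model below picks the channel that would supply a package under strict priority. It is not how Conda is implemented; the function name, the index structure, and the package names are all invented for the example.

```python
def channel_for(package, channels, index):
    """Return the highest-priority channel that offers `package`, or None.

    Under strict priority, only this channel is considered for the package;
    copies in lower-priority channels are ignored.
    """
    for channel in channels:  # channels are listed from highest to lowest priority
        if package in index.get(channel, set()):
            return channel
    return None


# Same ordering as the proposed configuration.
channels = ["conda-forge", "defaults"]

# Invented index contents.
index = {
    "conda-forge": {"pkg-a", "pkg-b"},
    "defaults": {"pkg-b", "pkg-c"},
}

for name in ["pkg-a", "pkg-b", "pkg-c", "pkg-d"]:
    print(name, channel_for(name, channels, index))
```

In this model, pkg-b is taken from conda-forge even though defaults also carries it, while pkg-c still resolves to defaults because no higher-priority channel has a package with that name.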
nshahquinn-wmf added a comment to T321088: Add support for jupyterhub on conda-analytics.

@xcollazo a month ago, I suggested changing the default source of Conda packages in conda-analytics. Let me re-up this here so you can consider doing this before the migration. For context, I think this would be a minor improvement, so it's fine to ignore if you think it's not worth the effort.

Nov 21 2022, 6:45 PM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics

Nov 19 2022

nshahquinn-wmf triaged T323427: Plan success metrics for the incident reporting system as Medium priority.
Nov 19 2022, 3:00 AM · Incident-Reporting-System, Product-Analytics (Kanban)
nshahquinn-wmf edited projects for T323426: Update Wmfdata-Python quickstart notebook, added: Product-Analytics (Kanban); removed Product-Analytics.
Nov 19 2022, 2:52 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf triaged T323426: Update Wmfdata-Python quickstart notebook as Medium priority.
Nov 19 2022, 2:51 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf created T323426: Update Wmfdata-Python quickstart notebook.
Nov 19 2022, 2:51 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T298179: Remove Spark session timeout functionality from Wmfdata-Python as Resolved.

The pull request has been merged!

Nov 19 2022, 2:28 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T298179: Remove Spark session timeout functionality from Wmfdata-Python, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Nov 19 2022, 2:28 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior as Resolved.

The pull request has been merged!

Nov 19 2022, 2:27 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Nov 19 2022, 2:27 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T318587: Upgrade WMFData Python Package to use Spark3, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Nov 19 2022, 2:27 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T318587: Upgrade WMFData Python Package to use Spark3 as Resolved.
Nov 19 2022, 2:27 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python

Nov 17 2022

nshahquinn-wmf created P40122 (An Untitled Masterwork).
Nov 17 2022, 8:58 PM

Nov 16 2022

nshahquinn-wmf created P39868 CondaPackException on an-test-client1001.
Nov 16 2022, 1:16 AM

Nov 15 2022

nshahquinn-wmf added a comment to T300442: Release Wmfdata-Python 2.0.

The removals have been merged. This will stay open until we actually release version 2.0, likely late this week or early next.

Nov 15 2022, 2:52 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T293722: wmfdata.spark module should provide easy access to pyspark as Resolved.

I've verified that import pyspark just works in the new conda-analytics environment. Coincidentally, my changes for T273210 have ended up making PySpark available as wmfdata.spark.pyspark. So this is doubly solved.

Nov 15 2022, 2:48 AM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf closed T305067: Update anaconda-wmf's wmfdata-python to 1.4.0 as Declined.

Soon, we are going to be moving from anaconda-wmf to conda-analytics as the base for new Conda environments (T321088). That will contain Wmfdata-Python 2.0, so we can skip directly to that.

Nov 15 2022, 2:44 AM · Product-Analytics, Data-Engineering, Wmfdata-Python
nshahquinn-wmf moved T318587: Upgrade WMFData Python Package to use Spark3 from In Progress to Done on the Data Pipelines (Sprint 04) board.

@xcollazo's code has been merged, so I think this is done. Work continues on T300442: Release Wmfdata-Python 2.0.

Nov 15 2022, 2:41 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python
nshahquinn-wmf removed a subtask for T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments: T318587: Upgrade WMFData Python Package to use Spark3.
Nov 15 2022, 2:39 AM · Analytics-Jupyter, Product-Analytics, Data-Engineering
nshahquinn-wmf removed a parent task for T318587: Upgrade WMFData Python Package to use Spark3: T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments.
Nov 15 2022, 2:39 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a parent task for T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior: T300442: Release Wmfdata-Python 2.0.
Nov 15 2022, 2:39 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf added a parent task for T298179: Remove Spark session timeout functionality from Wmfdata-Python: T300442: Release Wmfdata-Python 2.0.
Nov 15 2022, 2:39 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf added a parent task for T318587: Upgrade WMFData Python Package to use Spark3: T300442: Release Wmfdata-Python 2.0.
Nov 15 2022, 2:39 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python