
nshahquinn-wmf (Neil Shah-Quinn)
senior data scientist, Product Analytics, Wikimedia Foundation

Projects (7)

User Details

User Since
Apr 16 2015, 4:17 PM (398 w, 2 d)
Availability
Available
IRC Nick
nshahquinn
LDAP User
Neil P. Quinn-WMF
MediaWiki User
Neil Shah-Quinn (WMF)

Recent Activity

Thu, Dec 1

nshahquinn-wmf triaged T324230: Update of TikTok referrals as High priority.
Thu, Dec 1, 9:57 PM · Product-Analytics (Kanban)
nshahquinn-wmf updated the task description for T324135: Wmfdata-Python triggers a Pandas warning during mariadb.run.
Thu, Dec 1, 6:12 AM · Data-Engineering-Planning, Product-Analytics, Wmfdata-Python

Wed, Nov 30

nshahquinn-wmf renamed T322533: MVP for Notebook Scheduler from MVP for Notebook Schedular to MVP for Notebook Scheduler.
Wed, Nov 30, 8:17 PM · Data Pipelines
nshahquinn-wmf renamed T322532: Notebook Scheduler for Product Analytics from Notebook Schedular for Product Analytics to Notebook Scheduler for Product Analytics.
Wed, Nov 30, 8:17 PM · Epic, Data Pipelines
nshahquinn-wmf added a comment to T324126: Investigate whether admin privileges on Jupyter are correct.

Adding steps for a non-admin user to verify that they do not see the 'Admin' tab:

  1. Connect to JupyterHub in the usual way, say: ssh -N stat1007.eqiad.wmnet -L 8880:127.0.0.1:8880
  2. On a browser, go to http://localhost:8880/hub/home
  3. Confirm whether the 'Admin' tab is there or not
Wed, Nov 30, 7:24 PM · Data Pipelines (Sprint 05-06)
nshahquinn-wmf closed T248739: Allow query results to be cached in the filesystem or HDFS as Declined.

I still kind of like this idea, but it would be a significant amount of work for a pretty marginal benefit. It's a reasonable expectation that an analyst who makes a mistake and overwrites their source data with an incorrect transformation just has to rerun the query, even if it takes a while.

Wed, Nov 30, 6:08 PM · Data-Engineering, Wmfdata-Python, Product-Analytics
nshahquinn-wmf closed T301734: conda-create-stacked breaks wmfdata.presto as Declined.

The simpler base environment is definitely real now, and in any case I've created a lot of new stacked environments in the past several months without encountering this issue.

Wed, Nov 30, 6:02 PM · Wmfdata-Python, Data-Engineering-Kanban, Data-Engineering, Product-Analytics
nshahquinn-wmf closed T294668: Create a script that installs Wmfdata-Python in development mode as Declined.

In a conda-analytics environment, pip install -e . works just fine, so there's no need for an install script.

Wed, Nov 30, 5:58 PM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf created T324135: Wmfdata-Python triggers a Pandas warning during mariadb.run.
Wed, Nov 30, 5:48 PM · Data-Engineering-Planning, Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a comment to T321960: Presto returns incorrect data for an added field.
Here is the same query after the configuration change has been deployed.

presto> SELECT contribution_attempt_id, COUNT(*) AS frequency FROM event.mediawiki_wikistories_contribution_event WHERE year = 2022 AND ( month < 10 OR month = 10 AND day < 17 ) GROUP BY contribution_attempt_id;
 contribution_attempt_id | frequency
-------------------------+-----------
 NULL                    |      2205
(1 row)

Wed, Nov 30, 5:18 PM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
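
The partition filter in the query above relies on AND binding more tightly than OR, so `month < 10 OR month = 10 AND day < 17` selects every 2022 date before 17 October. The following is only an illustrative sketch in Python (whose `and`/`or` precedence works the same way); the function names are made up for this example and do not come from the task.

```python
from datetime import date, timedelta


def matches_filter(year: int, month: int, day: int) -> bool:
    """Mirror the WHERE clause; `and` binds more tightly than `or`, as in SQL."""
    return year == 2022 and (month < 10 or month == 10 and day < 17)


def before_cutoff(d: date) -> bool:
    """The same condition expressed as a plain date range."""
    return date(2022, 1, 1) <= d < date(2022, 10, 17)


# Compare the two formulations for every day from 2021-01-01 to 2023-12-31.
start = date(2021, 1, 1)
all_days = [start + timedelta(days=n) for n in range(3 * 365)]
mismatches = [
    d for d in all_days
    if matches_filter(d.year, d.month, d.day) != before_cutoff(d)
]
print(mismatches)  # prints []
```

Writing the condition as a date range is easier to read, but the year/month/day form keeps the filter on the table's partition columns.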

Tue, Nov 29

nshahquinn-wmf added a comment to T292479: wmfdata.mariadb relies on analytics-mysql being available.

Updated the description to note:

In addition, analytics-mysql is not available on an-test-client1001, which complicates the process of testing Wmfdata.

Tue, Nov 29, 8:37 PM · Data-Engineering, Product-Analytics, Analytics-Kanban, SRE, Wmfdata-Python
nshahquinn-wmf updated the task description for T292479: wmfdata.mariadb relies on analytics-mysql being available.
Tue, Nov 29, 8:36 PM · Data-Engineering, Product-Analytics, Analytics-Kanban, SRE, Wmfdata-Python
nshahquinn-wmf triaged T324053: Remove Matplotlib as a Wmfdata-Python dependency as Low priority.

For the most part, the dependency doesn't matter.

Tue, Nov 29, 8:01 PM · Data-Engineering-Planning, Product-Analytics, Wmfdata-Python
nshahquinn-wmf created T324053: Remove Matplotlib as a Wmfdata-Python dependency.
Tue, Nov 29, 7:52 PM · Data-Engineering-Planning, Product-Analytics, Wmfdata-Python
nshahquinn-wmf awarded T324025: Improve docs around JupyterLab and conda-analytics a Doubloon token.
Tue, Nov 29, 3:36 PM · Data Pipelines
nshahquinn-wmf moved T316970: Neil gains familiarity with R for data science from Triage to Upcoming Quarter on the Product-Analytics board.
Tue, Nov 29, 6:35 AM · Product-Analytics
nshahquinn-wmf edited projects for T316970: Neil gains familiarity with R for data science, added: Product-Analytics; removed Product-Analytics (Kanban).
Tue, Nov 29, 6:34 AM · Product-Analytics

Mon, Nov 28

nshahquinn-wmf closed T245713: wmfdata cannot recover from a crashed Spark session, a subtask of T245891: Analysts cannot reliably use wmfdata to run SQL queries against Hive databases, as Resolved.
Mon, Nov 28, 7:01 PM · Product-Analytics, Data-Engineering, Analytics-Radar, Wmfdata-Python, Epic
nshahquinn-wmf closed T245713: wmfdata cannot recover from a crashed Spark session as Resolved.

Thanks to T273210, Wmfdata now has the ability to recreate Spark sessions in the same notebook, which should give it the ability to easily recover from a crashed Spark session.

Mon, Nov 28, 7:01 PM · Data-Engineering, Analytics-Radar, Product-Analytics, Wmfdata-Python

Wed, Nov 23

nshahquinn-wmf added a comment to T321088: Add support for jupyterlab on conda-analytics.

Cool, thank you @xcollazo! 🎉

Wed, Nov 23, 2:38 AM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics
nshahquinn-wmf reassigned T300442: Release Wmfdata-Python 2.0 from nshahquinn-wmf to xcollazo.
Wed, Nov 23, 2:38 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated subscribers of T300442: Release Wmfdata-Python 2.0.

Okay, I've merged the documentation improvements and version 2.0.0 changes to main and sent a pre-announcement to several Slack channels and analytics-announce@lists.wikimedia.org.

Wed, Nov 23, 2:33 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T323426: Update Wmfdata-Python quickstart notebook, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Wed, Nov 23, 1:48 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T323426: Update Wmfdata-Python quickstart notebook as Resolved.

Merged in PR40.

Wed, Nov 23, 1:48 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated the task description for T298178: Create end-user documentation for Wmfdata-Python.
Wed, Nov 23, 1:47 AM · Data-Engineering, Documentation, Product-Analytics, Wmfdata-Python

Mon, Nov 21

nshahquinn-wmf added a comment to T321088: Add support for jupyterlab on conda-analytics.

I could add the following for you on the global condarc:

# With strict channel priority, packages in lower priority channels are not considered
# if a package with the same name appears in a higher priority channel.
channel_priority: strict

channels:
  - conda-forge
  - defaults
Mon, Nov 21, 9:13 PM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics
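
As a rough illustration of the rule described in the condarc comment above, the sketch below models which builds are even considered under strict channel priority. It is a toy model, not conda's solver: the function name, the index layout, the package name `onlyhere`, and the version strings are all invented for this example.

```python
def candidate_builds(package, channels, index, strict=True):
    """Return the (channel, version) pairs that may be considered for `package`.

    `channels` is ordered from highest to lowest priority.
    `index` maps channel name -> package name -> list of versions (toy data).
    """
    found = []
    for channel in channels:
        versions = index.get(channel, {}).get(package, [])
        if versions:
            found.extend((channel, v) for v in versions)
            if strict:
                # Strict priority: once a channel carries the package,
                # lower-priority channels are not considered at all.
                break
    return found


channels = ["conda-forge", "defaults"]
index = {
    "conda-forge": {"numpy": ["1.0"]},
    "defaults": {"numpy": ["2.0"], "onlyhere": ["0.1"]},
}

print(candidate_builds("numpy", channels, index))
print(candidate_builds("onlyhere", channels, index))
print(candidate_builds("numpy", channels, index, strict=False))
```

With the toy data, the strict lookup for `numpy` only returns the conda-forge build even though defaults lists a higher version, while a package that exists only in defaults is still found there.
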
nshahquinn-wmf added a comment to T321088: Add support for jupyterlab on conda-analytics.

@xcollazo a month ago, I suggested changing the default source of Conda packages in conda-analytics. Let me re-up this here so you can consider doing this before the migration. For context, I think this would be a minor improvement, so it's fine to ignore if you think it's not worth the effort.

Mon, Nov 21, 6:45 PM · Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Analytics-Jupyter, Product-Analytics

Sat, Nov 19

nshahquinn-wmf triaged T323427: Plan success metrics for the incident reporting system as Medium priority.
Sat, Nov 19, 3:00 AM · Incident-Reporting-System, Product-Analytics (Kanban)
nshahquinn-wmf edited projects for T323426: Update Wmfdata-Python quickstart notebook, added: Product-Analytics (Kanban); removed Product-Analytics.
Sat, Nov 19, 2:52 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf triaged T323426: Update Wmfdata-Python quickstart notebook as Medium priority.
Sat, Nov 19, 2:51 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf created T323426: Update Wmfdata-Python quickstart notebook.
Sat, Nov 19, 2:51 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T298179: Remove Spark session timeout functionality from Wmfdata-Python as Resolved.

The pull request has been merged!

Sat, Nov 19, 2:28 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T298179: Remove Spark session timeout functionality from Wmfdata-Python, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Sat, Nov 19, 2:28 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior as Resolved.

The pull request has been merged!

Sat, Nov 19, 2:27 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Sat, Nov 19, 2:27 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T318587: Upgrade WMFData Python Package to use Spark3, a subtask of T300442: Release Wmfdata-Python 2.0, as Resolved.
Sat, Nov 19, 2:27 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T318587: Upgrade WMFData Python Package to use Spark3 as Resolved.
Sat, Nov 19, 2:27 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python

Thu, Nov 17

nshahquinn-wmf created P40122 (An Untitled Masterwork).
Thu, Nov 17, 8:58 PM

Wed, Nov 16

nshahquinn-wmf created P39868 CondaPackException on an-test-client1001.
Wed, Nov 16, 1:16 AM

Tue, Nov 15

nshahquinn-wmf added a comment to T300442: Release Wmfdata-Python 2.0.

The removals have been merged. This will stay open until we actually release version 2.0, likely late this week or early next.

Tue, Nov 15, 2:52 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T293722: wmfdata.spark module should provide easy access to pyspark as Resolved.

I've verified that import pyspark just works in the new conda-analytics environment. Coincidentally, T273210 will end up making PySpark available as wmfdata.spark.pyspark. So this is doubly solved.

Tue, Nov 15, 2:48 AM · Data-Engineering, Product-Analytics, Wmfdata-Python
nshahquinn-wmf closed T305067: Update anaconda-wmf's wmfdata-python to 1.4.0 as Declined.

Soon, we are going to be moving from anaconda-wmf to conda-analytics as the base for new Conda environments (T321088). That will contain Wmfdata-Python 2.0, so we can skip directly to that.

Tue, Nov 15, 2:44 AM · Product-Analytics, Data-Engineering, Wmfdata-Python
nshahquinn-wmf moved T318587: Upgrade WMFData Python Package to use Spark3 from In Progress to Done on the Data Pipelines (Sprint 04) board.

@xcollazo's code has been merged, so I think this is done. Work continues on T300442: Release Wmfdata-Python 2.0.

Tue, Nov 15, 2:41 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python
nshahquinn-wmf removed a subtask for T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments: T318587: Upgrade WMFData Python Package to use Spark3.
Tue, Nov 15, 2:39 AM · Analytics-Jupyter, Product-Analytics, Data-Engineering
nshahquinn-wmf removed a parent task for T318587: Upgrade WMFData Python Package to use Spark3: T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments.
Tue, Nov 15, 2:39 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python
nshahquinn-wmf added a parent task for T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior: T300442: Release Wmfdata-Python 2.0.
Tue, Nov 15, 2:39 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf added a parent task for T298179: Remove Spark session timeout functionality from Wmfdata-Python: T300442: Release Wmfdata-Python 2.0.
Tue, Nov 15, 2:39 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf added a parent task for T318587: Upgrade WMFData Python Package to use Spark3: T300442: Release Wmfdata-Python 2.0.
Tue, Nov 15, 2:39 AM · Data Pipelines (Sprint 04), Product-Analytics, Wmfdata-Python
nshahquinn-wmf added subtasks for T300442: Release Wmfdata-Python 2.0: T318587: Upgrade WMFData Python Package to use Spark3, T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior, T298179: Remove Spark session timeout functionality from Wmfdata-Python.
Tue, Nov 15, 2:39 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf added a subtask for T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments: T300442: Release Wmfdata-Python 2.0.
Tue, Nov 15, 2:38 AM · Analytics-Jupyter, Product-Analytics, Data-Engineering
nshahquinn-wmf added a parent task for T300442: Release Wmfdata-Python 2.0: T302819: Replace anaconda-wmf with smaller, non-stacked Conda environments.
Tue, Nov 15, 2:38 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf moved T300442: Release Wmfdata-Python 2.0 from Doing to Blocked on the Product-Analytics (Kanban) board.
Tue, Nov 15, 2:00 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf closed T317934: Add a edit attempt identifier to the Wikistories contributor data stream as Resolved.
Tue, Nov 15, 1:59 AM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf moved T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior from Next 2 weeks to Blocked on the Product-Analytics (Kanban) board.

Waiting for @xcollazo's review.

Tue, Nov 15, 1:59 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf claimed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior.

Currently up for review in PR36.

Tue, Nov 15, 1:58 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf moved T298179: Remove Spark session timeout functionality from Wmfdata-Python from Next 2 weeks to Blocked on the Product-Analytics (Kanban) board.

Waiting for @xcollazo's review.

Tue, Nov 15, 1:57 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
nshahquinn-wmf claimed T298179: Remove Spark session timeout functionality from Wmfdata-Python.

This is currently up for review in PR36.

Tue, Nov 15, 1:56 AM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python

Thu, Nov 10

nshahquinn-wmf added a comment to T298179: Remove Spark session timeout functionality from Wmfdata-Python.

If the timeout is removed, it could be possible to detect and alert when non-production YARN applications have been running for more than a week, for safety.

Thu, Nov 10, 9:48 PM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python
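
The safety check suggested in the comment above could look roughly like the sketch below. Everything concrete here is an assumption made for illustration: the record layout, the application IDs, and the idea that a production application can be recognised by a queue named `production`. A real check would read this information from the cluster instead of a hard-coded list.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=7)


def overdue_apps(apps, now, production_queue="production"):
    """Return IDs of non-production applications running longer than MAX_AGE.

    `apps` is a list of dicts with hypothetical keys "id", "queue", "started".
    """
    return [
        app["id"]
        for app in apps
        if app["queue"] != production_queue and now - app["started"] > MAX_AGE
    ]


now = datetime(2022, 11, 10, tzinfo=timezone.utc)
apps = [
    {"id": "app_a", "queue": "default", "started": datetime(2022, 11, 1, tzinfo=timezone.utc)},
    {"id": "app_b", "queue": "default", "started": datetime(2022, 11, 8, tzinfo=timezone.utc)},
    {"id": "app_c", "queue": "production", "started": datetime(2022, 10, 1, tzinfo=timezone.utc)},
]

print(overdue_apps(apps, now))  # prints ['app_a']
```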

Tue, Nov 8

nshahquinn-wmf added a comment to T300442: Release Wmfdata-Python 2.0.

The bulk of the removals are up for review in https://github.com/wikimedia/wmfdata-python/pull/35, although I just realized I still need to update the change log in that PR.

Tue, Nov 8, 12:38 AM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated the task description for T300442: Release Wmfdata-Python 2.0.
Tue, Nov 8, 12:37 AM · Data-Engineering, Wmfdata-Python

Mon, Nov 7

nshahquinn-wmf claimed T300442: Release Wmfdata-Python 2.0.

I will be doing all of this with the exception of T318587. It's a long list but it's all pretty simple.

Mon, Nov 7, 9:49 PM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf edited Description on Wmfdata-Python.
Mon, Nov 7, 9:47 PM
nshahquinn-wmf renamed Wmfdata-Python from wmfdata-python to Wmfdata-Python.
Mon, Nov 7, 9:45 PM
nshahquinn-wmf moved T300442: Release Wmfdata-Python 2.0 from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Mon, Nov 7, 9:42 PM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf edited projects for T300442: Release Wmfdata-Python 2.0, added: Product-Analytics (Kanban); removed Product-Analytics.
Mon, Nov 7, 9:42 PM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf updated the task description for T300442: Release Wmfdata-Python 2.0.
Mon, Nov 7, 5:58 PM · Data-Engineering, Wmfdata-Python
nshahquinn-wmf renamed T273210: Remodel Wmfdata-Python's Spark API to match underlying behavior from Rerunning Spark functions with changed settings has no effect to Remodel Wmfdata-Python's Spark API to match underlying behavior.
Mon, Nov 7, 5:50 PM · Product-Analytics (Kanban), Data-Engineering, Wmfdata-Python

Nov 3 2022

nshahquinn-wmf renamed T322094: Activity session ID seems to persist too long in some cases from Session tick session ID seems to persist too long in some cases to Activity session ID seems to persist too long in some cases.
Nov 3 2022, 6:35 PM · Product-Analytics, Metrics-Platform-Planning

Nov 2 2022

nshahquinn-wmf moved T317934: Add a edit attempt identifier to the Wikistories contributor data stream from Doing to Blocked on the Product-Analytics (Kanban) board.

Waiting on review by Data Engineering.

Nov 2 2022, 10:40 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf moved T317934: Add a edit attempt identifier to the Wikistories contributor data stream from Backlog to Analyst on the Inuka-Team board.
Nov 2 2022, 10:31 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf edited projects for T317934: Add a edit attempt identifier to the Wikistories contributor data stream, added: Inuka-Team; removed Inuka-Team (Kanban).
Nov 2 2022, 10:31 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf moved T317934: Add a edit attempt identifier to the Wikistories contributor data stream from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.

I need to add this new field to the sanitization allowlist.

Nov 2 2022, 10:14 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf claimed T317934: Add a edit attempt identifier to the Wikistories contributor data stream.
Nov 2 2022, 10:14 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)
nshahquinn-wmf closed T312262: Check up on Wikistories instrumentation as Resolved.

I will handle retaining contribution_attempt_id in T317934, so at long last, this task is done!

Nov 2 2022, 10:13 PM · Inuka-Team, Product-Analytics (Kanban)
nshahquinn-wmf added a comment to T312262: Check up on Wikistories instrumentation.

I've merged the improved documentation, although the generated schemas have some weird formatting (like this). If it's not too much work, I should tidy that up.

Nov 2 2022, 1:57 AM · Inuka-Team, Product-Analytics (Kanban)

Nov 1 2022

nshahquinn-wmf added a comment to T321960: Presto returns incorrect data for an added field.

If you select this data using Hive or Spark, it returns NULL for that column in old data.

Nov 1 2022, 12:47 AM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf updated the task description for T321960: Presto returns incorrect data for an added field.
Nov 1 2022, 12:47 AM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf renamed T321960: Presto returns incorrect data for an added field from Strange values in stored event data generated before instrumentation code was deployed to Presto returns incorrect data for an added field.
Nov 1 2022, 12:47 AM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf updated subscribers of T322094: Activity session ID seems to persist too long in some cases.

@phuedx interested in your thoughts on this 😊

Nov 1 2022, 12:45 AM · Product-Analytics, Metrics-Platform-Planning
nshahquinn-wmf added a comment to T312262: Check up on Wikistories instrumentation.

I spoke too soon: while I was updating the documentation, I thought of one more thing to check and ended up finding T322094.

Nov 1 2022, 12:43 AM · Inuka-Team, Product-Analytics (Kanban)
nshahquinn-wmf added a project to T322094: Activity session ID seems to persist too long in some cases: Product-Analytics.
Nov 1 2022, 12:42 AM · Product-Analytics, Metrics-Platform-Planning
nshahquinn-wmf created T322094: Activity session ID seems to persist too long in some cases.
Nov 1 2022, 12:42 AM · Product-Analytics, Metrics-Platform-Planning
nshahquinn-wmf closed T314622: EventLogging returns a new session ID on each pageview as Resolved.

Given that, I'm going to boldly close this.

Nov 1 2022, 12:26 AM · Metrics-Platform-Planning (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Product-Analytics
nshahquinn-wmf added a comment to T314622: EventLogging returns a new session ID on each pageview.

I happen to be doing checks on the same data again and this problem is clearly fixed.

Nov 1 2022, 12:25 AM · Metrics-Platform-Planning (Metrics Platform Kanban), MW-1.39-notes (1.39.0-wmf.25; 2022-08-15), Product-Analytics

Oct 31 2022

nshahquinn-wmf closed T248310: Parsoid HTML for articles about numerals contains unparsed Lua invocations as Resolved.

The example given in the description (https://en.wikipedia.org/api/rest_v1/page/html/9) now renders correctly:

Screenshot 2022-10-31 at 10.52.25.png (222×326 px, 21 KB)

Oct 31 2022, 5:57 PM · Parsoid-Read-Views (Phase 3 - Main namespace of officewiki / mediawiki.org renders with Parsoid), MediaWiki-Templates, MediaWiki-Parser, Parsoid, Page Content Service, Product-Infrastructure-Team-Backlog, KaiOS-Wikipedia-app, Inuka-Team

Oct 29 2022

nshahquinn-wmf added a comment to T312262: Check up on Wikistories instrumentation.
  • According to the consumption schema, story_open_time is the cumulative amount of time the user has had any story open during the current page view. I suspect that it is instead limited to the story_view event type and contains the amount of time that story was open before being closed. I should figure out which is the case and, if my suspicions are correct, decide whether we should adjust the instrumentation or adjust the definition.
Oct 29 2022, 1:49 AM · Inuka-Team, Product-Analytics (Kanban)
nshahquinn-wmf added a comment to T312262: Check up on Wikistories instrumentation.

Just verified the fix for T318706: Story builder instrumentation has story_already_exists hardcoded to false.

Oct 29 2022, 1:04 AM · Inuka-Team, Product-Analytics (Kanban)
nshahquinn-wmf added a comment to T318706: Story builder instrumentation has story_already_exists hardcoded to false.

Sorry for the delay! I've just checked the data we've received for this and there are a number of events where story_already_exists. I checked a random sample and the logged values were all correct. We can safely close this.

Oct 29 2022, 1:03 AM · MW-1.40-notes (1.40.0-wmf.4; 2022-10-03), Wikistories (R2), Inuka-Team (Kanban)

Oct 28 2022

nshahquinn-wmf added a comment to T312262: Check up on Wikistories instrumentation.

I checked the result of T317934: Add a edit attempt identifier to the Wikistories contributor data stream and noted T321960: Presto returns incorrect data for an added field.

Oct 28 2022, 11:01 PM · Inuka-Team, Product-Analytics (Kanban)
nshahquinn-wmf created T321960: Presto returns incorrect data for an added field.
Oct 28 2022, 10:53 PM · Patch-For-Review, Data Pipelines (Sprint 05-06), Data-Engineering-Planning, Product-Analytics
nshahquinn-wmf added a comment to T317934: Add a edit attempt identifier to the Wikistories contributor data stream.

Sorry for the delay!

Oct 28 2022, 10:20 PM · Inuka-Team, Product-Analytics (Kanban), MW-1.40-notes (1.40.0-wmf.6; 2022-10-17), Wikistories (R2)

Oct 25 2022

nshahquinn-wmf closed T293706: Add sql_tuple function to wmfdata-python as Resolved.

This has been released in Wmfdata 1.4.0.

Oct 25 2022, 6:05 PM · Product-Analytics, Data-Engineering, Wmfdata-Python
nshahquinn-wmf triaged T320993: Implement mobile DiscussionTools A/B test bucketing as Medium priority.
Oct 25 2022, 5:07 PM · MW-1.40-notes (1.40.0-wmf.8; 2022-10-31), Skipped QA, Product-Analytics, Editing-team (FY2021-22 Kanban Board), DiscussionTools
nshahquinn-wmf edited projects for T320993: Implement mobile DiscussionTools A/B test bucketing, added: Product-Analytics (Kanban); removed Product-Analytics.
Oct 25 2022, 5:07 PM · MW-1.40-notes (1.40.0-wmf.8; 2022-10-31), Skipped QA, Product-Analytics, Editing-team (FY2021-22 Kanban Board), DiscussionTools

Oct 21 2022

nshahquinn-wmf closed T318864: Pull basic metrics on KaiOS app use as Resolved.

I collected the data in this document and shared it earlier today.

Oct 21 2022, 11:54 PM · Product-Analytics (Kanban), Inuka-Team, KaiOS-Wikipedia-app
nshahquinn-wmf moved T318864: Pull basic metrics on KaiOS app use from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 21 2022, 8:15 PM · Product-Analytics (Kanban), Inuka-Team, KaiOS-Wikipedia-app
nshahquinn-wmf renamed T305067: Update anaconda-wmf's wmfdata-python to 1.4.0 from Update anaconda-wmf's wmfdata-python to 1.3.3 to Update anaconda-wmf's wmfdata-python to 1.4.0.
Oct 21 2022, 2:57 AM · Product-Analytics, Data-Engineering, Wmfdata-Python

Oct 20 2022

nshahquinn-wmf added a comment to T221482: Identify imported revisions in mediawiki_history.

then just mark all the revisions that have much larger revision ids than their parent (via rev_parent_id) as revision_is_probably_imported

Oct 20 2022, 11:05 PM · Data-Engineering, Product-Analytics
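
A minimal sketch of the heuristic quoted above, assuming revisions are available as simple records with `rev_id` and `rev_parent_id`. The comment does not say how large a gap counts as "much larger", so the `min_gap` threshold and the function name are hypothetical, and the sample IDs are invented.

```python
def flag_probably_imported(revisions, min_gap=1_000_000):
    """Map each rev_id to True when it exceeds its parent's ID by more than min_gap.

    Revisions without a parent (rev_parent_id of 0 or None) are never flagged.
    `min_gap` is an arbitrary illustrative threshold, not a value from the task.
    """
    flags = {}
    for rev in revisions:
        parent = rev["rev_parent_id"]
        has_parent = parent not in (None, 0)
        flags[rev["rev_id"]] = has_parent and rev["rev_id"] - parent > min_gap
    return flags


revisions = [
    {"rev_id": 100, "rev_parent_id": 0},          # first revision of a page
    {"rev_id": 105, "rev_parent_id": 100},        # ordinary follow-up edit
    {"rev_id": 5_000_000, "rev_parent_id": 105},  # large jump from its parent
]

print(flag_probably_imported(revisions))
# prints {100: False, 105: False, 5000000: True}
```

A fixed gap is the simplest reading of the suggestion; a threshold relative to the wiki's edit rate might behave better, but that goes beyond what the comment proposes.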
nshahquinn-wmf updated subscribers of T319360: [EXPEDITED] Cannot query string data from MariaDB using Wmfdata-Python.
Oct 20 2022, 10:43 PM · Data Pipelines (Sprint 03), Wmfdata-Python, Product-Analytics
nshahquinn-wmf added a comment to T319360: [EXPEDITED] Cannot query string data from MariaDB using Wmfdata-Python.
Oct 20 2022, 10:32 PM · Data Pipelines (Sprint 03), Wmfdata-Python, Product-Analytics