
Tue, Nov 24

nettrom_WMF moved T265761: Update Media Search measurement specification with Visual Editor measurements from Next 2 weeks to Needs Sign-off on the Product-Analytics (Kanban) board.

I've updated the measurement specification to incorporate the Visual Editor measurements that we're interested in doing. This involved distinguishing Media Search on Commons from image search in Visual Editor, so there are now separate sections for each where I thought it was reasonable.

Tue, Nov 24, 8:27 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added a subtask for T260254: Measure usage of Media Search integration in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Tue, Nov 24, 8:20 PM · Product-Analytics, Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-ReleaseCandidate)
nettrom_WMF added a subtask for T259308: Measure usage of image search in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Tue, Nov 24, 8:20 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added parent tasks for T265761: Update Media Search measurement specification with Visual Editor measurements: T260254: Measure usage of Media Search integration in Visual Editor, T259308: Measure usage of image search in Visual Editor.
Tue, Nov 24, 8:20 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF removed a parent task for T259308: Measure usage of image search in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Tue, Nov 24, 8:18 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF removed a parent task for T260254: Measure usage of Media Search integration in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Tue, Nov 24, 8:18 PM · Product-Analytics, Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-ReleaseCandidate)
nettrom_WMF removed subtasks for T265761: Update Media Search measurement specification with Visual Editor measurements: T259308: Measure usage of image search in Visual Editor, T260254: Measure usage of Media Search integration in Visual Editor.
Tue, Nov 24, 8:18 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF moved T255517: Newcomer tasks: reporting notebook after schema changes from Doing to Next 2 weeks on the Product-Analytics (Kanban) board.
Tue, Nov 24, 6:05 PM · Product-Analytics (Kanban), NewcomerTasks 1.2, Growth-Team (Current Sprint)

Mon, Nov 23

nettrom_WMF added a comment to T267333: Migrate Growth EventLogging schemas to Event Platform.

@nettrom_WMF I may have already asked you this elsewhere, but I'll ask again here so we have an officially documented answer.

Do any of these event streams need client IP and/or geocoded data? If not, it will be removed as part of this migration.

Mon, Nov 23, 8:06 PM · Growth-Team, Product-Analytics, Patch-For-Review, Analytics-Kanban, Analytics

Fri, Nov 20

nettrom_WMF updated the task description for T265771: Measure how multimedia content is added to Wikipedia articles.
Fri, Nov 20, 12:46 AM · Structured-Data-Backlog, Product-Analytics

Thu, Nov 19

nettrom_WMF added a comment to T266067: [L] Create edit tags to measure multimedia edits to Wikipedia articles.

Since the description mentions "those measurements": what, specifically, are we trying to measure?

Thu, Nov 19, 8:00 PM · Structured-Data-Backlog (Current Work)
nettrom_WMF moved T266982: Newcomer tasks: productivity on non-suggested edits from Needs Sign-off to Doing on the Product-Analytics (Kanban) board.
Thu, Nov 19, 12:18 AM · Product-Analytics (Kanban), Growth-Team (Current Sprint), NewcomerTasks 1.0, GrowthExperiments-Homepage

Wed, Nov 18

nettrom_WMF moved T266374: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history from Current Quarter to Upcoming Quarter on the Product-Analytics board.

We're quickly running out of time in Q2, so moving this to Q3.

Wed, Nov 18, 10:23 PM · Product-Analytics, Analytics
nettrom_WMF moved T266982: Newcomer tasks: productivity on non-suggested edits from Ready for Development to Needs PM Review on the Growth-Team (Current Sprint) board.

The first pass of this analysis is now complete. I've put the notebooks for this in the NEWTEA GitHub repository; they're numbered 12 through 17. The notebooks numbered 9 through 11 are the same analysis using all edits, where we found a significant increase in the Homepage group compared to the control group.

Wed, Nov 18, 12:22 AM · Product-Analytics (Kanban), Growth-Team (Current Sprint), NewcomerTasks 1.0, GrowthExperiments-Homepage
nettrom_WMF moved T266982: Newcomer tasks: productivity on non-suggested edits from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.
Wed, Nov 18, 12:22 AM · Product-Analytics (Kanban), Growth-Team (Current Sprint), NewcomerTasks 1.0, GrowthExperiments-Homepage

Mon, Nov 16

nettrom_WMF added a parent task for T265101: Instrument event logging for VE's image search: T260254: Measure usage of Media Search integration in Visual Editor.
Mon, Nov 16, 6:08 PM · Editing-team (FY2020-21 Kanban Board), Better Use Of Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Infrastructure-Data, Structured-Data-Backlog (Current Work), Editing-Team-Request, VisualEditor
nettrom_WMF added a subtask for T260254: Measure usage of Media Search integration in Visual Editor: T265101: Instrument event logging for VE's image search.
Mon, Nov 16, 6:08 PM · Product-Analytics, Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-ReleaseCandidate)
nettrom_WMF added a comment to T258229: Build dashboards for search activity on MediaSearch on Commons.

I've made T258183 a subtask of this task, since we can't build the dashboard until the instrumentation is in place.

Mon, Nov 16, 5:55 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics, Structured-Data-Backlog
nettrom_WMF added a parent task for T258183: [L] Instrument MediaSearch results page: T258229: Build dashboards for search activity on MediaSearch on Commons.
Mon, Nov 16, 5:54 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Patch-For-Review, Product-Analytics, Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering
nettrom_WMF added a subtask for T258229: Build dashboards for search activity on MediaSearch on Commons: T258183: [L] Instrument MediaSearch results page.
Mon, Nov 16, 5:54 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics, Structured-Data-Backlog
nettrom_WMF closed T261759: Analyze Media Search A/B test as Resolved.

Now that the subtask is resolved and the notebook is accessible, I'm closing this task as well.

Mon, Nov 16, 5:06 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)
nettrom_WMF closed T265935: Create layperson-friendly version of Media Search A/B test analysis as Resolved.

I also received word from Mikhail that he'd reviewed it and everything's good to go! Closing as resolved.

Mon, Nov 16, 5:04 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)
nettrom_WMF closed T265935: Create layperson-friendly version of Media Search A/B test analysis, a subtask of T261759: Analyze Media Search A/B test, as Resolved.
Mon, Nov 16, 5:04 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)

Wed, Nov 11

nettrom_WMF added a comment to T264831: Variant tests: C vs. D analysis.

@MMiller_WMF : you've asked me to determine what the duration of the Variant C/D experiment should be. Here's what I've come up with.

Wed, Nov 11, 8:26 PM · Product-Analytics, GrowthExperiments, Growth-Team
nettrom_WMF added a comment to T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits.

@Tgr : Thanks for chiming in here and volunteering to help out! Your suggested approach for reconstructing the edits is what I also came up with. I identified three conditions for exclusion: initializing the Newcomer Task module, changing topics, and changing difficulties. In all three cases the module is loaded/refreshed and the link is correct, so the tag would be applied.

Wed, Nov 11, 8:19 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage

Tue, Nov 10

nettrom_WMF moved T266982: Newcomer tasks: productivity on non-suggested edits from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Tue, Nov 10, 6:07 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), NewcomerTasks 1.0, GrowthExperiments-Homepage

Thu, Nov 5

nettrom_WMF added a comment to T267333: Migrate Growth EventLogging schemas to Event Platform.

Also, @nettrom_WMF, can you confirm whether we should migrate all those at the exact same time, or just migrate them close enough?

Thu, Nov 5, 11:50 PM · Growth-Team, Product-Analytics, Patch-For-Review, Analytics-Kanban, Analytics
nettrom_WMF added a comment to T267333: Migrate Growth EventLogging schemas to Event Platform.

@Ottomata : It would be helpful to have ServerSideAccountCreation grouped with these; I've updated the task description to reflect that.

Thu, Nov 5, 5:37 PM · Growth-Team, Product-Analytics, Patch-For-Review, Analytics-Kanban, Analytics
nettrom_WMF added projects to T267333: Migrate Growth EventLogging schemas to Event Platform: Product-Analytics, Growth-Team.
Thu, Nov 5, 5:24 PM · Growth-Team, Product-Analytics, Patch-For-Review, Analytics-Kanban, Analytics
nettrom_WMF added a project to T259163: Migrate legacy metawiki schemas to Event Platform: Product-Analytics.

I've added Product Analytics so the team's aware of this; we have our board refinement coming up today. I also see that our team members get subscribed to the child tasks as they're created, so they're individually aware of them. Thanks for doing that!

Thu, Nov 5, 5:23 PM · Product-Analytics, MW-1.36-notes (1.36.0-wmf.18; 2020-11-17), Patch-For-Review, Product-Infrastructure-Data, Analytics-Kanban, Analytics, Analytics-EventLogging, Event-Platform

Wed, Nov 4

nettrom_WMF added a project to T264831: Variant tests: C vs. D analysis: Product-Analytics.
Wed, Nov 4, 10:30 PM · Product-Analytics, GrowthExperiments, Growth-Team

Tue, Nov 3

nettrom_WMF moved T265935: Create layperson-friendly version of Media Search A/B test analysis from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.

I've updated the notebook on GitHub, adding text aiming to make it more accessible to everyone.

Tue, Nov 3, 11:34 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)
nettrom_WMF moved T265935: Create layperson-friendly version of Media Search A/B test analysis from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Tue, Nov 3, 6:48 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)

Mon, Nov 2

nettrom_WMF moved T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.
Mon, Nov 2, 7:26 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage
nettrom_WMF moved T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits from In Progress to Needs PM Review on the Growth-Team (Current Sprint) board.
Mon, Nov 2, 7:26 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage

Oct 30 2020

nettrom_WMF moved T255517: Newcomer tasks: reporting notebook after schema changes from Needs Sign-off to Doing on the Product-Analytics (Kanban) board.
Oct 30 2020, 5:17 PM · Product-Analytics (Kanban), NewcomerTasks 1.2, Growth-Team (Current Sprint)

Oct 28 2020

nettrom_WMF added a comment to T250049: Drop data from Prefupdate schema that is older than 90 days.

@Milimetric : Thanks for clarifying that, and for your patience while I got back on this! I chatted with the Product Analytics team about this, and we're fine with waiting for the re-sanitization to come around in early November to fill the gap in the sanitized data.

Oct 28 2020, 5:05 PM · Analytics-Kanban, audits-data-retention, Analytics, Product-Analytics, Privacy Engineering, Privacy, Security
nettrom_WMF added a comment to T266374: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history.

@JAllemandou : Yes, and I'm expecting to see some checksum-based reverts not having the tag because the tag only checks the last 15 edits.
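
To illustrate why the two methods can disagree, here is a minimal sketch of checksum-based revert detection with an optional look-back window. It is not the mediawiki_history implementation or MediaWiki's tagging code: the function name, the toy checksums, and the exact window semantics (the preceding 15 revisions) are assumptions made for illustration.

```python
def find_identity_reverts(checksums, window=None):
    """Return indices of revisions that restore an earlier revision's content.

    checksums: content checksums in chronological order (toy data here).
    window: how many preceding revisions to search; None means search all.
    Both the name and the window semantics are hypothetical.
    """
    reverts = []
    for i, checksum in enumerate(checksums):
        start = 0 if window is None else max(0, i - window)
        # Exclude the immediate predecessor: matching it is a null edit,
        # not a revert of intervening changes.
        earlier = checksums[start:max(i - 1, 0)]
        if checksum in earlier:
            reverts.append(i)
    return reverts


# A revision that restores content from 21 edits back.
history = ["a"] + [f"r{i}" for i in range(20)] + ["a"]

unlimited = find_identity_reverts(history)            # checksum-based, no limit
windowed = find_identity_reverts(history, window=15)  # tag-like, limited look-back
```

In this toy history the unlimited search flags the final revision while the 15-revision search does not, which is the kind of difference the comment anticipates.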

Oct 28 2020, 4:52 PM · Product-Analytics, Analytics

Oct 27 2020

nettrom_WMF updated the task description for T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits.
Oct 27 2020, 10:01 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage
nettrom_WMF moved T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 27 2020, 9:58 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage
nettrom_WMF moved T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits from Ready for Development to In Progress on the Growth-Team (Current Sprint) board.
Oct 27 2020, 9:58 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage
nettrom_WMF triaged T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits as High priority.
Oct 27 2020, 9:25 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage
nettrom_WMF created T266610: Newcomer tasks: estimate impact of edit tags not being applied to all edits.
Oct 27 2020, 9:24 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), GrowthExperiments-Homepage

Oct 26 2020

nettrom_WMF added a comment to T264890: Add variable placeholder to wmfdata-r.

@mpopov : being able to provide query_hive with a list of parameters and have it replace placeholders would be really useful, I definitely support that!

Oct 26 2020, 8:19 PM · Product-Analytics
nettrom_WMF moved T255517: Newcomer tasks: reporting notebook after schema changes from In Progress to Needs PM Review on the Growth-Team (Current Sprint) board.

All of the proposed changes have been implemented and @MMiller_WMF now has the notebook for testing.

Oct 26 2020, 6:50 PM · Product-Analytics (Kanban), NewcomerTasks 1.2, Growth-Team (Current Sprint)
nettrom_WMF moved T255517: Newcomer tasks: reporting notebook after schema changes from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.
Oct 26 2020, 6:50 PM · Product-Analytics (Kanban), NewcomerTasks 1.2, Growth-Team (Current Sprint)
mpopov awarded T266374: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history a Grey Medal token.
Oct 26 2020, 1:41 PM · Product-Analytics, Analytics
JAllemandou awarded T266374: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history a Grey Medal token.
Oct 26 2020, 8:29 AM · Product-Analytics, Analytics

Oct 23 2020

nettrom_WMF updated subscribers of T266375: Add timestamps of important revision events to mediawiki_history.

@Isaac : you wanted me to tag you when I filed the task for getting information about revision tag changes into MediaWiki history. Here's said tag. I don't remember what changes you were interested in, maybe they'll fit here too?

Oct 23 2020, 10:48 PM · Product-Analytics, Analytics
nettrom_WMF created T266375: Add timestamps of important revision events to mediawiki_history.
Oct 23 2020, 10:45 PM · Product-Analytics, Analytics
nettrom_WMF created T266374: Analyze differences between checksum-based and revert-tag based reverts in mediawiki_history.
Oct 23 2020, 10:31 PM · Product-Analytics, Analytics

Oct 20 2020

nettrom_WMF created T266077: Add image table to monthly sqoop list.
Oct 20 2020, 10:47 PM · Patch-For-Review, Analytics-Kanban, Product-Analytics, Analytics, Structured-Data-Backlog
nettrom_WMF moved T265768: Statistics on media usage across Wikipedias from Doing to Blocked on the Product-Analytics (Kanban) board.

I spoke too soon! I've written up a query following the above-mentioned idea, but this turns out not to work in practice. The issue is that a wiki can use a file from Commons but also have a local file description page. Attendekall.jpg on Nynorsk Wikipedia is an example of that. The actual file is on Commons, but it has a local description page to categorize it into the local programming category. This means that the page table isn't an authoritative source for whether a file exists locally on the wiki.

Oct 20 2020, 8:04 PM · Product-Analytics (Kanban), Structured-Data-Backlog

Oct 19 2020

nettrom_WMF moved T265768: Statistics on media usage across Wikipedias from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 19 2020, 4:23 PM · Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF triaged T265768: Statistics on media usage across Wikipedias as Medium priority.
Oct 19 2020, 4:23 PM · Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF triaged T265773: Statistics on play rates for audio and video files as Medium priority.
Oct 19 2020, 4:22 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF triaged T265774: Statistics on image clicks from Wikipedia articles across time as Medium priority.
Oct 19 2020, 4:22 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF moved T265773: Statistics on play rates for audio and video files from Triage to Needs Investigation on the Product-Analytics board.
Oct 19 2020, 4:21 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF moved T265774: Statistics on image clicks from Wikipedia articles across time from Triage to Needs Investigation on the Product-Analytics board.
Oct 19 2020, 4:20 PM · Structured-Data-Backlog, Product-Analytics

Oct 16 2020

nettrom_WMF added a comment to T265774: Statistics on image clicks from Wikipedia articles across time.

This statistic was mentioned in the Technology Department's Quarter in Review for Q4 of FY 19/20. Looking further, I found out that it comes from the Understanding Engagement with Images in Wikipedia research project. More detailed statistics can be found on the First Round of Analysis page, which I'll dig into further. Looks like T250154 is the parent task for this work.

Oct 16 2020, 10:37 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF added a project to T261343: Dashboard of multimedia usage on the Wikipedias: Epic.

Created subtasks for all five points, changing this to an epic and moving it to the Epics column on the Product Analytics board.

Oct 16 2020, 10:28 PM · Epic, Structured-Data-Backlog, Product-Analytics
nettrom_WMF updated the task description for T261343: Dashboard of multimedia usage on the Wikipedias.
Oct 16 2020, 10:27 PM · Epic, Structured-Data-Backlog, Product-Analytics
nettrom_WMF created T265774: Statistics on image clicks from Wikipedia articles across time.
Oct 16 2020, 10:24 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF added a comment to T265773: Statistics on play rates for audio and video files.

There's the MediaViewer schema, and there's data from it in the Data Lake. An investigation would be needed to understand what data is actually logged and whether that can answer this.

Oct 16 2020, 10:23 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF created T265773: Statistics on play rates for audio and video files.
Oct 16 2020, 10:21 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF updated subscribers of T265772: Statistics on dwell time and multimedia interaction with Wikipedia articles.

As far as I know, there is not any live instrumentation that would allow us to measure this. The SearchSatisfaction schema measures dwell time, but requires the user to reach a page through an on-wiki search, and we know that's not representative of how visitors reach us.

Oct 16 2020, 10:20 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF created T265772: Statistics on dwell time and multimedia interaction with Wikipedia articles.
Oct 16 2020, 10:12 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF added a comment to T265771: Measure how multimedia content is added to Wikipedia articles.

Based on my conversations with @cchen and @mpopov it looks like this will not be straightforward to do any time soon. If we're interested in understanding this based on existing edits we'll need to extract and process diffs between revisions.

Oct 16 2020, 10:09 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF created T265771: Measure how multimedia content is added to Wikipedia articles.
Oct 16 2020, 10:05 PM · Structured-Data-Backlog, Product-Analytics
nettrom_WMF updated subscribers of T265768: Statistics on media usage across Wikipedias.

I've previously discussed something similar with @jwang in relation to T247417. We can do this on a monthly basis by using the sqooped tables in wmf_raw in the Data Lake. We'll left join mediawiki_imagelinks twice: first with the mediawiki_page table to identify local files, then with the mediawiki_page table again to identify files used from Commons. If a file isn't found in either of those, it should be a redlink, and we can mark it as such.
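
The double left join described here can be sketched on toy data with an in-memory SQLite database. This is an illustration, not the query that was actually written: the tables local_page and commons_page are hypothetical stand-ins for the local wiki's and Commons' rows of mediawiki_page, and the file titles are made up. The Oct 20 comment on this task reports that the page table is not authoritative for local files, so the 'local' label below inherits that limitation.

```python
import sqlite3


def classify_image_links(local_files, commons_files, image_links):
    """Label each linked file title as 'local', 'commons', or 'redlink'.

    All inputs are toy lists of file titles; the table names are
    hypothetical stand-ins for the sqooped Data Lake tables.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE imagelinks (il_to TEXT);
        CREATE TABLE local_page (page_title TEXT);
        CREATE TABLE commons_page (page_title TEXT);
        """
    )
    conn.executemany("INSERT INTO imagelinks VALUES (?)", [(t,) for t in image_links])
    conn.executemany("INSERT INTO local_page VALUES (?)", [(t,) for t in local_files])
    conn.executemany("INSERT INTO commons_page VALUES (?)", [(t,) for t in commons_files])

    rows = conn.execute(
        """
        SELECT il.il_to,
               CASE
                   WHEN lp.page_title IS NOT NULL THEN 'local'
                   WHEN cp.page_title IS NOT NULL THEN 'commons'
                   ELSE 'redlink'
               END AS source
        FROM imagelinks AS il
        LEFT JOIN local_page AS lp ON lp.page_title = il.il_to
        LEFT JOIN commons_page AS cp ON cp.page_title = il.il_to
        """
    ).fetchall()
    conn.close()
    return dict(rows)


labels = classify_image_links(
    local_files=["A.jpg"],
    commons_files=["A.jpg", "B.jpg"],
    image_links=["A.jpg", "B.jpg", "C.jpg"],
)
```

A title present in both page tables is labelled 'local' because the CASE expression checks the local match first; that ordering is exactly where a local description page for a Commons file would be misclassified.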

Oct 16 2020, 10:03 PM · Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF created T265768: Statistics on media usage across Wikipedias.
Oct 16 2020, 9:57 PM · Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added a comment to T265101: Instrument event logging for VE's image search.

I agree with @MNeisler that using the VisualEditorFeatureUse schema makes sense since we're asking questions about user behaviour around features in VE specifically.

Oct 16 2020, 9:02 PM · Editing-team (FY2020-21 Kanban Board), Better Use Of Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Infrastructure-Data, Structured-Data-Backlog (Current Work), Editing-Team-Request, VisualEditor
nettrom_WMF added a parent task for T259308: Measure usage of image search in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Oct 16 2020, 8:32 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added a parent task for T260254: Measure usage of Media Search integration in Visual Editor: T265761: Update Media Search measurement specification with Visual Editor measurements.
Oct 16 2020, 8:32 PM · Product-Analytics, Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-ReleaseCandidate)
nettrom_WMF added subtasks for T265761: Update Media Search measurement specification with Visual Editor measurements: T260254: Measure usage of Media Search integration in Visual Editor, T259308: Measure usage of image search in Visual Editor.
Oct 16 2020, 8:32 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF created T265761: Update Media Search measurement specification with Visual Editor measurements.
Oct 16 2020, 8:31 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added a comment to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one.

Also, I think storing the previous and current state of the filters is a great way to do it! Perhaps particularly if we switch to a map type for storing additional action parameters/values. The only other alternative I was going to suggest was having a combination of value and is_default fields (similar to how PrefUpdate does it), where is_default is true if the value is set back to whatever the default is, and false otherwise. Looking at it again, I think storing the previous and current state is a better option.
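
The two options compared in this comment can be sketched as event payloads. Everything below is hypothetical: the function names, the field names other than value and is_default, and the example filter values are invented for illustration and are not taken from the proposed MediaSearch schema or from PrefUpdate.

```python
def filter_change_with_states(filter_name, previous, current):
    """Option 1: record the filter's previous and current values in a map."""
    return {
        "action": "filter_change",
        # A string-to-string map of additional action parameters.
        "action_params": {
            "filter": filter_name,
            "previous": str(previous),
            "current": str(current),
        },
    }


def filter_change_with_default_flag(filter_name, current, default):
    """Option 2: record the new value and whether it equals the default."""
    return {
        "action": "filter_change",
        "filter": filter_name,
        "value": str(current),
        "is_default": current == default,
    }


# Hypothetical filter names and values.
states_event = filter_change_with_states("sort", "relevance", "recency")
flag_event = filter_change_with_default_flag("sort", "recency", default="relevance")
reset_event = filter_change_with_default_flag("sort", "relevance", default="relevance")
```

With the first shape an analyst can see which value the user moved away from; with the second, only whether the new value is the default.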

Oct 16 2020, 6:44 PM · Product-Infrastructure-Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering
nettrom_WMF added a comment to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one.

@egardner : Thanks for the updates and work so far. Thanks also for your patience while I work on getting feedback to you on this. I met with @mpopov last week and discussed a lot of things around this schema, and should've relayed information to you sooner, sorry!

Oct 16 2020, 6:35 PM · Product-Infrastructure-Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering

Oct 13 2020

nettrom_WMF added a comment to T252391: Reimage one memcached shard per DC to Buster.

Hmm, I spoke too soon. We rely on the wgWMEUnderstandingFirstDay being set in order to oversample in Schema:EditAttemptStep (in WikimediaEvents's shouldSchemaEditAttemptStepOversample()), so we need to detangle the configuration value from that method before we can switch off EditorJourney logging. It shouldn't be that complicated -- I think instead of checking to see if wgWMEUnderstandingFirstDay is true, we instead want to see if the GrowthExperiments extension is enabled, because we want to oversample edit attempts for all GrowthExperiments users regardless of whether they are opted in to the Homepage experiment. @nettrom_WMF does that sound right to you?

Oct 13 2020, 4:51 PM · Growth-Team, User-jijiki, User-Elukey, Patch-For-Review, Operations, serviceops

Oct 9 2020

nettrom_WMF added a comment to T250049: Drop data from Prefupdate schema that is older than 90 days.

@Milimetric : It looks like there's no data in event_sanitized.prefupdate for 2020-09-19 through 2020-09-21, and it looks like there's partial data on 2020-09-22. Would it be possible to re-sanitize that date range, or will we need to wait for the re-sanitization script to stop by?

Oct 9 2020, 10:39 PM · Analytics-Kanban, audits-data-retention, Analytics, Product-Analytics, Privacy Engineering, Privacy, Security
nettrom_WMF moved T255517: Newcomer tasks: reporting notebook after schema changes from Next 2 weeks to Doing on the Product-Analytics (Kanban) board.
Oct 9 2020, 8:14 PM · Product-Analytics (Kanban), NewcomerTasks 1.2, Growth-Team (Current Sprint)
nettrom_WMF added a comment to T216668: Welcome survey: investigate Vietnamese abandonment rate.

BTW, I came back to this because of T252391, and noticed that when looking at the two-year registration rate on Vietnamese[1] it looks like the time period where we ran our Welcome Survey A/B test had substantially higher registration rates than expected. If we decide to run another experiment, we should consider fitting a time-series model to the data and using it to predict the number of registrations in order to understand whether registrations are outside what's expected.
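
As a rough illustration of the idea, the sketch below fits a straight-line trend by ordinary least squares to past registration counts and flags an observation that falls outside a band of two residual standard deviations around the forecast. This is a deliberately simple stand-in for a proper time-series model (no seasonality, no autocorrelation), and the function names and counts are made up.

```python
from statistics import mean, stdev


def expected_range(history, steps_ahead=1, k=2.0):
    """Forecast the next value from a linear trend and return (low, high).

    history: past counts in chronological order (at least three values).
    k: half-width of the band in residual standard deviations.
    """
    n = len(history)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(history)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, history)) / sxx
    intercept = y_bar - slope * x_bar
    # Sample standard deviation of the residuals around the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, history)]
    spread = stdev(residuals)
    forecast = intercept + slope * (n - 1 + steps_ahead)
    return forecast - k * spread, forecast + k * spread


def is_unexpected(history, observed, k=2.0):
    """True if the observed count lies outside the expected band."""
    low, high = expected_range(history, k=k)
    return observed < low or observed > high


# Made-up weekly registration counts before an experiment.
weekly_registrations = [100, 104, 101, 105, 102, 106]
low, high = expected_range(weekly_registrations)
```

Comparing registrations during an experiment against such a band would indicate whether the experiment period itself was unusual before attributing differences to the treatment.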

Oct 9 2020, 6:52 PM · Product-Analytics, Growth-Team, CommRel-Specialists-Support (Jan-Mar-2019)
nettrom_WMF added a comment to T252391: Reimage one memcached shard per DC to Buster.

@kostajh : Thanks for picking this up and pinging me about it. I think we should switch off EditorJourney since we're not actively using the data in any ongoing analysis.

Oct 9 2020, 6:47 PM · Growth-Team, User-jijiki, User-Elukey, Patch-For-Review, Operations, serviceops
nettrom_WMF added a comment to T250049: Drop data from Prefupdate schema that is older than 90 days.

@Milimetric : Not a problem, definitely understand that this would be a non-standard request! I've reached out to the PA team and will report back, probably some time on Tuesday.

Oct 9 2020, 5:52 PM · Analytics-Kanban, audits-data-retention, Analytics, Product-Analytics, Privacy Engineering, Privacy, Security
nettrom_WMF added a comment to T250049: Drop data from Prefupdate schema that is older than 90 days.

@Milimetric : I inspected the sanitized data by looking at the event structs of random partitions and aggregating some random months across various years from 2017 onwards, and in all cases the sanitized data looks correct to me.

Oct 9 2020, 5:16 PM · Analytics-Kanban, audits-data-retention, Analytics, Product-Analytics, Privacy Engineering, Privacy, Security

Oct 7 2020

nettrom_WMF moved T262421: [Morten] Review "Schema Migration Audit" document from Doing to Needs Sign-off on the Product-Analytics (Kanban) board.

@mpopov : Thanks for your patience while I work on juggling tasks and finding time to come back to this. I've discussed the schemas with the SD team and we found that the MultimediaViewer and UploadWizard schemas could be marked for deprecation. As I didn't have edit permission for the Google Doc, I left a couple of comments to that effect. I think this concludes everything, handing it to you for sign-off!

Oct 7 2020, 7:02 PM · Product-Analytics (Kanban)

Oct 6 2020

nettrom_WMF added a comment to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one.

If there is a better/standard way to capture some of these things I'm happy to re-work the schema (but specific guidance would be helpful).

Oct 6 2020, 10:08 PM · Product-Infrastructure-Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering
nettrom_WMF moved T230174: Newcomer tasks: experiment analysis from Needs Review to Needs Sign-off on the Product-Analytics (Kanban) board.
Oct 6 2020, 5:07 PM · Product-Analytics (Kanban), Growth-Team (Current Sprint), NewcomerTasks 1.0, GrowthExperiments-Homepage
nettrom_WMF moved T259308: Measure usage of image search in Visual Editor from Needs Review to Blocked on the Product-Analytics (Kanban) board.
Oct 6 2020, 5:07 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF moved T261759: Analyze Media Search A/B test from Needs Review to Needs Sign-off on the Product-Analytics (Kanban) board.
Oct 6 2020, 5:06 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)

Oct 5 2020

nettrom_WMF moved T259308: Measure usage of image search in Visual Editor from Next 2 weeks to Needs Review on the Product-Analytics (Kanban) board.

I've dug into this a bit to get an understanding of what data is available through the VisualEditorFeatureUse schema. I also met with @MNeisler on the Product Analytics team to get a check on whether my understanding of the data was correct, and it appears to be.

Oct 5 2020, 9:26 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF edited projects for T259308: Measure usage of image search in Visual Editor, added: Product-Analytics (Kanban); removed Product-Analytics.
Oct 5 2020, 8:53 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Product-Analytics (Kanban), Structured-Data-Backlog
nettrom_WMF added a comment to T262271: Activate mediasearch profile without requiring an explicit flag.

However I think we need to make sure VE is properly instrumented and get some baselines from T259308 before we make the switch. @nettrom_WMF, do you have any sense of a timeline on that task?

Oct 5 2020, 6:07 PM · SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Patch-For-Review, Structured-Data-Backlog (Current Work)

Oct 1 2020

nettrom_WMF added a comment to T255028: Move the stat1004-6-7 hosts to Debian Buster.

With these new upgrades happening, I wanted to move my Jupyter notebooks from stat1008 to stat1006 as stat1008 has been very busy lately. After rsync'ing my files, I started reinstalling my R libraries and had them error out because one of them wasn't available for R v3.3. That surprised me, because Debian Buster ships with R v3.5 (as can be found on stat1005 and stat1008).

Oct 1 2020, 9:29 PM · Analytics-Kanban, Analytics-Clusters
nettrom_WMF added a comment to T263875: Develop a new schema for MediaSearch analytics or adapt an existing one.

This is awesome work so far! I've read through this task, its parent task, and the proposed patch and updated the measurement specification to reflect the set of questions mentioned by @CBogen in T263875#6495409. From what I can tell, the proposed schema allows us to answer our current set of questions.

Oct 1 2020, 4:18 PM · Product-Infrastructure-Data, SDAW-MediaSearch (MediaSearch-ReleaseCandidate), Analytics-Radar, Patch-For-Review, Product-Analytics, Structured-Data-Backlog (Current Work), Structured Data Engineering

Sep 29 2020

nettrom_WMF added a comment to T262421: [Morten] Review "Schema Migration Audit" document.

@mpopov : Ah, feel free to reopen this if you want me to ping the SD team and have them come back to me with a list of schemas.

Sep 29 2020, 9:54 PM · Product-Analytics (Kanban)
nettrom_WMF moved T261759: Analyze Media Search A/B test from Doing to Needs Review on the Product-Analytics (Kanban) board.

A huge thanks to @mpopov for doing a lot of work on this, improving the data processing code and figuring out ways to massage the data from SearchSatisfaction to pull out the insights!

Sep 29 2020, 9:32 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)
nettrom_WMF closed T262421: [Morten] Review "Schema Migration Audit" document, a subtask of T261794: [REQUEST] Event Schema Audit Review, as Resolved.
Sep 29 2020, 8:03 PM · Product-Analytics (Kanban), Better Use Of Data, Product-Infrastructure-Data
nettrom_WMF closed T262421: [Morten] Review "Schema Migration Audit" document as Resolved.

I've gone through the spreadsheet and added information for all known Growth-related schemas. Looks like the Multimedia team already went through and marked theirs as well. Don't think this needs any peer review, so closing it as resolved.

Sep 29 2020, 8:03 PM · Product-Analytics (Kanban)

Sep 24 2020

nettrom_WMF moved T261759: Analyze Media Search A/B test from Needs Review to Doing on the Product-Analytics (Kanban) board.

We're unsure if the finding is trustworthy. I'm moving this back to "Doing" to dig further into this.

Sep 24 2020, 4:11 PM · Product-Analytics (Kanban), Structured-Data-Backlog, SDAW-MediaSearch (MediaSearch-Alpha)