Neil_P._Quinn_WMF
Product analyst, WMF Editing

User Details

User Since
Apr 16 2015, 4:17 PM (191 w, 15 h)
Availability
Available
IRC Nick
neilpquinn
LDAP User
Unknown
MediaWiki User
Neil P. Quinn-WMF

Recent Activity

Today

Neil_P._Quinn_WMF claimed T211274: Write Hive UDF that does date arithmetic on full datetimes.

I need to provide a full spec so @mpopov can write it.

Fri, Dec 14, 1:14 AM · Contributors-Analysis, Product-Analytics
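A rough sketch of the semantics such a UDF spec might pin down, written in Python rather than as a Java Hive UDF. As far as I know, Hive's built-in date_add works at day granularity and returns only a date, so the time of day is lost; the function below shifts a full datetime instead. The name `add_to_datetime` and the `YYYY-MM-DD HH:MM:SS` input format are assumptions for illustration, not part of the task.

```python
from datetime import datetime, timedelta

# Assumed timestamp format. Real columns may use another layout
# (for example ISO 8601 with a "T" separator) and need a different pattern.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def add_to_datetime(dt_string, days=0, hours=0, minutes=0, seconds=0):
    """Hypothetical UDF spec: shift a full datetime string by an interval.

    Unlike day-level date arithmetic, the time-of-day component is kept.
    A None input returns None, mirroring how SQL functions propagate NULL.
    """
    if dt_string is None:
        return None
    parsed = datetime.strptime(dt_string, DATETIME_FORMAT)
    shifted = parsed + timedelta(days=days, hours=hours,
                                 minutes=minutes, seconds=seconds)
    return shifted.strftime(DATETIME_FORMAT)


if __name__ == "__main__":
    # Crossing a year boundary by one hour keeps the minutes intact.
    print(add_to_datetime("2018-12-31 23:30:00", hours=1))  # 2019-01-01 00:30:00
```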
Neil_P._Quinn_WMF created T211951: Prepare wmfdata for general use.
Fri, Dec 14, 1:13 AM · Contributors-Analysis, Product-Analytics
Ryasmeen awarded T211949: Show Rummana how to query EventLogging databases using Hive an Orange Medal token.
Fri, Dec 14, 12:41 AM · VisualEditor (Current work), Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF closed T193463: Investigate interest in wmfdata package among SWAP users as Resolved.

It's become clear that there's interest among the Product Analytics team at least, and it doesn't seem worth looking for other users at this point while we're still ironing out the kinks.

Fri, Dec 14, 12:30 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF closed T202594: wmfdata package can be installed but not imported as Resolved.

I sat down with @Tbayer and it looks like it's now working for him!

Fri, Dec 14, 12:28 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF created T211949: Show Rummana how to query EventLogging databases using Hive.
Fri, Dec 14, 12:26 AM · VisualEditor (Current work), Contributors-Analysis, Product-Analytics

Yesterday

Neil_P._Quinn_WMF moved T209771: Calculate Global South new editor retention within the limits of geolocation data purging from Next Up to Blocked on the Product-Analytics board.
Thu, Dec 13, 7:24 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a comment to T186124: Wikistats. How about historic data?.

Don't we have historic data already? Or is this specific to a particular metric, like pageviews, which only seems to go back to July 2015?

Thu, Dec 13, 6:52 PM · Analytics-Wikistats, Analytics
Neil_P._Quinn_WMF renamed T208504: Review CE Insights survey plan and prioritize our survey questions from Review CE Insights survey plan to Review CE Insights survey plan and prioritize our survey questions.
Thu, Dec 13, 5:51 PM · CE Insights - Survey Design, Product-Design-Strategy, Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T204180: Prepare contributors section of Audiences quarterly metrics review from Next Up to Doing on the Product-Analytics board.
Thu, Dec 13, 5:47 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T211263: Pair with Neil on the contributor health metrics for December 2018 from Next Up to Doing on the Product-Analytics board.
Thu, Dec 13, 5:47 PM · Product-Analytics
Neil_P._Quinn_WMF closed T210556: Clarification on mobile editing stats as Resolved.
Thu, Dec 13, 2:24 AM · VisualEditor (Current work), Product-Analytics

Wed, Dec 12

Neil_P._Quinn_WMF moved T210040: Calculate and report contributors health metrics for November 2018 from Next Up to Doing on the Product-Analytics board.
Wed, Dec 12, 11:53 PM · Contributors-Analysis, Product-Analytics

Tue, Dec 11

Neil_P._Quinn_WMF added a comment to T172410: Phase out and replace analytics-store (multisource).

To follow up what I wrote (after a chat with the data persistence team):

  • the proposal in T210478#4794536 would move sX sections (the database groupings listed in s1.dblist, s2.dblist, etc.) to their own mysql instances on assigned dbstore nodes. For example, all wikis in s5 will be available (i.e. replicated) on a mysql instance on dbstore1003 (with an assigned port that we don't know yet). So joins between schemas belonging to different sX sections will no longer be possible (we already knew this).
  • the staging database will likely be assigned to a separate mysql instance, so people will be able to keep using its data. It will still be possible to create tables, etc., but importing data from the various wiki databases will need some extra work (dump the data, import it, etc.).

    Would the above points be ok for everybody? Any special need or use case that is not taken into consideration?
Tue, Dec 11, 2:01 AM · Analytics, WMDE-Analytics-Engineering, User-Addshore, User-Elukey, Research
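A small model of the join constraint described in the first bullet: once each sX section lives on its own mysql instance, a cross-wiki join only works when both wikis come from the same section. The dblist contents and wiki names here are placeholders, not real section membership, and skipping `#` lines is an assumption about the file format.

```python
from collections import defaultdict

# Illustrative membership only; the real mapping comes from the
# s1.dblist, s2.dblist, ... files, and these wiki names are placeholders.
EXAMPLE_DBLISTS = {
    "s1": "wiki_a\n",
    "s5": "wiki_b\nwiki_c\n",
}


def section_map(dblists):
    """Map each wiki database name to the section (sX) whose dblist lists it."""
    sections = {}
    for section, text in dblists.items():
        for line in text.splitlines():
            name = line.strip()
            # Skip blank lines and comment lines (assumed file format).
            if name and not name.startswith("#"):
                sections[name] = section
    return sections


def can_join(wiki_x, wiki_y, sections):
    """Each section gets its own instance, so joins need a shared section."""
    return sections[wiki_x] == sections[wiki_y]


def group_by_section(sections):
    """Invert the map: which wikis would sit together on one instance."""
    groups = defaultdict(list)
    for wiki, section in sections.items():
        groups[section].append(wiki)
    return {section: sorted(wikis) for section, wikis in groups.items()}


if __name__ == "__main__":
    sections = section_map(EXAMPLE_DBLISTS)
    print(can_join("wiki_b", "wiki_c", sections))  # True: both in s5
    print(can_join("wiki_a", "wiki_b", sections))  # False: s1 vs s5
```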
Neil_P._Quinn_WMF added a comment to T208332: Add EditAttemptStep properties to the schema whitelist.

This data is now flowing into Hive:

select
    to_date(dt) as date,
    count(*) as events
from editattemptstep
where year = 2018 and month=12
group by to_date(dt)
Tue, Dec 11, 1:09 AM · Analytics, Patch-For-Review, Growth-Team, Product-Analytics

Sat, Dec 8

Neil_P._Quinn_WMF awarded T211263: Pair with Neil on the contributor health metrics for December 2018 a Party Time token.
Sat, Dec 8, 6:22 PM · Product-Analytics
Neil_P._Quinn_WMF added a comment to P5608 Update production known hosts.
In P5608#46554, @Volans wrote:

@Volans should this really be world-editable? 🤔

Of course not, I've fixed it. Thanks for spotting this; I thought I had made it read-only for everyone.

Sat, Dec 8, 6:21 PM

Fri, Dec 7

Neil_P._Quinn_WMF added a comment to P5608 Update production known hosts.

@Volans should this really be world-editable? 🤔

Fri, Dec 7, 6:27 PM

Thu, Dec 6

Neil_P._Quinn_WMF created T211274: Write Hive UDF that does date arithmetic on full datetimes.
Thu, Dec 6, 4:54 AM · Contributors-Analysis, Product-Analytics

Tue, Dec 4

Neil_P._Quinn_WMF updated the task description for T209955: [EPIC: Focus] Isolate Section Editing .
Tue, Dec 4, 8:59 PM · VisualEditor

Sat, Dec 1

Neil_P._Quinn_WMF added a comment to T207803: Update EventLogging code to facilitate move to EditAttemptStep schema.

However, you're right that event_init_timing is always null, and it wasn't like that in the old schema.

Sat, Dec 1, 12:01 AM · Growth-Team (Current Sprint), MW-1.33-notes (1.33.0-wmf.2; 2018-10-30)

Fri, Nov 30

Neil_P._Quinn_WMF added a comment to T118063: Clean up the EditAttemptStep schema and its implementations.

Oh, I see, it's because his username was in the task description.

Fri, Nov 30, 11:55 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF updated the task description for T118063: Clean up the EditAttemptStep schema and its implementations.
Fri, Nov 30, 11:54 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF updated subscribers of T118063: Clean up the EditAttemptStep schema and its implementations.

Damn it!

Fri, Nov 30, 11:54 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF updated the task description for T118063: Clean up the EditAttemptStep schema and its implementations.
Fri, Nov 30, 11:53 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF updated subscribers of T118063: Clean up the EditAttemptStep schema and its implementations.

Sorry @phuedx, I don't know why you keep getting resubscribed.

Fri, Nov 30, 11:52 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF renamed T118063: Clean up the EditAttemptStep schema and its implementations from Reconsider the schema of the Edit event log to Clean up the EditAttemptStep schema and its implementations.
Fri, Nov 30, 11:51 PM · Epic, Product-Analytics, VisualEditor, VisualEditor-MediaWiki, Contributors-Analysis
Neil_P._Quinn_WMF moved T202594: wmfdata package can be installed but not imported from Backlog to Doing on the Product-Analytics board.
Fri, Nov 30, 9:50 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a comment to T202594: wmfdata package can be installed but not imported.

Testing again after Neil's update:

It now detects the outdated matplotlib and appears to try to resolve it using "kiwisolver", but unsuccessfully, resulting in the same error message on import:

Fri, Nov 30, 9:22 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T209771: Calculate Global South new editor retention within the limits of geolocation data purging from Triage to Next Up on the Product-Analytics board.

@kzimmerman also pulling into Next Up as a blocker for November's movement metrics.

Fri, Nov 30, 7:26 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a subtask for T210040: Calculate and report contributors health metrics for November 2018: T209771: Calculate Global South new editor retention within the limits of geolocation data purging.
Fri, Nov 30, 7:25 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a parent task for T209771: Calculate Global South new editor retention within the limits of geolocation data purging: T210040: Calculate and report contributors health metrics for November 2018.
Fri, Nov 30, 7:25 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF renamed T209771: Calculate Global South new editor retention within the limits of geolocation data purging from Make calcuation of Global South new editor retention tolerant of the geolocation data purging to Calculate Global South new editor retention within the limits of geolocation data purging.
Fri, Nov 30, 7:25 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated subscribers of T210807: Update editor_month generation to use change_tag table rather than tag_summary.

@kzimmerman I've pulled this into Next Up since we need to do this before calculating November's movement metrics (T210040).

Fri, Nov 30, 7:12 PM · Contributors-Analysis, Product-Analytics
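A sketch of reading tags from the normalized layout, where each (revision, tag) pair is a change_tag row and the tag name lives in change_tag_def. It uses an in-memory SQLite database as a stand-in; the column names ct_rev_id, ct_tag_id, ctd_id and ctd_name reflect my understanding of MediaWiki's normalized change tag schema and should be checked against the real tables. The revision IDs are hypothetical, and flagging "mobile edit" is only one example of what editor_month generation might need.

```python
import sqlite3

# Minimal stand-ins for the MediaWiki tables; only the columns used here exist.
SCHEMA = """
CREATE TABLE change_tag_def (ctd_id INTEGER PRIMARY KEY, ctd_name TEXT);
CREATE TABLE change_tag (ct_rev_id INTEGER, ct_tag_id INTEGER);
"""


def tags_by_revision(conn):
    """Return {rev_id: set of tag names} by joining tags to their definitions."""
    rows = conn.execute(
        """
        SELECT ct.ct_rev_id, ctd.ctd_name
        FROM change_tag AS ct
        JOIN change_tag_def AS ctd ON ct.ct_tag_id = ctd.ctd_id
        """
    ).fetchall()
    result = {}
    for rev_id, tag_name in rows:
        result.setdefault(rev_id, set()).add(tag_name)
    return result


def build_example():
    """Populate a throwaway database with hypothetical revisions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO change_tag_def VALUES (?, ?)",
                     [(1, "mobile edit"), (2, "visualeditor")])
    # Revision 100 carries two tags, revision 101 carries one.
    conn.executemany("INSERT INTO change_tag VALUES (?, ?)",
                     [(100, 1), (100, 2), (101, 2)])
    return conn


if __name__ == "__main__":
    tags = tags_by_revision(build_example())
    # Revisions carrying the "mobile edit" tag.
    print(sorted(rev for rev, names in tags.items() if "mobile edit" in names))  # [100]
```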
Neil_P._Quinn_WMF added a subtask for T210040: Calculate and report contributors health metrics for November 2018: T210807: Update editor_month generation to use change_tag table rather than tag_summary.
Fri, Nov 30, 7:11 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a parent task for T210807: Update editor_month generation to use change_tag table rather than tag_summary: T210040: Calculate and report contributors health metrics for November 2018.
Fri, Nov 30, 7:11 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T210807: Update editor_month generation to use change_tag table rather than tag_summary from Triage to Next Up on the Product-Analytics board.
Fri, Nov 30, 7:10 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T210556: Clarification on mobile editing stats from Triage to Doing on the Product-Analytics board.
Fri, Nov 30, 7:10 PM · VisualEditor (Current work), Product-Analytics
Neil_P._Quinn_WMF claimed T210556: Clarification on mobile editing stats.
Fri, Nov 30, 7:10 PM · VisualEditor (Current work), Product-Analytics
Neil_P._Quinn_WMF claimed T210807: Update editor_month generation to use change_tag table rather than tag_summary.
Fri, Nov 30, 7:10 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated the task description for T210807: Update editor_month generation to use change_tag table rather than tag_summary.
Fri, Nov 30, 7:09 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF closed T210042: Clean up contributors health metrics code as Resolved.
Fri, Nov 30, 6:54 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated the task description for T210042: Clean up contributors health metrics code.
Fri, Nov 30, 6:53 PM · Contributors-Analysis, Product-Analytics

Thu, Nov 29

Neil_P._Quinn_WMF created T210807: Update editor_month generation to use change_tag table rather than tag_summary.
Thu, Nov 29, 11:06 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF awarded T185355: Normalize change tag schema a Love token.
Thu, Nov 29, 9:56 PM · MW-1.33-notes (1.33.0-wmf.6; 2018-11-27), Wikidata, Wikidata-Campsite (Wikidata-Campsite-Iteration-∞), User-Ladsgroup, MW-1.32-notes (WMF-deploy-2018-05-15 (1.32.0-wmf.4)), Wikidata-Ministry-Of-Magic-Tech-Debt, Wikidata-Ministry-Of-Magic, TechCom-RFC (TechCom-Approved), MediaWiki-Database, MediaWiki-Change-tagging

Tue, Nov 27

Neil_P._Quinn_WMF closed T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report as Resolved.

@JCuriel, this is done. The results are in the table below; let me know if you notice any issues.

Tue, Nov 27, 2:54 AM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF added a comment to T206883: mediawiki_history datasets have null user_text for IP edits.

I hear your point and it makes a lot of sense. I think our views differ on the notion of a current name. In my world, a current name is associated with events only when we're sure those events were made by the same person/account; the event_user_text field references single users. However, I understand that from a purely name-changing perspective, an IP has the same value before and now. My concern is about misrepresenting IPs over time: if you use an IP as the current name for an edit, you might be tempted to treat other edits made by that IP as belonging to the same user, which is false (though it is true for non-anonymous edits).

Tue, Nov 27, 2:51 AM · Product-Analytics, Analytics-Data-Quality, Analytics

Thu, Nov 22

Neil_P._Quinn_WMF triaged T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report as Normal priority.
Thu, Nov 22, 1:27 AM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF moved T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report from Next Up to Doing on the Product-Analytics board.
Thu, Nov 22, 1:26 AM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF updated the task description for T210042: Clean up contributors health metrics code.
Thu, Nov 22, 12:46 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF removed Due Date on T209740: Investigate the impact of Wiki Education Foundation program participants on editing health metrics.
Thu, Nov 22, 12:45 AM · Contributors-Analysis, Product-Analytics

Wed, Nov 21

Neil_P._Quinn_WMF moved T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report from Triage to Next Up on the Product-Analytics board.

@kzimmerman this is the task that I used as an example in our meeting today. As discussed, it's quite small and has clear value, so I'm auto-accepting it :)

Wed, Nov 21, 8:49 PM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF added a project to T209705: Dashboard EditSchema timing metrics: Product-Analytics.
Wed, Nov 21, 4:07 PM · Product-Analytics, VisualEditor (Current work)
Neil_P._Quinn_WMF moved T210042: Clean up contributors health metrics code from Triage to Doing on the Product-Analytics board.
Wed, Nov 21, 2:39 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF created T210042: Clean up contributors health metrics code.
Wed, Nov 21, 2:39 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated the task description for T199574: Improve formatting and data input for board health metrics.
Wed, Nov 21, 2:34 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated subscribers of T210040: Calculate and report contributors health metrics for November 2018.

@nettrom_WMF, should we plan to pair on this?

Wed, Nov 21, 2:28 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T210040: Calculate and report contributors health metrics for November 2018 from Triage to Next Up on the Product-Analytics board.
Wed, Nov 21, 2:25 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF closed T206895: Calculate and report contributors health metrics for October 2018 as Resolved.
Wed, Nov 21, 2:25 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF triaged T210040: Calculate and report contributors health metrics for November 2018 as High priority.
Wed, Nov 21, 2:24 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated the task description for T206895: Calculate and report contributors health metrics for October 2018.
Wed, Nov 21, 2:22 AM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF claimed T209995: Investigate why init->loaded drop-off is so high in desktop wikitext metrics.
Wed, Nov 21, 12:17 AM · Product-Analytics, VisualEditor
Neil_P._Quinn_WMF added a project to T209995: Investigate why init->loaded drop-off is so high in desktop wikitext metrics: Product-Analytics.
Wed, Nov 21, 12:16 AM · Product-Analytics, VisualEditor

Mon, Nov 19

Neil_P._Quinn_WMF added a comment to T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report.

The last time I did these calculations, I wrapped them up in a notebook, so they'll be very easy to rerun.

Mon, Nov 19, 11:13 PM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF updated the task description for T209889: Calculate monthly mobile Wikipedia edits for Major Gifts report.
Mon, Nov 19, 11:03 PM · Product-Analytics, Contributors-Analysis
Neil_P._Quinn_WMF updated the task description for T193296: Consolidate data access documentation.
Mon, Nov 19, 10:59 PM · Wikimania-Hackathon-2018, Documentation, Product-Analytics
Neil_P._Quinn_WMF closed T202594: wmfdata package can be installed but not imported as Resolved.

I fixed this in this commit!

Mon, Nov 19, 10:54 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF updated subscribers of T209544: Vet the EditAttemptStep and VisualEditorFeatureUse data streams.

@nettrom_WMF mentioned two current known issues with the data:

  • client side mobile events were not submitted until last week's train (the one ending on 15 November)
  • there's a continuing issue with mobile init events not being submitted (fix planned for next week).
Mon, Nov 19, 9:15 PM · Contributors-Analysis, Product-Analytics

Sat, Nov 17

Neil_P._Quinn_WMF created T209771: Calculate Global South new editor retention within the limits of geolocation data purging.
Sat, Nov 17, 7:33 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF removed Due Date on T204180: Prepare contributors section of Audiences quarterly metrics review.
Sat, Nov 17, 5:48 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF moved T209740: Investigate the impact of Wiki Education Foundation program participants on editing health metrics from Triage to Doing on the Product-Analytics board.
Sat, Nov 17, 5:44 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF created T209740: Investigate the impact of Wiki Education Foundation program participants on editing health metrics.
Sat, Nov 17, 12:03 AM · Contributors-Analysis, Product-Analytics

Fri, Nov 16

Neil_P._Quinn_WMF added a comment to T203498: Upgrade Hive to ≥ 2.0.

Now I am wondering how much stability we'd sacrifice if we upgraded to Hive 2.1.1 as described above. It would be extremely useful to some people, but I fear that it might cause major headaches for all of us in the long term. I'm writing this note just to say that we haven't forgotten about this task; it just might not be as easy as we thought.

Fri, Nov 16, 7:23 PM · Analytics-Cluster, Analytics
Neil_P._Quinn_WMF added a comment to T209536: Hive query fails with local join.

Is this related to T206279?

I don't think the root cause is the same (one problem happens on HiveServer2, the other on the Hive CLI). The solution, however, is the same, as the issues happen in the same part of the process:
Issues happen when a MapredLocalTask is executed. Those tasks are an optimization for joins when the data is small enough. When hive.auto.convert.join=true, Hive automatically converts joins to map joins when it thinks that's beneficial.
By setting hive.auto.convert.join=false by default, we'll lose a small optimization (small because the data involved is small).

Fri, Nov 16, 7:17 PM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics
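A simplified model of what hive.auto.convert.join trades off. With the conversion on, Hive may hash the small side of a join in a local task and stream the big side past it (a map join); with it off, both sides go through a regular shuffle join. A user-level workaround is `SET hive.auto.convert.join=false;` before the query. The 25 MB threshold and the function names below are assumptions for illustration (Hive's own size settings, such as hive.mapjoin.smalltable.filesize, play this role), not Hive's actual code.

```python
from collections import defaultdict

# Assumed threshold for the illustration only.
SMALL_TABLE_THRESHOLD_BYTES = 25_000_000


def choose_join_strategy(small_side_bytes, auto_convert_join=True,
                         threshold=SMALL_TABLE_THRESHOLD_BYTES):
    """Mimic the decision that hive.auto.convert.join controls."""
    if auto_convert_join and small_side_bytes <= threshold:
        return "map join"     # small side hashed up front by a local task
    return "common join"      # both sides shuffled through reducers


def map_join(small_rows, big_rows, key):
    """Hash join: build a table from the small side, stream the big side."""
    # This build step stands in for the work a MapredLocalTask performs.
    hashed = defaultdict(list)
    for row in small_rows:
        hashed[row[key]].append(row)
    for row in big_rows:
        for match in hashed.get(row[key], []):
            yield {**match, **row}
```

Disabling the conversion always sends queries down the "common join" branch, which is slower for small tables but avoids the local task entirely.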
Neil_P._Quinn_WMF added a comment to T206883: mediawiki_history datasets have null user_text for IP edits.

Hi @Neil_P._Quinn_WMF,
While I understand the usability frustration, keeping the IPs in event_user_text_historical is, for me, a matter of data correctness.
Doing it would require almost no code change, but I'd rather not, in order to keep the semantics of event_user_text and event_user_text_historical valid and consistent for anonymous and non-anonymous edits.

Fri, Nov 16, 7:13 PM · Product-Analytics, Analytics-Data-Quality, Analytics
Neil_P._Quinn_WMF moved T209536: Hive query fails with local join from Triage to Tracking on the Product-Analytics board.
Fri, Nov 16, 2:59 AM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics
Neil_P._Quinn_WMF added a project to T209536: Hive query fails with local join: Product-Analytics.
Fri, Nov 16, 2:59 AM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics
Neil_P._Quinn_WMF added a comment to T209536: Hive query fails with local join.

Thanks for working on this, @JAllemandou! I've just started suffering this on a lot of queries, including ones that previously worked fine. The error message is "OperationalError: Error while processing statement: FAILED: Execution Error, return code 134 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask".

Fri, Nov 16, 2:54 AM · Patch-For-Review, Product-Analytics, Analytics-Kanban, Analytics
Neil_P._Quinn_WMF reopened T206883: mediawiki_history datasets have null user_text for IP edits as "Open".

Sorry to weigh in so late, but why can't we simply copy the event_user_text_historical to event_user_text for IP editors at the end of the reconstruction process? It doesn't seem like it would be hard to implement, and it's annoying and hard to learn that the way to get the canonical name for any user is not event_user_text like you'd expect, but rather coalesce(event_user_text, event_user_text_historical).

Fri, Nov 16, 1:51 AM · Product-Analytics, Analytics-Data-Quality, Analytics
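The workaround described above, sketched in Python on rows represented as dicts. The field names come from the discussion; the `coalesce` helper is written here rather than taken from any library, and the example values are made up (192.0.2.1 is a documentation-only address). As noted in the reply on this task, the IP returned for an anonymous edit does not identify a single person, so grouping edits by it is not the same as grouping by user.

```python
def coalesce(*values):
    """Return the first argument that is not None, like SQL COALESCE."""
    for value in values:
        if value is not None:
            return value
    return None


def canonical_user_name(row):
    """Current name for registered users; the IP address for anonymous edits."""
    return coalesce(row.get("event_user_text"),
                    row.get("event_user_text_historical"))


if __name__ == "__main__":
    renamed = {"event_user_text": "NewName", "event_user_text_historical": "OldName"}
    anonymous = {"event_user_text": None, "event_user_text_historical": "192.0.2.1"}
    print(canonical_user_name(renamed))    # NewName
    print(canonical_user_name(anonymous))  # 192.0.2.1
```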

Thu, Nov 15

Neil_P._Quinn_WMF added a comment to T209503: [EventLogging Sanitization] Enable older-than-90-day purging of unsanitized EL database (event) in Hive.

@Tbayer

Just to double-check: The information in the documentation that "Sanitization happens right after events are generated (with a couple hours lag)" is still current, right? In that case I don't think this will be a concern (although we will need to update some queries - CCing @Groceryheist regarding ReadingDepth).

Yes, it still happens this way. We are considering shifting it back, but will let you know if we do.

Thu, Nov 15, 6:38 PM · Analytics-EventLogging, Analytics-Kanban
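A sketch of which unsanitized partitions an older-than-90-day purge would drop, assuming year/month/day partitions (the hour level is ignored) and assuming that data from exactly 90 days ago is kept. The function name is hypothetical; the actual purge job may draw the boundary differently. Because sanitization runs within hours of event generation, the sanitized copies of purged days would still exist.

```python
from datetime import date, timedelta


def partitions_to_purge(partitions, today, max_age_days=90):
    """Return (year, month, day) partitions strictly older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    return sorted(p for p in partitions if date(*p) < cutoff)


if __name__ == "__main__":
    existing = [(2018, 11, 14), (2018, 8, 17), (2018, 8, 16), (2018, 1, 1)]
    # 90 days before 2018-11-15 is 2018-08-17, which is kept.
    print(partitions_to_purge(existing, date(2018, 11, 15)))  # [(2018, 1, 1), (2018, 8, 16)]
```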

Wed, Nov 14

Neil_P._Quinn_WMF created T209544: Vet the EditAttemptStep and VisualEditorFeatureUse data streams.
Wed, Nov 14, 10:22 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF closed T204171: Provide edit samples and user lists for mobile contribution design research as Resolved.

I've now uploaded the survey lists to Qualtrics, so we can call this done.

Wed, Nov 14, 9:50 PM · Design-Research, Product-Design-Strategy, Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a project to T190453: Support R Kernels by default for all users.: Analytics-SWAP.
Wed, Nov 14, 9:48 PM · Analytics-SWAP, Analytics
Neil_P._Quinn_WMF added a project to T139487: Get 'sparklyr' working on stats1005: Analytics-SWAP.
Wed, Nov 14, 9:48 PM · Analytics-SWAP, Product-Analytics, Analytics, Discovery-Analysis
Neil_P._Quinn_WMF added a project to T190769: Notebook machine to double as RStudio Server?: Analytics-SWAP.
Wed, Nov 14, 9:48 PM · Analytics-SWAP, Analytics
Neil_P._Quinn_WMF added a project to T198764: Users should be able to read their jupyter instance logs: Analytics-SWAP.
Wed, Nov 14, 9:48 PM · Analytics-SWAP, Analytics
Neil_P._Quinn_WMF added a project to T190443: Spark Jupyter Notebook integration: Analytics-SWAP.
Wed, Nov 14, 9:48 PM · Analytics-SWAP, Analytics-Kanban, Patch-For-Review, Analytics
Neil_P._Quinn_WMF moved T206895: Calculate and report contributors health metrics for October 2018 from Blocked to Doing on the Product-Analytics board.
Wed, Nov 14, 9:42 PM · Contributors-Analysis, Product-Analytics

Nov 13 2018

Restricted Application changed the subtype of T199286: Create a repository for useful code snippets and definitions from "Deadline" to "Task".

The primary desire here was a gated home for defining things like Global North/Global South, and that exists now that I've created the wikimedia-research/canonical-data repo.

Nov 13 2018, 6:39 PM · Product-Analytics
Neil_P._Quinn_WMF closed T206898: Make plan for counting Global South edits and editors as Resolved.

Now that the data is easily available in Hive, I think this is done!

Nov 13 2018, 6:29 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a comment to T206898: Make plan for counting Global South edits and editors .

What approach do you recommend for joining a CSV in Hive? If it involves creating a temporary Hive table, wouldn't it make sense to instead upload the CSV as a non-temporary table already? (similar to e.g. wmf.domain_abbrev_map or tbayer.country_name_vs_code)

Nov 13 2018, 6:23 PM · Contributors-Analysis, Product-Analytics

Nov 10 2018

Neil_P._Quinn_WMF reopened T206898: Make plan for counting Global South edits and editors as "Open".

Actually, I'll go ahead and upload the CSV file into the Data Lake so we can join it to the other tables. That actually may be just as good as writing a UDF—you don't have to add a jar and create a function at the start of every query, at least.

Nov 10 2018, 11:03 PM · Contributors-Analysis, Product-Analytics
Neil_P._Quinn_WMF added a comment to T184576: Make an Analytics Data Lake table to provide meta info about wikis .

The most challenging part of this is coming up with human-readable project names, and I've actually already done that as part of the wiki segmentation work. I've just started wrapping that up in a slightly more general form so it can go in the canonical-data repo, although it's not high priority, so I don't know when I'll finish.

Nov 10 2018, 11:00 PM · Product-Analytics, Analytics, Contributors-Analysis
Neil_P._Quinn_WMF closed T206898: Make plan for counting Global South edits and editors as Resolved.

@Tbayer, I've created a CSV file with country names, ISO codes, Global North/South classification, and MaxMind continents, tracked in a new wikimedia-research/canonical-data repo. It contains all the countries which appear in projectview_hourly, and I've carefully checked it to make sure the Global North/South classifications match the ones at meta:List of countries by regional classification.

Nov 10 2018, 8:01 PM · Contributors-Analysis, Product-Analytics
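A sketch of using such a CSV as a lookup table, the Python equivalent of joining it to another table in Hive. The two rows and the column names (iso_code, name, economic_region, maxmind_continent) are assumptions standing in for the real file, as is the `country_code` field on the rows being joined; check the actual header in wikimedia-research/canonical-data.

```python
import csv
import io

# Hypothetical two-row excerpt with assumed column names.
COUNTRIES_CSV = """iso_code,name,economic_region,maxmind_continent
US,United States,Global North,North America
IN,India,Global South,Asia
"""


def load_regions(csv_text):
    """Build an ISO code -> Global North/South lookup from the CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["iso_code"]: row["economic_region"] for row in reader}


def attach_region(rows, regions, code_field="country_code", unknown="Unknown"):
    """Left-join each row to the lookup, like joining the CSV as a Hive table.

    Codes absent from the CSV (such as "--") get the `unknown` label.
    """
    return [{**row, "economic_region": regions.get(row[code_field], unknown)}
            for row in rows]
```

A left join keeps rows whose country is missing from the CSV, so unknown-country traffic stays visible instead of silently disappearing.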

Nov 9 2018

Neil_P._Quinn_WMF added a comment to T206898: Make plan for counting Global South edits and editors .

@Tbayer,

The list of Global North countries I've been using is:

(
    "AD", "AL", "AT", "AX", "BA", "BE", "BG", "CH", "CY", "CZ",
    "DE", "DK", "EE", "ES", "FI", "FO", "FR", "FX", "GB", "GG",
    "GI", "GL", "GR", "HR", "HU", "IE", "IL", "IM", "IS", "IT",
    "JE", "LI", "LU", "LV", "MC", "MD", "ME", "MK", "MT", "NL",
    "NO", "PL", "PT", "RO", "RS", "RU", "SE", "SI", "SJ", "SK",
    "SM", "TR", "VA", "AU", "CA", "HK", "MO", "NZ", "JP", "SG",
    "KR", "TW", "US"
)

To check whether this was accurate, I pulled all the countries present in a month of pageviews data that didn't match this list or "--". That's in this spreadsheet and it looks good to me.

Nov 9 2018, 2:55 PM · Contributors-Analysis, Product-Analytics
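The check described above (find every observed code that is neither in the Global North list nor "--") fits in a set difference. The codes are copied from the list above; the function name and the treatment of "--" as the unknown placeholder are assumptions for this sketch.

```python
# Global North codes, copied from the list above.
GLOBAL_NORTH = frozenset([
    "AD", "AL", "AT", "AX", "BA", "BE", "BG", "CH", "CY", "CZ",
    "DE", "DK", "EE", "ES", "FI", "FO", "FR", "FX", "GB", "GG",
    "GI", "GL", "GR", "HR", "HU", "IE", "IL", "IM", "IS", "IT",
    "JE", "LI", "LU", "LV", "MC", "MD", "ME", "MK", "MT", "NL",
    "NO", "PL", "PT", "RO", "RS", "RU", "SE", "SI", "SJ", "SK",
    "SM", "TR", "VA", "AU", "CA", "HK", "MO", "NZ", "JP", "SG",
    "KR", "TW", "US",
])

# Code treated here as "country unknown".
UNKNOWN_CODE = "--"


def codes_to_review(observed_codes):
    """Codes that would count as Global South by elimination, sorted.

    Listing them makes it easy to eyeball whether any Global North
    country is missing from GLOBAL_NORTH.
    """
    return sorted(set(observed_codes) - GLOBAL_NORTH - {UNKNOWN_CODE})
```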
Neil_P._Quinn_WMF added a comment to T206898: Make plan for counting Global South edits and editors .

The list of Global North countries I've been using is:

(
    "AD", "AL", "AT", "AX", "BA", "BE", "BG", "CH", "CY", "CZ",
    "DE", "DK", "EE", "ES", "FI", "FO", "FR", "FX", "GB", "GG",
    "GI", "GL", "GR", "HR", "HU", "IE", "IL", "IM", "IS", "IT",
    "JE", "LI", "LU", "LV", "MC", "MD", "ME", "MK", "MT", "NL",
    "NO", "PL", "PT", "RO", "RS", "RU", "SE", "SI", "SJ", "SK",
    "SM", "TR", "VA", "AU", "CA", "HK", "MO", "NZ", "JP", "SG",
    "KR", "TW", "US"
)
Nov 9 2018, 2:42 PM · Contributors-Analysis, Product-Analytics

Nov 5 2018

Neil_P._Quinn_WMF added a comment to T206898: Make plan for counting Global South edits and editors .

+1. And to clarify, we haven't yet decided whether we should report all three of these regions separately in the board deck and in https://www.mediawiki.org/wiki/Wikimedia_Audiences (it would amount to adding two new sections, as only GS is listed currently). But in any case we resolved to remove the "unknown" country part from the existing GS metrics.

Nov 5 2018, 1:32 PM · Contributors-Analysis, Product-Analytics

Nov 3 2018

Neil_P._Quinn_WMF added a comment to T206898: Make plan for counting Global South edits and editors .

@Tbayer and I discussed this yesterday and came to the following conclusions:

  • Unknown countries account for roughly:
    • 0.3% of pageviews
    • 0.9% of editors (technically, of (wiki, country, editor) entities)
    • 27% of edits (my estimate in the description was too high)
  • We should investigate the small group of editors that are producing all these unknown-country edits
  • Starting with October board metrics (T206895, etc.), we will treat unknown countries as a third region alongside the Global North and the Global South. I will produce some shared code to help with this.
Nov 3 2018, 9:03 AM · Contributors-Analysis, Product-Analytics
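A minimal sketch of treating unknown countries as a third region and computing each region's share of a metric, the kind of figure quoted above for pageviews, editors and edits. This is not the shared code mentioned in the comment; the function names are hypothetical, the counts in the test are made up, and `global_north` would be the full list from the Nov 9 comment.

```python
def region_of(code, global_north, unknown_code="--"):
    """Classify a country code as Global North, Global South, or Unknown."""
    if code == unknown_code:
        return "Unknown"
    return "Global North" if code in global_north else "Global South"


def region_shares(counts_by_country, global_north):
    """Fraction of a metric (edits, editors, pageviews) falling in each region."""
    totals = {"Global North": 0, "Global South": 0, "Unknown": 0}
    for code, count in counts_by_country.items():
        totals[region_of(code, global_north)] += count
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {region: 0.0 for region in totals}
    return {region: count / grand_total for region, count in totals.items()}
```

Keeping "Unknown" as its own bucket means it is no longer folded into the Global South by elimination, which is what removing the unknown-country part from the existing GS metrics requires.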
Neil_P._Quinn_WMF moved T206898: Make plan for counting Global South edits and editors from Next Up to Doing on the Product-Analytics board.
Nov 3 2018, 9:01 AM · Contributors-Analysis, Product-Analytics

Nov 2 2018

Neil_P._Quinn_WMF updated the task description for T204171: Provide edit samples and user lists for mobile contribution design research .
Nov 2 2018, 4:23 PM · Design-Research, Product-Design-Strategy, Contributors-Analysis, Product-Analytics