Isaac (Isaac Johnson)
User

User Details

User Since
Oct 1 2018, 2:19 PM (96 w, 2 d)
Availability
Available
IRC Nick
isaacj
LDAP User
Isaac Johnson
MediaWiki User
Isaac (WMF) [ Global Accounts ]

Recent Activity

Today

Isaac added a comment to T254289: Add wikidata to articletopic pipeline.

This is great -- what I'm seeing here @Dibyaaaaax is that the GBC model mostly performs very similarly to the fasttext model when given the same data, but its recall does suffer for low-data topics. We'll have to discuss whether this slightly higher performance in fasttext warrants the complexity of adding the new fasttext class permanently to revscoring and making sure that it would work in production. I'll mention some other things we discussed but jump in if you have more concrete data:

  • GBC models train in >2 hours whereas fasttext trains in ~2 minutes. Makes me wonder whether the HistGradientBoostingClassifier would provide the same performance as GBC (and be super easy to implement) but train much more quickly.
  • Even though you've got fastText set up for training, I'm not certain what it would look like in production if we decided the performance was worth it. It fine-tunes the word embeddings that it's provided, so it produces a second set of embeddings that are slightly different from the ones trained via mwtext. We could maybe just dump those fine-tuned embeddings to a file and reproduce how fastText works with numpy like T242013#6155316.
Wed, Aug 5, 1:03 PM · drafttopic-modeling, Scoring-platform-team (Current), Research
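
Editor's sketch, not taken from the task: one way the "dump the fine-tuned embeddings and reproduce fastText with numpy" idea could look. It assumes the model was trained with a one-vs-all loss (a sigmoid per label), and it ignores subword n-grams and the end-of-sentence token that real fastText also uses. The vocabulary, matrices, labels, and the function name predict_topics are all made-up placeholders.

```python
import numpy as np

# Toy stand-ins for what would be dumped from a trained model (hypothetical values).
VOCAB = {"river": 0, "election": 1, "vote": 2}
INPUT_EMB = np.array([[1.0, 0.0],    # one row per vocabulary word
                      [0.0, 1.0],
                      [0.0, 3.0]])
OUTPUT_W = np.array([[2.0, -2.0],    # one row per label
                     [-2.0, 2.0]])
LABELS = ["Geography", "Politics"]


def predict_topics(tokens, vocab, input_emb, output_w, labels, threshold=0.5):
    """fastText-style inference: average the word vectors, then apply a linear layer."""
    # Out-of-vocabulary tokens are simply skipped in this sketch.
    idxs = [vocab[t] for t in tokens if t in vocab]
    if not idxs:
        return []
    hidden = input_emb[idxs].mean(axis=0)      # document vector
    logits = output_w @ hidden                 # one score per label
    probs = 1.0 / (1.0 + np.exp(-logits))      # assumes one-vs-all (sigmoid) loss
    return [(lab, float(p)) for lab, p in zip(labels, probs) if p >= threshold]


print(predict_topics(["election", "vote", "unknownword"],
                     VOCAB, INPUT_EMB, OUTPUT_W, LABELS))
```

If the model had instead been trained with a softmax loss, the sigmoid line would be replaced by a softmax over the logits.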

Yesterday

Isaac added a comment to T255028: Move the stat1004-6-7 hosts to Debian Buster.

stat1004 reimaged during this week or the next

@elukey just a heads up that I'm running some long-running SWAP notebooks via stat1004 but it's okay to kill those processes as part of the reimaging if they're still going when you proceed. They're long running because they run a number of sequential pyspark queries and it's easy for me to pick up from where they left off if they get killed. No need to check with me in advance.

Tue, Aug 4, 2:23 PM · Analytics-Clusters
Isaac added a comment to T246912: Clean up History and Society.Society in the topic taxonomy.

See https://github.com/wikimedia/wikitax/pull/6 for implementation of these changes

Tue, Aug 4, 1:51 PM · drafttopic-modeling, Scoring-platform-team

Fri, Jul 31

Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

This is "overall articles for all projects", correct?

It's actually just for English Wikipedia. The number from the WMDE dashboard for all Wikipedia projects is 31.99% (i.e. the complement of the 68.01% number provided under "% of Articles that use Wikidata" in the tinier table that aggregates each project family). It varies a lot by wiki too -- vecwiki seems to have almost every article with some form of Wikidata transclusion whereas 62% of articles on Japanese Wikipedia don't have a single Wikidata-based template. This data was only recently added there (see T257962).

Fri, Jul 31, 8:26 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T258805: Experimental API for exploring topic models.

Update: looks likely that I'll be able to work with a contractor on the comprehensive comparison for the month of August, so I'm waiting to hear formally about that before proceeding.

Fri, Jul 31, 8:08 PM · Research (FY2020-21-Research-July-September)
Isaac added a comment to T257869: Identify approaches for defining article importance.

Weekly update: didn't meet this week

Fri, Jul 31, 8:07 PM · Research (FY2020-21-Research-July-September)
Isaac added a comment to T257870: Onboarding for Isaac around code / data for sockpuppet detection.

Weekly update: met with DD and was given an overview of the model choices and future directions. Will be receiving a pointer to code / documentation in the near future. For now, though, I have a decent understanding of the current state of the project, which will hopefully be enough to make interpretation of the code relatively straightforward.

Fri, Jul 31, 8:06 PM · Research (FY2020-21-Research-July-September)
Akuckartz awarded T249654: Categorize different types of Wikidata re-use within Wikimedia projects a Like token.
Fri, Jul 31, 7:58 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T247099: SQL definition for wikidata metrics for tunning session.

For reference, I followed up here: T249654#6352573

Fri, Jul 31, 7:01 PM · Analytics-Kanban, Product-Analytics (Kanban), Patch-For-Review, Analytics
Isaac updated subscribers of T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

@Nuria: following up on T247099#6346344 here as this seems a more relevant task. I provide high-level details below regarding the nature of Wikidata transclusion on English Wikipedia. Here is a more thorough description of how I came to my conclusion regarding the importance of different types of Wikidata transclusion that occurs. @Addshore @GoranSMilovanovic @Lydia_Pintscher FYI in case you're interested as I know you're well aware of the limits of wbc_entity_usage for measuring Wikidata transclusion in articles. I'm very open to feedback so let me know if you see any mistaken assumptions etc.

Fri, Jul 31, 6:39 PM · Wikidata, Research (FY2020-21-Research-July-September)

Wed, Jul 29

Isaac added a comment to T247099: SQL definition for wikidata metrics for tunning session.

@Nuria thanks for the ping -- I finally have been making progress on this and am hoping to have some early statistics in about a week. FYI right now this has been focused on enwiki to start because the main challenge is that there isn't fine-grained data of the sort that we need for really understanding Wikidata usage. I'm aiming for this initial analysis to answer the following question: for the 62% of English Wikipedia articles that supposedly transclude Wikidata content (based on wbc_entity_usage), what is the breakdown of that transclusion into the following categories: populating a metadata template, populating external links, populating an infobox, tracking categories with no change to the page?

Wed, Jul 29, 6:47 PM · Analytics-Kanban, Product-Analytics (Kanban), Patch-For-Review, Analytics

Tue, Jul 28

Isaac updated the task description for T254289: Add wikidata to articletopic pipeline.
Tue, Jul 28, 2:11 PM · drafttopic-modeling, Scoring-platform-team (Current), Research
Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

it should be possible to test this explanation. We can make QuickSurveys use button tags rather than a tags, removing the ability to right-click + open in new tab. This should be a relatively simple change as OOUI provides consistent styling for both tags when used as buttons.

I'm certainly interested in whether this does explain the whole issue, but regardless this would be desirable if someone has the time and it fixes the right-click issue. I don't see any drawbacks to this approach, and improving logging for QuickSurveys is pretty important to it being useful.

Tue, Jul 28, 12:12 PM · Analytics-Radar, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics-EventLogging, QuickSurveys

Mon, Jul 27

Isaac updated the task description for T219903: Keep research.wikimedia.org landing page updated.
Mon, Jul 27, 5:57 PM · Research
Isaac added a comment to T255028: Move the stat1004-6-7 hosts to Debian Buster.

but if you want to make sure you can try to install them on stat1005/stat1008 that are already running debian 10 (just to double check that nothing explodes etc..)

Ahh good point -- done and no issues. Thanks!

Mon, Jul 27, 1:55 PM · Analytics-Clusters

Fri, Jul 24

Isaac added a comment to T255028: Move the stat1004-6-7 hosts to Debian Buster.

@elukey thanks for the heads up -- any expectation that any Python packages will be problematic to reinstall? The one that generally gives me the most headache btw is fasttext.

Fri, Jul 24, 4:59 PM · Analytics-Clusters
Isaac updated the task description for T258807: Entry page for Knowledge Gap Index.
Fri, Jul 24, 3:38 PM · Research
Isaac added a comment to T258807: Entry page for Knowledge Gap Index.

update:

Fri, Jul 24, 3:38 PM · Research
Isaac created T258807: Entry page for Knowledge Gap Index.
Fri, Jul 24, 3:37 PM · Research
Isaac moved T258804: Language-Agnostic Topic Modeling from Staged to In Progress on the Research board.
Fri, Jul 24, 3:35 PM · Research, Epic
Isaac moved T258805: Experimental API for exploring topic models from Staged to FY2020-21-Research-July-September on the Research board.
Fri, Jul 24, 3:35 PM · Research (FY2020-21-Research-July-September)
Isaac added a comment to T258805: Experimental API for exploring topic models.

Update:

  • Created standardized template for hosting models on Cloud VPS that handles all the setup via a simple script so pretty easily extendable to other models (already using for link-based and wikidata-based models).
  • Created UI for easily comparing models: https://wiki-topic.toolforge.org/comparison
    • You can input a language + article title to compare results for specific articles or just the language (but leave title blank) to have the UI choose a random article for you
  • Current model performance report card but I'd like to standardize this a bit more
  • Initial pass at comparing Wikidata and link-based models but need to expand this to include ORES and be more accessible
Fri, Jul 24, 3:32 PM · Research (FY2020-21-Research-July-September)
Isaac created T258805: Experimental API for exploring topic models.
Fri, Jul 24, 3:28 PM · Research (FY2020-21-Research-July-September)
Isaac created T258804: Language-Agnostic Topic Modeling.
Fri, Jul 24, 3:18 PM · Research, Epic
Isaac updated the task description for T230677: Share out results from demographics surveys.
Fri, Jul 24, 2:51 PM · Research

Wed, Jul 22

JAllemandou awarded T258514: Make Wikidata item_page_link table available publicly a Party Time token.
Wed, Jul 22, 9:47 AM · Wikidata, Analytics

Tue, Jul 21

Isaac created T258514: Make Wikidata item_page_link table available publicly.
Tue, Jul 21, 5:42 PM · Wikidata, Analytics

Mon, Jul 20

Isaac added a comment to T131288: Make labs proxies https only.

+1 to this. Discussed in IRC but having this handled by default would be hugely hugely appreciated as I definitely do not trust myself to get it right!

Mon, Jul 20, 7:39 PM · cloud-services-team (Kanban), Cloud-VPS
Isaac added a comment to T258101: actor_signature_per_project_family does not work for apps.

Thanks @Nuria! Yeah, no hurry on our end to fix either, but I know we're excited about this table in Research for its potential for speeding up a lot of the querying / session-building work we do and so I want to make sure it's eventually fixed or at least clearly documented somewhere so we don't unknowingly reach wrong conclusions when we work on multilingual reading behavior that includes the apps.

Mon, Jul 20, 5:40 PM · Analytics
Isaac added a comment to T257869: Identify approaches for defining article importance.

Weekly update: discussed two possible directions with this work with my colleagues at UMN that would provide some insight into the new article importance metrics:

  • Identify metrics that reasonably proxy the importance factor and measure how well existing approaches to importance capture these new factors -- e.g., if "political impact" is a new factor, then you might assert that one way to identify articles that would have a political impact is to find articles that are under WikiProject Politics and have page protections in place (assuming that page protections means that either the article is impactful and attracted vandalism or was deemed potentially impactful and so was protected in advance of vandalism). Then you could look at how well pageviews or inlinks capture these new importance factors.
  • Identify important measures of bias (taking care to define) such as gender bias and look at how the different article importance metrics would contribute to or reduce bias if used in recommender systems.
Mon, Jul 20, 4:00 PM · Research (FY2020-21-Research-July-September)
Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: began process of systematically identifying main ways in which Wikidata is transcluded in enwiki and determining how they affect the wbc_entity_usage table. Had been inspecting the table for various examples to identify patterns but I just realized that I could probably use a sandbox page to actually verify without being disruptive. Also coding each instance with these criteria.

Mon, Jul 20, 12:23 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T257870: Onboarding for Isaac around code / data for sockpuppet detection.

Weekly update: set up a meeting for next week to start the onboarding process.

Mon, Jul 20, 12:10 PM · Research (FY2020-21-Research-July-September)

Wed, Jul 15

Isaac added a comment to T258101: actor_signature_per_project_family does not work for apps.

Happily! I had done some analysis of these sorts of actor signatures a while back with app users to see how stable the signatures are (meta) and so had thought (erroneously) that accept_language wasn't stable on any device so was glad to find out that it's just the app where it switches.

Wed, Jul 15, 8:24 PM · Analytics
Isaac added a comment to T257843: Enable CI on research/landing-page repo.

The URL is seemingly to an old (or, probably more accurately out of date) version of the repo

Ooof...thanks for catching that. Easy to fix. I'll start going through some of the other package information to do some of the other cleaning too.

Wed, Jul 15, 8:16 PM · Patch-For-Review, Continuous-Integration-Config, Research
Isaac updated the task description for T219903: Keep research.wikimedia.org landing page updated.
Wed, Jul 15, 8:09 PM · Research
Isaac created T258101: actor_signature_per_project_family does not work for apps.
Wed, Jul 15, 7:33 PM · Analytics
Isaac added a comment to T257843: Enable CI on research/landing-page repo.

Ok, let's go ahead and enable it then to see!

Wed, Jul 15, 4:16 PM · Patch-For-Review, Continuous-Integration-Config, Research

Tue, Jul 14

Isaac updated subscribers of T155541: [Epic] Article importance prediction model.
Tue, Jul 14, 3:35 PM · Research, Scoring-platform-team, artificial-intelligence
Isaac added a comment to T257843: Enable CI on research/landing-page repo.

I note, after adding the CI stuff, some remedial work might be needed to get things to a better state before moving forward; whether fixing issues or changing the rules/setup used by the Gruntfile

Tue, Jul 14, 3:26 PM · Patch-For-Review, Continuous-Integration-Config, Research
Isaac added a comment to T257843: Enable CI on research/landing-page repo.

@Reedy correct me if I'm wrong -- in practice, this would not noticeably change anything about our process of pushing changes to the research page? It might fail if the node10-docker has issues with it, but that would be a larger problem and very unlikely something triggered by the research landing page (and therefore likely fixed somewhat quickly because it will affect every other code-base that uses the node10-docker)?

Tue, Jul 14, 3:10 PM · Patch-For-Review, Continuous-Integration-Config, Research

Mon, Jul 13

Isaac moved T257870: Onboarding for Isaac around code / data for sockpuppet detection from Staged to FY2020-21-Research-July-September on the Research board.
Mon, Jul 13, 7:43 PM · Research (FY2020-21-Research-July-September)
Isaac created T257870: Onboarding for Isaac around code / data for sockpuppet detection.
Mon, Jul 13, 7:42 PM · Research (FY2020-21-Research-July-September)
Isaac moved T257869: Identify approaches for defining article importance from Staged to FY2020-21-Research-July-September on the Research board.
Mon, Jul 13, 7:33 PM · Research (FY2020-21-Research-July-September)
Isaac created T257869: Identify approaches for defining article importance.
Mon, Jul 13, 7:33 PM · Research (FY2020-21-Research-July-September)
Isaac updated the task description for T155541: [Epic] Article importance prediction model.
Mon, Jul 13, 6:36 PM · Research, Scoring-platform-team, artificial-intelligence
Isaac claimed T155541: [Epic] Article importance prediction model.

I'm going to go ahead and claim this epic task as we're looking to begin work on article importance. I'm going to update the task description as well to make this a broader task for the work we're hoping to do around measuring article importance (as opposed to any specific question).

Mon, Jul 13, 5:12 PM · Research, Scoring-platform-team, artificial-intelligence

Fri, Jul 10

Isaac added a comment to T257480: Sample HTML Dumps - Request for feedback.

English Wiki has 15m articles (I believe)
a full enwiki dump is clocking in at 944gb or something insanely large

I'm pretty sure a large part of this issue is based on how you handle redirects really and not compression format. Enwiki has 9.3M redirects. Right now the HTML of an article is fully reproduced for a redirect (i.e. not just redirect to [[article]] but the full-text of that article that the reader would see). English Wikipedia has just over 6M articles in the classic sense, so reproducing the full article text in the redirects would probably be what explodes it to 15M full articles and a very large file (as opposed to 6M full articles and ~9M very tiny files that just indicate that they are redirects).

Fri, Jul 10, 4:10 PM · Analytics-Radar, Dumps-Generation
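
Editor's note: the page-count arithmetic in the comment above can be checked in a few lines. The 9.3M redirect count comes from the comment; "just over 6M" articles is rounded down to 6.0M here, which is an assumption. No file sizes are estimated because the comment gives none.

```python
ARTICLES = 6.0e6    # "just over 6M" articles, rounded down (assumption)
REDIRECTS = 9.3e6   # redirect count quoted in the comment


def dump_page_counts(articles, redirects):
    """Return total pages in the dump and the share of them that are redirects."""
    total = articles + redirects
    return total, redirects / total


total, redirect_share = dump_page_counts(ARTICLES, REDIRECTS)
print(f"{total / 1e6:.1f}M pages, {redirect_share:.0%} of them redirects")
# prints: 15.3M pages, 61% of them redirects
```

So roughly three in five pages in such a dump would be full-text copies served for redirects, consistent with the "15M articles" figure quoted above.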

Mon, Jul 6

Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

I should be clearer: what I meant is that sendBeacon will consistently fail if and only if the browser is ad-blocking. The failure is systematic in a way that both the Initiation and Responses events will not be sent, therefore adblock cannot explain the gap here.

Mon, Jul 6, 7:22 PM · Analytics-Radar, MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics-EventLogging, QuickSurveys

Jun 25 2020

Isaac added a comment to T242176: Launch experimental API for Wikidata-based topic model.

FYI I added a row in the documentation table for this. Feel free to improve

Sounds good, I'll take a look. A good reminder too that I need to update my existing API to use the template.

Jun 25 2020, 10:35 PM · Research (FY2019-20-Research-January-March)
Isaac moved T249654: Categorize different types of Wikidata re-use within Wikimedia projects from FY2019-20-Research-April-June to FY2020-21-Research-July-September on the Research board.
Jun 25 2020, 9:31 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac closed T250088: Draft Contributor Gaps Taxonomy, a subtask of T242172: Taxonomy of Knowledge Gaps, as Resolved.
Jun 25 2020, 9:30 PM · Research, Epic
Isaac closed T250088: Draft Contributor Gaps Taxonomy as Resolved.

Resolving this task -- iteration will continue (just added the 2020 Community Insights survey!) but full draft is complete.

Jun 25 2020, 9:30 PM · Research (FY2019-20-Research-April-June)

Jun 19 2020

Isaac closed T249856: Expand accessibility portion of Readership Gaps Taxonomy, a subtask of T242172: Taxonomy of Knowledge Gaps, as Resolved.
Jun 19 2020, 10:29 PM · Research, Epic
Isaac closed T249856: Expand accessibility portion of Readership Gaps Taxonomy as Resolved.
Jun 19 2020, 10:29 PM · Research (FY2019-20-Research-April-June)
Isaac updated the task description for T249856: Expand accessibility portion of Readership Gaps Taxonomy.
Jun 19 2020, 4:34 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

Gaps write-ups from Overleaf copied below. Still some iteration likely but at this stage, I would consider this task complete. @leila let me know if you concur.

Jun 19 2020, 4:34 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T250088: Draft Contributor Gaps Taxonomy.

Weekly update:

  • Full first draft completed and added to Overleaf! Will continue to iterate with the team on this for the rest of the quarter.
Jun 19 2020, 4:09 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: no progress. End date pushed to August 31st (Betterworks updated).

Jun 19 2020, 4:07 PM · Wikidata, Research (FY2020-21-Research-July-September)

Jun 15 2020

Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

tl;dr: I'll incorporate some of the below into the literature and metrics sections around accessibility + readership.

Jun 15 2020, 9:30 PM · Research (FY2019-20-Research-April-June)

Jun 12 2020

Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

Weekly update: added draft of accessibility section to Readers taxonomy on Overleaf.

Jun 12 2020, 10:39 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T250088: Draft Contributor Gaps Taxonomy.

Weekly update:

  • Added draft of sociodemographic gaps to taxonomy
  • Began going through surveys to identify trends -- e.g., median age of editors vs. country/world population
  • Updated missing Editor/Reader Survey categories to Meta to simplify the process of identifying these surveys in the future
Jun 12 2020, 6:34 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: no progress

Jun 12 2020, 6:32 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac closed T242162: Submit paper on reader demographics surveys for peer-review, a subtask of T230677: Share out results from demographics surveys, as Resolved.
Jun 12 2020, 2:50 PM · Research
Isaac closed T242162: Submit paper on reader demographics surveys for peer-review as Resolved.
Jun 12 2020, 2:50 PM · Research (FY2019-20-Research-April-June)

Jun 11 2020

Isaac added a comment to T242162: Submit paper on reader demographics surveys for peer-review.

@leila permission to close this? if we need to adjust course etc., that can be part of the parent task: T230677

Jun 11 2020, 8:22 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T242162: Submit paper on reader demographics surveys for peer-review.

Weekly update:

  • Paper submitted!
  • Will wait to hear initial response from NHB before choosing whether to upload submission to arxiv (if positive, then upload; if negative, then decision to upload depends on what we choose to do with the paper)
Jun 11 2020, 8:04 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T252775: Write Python util for converting Wikidata claims to features for ML models.

I wanted to preserve this info somewhere. We have discussed whether or not the Wikidata statements should be ordered by mwtext (see Examples section here). Here's my current thinking:

Jun 11 2020, 7:54 PM · Research, Scoring-platform-team

Jun 9 2020

Isaac added a comment to T195880: % of "none" referers seems too high.

Another data point that is interesting in this discussion: YouTube provides Wikipedia articles as fact-checks / context for a variety of conspiracy theories / state-sponsored broadcasting companies. For all of those Wikipedia article links, regardless of platform, they also provide a URL parameter that tells us that the person is coming from YouTube. This provides a rare opportunity to compare pageviews that have YouTube as a referrer with pageviews that we know came from YouTube. On top of that, I did some self-experimentation to see how the usage of different apps / browsers affects the YouTube referrer. Summary is that 40% of referrals from YouTube are None referrers and that this happens when the user starts in the YouTube app and switches to a mobile browser that is not Android+Chrome. This is not going to fully apply to every app as they each handle referrers differently, but it does provide support that app traffic often comes through as None referrers. Hard to know how big a piece of the pie this is though. The None traffic part is about 200 thousand per day for YouTube, and other apps presumably produce similar or higher traffic counts.

Jun 9 2020, 8:56 PM · Analytics-Radar, Readers-Web-Backlog (Needs Product Owner Decisions)
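
Editor's sketch of the comparison described above: among pageviews whose URL carries the YouTube marker, count how many arrived with no referrer. The comment does not name the URL parameter, so SOURCE_PARAM and SOURCE_VALUE are hypothetical placeholders, as are the record fields and the toy records, which are not real data.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical marker: the actual parameter name and value are not given in the comment.
SOURCE_PARAM = "source"
SOURCE_VALUE = "youtube"


def none_referrer_share(pageviews):
    """Share of URL-tagged pageviews that arrived with an empty or missing referrer."""
    tagged = [
        pv for pv in pageviews
        if SOURCE_VALUE in parse_qs(urlparse(pv["url"]).query).get(SOURCE_PARAM, [])
    ]
    if not tagged:
        return None
    missing = sum(1 for pv in tagged if not pv.get("referer"))
    return missing / len(tagged)


# Toy records for illustration only.
SAMPLE = [
    {"url": "https://en.wikipedia.org/wiki/A?source=youtube", "referer": "https://www.youtube.com/"},
    {"url": "https://en.wikipedia.org/wiki/A?source=youtube", "referer": None},
    {"url": "https://en.wikipedia.org/wiki/B?source=youtube", "referer": ""},
    {"url": "https://en.wikipedia.org/wiki/B?source=youtube", "referer": "https://m.youtube.com/"},
    {"url": "https://en.wikipedia.org/wiki/C?source=youtube", "referer": "https://www.youtube.com/"},
    {"url": "https://en.wikipedia.org/wiki/C", "referer": None},  # untagged, ignored
]

print(none_referrer_share(SAMPLE))
```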

Jun 8 2020

Isaac added a comment to T254322: Support new "embedded" survey type.

+1 to moving forward with page IDs and addressing the Special pages when the need actually arises. It's exciting to see this functionality be added and I will pass back the information to my team!

Jun 8 2020, 6:38 PM · Patch-For-Review, MW-1.35-notes (1.35.0-wmf.37; 2020-06-16), WMDE-QWERTY-Sprint-2020-06-10, WMDE-Technical-Wishes-Team, Readers-Web-Backlog (Needs Product Owner Decisions), QuickSurveys, WMDE-QWERTY-Sprint-2020-05-27, WMDE-Templates-FocusArea
Isaac added a comment to T254275: HTML Dumps - June/2020.

Having it in HDFS first would allow it to be more easily used by internal WMF researchers and analysts.

Speaking personally but from the Research team, I also +1 this many many times over. There is so much text-based machine learning and analytics that would be many times easier / faster if we could access HTML in HDFS (because then we can take advantage of the SWAP system). Some recent research also recreated the full parsed HTML revision history for English Wikipedia and noted for example that over half of internal article links are only evident from the parsed article and not the raw wikitext. A few current examples of modeling etc. that would benefit that I know of:

  • Parsed versions of Wikipedia articles have way more links / content in them, which can be valuable for ML models like topic classification or quality prediction
  • Studying how much and what content is transcluded (has implications for patrolling etc.): T249654
  • Measuring the consistency of content in different language versions of the same article: T243256
  • Studying citation quality / usage, especially if templates like en:Cite Q see expanded usage in the wikis
  • For link recommendation -- i.e. suggesting to a user that they should insert a wikilink into an article -- you might want to verify that the link does not already exist in the article, which would be best done against the parsed version of the article
Jun 8 2020, 1:27 PM · Analytics-Radar, Platform Engineering, Dumps-Generation
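
Editor's sketch of the link-recommendation check from the last bullet above: before suggesting a wikilink, look for it in the parsed HTML rather than the wikitext. It uses only the Python standard library and assumes internal links appear as "/wiki/Title" or Parsoid-style "./Title" hrefs. The class and function names are illustrative, and redirects are not resolved.

```python
from html.parser import HTMLParser
from urllib.parse import unquote


class WikiLinkCollector(HTMLParser):
    """Collects the titles of internal article links found in parsed HTML."""

    def __init__(self):
        super().__init__()
        self.titles = set()

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: internal links look like "/wiki/Title" or "./Title".
        for prefix in ("/wiki/", "./"):
            if href.startswith(prefix):
                # Drop any section anchor, then decode percent-escapes.
                title = unquote(href[len(prefix):].split("#")[0])
                self.titles.add(title.replace("_", " "))
                break


def already_linked(parsed_html, candidate_title):
    """True if the candidate title is already linked somewhere in the HTML."""
    collector = WikiLinkCollector()
    collector.feed(parsed_html)
    return candidate_title.replace("_", " ") in collector.titles


EXAMPLE_HTML = (
    '<p><a href="/wiki/Minneapolis">city</a> and '
    '<a href="./Saint_Paul#History">its twin</a></p>'
)
print(already_linked(EXAMPLE_HTML, "Saint Paul"))  # True
print(already_linked(EXAMPLE_HTML, "Duluth"))      # False
```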

Jun 5 2020

Isaac closed T241768: Pilot social media traffic reports for English Wikipedia as Resolved.
Jun 5 2020, 6:28 PM · Research (FY2019-20-Research-April-June), Privacy Engineering
Isaac closed T242176: Launch experimental API for Wikidata-based topic model as Resolved.

Template uploaded to Github: https://github.com/wikimedia/research-api-interface-template

Jun 5 2020, 6:22 PM · Research (FY2019-20-Research-January-March)
Isaac closed T242176: Launch experimental API for Wikidata-based topic model, a subtask of T230677: Share out results from demographics surveys, as Resolved.
Jun 5 2020, 6:22 PM · Research
Isaac updated the task description for T242176: Launch experimental API for Wikidata-based topic model.
Jun 5 2020, 6:21 PM · Research (FY2019-20-Research-January-March)
Isaac added a comment to T242162: Submit paper on reader demographics surveys for peer-review.

Weekly update:

  • Paper complete. Waiting for go-ahead from all to submit and accompanying letter.
Jun 5 2020, 6:20 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

Weekly update: no progress

Jun 5 2020, 6:09 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T250088: Draft Contributor Gaps Taxonomy.

Weekly update: no progress

Jun 5 2020, 6:09 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: no progress

Jun 5 2020, 6:08 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T241768: Pilot social media traffic reports for English Wikipedia.

Weekly update:

  • Turned off public report -- haven't heard anything via email / talk pages
Jun 5 2020, 6:07 PM · Research (FY2019-20-Research-April-June), Privacy Engineering

Jun 4 2020

Isaac added a comment to T254322: Support new "embedded" survey type.

only a statement from @Isaac that Research would prefer page ID.

I don't always stand by things I said a year ago, but in this case, yeah, I would still advocate strongly for using page ID as the preferred identifier for articles. Because QuickSurveys is language-specific, I see no value to using QID and it would add an additional place where things could go wrong (e.g., Wikidata item changes). Titles aren't stable enough as page moves would break the survey logic (pretty common in breaking news topics) and introduce all the standard issues with getting the right normalization, special characters, etc. I think the main challenge with page IDs was that Special pages do not have unique page IDs so could not be sampled under that approach.

Jun 4 2020, 1:37 PM · Patch-For-Review, MW-1.35-notes (1.35.0-wmf.37; 2020-06-16), WMDE-QWERTY-Sprint-2020-06-10, WMDE-Technical-Wishes-Team, Readers-Web-Backlog (Needs Product Owner Decisions), QuickSurveys, WMDE-QWERTY-Sprint-2020-05-27, WMDE-Templates-FocusArea

Jun 2 2020

Isaac created T254289: Add wikidata to articletopic pipeline.
Jun 2 2020, 9:46 PM · drafttopic-modeling, Scoring-platform-team (Current), Research
Isaac added a subtask for T245848: Productionize Wikidata-based Topic Model on ORES: T252775: Write Python util for converting Wikidata claims to features for ML models.
Jun 2 2020, 4:15 PM · Outreachy (Round 20), Outreach-Programs-Projects
Isaac added a parent task for T252775: Write Python util for converting Wikidata claims to features for ML models: T245848: Productionize Wikidata-based Topic Model on ORES.
Jun 2 2020, 4:15 PM · Research, Scoring-platform-team

Jun 1 2020

Isaac added a comment to T155541: [Epic] Article importance prediction model.

Thanks @Nettrom for adding me to this -- I should have known to look for a task like this before :)

Jun 1 2020, 5:12 PM · Research, Scoring-platform-team, artificial-intelligence

May 29 2020

Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: no progress

May 29 2020, 7:58 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

Weekly update: no progress

May 29 2020, 7:58 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T250088: Draft Contributor Gaps Taxonomy.

Weekly update:

  • Focused on starting description of methods for executive summary / final report.
May 29 2020, 7:57 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T242162: Submit paper on reader demographics surveys for peer-review.

Weekly update:

  • Progress on writing -- goal to submit early next week
May 29 2020, 7:57 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T241768: Pilot social media traffic reports for English Wikipedia.

Weekly update:

  • Monitored talk pages / email thread but no responses yet
  • Confirmed that data could continue to be collected beyond May 31st
  • We will shut down the public-facing report after May 31st, though, so it is clear that it is not being maintained (we will of course still be open to feedback after that point should we hear it)
May 29 2020, 7:53 PM · Research (FY2019-20-Research-April-June), Privacy Engineering

May 28 2020

RhinosF1 awarded T219903: Keep research.wikimedia.org landing page updated a Like token.
May 28 2020, 4:10 PM · Research
Isaac renamed T219903: Keep research.wikimedia.org landing page updated from Keep research.wikipedia.org landing page updated to Keep research.wikimedia.org landing page updated.
May 28 2020, 4:07 PM · Research
Isaac added a comment to T219903: Keep research.wikimedia.org landing page updated.

The task title uses .wikipedia.org, do you mean .wikimedia.org

Hah, yes, good catch @RhinosF1 !

May 28 2020, 4:07 PM · Research
Isaac updated the task description for T219903: Keep research.wikimedia.org landing page updated.
May 28 2020, 3:49 PM · Research

May 26 2020

Isaac added a comment to T251777: Creation of canonical pageview dumps for users to download.

Makes sense @fdans -- I think page IDs + API is being tracked at T159046, so I'll hold off on creating a new task.

May 26 2020, 5:26 PM · Analytics-Kanban, Patch-For-Review, Analytics

May 22 2020

Isaac added a comment to T249856: Expand accessibility portion of Readership Gaps Taxonomy.

Weekly update: no progress.

May 22 2020, 4:30 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T250088: Draft Contributor Gaps Taxonomy.

Weekly update:

  • Updated leaves of taxonomy with the relevant surveys
  • Began writing introduction to taxonomy
May 22 2020, 4:30 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T242162: Submit paper on reader demographics surveys for peer-review.

Weekly update:

  • Continued iteration on narrative / results with team
May 22 2020, 4:26 PM · Research (FY2019-20-Research-April-June)
Isaac added a comment to T249654: Categorize different types of Wikidata re-use within Wikimedia projects.

Weekly update: no progress.

May 22 2020, 4:25 PM · Wikidata, Research (FY2020-21-Research-July-September)
Isaac added a comment to T241768: Pilot social media traffic reports for English Wikipedia.

Weekly update:

  • Did rough analysis of first two months of report
  • Sent out emails to wiki-research-l + analytics-l about ending of pilot
  • In the process of confirming with Privacy that the pilot could run beyond May 30th without raising any additional privacy concerns.
May 22 2020, 4:24 PM · Research (FY2019-20-Research-April-June), Privacy Engineering

May 21 2020

Isaac assigned T252775: Write Python util for converting Wikidata claims to features for ML models to Dibyaaaaax.
May 21 2020, 7:51 PM · Research, Scoring-platform-team