
Isaac (Isaac Johnson)
User

User Details

User Since
Oct 1 2018, 2:19 PM (59 w, 2 d)
Availability
Available
IRC Nick
isaacj
LDAP User
Isaac Johnson
MediaWiki User
Isaac (WMF) [ Global Accounts ]

Recent Activity

Wed, Nov 13

Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Wed, Nov 13, 7:23 PM · Research
Isaac added a comment to T219903: Keep research.wikipedia.org landing page updated.

@leila I'm breaking this next update into a few parts. The first is taking care of the smaller things that don't require reorganization. Does this team page look like what you were expecting? I'll also remove WikiCite from the events page and add the eliciting new editors blogpost. Right now we don't actually have a good spot to put the Understanding Thanks blogpost and something about Doris' Outreachy project, so I'm going to continue to think on that.

Wed, Nov 13, 2:42 PM · Research

Mon, Nov 11

Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

Not sure if we can do it in Jupyterhub, but probably we'll be able to add something to the MOTD of the stat/notebook hosts, so when people ssh they'll get instructions about what to do for kerberos, where to find docs, etc. Nice suggestion, thanks!

Mon, Nov 11, 9:20 PM · User-Elukey, Analytics-Kanban, Analytics
Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.
  • Work on this project will come to a close on December 2nd (and the students will switch to writing)
  • TFIDF-based "attention" scores work almost as well as learned attention scores but are slower in training right now. Looking into the issue -- might be a function of the TFIDF scores not being normalized to 1 for a given article.
  • For any given language, the hope is to have model performance for the following experimental setups:
    • Trained purely on that language. Aligned fastText embeddings. Randomly initialized model weights.
    • Model trained on English with all weights frozen (no fine-tuning). Aligned fastText embeddings.
    • Model trained on English with final layer fine-tuned to new language. Aligned fastText embeddings.
    • Model trained on mixture of examples from different languages. Aligned fastText embeddings. No language-specific fine-tuning.
  • For the transfer learning problem (fine-tune a general model to identify a specific WikiProject), examined a more difficult negative sample (positive = Human rights; negative = Politics articles) and still found quite high F1 (>0.9). Going to look into one-shot learning techniques and embedding cosine-distance as an even simpler approach.
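The normalization issue raised in the TFIDF bullet can be illustrated with a minimal sketch. It assumes per-token TF-IDF scores are already available as a list of non-negative floats; the function name and the sample scores are hypothetical and are not taken from the project code.

```python
from typing import List


def normalize_weights(scores: List[float]) -> List[float]:
    """Rescale non-negative scores so they sum to 1, as learned attention weights do."""
    total = sum(scores)
    if total <= 0:
        # No informative tokens: fall back to uniform weights (empty input stays empty).
        return [1.0 / len(scores)] * len(scores) if scores else []
    return [s / total for s in scores]


# Hypothetical raw TF-IDF scores for a four-token article.
raw_scores = [0.5, 1.5, 0.0, 2.0]
weights = normalize_weights(raw_scores)
```

With weights on the same scale as softmax attention, the weighted sum of embeddings has a comparable magnitude across articles of different lengths, which may or may not explain the slower training.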
Mon, Nov 11, 9:16 PM · Research

Thu, Nov 7

Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

The option that is currently available is a keytab

Ok, that works for me. I'll avoid it but it's good to know it's an option if needed.

Thu, Nov 7, 6:09 PM · User-Elukey, Analytics-Kanban, Analytics
Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

@elukey I played around with it and didn't run into any major issues. Thanks for the detailed notes! My only two concerns:

  • It doesn't seem that there is a good way to provide a password automatically to kinit (e.g., from a protected text file) so that long-running scripts can automatically renew credentials periodically. This is not a blocker for me but it would be nice to have the ability -- do you have any suggestions?
  • Is the suggested workflow for running a SWAP notebook to open a terminal window in JupyterHub and kinit before starting a PySpark kernel? That works but I wasn't sure if there was a way to kinit from the notebook itself or another suggested approach.
Thu, Nov 7, 12:58 PM · User-Elukey, Analytics-Kanban, Analytics

Wed, Nov 6

Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Wed, Nov 6, 11:54 PM · Research
Isaac updated subscribers of T219903: Keep research.wikipedia.org landing page updated.

@DED could you point me towards a photo that you'd like up on the website (https://research.wikimedia.org/team.html) or indicate that you want the "camera shy" default image? Easy to change later, but I should get something up there :)

Wed, Nov 6, 9:39 PM · Research
Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

Thanks @Nuria !

if your survey is happening all in desktop

The survey happens on both desktop and mobile. Unfortunately we can't tell whether the survey responses missing EventLogging were from mobile or desktop :/ But adblocking exists on both platforms.

Wed, Nov 6, 7:30 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics, Analytics-EventLogging, QuickSurveys

Tue, Nov 5

Isaac added a comment to T236713: Improve drafttopic training data pipeline.

@Halfak I didn't want to mess with the work you had done and in some cases didn't know how to merge my suggestions, so I just added them to the etherpad after your section. For posterity, here's what I had come up with from my own work. In general I think they complement your suggestions, but we might need some meeting time as a group at some point to merge everything.

Tue, Nov 5, 10:06 PM · NewcomerTasks 1.1, Research
Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

@Jdlrobson I took a look at the survey responses we had and our ability to match them up with EventLogging. The high-level summary is that we're still seeing the problem where ~10% of survey responses have no corresponding EventLogging for either the QuickSurveysResponses or QuickSurveyInitiation schemas. It did seem to improve for English but I don't trust that because it got worse for Russian/Polish. I split the survey responses up between before and after the deployment (2019/10/24 13:06 UTC per above). For the three surveys that we were running, this is what we get:

Tue, Nov 5, 8:10 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics, Analytics-EventLogging, QuickSurveys

Thu, Oct 31

Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.

Progress:

  • Pushed back comparison of attention to tf-idf weights to this week.
  • Label-specific scores make sense: labels with low numbers of training samples are hard. Labels like History and Society are hard. The attention-based model does slightly better than baseline in most labels, but there is no topic where it clearly excels.
  • File provided that maps page titles to page IDs (taking into account redirects). This will allow for an inlinks/outlinks-based model. An inlinks-only model is at 0.55 micro F1 score with minimal tuning, so still a ways to go (ideally above 0.7 at least)
  • Very easy to fine-tune a model to a new label -- e.g., Human Rights. 95% accuracy on this binary task (replace final layer and freeze others) with 5000 positive examples and 5000 random negative examples. We're going to look into harder versions of this task that use negative samples that look more like human rights (e.g., History/Society or Politics/Government) and also how few positive examples can be used.
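For readers unfamiliar with the micro F1 score quoted for the inlinks-only model, here is a minimal sketch of how micro and macro F1 differ. The per-label counts are invented for illustration and are not project results.

```python
from typing import Dict, Tuple

# Hypothetical per-label counts: (true positives, false positives, false negatives).
counts: Dict[str, Tuple[int, int, int]] = {
    "Common label": (90, 10, 10),
    "Rare label": (1, 4, 5),
}


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; defined as 0 when there are no true positives."""
    return 2 * tp / (2 * tp + fp + fn) if tp > 0 else 0.0


def micro_f1(per_label: Dict[str, Tuple[int, int, int]]) -> float:
    """Pool the counts across labels first, then compute a single F1."""
    tp = sum(c[0] for c in per_label.values())
    fp = sum(c[1] for c in per_label.values())
    fn = sum(c[2] for c in per_label.values())
    return f1(tp, fp, fn)


def macro_f1(per_label: Dict[str, Tuple[int, int, int]]) -> float:
    """Compute F1 per label, then take the unweighted mean."""
    return sum(f1(*c) for c in per_label.values()) / len(per_label)


micro = micro_f1(counts)
macro = macro_f1(counts)
```

Because micro F1 pools counts, frequent labels dominate it; macro F1 gives every label equal weight, so labels with few training samples pull it down.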
Thu, Oct 31, 8:19 PM · Research

Wed, Oct 30

Isaac updated subscribers of T236713: Improve drafttopic training data pipeline.
Wed, Oct 30, 3:35 PM · NewcomerTasks 1.1, Research

Mon, Oct 28

Isaac added a comment to T236713: Improve drafttopic training data pipeline.

@Halfak Current output bzipped JSON is on stat1007 at /home/isaacj/drafttopic/full_wptemplates.json.bz2

Mon, Oct 28, 6:01 PM · NewcomerTasks 1.1, Research
Isaac created T236713: Improve drafttopic training data pipeline.
Mon, Oct 28, 5:54 PM · NewcomerTasks 1.1, Research

Fri, Oct 25

Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.

Progress:

  • LSTM + attention model working and the attention weights make some sense qualitatively (high proportions assigned to words that intuitively are linked to the predicted labels and most words being assigned ~0 probability). The question was raised whether this outperforms a simple non-learned tf-idf-based weighting, which will hopefully be verified next week.
  • Baseline model was trained with English embeddings + articles and then tested with Russian embeddings + articles. It failed miserably. The model achieved comparable performance though post-fine-tuning. This week we will explore if the fine-tuning is quick or if the embeddings provide little value in transfer to other languages.
  • Some data challenges were identified with the inlinks / outlinks. I talked through namespaces and we decided to start just with namespace 0 to avoid accidentally including categories/templates in the training data that link directly to WikiProjects (and therefore the labels). I'm working on providing a mapping between page IDs, titles, and Wikidata IDs (taking into account redirects) for English Wikipedia so we can make sure that all the data is of the same type.
  • I'm building a dataset that we can use for transfer learning exploration -- e.g., train a model to predict generic labels and then fine-tune for a specific WikiProject.
Fri, Oct 25, 5:50 PM · Research
Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

As I go to do this analysis, what UTC day/hour should be my cut-off for when QuickSurveys would have switched from the old approach to the new approach?

Fri, Oct 25, 5:42 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics, Analytics-EventLogging, QuickSurveys
Isaac added a comment to T235324: Answer key research questions to inform KaiOS app development.

Yes, putting out country-specific data is still on my TODO list. Just let me know what would be helpful. Regarding KaiOS specifically, we are unfortunately past the 90-day window from the June surveys so I don't have any of the user-agent information that would let us split survey responses by OS. We are working to get out a Hindi Wikipedia survey though, which might provide further insight.

Fri, Oct 25, 5:41 PM · Epic, Inuka-Team, Product-Analytics

Thu, Oct 24

Isaac added a comment to T234853: Performance survey died on ruwiki on Sep 26.

Thanks, will do -- I appreciate you being part of my learning curve!

Thu, Oct 24, 2:25 PM · Performance-Team
Isaac added a comment to T234853: Performance survey died on ruwiki on Sep 26.

Ahhh apologies @Gilles -- I did not think anything of those surveys because the coverage was set at 0. It looks like you have made them more explicit to avoid someone like me making this error in the future, but is there anything else I should be aware of when deploying surveys in the future? There is the hope that we will deploy a few in Hindi (hi), Bahasa Indonesia (id), and Portuguese (pt) in the next month or so.

Thu, Oct 24, 1:29 PM · Performance-Team

Wed, Oct 23

Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

Additionally, the Performance team is running their Perceived Performance survey right now.

Thanks for pointing this out -- if I'm not mistaken, that's an internal survey so I'll still extend my external survey until next week. The missing data is only evident in external surveys where we somehow have people responding to the survey via Google Forms with reasonable-looking survey codes and responses but no associated initiation/response eventlogging. Presumably the loss of data happens in internal surveys as well, but we have no second source of data that indicates that we're missing responses.

Wed, Oct 23, 4:23 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics, Analytics-EventLogging, QuickSurveys
Isaac updated the task description for T218917: Improve Research presence on the web.
Wed, Oct 23, 3:23 PM · Research

Tue, Oct 22

Isaac added a comment to T220627: QuickSurveys EventLogging missing ~10% of interactions.

Do we think those were the reason for the missing events? What are the next steps? Confirming via data?

Tue, Oct 22, 7:07 PM · MW-1.35-notes (1.35.0-wmf.3; 2019-10-22), Patch-For-Review, Readers-Web-Backlog (Kanbanana-2019-20-Q2), Analytics, Analytics-EventLogging, QuickSurveys

Oct 17 2019

Isaac created T235784: Identify data / questions that we can(not) answer regarding external reuse.
Oct 17 2019, 4:23 PM · Research
Isaac created T235781: Taxonomy of re-use and current knowledge of the effect on traffic to Wikimedia.
Oct 17 2019, 4:02 PM · Research
Isaac created T235780: Literature review of external reuse of Wikimedia content.
Oct 17 2019, 3:59 PM · Research
Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

Yep, now able to access -- thanks! I'll do my best to test today and report back with an lgtm if no issues arise.

Oct 17 2019, 3:55 PM · User-Elukey, Analytics-Kanban, Analytics

Oct 16 2019

Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

@elukey I'm having trouble ssh-ing into an-tool1006.eqiad.wmnet (ssh isaacj@an-tool1006.eqiad.wmnet) where it is not letting me on the server (doesn't accept my password) -- is it possible that I need to be added as a user to the machine or is it an issue on my end? thanks!

Oct 16 2019, 11:15 PM · User-Elukey, Analytics-Kanban, Analytics

Oct 14 2019

Isaac updated subscribers of T235443: Report on State of Wikimedia Research of Knowledge Integrity.

@diego I didn't write the disinformation lit review into the initial submission but I'll make sure to work with you to get together a slide on it with some takeaways and links for people to follow up on. It might be possible to work some of your findings into the slides on Jonathan's patrolling research too.

Oct 14 2019, 4:48 PM · Research
Isaac created T235443: Report on State of Wikimedia Research of Knowledge Integrity.
Oct 14 2019, 4:46 PM · Research
Isaac added a comment to T212258: Create test Kerberos identities/accounts for some selected users in hadoop test cluster.

@elukey : also happy to help. thanks for reaching out!

Oct 14 2019, 4:34 PM · User-Elukey, Analytics-Kanban, Analytics
Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.

Progress report:

  • Code in place to preprocess/tokenize wikitext and perform embedding lookups
  • Initial pass made at reimplementing the drafttopic model (average of an article's embeddings + simple classifier overtop). This ran into a few issues:
    • NaNs being returned during training -- turned out that some of the articles were missing labels/text and this was causing issues at the averaging step because of a divide by zero error
    • Micro stats look pretty good but macro precision/recall/f1 is quite low: not quite clear yet what is to blame here. It could just be that the model needs more fine-tuning of hyperparameters and regularization, but some label balancing might be necessary.
    • Slow training: might not be necessary to process the entire article
  • Basic code written for an LSTM model: slow training and some mismatch between loss and precision/recall results so we will make sure that everyone is using the same code for computing model statistics
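The NaN issue described above (averaging over an article with no tokens) can be guarded against explicitly. This is a minimal sketch in plain Python rather than the project's PyTorch code; the function name and the toy articles are hypothetical.

```python
from typing import Dict, List, Optional


def mean_embedding(token_vectors: List[List[float]]) -> Optional[List[float]]:
    """Average token vectors; return None for an empty article instead of dividing by zero."""
    if not token_vectors:
        return None
    dim = len(token_vectors[0])
    count = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / count for i in range(dim)]


# Toy input: one article with two 2-dimensional token vectors, one with no text.
articles: Dict[str, List[List[float]]] = {
    "has_text": [[1.0, 2.0], [3.0, 4.0]],
    "empty": [],
}

# Keep only articles that produced a usable embedding.
embedded: Dict[str, List[float]] = {}
for name, vectors in articles.items():
    embedding = mean_embedding(vectors)
    if embedding is not None:
        embedded[name] = embedding
```

Filtering such articles out before training is one option; another is to map them to a zero vector, at the cost of feeding the classifier uninformative examples.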
Oct 14 2019, 4:27 PM · Research

Oct 11 2019

Isaac added a comment to T234188: Taxonomy of new user reading patterns.

Adding in a few links for reference:
Definitely want to tie this to two projects from the Growth Team: Understanding First Day and Welcome Survey. Specifically, it's worth checking out Morten's reports for these two projects:

Oct 11 2019, 5:53 PM · Analytics, Research
Isaac added a comment to T215775: Check home leftovers of ISI researchers.

Done! Thanks for the reminder @elukey !

Oct 11 2019, 3:32 PM · Research, Analytics

Oct 9 2019

Isaac closed T234099: Make recommendations regarding understanding the gender balance of our editor population, a subtask of T201707: Output 3.3: Baseline statistics on contributor diversity, as Resolved.
Oct 9 2019, 4:16 PM · Research, address-knowledge-gaps, Epic
Isaac closed T234099: Make recommendations regarding understanding the gender balance of our editor population as Resolved.
Oct 9 2019, 4:16 PM · Research

Oct 7 2019

Isaac closed T228319: Determine important article features with respect to readership, a subtask of T228285: Analyze Demographics Surveys, as Resolved.
Oct 7 2019, 8:34 PM · Research
Isaac closed T228319: Determine important article features with respect to readership as Resolved.

I wrote up a more complete report on topic modeling (for assigning topics to page views) and some of my recommendations / takeaways: https://meta.wikimedia.org/wiki/Research:Characterizing_Wikipedia_Reader_Behaviour/Demographics_and_Wikipedia_use_cases/Topic_Analysis

Oct 7 2019, 8:34 PM · Research

Oct 4 2019

Isaac closed T228279: Process Demographics Surveys, a subtask of T203042: Output 2.2: Characterizing readership by demographics, as Resolved.
Oct 4 2019, 9:35 PM · Research, address-knowledge-gaps, Epic
Isaac closed T228279: Process Demographics Surveys as Resolved.

Processing complete -- see individual subtasks for details. Focus on writing / sharing results now.

Oct 4 2019, 9:35 PM · Research
Isaac closed T228285: Analyze Demographics Surveys, a subtask of T228279: Process Demographics Surveys, as Resolved.
Oct 4 2019, 9:34 PM · Research
Isaac closed T228285: Analyze Demographics Surveys as Resolved.
Oct 4 2019, 9:34 PM · Research
Isaac updated the task description for T228285: Analyze Demographics Surveys.
Oct 4 2019, 9:34 PM · Research
Isaac added a comment to T228285: Analyze Demographics Surveys.

Analysis complete -- at this point the focus will shift to writing up and sharing the results (which has already been ongoing for several weeks).

Oct 4 2019, 9:33 PM · Research
Isaac added a comment to T234099: Make recommendations regarding understanding the gender balance of our editor population.

Draft write-up here: https://meta.wikimedia.org/wiki/Research:Surveys_on_the_gender_of_editors/Report

Oct 4 2019, 8:38 PM · Research
Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.

Some decisions:

  • The code for this project will be in PyTorch. The students have the greatest familiarity with this library and a number of popular packages are implemented in it.
Oct 4 2019, 6:01 PM · Research

Oct 2 2019

Isaac updated subscribers of T234473: Requesting access to analytics cluster for Djellel Difallah.
Oct 2 2019, 7:53 PM · Research, SRE-Access-Requests, Operations

Sep 30 2019

Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Sep 30 2019, 5:31 PM · Research
Isaac added a comment to T233646: Article Topic NYU Fall 2019 Capstone Project.

I proposed four separate tasks for each student. While all four tasks are focused on labeling a given Wikipedia article with appropriate topics, the tasks address different aspects of this problem:

  • Multilingual word embeddings: take the existing drafttopic model and explore how to use multilingual word embeddings (e.g., fastText) to expand the model to languages beyond English.
  • Alternative language models: Focus on English and how more advanced language models -- e.g., recurrent neural networks, bidirectional models, or models with attention -- might improve labeling performance.
  • Leveraging domain knowledge: How can we use our knowledge of Wikipedia to improve model performance while making the model more efficient -- e.g., using just links on a page or leveraging Wikidata?
  • Transfer learning: How can we efficiently tune a general topic model to label articles with a new set of labels?
Sep 30 2019, 4:32 PM · Research
Isaac moved T228319: Determine important article features with respect to readership from Staged to In Progress on the Research board.
Sep 30 2019, 4:24 PM · Research
Isaac moved T232525: Repeat demographics surveys for longer time period from Staged to In Progress on the Research board.
Sep 30 2019, 4:24 PM · Research
Isaac moved T233646: Article Topic NYU Fall 2019 Capstone Project from Staged to In Progress on the Research board.
Sep 30 2019, 4:24 PM · Research
Isaac moved T234099: Make recommendations regarding understanding the gender balance of our editor population from Staged to In Progress on the Research board.
Sep 30 2019, 4:24 PM · Research

Sep 27 2019

Isaac created T234099: Make recommendations regarding understanding the gender balance of our editor population.
Sep 27 2019, 8:14 PM · Research

Sep 26 2019

Isaac added a comment to T232525: Repeat demographics surveys for longer time period.

Had no responses to the Village Pump posts or on the meta page so deployed the surveys this morning. The intent is for one month but we will monitor response count / feedback and adjust as needed.

Sep 26 2019, 11:28 AM · Research
Isaac updated the task description for T232525: Repeat demographics surveys for longer time period.
Sep 26 2019, 11:27 AM · Research

Sep 23 2019

Isaac added a comment to T219903: Keep research.wikipedia.org landing page updated.

@MGerlach -- sounds good. It might be a useful introduction to Gerrit as well, so next meeting I will walk you through getting set up to make this change.

Sep 23 2019, 6:25 PM · Research
Isaac created T233646: Article Topic NYU Fall 2019 Capstone Project.
Sep 23 2019, 6:23 PM · Research
Isaac updated the task description for T232525: Repeat demographics surveys for longer time period.
Sep 23 2019, 3:04 PM · Research
Isaac added a comment to T232525: Repeat demographics surveys for longer time period.
"plwiki" => [
    "enabled" => true,
    "name" => "reader-demographics-pl",
    "type" => "external",
    "description" => "Reader-demographics-1-description",
    "link" => "Reader-demographics-1-link",
    "question" => "Reader-demographics-1-message",
    "privacyPolicy" => "Reader-demographics-1-privacy",
    "coverage" => 0.005, // 1 out of 200
    "instanceTokenParameterName" => "entry.1791119923",
    "platforms" => [
        "desktop" => ["stable"],
        "mobile" => ["stable"]
    ],
]
Sep 23 2019, 2:19 PM · Research

Sep 20 2019

Isaac added a comment to T232525: Repeat demographics surveys for longer time period.
"ruwiki" => [
    "enabled" => true,
    "name" => "reader-demographics-ru",
    "type" => "external",
    "description" => "Reader-segmentation-1-description",
    "link" => "Reader-demographics-2-link",
    "question" => "Reader-demographics-1-message",
    "privacyPolicy" => "Reader-demographics-1-privacy",
    "coverage" => 0.00167, // 1 out of 600
    "instanceTokenParameterName" => "entry.1791119923",
    "platforms" => [
        "desktop" => ["stable"],
        "mobile" => ["stable"]
    ],
]
Sep 20 2019, 9:08 PM · Research

Sep 19 2019

Isaac added a comment to T232525: Repeat demographics surveys for longer time period.
"enwiki" => [
    "enabled" => true,
    "name" => "reader-demographics-en",
    "type" => "external",
    "description" => "Reader-demographics-1-description",
    "link" => "Reader-demographics-2-link",
    "question" => "Reader-demographics-1-message",
    "privacyPolicy" => "Reader-demographics-1-privacy",
    "coverage" => 0.000833, // 1 out of 1200
    "instanceTokenParameterName" => "entry.1791119923",
    "platforms" => [
        "desktop" => ["stable"],
        "mobile" => ["stable"]
    ],
]
Sep 19 2019, 6:53 PM · Research
Isaac updated the task description for T232525: Repeat demographics surveys for longer time period.
Sep 19 2019, 3:27 PM · Research

Sep 13 2019

Isaac added a comment to T224459: Recommend the best format to release public data lake as a dump.

this looks great -- thanks @mforns for writing this up so clearly!

Sep 13 2019, 9:02 PM · Research, Analytics

Sep 12 2019

Isaac updated the task description for T230677: Share out results from demographics surveys.
Sep 12 2019, 3:51 PM · Research
Isaac added a comment to T228319: Determine important article features with respect to readership.

I'm currently exploring how to expand the ORES drafttopic model to languages besides English. The approach I have taken is to build a model that predicts the ~40 categories used by the ORES drafttopic model.

Sep 12 2019, 1:44 PM · Research

Sep 11 2019

Isaac added a comment to T232525: Repeat demographics surveys for longer time period.

Regarding the sampling rate for Polish Wikipedia, their page views are generally about one third of those to Russian Wikipedia (https://tools.wmflabs.org/siteviews/?platform=all-access&source=pageviews&agent=user&range=this-year&sites=ru.wikipedia.org|pl.wikipedia.org) and unique devices are also a bit under one third (https://stats.wikimedia.org/v2/#/pl.wikipedia.org/reading/unique-devices/normal|line|2-year|~total|monthly). Based on this, I will recommend setting the sampling rate to three times that of Russian Wikipedia.
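The arithmetic behind this recommendation can be sketched as follows, assuming the aim is a roughly equal expected number of survey impressions per wiki. The 1-in-600 Russian rate is the one shown in the ruwiki config on this task; the one-third traffic ratio is the approximation quoted above, and the function name is hypothetical.

```python
from fractions import Fraction

RU_COVERAGE = Fraction(1, 600)      # ruwiki sampling rate (1 out of 600)
PL_TO_RU_TRAFFIC = Fraction(1, 3)   # plwiki traffic relative to ruwiki (approximate)


def scaled_coverage(reference_coverage: Fraction, traffic_ratio: Fraction) -> Fraction:
    """Scale a sampling rate inversely with traffic to keep expected impressions equal."""
    return reference_coverage / traffic_ratio


pl_coverage = scaled_coverage(RU_COVERAGE, PL_TO_RU_TRAFFIC)
```

Under these assumptions the Polish rate comes out at 1 in 200, i.e. a coverage of 0.005.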

Sep 11 2019, 7:20 PM · Research
Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Sep 11 2019, 6:23 PM · Research

Sep 10 2019

Isaac updated the task description for T228319: Determine important article features with respect to readership.
Sep 10 2019, 8:09 PM · Research
Isaac created T232525: Repeat demographics surveys for longer time period.
Sep 10 2019, 7:57 PM · Research

Sep 9 2019

Isaac updated the task description for T230677: Share out results from demographics surveys.
Sep 9 2019, 2:48 PM · Research

Sep 4 2019

Isaac added a comment to T223765: Wiki Content Translation Tool Research Project.

Thanks @Pginer-WMF ! @Doriszhou1224 I'll let you have the satisfaction of closing this :)

Sep 4 2019, 5:09 PM · Research, Outreachy (Round 18)

Sep 3 2019

Isaac added a comment to T228091: Community relations support for editor gender surveys.

Thanks for the ping @Elitre -- I'm comfortable with wrapping up this task if @Trizek-WMF is. There shouldn't be any further community relations support until we start sharing out results, but we can open a new task then if it is deemed necessary.

Sep 3 2019, 6:33 PM · CommRel-Specialists-Support (Jul-Sep-2019), Readers-Community-Engagement, Research

Aug 26 2019

Isaac added a comment to T228319: Determine important article features with respect to readership.

My first attempt at building a language-independent means of representing article topic -- i.e. grouping article page views into categories regardless of which Wikipedia language edition that article was read in -- was to map each page view to its Wikidata item and then represent the item based on its instance-of / subclass-of properties. An example of how this might work is below [1]. The goal is to build a set of higher-level categories to which any Wikidata item with an instance-of property can be mapped. This is similar to the 14 categories in the Wikidata Concepts Monitor but ideally with no overlap and full coverage -- i.e. all items map deterministically to a single category.
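One way such a deterministic mapping could work is a breadth-first walk up the class hierarchy until a designated top-level class is reached. The sketch below uses small hand-written dictionaries as stand-ins for Wikidata's instance-of and subclass-of statements; the items, classes, and category names are illustrative only and do not reflect the actual taxonomy used.

```python
from collections import deque
from typing import Dict, List, Optional

# Toy stand-ins for Wikidata instance-of / subclass-of edges (hypothetical data).
INSTANCE_OF: Dict[str, List[str]] = {
    "Douglas Adams": ["human"],
    "Paris": ["capital city"],
}
SUBCLASS_OF: Dict[str, List[str]] = {
    "capital city": ["city"],
    "city": ["human settlement"],
    "human settlement": ["geographic location"],
    "human": ["person"],
}
# Classes that terminate the walk, mapped to the high-level category they stand for.
TOP_LEVEL: Dict[str, str] = {"person": "People", "geographic location": "Geography"}


def top_level_category(item: str) -> Optional[str]:
    """Return the first top-level category reached from the item's instance-of classes."""
    queue = deque(sorted(INSTANCE_OF.get(item, [])))
    seen = set(queue)
    while queue:
        cls = queue.popleft()
        if cls in TOP_LEVEL:
            return TOP_LEVEL[cls]
        # Sort parents so ties are always broken the same way.
        for parent in sorted(SUBCLASS_OF.get(cls, [])):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return None  # No instance-of statement, or no path to a top-level class.


paris_category = top_level_category("Paris")
```

Sorting at each step makes the result repeatable, but an item with several instance-of values can still reach different categories by different paths, so the choice of top-level classes determines how close this gets to "no overlap and full coverage".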

Aug 26 2019, 5:00 PM · Research
Isaac added a comment to T228285: Analyze Demographics Surveys.

I presented preliminary results at Wikimania: https://wikimania.wikimedia.org/wiki/2019:Research/Characterizing_Reader_Behavior_on_Wikipedia

Aug 26 2019, 4:36 PM · Research
Isaac updated the task description for T228319: Determine important article features with respect to readership.
Aug 26 2019, 3:11 PM · Research

Aug 23 2019

Isaac closed T212448: Prepare demographics survey data for analysis as Resolved.

Debiasing complete.

  • See reader behavior features under T228285 for features that were used in debiasing.
  • It was determined that a GradientBoostingClassifier performed best with respect to making the average features (e.g., average pages viewed per session) for the survey respondents match the general population for the wiki, though LogisticRegression also worked quite well in many cases.
  • Wikidata instance-of ended up being relatively uninformative so I might revisit that with drafttopic categories.
  • As part of this work, a few changes were required:
    • African and Worldwide surveys (English/French) were separated because I realized that weights from debiasing would not be comparable if they came from two separate models (if a single model was used for English or French, country was a very strong predictor of whether someone took the survey or not)
    • I trimmed the control sessions to exactly match the survey session timespan because the survey was launched / ended mid-day, which meant that, without careful control, day of week became a strong predictor of whether someone took the survey or not.
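The notes above do not spell out how classifier output was turned into weights. A common scheme, shown here purely as an assumption, is inverse propensity weighting: each respondent is weighted by the odds of not having taken the survey, given their features. The propensities and feature values below are invented for illustration.

```python
from typing import List


def inverse_propensity_weights(propensities: List[float]) -> List[float]:
    """Weight each respondent by (1 - p) / p, normalized to mean 1.

    Each p is a predicted probability of taking the survey and must lie in (0, 1).
    """
    raw = [(1.0 - p) / p for p in propensities]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]


def weighted_mean(values: List[float], weights: List[float]) -> float:
    """Weighted average of a feature across respondents."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: survey propensity and pages viewed per session.
propensities = [0.8, 0.5, 0.2]
pages_per_session = [10.0, 4.0, 2.0]

weights = inverse_propensity_weights(propensities)
raw_average = sum(pages_per_session) / len(pages_per_session)
debiased_average = weighted_mean(pages_per_session, weights)
```

In this toy data, heavy readers are over-represented among respondents, so the weighted average falls below the raw average. Comparing such weighted averages against the control sessions is one way to judge which classifier debiases best.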
Aug 23 2019, 10:21 PM · Research
Isaac closed T212448: Prepare demographics survey data for analysis, a subtask of T228279: Process Demographics Surveys, as Resolved.
Aug 23 2019, 10:21 PM · Research
Isaac updated the task description for T228285: Analyze Demographics Surveys.
Aug 23 2019, 10:10 PM · Research

Aug 22 2019

Isaac committed rRLP335dc47766ff: Add ICWSM award, ethical AI blogpost, update project publication sections (authored by Isaac).
Aug 22 2019, 11:32 PM
Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Aug 22 2019, 10:15 PM · Research

Aug 18 2019

Isaac moved T228279: Process Demographics Surveys from Staged to In Progress on the Research board.
Aug 18 2019, 10:30 AM · Research
Isaac moved T228285: Analyze Demographics Surveys from Staged to In Progress on the Research board.
Aug 18 2019, 10:30 AM · Research
Isaac moved T230677: Share out results from demographics surveys from Staged to In Progress on the Research board.
Aug 18 2019, 10:30 AM · Research
Isaac removed a project from T230677: Share out results from demographics surveys: DONOTUSE-address-knowledge-gaps.
Aug 18 2019, 10:30 AM · Research
Isaac created T230677: Share out results from demographics surveys.
Aug 18 2019, 10:29 AM · Research
Isaac closed T230675: Prototype language-neutral version of drafttopic model as Resolved.
Aug 18 2019, 10:00 AM · Research, Wikimania-Hackathon-2019
Isaac updated the task description for T230675: Prototype language-neutral version of drafttopic model.
Aug 18 2019, 9:00 AM · Research, Wikimania-Hackathon-2019
Isaac created T230675: Prototype language-neutral version of drafttopic model.
Aug 18 2019, 8:56 AM · Research, Wikimania-Hackathon-2019

Aug 13 2019

Isaac updated the task description for T230249: Wikimania Hackathon Volunteering: Opening / Closing Documentation.
Aug 13 2019, 3:50 PM · International-Developer-Events, Wikimania-Hackathon-2019-Organization, Wikimania-Hackathon-2019
Isaac updated the task description for T227793: First round editor gender surveys.
Aug 13 2019, 1:13 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research
Isaac closed T227793: First round editor gender surveys, a subtask of T201707: Output 3.3: Baseline statistics on contributor diversity, as Resolved.
Aug 13 2019, 1:11 PM · Research, address-knowledge-gaps, Epic
Isaac closed T227793: First round editor gender surveys as Resolved.

Surveys completed -- thanks @pmiazga (and any respondents)!

Aug 13 2019, 1:11 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research
Isaac updated the task description for T219903: Keep research.wikipedia.org landing page updated.
Aug 13 2019, 7:12 AM · Research
Isaac added a comment to T228091: Community relations support for editor gender surveys.

Yes, I've been compiling the feedback and will make sure to include some recommendations regarding the functionality of QuickSurveys for this type of survey. In particular:

  • Ability to remove survey without responding explicitly. This should be a relatively straightforward UI fix but would require someone to do the work obviously.
  • Ability to opt out completely from surveys like this. This would likely be more complicated as it would have to be a change to the user settings, databases, etc. and we would have to determine if it only applied to QuickSurveys or other extensions as well.
  • Confusion around survey re-appearing in new browsers (or the same browser if cookies are deleted). This is how the sampling works but this is not at all evident to respondents, so there can be confusion when someone responds and then sees the survey again. I don't believe we can change the sampling strategy to be aware of whether a user account has already taken the survey in a different browser without compromising privacy, but we should consider having a better explanation of this within the survey to head off concerns for any future deployments.
  • General frustration/anger with the presence of the survey (in some part tied to the difficulty of opting out from it).
Aug 13 2019, 7:11 AM · CommRel-Specialists-Support (Jul-Sep-2019), Readers-Community-Engagement, Research

Aug 9 2019

Isaac added a comment to T227793: First round editor gender surveys.

Sufficient responses have been reached -- the plan is to undeploy these surveys in the first available SWAT deployment by Tuesday (Aug 13).

Aug 9 2019, 2:50 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research

Aug 8 2019

Isaac added a comment to T227793: First round editor gender surveys.

Thanks for the notification @Jc86035

Aug 8 2019, 3:32 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research

Aug 7 2019

Isaac added a comment to T228091: Community relations support for editor gender surveys.

Sounds good -- FYI I'm responding to some comments on the meta page as I see them: https://meta.wikimedia.org/wiki/Research_talk:Surveys_on_the_gender_of_editors

Aug 7 2019, 5:00 PM · CommRel-Specialists-Support (Jul-Sep-2019), Readers-Community-Engagement, Research

Aug 6 2019

Isaac updated the task description for T227793: First round editor gender surveys.
Aug 6 2019, 1:11 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research
Isaac added a comment to T227793: First round editor gender surveys.

Per IRC conversation, leaving this open until we un-deploy the surveys and then I will sign off. Thanks!

Aug 6 2019, 1:07 PM · Readers-Web-Backlog (Tracking), Wikimedia-Site-requests, Research