
Groceryheist (Nathan TeBlunthuis)
Analysis

Projects

User does not belong to any projects.

User Details

User Since
Sep 19 2018, 12:07 AM (301 w, 3 d)
Availability
Available
IRC Nick
groceryheist
LDAP User
Groceryheist
MediaWiki User
Groceryheist [ Global Accounts ]

I'm Nate!

I'm a PhD student at the University of Washington. I'm consulting on some data analysis and research projects at WMF this year.

I belong to the Community Data Science Collective, at the Communication Department at UW and the Department of Communication Studies at Northwestern University. I am training to be a computational social scientist of organizational communication with a focus on online collaboration.

Check out my paper “Revisiting ‘The Rise and Decline’ in a population of Peer Production Projects”. For this project, I set out to replicate some of the key findings from “The Rise and Decline of an Open Collaboration System” by Aaron Halfaker, Stuart Geiger, Johnathan Morgan, and John Riedl. They argued that the decline in the number of active Wikipedia editors could be attributed to the rise of quality control systems that made it difficult for newcomers to join the community. I wanted to know if such systems create barriers for newcomers in peer production projects other than Wikipedia. I adapted Halfaker et al.’s methodological approach to analyze a set of 700 Wikia wikis. It turns out that typical wikis not only have mechanisms for decline similar to Wikipedia's, but also exhibit ‘rise and decline’ patterns.

Recent Activity

May 26 2021

Groceryheist closed T224902: Fit models for revert prediction as Declined.

This project ended up going a different direction.

May 26 2021, 5:11 PM · editquality-modeling, ORES, artificial-intelligence
Groceryheist closed T224902: Fit models for revert prediction, a subtask of T224901: ORES bias analysis, as Declined.
May 26 2021, 5:11 PM · editquality-modeling, ORES, Epic, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Groceryheist closed T200898: Analyze the effects of ORES deployments on counter-vandalism behavior as Resolved.
May 26 2021, 5:10 PM · ORES, research-ideas
Groceryheist closed T221870: Why are there three Q-marks (???) in threshholds in Special:ORESModels? as Resolved.
May 26 2021, 5:10 PM · Growth-Team-Filtering, Growth-Team, Machine-Learning-Team, ORES, MediaWiki-extensions-ORES
Groceryheist changed the status of T221870: Why are there three Q-marks (???) in threshholds in Special:ORESModels? from Open to Stalled.
May 26 2021, 5:10 PM · Growth-Team-Filtering, Growth-Team, Machine-Learning-Team, ORES, MediaWiki-extensions-ORES
Groceryheist changed the status of T221890: Add wikidata ids to data lake tables from Open to Stalled.

I'm not available to work on this. @JAllemandou's data served my purpose, but it seems like there was some interest in maintaining a table like this.

May 26 2021, 5:10 PM · Epic, Analytics, Product-Analytics
Groceryheist changed the status of T221890: Add wikidata ids to data lake tables, a subtask of T212172: Provide feature parity between the wiki replicas and the Analytics Data Lake, from Open to Stalled.
May 26 2021, 5:09 PM · Epic, Analytics, Product-Analytics
Groceryheist closed T225441: Qualitative data collection for ores bias analysis, a subtask of T225134: Find out what tools are used for making reverts on the ores-enabled wikis., as Resolved.
May 26 2021, 5:08 PM · editquality-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Groceryheist closed T225441: Qualitative data collection for ores bias analysis as Resolved.
May 26 2021, 5:08 PM · artificial-intelligence
Groceryheist closed T230642: Publish aggregated reading time dataset as Resolved.
May 26 2021, 5:08 PM · Analytics-Radar, Reading Depth

Oct 14 2020

Groceryheist added a comment to T264255: Review request for data export.

Great! Can I get some help transferring this data?

Oct 14 2020, 11:49 PM · Analytics-Kanban, Security, Analytics

Aug 7 2020

Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

Thank you @akosiaris. For your info, my revisions deadline was just extended from September 9 to October 15th. I'll be able to wrap up this project by the end of September. But I might still be working after the 9th.

Aug 7 2020, 5:37 PM · Analytics

Aug 4 2020

Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

I got reviews back today. They are pretty positive, but one reviewer asked for additional summary statistics that weren't originally reported. I'm also thinking of adding a small analysis to the appendix that will help address another reviewer's concern. The revisions are due September 9th. Can we renew an extension through then? Thank you!

Aug 4 2020, 12:01 AM · Analytics

Jul 27 2020

Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

This all sounds fine with me. No CSCW reviews yet. I'll update this thread when the time comes.

Jul 27 2020, 11:41 PM · Analytics

Jul 24 2020

Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

Okay,
For the project with @Halfak, any risks would arise from the internal histories of historical ores scores of revisions. The rest of the data used in the project is the publicly available wikimedia histories and derived from the open source ores projects. I expect that this will be low-risk since I'm aware of plans to release the ores scores publicly. I'm not sure, but this might relate to other fields in the ores scores table I used. In the data that I saved in my home directory, the only fields I kept from the scores table are the revision scores.

Jul 24 2020, 8:07 PM · Analytics
Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

Hi @leila,
I don't have reviews back yet. I expect them to come next week. If it's a hassle we can wait and see if the reviewers want anything. There were a few things I wanted to double-check in any case but that isn't essential. Even if I don't need access to the data to address reviews, I'd like to have a copy of my home directory on stat1006.

Jul 24 2020, 5:33 PM · Analytics

Jul 23 2020

Groceryheist updated subscribers of T256356: Check home/HDFS leftovers of nathante.

@Nuria this is work with the scoring team so I think the WMF collaborator is likely to be @calbon or someone on their team.

Jul 23 2020, 9:40 PM · Analytics
Groceryheist added a comment to T256356: Check home/HDFS leftovers of nathante.

Thanks @elukey and @Halfak for catching this. I'm not actively working on this project, but I expect that reviews will come back next week and I may have some work to do for the revisions. I also want to package up code and publicly available data for public release.

Jul 23 2020, 8:15 PM · Analytics

Dec 31 2019

Groceryheist added a project to T241651: Need new temporary kerberos password: Analytics.
Dec 31 2019, 5:25 PM · Analytics
Groceryheist created T241651: Need new temporary kerberos password.
Dec 31 2019, 5:25 PM · Analytics

Nov 18 2019

Groceryheist added a comment to T237605: Create kerberos principals for users.

Hi! Here's my request for Kerberos credentials for Hadoop access on stat100X and notebook100X. My username is nathante. I'm a volunteer researcher and I've contracted with the WMF.

Nov 18 2019, 7:39 PM · Analytics-Kanban, Analytics

Oct 15 2019

Groceryheist added a comment to T235526: Update prod SSH key for nathante .

Thanks!

Oct 15 2019, 6:14 PM · SRE, SRE-Access-Requests
Groceryheist created T235526: Update prod SSH key for nathante .
Oct 15 2019, 3:14 PM · SRE, SRE-Access-Requests

Sep 19 2019

Groceryheist added a comment to T223900: Create ORES dataset for huwiki edits in the last two years or so.

Hey @Tgr. Here's my best guess at what the historical thresholds were and when they changed. Missing values indicate that no threshold was set for a given class of edit. These numbers are the result of a fairly complicated process based on parsing old configuration files and loading old versions of the models and I'm still troubleshooting some aspects of it. So I will really appreciate it if you can let me know whether this looks right to you. Thanks!

Sep 19 2019, 2:44 AM · Hungarian-Sites, artificial-intelligence, Machine-Learning-Team, editquality-modeling, User-Tgr

Sep 18 2019

Groceryheist added a comment to T223900: Create ORES dataset for huwiki edits in the last two years or so.

@Tgr, it sounds like you have the old scores, right?

Sep 18 2019, 1:15 AM · Hungarian-Sites, artificial-intelligence, Machine-Learning-Team, editquality-modeling, User-Tgr
Groceryheist added a comment to T223900: Create ORES dataset for huwiki edits in the last two years or so.

Hi Tgr. I'm working on this! I should be able to send the thresholds over in the next day or so. This is very much a research project so buyer beware! You can check out my code at https://github.com/groceryheist/ores_bias_project/blob/master/ores_archaeologist.py

Sep 18 2019, 1:13 AM · Hungarian-Sites, artificial-intelligence, Machine-Learning-Team, editquality-modeling, User-Tgr

Sep 10 2019

Groceryheist added a comment to T232068: notebook1004 - /srv is full.

Hey @Ottomata,
it turns out that I think stat1006 is a better fit for my purposes since it has ORES dependencies (mainly hunspell) that were missing on the notebook machine. So once I finish moving that part of the project over to stat1006 I can reduce my usage on notebook1004.

Sep 10 2019, 1:34 AM · SRE, Analytics-Clusters

Sep 9 2019

Groceryheist added a comment to T226426: Build tool to guess what tool was used to make reverts on Wikimedia wikis .

Yeah I would say so. There's always room to improve it i.e. to support more tools and wikis. I also haven't done quality checks for wikis that aren't in our study.

Sep 9 2019, 6:31 PM · Machine-Learning-Team (Active Tasks)
Groceryheist added a comment to T232068: notebook1004 - /srv is full.

I deleted a couple of GB that I don't need. Unfortunately most of the space I'm using is from ORES assets so I can't really store it in Hadoop. Maybe I should move this work to a different machine with more space? It's a bit inconvenient to have to move between different machines for different tasks though. Is my current usage OK for the next month or so? After that I'll clean up.

Sep 9 2019, 4:54 PM · SRE, Analytics-Clusters

Aug 22 2019

Groceryheist added a comment to T229042: Reading_depth: deactivate eventlogging instrumentation.

Sorry, I lost track of this bug until today. I think it is really regrettable to turn off the instrumentation. The utility of the data is greatly lessened by gaps in the collection window. My understanding is that the instrument should only send two events for each page view. The sampling rate has been quite high at 10%, explaining the high number of events.

Aug 22 2019, 7:35 AM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Performance-Team (Radar), Web-Team-Backlog, Analytics-Radar, Reading Depth, Product-Analytics
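
To make the arithmetic in the comment above concrete, here is a minimal sketch of the expected event volume, assuming exactly two events per sampled page view and a 10% sampling rate as described. The function name and the page view count are hypothetical and chosen only for illustration.

```python
def expected_events(page_views: int,
                    sampling_rate: float = 0.10,
                    events_per_view: int = 2) -> float:
    """Rough expected number of logged events.

    Assumes every sampled page view sends exactly `events_per_view`
    events, which is the behavior described in the comment, not a
    measured property of the instrument.
    """
    return page_views * sampling_rate * events_per_view


# Hypothetical daily page view count, for illustration only.
daily_events = expected_events(1_000_000)
print(f"about {daily_events:,.0f} events per day")
```

Under these assumptions the event count scales linearly with the sampling rate, so lowering the rate would reduce volume without opening a gap in the collection window.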

Aug 21 2019

Groceryheist added a comment to T230642: Publish aggregated reading time dataset .

Thanks Nuria!

Aug 21 2019, 10:23 AM · Analytics-Radar, Reading Depth

Aug 17 2019

Groceryheist added a comment to T230642: Publish aggregated reading time dataset .

Hi Nuria. I'm proposing to start with a one-off release that I can handle easily. I can also do some work to set up automated scheduled releases, but I don't want to commit to owning it in the long run.

Aug 17 2019, 12:24 PM · Analytics-Radar, Reading Depth
Groceryheist created T230642: Publish aggregated reading time dataset .
Aug 17 2019, 2:10 AM · Analytics-Radar, Reading Depth

Jul 26 2019

Groceryheist updated subscribers of T229042: Reading_depth: deactivate eventlogging instrumentation.

I do not think we should be in a rush to remove this instrumentation.

Jul 26 2019, 9:51 PM · MW-1.36-notes (1.36.0-wmf.9; 2020-09-15), Performance-Team (Radar), Web-Team-Backlog, Analytics-Radar, Reading Depth, Product-Analytics

Jul 3 2019

GitHub <noreply@github.com> committed rOEQ5e4744804831: Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into… (authored by Groceryheist).
Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into…
Jul 3 2019, 10:22 PM

Jun 25 2019

Groceryheist created T226574: Add feature for edit namespace to edit quality models.
Jun 25 2019, 9:00 PM · artificial-intelligence, editquality-modeling, Machine-Learning-Team

Jun 24 2019

GitHub <noreply@github.com> committed rOEQ83660b5b61ca: Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into… (authored by Groceryheist).
Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into…
Jun 24 2019, 10:52 PM
Groceryheist created T226426: Build tool to guess what tool was used to make reverts on Wikimedia wikis .
Jun 24 2019, 4:15 PM · Machine-Learning-Team (Active Tasks)

Jun 13 2019

Groceryheist added a comment to T225692: Pyarrow hdfs interface does not work in SWAP.

Thank you!

Jun 13 2019, 3:25 PM · Analytics-Kanban, Analytics
Restricted Application removed a project from T225692: Pyarrow hdfs interface does not work in SWAP: Patch-For-Review.
Jun 13 2019, 6:29 AM · Analytics-Kanban, Analytics

Jun 11 2019

Groceryheist updated subscribers of T225441: Qualitative data collection for ores bias analysis.

I'm making a list of people who helped with labeling campaigns for the different ores projects.

Jun 11 2019, 1:23 AM · artificial-intelligence

Jun 10 2019

Groceryheist awarded T186559: Provide data dumps in the Analytics Data Lake a Love token.
Jun 10 2019, 7:52 PM · Analytics
Groceryheist created T225441: Qualitative data collection for ores bias analysis.
Jun 10 2019, 4:16 PM · artificial-intelligence

Jun 7 2019

GitHub <noreply@github.com> committed rOEQ83f186b6ae3b: Merge pull request #201 from wikimedia/jawiki (authored by Groceryheist).
Merge pull request #201 from wikimedia/jawiki
Jun 7 2019, 4:34 AM

Jun 5 2019

Groceryheist added a comment to T225133: Look at recent changes filters event log to track usage.

The changeslisthighlights and changeslistfilters schemas were deleted along with the data. So we don't have the data that we would want to have for this.

Jun 5 2019, 11:02 PM · editquality-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Groceryheist created T225134: Find out what tools are used for making reverts on the ores-enabled wikis..
Jun 5 2019, 6:50 PM · editquality-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Groceryheist created T225133: Look at recent changes filters event log to track usage.
Jun 5 2019, 6:49 PM · editquality-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)

Jun 4 2019

GitHub <noreply@github.com> committed rOEQ5d2dec886e8a: Merge pull request #196 from wikimedia/zhwiki (authored by Groceryheist).
Merge pull request #196 from wikimedia/zhwiki
Jun 4 2019, 3:17 AM

Jun 3 2019

Groceryheist updated the task description for T224902: Fit models for revert prediction.
Jun 3 2019, 5:58 PM · editquality-modeling, ORES, artificial-intelligence
Groceryheist added a comment to T224901: ORES bias analysis.

I created a task T224918 for that analysis.

Jun 3 2019, 5:55 PM · editquality-modeling, ORES, Epic, artificial-intelligence, Machine-Learning-Team (Active Tasks)
Groceryheist created T224918: Visualize the relationship between the probability of reversion and ores scores .
Jun 3 2019, 5:54 PM · editquality-modeling, ORES, artificial-intelligence, Machine-Learning-Team (Active Tasks)

Jun 2 2019

GitHub <noreply@github.com> committed rOEQb0cee05a69f0: Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into… (authored by Groceryheist).
Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into…
Jun 2 2019, 4:59 AM
GitHub <noreply@github.com> committed rOEQ9ef2f950e0d4: Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into… (authored by Groceryheist).
Merge 83f8ae89af62064d03808f8e09bfb61b20e8e915 into…
Jun 2 2019, 4:58 AM
Groceryheist committed rOEQ83f8ae89af62: add the model infor for the enwiki reverted model..
add the model infor for the enwiki reverted model.
Jun 2 2019, 4:58 AM
Groceryheist committed rOEQc6e982823bc4: change enwiki.reverted model to logistic regression..
change enwiki.reverted model to logistic regression.
Jun 2 2019, 4:41 AM

May 30 2019

GitHub <noreply@github.com> committed rOEQedf3bf8b112c: Merge pull request #197 from wikimedia/nlwiki (authored by Groceryheist).
Merge pull request #197 from wikimedia/nlwiki
May 30 2019, 9:29 PM
GitHub <noreply@github.com> committed rOEQb6f4742e81c3: Merge pull request #195 from wikimedia/srwiki_goodfaith_fix (authored by Groceryheist).
Merge pull request #195 from wikimedia/srwiki_goodfaith_fix
May 30 2019, 8:51 PM
GitHub <noreply@github.com> committed rOEQ44e81bdbabf3: Merge pull request #192 from wikimedia/eswikiversity (authored by Groceryheist).
Merge pull request #192 from wikimedia/eswikiversity
May 30 2019, 8:19 PM

May 13 2019

Groceryheist updated subscribers of T222933: Upgrade R in SWAP notebooks to 3.4+.
May 13 2019, 4:08 PM · Data-Engineering-Jupyter, Analytics

May 10 2019

Groceryheist created T222933: Upgrade R in SWAP notebooks to 3.4+.
May 10 2019, 2:10 AM · Data-Engineering-Jupyter, Analytics

May 3 2019

mpopov awarded T221890: Add wikidata ids to data lake tables a Like token.
May 3 2019, 2:34 PM · Epic, Analytics, Product-Analytics
Groceryheist added a comment to T222301: Upgrade pandas in spark SWAP notebooks.

Ok, I see. A hostile dependency could be a big problem. I'm not looking to argue, just sincerely curious. I'm involved in managing a sort of ad-hoc spark setup on the UW cluster, so maybe I can learn something useful :)

May 3 2019, 7:24 AM · Analytics-Kanban, Analytics
Groceryheist added a comment to T222301: Upgrade pandas in spark SWAP notebooks.

Having said this, Andrew is planning to work on the Spark 2.4.2 upgrade and he will take a look if pandas could be upgraded as well :)

May 3 2019, 7:05 AM · Analytics-Kanban, Analytics

May 2 2019

Groceryheist added a comment to T222301: Upgrade pandas in spark SWAP notebooks.

I see. For Python packages I usually use pip instead of Debian, since Python tends to move much faster than Debian. Of course, I'm just managing this for myself and not supporting a whole organization :). But I'm also curious about why you use Debian for this.

May 2 2019, 3:54 PM · Analytics-Kanban, Analytics

May 1 2019

Groceryheist created T222301: Upgrade pandas in spark SWAP notebooks.
May 1 2019, 7:49 PM · Analytics-Kanban, Analytics
Groceryheist added a comment to T222254: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow.

@elukey, thanks. It seems like I'm experiencing a regression then. I can work around it for now. See you tomorrow!

May 1 2019, 3:55 PM · Analytics-Kanban, Analytics-Clusters
Groceryheist created T222254: Pyspark on SWAP: Py4JJavaError: Import Error: no module named pyarrow.
May 1 2019, 6:48 AM · Analytics-Kanban, Analytics-Clusters
Groceryheist created T222253: Upgrade Spark to 2.4.x.
May 1 2019, 4:54 AM · Analytics-Kanban, Analytics-Clusters

Apr 29 2019

Groceryheist added a comment to T221890: Add wikidata ids to data lake tables.

@Nuria yes. My understanding is that they are when pp_propname == "wikibase_item"

Apr 29 2019, 8:21 PM · Epic, Analytics, Product-Analytics
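
The lookup described in the comment above can be sketched as follows, assuming rows shaped like MediaWiki's page_props table (pp_page, pp_propname, pp_value). The function name and the sample rows are hypothetical; a real analysis would read these rows from the wiki replicas or the Data Lake rather than from an in-memory list.

```python
from typing import Dict, Iterable, Mapping


def wikidata_ids(page_props_rows: Iterable[Mapping[str, object]]) -> Dict[int, str]:
    """Map page id -> Wikidata item id, keeping only 'wikibase_item' rows."""
    return {
        int(row["pp_page"]): str(row["pp_value"])
        for row in page_props_rows
        if row["pp_propname"] == "wikibase_item"
    }


# Made-up sample rows, for illustration only.
sample_rows = [
    {"pp_page": 1, "pp_propname": "wikibase_item", "pp_value": "Q42"},
    {"pp_page": 1, "pp_propname": "page_image_free", "pp_value": "Example.jpg"},
    {"pp_page": 2, "pp_propname": "defaultsort", "pp_value": "Example"},
]

ids = wikidata_ids(sample_rows)
print(ids)
```

Pages without a wikibase_item row (page 2 in the sample) are simply absent from the result, so downstream code should treat a missing key as "no linked Wikidata item".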
Groceryheist added a comment to T221890: Add wikidata ids to data lake tables.

My ultimate goal is to identify, from a random sample of ~500,000 to ~50,000,000 edits from different language Wikipedias:

  1. Which edits are to biographical articles.
  2. The gender or sex of the subject of the biographical articles.
Apr 29 2019, 5:06 PM · Epic, Analytics, Product-Analytics

Apr 26 2019

Groceryheist added a comment to T221890: Add wikidata ids to data lake tables.

Thank you Nuria. Are you saying that we'll be able to sqoop the prop_tables in May at the earliest? Would it be okay to look up a sizable number of pages in the prop_tables in the meantime? I'm thinking on the order of 20,000 pages per language.

Apr 26 2019, 4:58 AM · Epic, Analytics, Product-Analytics
Groceryheist added a comment to T221870: Why are there three Q-marks (???) in threshholds in Special:ORESModels?.

Also
https://sr.wikipedia.org/wiki/Special:ORESModels has a strange threshold (0,1) for goodfaith.

Apr 26 2019, 12:40 AM · Growth-Team-Filtering, Growth-Team, Machine-Learning-Team, ORES, MediaWiki-extensions-ORES

Apr 25 2019

Groceryheist updated the task description for T221890: Add wikidata ids to data lake tables.
Apr 25 2019, 7:51 PM · Epic, Analytics, Product-Analytics
Groceryheist created T221890: Add wikidata ids to data lake tables.
Apr 25 2019, 7:50 PM · Epic, Analytics, Product-Analytics
Groceryheist updated subscribers of T221870: Why are there three Q-marks (???) in threshholds in Special:ORESModels?.
Apr 25 2019, 4:01 PM · Growth-Team-Filtering, Growth-Team, Machine-Learning-Team, ORES, MediaWiki-extensions-ORES
Groceryheist created T221871: Non-overlapping threshholds in ORESModels on lvwiki.
Apr 25 2019, 4:00 PM · Growth-Team (Sprint 0 (Growth Team)), ORES, MediaWiki-extensions-ORES, Machine-Learning-Team
Groceryheist created T221870: Why are there three Q-marks (???) in threshholds in Special:ORESModels?.
Apr 25 2019, 3:57 PM · Growth-Team-Filtering, Growth-Team, Machine-Learning-Team, ORES, MediaWiki-extensions-ORES

Apr 23 2019

Groceryheist added a comment to T212172: Provide feature parity between the wiki replicas and the Analytics Data Lake.

Wikipedia-to-Wikidata linkage patterns (T209891#4798717, using the page_props table)

Apr 23 2019, 8:40 PM · Epic, Analytics, Product-Analytics

Apr 18 2019

Groceryheist closed T221398: Install aspell for ORES languages on STAT1006 as Resolved.
Apr 18 2019, 7:00 PM · ORES, Machine-Learning-Team
Groceryheist claimed T221398: Install aspell for ORES languages on STAT1006.
Apr 18 2019, 7:00 PM · ORES, Machine-Learning-Team
Groceryheist added a comment to T221398: Install aspell for ORES languages on STAT1006.

@Ladsgroup Oh sweet thanks I'll do that.

Apr 18 2019, 6:59 PM · ORES, Machine-Learning-Team
Groceryheist created T221398: Install aspell for ORES languages on STAT1006.
Apr 18 2019, 6:50 PM · ORES, Machine-Learning-Team
GitHub <noreply@github.com> committed rORES099794334c6b: Merge 2ed740a6a142e4587c87a5b5f3944c3625445b0a into… (authored by Groceryheist).
Merge 2ed740a6a142e4587c87a5b5f3944c3625445b0a into…
Apr 18 2019, 12:38 AM
Groceryheist committed rORES2ed740a6a142: Fix for #325: Score_revisions.py doesn't respect output parameter..
Fix for #325: Score_revisions.py doesn't respect output parameter.
Apr 18 2019, 12:38 AM

Apr 9 2019

Groceryheist claimed T200898: Analyze the effects of ORES deployments on counter-vandalism behavior.

@Harej Indeed. I was already planning to do something very similar to this in the course of my project. I may be actively working on some of these subtasks starting next week.

Apr 9 2019, 10:21 PM · ORES, research-ideas

Nov 20 2018

Groceryheist added a comment to T209051: ReadingDepth schema is whitelisting both session ids and page ids.

A handful of thoughts:

Nov 20 2018, 1:23 AM · Analytics-Radar

Nov 18 2018

Groceryheist updated the task description for T160492: Conduct further data quality checks on the ReadingDepth schema.
Nov 18 2018, 12:09 AM · Web-Team-Backlog (Tracking), Reading Depth, Product-Analytics, Reading-analysis

Nov 17 2018

Groceryheist updated the task description for T160492: Conduct further data quality checks on the ReadingDepth schema.
Nov 17 2018, 9:59 PM · Web-Team-Backlog (Tracking), Reading Depth, Product-Analytics, Reading-analysis

Nov 2 2018

Groceryheist added a comment to T208275: Add revision ID to ReadingDepth Schema and Data.

Good question. I don't think so, unless there are additional schemas that we might need to join with that have keys other than page_id or revision_id. We already record namespace.

Nov 2 2018, 5:41 AM · Web-Team-Backlog

Nov 1 2018

Groceryheist created T208478: Red links in ReadingDepth data.
Nov 1 2018, 5:05 AM · Web-Team-Backlog

Oct 31 2018

Groceryheist added a comment to T208275: Add revision ID to ReadingDepth Schema and Data.

A related problem is that pages can move. Right now we record page_title, but different pages can have the same page_title at different times. It would also make downstream analysis much more convenient to have page_id in the schema.

Oct 31 2018, 4:29 AM · Web-Team-Backlog
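
A small sketch of the page-move problem described above, using made-up events. The tuple layout (page_id, page_title, seconds read) is an assumption for illustration and is not the actual ReadingDepth schema.

```python
from collections import defaultdict

# Hypothetical events: (page_id, page_title at view time, seconds read).
events = [
    (10, "Mercury", 30),           # page 10 while it is titled "Mercury"
    (10, "Mercury_(planet)", 45),  # the same page after being moved
    (20, "Mercury", 12),           # a different page now holds the old title
]


def total_by(rows, key_index):
    """Sum seconds read, grouped by the tuple field at `key_index`."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[2]
    return dict(totals)


by_title = total_by(events, 1)  # mixes two pages under "Mercury"
by_id = total_by(events, 0)     # keeps each page's history together
print(by_title)
print(by_id)
```

Grouping by title merges reading time from two different pages and splits one page's history across two titles, while grouping by page_id avoids both problems.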

Oct 29 2018

Groceryheist renamed T208275: Add revision ID to ReadingDepth Schema and Data from Add revision ID to ReadingDepth Schema to Add revision ID to ReadingDepth Schema and Data.
Oct 29 2018, 11:10 PM · Web-Team-Backlog
Groceryheist created T208275: Add revision ID to ReadingDepth Schema and Data.
Oct 29 2018, 11:09 PM · Web-Team-Backlog

Sep 26 2018

Groceryheist updated the task description for T160492: Conduct further data quality checks on the ReadingDepth schema.
Sep 26 2018, 11:20 PM · Web-Team-Backlog (Tracking), Reading Depth, Product-Analytics, Reading-analysis
Groceryheist updated the task description for T160492: Conduct further data quality checks on the ReadingDepth schema.
Sep 26 2018, 11:14 PM · Web-Team-Backlog (Tracking), Reading Depth, Product-Analytics, Reading-analysis

Sep 25 2018

Groceryheist closed T204790: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users as Resolved.

Created task https://phabricator.wikimedia.org/T205454 for LDAP access

Sep 25 2018, 4:20 PM · Patch-For-Review, SRE, SRE-Access-Requests
Groceryheist created T205454: LDAP Access request for Nathan TeBlunthuis (groceryheist / nathante).
Sep 25 2018, 4:19 PM · LDAP-Access-Requests
Groceryheist reopened T204790: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users as "Open".

I still don't have access to SWAP. I understand that I need to be added to the nda LDAP group.

Sep 25 2018, 4:13 PM · Patch-For-Review, SRE, SRE-Access-Requests

Sep 19 2018

Groceryheist added a comment to T204790: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users.

@RobH: Great. Thanks!

Sep 19 2018, 8:01 PM · Patch-For-Review, SRE-Access-Requests, SRE
Groceryheist added a comment to T204790: nathante/groceryheist shell request for researchers, statistics-privatedata-users, analytics-privatedata-users.

Here's the contract that I signed and sent to @ovasileva : REMOVED
It includes a "Contractor Confidentiality Agreement". Is this the NDA we are looking for?
Per the contract, the end date is November 16th 2018.

Sep 19 2018, 7:57 PM · Patch-For-Review, SRE-Access-Requests, SRE