
Generate and share data for "The geography of deletion"
Closed, Resolved · Public


From Isaac Johnson (GroupLens)
... we would like the final version of all of the deleted Wikipedia articles from the last three years.


  1. Determine which of the deleted articles are spatial. That is, determine how to geotag (i.e. provide coordinates for) articles that were deleted before coordinates were assigned but could easily have been tied to a latitude-longitude pair. As a side note, hopefully whatever method is developed for this would have broader use. Jake mentioned that it might be used to help expand the project of crowdsourcing photos for geotagged articles.
  2. Analyze the distribution of deleted articles to understand whether certain areas see higher rates of deleted articles than others. This could provide insight into how the definition/application of notability varies spatially.
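
As a rough illustration of step 2, the sketch below bins geotagged articles into a latitude/longitude grid and computes the share of deleted articles per cell. It is only a minimal sketch under stated assumptions: the function name `deletion_rate_by_cell`, the `(lat, lon, was_deleted)` record layout, and the one-degree cell size are hypothetical and are not taken from the actual dataset or from the scripts mentioned in this task.

```python
import math
from collections import defaultdict


def deletion_rate_by_cell(articles, cell_deg=1.0):
    """Return {(lat_cell, lon_cell): share of deleted articles in that cell}.

    `articles` is an iterable of (lat, lon, was_deleted) tuples.
    This record layout is an assumption made for illustration only.
    """
    totals = defaultdict(int)
    deleted = defaultdict(int)
    for lat, lon, was_deleted in articles:
        # Floor division onto a regular grid; cells are cell_deg degrees wide.
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        totals[cell] += 1
        if was_deleted:
            deleted[cell] += 1
    return {cell: deleted.get(cell, 0) / totals[cell] for cell in totals}


# Made-up sample records, not real data.
sample = [
    (44.98, -93.27, True),
    (44.50, -93.90, False),
    (-1.29, 36.82, True),
]
rates = deletion_rate_by_cell(sample)
```

A fixed-degree grid is the simplest choice, but cells shrink in area toward the poles, so a real analysis would likely need an equal-area grid or administrative regions, plus some minimum article count per cell before comparing rates.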

Trello card: YieDKXlh

  • column: In Progress
  • labels: Community (blue)

Event Timeline

Trellimport added a project: Research.
Trellimport added a subscriber: Halfak.
Restricted Application added a subscriber: Aklapper. May 14 2015, 11:29 PM
ggellerman moved this task from Staged to In Progress on the Research board. May 15 2015, 12:07 AM

Comments from Trello:

2015-04-23 Halfak:
Got some scripts together.

Just waiting for them to finish.

2015-04-08 Halfak:

Response received and greenlight from legal. I'm moving forward.

2015-04-06 Halfak:
Email sent to Isaac explaining the reason for the NDA & MOU. Waiting on his response.

Got forms from Manprit. Sending them on to Isaac.

2015-04-02 Halfak:
I've contacted Michelle in legal to see what it would take to be able to share this deleted text.

ggellerman triaged this task as Low priority. Jun 11 2015, 8:25 PM
ggellerman set Security to None.
ggellerman moved this task from In Progress to Paused on the Research board. Jul 2 2015, 10:16 PM
Halfak moved this task from Paused to In Progress on the Research board. Jul 9 2015, 10:10 PM
Halfak renamed this task from The geography of deletion to Generate and share data for "The geography of deletion". Aug 6 2015, 10:47 PM

Encrypted dataset has been transferred. I confirmed with Isaac that he will not show the dataset to anyone who is not under an NDA with the WMF (e.g., he works in the same lab as Morten, one of our research fellows) and that he plans to keep the data encrypted and only decrypt it for processing.

ggellerman closed this task as Resolved. Mar 24 2016, 10:20 PM