
ANI data: report of policy links
Closed, Resolved · Public · 3 Estimated Story Points


On most wikis, policies, guidelines, rules, etc. are kept on wiki pages within the project namespace (namespace 4). On English Wikipedia this is the "Wikipedia:" namespace. For example, the policy on civility can be found at

Users often link to these pages when they are discussing a case on ANI. These links can be formatted or abbreviated in several ways, but for our needs it is safe to assume that [[Wikipedia:*]], [[WP:*]], [[Wikipedia:*| and [[WP:*| (with the asterisk acting as a wildcard for any number of characters) are the only forms you need to identify. We do not need to know the display text, just the link target. (Alternatively, you may find that parsing the HTML of the archive pages is simpler. 🤷🏻‍♂️)
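A minimal sketch of matching the four link forms above in raw wikitext, assuming the wikitext of each archive page is already available as a string. The pattern captures only the link target and stops at the first "|" or "]", so the display text is ignored. The names POLICY_LINK and policy_link_targets are hypothetical, chosen for illustration.

```python
import re

# Matches [[Wikipedia:...]], [[WP:...]] and the piped forms [[Wikipedia:...|...]], [[WP:...|...]].
# Group 1 is the prefix; group 2 is the target up to the first "|" or "]".
# "Wikipedia talk:" links do not match because ":" must follow the prefix directly.
POLICY_LINK = re.compile(r"\[\[\s*(Wikipedia|WP)\s*:([^|\]]+)")


def policy_link_targets(wikitext: str) -> list[str]:
    """Return every policy link target in order, keeping duplicates."""
    return [f"{prefix}:{target.strip()}" for prefix, target in POLICY_LINK.findall(wikitext)]


if __name__ == "__main__":
    sample = "See [[WP:CIVIL]] and [[Wikipedia:No personal attacks|NPA]]; also [[User:Example]]."
    print(policy_link_targets(sample))  # ['WP:CIVIL', 'Wikipedia:No personal attacks']
```

Shortcuts such as WP:CIVIL are redirects, so this sketch reports them as written rather than resolving them to the page they point at.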

We would like to know more about how these links are used, in particular their frequency and popularity. We would like to parse the archived cases from April 1 through May 31, 2017 (archives 950 through 956).
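One way to get the wikitext of archives 950 through 956 is the MediaWiki Action API (action=parse with prop=wikitext). This is a sketch only: the archive title pattern "Wikipedia:Administrators' noticeboard/IncidentArchiveN" is an assumption that should be checked against the live wiki, and the helper names and User-Agent string are hypothetical.

```python
import json
import urllib.parse
import urllib.request

API = "https://en.wikipedia.org/w/api.php"
# Assumed naming pattern for the ANI archive pages; verify before relying on it.
ARCHIVE_TITLE = "Wikipedia:Administrators' noticeboard/IncidentArchive{n}"


def archive_titles(first: int = 950, last: int = 956) -> list[str]:
    """Titles of the archive pages to parse, inclusive of both ends."""
    return [ARCHIVE_TITLE.format(n=n) for n in range(first, last + 1)]


def wikitext_url(title: str) -> str:
    """Build an action=parse request that returns the page's raw wikitext as JSON."""
    params = {
        "action": "parse",
        "page": title,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",  # version 2 returns "wikitext" as a plain string
    }
    return API + "?" + urllib.parse.urlencode(params)


def fetch_wikitext(title: str) -> str:
    """Download one archive page's wikitext (performs a network request)."""
    request = urllib.request.Request(
        wikitext_url(title),
        headers={"User-Agent": "ani-policy-links-sketch/0.1"},  # hypothetical UA string
    )
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["parse"]["wikitext"]


if __name__ == "__main__":
    for title in archive_titles():
        print(title, len(fetch_wikitext(title)))
```

Fetching wikitext rather than rendered HTML keeps the [[...]] link syntax intact; the HTML route mentioned above would need an HTML parser instead.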

Requested deliverable

  • A report of the distribution of which links are used most (and how many times each is used)
  • TBD, if feasible: A report of:
    • Case title
    • anchored URL to case
    • resolved? (boolean)
    • list of which links are used in the case
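Both deliverables could be produced from an archive page's wikitext roughly as follows, assuming each case starts with a level-2 heading ("== Title =="). Two parts are explicitly guesses: the anchor only approximates MediaWiki's (headings containing markup will differ), and "resolved" is a hypothetical heuristic that looks for {{archive top}}, {{atop}} or {{resolved}} in the case body. Duplicates are kept, as requested.

```python
import re
import urllib.parse
from collections import Counter
from dataclasses import dataclass

POLICY_LINK = re.compile(r"\[\[\s*(Wikipedia|WP)\s*:([^|\]]+)")
# Level-2 headings start a new case; "=== ... ===" subheadings do not.
CASE_HEADING = re.compile(r"^==(?!=)\s*(.*?)\s*(?<!=)==\s*$", re.MULTILINE)
# Hypothetical heuristic for "resolved": a closure template in the case body.
RESOLVED_MARKER = re.compile(r"\{\{\s*(archive top|atop|resolved)\b", re.IGNORECASE)


@dataclass
class Case:
    title: str
    url: str
    resolved: bool
    links: list[str]


def split_cases(page_title: str, wikitext: str) -> list[Case]:
    """Split one archive page into cases and collect each case's policy links."""
    headings = list(CASE_HEADING.finditer(wikitext))
    cases = []
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        body = wikitext[heading.end():end]
        title = heading.group(1)
        # Approximate anchor: spaces become underscores, other characters percent-encoded.
        url = (
            "https://en.wikipedia.org/wiki/"
            + urllib.parse.quote(page_title.replace(" ", "_"), safe="/:")
            + "#"
            + urllib.parse.quote(title.replace(" ", "_"), safe=":")
        )
        links = [f"{p}:{t.strip()}" for p, t in POLICY_LINK.findall(body)]
        cases.append(Case(title, url, bool(RESOLVED_MARKER.search(body)), links))
    return cases


def link_distribution(cases: list[Case]) -> list[tuple[str, int]]:
    """Count every link occurrence across all cases, most used first."""
    return Counter(link for case in cases for link in case.links).most_common()
```

Running split_cases over each archive page and concatenating the results gives the per-case report; passing the combined list to link_distribution gives the frequency report.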

Don't worry about deduplicating the links.

Event Timeline

TBolliger added a subscriber: dmaza.

@dmaza — Please review this as well.

dbarratt set the point value for this task to 3. (Aug 31 2017, 4:57 PM)

Responded in email. LGTM, but checking with Caroline.