On most wikis, policies, guidelines, rules, and similar pages are kept within the project namespace (namespace 4). On English Wikipedia this is the Wikipedia: namespace. For example, the policy on civility can be found at https://en.wikipedia.org/wiki/Wikipedia:Civility.
Users often link to these pages when they are discussing a case on ANI. These links can be formatted or abbreviated in several ways, but for our needs it is safe to assume that [[Wikipedia:*]], [[WP:*]], [[Wikipedia:*| and [[WP:*| (with the asterisk acting as a wildcard for any number of characters) are the only forms you need to identify. We do not need the display text, just the link target. (Alternatively, you may find that parsing the HTML of the archive pages is simpler. 🤷🏻‍♂️)
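The four link forms above can be captured with a single regular expression. The following is a minimal sketch, assuming the input is raw wikitext and that matching is case-sensitive; the names POLICY_LINK and extract_policy_links are illustrative, not part of any existing tool.

```python
import re

# Matches [[Wikipedia:...]] and [[WP:...]], with or without a pipe.
# The captured target stops at the first "|" or "]".
POLICY_LINK = re.compile(r"\[\[((?:Wikipedia|WP):[^\]|]+)")


def extract_policy_links(wikitext):
    """Return every policy link target in order, duplicates included."""
    return [target.strip() for target in POLICY_LINK.findall(wikitext)]


sample = "Please see [[WP:CIVIL]] and [[Wikipedia:Civility|the civility policy]]."
print(extract_policy_links(sample))
```

The display text after the pipe is never captured, so only the link target is returned.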
We would like to know more about how these links are used, in particular their frequency and popularity. To that end, we would like to parse the archived cases from April 1 through May 31, 2017 (archives 950 through 956).
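As a rough sketch, the seven archive pages could be enumerated like this. The page-name pattern in ARCHIVE_TITLE is an assumption about how the ANI archives are named and should be checked against the wiki before use; archive_urls is a hypothetical helper.

```python
from urllib.parse import quote

# Assumption: ANI archives follow this page-name pattern.
ARCHIVE_TITLE = "Wikipedia:Administrators' noticeboard/IncidentArchive{n}"
BASE_URL = "https://en.wikipedia.org/wiki/"


def archive_urls(first=950, last=956):
    """Build one URL per archive page in the inclusive range."""
    urls = []
    for n in range(first, last + 1):
        # Wiki URLs use underscores in place of spaces.
        title = ARCHIVE_TITLE.format(n=n).replace(" ", "_")
        # Keep ":" and "/" readable; percent-encode everything else unsafe.
        urls.append(BASE_URL + quote(title, safe=":/"))
    return urls


for url in archive_urls():
    print(url)
```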
Requested deliverable
- A report of the distribution of which links are used the most (and how many times they are used)
- TBD, if feasible: A report of:
  - Case title
  - Anchored URL to case
  - Resolved? (boolean)
  - List of which links are used in the case
Don't worry about deduplicating the links.
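Both deliverables could be produced in one pass over an archive's wikitext. This is a sketch under several assumptions: each case is taken to start at a level-2 heading (== Title ==), the anchor is built by simply replacing spaces with underscores (real MediaWiki anchors need more encoding), and "resolved" is left as None because no criterion for it is given here. All function names are illustrative.

```python
import re
from collections import Counter

POLICY_LINK = re.compile(r"\[\[((?:Wikipedia|WP):[^\]|]+)")
# Assumption: a case begins at a level-2 heading; "===" subsections do not match.
CASE_HEADING = re.compile(r"^==(?!=)\s*(.+?)\s*==\s*$", re.MULTILINE)


def split_cases(wikitext):
    """Yield (title, body) pairs, one per level-2 section."""
    headings = list(CASE_HEADING.finditer(wikitext))
    for i, heading in enumerate(headings):
        end = headings[i + 1].start() if i + 1 < len(headings) else len(wikitext)
        yield heading.group(1), wikitext[heading.end():end]


def case_report(wikitext, page_url):
    """Return one record per case with its title, URL, and links."""
    rows = []
    for title, body in split_cases(wikitext):
        rows.append({
            "title": title,
            # Simplified anchor; not full MediaWiki anchor encoding.
            "url": page_url + "#" + title.replace(" ", "_"),
            # No criterion for "resolved" is specified, so it is left unset.
            "resolved": None,
            # Links are kept in order and not deduplicated.
            "links": [t.strip() for t in POLICY_LINK.findall(body)],
        })
    return rows


def link_distribution(rows):
    """Count every link occurrence across all cases, most used first."""
    counts = Counter()
    for row in rows:
        counts.update(row["links"])
    return counts.most_common()


sample = """== Case A ==
See [[WP:CIVIL]] and [[WP:NPA|no personal attacks]].
=== Subsection ===
Again [[WP:CIVIL]].
== Case B ==
Per [[Wikipedia:Civility]].
"""
rows = case_report(sample, "https://example.org/Archive")
print(link_distribution(rows))
```

Note that shortcut and full forms (for example WP:CIVIL and Wikipedia:Civility) are counted as written, so the same policy may appear under more than one name in the distribution.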