Create recommendations for databases/journals/websites, by WikiProject for WikiProject X
Open, Medium, Public
Assigned To

None

Authored By

	Sadads
	Sep 1 2015, 3:28 PM

Description

At Wikimania, James and I talked about the possibility of using WikiProject X to create recommendations. @Halfak said that the tool is within his current capabilities: it would just be a matter of engineering it.

The tool would (a) screen the references in top-class articles for a WikiProject (likely FA, GA, B); (b) identify the most frequently used sources in that topic by comparing either URLs or the values of certain reference fields, whether journal (e.g. The Lancet), website (e.g. Newspapers.com), identifier (DOI, for example https://gist.github.com/hubgit/5974843), or publisher/via (e.g. JSTOR, Project Muse); and (c) recommend those resources to editors as places to start their research. Right now editors rely either on manually curated lists or on their own previous knowledge of a field or of research, and neither is a reliable "guarantee" of a quality research strategy.
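As a rough illustration of steps (a) to (c), here is a minimal Python sketch that compares sources by URL only. The input shape (`quality` and `ref_urls` keys), the function names, and the sample articles are all assumptions made up for this example; a real tool would take quality classes from ORES or WikiProject assessments and references from a citation extractor.

```python
from collections import Counter
from urllib.parse import urlparse

# Quality classes treated as "top class", per the task description.
TOP_CLASSES = {"FA", "GA", "B"}

def source_domain(url):
    """Return the host of a URL, lower-cased and without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host

def recommend_sources(articles, top_n=3):
    """Rank domains by the number of top-class articles that cite them.

    Each domain is counted once per article, so a single article with
    many links to one site cannot dominate the ranking.
    """
    counts = Counter()
    for article in articles:
        if article["quality"] not in TOP_CLASSES:
            continue  # step (a): only screen top-class articles
        domains = {source_domain(u) for u in article["ref_urls"]}
        domains.discard("")  # drop URLs that had no host
        counts.update(domains)  # step (b): tally sources
    return counts.most_common(top_n)  # step (c): the recommendation list

# Hypothetical sample data for one WikiProject.
articles = [
    {"title": "A", "quality": "FA",
     "ref_urls": ["https://www.jstor.org/stable/1", "https://www.thelancet.com/x"]},
    {"title": "B", "quality": "GA",
     "ref_urls": ["https://www.jstor.org/stable/2", "https://www.jstor.org/stable/3"]},
    {"title": "C", "quality": "Stub",
     "ref_urls": ["https://books.google.com/books?id=1"]},
]

print(recommend_sources(articles))
```

Counting per article rather than per link is one possible design choice; it does not by itself address the risk of overly generic recommendations noted below.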

The main risk here is that the tool isn't used and that the recommendations tend to be very generic (such as Google Books).

Additional potential use cases: recommending research starting points in unreferenced tags, based on WikiProject or categories; recommending TWL and/or open access sources to newish editors.

Useful links:
*Capability to figure out article quality in WP articles: https://meta.wikimedia.org/wiki/ORES/wp10
*Capability to extract structured citation information: https://meta.wikimedia.org/wiki/Research:Scholarly_article_citations_in_Wikipedia

Related Objects

Status	Assigned	Task
Open	None	T111066 Create recommendations for databases/journals/websites, by WikiProject for WikiProject X
Resolved	yuvipanda	T111141 Request for Labs project LibraryBase
Declined	None	T120115 Initial population of Librarybase
Open	None	T99046 Retrieve DOI metadata and identify non-resolving DOIs.

Event Timeline

Sadads created this task. Sep 1 2015, 3:28 PM

Sadads raised the priority of this task from to Needs Triage.

Sadads updated the task description.

Sadads added a project: WikiProject-X.

Sadads added subscribers: Sadads, Halfak, Harej.

Restricted Application added a subscriber: Aklapper. Sep 1 2015, 3:28 PM

Harej triaged this task as Medium priority. Sep 1 2015, 10:19 PM

Harej set Security to None.

Thanks @Harej!

I would like to create a structured database of Wikipedia citations. This includes the citation data itself (think OCLC on steroids), but also information about where the citation appears on Wikipedia. Implementation-wise this could manifest itself as a MediaWiki+Wikibase instance on Labs combined with a script that pulls citation data from Wikipedia. The script would (a) pull anything with a citation template and parse it; (b) pull anything between <ref> tags (and run incomplete citations through Parsoid), and (c) map out the data according to a schema. In principle this could be done on Wikidata but the level of granularity I want may be overkill for Wikidata's purposes.
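A minimal sketch of steps (a) to (c) in Python, using only regular expressions from the standard library. The function names, the record layout (`type`, `fields`, `appears_on`), and the sample wikitext with its placeholder DOI are assumptions for illustration, not a proposed schema. Nested templates and the Parsoid pass for incomplete citations are out of scope here, so free-text references are simply stored raw.

```python
import re

# Matches <ref>...</ref> and <ref name="x">...</ref>, but not self-closing <ref ... />.
REF_RE = re.compile(r"<ref(?:\s[^>/]*)?>(.*?)</ref>", re.DOTALL | re.IGNORECASE)
# Matches simple {{cite xxx | ...}} templates without nested templates inside.
CITE_RE = re.compile(r"\{\{\s*cite\s+(\w+)\s*\|(.*?)\}\}", re.DOTALL | re.IGNORECASE)

def parse_citation(template_match):
    """Turn one matched citation template into a dict of named parameters."""
    kind = template_match.group(1).lower()
    fields = {}
    for part in template_match.group(2).split("|"):
        key, sep, value = part.partition("=")
        if sep:  # skip positional parameters
            fields[key.strip().lower()] = value.strip()
    return {"type": kind, "fields": fields}

def extract_citations(page_title, wikitext):
    """Return one record per <ref>, noting the page on which it appears."""
    records = []
    for ref_body in REF_RE.findall(wikitext):
        match = CITE_RE.search(ref_body)
        if match:
            record = parse_citation(match)
        else:
            # Free-text citation: kept raw for later processing.
            record = {"type": "freetext", "fields": {"raw": ref_body.strip()}}
        record["appears_on"] = page_title
        records.append(record)
    return records

# Hypothetical wikitext; the DOI is a placeholder, not a real article.
wikitext = (
    "Aspirin helps.<ref>{{cite journal |title=Aspirin |journal=The Lancet "
    "|doi=10.1000/xyz123}}</ref> More.<ref name=\"b\">Smith, J. (2001). A book.</ref>"
)
records = extract_citations("Aspirin", wikitext)
for record in records:
    print(record)
```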

With that infrastructure in place, a script could compare its index of WikiProjects and articles to entries in this database and provide information on the sources that are used the most by the highest-quality articles. The list could then be coupled with links to the Wikipedia Library "library card" system, itself containing a workflow encouraging people to sign up. (Or if the text is available on Wikisource or Commons, we could link to that. Paging @Daniel_Mietchen)

This approach would allow for applications far beyond WikiProjects, and it would provide a long-term solution to other citation-related issues. For example, when a journal article is retracted, it would be useful to see which articles cite that journal article. It also would help us get insight on sources used on other projects, since different language versions of articles are linked through their Wikidata item.
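The retraction use case amounts to a reverse lookup from identifier to citing pages. A minimal sketch, assuming each citation record is a dict with an optional `doi` and an `appears_on` page title (both names, and the sample DOIs, are hypothetical):

```python
from collections import defaultdict

def build_doi_index(records):
    """Map each DOI to the set of pages that cite it.

    DOIs are case-insensitive, so keys are lower-cased.
    """
    index = defaultdict(set)
    for record in records:
        doi = record.get("doi")
        if doi:
            index[doi.lower()].add(record["appears_on"])
    return index

def pages_citing(index, doi):
    """Return the sorted page titles that cite the given DOI."""
    return sorted(index.get(doi.lower(), ()))

# Hypothetical citation records; the DOIs are placeholders.
records = [
    {"doi": "10.1000/XYZ123", "appears_on": "Aspirin"},
    {"doi": "10.1000/xyz123", "appears_on": "Analgesic"},
    {"doi": "10.1000/abc456", "appears_on": "Aspirin"},
    {"appears_on": "Willow"},  # citation without a DOI
]

index = build_doi_index(records)
print(pages_citing(index, "10.1000/xyz123"))  # ['Analgesic', 'Aspirin']
```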

The Wikipedia Library is very much interested in the historical changes in citation data as well (especially when it comes to most cited works, and our particular partner sources). See https://phabricator.wikimedia.org/T102064

Sadads added a project: The-Wikipedia-Library. Sep 2 2015, 4:38 AM

Harej mentioned this in T111141: Request for Labs project LibraryBase. Sep 2 2015, 4:39 AM

Harej added a parent task: T111141: Request for Labs project LibraryBase.

Harej removed a parent task: T111141: Request for Labs project LibraryBase.

Harej added a subtask: T111141: Request for Labs project LibraryBase.

yuvipanda closed subtask T111141: Request for Labs project LibraryBase as Resolved. Sep 2 2015, 2:57 PM

Harej moved this task from Needs Triage to In Progress on the WikiProject-X board. Sep 2 2015, 9:15 PM

@Halfak Do you know if anyone else would be interested in working with @Harej on the reference database? Do we need to reach out to anyone else to work on that?

Moreover, does your current strategy for extracting the reference data pair well with his intended use of the data?

Also adding @Mvolz, who might be interested in this for Citoid: we could, for instance, fix repeated citation scraping errors coming out of Zotero as editors manually create good citations for that source.

/me puts on his volunteer hat

I'm working on some utilities now that will likely be relevant to extracting and processing <ref>s and identifiers historically. See https://github.com/mediawiki-utilities/python-mwrefs and https://github.com/mediawiki-utilities/python-mwcites . I was just discussing plans with @Harej in Research. I plan to prioritize the tooling you guys need in those utilities.

Thanks @Halfak! Looking forward to you working on this.

I also wanted to add @Jdforrester-WMF . This might be of interest to Citoid, especially if https://phabricator.wikimedia.org/T111141 is the strategy used to back the recommendations.

@Harej what is the timeline or next steps beyond the Wikibase and @Halfak's work?

@Harej Also, just discovered: https://www.wikidata.org/wiki/Wikidata:WikiProject_Source_MetaData We wouldn't want to replicate too much of their effort.

I'm familiar with that project and will be working with them.

Another use for a data set like this: http://arxiv.org/pdf/1509.05631v1.pdf , Verifiability metrics :)

Harej mentioned this in T99046: Retrieve DOI metadata and identify non-resolving DOIs. Oct 2 2015, 3:55 AM

DarTar subscribed. Oct 2 2015, 2:21 PM

Tarrow subscribed. Oct 14 2015, 10:40 AM

Harej moved this task from In Progress to Stalled on the WikiProject-X board. Oct 20 2015, 5:46 AM

Harej moved this task from Stalled to Requests on the WikiProject-X board. Oct 26 2015, 7:06 PM

Hi,

I'd be interested in working on creating a reference database. I'm particularly interested in tracking the usage of PMIDs (I'm about to start a short project with EuropePMC) rather than DOIs but it seems silly to replicate work.

I've got the output of @Halfak's mwcites on a recent dump of enwiki on tools and am just thinking about importing the results into a wikibase install.

@Tarrow, I'm considering adding some generalized metadata extraction to mwcites (and integrating it into the more general mwrefs) at the hackathon. See https://phabricator.wikimedia.org/T114247. I've already got some people working on DOIs. Maybe we could work together on making the metadata extractor for PMIDs easier to use at the hacka-summit. :)

Hello @Tarrow! I would be happy to work with you on integrating your work into Librarybase, a Wikibase instance I set up for exactly this kind of thing: http://librarybase.wmflabs.org

DarTar added a project: WikiCite. Nov 1 2015, 6:48 PM

ThatAndromeda subscribed. Nov 6 2015, 4:09 PM

EdErhart-WMF subscribed. Nov 6 2015, 7:35 PM

jrbs subscribed. Nov 17 2015, 8:59 PM

Hey @Harej, @Tarrow, @Halfak, what's the status on this? Is there a direction and/or progress? Can I help with anything?

I see this as blocked on an initial import to LibraryBase. @Harej, what's the most apt card for that?

@Halfak, that would be this one ---> T120115

Very exciting! Keep up the good work!

Harej moved this task from Requests to Stalled on the WikiProject-X board. Jan 5 2016, 12:57 AM

For presentation, it is worth looking at how they are done for "WikiProject libraries": https://en.wikipedia.org/wiki/Category:WikiProject_libraries

Harej edited projects, added Reports-bot; removed WikiProject-X. Apr 20 2016, 1:22 AM

Harej moved this task from Backlog to Requests on the Reports-bot board. Apr 26 2016, 3:40 AM

Samwalton9-WMF subscribed. Jul 4 2016, 6:11 PM

Harej added a project: VPS-project-Librarybase. Aug 3 2016, 7:31 PM

Harej moved this task from Backlog to Radar on the VPS-project-Librarybase board. Aug 4 2016, 3:29 AM

I have submitted a grant proposal relevant to this task: https://meta.wikimedia.org/wiki/Grants:Project/Harej/Librarybase:_an_online_reference_library

Quiddity mentioned this in T102064: Create a tool to display and filter data from Schema:ExternalLinksChange. Sep 7 2016, 4:12 PM

Quiddity updated the task description. Sep 9 2016, 7:22 PM

Quiddity mentioned this in T120502: Tools for dealing with citations of withdrawn academic journal articles.

Fuzheado subscribed. Nov 18 2016, 5:00 AM

Harej mentioned this in T155846: Reference recommender system. Feb 2 2017, 5:55 AM

Harej removed a project: VPS-project-Librarybase. Jun 21 2018, 8:24 PM

Harej closed subtask T120115: Initial population of Librarybase as Declined.

Samwalton9-WMF removed a project: The-Wikipedia-Library. Jun 3 2020, 8:15 AM

Edgars2007 subscribed. Dec 7 2020, 9:15 PM

Aklapper added a subtask: T99046: Retrieve DOI metadata and identify non-resolving DOIs. Dec 5 2022, 3:33 PM
