
Initial population of Librarybase
Closed, Declined, Public

Description

Goal: to have an initial population of useful citation metadata for some preliminary analyses of source use on Wikipedia. Each source item should include:

  • Which Wikimedia project article(s) it appears on
  • If part of a broader publication, an accompanying item on that publication.
  • IDs for other databases (e.g. DOI) so that Librarybase can be hooked into other services.

This initial population will be done by importing @Halfak's DOI dataset. Once that dataset is ready, we can begin a mass import into Librarybase.
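Before the mass import, the dataset has to be collapsed so that each DOI becomes one source item that lists every article citing it (the first requirement above). The sketch below assumes the dataset is a sequence of (wiki, page title, DOI) rows. That column layout and the function name `group_by_doi` are hypothetical and are not taken from @Halfak's dataset.

```python
from typing import Dict, Iterable, List, Tuple


def group_by_doi(rows: Iterable[Tuple[str, str, str]]) -> Dict[str, List[str]]:
    """Collapse (wiki, page_title, doi) rows into one entry per DOI.

    Hypothetical input layout: each row is one citation of a DOI on one article.
    Returns {normalized_doi: ["wiki:Title", ...]} in first-seen order.
    """
    grouped: Dict[str, List[str]] = {}
    for wiki, title, doi in rows:
        # DOI names are case-insensitive, so normalize them to avoid duplicate items.
        key = doi.strip().lower()
        article = f"{wiki}:{title}"
        articles = grouped.setdefault(key, [])
        # Record each citing article only once per DOI.
        if article not in articles:
            articles.append(article)
    return grouped
```

Each key would then become one Librarybase item, with the list of articles supplying the "appears on" statements.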

Event Timeline

Harej raised the priority of this task to Medium.
Harej updated the task description.
Harej added subscribers: jrbs, EdErhart-WMF, ThatAndromeda and 9 others.
Halfak added a subscriber: Aubrey.

I've already imported a couple hundred items about PMC articles into Librarybase and hope to add the remaining ones cited on English Wikipedia in the next few days.

@jayvdb offered us a dataset he already has in a private Wikibase installation that comprises (correct me if I'm wrong) the entire publication output of Australian public universities. Would anyone (particularly @Harej) mind importing this as well?

[[w:en:Template:Cite doi]] has ~65k subpages, one per DOI, that are relevant to citations in Wikipedia articles. The template itself is deprecated for a number of reasons, but its subpages might be a useful sample to draw on if they can be imported somehow.
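One way to enumerate those subpages is the MediaWiki Action API's `list=allpages` query, restricted to the Template namespace (10) with the prefix `Cite doi/` and followed through its `continue` tokens. The sketch below is a minimal illustration: the function names and the User-Agent string are hypothetical. It also does not attempt to decode DOIs from the subpage titles, because the exact title encoding used by the template is not assumed here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Callable, Dict, Iterator

API_URL = "https://en.wikipedia.org/w/api.php"


def iter_cite_doi_subpages(
    fetch: Callable[[Dict[str, str]], Dict[str, Any]]
) -> Iterator[str]:
    """Yield titles of Template:Cite doi/... subpages, following API continuation."""
    params = {
        "action": "query",
        "format": "json",
        "list": "allpages",
        "apnamespace": "10",  # Template namespace
        "apprefix": "Cite doi/",
        "aplimit": "max",
    }
    while True:
        data = fetch(params)
        for page in data.get("query", {}).get("allpages", []):
            yield page["title"]
        if "continue" not in data:
            break
        # Merge the continuation tokens (e.g. apcontinue) into the next request.
        params = {**params, **data["continue"]}


def http_fetch(params: Dict[str, str]) -> Dict[str, Any]:
    """Perform one GET request against the API (hypothetical User-Agent)."""
    url = API_URL + "?" + urllib.parse.urlencode(params)
    request = urllib.request.Request(
        url, headers={"User-Agent": "librarybase-import-sketch/0.1"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Usage: `titles = list(iter_cite_doi_subpages(http_fetch))`. Injecting `fetch` keeps the paging logic testable without network access.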

At WikiCite, @Halfak came up with a strategy for mass-collecting metadata based on the DOIs extracted from Wikipedia articles. I also worked on matching the Librarybase schema with the Wikidata schema through a Librarybase property, P22, that associates a Wikidata entity with a Librarybase entity. I asked @Tarrow to mass-create the missing Librarybase properties.
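As an illustration of the extraction step only (not necessarily the strategy @Halfak proposed), DOIs can be pulled out of article wikitext with a regular expression modelled on Crossref's guidance for modern DOIs: `10.`, a 4 to 9 digit registrant code, `/`, then a suffix. The pattern and the function name `extract_dois` are assumptions. A simple regex like this will miss some unusual DOIs and may over-capture trailing punctuation.

```python
import re
from typing import List

# Assumed pattern, after Crossref's recommendation for modern DOIs.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def extract_dois(wikitext: str) -> List[str]:
    """Return unique DOIs found in wikitext, in order of first appearance."""
    seen = set()
    dois = []
    for match in DOI_PATTERN.finditer(wikitext):
        # Trim sentence punctuation that the suffix character class also allows.
        doi = match.group(0).rstrip(".,;")
        key = doi.lower()  # DOI names are case-insensitive
        if key not in seen:
            seen.add(key)
            dois.append(doi)
    return dois
```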