
Compile data about books referenced on svwp
Closed, Resolved (Public)

Description

Compile data about books referenced on Swedish Wikipedia (svwp).

Dumps: https://dumps.wikimedia.org/svwiki/

First, extract all uses of Template:Bokref: https://sv.wikipedia.org/wiki/Template:Bokref
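A minimal sketch of the extraction step, assuming the page wikitext has already been pulled out of the dump as a string. It uses only the standard library and a simple brace counter rather than a full wikitext parser, so unusual markup may trip it up. The function names are made up for illustration, and apart from `isbn` and `libris` the parameter names in the sample are guesses, not checked against the template documentation.

```python
import re


def find_templates(wikitext, name="bokref"):
    """Return the raw text of each {{name ...}} call found in wikitext.

    Braces are counted so that templates nested inside a parameter value
    do not end the match early. Matching on the name ignores case.
    """
    pattern = re.compile(r"\{\{\s*" + re.escape(name) + r"\s*[|}]", re.IGNORECASE)
    results = []
    pos = 0
    while True:
        match = pattern.search(wikitext, pos)
        if match is None:
            break
        start = match.start()
        depth = 0
        i = start
        end = None
        while i < len(wikitext) - 1:
            pair = wikitext[i:i + 2]
            if pair == "{{":
                depth += 1
                i += 2
            elif pair == "}}":
                depth -= 1
                i += 2
                if depth == 0:
                    end = i
                    break
            else:
                i += 1
        if end is None:
            # Unbalanced braces: skip this opening and keep searching.
            pos = match.end()
            continue
        results.append(wikitext[start:end])
        pos = end
    return results


def split_top_level(inner):
    """Split template content on '|' that is not inside {{...}} or [[...]]."""
    parts, buf, depth, i = [], [], 0, 0
    while i < len(inner):
        pair = inner[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            buf.append(pair)
            i += 2
        elif pair in ("}}", "]]"):
            depth -= 1
            buf.append(pair)
            i += 2
        elif inner[i] == "|" and depth == 0:
            parts.append("".join(buf))
            buf = []
            i += 1
        else:
            buf.append(inner[i])
            i += 1
    parts.append("".join(buf))
    return parts


def parse_params(template_text):
    """Return the named parameters of one template call as a dict."""
    params = {}
    # Drop the outer braces, then skip the first part (the template name).
    for part in split_top_level(template_text[2:-2])[1:]:
        key, sep, value = part.partition("=")
        if sep:
            params[key.strip().lower()] = value.strip()
    return params


# Hypothetical sample text; real input would come from the svwiki dump.
sample = (
    "Text {{bokref |titel=En bok |isbn=91-630-5075-7 |libris=123}} more "
    "{{Bokref|författare=[[A|B]]|år=1999}} {{annan|x=1}}"
)
templates = find_templates(sample)
parsed = [parse_params(t) for t in templates]
```

For a full dump run, a dedicated wikitext parser is probably more robust than this brace counter; the sketch is only meant to show the shape of the step.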

Then try to identify the unique books used. Obviously the same book can be referred to by different template usages, so we need to identify books via ISBN and Libris number.
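One way to sketch the deduplication, assuming each template call has already been parsed into a dict with optional `isbn` and `libris` keys (those key names are an assumption based on the stats in this task). ISBNs are normalized to ISBN-13 so that hyphenated, ISBN-10 and ISBN-13 spellings of the same number collapse into one key. `normalize_isbn` and `book_key` are illustrative names, not existing code.

```python
import re


def _isbn13_check_digit(first12):
    """Compute the ISBN-13 check digit for a 12-digit string."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def _isbn10_is_valid(s):
    """Validate an ISBN-10 string (digits, with an optional final 'X')."""
    if not re.fullmatch(r"\d{9}[\dX]", s):
        return False
    total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(s))
    return total % 11 == 0


def normalize_isbn(raw):
    """Return the ISBN as 13 digits, or None if it does not validate."""
    s = re.sub(r"[^0-9Xx]", "", raw).upper()
    if len(s) == 10 and _isbn10_is_valid(s):
        core = "978" + s[:9]
        return core + _isbn13_check_digit(core)
    if len(s) == 13 and s.isdigit() and _isbn13_check_digit(s[:12]) == s[12]:
        return s
    return None


def book_key(params):
    """Pick an identity key for one template call: Libris ID first, then ISBN."""
    libris = params.get("libris", "").strip()
    if libris:
        return ("libris", libris)
    isbn = normalize_isbn(params.get("isbn", ""))
    if isbn:
        return ("isbn", isbn)
    return None


# Hypothetical parsed template calls.
calls = [
    {"isbn": "91-630-5075-7"},
    {"isbn": "9163050757"},
    {"isbn": "978-91-630-5075-6"},
    {"libris": "123", "isbn": "91-630-5075-7"},
    {"titel": "No identifiers"},
]
unique_keys = {book_key(c) for c in calls} - {None}
```

Note that this keeps a Libris-keyed call and an ISBN-keyed call for the same book as two separate entries; merging those would need a lookup from ISBN to Libris ID.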

Look at T205384 which probably includes large chunks of code/workflow to do this. https://figshare.com/articles/Wikipedia_Scholarly_Article_Citations/1299540 was used previously to grab ISBN sources, but it's old :)

Event Timeline

Downloaded all the instances of {{bokref}} from the Oct 1 dump and ran some quick stats.

All template instances: 60,661
Using libris: 13,038
Unique Libris IDs used: 5,567

Using isbn: 27,937
Unique isbn: 13,511

Of the roughly 5,500 unique Libris IDs, some will not be included in NB. Assuming half of them are in NB, that's still about 2,500–3,000 entries. There's also an unknown number of instances where only the ISBN, not the Libris ID, was given and the book is in fact included in NB. Those can be resolved to Libris IDs using the API, e.g. http://api.libris.kb.se/xsearch?query=ISBN:9163050757&format=json
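A rough sketch of the ISBN lookup against the xsearch endpoint mentioned above. The URL and the `query` and `format` parameters come from the example link. The response layout is an assumption: records are taken to sit under `xsearch` and then `list`, each with an `identifier` URL ending in `/bib/<id>`. That should be checked against a real response before relying on it. All function names are illustrative.

```python
import json
import re
import urllib.parse
import urllib.request

XSEARCH_URL = "http://api.libris.kb.se/xsearch"


def build_xsearch_url(isbn):
    """Build the xsearch query URL for one ISBN, asking for JSON."""
    query = urllib.parse.urlencode({"query": "ISBN:" + isbn, "format": "json"})
    return XSEARCH_URL + "?" + query


def libris_ids_from_response(data):
    """Collect Libris IDs from a decoded response.

    ASSUMPTION: records live in data["xsearch"]["list"] and each carries an
    "identifier" URL of the form .../bib/<id>. Verify on a real response.
    """
    ids = []
    for record in data.get("xsearch", {}).get("list", []):
        match = re.search(r"/bib/([^/?#]+)$", record.get("identifier", ""))
        if match:
            ids.append(match.group(1))
    return ids


def resolve_isbn(isbn, timeout=10):
    """Fetch the xsearch result for an ISBN and return any Libris IDs found."""
    with urllib.request.urlopen(build_xsearch_url(isbn), timeout=timeout) as resp:
        data = json.loads(resp.read().decode("utf-8"))
    return libris_ids_from_response(data)


# Hand-made response in the assumed layout (the ID is a placeholder).
fake_response = {"xsearch": {"list": [{"identifier": "http://libris.kb.se/bib/0000000"}]}}
example_url = build_xsearch_url("9163050757")
example_ids = libris_ids_from_response(fake_response)
```

With 13,511 unique ISBNs to look up, it would be sensible to cache results and pause between requests rather than querying in a tight loop.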