
Understand steps to get structured data from Excel-workbook into Wikidata
Closed, Resolved (Public)

Assigned To
None
Authored By
OlafJanssen
Oct 11 2018, 11:26 AM
Referenced Files
F26889828: afbeelding.png
Oct 27 2018, 2:21 PM
F26889597: afbeelding.png
Oct 27 2018, 2:18 PM
F26887125: Knipsel.PNG
Oct 27 2018, 1:48 PM
F26887174: openRefine schema.png
Oct 27 2018, 1:43 PM
F26519479: 0003-BookSpot-ECI-AKO.xlsx
Oct 11 2018, 11:33 AM

Description

I have an Excel workbook (F26519479, with 5 sheets) containing structured data about Dutch literary awards.

For each award:

  1. Overall info about the award: name, description, years, initiator, sponsors, organizers, namesakes, Wikipedia URL, Wikidata Q-numbers, etc.
  2. Detailed info per year: prize money, date and venue of award ceremony, related WP and WD URLs
  3. Nominees (short list): author names, title of nominated work, related WP and WD URLs
  4. Winners: author names, title of winning work, related WP and WD URLs
  5. Jury members: names, roles, related WP and WD URLs

See the sample Excel workbook (F26519479: 0003-BookSpot-ECI-AKO.xlsx) for a better understanding; it describes the BookSpot award (and its predecessors).

This is a much richer dataset than what is currently recorded in most Wikidata items about Dutch literary awards (such as the item about the BookSpot award).

I want to gain a better understanding of the steps needed to bring the data from the Excel workbook into Wikidata.

Event Timeline

Harmonia_Amanda subscribed.

Come see me at some point, I can help you!
Basically:

  • format your spreadsheet in a Wikidata-friendly way
  • use CSV2QS to convert to QuickStatements (if the data is too complex to be handled directly by QS)
  • run QuickStatements
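The conversion step in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not what CSV2QS itself does: the column names `author_qid`, `award_qid` and `year` are hypothetical, the Q-numbers in the sample are placeholders to be replaced with the real items, and the sketch assumes the QuickStatements V1 text format (tab-separated item, property, value, optionally followed by qualifier pairs).

```python
import csv
import io
from typing import List

AWARD_RECEIVED = "P166"  # Wikidata property "award received"
POINT_IN_TIME = "P585"   # Wikidata property "point in time", used here as a qualifier


def year_to_qs_time(year: str) -> str:
    """Format a year as a QuickStatements time value with year precision (/9)."""
    return f"+{int(year):04d}-01-01T00:00:00Z/9"


def rows_to_quickstatements(csv_text: str) -> List[str]:
    """Turn each CSV row into one tab-separated QuickStatements V1 command."""
    commands = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # Column names are assumptions made for this sketch.
        author = (row.get("author_qid") or "").strip()
        award = (row.get("award_qid") or "").strip()
        year = (row.get("year") or "").strip()
        if not author or not award:
            continue  # skip rows whose Wikidata items are not identified yet
        parts = [author, AWARD_RECEIVED, award]
        if year:
            parts += [POINT_IN_TIME, year_to_qs_time(year)]
        commands.append("\t".join(parts))
    return commands


# Placeholder Q-numbers; the second row has no author item and is skipped.
SAMPLE = """author_qid,award_qid,year
Q4115189,Q13406268,2018
,Q13406268,2017
"""

if __name__ == "__main__":
    print("\n".join(rows_to_quickstatements(SAMPLE)))
```

The printed lines can be pasted into QuickStatements as V1 commands. Rows without a Q-number are skipped on purpose, since those items have to be found or created first.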

Edit: the most difficult part is usually not learning how to format the spreadsheet but identifying all the relevant Wikidata items.

OlafJanssen updated the task description.

Thanks for the very fast response, Harmonia_Amanda.

I'm not yet familiar with QS, but I'll check it out before the event.

I'll also gather a list of all relevant WD items (both already existing and yet to be made) related to that BookSpot award (actually, they are already in the Excel workbook).

I seem to have removed you from this task, which I did not intend. I'll assign it to you again...

This tool (https://tools.wmflabs.org/ash-dev/wdutils/csv2quickstatements.php) has examples of how to format things. The missing items can be created using the same method and then used to complete the rest.

Last week I discovered OpenRefine (www.openrefine.org), which was pretty much exactly the tool I was looking for. It has built-in support for Wikidata and QuickStatements, and you can very easily publish data from an Excel workbook to Wikidata.

Hi guys,

I would very much like to join you, since I have a similar task I want to work on.
See T207839
Olaf, I have started working with OpenRefine too, but if you could show me how reconciliation works I would really appreciate it!

See you on Friday!

Hi SIryn,

Reconciliation is exactly what I'm figuring out at the moment as well. I find these manuals quite useful:

Hi!

I will be available to help with OpenRefine. It is indeed designed for exactly this workflow, so I hope it will be a good match :)
For reconciliation help, have you seen this page?
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation

If anything there is unclear, I would be interested to know how to improve it :)

Antonin
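For readers who want a feel for what matching names to Wikidata items involves, here is a rough Python sketch. It is not how OpenRefine's reconciliation works internally; it is a simplified stand-in that builds a label search request for the Wikidata `wbsearchentities` API and lists the candidate items from a response. The function names, the default language `nl`, and the sample response (including its Q-number and label) are assumptions invented for illustration.

```python
import json
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def build_search_url(name, language="nl", limit=5):
    """Build a wbsearchentities URL that searches item labels for `name`."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def candidates(response_text):
    """Extract (id, label, description) tuples from a search response."""
    data = json.loads(response_text)
    return [
        (hit["id"], hit.get("label", ""), hit.get("description", ""))
        for hit in data.get("search", [])
    ]


# Invented sample response with a placeholder Q-number, for offline use.
SAMPLE_RESPONSE = json.dumps(
    {"search": [{"id": "Q4115189", "label": "Example Author", "description": "placeholder"}]}
)

if __name__ == "__main__":
    print(build_search_url("Example Author"))
    print(candidates(SAMPLE_RESPONSE))
```

The URL can be fetched with any HTTP client. A human still has to choose among the candidates, which is the part OpenRefine's reconciliation interface makes convenient.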

Great, thank you @Pintoch and @OlafJanssen! I look forward to meeting you!

I uploaded the first 50 of the 6000+ writers in the file!

This item can be closed; I now know how to achieve the goals mentioned using OpenRefine.