
Understand steps to get structured data from Excel-workbook into Wikidata
Closed, Resolved, Public

Description

I have an Excel workbook (attached as F26519479) with 5 sheets containing structured data about Dutch literary awards.

For each award:

  1. Overall info about the award: name, description, years, initiator, sponsors, organizers, name-givers, Wikipedia URLs, Wikidata Q-numbers, etc.
  2. Detailed info per year: prize money, date and venue of award ceremony, related WP and WD URLs
  3. Nominees (short list): author names, title of nominated work, related WP and WD URLs
  4. Winners: author names, title of winning work, related WP and WD URLs
  5. Jury members: names, roles, related WP and WD URLs

See this sample Excel file for a better understanding; it describes the BookSpot award (and its predecessors).

This is a much richer dataset than what is currently listed in most Wikidata items about Dutch literary awards (such as the item about the BookSpot award).

I want to better understand the steps needed to bring the data from the Excel workbook into Wikidata.

Event Timeline

OlafJanssen updated the task description.
Harmonia_Amanda claimed this task. Edited Oct 11 2018, 11:29 AM
Harmonia_Amanda added a subscriber: Harmonia_Amanda.

Come see me at some point, I can help you!
Basically:

  • format your spreadsheet in a Wikidata-friendly way
  • use CSV2QS to convert it to QuickStatements (if the data is too complex for QS to handle directly)
  • run QuickStatements
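To make the "Wikidata-friendly spreadsheet → QuickStatements" step concrete, here is a minimal sketch that turns rows of a Winners sheet (assumed to be exported to CSV with hypothetical columns `year`, `author_qid`, `work_qid`) into QuickStatements V1 commands: one tab-separated line per statement, with qualifiers appended as property/value pairs. The property choices (P1346 "winner", P585 "point in time", P1686 "for work") are assumptions to verify on Wikidata before a real upload, and all Q-ids shown are hypothetical placeholders.

```python
from __future__ import annotations

import csv
import io

# Assumed property choices; check them on Wikidata before uploading anything.
P_WINNER = "P1346"        # winner (statement on the award item)
P_POINT_IN_TIME = "P585"  # point in time (qualifier)
P_FOR_WORK = "P1686"      # for work (qualifier)


def qs_year(year: int) -> str:
    """Year-precision time value (/9); Wikidata stores such dates with month and day zeroed."""
    return f"+{year:04d}-00-00T00:00:00Z/9"


def winners_to_qs(award_qid: str, rows) -> list[str]:
    """Build one QuickStatements V1 line per winner row: item, property, value, qualifier pairs."""
    lines = []
    for row in rows:
        parts = [award_qid, P_WINNER, row["author_qid"],
                 P_POINT_IN_TIME, qs_year(int(row["year"]))]
        if row.get("work_qid"):  # the winning work is optional in this sketch
            parts += [P_FOR_WORK, row["work_qid"]]
        lines.append("\t".join(parts))
    return lines


if __name__ == "__main__":
    # Hypothetical sample data; real rows would come from the exported Winners sheet.
    sample = io.StringIO("year,author_qid,work_qid\n2018,Q_AUTHOR_1,Q_WORK_1\n")
    for line in winners_to_qs("Q_AWARD", csv.DictReader(sample)):
        print(line)
```

The output lines can be pasted into the QuickStatements input box; similar functions could cover the Nominees and Jury sheets with different properties.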

Edit: the most difficult part is usually not learning how to format the spreadsheet; it's identifying all the relevant Wikidata items.
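One way to start identifying existing items is the Wikidata API's `wbsearchentities` action, which returns candidate items for a search string. The sketch below only builds the request URL and parses a response, so it runs offline; fetching the URL (for example with `urllib.request`) is left out, and the sample response is invented for illustration. Candidates still need a human check, since names of authors and works are often ambiguous.

```python
import json
from urllib.parse import urlencode

API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "nl", limit: int = 5) -> str:
    """URL for a wbsearchentities query looking for items whose label or alias matches `name`."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def candidates(response_json: str):
    """Extract (id, label, description) tuples from a wbsearchentities JSON response."""
    data = json.loads(response_json)
    return [(hit["id"], hit.get("label", ""), hit.get("description", ""))
            for hit in data.get("search", [])]


if __name__ == "__main__":
    print(search_url("BookSpot"))
    # Invented response, shaped like the API's output, for offline illustration only.
    fake = '{"search": [{"id": "Q_EXAMPLE", "label": "BookSpot", "description": "example"}]}'
    print(candidates(fake))
```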

OlafJanssen removed Harmonia_Amanda as the assignee of this task. Oct 11 2018, 11:33 AM
OlafJanssen updated the task description.
OlafJanssen updated the task description. Oct 11 2018, 11:38 AM
OlafJanssen added a comment. Edited Oct 11 2018, 11:43 AM

Thanks for the very fast response, Harmonia_Amanda.

I'm not yet familiar with QS, but I'll check it out before the event.

I'll also gather a list of all relevant Wikidata items (both already existing and yet to be created) related to the BookSpot award (actually, they are already in the Excel file).

I seem to have removed you from this task, which I did not intend. I'll assign it to you again.

This tool (https://tools.wmflabs.org/ash-dev/wdutils/csv2quickstatements.php) has examples of how to format things. Missing items can be created with the same method and then used to complete the rest.
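For the "missing items" part, QuickStatements V1 has a `CREATE` command, after which `LAST` refers to the newly created item; labels and descriptions are set with `L<lang>` and `D<lang>` followed by a quoted string. The helper below is a hypothetical sketch that emits such a block; the example label and description are invented, and Q5 ("human") is used as the `instance of` (P31) value for a writer.

```python
def create_item_commands(label: str, lang: str, description: str, instance_of: str) -> list[str]:
    """QuickStatements V1 lines that create a new item with a label, description and P31 value."""
    for text in (label, description):
        # Tabs, newlines or double quotes would break the tab-separated, quoted-string format.
        if any(ch in text for ch in '\t\n"'):
            raise ValueError(f"unsupported character in {text!r}")
    return [
        "CREATE",
        f'LAST\tL{lang}\t"{label}"',
        f'LAST\tD{lang}\t"{description}"',
        f"LAST\tP31\t{instance_of}",
    ]


if __name__ == "__main__":
    # Invented example writer; replace with a real row from the spreadsheet.
    print("\n".join(create_item_commands("Jan Voorbeeld", "nl", "Nederlands schrijver", "Q5")))
```

Once QuickStatements has created the item and assigned it a Q-id, that Q-id can be written back into the spreadsheet and reused in the statements about the award.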

Last week I discovered OpenRefine (www.openrefine.org), which was pretty much exactly the tool I was looking for. It has built-in support for Wikidata and QuickStatements, and you can easily publish data from an Excel file to Wikidata.

SIryn added a subscriber: SIryn. Oct 24 2018, 10:07 AM

Hi guys,

I would very much like to join you, since I have a similar task I want to work on.
See T207839.
Olaf, I have started working with OpenRefine too, but if you could show me how reconciliation works I would really appreciate it!

See you on Friday!

Hi SIryn,

reconciliation is exactly what I'm figuring out (and fiddling with) at the moment as well. I find these manuals quite useful:

Hi!

I will be available to help with OpenRefine. It is designed for exactly this workflow, so I hope it will be a good match :)
For reconciliation help, have you seen this page?
https://github.com/OpenRefine/OpenRefine/wiki/Reconciliation
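Under the hood, OpenRefine reconciliation talks to a reconciliation service: it sends a batch of queries (each with the cell text, an optional type such as Q5 "human", and a result limit) and receives ranked candidates, some flagged as confident matches. The sketch below builds such a batch and picks the auto-matched candidate per name; it is an illustration of the general query/response shape, not OpenRefine's own code, and the sample response is invented. The service endpoint is deliberately not hardcoded here; OpenRefine uses whichever Wikidata reconciliation service is configured.

```python
import json


def build_queries(names, type_qid="Q5", limit=3) -> str:
    """JSON batch of reconciliation queries keyed q0, q1, ..., one per cell value."""
    return json.dumps({f"q{i}": {"query": name, "type": type_qid, "limit": limit}
                       for i, name in enumerate(names)})


def best_matches(response_json: str, names):
    """Map each name to the id of its first candidate flagged as a match, or None."""
    data = json.loads(response_json)
    result = {}
    for i, name in enumerate(names):
        candidates = data.get(f"q{i}", {}).get("result", [])
        matched = [c for c in candidates if c.get("match")]
        result[name] = matched[0]["id"] if matched else None
    return result


if __name__ == "__main__":
    names = ["Author One", "Author Two"]  # hypothetical cell values
    print(build_queries(names))
    # Invented service response: only the first name has a confident match.
    fake = json.dumps({
        "q0": {"result": [{"id": "Q_A1", "name": "Author One", "score": 100, "match": True}]},
        "q1": {"result": [{"id": "Q_B7", "name": "Author 2", "score": 60, "match": False}]},
    })
    print(best_matches(fake, names))
```

Names that come back as `None` are the ones that need manual review in OpenRefine, or new items created as described earlier in this thread.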

If there is anything unclear there I would be interested to know how to improve it :)

Antonin

SIryn added a comment. Oct 24 2018, 2:51 PM

Great, thank you @Pintoch and @OlafJanssen! I look forward to meeting you!

Dja added a subscriber: Dja. Oct 27 2018, 7:12 AM
Dja added a comment. Oct 27 2018, 1:08 PM

Uploaded the first 50 of the 6,000+ writers in the file!

Woooohooooo!!!!

Pintoch rescinded a token.

This item can be closed; by now I know how to achieve the goals mentioned above using OpenRefine.

OlafJanssen closed this task as Resolved.Jun 17 2019, 10:56 AM