
Run "DCAT-AP for Wikibase" after creating Wikidata dumps and make the resulting RDF publicly available
Closed, Resolved, Public

Description

This basically means:

Event Timeline

hoo raised the priority of this task to Needs Triage.
hoo updated the task description.
hoo updated the task description.
hoo set Security to None.
hoo added subscribers: hoo, ArielGlenn, Lydia_Pintscher, Aklapper.
hoo added a subscriber: Lokal_Profil.

So let's get the code into Gerrit so it can be reviewed; I can be one reviewer, but if you could add a WMDE PHP dev, that would be good too.

I suggest the script go into modules/snapshot/files where the dumpwikidata*sh scripts are.

Just double-checking:

So I should stick it into a folder in modules/snapshot/files in the operations/puppet repo?

Also, I'm assuming I should squash everything into one commit?

Yes and yes, that would be best.

Now up at gerrit:219800

Just a reflection (from looking at the other scripts):
The alternative would be to put the PHP in mediawiki/extensions/Wikibase/repo/maintenance, put the config.json in modules/snapshot/files, and call the script from a bash script in modules/snapshot/files.

@hoo, would this make more sense as a maintenance script for Wikibase? Let's continue the review where it is anyway, and if it needs to be moved, we'll move it.

Change 219800 had a related patch set uploaded (by Lokal Profil):
Add DCAT-AP for Wikibase

https://gerrit.wikimedia.org/r/219800

I'll be at the Wikimania hackathon, so if anyone has time to do a code review during (or before) the hackathon, I should be able to iterate on the feedback straight away.

I'm up for that! Feel free to poke me there.

Perfect! I'll add the hack tag to this, just in case.

A new patch is out (I don't know why gerritbot hasn't posted about it here).

Additionally, https://validator.dcat-editor.com/ can be used to validate the DCAT-AP output (although it is a bit off on the warnings related to optional parameters).
Example output based on the latest build is available at lokal-profil / dcat-wikidata.rdf.

I'm a bit unsure about whether the i18n files can live in their current place and still be translated via e.g. translatewiki. Maybe @siebrand can chip in on that?

What is the status of this task, now that Wikimania 2015 is over? As this task is in the "Backlog" column of the #Wikimania-Hackathon-2015 project's workboard: Did this task take place and was successfully finished? If yes: Please provide an update (and if the task is not completely finished yet, please move the project to the "Work continues after Mexico City" column on the #Wikimania-Hackathon-2015 workboard). If no: Please edit this task by removing the #Wikimania-Hackathon-2015 project from this task. Thanks for your help and keeping this task updated!

We poked at the script a lot, and @Lokal_Profil added i18n via translatewiki to it. I'd still like someone to have a look at the output (I'm not really an expert in that field and don't have the time to read the specs; I guess @daniel could do that). Apart from that, we're done: the change just needs to be merged, and the script needs to be invoked after dump creation.

Change 219800 merged by ArielGlenn:
Add DCAT-AP for Wikibase

https://gerrit.wikimedia.org/r/219800

hoo claimed this task.

Done: https://dumps.wikimedia.org/wikidatawiki/entities/dcatap.rdf

The RDF will be updated after each Wikidata dump run.

I just spotted that none of the nodes embedding information from the i18n directory (e.g. dcterms:description) were included in the RDF, not even the English ones.

Was there maybe an issue with access restriction to that directory during the run?

No, I guess the problem is that you assume the i18n directory is in the current working directory, but that's not the case for our script runs. You should change that to use the absolute path (__DIR__) to get the i18n files.
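
The script itself is PHP and its code isn't reproduced in this task. As a language-neutral illustration of the suggested fix, here is a minimal Python sketch in which `Path(__file__).resolve().parent` plays the role of PHP's `__DIR__`. The layout (an `i18n/` directory next to the script with one `<lang>.json` file per language) and the function names `i18n_dir` and `load_messages` are hypothetical, not taken from the actual script.

```python
import json
from pathlib import Path


def i18n_dir(script_file: str) -> Path:
    """Absolute path of the i18n directory sitting next to ``script_file``.

    Pass ``__file__`` here: the result then depends only on where the script
    lives, not on the working directory of whatever process launched it
    (the Python counterpart of building the path from PHP's __DIR__).
    """
    return Path(script_file).resolve().parent / "i18n"


def load_messages(script_file: str, lang: str) -> dict:
    """Load the hypothetical i18n/<lang>.json file next to ``script_file``.

    open() raises FileNotFoundError when the file is missing, so a wrong
    path fails loudly instead of silently yielding output without messages.
    """
    path = i18n_dir(script_file) / f"{lang}.json"
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Inside the script this would be called as `load_messages(__file__, "en")`. Letting the lookup fail loudly, rather than skipping a missing language, means a bad path surfaces immediately instead of producing RDF with no descriptions at all.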

Ah. I'll update and submit a new patch.

Change 229128 had a related patch set uploaded (by Lokal Profil):
Look for i18n using absolute path

https://gerrit.wikimedia.org/r/229128

Change 229128 merged by ArielGlenn:
Look for i18n using absolute path

https://gerrit.wikimedia.org/r/229128

hoo removed a project: Patch-For-Review.

Hello,

I just noticed your project. According to the DCAT-AP specification (https://joinup.ec.europa.eu/node/146653), the properties issued and modified should have the datatype date or dateTime.

Best,

Emidio
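
To illustrate what such typed literals look like, here is a minimal Python sketch using `xml.etree.ElementTree`, assuming the output is RDF/XML (as the `.rdf` extension of dcatap.rdf suggests); the actual script is PHP and may build its output differently. The helper `dated_property` is hypothetical: it writes `dct:issued` or `dct:modified` with `rdf:datatype` set to `xsd:date` or `xsd:dateTime` depending on the Python value passed in. The dates in the demo are made up.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
XSD = "http://www.w3.org/2001/XMLSchema#"

# Serialise with the conventional prefixes instead of ns0, ns1, ...
ET.register_namespace("rdf", RDF)
ET.register_namespace("dct", DCT)
ET.register_namespace("dcat", DCAT)


def dated_property(parent: ET.Element, name: str, value: date) -> ET.Element:
    """Append dct:<name> (e.g. issued, modified) as a typed date literal.

    datetime is tested first because datetime is a subclass of date.
    """
    if isinstance(value, datetime):
        datatype = XSD + "dateTime"
    elif isinstance(value, date):
        datatype = XSD + "date"
    else:
        raise TypeError(f"expected date or datetime, got {type(value).__name__}")
    element = ET.SubElement(parent, f"{{{DCT}}}{name}", {f"{{{RDF}}}datatype": datatype})
    element.text = value.isoformat()  # ISO 8601 text, valid xsd:date / xsd:dateTime
    return element


if __name__ == "__main__":
    # Made-up dates, for illustration only.
    dataset = ET.Element(f"{{{DCAT}}}Dataset")
    dated_property(dataset, "issued", date(2015, 11, 2))
    dated_property(dataset, "modified", datetime(2015, 11, 2, 12, 30))
    print(ET.tostring(dataset, encoding="unicode"))
```

Choosing the datatype from the Python type keeps the two allowed forms apart: a plain `date` never gets labelled `xsd:dateTime`, and vice versa.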

Looks like this was something we (and my reference work) had missed. I've broken this out as T117533.