
Change management of entities created and deleted in Europeana
Closed, Declined (Public)

Assigned To
Authored By
Apr 28 2020, 8:20 AM
Referenced Files
F31934878: image.png
Jul 14 2020, 10:04 AM
F31863783: image.png
Jun 13 2020, 7:07 AM
F31802183: image.png
May 4 2020, 4:43 PM
F31791345: image.png
May 2 2020, 6:24 AM
F31791343: image.png
May 2 2020, 6:24 AM
F31785246: image.png
Apr 28 2020, 4:05 PM
F31784103: image.png
Apr 28 2020, 10:05 AM
F31784100: image.png
Apr 28 2020, 10:05 AM


Email sent to Antoine

Hi Antoine
We now need somewhere to discuss entity management between two loosely coupled systems like Wikidata and Europeana.
I am now trying to push the Europeana concept to nl:Wikipedia together with a new link model, and then I start seeing

that Wikidata P7704 has a "same as" link to a Europeana Agent, but the target at Europeana has been deleted...

My suggestion follows what I did this week with the people (see link) that have had a lot of link rot: we now agree that they take responsibility for fixing this by creating a unique ID for every Nobel Prize winner and by maintaining correct HTML links, see T251055: Template Nobelprize winners Wikidata property P8024 and new link model to
I suggest that Europeana takes responsibility for the change management of the Europeana entity in Wikidata Property:P7704, i.e. when you delete or create an entity, you also update Wikidata.

Examples of issues

Regards Magnus Sälgö

Event Timeline

Salgo60 updated the task description.

Answer from Europeana: @Hmangas (Hugo Manguinhas, Platform Services Product Manager API)

image.png (730×1 px, 142 KB)

I have asked others for opinions in the Wikidata Telegram group

image.png (468×1 px, 173 KB)

  • Wikidata can be accessed in different ways
    • SPARQL federation see Nobel prize
    • Video showing how easily you can generate code in Python, JavaScript, PHP, Java, Perl, Ruby... from a SPARQL query
    • when I updated Wikidata I wrote a Python program that web-scraped Europeana, and then I updated Wikidata in batches using the tool QuickStatements, see my latest batches
    • a new approach I have seen from @Andrawaag is to use Wikibase and create a staging area that is updated only by Europeana; Wikidata is then populated from that. The Wikibase installation is write-protected, and you get versioning and SPARQL with federation...
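As a rough illustration of the SPARQL route above, here is a minimal Python sketch that only builds the request URL for the public Wikidata Query Service. The function names and the LIMIT value are my own assumptions for illustration; nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public Wikidata Query Service endpoint.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_p7704_query(limit=100):
    """Return a SPARQL query listing items that carry a P7704 value.

    The function name and the LIMIT default are illustrative assumptions.
    """
    return (
        "SELECT ?item ?agent WHERE { "
        "?item wdt:P7704 ?agent . "
        "} LIMIT %d" % int(limit)
    )


def build_request_url(query):
    """Encode the query as a GET URL that asks for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


# Only builds and prints the URL; fetching it is left to the caller.
url = build_request_url(build_p7704_query(limit=10))
print(url)
```

The resulting URL can be fetched with any HTTP client; the same query text can also be pasted into the query service web interface.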

@David_Haskiya_WMSE @Jopparn do you have any suggestions for a data round-trip process and tools for entity management between an open system like Wikidata and a source like Europeana?

Other tools:

image.png (1×1 px, 77 KB)

see also EPIC T202530: [Epic] Feedback processes and tools for data-providers

Ping @Mohammed_Sadat_WMDE (webpage) to get you in the loop on whether Wikibase could be used as a staging area between Wikidata and Europeana. In version one it is for managing entities of people (Europeana agents).

  • Scenarios: create, merge, delete
    1. Europeana creates a new agent that already exists in Wikidata but doesn't have Property P7704 set
    2. Europeana deletes an agent that exists in Wikidata and has Property P7704 set
    3. Wikidata merges objects that exist in Europeana and have "same as" Wikidata
    4. Wikidata deletes an object that exists in Europeana and has "same as" Wikidata
    5. Europeana creates a new agent that doesn't exist in Wikidata
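To make the scenarios above a bit more concrete, here is a minimal Python sketch of how some of them could be detected by comparing two snapshots. Everything in it is an assumption for illustration: the function name, the snapshot format, and the made-up IDs. It cannot tell scenario 1 from scenario 5 (that needs real matching), and it does not separate a merge from a delete on the Wikidata side.

```python
def classify_changes(old_agents, new_agents, old_links, new_links):
    """Compare two snapshots and group the changes that need follow-up.

    old_agents / new_agents: sets of Europeana agent ids (before / after).
    old_links / new_links: dicts mapping a Wikidata QID to its P7704 value.
    All names and the data layout are hypothetical.
    """
    linked_now = set(new_links.values())
    created = new_agents - old_agents
    deleted = old_agents - new_agents
    return {
        # Scenario 1 or 5: new agent with no P7704 link yet, needs matching.
        "created_unlinked": sorted(created - linked_now),
        # Scenario 2: Wikidata still points at an agent Europeana deleted.
        "deleted_still_linked": sorted(
            qid for qid, agent in new_links.items() if agent in deleted
        ),
        # Scenario 3 or 4: the linking item disappeared, the agent remains.
        "wikidata_item_gone": sorted(
            agent
            for qid, agent in old_links.items()
            if qid not in new_links and agent in new_agents
        ),
    }


# Made-up ids, for illustration only.
old_agents = {"agent/base/1", "agent/base/2", "agent/base/3"}
new_agents = {"agent/base/1", "agent/base/3", "agent/base/4"}
old_links = {"Q1": "agent/base/1", "Q2": "agent/base/2", "Q3": "agent/base/3"}
new_links = {"Q1": "agent/base/1", "Q2": "agent/base/2"}

result = classify_changes(old_agents, new_agents, old_links, new_links)
print(result)
```

Each bucket would then feed its own follow-up step, e.g. a matching job for the unlinked agents and a clean-up batch for the stale links.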

I don't know whether using Wikibase is the way forward, but I like the concept more and more: a "staging area" using Wikibase where e.g. Europeana can upload the data, and where we can compare it and also comment on individual records using standard Wikibase functionality with version history...

Why not have a WikibaseP7704 that is a staging area for WD property P7704?

Today's solution

  • my guess is that reinventing the wheel for every new data provider doesn't scale well with quality (we have more than 4000 (5962) external properties)
  • my experience with OCLC, Nobelprize, ... is that they don't have version history as we have in a wiki, or the possibility to discuss items and ping people...
  • by having a staging area outside Wikidata we will also have federation and can easily compare the data delivered to Wikibase with what is in Wikidata --> we can more easily detect vandalism in Wikidata by comparing it with what was delivered...
    • At Wikidata Berlin 2017 (video at 42 min) I mentioned a vision that the infobox in Wikipedia articles should indicate whether the data displayed is the same as what a "trusted" source delivered; maybe this is step 1 towards a solution like that.

My guess is that linked data needs linked people. I raised this problem at SWIBL18

Possible next step:

Example of QuickStatements that can be used to load a Wikibase instance / Wikidata. I guess that once we have defined the entity management change process, we need to set some flags for whether it is a new item or an old one...

The lines below say e.g. that Q759804 has P7704 value agent/base/34971. S813 is optional, but sets a timestamp as source

Q759804 P7704 "agent/base/34971" S813 +2019-12-1225T00:00:00Z/11
Q206820 P7704 "agent/base/61829" S813 +2019-12-1225T00:00:00Z/11
Q730008 P7704 "agent/base/73049" S813 +2019-12-1225T00:00:00Z/11
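Lines like the ones above could be generated from a list of (QID, agent id) pairs with a small Python helper. This is only a sketch: the function name is made up, it assumes the tab-separated form of the QuickStatements V1 syntax, and the date used is an arbitrary example. Precision /11 means day precision.

```python
from datetime import date


def quickstatement_line(qid, agent_id, retrieved=None):
    """Build one QuickStatements V1 line that sets P7704 on an item.

    Hypothetical helper. Assumes tab-separated V1 syntax. If `retrieved`
    (a datetime.date) is given, it is added as an S813 source timestamp
    with day precision (/11).
    """
    line = '%s\tP7704\t"%s"' % (qid, agent_id)
    if retrieved is not None:
        line += "\tS813\t+%sT00:00:00Z/11" % retrieved.isoformat()
    return line


# QIDs and agent ids taken from the example lines; the date is arbitrary.
pairs = [
    ("Q759804", "agent/base/34971"),
    ("Q206820", "agent/base/61829"),
    ("Q730008", "agent/base/73049"),
]
batch = "\n".join(quickstatement_line(q, a, date(2020, 1, 2)) for q, a in pairs)
print(batch)
```

The printed batch can be pasted into QuickStatements; a flag for "new item" versus "existing item" would need an extra column in the input once the change process is defined.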

Another alternative, used by OCLC in its evaluation of Wikibase, is Pywikibot, see task T251625: Purge Europeana entries used for purging WD objects

Try to get feedback from the WikiTree developer who has done data sync with WikiTree for the last 3 years and has a framework for improving the quality in WikiTree based on data from Wikidata / Findagrave and on logic defined in > 200 rules. They also have a very active community running weekend cleaning sessions, creating videos etc. The site was created by Chris Whitten to put his family tree in order before a wedding; today it has 23 million profiles...

image.png (1×2 px, 730 KB)

image.png (1×2 px, 602 KB)

Using Wikibase as a staging area is an interesting idea, but it is an idea that seems to only make sense after WMDE’s work to enable Wikidata/Wikibase federation is completed. This idea should be revisited then, and we can follow up on the topic with Europeana at that point.

@Salgo60 As a reminder, please do not share contents (or screenshots) of emails/private conversations in tickets as this creates a privacy issue. We cannot be sure of the impact that sharing these messages publicly may have on the original sender.

@Mohammed_Sadat_WMDE I think we have different levels of staging, and I guess the federation possibility is useful later. My feeling is that this could be something like mix-and-match 2.0 where we also get

  • communication - using talk pages
  • handling of duplicates / mismatches / data round trips
  • with > 4000 external identifiers we can't reinvent the wheel every time. As we also have 14000 active editors, we need an easy way for everyone involved to communicate better than today with external sources and to understand open issues and status. Listen to Lydia in the 2017 video at 42 min, where she agrees that we need an ecosystem. Maybe this is a building block for organisations that don't have public work boards / trackable issues / visibility of sprints delivered...
  • a controlled change stream and possibilities to handle and document mismatches better than today. In Sweden we nearly never get a helpdesk ID from organisations when we have an issue, and it is very seldom easy to track the status of a request or subscribe to an issue. Doing linked data without the correct tools and "connected people" will never work. Good video link about how loosely coupled networks need trust to work together, and one piece of that is having tools...

My understanding is that Europeana plans a sprint to think about how they would like to work together with an open platform like Wikidata, and I can try to write down some use cases based on my experience importing data for 10+ WD properties...

I think finding good patterns for working together between loosely coupled platforms will be key to achieving good things. Europeana, with a very loosely coupled network of I guess more than 6000 museums, is a great test case: in the first phase just getting Europeana <-> Wikidata <-> the 205 language versions of Wikipedia working together...

As Europeana is working in the direction of linked data (video), I think they are looking into using Wikidata as a possibility for museums to reference WD when they upload data to Europeana, see MET-2410...

image.png (876×1 px, 892 KB)

  • Good video from Future Learn about Collaboration and Trust between loosely coupled networks
Salgo60 changed the task status from Open to Stalled. May 20 2020, 4:24 PM
Salgo60 moved this task from Backlog to Nice to have on the Magnus Sälgö board.

No activity seen from the Europeana people; moved to Stalled and Nice to have

Discussion "Wikidata:Requests for comment/Handling of stored IDs after they've been deleted or redirected in the external database"

I informed the discussion about this idea of a staging area

image.png (442×2 px, 127 KB)

Status 2020 Jun 13: no feedback from Europeana, I haven't pushed it, and I don't know where in the Europeana backlog the integration with Wikidata is

More feedback that Wikidata <-> Europeana data is not in sync, see Property_talk:P7704#Linkrot

"Any update on this, I have been getting no found errors for quite sometime with this property"

image.png (1×2 px, 357 KB)

Salgo60 removed Salgo60 as the assignee of this task.
Salgo60 removed a subscriber: Salgo60.