
Change management of entities created and deleted in Europeana
Open, Stalled, Needs Triage, Public

Description

Email sent to Antoine

Hi Antoine,
We now need somewhere to discuss entity management between two loosely coupled systems like Wikidata and Europeana.
I am now trying to push the Europeana concept to nl:Wikipedia together with a new link model for Nobelprize.org, and I am starting to see

that Wikidata P7704 has a "same as" link to a Europeana Agent, but the target at Europeana has been deleted...

My suggestion follows what I did this week with the Nobelprize.org people (see link). Nobelprize.org has had a lot of link rot, and we have now agreed that Nobelprize.org takes responsibility for fixing this by creating a unique ID for every Nobel Prize winner and by maintaining correct HTML links; see T251055: Template Nobelprize winners Wikidata property P8024 and new link model to Nobelprize.org
I suggest that Europeana likewise takes responsibility for the change management of the Europeana entity in Wikidata Property:P7704, i.e. when you delete or create an entity you also update Wikidata.

Examples of issues

Regards Magnus Sälgö
++46705937579
salgo60@msn.com

Event Timeline

Salgo60 created this task. Apr 28 2020, 8:20 AM
Salgo60 updated the task description.
Salgo60 updated the task description. Apr 28 2020, 8:24 AM
Salgo60 updated the task description.
Salgo60 added subscribers: Hmangas, Andrawaag, DivadH. Edited Apr 28 2020, 10:05 AM

Answer Europeana @Hmangas Hugo Manguinhas Platform Services Product Manager API

I have asked others for opinions in Telegram Wikidata

  • Wikidata can be accessed in different ways:
    • SPARQL federation; see Nobel prize
    • Video showing how easily you can generate code in Python, JavaScript, PHP, Java, Perl, Ruby... from a SPARQL query
    • When I updated Wikidata, I wrote a Python program that web-scraped Europeana, and then I updated Wikidata in batches using the QuickStatements tool; see my latest batches
    • A new approach I have seen from @Andrawaag ( http://www.micelio.be/ ) is to use Wikibase to create a staging area that is updated only by Europeana, with Wikidata then populated from it. The Wikibase installation is write-protected, and you get versioning and SPARQL with federation.
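
A minimal sketch of the SPARQL route listed above, assuming the Wikidata Query Service is asked for every item carrying P7704 and returns the standard SPARQL JSON results layout. The names `P7704_QUERY`, `SAMPLE_RESPONSE` and `parse_p7704_bindings` are made up for illustration, and the sample response is hand-written, not a real reply from the service.

```python
import json

# Query for the Wikidata Query Service: every item that has a Europeana entity ID (P7704).
P7704_QUERY = """
SELECT ?item ?europeanaId WHERE {
  ?item wdt:P7704 ?europeanaId .
}
"""

# Hand-written sample in the SPARQL JSON results layout (assumption, not a real response).
SAMPLE_RESPONSE = """
{"head": {"vars": ["item", "europeanaId"]},
 "results": {"bindings": [
   {"item": {"type": "uri", "value": "http://www.wikidata.org/entity/Q759804"},
    "europeanaId": {"type": "literal", "value": "agent/base/34971"}}
 ]}}
"""


def parse_p7704_bindings(sparql_json):
    """Return {Europeana ID: Wikidata QID} from a SPARQL JSON results document."""
    bindings = json.loads(sparql_json)["results"]["bindings"]
    links = {}
    for row in bindings:
        # The item comes back as a full entity URI; keep only the trailing QID.
        qid = row["item"]["value"].rsplit("/", 1)[-1]
        links[row["europeanaId"]["value"]] = qid
    return links


print(parse_p7704_bindings(SAMPLE_RESPONSE))
```

The resulting mapping is the kind of snapshot that could then be compared with what Europeana actually holds.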

@David_Haskiya_WMSE @Jopparn do you have any suggestions for a data round-trip process and tools for entity management between an open system like Wikidata and a source like Europeana?

Other tools:

see also EPIC T202530: [Epic] Feedback processes and tools for data-providers

Salgo60 added a subscriber: Mohammed_Sadat_WMDE. Edited May 1 2020, 10:27 AM

Ping @Mohammed_Sadat_WMDE (webpage) to keep you in the loop on whether Wikibase could be used as a staging area between Wikidata and Europeana. Version one would manage entities for people (Europeana agents).

  • Scenarios: create, merge, delete
    1. Europeana creates a new agent that already exists in Wikidata but does not have Property P7704 set
    2. Europeana deletes an agent that exists in Wikidata and has Property P7704 set
    3. Wikidata merges objects that exist in Europeana and have "same as" Wikidata
    4. Wikidata deletes an object that exists in Europeana and has "same as" Wikidata
    5. Europeana creates a new agent that doesn't exist in Wikidata
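
The five scenarios can be sketched as plain set comparisons between two snapshots. This is only an illustration under assumptions: the function names, the snapshot inputs and the idea that merges show up as a redirect table are all hypothetical, and telling scenario 1 from scenario 5 still needs a separate matching step that is not shown.

```python
def europeana_side_changes(previous_ids, current_ids, wikidata_links):
    """Classify changes made in Europeana between two snapshots of agent IDs.

    wikidata_links maps Europeana ID -> QID for items that have P7704 set.
    """
    created = current_ids - previous_ids
    deleted = previous_ids - current_ids
    return {
        # Scenarios 1 and 5: a new agent with no P7704 link yet; it must be matched
        # against Wikidata to find out whether an item already exists.
        "new_agent_needs_matching": sorted(a for a in created if a not in wikidata_links),
        # Scenario 2: the agent is gone but Wikidata still points at it.
        "deleted_but_still_linked": sorted(a for a in deleted if a in wikidata_links),
    }


def wikidata_side_changes(europeana_same_as, live_qids, redirects):
    """Classify changes made in Wikidata, seen from Europeana's sameAs links.

    europeana_same_as maps Europeana ID -> QID; redirects maps merged QID -> target QID.
    """
    # Scenario 3: the linked item was merged and now redirects elsewhere.
    merged = {a: redirects[q] for a, q in europeana_same_as.items() if q in redirects}
    # Scenario 4: the linked item no longer exists and is not a redirect.
    deleted = sorted(
        a for a, q in europeana_same_as.items()
        if q not in live_qids and q not in redirects
    )
    return {"merged": merged, "deleted": deleted}


europeana_report = europeana_side_changes(
    previous_ids={"agent/base/1", "agent/base/2"},
    current_ids={"agent/base/2", "agent/base/3"},
    wikidata_links={"agent/base/1": "Q1", "agent/base/2": "Q2"},
)
wikidata_report = wikidata_side_changes(
    europeana_same_as={"agent/base/2": "Q2", "agent/base/4": "Q4", "agent/base/5": "Q5"},
    live_qids={"Q2", "Q40"},
    redirects={"Q4": "Q40"},
)
print(europeana_report)
print(wikidata_report)
```

The IDs and QIDs in the usage lines are invented test data.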

I don't know if using Wikibase is the way forward, but I like the concept more and more: a "staging area" in Wikibase where e.g. Europeana can upload the data, and where we can compare it and comment on individual records using standard Wikibase functionality with version history.

Why not have a WikibaseP7704 that is a staging area for WD property P7704?

Today's solution

  • Reinventing the wheel for every new data provider doesn't scale well in terms of quality, is my guess (we have > 4000 external properties)
  • In my experience OCLC, Nobelprize, ... don't have version history as we have in a wiki, or the possibility to discuss items and ping people
  • With a staging area outside Wikidata we would also have federation and could easily compare the data delivered to Wikibase with what is in Wikidata --> we could more easily detect vandalism of Wikidata by comparing it with what was delivered
    • At Wikidata Berlin 2017 (video at 42 min) I mentioned a vision that the infobox in Wikipedia articles should indicate whether the data displayed is the same as what a "trusted" source delivered; maybe this is step 1 of a solution like that.
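
The comparison between delivered data and Wikidata could look roughly like this. It is a sketch under assumptions: both sides are taken to be simple `{QID: Europeana ID}` mappings, and `compare_delivery` and the report keys are hypothetical names, not part of any existing tool.

```python
def compare_delivery(delivered, in_wikidata):
    """Compare {QID: Europeana ID} from the staging area with the same mapping from Wikidata."""
    report = {"missing_in_wikidata": [], "not_in_delivery": [], "value_differs": []}
    for qid in sorted(delivered.keys() | in_wikidata.keys()):
        if qid not in in_wikidata:
            # Delivered by the source but never added to Wikidata (or removed there).
            report["missing_in_wikidata"].append(qid)
        elif qid not in delivered:
            # Present in Wikidata without backing from the source.
            report["not_in_delivery"].append(qid)
        elif delivered[qid] != in_wikidata[qid]:
            # Same item, different value: a candidate for review, possibly vandalism.
            report["value_differs"].append(qid)
    return report


staging = {"Q1": "agent/base/1", "Q2": "agent/base/2", "Q3": "agent/base/3"}
wikidata = {"Q1": "agent/base/1", "Q2": "agent/base/999", "Q4": "agent/base/4"}
delivery_report = compare_delivery(staging, wikidata)
print(delivery_report)
```

A difference is only a signal for a human to look at; the source may be the side that is out of date.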

Linked data needs linked people, is my guess. I raised this problem at SWIBL18 and

Next step: can be

Salgo60 added a comment. Edited May 1 2020, 12:17 PM

Example of QuickStatements input that can be used to load a Wikibase instance / Wikidata. I guess that once we have defined the entity management change process, we will need to set some flags to show whether it's a new item or an old one.

The lines below say that Q759804 has P7704 agent/base/34971. S813 is optional; it adds a retrieved date (P813) as the source.

Q759804 P7704 "agent/base/34971" S813 +2019-12-1225T00:00:00Z/11
Q206820 P7704 "agent/base/61829" S813 +2019-12-1225T00:00:00Z/11
Q730008 P7704 "agent/base/73049" S813 +2019-12-1225T00:00:00Z/11
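
Lines like these could be generated from a mapping instead of being typed by hand. A minimal sketch, assuming the tab-separated QuickStatements V1 layout and a date written as `+YYYY-MM-DDT00:00:00Z/11` (day precision); `quickstatement_line` is a hypothetical helper, and the retrieval date in the usage line is an example value, not the date of the batch above.

```python
from datetime import date


def quickstatement_line(qid, europeana_id, retrieved):
    """Build one QuickStatements V1 line: item, P7704, quoted ID, S813 retrieved date."""
    # /11 marks day precision in the QuickStatements time format.
    timestamp = f"+{retrieved.isoformat()}T00:00:00Z/11"
    return "\t".join([qid, "P7704", f'"{europeana_id}"', "S813", timestamp])


# Example value only: the retrieval date is assumed for illustration.
example_line = quickstatement_line("Q759804", "agent/base/34971", date(2019, 12, 25))
print(example_line)
```

A flag for new versus existing items, as mentioned above, would need an extra column or a separate batch; that part is left open here.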

Another alternative, used by OCLC in their evaluation of Wikibase, is Pywikibot; see task T251625: Purge Europeana entries used for purging WD objects

Masssly added a subscriber: Masssly. May 1 2020, 5:56 PM
Masssly removed a subscriber: Masssly.
Salgo60 added a comment. Edited Sat, May 2, 6:24 AM

I am trying to get feedback from a WikiTree developer who has done data synchronization for WikiTree for the last 3 years and has a framework for improving the quality of WikiTree based on data from Wikidata / Findagrave and on logic defined in > 200 rules. They also have a very active community that runs weekend cleaning sessions, creates videos, etc. The site was created by Chris Whitten to put his family tree in order before a wedding; today it has 23 million profiles.


Using Wikibase as a staging area is an interesting idea, but it is an idea that seems to only make sense after WMDE’s work to enable Wikidata/Wikibase federation is completed. This idea should be revisited then, and we can follow up on the topic with Europeana at that point.

@Salgo60 As a reminder, please do not share contents (or screenshots) of emails/private conversations in tickets as this creates a privacy issue. We cannot be sure of the impact that sharing these messages publicly may have on the original sender.

Salgo60 added a comment. Edited Mon, May 4, 4:43 PM

@Mohammed_Sadat_WMDE I think we have different levels of staging, and I guess the federation possibility will be useful later. My feeling is that this could be something like mix-and-match 2.0 where we also get

  • communication - using talk pages
  • handling duplicates / mismatches / data round trips
  • with > 4000 external identifiers we can't reinvent the wheel every time. As we also have 14000 active editors, we need an easy way for everyone involved to communicate with external sources better than today and to understand open issues and their status. Listen to Lydia in the 2017 video at 42 min, where she agrees we need an ecosystem; maybe this is a building block for organisations that don't have public work boards / trackable issues / visibility of sprints delivered
  • getting a controlled change stream and possibilities to handle/document mismatches better than today. In Sweden we nearly never get a helpdesk ID from organisations when we have an issue, and it's very seldom easy to track the status of a request or subscribe to an issue. Doing linked data without the correct tools and "connected people" will never work. Good video link about how loosely coupled networks need trust to work together, and one piece of that is having tools

My understanding is that Europeana plans a sprint to think about how they would like to work together with an open platform like Wikidata, and I can try to write down some use cases from my experience importing data for 10+ WD properties.

I think finding good patterns for working together between loosely coupled platforms will be key to achieving good things. Europeana, with a very loosely coupled network of I guess more than 6000 museums, is a great test: in the first phase, just try getting Europeana <-> Wikidata <-> Wikipedia in 205 languages working together.

As Europeana is working in the direction of linked data (video), I think they are looking into using Wikidata as a way for museums to reference WD when they upload data to Europeana; see MET-2410.

  • Good video from Future Learn about Collaboration and Trust between loosely coupled networks
Salgo60 changed the task status from Open to Stalled. Wed, May 20, 4:24 PM
Salgo60 moved this task from Backlog to Nice to have on the Magnus Sälgö board.

No activity seen from the Europeana people; moved to Stalled and Nice to have.