
[Story] Decide the back-end re-implementation
Closed, Resolved (Public)

Description

Propose to the community a set of solutions to ensure the self-sustainability of the back-end.
The current implementation is in C++.

Possible alternatives, ordered by priority:

  1. PHP, which would enable the re-use of the Wikidata data model;
  2. Node.js;
  3. a WDQS instance, cf. T166501#3320547.

Event Timeline

Hjfocs triaged this task as High priority. Jun 6 2017, 1:57 PM

See T167025#3318577 for how to set up a local Wikidata instance with Vagrant.

It seems that the Wikidata Query Service (WDQS) could also be a good fit, for the following reasons:

  1. uses Blazegraph as the storage engine, cf. T166503;
  2. has facilities to load and upload datasets in the Wikidata RDF dump format;
  3. exposes APIs to access data via SPARQL (specifically useful for both the domain filter and the query text box, cf. T166512); see the query sketch below.
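For instance, the domain filter could be populated with the distinct classes present in the loaded dataset. A minimal sketch, assuming a local instance whose SPARQL endpoint for the wdq namespace sits at http://localhost:9999/bigdata/namespace/wdq/sparql (set up as described below); the property P31 and the endpoint path are illustrative:

# Hypothetical domain filter query: list the classes (values of wdt:P31)
# occurring in the loaded dataset, with their usage counts
curl -sG http://localhost:9999/bigdata/namespace/wdq/sparql \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?class (COUNT(?item) AS ?items)
WHERE { ?item wdt:P31 ?class }
GROUP BY ?class
ORDER BY DESC(?items)
LIMIT 50'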

Installation instructions, drawn from the WDQS user manual and the getting started guide:

  1. Download the latest packaged version from Maven Central and unzip it:*
wget -O wdqs.zip "http://search.maven.org/remotecontent?filepath=org/wikidata/query/rdf/service/0.2.4/service-0.2.4-dist.zip"
unzip -d wdqs wdqs.zip
cd wdqs
  2. Download the latest Wikidata gzipped RDF Turtle dump:
mkdir -p data/chunks
wget -O data/wikidata.ttl.gz https://dumps.wikimedia.org/wikidatawiki/entities/20170529/wikidata-20170529-all-BETA.ttl.gz
  3. Pre-process the dump:
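# Assumption: -l it keeps only Italian labels and descriptions, -s skips site links, cf. the getting started guide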
./munge.sh -f data/wikidata.ttl.gz -d data/chunks -l it -s
  4. Start Blazegraph (in the background):
./runBlazegraph.sh &
  5. Load a single data chunk (loading the whole Wikidata dump is computationally expensive):
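# Assumption: -n selects the target Blazegraph namespace (wdq), -d points to the chunk to load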
./loadRestAPI.sh -n wdq -d `pwd`/data/chunks/wikidump-000000001.ttl.gz
  6. Blazegraph is now ready for queries via its web query interface: http://localhost:9999/bigdata/#query

*N.B.: as of today, compiling from source fails because the blazegraph-2.1.5-SNAPSHOT dependencies are missing from the remote repositories.
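To check that the chunk was actually loaded, the SPARQL endpoint of the wdq namespace can be queried directly. A minimal sketch, assuming the default endpoint path:

# Count the triples loaded into the wdq namespace
curl -sG http://localhost:9999/bigdata/namespace/wdq/sparql \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?triples) WHERE { ?s ?p ?o }'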

If we use RDF, we could feed the primary sources list/filter sub-tool with truthy statements: once the sanity of a given RDF dataset has been checked via the data model validator, the response of the queried WDQS SPARQL endpoint can be serialized into an HTML table, as sketched below.
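A minimal sketch of that serialization step, assuming the local endpoint above honors Accept: text/csv for SELECT results; the property P31 and the naive CSV handling are purely illustrative:

# Fetch truthy statements (direct wdt: claims) as CSV and wrap them into an HTML table.
# Naive sketch: it handles neither quoted commas nor HTML escaping.
ENDPOINT=http://localhost:9999/bigdata/namespace/wdq/sparql
QUERY='
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
SELECT ?item ?value WHERE { ?item wdt:P31 ?value } LIMIT 100'
curl -sG "$ENDPOINT" -H 'Accept: text/csv' --data-urlencode "query=$QUERY" \
| awk -F',' '
    NR == 1 { printf "<table>\n<tr>"; for (i = 1; i <= NF; i++) printf "<th>%s</th>", $i; print "</tr>"; next }
            { printf "<tr>"; for (i = 1; i <= NF; i++) printf "<td>%s</td>", $i; print "</tr>" }
    END     { print "</table>" }'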

On the other hand, the per-item tool would support full statements, i.e. statement nodes with qualifiers and references; see the query sketch below.
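For reference, full statements are reached through the p:/ps: paths of the Wikidata RDF model rather than the truthy wdt: shortcut. A sketch of the corresponding query pattern (P31 is again just an example):

# Full statement nodes, including their references, for an illustrative property (P31)
curl -sG http://localhost:9999/bigdata/namespace/wdq/sparql \
  -H 'Accept: application/sparql-results+json' \
  --data-urlencode 'query=
PREFIX p:    <http://www.wikidata.org/prop/>
PREFIX ps:   <http://www.wikidata.org/prop/statement/>
PREFIX prov: <http://www.w3.org/ns/prov#>
SELECT ?item ?statement ?value ?reference
WHERE {
  ?item p:P31 ?statement .
  ?statement ps:P31 ?value .
  OPTIONAL { ?statement prov:wasDerivedFrom ?reference }
}
LIMIT 20'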

We still need to investigate which Wikibase data model implementation to use, mainly for the data model validator, cf. T167030.

We assume that the following objects are considered stable, as they are claimed to be subject to the Wikidata stable interface policy:

On the other hand, there is no guarantee for the following Wikibase data model implementations:

We assume, however, that they can at least cater for a subset of the data model, cf. the extensibility principle.

The proposed solution is a WDQS instance with RDF data model validation.
T167014 will include the full proposal.