Oct 16 2023

VladimirAlexiev added a comment to T270764: Wikidata Truthy dump is missing important metadata triples.

I used queries like this to compare counts between wikidata and our wdtruthy service.

  • For the first 2 queries we use count(distinct ?x): they use a property path but have a small result population
  • For the other queries we use count(*) because it's much faster
Oct 16 2023, 1:35 PM · Wikidata

Oct 5 2023

VladimirAlexiev added a comment to T270764: Wikidata Truthy dump is missing important metadata triples.

I see the count triples on recently modified entities:

[Image attachment: image.png]

Oct 5 2023, 12:27 PM · Wikidata

Oct 3 2023

VladimirAlexiev added a comment to T270764: Wikidata Truthy dump is missing important metadata triples.

The full dump is 15B triples (see https://query.wikidata.org/bigdata/ldf).
WDtruthy is 6.5B triples (we have it in GraphDB, continuously updating).
Adding the counts will add 320M triples, or about 5% of WDtruthy.
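
The percentage can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the rounded figures quoted in this comment; the variable names are made up for illustration.

```python
# Rounded triple counts quoted in the comment (not exact figures).
full_dump_triples = 15_000_000_000   # full Wikidata dump
truthy_triples = 6_500_000_000       # WDtruthy
count_triples = 320_000_000          # extra triples for the three counts

# Growth of WDtruthy if the count triples are added.
share = count_triples / truthy_triples
print(f"Added triples relative to WDtruthy: {share:.1%}")

# For context: size of WDtruthy relative to the full dump.
truthy_vs_full = truthy_triples / full_dump_triples
print(f"WDtruthy relative to the full dump: {truthy_vs_full:.1%}")
```

With these inputs the first ratio comes out just under 5%, which matches the estimate.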

Oct 3 2023, 5:18 PM · Wikidata

Sep 27 2023

VladimirAlexiev added a comment to T270764: Wikidata Truthy dump is missing important metadata triples.

The following query (from https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/query_optimization#Distinct_term_scan,_and_group_by_and_count_optimization)

SELECT ?p (COUNT(?p) AS ?count)
WHERE { [] ?p [] . }
GROUP BY ?p
ORDER BY DESC(?count)

when run on the WD query service, shows these interesting counts:

  • wikibase:statements 107093728
  • wikibase:identifiers 105929733
  • wikibase:sitelinks 105928930
Sep 27 2023, 2:33 PM · Wikidata

Sep 15 2023

VladimirAlexiev added a comment to T235540: StackOverflowError when SPARQL query uses same variable name before and after aggregation.

I posted https://github.com/w3c/sparql-dev/issues/192 to collect info about what various SPARQL processors do in such cases.
Blazegraph allows it for certain situations:

  • COUNT
  • SAMPLE
  • identity rebinding
  • expression rebinding (non-aggregate), but it returns no rows
Sep 15 2023, 3:24 PM · Wikidata, Wikidata-Query-Service
VladimirAlexiev added a comment to T346420: query causes Stack Overflow.

Reading through the spec:

  • https://www.w3.org/TR/sparql11-query/#bind: "The variable introduced by the BIND clause must not have been used in the group graph pattern up to the point of use in BIND."
  • https://www.w3.org/TR/sparql11-query/#selectExpressions: "The rules of assignment in SELECT expression are the same as for assignment in BIND. The expression combines variable bindings already in the query solution, or defined earlier in the SELECT clause. The variable may be used in an expression later in the same SELECT clause and may not be be assigned again in the same SELECT clause."
    • This says you can't "assign" the same var twice in SELECT and that vars are brought forward from BIND, but it does not explicitly say that you can't "reassign" a BIND var in SELECT.
Sep 15 2023, 3:09 PM · Wikidata-Query-Service, Wikidata
VladimirAlexiev created T346420: query causes Stack Overflow.
Sep 15 2023, 7:23 AM · Wikidata-Query-Service, Wikidata

Nov 24 2022

VladimirAlexiev updated the task description for T323774: some `wdtn:` values have disappeared.
Nov 24 2022, 3:56 PM · Wikidata
VladimirAlexiev created T323774: some `wdtn:` values have disappeared.
Nov 24 2022, 3:56 PM · Wikidata

Sep 21 2022

VladimirAlexiev updated the task description for T318219: Timeline view: vertical and horizontal controls.
Sep 21 2022, 10:21 AM · Wikidata, Wikidata Query UI
VladimirAlexiev created T318219: Timeline view: vertical and horizontal controls.
Sep 21 2022, 10:20 AM · Wikidata, Wikidata Query UI

Jul 26 2022

VladimirAlexiev added a comment to T207705: Implement the Extended Date/Time Format Specification.

Thanks everyone and especially @Jheald for the valuable info.

Jul 26 2022, 10:55 PM · Wikidata data quality and trust, Wikidata
VladimirAlexiev awarded T207705: Implement the Extended Date/Time Format Specification a Like token.
Jul 26 2022, 10:48 PM · Wikidata data quality and trust, Wikidata

Feb 17 2022

VladimirAlexiev added a comment to T204045: Support GeoSPARQL in Wikidata Query Service.

https://github.com/Sophox/sophox/issues/17 is about GeoSPARQL in Sophox, the OpenStreetMap SPARQL endpoint.
It uses Blazegraph (and Wikibase for OSM tags.keys).

Feb 17 2022, 8:26 AM · Wikidata-Query-Service, Wikidata

Dec 6 2021

VladimirAlexiev renamed T297096: add prefix `s:` or replace it with `wds:` from add prefix `s:` to add prefix `s:` or replace it with `wds:`.
Dec 6 2021, 11:52 AM · User-Addshore, Wikibase-Quality-Constraints, User-mobrovac, [DEPRECATED] wdwb-tech, Wikibase-Quality, Wikidata
VladimirAlexiev created T297097: sort WDQ prefixes.
Dec 6 2021, 11:49 AM · Wikidata Query UI, Wikidata
VladimirAlexiev created T297096: add prefix `s:` or replace it with `wds:`.
Dec 6 2021, 11:48 AM · User-Addshore, Wikibase-Quality-Constraints, User-mobrovac, [DEPRECATED] wdwb-tech, Wikibase-Quality, Wikidata
VladimirAlexiev added a comment to T297075: wrong/missing formatted URL when 2 props given in Property Example.

Turns out that VIAF pages (from which I copied the values) do have such weird invisible Unicode chars. Printed in hex:

3337 3237 37e2 808f e280 8f
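
A small Python sketch can decode that dump and name the hidden characters. It assumes the bytes are UTF-8; dropping everything in the Unicode "Cf" (format) category is just one possible cleanup, not necessarily what Wikidata should do.

```python
import unicodedata

# Hex dump copied from the comment above.
hex_dump = "3337 3237 37e2 808f e280 8f"

raw = bytes.fromhex(hex_dump)   # whitespace between byte pairs is skipped
text = raw.decode("utf-8")      # assumption: the value is UTF-8 encoded

# List every code point with its Unicode name to expose the invisible ones.
for ch in text:
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch, '<unnamed>')}")

# One possible cleanup: drop format characters (general category "Cf").
cleaned = "".join(ch for ch in text if unicodedata.category(ch) != "Cf")
print(repr(cleaned))
```

The dump decodes to the digits 37277 followed by two U+200F RIGHT-TO-LEFT MARK characters, which are invisible when pasted into a value.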
Dec 6 2021, 8:59 AM · Wikidata
VladimirAlexiev created T297075: wrong/missing formatted URL when 2 props given in Property Example.
Dec 6 2021, 8:44 AM · Wikidata

Dec 3 2021

VladimirAlexiev added a comment to T257415: Language code zh-classical is invalid.

@Mahir256: see T30443 for a validation of the WD dump with rdf4j; the problem still appears.

Dec 3 2021, 3:51 PM · Language codes, Wikidata
VladimirAlexiev added a comment to T30443: Rename zh-classical -> lzh (invalid lang tag format).

Bart Hanssens tried to validate the WD dump with rdf4j:
https://github.com/barthanssens/rdf4j-bigfile-validator/blob/main/log.txt
'孟慶雲' was not recognised as a language literal, and could not be verified, with language zh-classical
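
One likely reason the tag is rejected: BCP 47 (RFC 5646) limits every subtag to at most 8 alphanumeric ASCII characters, and "classical" has 9. The sketch below checks only that one necessary condition. The function name is hypothetical, this is not a full language-tag validator, and it is not a description of what rdf4j actually does internally.

```python
def subtags_have_valid_length(tag: str) -> bool:
    """Check one necessary BCP 47 condition: each hyphen-separated subtag
    is 1 to 8 ASCII letters or digits. Not a complete validity check."""
    return all(
        part.isascii() and part.isalnum() and 1 <= len(part) <= 8
        for part in tag.split("-")
    )

for tag in ("zh-classical", "lzh"):
    print(tag, subtags_have_valid_length(tag))
```

Under this check "zh-classical" fails because of the 9-character subtag, while the proposed replacement "lzh" passes.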

Dec 3 2021, 3:49 PM · Wikidata-Query-Service, Wikidata, Wiki-Setup (Rename), Community-consensus-needed, Wikimedia-Language-setup

Nov 17 2021

VladimirAlexiev created T295866: improve identifier search on Wikidata.
Nov 17 2021, 11:36 AM · Elasticsearch, Wikidata

Sep 20 2021

VladimirAlexiev added a comment to T290961: rewrite KrBot to publish Constraint Violation pages.

@Ivan

see updated report immediately after fixing several items

Sep 20 2021, 6:53 AM · Wikidata