WDQS | Prefix showing http instead of https
Open, Medium | Public | BUG REPORT

Description

Steps to Reproduce:
WDQS and the updater run in Docker containers, separate from Wikibase, which runs in Kubernetes (K8s). They are deployed with the following compose file:

version: '3.2'

services:
  wdqs:
    image: wikibase/wdqs
    deploy:
      resources:
        limits:
          memory: 2G
    restart: unless-stopped

    volumes:
      - type: bind
        source: /wdqs-data
        target: /wdqs/data
    command: /runBlazegraph.sh
    ports:
      - "9999:9999"
    environment:
      - BLAZEGRAPH_OPTS="-DwikibaseConceptUri=https://xxxxxxxxxxxxxxxxxx" bash ./runBlazegraph.sh
      - WIKIBASE_SCHEME=https
      - WIKIBASE_HOST=xxxxxxxxxxxxxxxxx
      - WDQS_HOST=wdqs
      - WDQS_PORT=9999
      - MEMORY=-Xms2G -Xmx2G
      - EXTRA-JVM-OPTS="-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap"
  wdqs-updater:
    image: xxxxxxxxxxxxxxxxxxxxx/wdqs:concept
    restart: unless-stopped
    command: /runUpdate.sh
    depends_on:
    - wdqs
    environment:
     - WIKIBASE_SCHEME=https
     - CONCEPT_SCHEME=https
     - WIKIBASE_HOST=xxxxxxxxxxxxxxxxxxxxxx
     - WDQS_HOST=wdqs
     - WDQS_PORT=9999

Actual Results:
When running the following query, the wdt prefix comes back as http.

SELECT ?exampleProp (STR(wdt:) AS ?wdtDefault)
WHERE {
  ?exampleProp a wikibase:Property .
}
LIMIT 1


The updater logs also show an error for unrecognized subjects, where the Special:EntityData URL uses http and the host is missing. I am not sure whether this is related to the issue.

Expected Results:
The scheme of the property and the scheme of wdt should both be https.

Event Timeline

@Gehel so does that mean the error I showed in the log is a separate non-bug issue?

The whole system was designed from scratch to support Wikidata, which uses http, and it's likely that some parts of the code still assume that http is used for IRIs.

Looking at what you pasted I see at least two issues:

  • the GUI for wdqs hardcodes the Wikidata prefixes to http: (STR(wdt:) AS ?wdtDefault) => http://www.wikidata.org/prop/direct/
  • for the error in the munger, could you paste the RDF of one entity so that we can try to reproduce it and see whether the problem is in Wikibase or the updater? (Wikidata equivalent link: https://www.wikidata.org/wiki/Special:EntityData/Q3181360.ttl?flavor=dump)

Thanks
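
For anyone wanting to grab the same kind of dump from their own instance, here is a minimal sketch that builds the equivalent URL. It only mirrors the shape of the Wikidata link above; the function name `entity_data_url` is hypothetical, and the `/wiki/` article path is an assumption that may differ on a custom install.

```python
from urllib.parse import quote


def entity_data_url(host, entity_id, scheme="https", flavor="dump"):
    """Build a Special:EntityData Turtle URL shaped like the Wikidata link above.

    Assumption: the wiki serves pages under /wiki/, as Wikidata does.
    """
    return (
        f"{scheme}://{host}/wiki/Special:EntityData/"
        f"{quote(entity_id)}.ttl?flavor={quote(flavor)}"
    )


if __name__ == "__main__":
    # Reproduces the Wikidata link quoted in the comment above.
    print(entity_data_url("www.wikidata.org", "Q3181360"))
```

Swap in your own host and an entity ID that exists on your Wikibase; the output can then be pasted here as-is.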

Does this help?

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix wikibase: <http://wikiba.se/ontology#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix schema: <http://schema.org/> .
@prefix cc: <http://creativecommons.org/ns#> .
@prefix geo: <http://www.opengis.net/ont/geosparql#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix v: <https://xxxxxxxxxxxxxxx/value/> .
@prefix wd: <https://xxxxxxxxxxxxxxx/entity/> .
@prefix data: <http:///wiki/Special:EntityData/> .
@prefix s: <https://xxxxxxxxxxxxxxx/entity/statement/> .
@prefix ref: <https://xxxxxxxxxxxxxxx/reference/> .
@prefix wdt: <https://xxxxxxxxxxxxxxx/prop/direct/> .
@prefix wdtn: <https://xxxxxxxxxxxxxxx/prop/direct-normalized/> .
@prefix p: <https://xxxxxxxxxxxxxxx/prop/> .
@prefix ps: <https://xxxxxxxxxxxxxxx/prop/statement/> .
@prefix psv: <https://xxxxxxxxxxxxxxx/prop/statement/value/> .
@prefix psn: <https://xxxxxxxxxxxxxxx/prop/statement/value-normalized/> .
@prefix pq: <https://xxxxxxxxxxxxxxx/prop/qualifier/> .
@prefix pqv: <https://xxxxxxxxxxxxxxx/prop/qualifier/value/> .
@prefix pqn: <https://xxxxxxxxxxxxxxx/prop/qualifier/value-normalized/> .
@prefix pr: <https://xxxxxxxxxxxxxxx/prop/reference/> .
@prefix prv: <https://xxxxxxxxxxxxxxx/prop/reference/value/> .
@prefix prn: <https://xxxxxxxxxxxxxxx/prop/reference/value-normalized/> .
@prefix wdno: <https://xxxxxxxxxxxxxxx/prop/novalue/> .
data:Q178 a schema:Dataset ;
	schema:about wd:Q178 ;
	cc:license <http://creativecommons.org/publicdomain/zero/1.0/> ;
	schema:softwareVersion "1.0.0" ;
	schema:version "736"^^xsd:integer ;
	schema:dateModified "2020-07-09T16:43:58Z"^^xsd:dateTime ;
	wikibase:statements "0"^^xsd:integer ;
	wikibase:identifiers "0"^^xsd:integer ;
	wikibase:sitelinks "0"^^xsd:integer .
wd:Q178 a wikibase:Item ;
	rdfs:label "book"@en ;
	skos:prefLabel "book"@en ;
	schema:name "book"@en ;
	schema:description "medium for recording information in the form of writing or images"@en .

@Headingtona thanks, this helps a lot. I believe there is either a misconfiguration of your Wikibase setup or a bug in the way Wikibase generates the RDF output. Could you paste the part of the configuration you changed? The fact that your Wikibase setup is unable to infer the proper hostname for the data prefix is likely the cause of the updater issues.
Thanks!
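
A quick way to spot this kind of problem in a dump is to scan the `@prefix` lines for IRIs with no host. The sketch below is only a rough check using the Python standard library, not part of WDQS or Wikibase; the helper name `prefixes_without_host` and the `wiki.example.org` host in the sample are made up for illustration.

```python
import re
from urllib.parse import urlsplit

# Matches Turtle prefix declarations such as: @prefix wd: <https://host/entity/> .
PREFIX_RE = re.compile(r"^@prefix\s+(\w*):\s*<([^>]*)>\s*\.$")


def prefixes_without_host(turtle_text):
    """Return (prefix, iri) pairs whose IRI has an empty host component."""
    bad = []
    for line in turtle_text.splitlines():
        match = PREFIX_RE.match(line.strip())
        if match is None:
            continue  # not a prefix declaration
        name, iri = match.group(1), match.group(2)
        if not urlsplit(iri).netloc:
            bad.append((name, iri))
    return bad


if __name__ == "__main__":
    sample = (
        "@prefix wd: <https://wiki.example.org/entity/> .\n"
        "@prefix data: <http:///wiki/Special:EntityData/> .\n"
    )
    for name, iri in prefixes_without_host(sample):
        print(f"prefix {name!r} has no host: {iri}")
```

On the dump pasted above, only the `data:` prefix would be reported, which matches the host missing from the updater error.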

I am not sure this will be as helpful.

apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: eks.privileged
  creationTimestamp: "2020-07-21T14:58:37Z"
  generateName: xxxxxxxxxxxxxxxxxxxxx
  labels:
    app.kubernetes.io/instance: xxxxxxxxxxxxxxxxxxxxx
    app.kubernetes.io/name: wikibase
    pod-template-hash: 56d9d86b69
  name: xxxxxxxxxxxxxxxxxxxxx
  namespace:  xxxxxxxxxxxxxxxxxxxxx
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: xxxxxxxxxxxxxxxxxxxxx
    uid: a841a780-cb62-11ea-bb26-06b280d34a54
  resourceVersion: "35404500"
  selfLink: xxxxxxxxxxxxxxxxxxxxx
  uid: a844adaf-cb62-11ea-bb26-06b280d34a54
spec:
  containers:
  - command:
    - /install.sh
    env:
    - name: MW_ADMIN_EMAIL
      value: admin@example.com
    - name: MW_ADMIN_NAME
      value: WikibaseAdmin
    - name: MW_ELASTIC_HOST
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: MW_ELASTIC_PORT
      value: "9200"
    - name: MW_WG_SECRET_KEY
      value: secretkey
    - name: QS_PUBLIC_SCHEME_HOST_AND_PORT
      value: http://localhost:9191
    - name: WIKIBASE_SCHEME
      value: https
    - name: WG_SERVER
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: MW_SITE_NAME
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_NAME
      value: my_wiki
    - name: DB_PASS
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_SERVER
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_HOST
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_USER
      value: wikiuser
    - name: MW_ADMIN_PASS
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: ServerName
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: ServerAlias
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: SSLCRT
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: SSLKEY
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: CRTCHAIN
      value: xxxxxxxxxxxxxxxxxxxxx
    image: xxxxxxxxxxxxxxxxxxxxx
    imagePullPolicy: Always
    name: wikibase
    ports:
    - containerPort: 80
      name: http
      protocol: TCP
    resources:
      requests:
        cpu: "3"
        memory: 8Gi
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/www/html/images
      name: mediawiki-images-data
    - mountPath: /quickstatements/data
      name: quickstatements-data
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-68v25
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: regcred
  initContainers:
  - command:
    - sh
    - -c
    - apk add mysql-client && until mysql -u $DB_USER -p$DB_PASS -h $DB_HOST $DB_NAME
      -e "\q"; do sleep 15; done
    env:
    - name: MW_ADMIN_EMAIL
      value: admin@example.com
    - name: MW_ADMIN_NAME
      value: WikibaseAdmin
    - name: MW_ELASTIC_HOST
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: MW_ELASTIC_PORT
      value: "9200"
    - name: MW_WG_SECRET_KEY
      value: secretkey
    - name: QS_PUBLIC_SCHEME_HOST_AND_PORT
      value: http://localhost:9191
    - name: WIKIBASE_SCHEME
      value: https
    - name: WG_SERVER
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: MW_SITE_NAME
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_NAME
      value: my_wiki
    - name: DB_PASS
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_SERVER
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_HOST
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: DB_USER
      value: wikiuser
    - name: MW_ADMIN_PASS
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: ServerName
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: ServerAlias
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: SSLCRT
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: SSLKEY
      value: xxxxxxxxxxxxxxxxxxxxx
    - name: CRTCHAIN
      value: xxxxxxxxxxxxxxxxxxxxx
    image: alpine
    imagePullPolicy: Always
    name: init
    resources: {}
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-68v25
      readOnly: true
  nodeName: xxxxxxxxxxxxxxxxxxxxx
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  volumes:
  - name: mediawiki-images-data
    persistentVolumeClaim:
      claimName: mediawiki-images-data
  - name: quickstatements-data
    persistentVolumeClaim:
      claimName: quickstatements-data
  - name: default-token-68v25
    secret:
      defaultMode: 420
      secretName: default-token-68v25
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2020-07-21T14:58:41Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2020-07-21T14:58:46Z"
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2020-07-21T14:58:46Z"
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2020-07-21T14:58:37Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://905c417406a935983b8645e5a81662f4bf4f010a9d7611e822de9a783a616240
    image: xxxxxxxxxxxxxxxxxxxxx
    imageID: xxxxxxxxxxxxxxxxxxxxx
    lastState: {}
    name: wikibase
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: "2020-07-21T14:58:45Z"
  hostIP: xxxxxxxxxxxxxxxxxxxxx
  initContainerStatuses:
  - containerID: docker://c54569ef4228c058187d4b57196523c912d6873a05a489192dd24dbcc5df9561
    image: alpine:latest
    imageID: docker-pullable://alpine@sha256:185518070891758909c9f839cf4ca393ee977ac378609f700f60a771a2dfe321
    lastState: {}
    name: init
    ready: true
    restartCount: 0
    state:
      terminated:
        containerID: docker://c54569ef4228c058187d4b57196523c912d6873a05a489192dd24dbcc5df9561
        exitCode: 0
        finishedAt: "2020-07-21T14:58:40Z"
        reason: Completed
        startedAt: "2020-07-21T14:58:39Z"
  phase: Running
  podIP: xxxxxxxxxxxxxxxxxxxxx
  qosClass: Burstable
  startTime: "2020-07-21T14:58:37Z"

@dcausse did my wikibase configs help at all?

@Addshore any clues why this setup would use "wiki" for \Wikibase\Repo\WikibaseRepo::getCanonicalDocumentUrls instead of the hostname configured?

@Addshore any ideas?

@prefix data: <http:///wiki/Special:EntityData/> .

That looks very wrong.

  • Which version of Wikibase are you running?
  • What is your Wikibase repo conceptBaseUri setting set to?

wikibase: 1.34.1
wikibase_scheme: https

Does conceptBaseUri need to be added? Is that in WDQS or in Wikibase?

That is in Wikibase.
https://doc.wikimedia.org/Wikibase/master/php/md_docs_topics_options.html#conceptBaseUri

If you are using the default docker images for their basic purpose then the default configs will work.
If you start doing anything more fancy then you'll need to add more settings, such as this one.

The default is calculated from wgServer in mediawiki https://gerrit.wikimedia.org/g/mediawiki/extensions/Wikibase/+/358aa1bfd02a0b84cc8c7d19a86f627b34615d40/repo/config/Wikibase.default.php#125
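
Since the default depends on wgServer, it may be worth sanity-checking the value you pass in. The sketch below is only an illustration in Python, not the Wikibase code (which is PHP, at the link above). Both function names are hypothetical, and the `/entity/` suffix is an assumption taken from the `wd:` prefix in the dump earlier in this task.

```python
from urllib.parse import urlsplit


def check_server_value(server):
    """Return a list of problems found in a wgServer-style value."""
    problems = []
    parts = urlsplit(server)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unexpected scheme")
    if not parts.netloc:
        problems.append("missing host")
    return problems


def derived_concept_base_uri(server):
    """Illustrative only: append /entity/ to the server value.

    Assumption: mirrors the wd: prefix seen in the dump, not the real default logic.
    """
    return server.rstrip("/") + "/entity/"


if __name__ == "__main__":
    # wiki.example.org is a placeholder host.
    for value in ("https://wiki.example.org", "//wiki.example.org", "http://"):
        print(value, check_server_value(value))
```

A value with an empty host, such as the last one in the loop, would be consistent with the `http:///wiki/Special:EntityData/` prefix reported above, though that is a guess about the cause rather than a confirmed diagnosis.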

I set conceptBaseUri, which is identical to wgServer. I am still getting:

Unrecognized statement: s:http:///wiki/Special:EntityData/Q188
Gehel triaged this task as Medium priority. Sep 15 2020, 7:46 AM