
test infra: deploy main and scholarly blazegraph instances
Closed, Resolved | Public | 1 Estimated Story Points

Description

We need a static baseline against which to compare the output of rewritten queries, so that any differences can be documented and triaged as they emerge.

This index should be static and should be compared against similarly static snapshots.

AC:

  • A blazegraph wikidata main graph (read only) is available internally
  • A blazegraph wikidata scholarly graph (read only) is available internally

Event Timeline

To keep things as simple as possible, I'm spinning up a Blazegraph Docker container.

We don't have a system user on these nodes, so I'll run as root.

docker run --user=0 --entrypoint=/runBlazegraph.sh -d -e HEAP_SIZE="110g" -p 9999:9999 --name wdqs \
  -v /srv/wdqs/blazegraph:/wdqs/data:z \
  -v /srv/tmp/main-20260209/:/wdqs/munge:z \
  wikibase/wdqs:wdqs0.3.156

loadData.sh is a bit underdocumented, but I took the input params from the data-reload cookbook: https://gerrit.wikimedia.org/g/operations/cookbooks/+/9d48e6b04c9e6c5b4cac812473c6ff5d0ed93f00/cookbooks/sre/wdqs/data-reload.py

sudo docker exec -it wdqs bash /wdqs/loadData.sh -n wdq -d /wdqs/munge -f wikidata_main.%04d.nt.gz
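
For anyone reusing this: the -f value looks like a printf-style pattern that gets expanded once per chunk index. A minimal sketch of that expansion, assuming chunks are numbered from 1. expand_chunks is a hypothetical helper for illustration only, not part of loadData.sh, and the range 1..3 is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of loadData.sh): expand a printf-style
# pattern into chunk file names.
# Usage: expand_chunks PATTERN START END
expand_chunks() {
    pattern=$1
    i=$2
    while [ "$i" -le "$3" ]; do
        # printf substitutes the zero-padded chunk index into the pattern
        printf "$pattern\n" "$i"
        i=$((i + 1))
    done
}

# Illustrative range only; the real number of chunks depends on the dump.
expand_chunks 'wikidata_main.%04d.nt.gz' 1 3
```

This is a quick way to check that the pattern matches the file names actually present under /wdqs/munge before starting a long load.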

Pulling wikibase/wdqs from dockerhub required setting proxies:

$ cat /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://webproxy.eqiad.wmnet:8080"
Environment="HTTPS_PROXY=http://webproxy.eqiad.wmnet:8080"
Environment="NO_PROXY=localhost,127.0.0.1,.wmnet,.wikimedia.org"

$ sudo systemctl daemon-reload
$ sudo systemctl restart docker
gmodena changed the task status from Open to In Progress. Wed, Apr 15, 1:12 PM
gmodena set the point value for this task to 1. Wed, Apr 15, 2:17 PM

The 20260209 snapshot has been indexed and is available in Blazegraph:

wikidata main graph on wdqs1031
$ curl -s -X POST   -H "Content-Type: application/sparql-query"  --data-binary "SELECT (COUNT(*) AS ?count) WHERE {?s ?p ?o}"  http://localhost:9999/bigdata/namespace/wdq/sparql  -H "Accept: application/sparql-results+json" | jq
{
  "head": {
    "vars": [
      "count"
    ]
  },
  "results": {
    "bindings": [
      {
        "count": {
          "datatype": "http://www.w3.org/2001/XMLSchema#integer",
          "type": "literal",
          "value": "8662009796"
        }
      }
    ]
  }
}
scholarly graph on wdqs1032
$ curl -s -X POST   -H "Content-Type: application/sparql-query"  --data-binary "SELECT (COUNT(*) AS ?count) WHERE {?s ?p ?o}"  http://localhost:9999/bigdata/namespace/wdq/sparql  -H "Accept: application/sparql-results+json" | jq
{
  "head": {
    "vars": [
      "count"
    ]
  },
  "results": {
    "bindings": [
      {
        "count": {
          "datatype": "http://www.w3.org/2001/XMLSchema#integer",
          "type": "literal",
          "value": "8794607950"
        }
      }
    ]
  }
}
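
To re-check later that an index is still static, the count can be pulled out of the response and compared against the value recorded here. A minimal sketch, assuming jq is installed and the response keeps the shape shown in the output; extract_count and assert_count are hypothetical helper names, not existing tooling.

```shell
#!/bin/sh
# Hypothetical helpers for re-checking a recorded triple count.

# Read a SPARQL JSON result on stdin and print the ?count binding.
extract_count() {
    jq -r '.results.bindings[0].count.value'
}

# assert_count EXPECTED ACTUAL: succeed only if the two values match.
assert_count() {
    if [ "$1" = "$2" ]; then
        echo "OK: $2 triples"
    else
        echo "MISMATCH: expected $1, got $2" >&2
        return 1
    fi
}

# Example (run on the host, reusing the curl command from this comment):
#   actual=$(curl -s -X POST -H "Content-Type: application/sparql-query" \
#       -H "Accept: application/sparql-results+json" \
#       --data-binary "SELECT (COUNT(*) AS ?count) WHERE {?s ?p ?o}" \
#       http://localhost:9999/bigdata/namespace/wdq/sparql | extract_count)
#   assert_count 8662009796 "$actual"    # main graph on wdqs1031
```

A non-zero exit status from assert_count would mean the index no longer matches the baseline recorded in this task.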
RKemper renamed this task from test infra: deloy main and scholarly blazegraph instances to test infra: deploy main and scholarly blazegraph instances. Tue, Apr 21, 8:48 PM