Import WDQS subgraphs to production nodes
Open, High, Public

Description

The split graph updater has been running and populating its Kafka topics since 2024-07-18T09:00.

The earliest dumps usable for a data-reload should be tagged with the snapshot date 20240722, which should instruct the data-reload cookbook to position the Kafka offsets to 2024-07-19T23:00:00Z.
Based on previous runs of the Airflow DAG that imports dumps into HDFS, these snapshots should be available around 2024-07-26T10:00:00.
Update: the latest available snapshots are now tagged 20240729.

Given the above, the data-reload arguments should be:

  • main graph:
cookbook sre.wdqs.data-reload \
 --task-id T370754 \
 --reason "WDQS main subgraph" \
 --reload-data wikidata_main \
 --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240729/ \
 --stat-host stat1009.eqiad.wmnet \
 wdqs_host_main
  • scholarly graph:
cookbook sre.wdqs.data-reload \
 --task-id T370754 \
 --reason "WDQS scholarly subgraph" \
 --reload-data scholarly_articles \
 --from-hdfs hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240729/ \
 --stat-host stat1009.eqiad.wmnet \
 wdqs_host_scholarly

Prerequisites:

  • The target WDQS node must have its topic properly configured in Puppet (https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060049):
    • profile::query_service::streaming_updater::kafka_topic: eqiad.rdf-streaming-updater.mutation-main for an eqiad node imported with the profile wikidata_main
    • profile::query_service::streaming_updater::kafka_topic: eqiad.rdf-streaming-updater.mutation-scholarly for an eqiad node imported with the profile scholarly_articles
    • profile::query_service::streaming_updater::kafka_topic: codfw.rdf-streaming-updater.mutation-main for a codfw node imported with the profile wikidata_main
    • profile::query_service::streaming_updater::kafka_topic: codfw.rdf-streaming-updater.mutation-scholarly for a codfw node imported with the profile scholarly_articles
  • WDQS version 0.3.145 must be deployed (https://gerrit.wikimedia.org/r/c/wikidata/query/deploy/+/1056125)
  • partition hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240722/ is available
  • partition hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240722/ is available (a quick verification sketch follows this list)
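
A quick way to confirm the partitions are present before starting a reload (a hedged sketch; run from a host with HDFS access such as the stat host):

hdfs dfs -ls hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/main/20240722/
hdfs dfs -ls hdfs:///wmf/data/discovery/wikidata/munged_n3_dump/wikidata/scholarly/20240722/
# each should list the munged dump parts rather than fail with "No such file or directory"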

Event Timeline

dcausse removed a project: Epic.
dcausse updated the task description.

Change #1056125 had a related patch set uploaded (by DCausse; author: DCausse):

[wikidata/query/deploy@master] deploy version 0.3.145

https://gerrit.wikimedia.org/r/1056125

Change #1056125 merged by Ryan Kemper:

[wikidata/query/deploy@master] deploy version 0.3.145

https://gerrit.wikimedia.org/r/1056125

dcausse updated the task description.

Change #1059963 had a related patch set uploaded (by Ryan Kemper; author: Ryan Kemper):

[operations/puppet@production] wdqs: add wdqs1021 to scap targets

https://gerrit.wikimedia.org/r/1059963

Change #1059963 merged by Ryan Kemper:

[operations/puppet@production] wdqs: add wdqs1021 to scap targets

https://gerrit.wikimedia.org/r/1059963

Unchecked the prerequisite regarding Kafka topics: the split graph hosts are currently configured to consume from the full graph topic. The reload should not start (and should probably be stopped/restarted on wdqs1021) before https://gerrit.wikimedia.org/r/c/operations/puppet/+/1060049 is merged and applied to the corresponding hosts.

Reloads on wdqs1021 (main) and wdqs1023 (scholarly) are in progress. wdqs1021 will probably finish in the next day or so; wdqs1023 probably not until ~Monday.

Working on kicking off reloads of main & scholarly on the codfw hosts as well.

Mentioned in SAL (#wikimedia-operations) [2024-08-13T19:27:28Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-13T19:29:53Z] <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-13T19:57:47Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-13T20:51:26Z] <ryankemper@cumin2002> END (FAIL) - Cookbook sre.wdqs.data-transfer (exit_code=99) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:15Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:53:55Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-15T21:54:27Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-15T22:42:57Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal to codfw host) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs2021.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Some basic spot checks:

wdqs1021.eqiad.wmnet (main)

wdqs1021.eqiad.wmnet
2024-08-16 1816 UTC
select (count(*) as ?count) where { ?s ?p ?o }
8243688752
wdqs1021.eqiad.wmnet
2024-08-16 1817 UTC
select (count(*) as ?count) where { ?s ?p ?o }
8243690043
Good, this is increasing over time.
wdqs1021.eqiad.wmnet
2024-08-16 1830 UTC
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146. # Must be a cat
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default values for all languages, then en language
}

214 results (same as query.wikidata.org around the same time, and this is good, as we expect to see cats in the main graph)
wdqs1021.eqiad.wmnet
2024-08-16 1843 UTC
SELECT ?s ?p ?o where {
  ?s wdt:P31 wd:Q13442814 
}
limit 10

0 results (as expected, this is the main graph and we're looking for scholarly article subjects and expect to see none)

wdqs1023.eqiad.wmnet (scholarly)

wdqs1023.eqiad.wmnet
2024-08-16 1821 UTC
select (count(*) as ?count) where { ?s ?p ?o }
8203755230
wdqs1023.eqiad.wmnet
2024-08-16 1822 UTC
select (count(*) as ?count) where { ?s ?p ?o }
8203765313
Good, this is increasing over time.
wdqs1023.eqiad.wmnet
2024-08-16 1835 UTC
SELECT ?item ?itemLabel
WHERE
{
  ?item wdt:P31 wd:Q146. # Must be a cat
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],mul,en". } # Helps get the label in your language, if not, then default values for all languages, then en language
}

0 results (as expected, this is the scholarly graph and we don't expect to see cats)
wdqs1023.eqiad.wmnet
2024-08-16 1842 UTC
SELECT ?s ?p ?o where {
  ?s wdt:P31 wd:Q13442814 
}
limit 10

10 results (as expected, this is the scholarly graph and we're looking for scholarly article subjects here)

query.wikidata.org (full)

query.wikidata.org
2024-08-16 1850 UTC
select (count(*) as ?count) where { ?s ?p ?o }
16260739264
Okay, this is close to the sum of the two graphs. There is a modest amount of necessary duplication of values across the split graphs, so their sum is a bit larger than the centralized, non-split full graph.
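For reference (counts taken a few minutes apart): 8,243,690,043 (main) + 8,203,765,313 (scholarly) = 16,447,455,356, versus 16,260,739,264 for the full graph, i.e. roughly 1.1% extra triples attributable to that duplication.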

A manual local /etc/hosts entry on wdqs1021 was set up to allow federation to wdqs1023 via an allowlisted entry in wdqs1021:/etc/wdqs/allowlist-wdqs-blazegraph.txt (https://wdqs-scholarly.discovery.wmnet/sparql), i.e. the host wdqs-scholarly.discovery.wmnet was mapped to wdqs1023's IP address.
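
Roughly what that setup looks like on wdqs1021 (a sketch; the IP below is a placeholder for wdqs1023's actual address):

# confirm the scholarly endpoint is in the federation allowlist
grep scholarly /etc/wdqs/allowlist-wdqs-blazegraph.txt
# expected: https://wdqs-scholarly.discovery.wmnet/sparql
# temporarily point the discovery hostname at wdqs1023 (placeholder IP)
echo '10.0.0.0  wdqs-scholarly.discovery.wmnet' | sudo tee -a /etc/hosts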

Using the direct query against https://query.wikidata.org/ listed at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Simple_lookup_by_object_on_the_truthy_graph, as follows,

SELECT ?x ?xLabel ?r ?relLabel {
  ?x ?r wd:Q1542532 .
  ?rel wikibase:directClaim ?r
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel

we see 1759 results.

Running the same query on wdqs1021 via http://localhost:9999/bigdata/#query (over the tunnel ssh -N wdqs1021.eqiad.wmnet -L 9999:127.0.0.1:9999) returns just 1 result.
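
The same checks can also be run non-interactively through that tunnel; a sketch, assuming the usual Blazegraph namespace endpoint (/bigdata/namespace/wdq/sparql):

curl -s 'http://localhost:9999/bigdata/namespace/wdq/sparql' \
  --data-urlencode 'query=SELECT (COUNT(*) AS ?count) WHERE { ?s ?p ?o }' \
  -H 'Accept: application/sparql-results+json'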

But using the recommended federated query form and setting the SERVICE explicitly, we get the expected 1759 results. Here's the adapted query:

SELECT ?x ?xLabel ?r ?relLabel {
  VALUES (?work) {(wd:Q1542532)}
  {
    ?x ?r ?work .
  } UNION {
    SERVICE <https://wdqs-scholarly.discovery.wmnet/sparql> {
      ?x ?r ?work .
      BIND(?xLabel AS ?xLabel)
      SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
    }
  }
  ?rel wikibase:directClaim ?r
  SERVICE wikibase:label { bd:serviceParam wikibase:language 'en,de,fr,ru,nl,it,ja,uk,cs,sk,be,ca'. }
} ORDER BY ?r ?xLabel

The line has been removed from /etc/hosts for now, but it may be useful to temporarily reinstate it for some additional manual tests. If https://gerrit.wikimedia.org/r/c/operations/dns/+/1051446 is deployed, the Java process should be bounced (unless Blazegraph's Java DNS cache has already flushed the entry) so that the discovery URL's host resolves to the right place; at present it of course doesn't resolve via the DNS server.
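
Once that DNS change lands, the cleanup would look roughly like this (a sketch; assumes the standard wdqs-blazegraph systemd unit on these hosts):

# if the temporary override was reinstated, drop it again, then restart so the JVM re-resolves the discovery name
sudo sed -i '/wdqs-scholarly.discovery.wmnet/d' /etc/hosts
sudo systemctl restart wdqs-blazegraph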

Mentioned in SAL (#wikimedia-operations) [2024-08-19T16:38:51Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-19T16:41:34Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-19T16:42:15Z] <ryankemper@cumin2002> END (ERROR) - Cookbook sre.wdqs.data-transfer (exit_code=97) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-19T16:42:41Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-19T17:33:36Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling neither afterwards

I worked through the rest of the examples at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples with the /etc/hosts entry in place again. Note that wdsubgraph:scholarly_articles works as well as the fully qualified graph URL.
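
For illustration, a minimal sketch of that short form (assumes the wdsubgraph: prefix is predefined on the split hosts, as used on that examples page):

SELECT ?x WHERE {
  SERVICE wdsubgraph:scholarly_articles {
    ?x wdt:P31 wd:Q13442814 .
  }
} LIMIT 5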

The queries generally worked. There is one case where the total number of results differed by 1 (1528 vs 1529), but then replication happened and it got down to 1528 and 1528!

The results below are listed in the format:

URL

<number from query.wikidata.org> <number from federated query example>

unless otherwise noted.

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Simple_lookup_by_object_on_the_truthy_graph

1749 1749

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Simple_lookup_by_subject

3 3


https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Lookup_from_mwapi_results

10 10 (same kind of results for House of Medici related things)



https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Simple_count

Count returns...

653  653 (federated query style 1) 653 (federated query style 2)



https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Joining_papers_and_authors

0 0

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Finding_duplicated_external_ids_with_a_group_by

1529    1528 (very close, and then after some replication, this got down to 1528 again, matching what query.wikidata.org reported just 30 seconds earlier)

And running the second query for that example, which uses a UNION (trying to address timeout issues), also works, with 10224 results.

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Property_paths

155 155

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Recent_publications

Using the simplified query:

28 (same as non-simplified)  28

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Number_of_articles_with_CiTO-annotated_citations_by_year

with the adapted query

59 59

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Publications_in_a_WikiProject_(Q16695773)_that_have_a_main_subject_that_is_an_instance_of_a_person

The first query with limit 100 returns 100 results on query.wikidata.org.

100 results are returned with the federated, general form of the query, with some overlap of mainSubject.

And the final federated query works fine, pulling 100 records that look sensible.

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Federated_Queries_Examples#Publications_in_a_WikiProject_(Q16695773)_but_where_the_linked_author_is_not_in_that_project

3 3

(also works if scoped only to the scholarly subgraph, as per the final query example; returns the same 3 records)

I also worked through the examples at https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide. They worked.

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#How_do_I_use_federation?

Same results with federation applied in both directions, as expected.


https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#How_to_deal_with_linked_entities_spread_across_multiple_graphs?


Same results with federation applied in both directions, as expected.

https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Wrapping_a_federated_query_with_a_SELECT

Second query fixes the mistake in the first query, as expected.



https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Returning_variables_bound_by_OPTIONAL

The second query runs faster and has a smaller result set, fixing the first query, which runs for a very long time.


https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/WDQS_graph_split/Internal_Federation_Guide#Misplacing_the_label_service

The second query gets the correct label, unlike the first query, as expected.

Mentioned in SAL (#wikimedia-operations) [2024-08-20T06:40:02Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-20T07:25:24Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1021.eqiad.wmnet -> wdqs1022.eqiad.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-21T21:25:20Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-21T21:39:00Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-21T22:09:13Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs1023.eqiad.wmnet -> wdqs2023.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-21T22:30:06Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-main journal) xfer wikidata_main from wdqs1022.eqiad.wmnet -> wdqs2022.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-22T18:36:26Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Mentioned in SAL (#wikimedia-operations) [2024-08-22T19:17:59Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer fresh wdqs-scholarly journal) xfer scholarly_articles from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling neither afterwards

Moved to Done. I'll leave marking it as Resolved to @Gehel since it's in the Needs Reporting column of the Discovery-Search workboard.

Mentioned in SAL (#wikimedia-operations) [2024-09-17T17:54:06Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2024-09-17T18:04:45Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs2024.codfw.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2024-09-17T18:05:43Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs1023.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2024-09-17T18:16:24Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs1023.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2024-09-17T18:16:57Z] <ryankemper@cumin2002> START - Cookbook sre.wdqs.data-transfer (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards

Mentioned in SAL (#wikimedia-operations) [2024-09-17T18:27:52Z] <ryankemper@cumin2002> END (PASS) - Cookbook sre.wdqs.data-transfer (exit_code=0) (T370754, transfer categories jnl) xfer categories from wdqs2023.codfw.wmnet -> wdqs1024.eqiad.wmnet w/ force delete existing files, repooling both afterwards