Maniphest T280640

[EPIC] Refine WDQS queries analysis
Closed, Resolved (Public)

Assigned To

Authored By

	JAllemandou
	Apr 20 2021, 9:38 AM

Tags

Referenced Files

None

Subscribers

Description

The current analysis parses queries and extracts:

Operators (list, and map of usage counts)
Nodes (variables, URIs, literals, blank nodes), as a map of usage counts
Prefixes (map of usage counts)
Services (map of usage counts)
Wikidata names (URIs whose main value matches the regex "^[QP]\\d+$")
Expressions
Paths
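
As an illustration of the "Wikidata names" extraction above, here is a minimal Python sketch. It is not the toolkit's code: the helper names `main_value` and `wikidata_name_counts` are hypothetical, and it assumes that the "main value" of a URI is the text after its last `/` or `#`.

```python
import re
from collections import Counter

# The task quotes the pattern as "^[QP]\\d+$", i.e. escaped for a string
# literal; the regex itself is ^[QP]\d+$.
WIKIDATA_NAME = re.compile(r"^[QP]\d+$")


def main_value(uri: str) -> str:
    """Return the last segment of a URI.

    Assumption: the "main value" is whatever follows the last '/' or '#'.
    """
    return re.split(r"[/#]", uri.strip("<>"))[-1]


def wikidata_name_counts(uris):
    """Count how often each Wikidata name (Q-id or P-id) appears in `uris`."""
    counts = Counter()
    for uri in uris:
        name = main_value(uri)
        if WIKIDATA_NAME.fullmatch(name):
            counts[name] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "http://www.wikidata.org/entity/Q42",
        "http://www.wikidata.org/prop/direct/P31",
        "http://www.wikidata.org/entity/Q42",
        "http://www.w3.org/2000/01/rdf-schema#label",
    ]
    print(wikidata_name_counts(sample))
```

Keeping the result as a map from name to count mirrors the "map of usage counts" shape used for the other extracted items.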

The values used to identify operators, expressions, paths, or nodes are strings: either the detailed name (for operators or nodes, for instance) or the full printed form of the subtree (for paths or expressions, for instance).

One thing we badly miss for our analysis is triple-pattern information: when a triple pattern is encountered, which form it takes (<? - P - O> or <S - P - ?>, for instance) and which defined values it embeds (URIs, literals, etc.). With that information we should be able to be more precise about triple-pattern usage in queries, and possibly get a better feel for heavily used subgraphs.
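
To make the requested information concrete, here is a rough Python sketch of how a triple pattern could be reduced to its form and its defined values. Everything in it is an assumption for illustration: the `TriplePattern` type and the function names are hypothetical, terms are plain strings, and blank nodes are treated as unbound in the same way as variables.

```python
from collections import Counter
from typing import NamedTuple


class TriplePattern(NamedTuple):
    # Hypothetical representation: each term is its SPARQL text.
    subject: str
    predicate: str
    obj: str


def is_unbound(term: str) -> bool:
    """True for variables (?x, $x) and, by assumption, blank nodes (_:b)."""
    return term.startswith(("?", "$", "_:"))


def pattern_form(tp: TriplePattern) -> str:
    """Render the form of a pattern, e.g. '<? - P - O>' or '<S - P - ?>'."""
    labels = ("S", "P", "O")
    parts = ["?" if is_unbound(t) else label for t, label in zip(tp, labels)]
    return "<" + " - ".join(parts) + ">"


def defined_values(tp: TriplePattern) -> list:
    """Return the terms that are fixed in the pattern (URIs, literals, ...)."""
    return [t for t in tp if not is_unbound(t)]


if __name__ == "__main__":
    patterns = [
        TriplePattern("?item", "wdt:P31", "wd:Q5"),
        TriplePattern("wd:Q42", "wdt:P19", "?place"),
        TriplePattern("?other", "wdt:P31", "wd:Q5"),
    ]
    # Usage counts per form, in the same spirit as the other extracted maps.
    print(Counter(pattern_form(tp) for tp in patterns))
    print(defined_values(patterns[0]))
```

Counting forms and defined values separately would allow questions such as "which predicates are most often used with a fixed object" to be answered from the extracted data.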

Related Objects

		Status	Subtype	Assigned	Task
		Resolved		AKhatun_WMF	T280640 [EPIC] Refine WDQS queries analysis
		Resolved		AKhatun_WMF	T282127 Add unit-tests to WDQS analysis toolkit
		Resolved		AKhatun_WMF	T282129 Test triple-analysis functions over a large dataset with Spark
		Resolved		AKhatun_WMF	T282130 Provide a way to save extracted query-information in parquet format
		Resolved		AKhatun_WMF	T283255 Create CLI job extracting info from wdqs queries
		Declined		AKhatun_WMF	T283256 Extract operator/nodes/triples/paths/exprs list from queries
		Resolved		AKhatun_WMF	T273854 Automate regular WDQS query parsing and data-extraction
		Resolved		AKhatun_WMF	T283258 Provide a job regularly deleting wdqs processed query after 90 days
		Resolved		AKhatun_WMF	T285465 Document and analyze the number of parsing errors for parsed WDQS queries
		Resolved		AKhatun_WMF	T287225 Add all prefixes defined in Blazegraph

Event Timeline

JAllemandou created this task. Apr 20 2021, 9:38 AM

Restricted Application added a project: Wikidata. Apr 20 2021, 9:38 AM

Restricted Application added a subscriber: Aklapper.

AKhatun_WMF subscribed. Apr 20 2021, 9:40 AM

tanny411 unsubscribed. Apr 20 2021, 9:42 AM

AKhatun_WMF claimed this task. Apr 23 2021, 10:50 AM

Change 684346 had a related patch set uploaded (by AKhatun; author: AKhatun):

[wikidata/query/rdf@master] Analyze sparql triple

https://gerrit.wikimedia.org/r/684346

gerritbot added a project: Patch-For-Review. May 3 2021, 11:31 AM

MPhamWMF moved this task from Incoming to Current work on the Wikidata-Query-Service board. May 3 2021, 3:18 PM

MPhamWMF added a project: Discovery-Search (Current work).

Gehel moved this task from Incoming to In Progress on the Discovery-Search (Current work) board. May 3 2021, 3:19 PM

JAllemandou added a subtask: T282127: Add unit-tests to WDQS analysis toolkit. May 6 2021, 1:28 PM

JAllemandou added a subtask: T282129: Test triple-analysis functions over a large dataset with Spark. May 6 2021, 1:31 PM

JAllemandou added a subtask: T282130: Provide a way to save extracted query-information in parquet format. May 6 2021, 1:34 PM

AKhatun_WMF closed subtask T282127: Add unit-tests to WDQS analysis toolkit as Resolved. May 19 2021, 10:35 AM

JAllemandou closed subtask T282130: Provide a way to save extracted query-information in parquet format as Resolved. May 20 2021, 11:59 AM

JAllemandou closed subtask T282129: Test triple-analysis functions over a large dataset with Spark as Resolved. May 20 2021, 12:26 PM

JAllemandou added a subtask: T273854: Automate regular WDQS query parsing and data-extraction. May 20 2021, 4:24 PM

AKhatun_WMF removed a project: Patch-For-Review. May 25 2021, 8:47 AM

CBogen moved this task from Current work to Analysis on the Wikidata-Query-Service board. May 27 2021, 1:51 PM

AKhatun_WMF closed subtask T283255: Create CLI job extracting info from wdqs queries as Resolved. Jun 4 2021, 7:20 AM

MPhamWMF triaged this task as Medium priority. Jun 10 2021, 1:42 PM

JAllemandou added a subtask: T285465: Document and analyze the number of parsing errors for parsed WDQS queries. Jun 24 2021, 11:06 AM

MPhamWMF renamed this task from Refine WDQS queries analysis to [EPIC] Refine WDQS queries analysis. Jun 24 2021, 1:39 PM

MPhamWMF moved this task from Analysis to Epics on the Wikidata-Query-Service board.

Gehel closed subtask T283258: Provide a job regularly deleting wdqs processed query after 90 days as Resolved. Jul 26 2021, 12:14 PM

Gehel closed subtask T273854: Automate regular WDQS query parsing and data-extraction as Resolved.

Gehel closed subtask T285465: Document and analyze the number of parsing errors for parsed WDQS queries as Resolved. Jul 26 2021, 12:21 PM

CBogen moved this task from In Progress to Epics on the Discovery-Search (Current work) board. Aug 5 2021, 1:36 PM

Gehel closed this task as Resolved. Feb 22 2022, 8:39 PM

Gehel closed subtask T283256: Extract operator/nodes/triples/paths/exprs list from queries as Declined.