
AKhatun_WMF (Aisha Khatun)
Contract Data Analyst @ WDQS

Projects

User does not belong to any projects.

User Details

User Since
Apr 20 2021, 8:39 AM (14 w, 6 d)
Availability
Available
IRC Nick
tanny411
LDAP User
AKhatun
MediaWiki User
AKhatun (WMF) [ Global Accounts ]

Personal Accounts:

Check out my website/blog: http://tanny411.github.io/

Recent Activity

Mon, Jul 26

AKhatun_WMF moved T287225: Add all prefixes defined in Blazegraph from All WDQS-related tasks to Analysis on the Wikidata-Query-Service board.
Mon, Jul 26, 11:25 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T286436: Deduplicate triples when loading the wikibase RDF dumps into hive.
Mon, Jul 26, 11:24 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T286436: Deduplicate triples when loading the wikibase RDF dumps into hive.

Joseph will suggest an optimization for this task when he is back. For now, a simple .distinct() has been applied to the Spark DataFrame to facilitate analysis of the Wikidata dumps.
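On the Spark side this is a one-liner (`df.distinct()`), which shuffles all rows to find duplicates. As a minimal plain-Python sketch of the same set semantics, assuming each triple is a (subject, predicate, object) tuple of strings (the sample triples are hypothetical), exact duplicates can be dropped like this:

```python
from typing import Iterable, List, Tuple

# Assumed representation: (subject, predicate, object)
Triple = Tuple[str, str, str]


def dedupe_triples(triples: Iterable[Triple]) -> List[Triple]:
    """Drop exact duplicate triples, keeping the first occurrence of each.

    Produces the same set of rows as Spark's DataFrame.distinct() on a
    (subject, predicate, object) DataFrame. Unlike Spark, which gives no
    ordering guarantee, this keeps the first-seen order.
    """
    # dicts preserve insertion order (Python 3.7+), so fromkeys() dedupes in order
    return list(dict.fromkeys(triples))


if __name__ == "__main__":
    # Hypothetical sample: the same triple emitted twice in a dump
    sample = [
        ("wd:Q42", "wdt:P31", "wd:Q5"),
        ("wd:Q42", "rdfs:label", '"Douglas Adams"@en'),
        ("wd:Q42", "wdt:P31", "wd:Q5"),
    ]
    print(dedupe_triples(sample))
```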

Mon, Jul 26, 11:23 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sat, Jul 24

AKhatun_WMF added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

Some of the wanted statistics are listed on Scholia, currently on the front page: https://scholia.toolforge.org/ (UPDATE: now here: https://scholia.toolforge.org/statistics)

"percentage, number of Wikidata entities that are scholarly article":
37,246,721 scholarly articles; with roughly 97 million entities in total, that is 37/97 ≈ 38%, i.e. roughly 40% of entities are scholarly articles.
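The percentage can be checked with a quick calculation. The total of about 97 million entities is the approximate figure implied by "37/97" (in millions) and is an assumption here, not an exact count:

```python
# Count quoted from Scholia's statistics page
SCHOLARLY_ARTICLES = 37_246_721
# Approximate total implied by "37/97" (millions); assumed, not exact
TOTAL_ENTITIES = 97_000_000

share = SCHOLARLY_ARTICLES / TOTAL_ENTITIES
print(f"{share:.1%}")  # -> 38.4%
```

So the figure is closer to 38%, which rounds to the ~40% quoted.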

Sat, Jul 24, 10:24 AM · Wikidata, Wikidata-Query-Service

Fri, Jul 23

AKhatun_WMF created T287225: Add all prefixes defined in Blazegraph.
Fri, Jul 23, 4:26 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Mon, Jul 19

AKhatun_WMF added a comment to T285465: Document and analyze the number of parsing errors for parsed WDQS queries.

@dcausse: Yes, just adding the prefix declarations to the Jena parser is what we want to do.
@JAllemandou: Should I add the other prefixes as well?
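Declaring the prefixes in the Jena parser itself is the approach agreed on here. A different, purely textual sketch of the same idea is to prepend any missing PREFIX declarations to the query string before it reaches the parser. The prefix IRIs below are assumed to match the WDQS/Blazegraph defaults and should be checked against the Blazegraph prefix list; the regex-based usage check is a simplification that can give false positives (e.g. inside string literals):

```python
import re
from typing import Dict

# Assumed WDQS/Blazegraph default IRIs for the prefixes discussed; verify before use.
EXTRA_PREFIXES: Dict[str, str] = {
    "mwapi": "https://www.mediawiki.org/ontology#API/",
    "geof": "http://www.opengis.net/def/function/geosparql/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "gas": "http://www.bigdata.com/rdf/gas#",
}

# Prefixes already declared in the query text, e.g. "PREFIX foaf: <...>"
DECLARED = re.compile(r"PREFIX\s+([A-Za-z][\w-]*)\s*:", re.IGNORECASE)


def add_missing_prefixes(query: str, prefixes: Dict[str, str] = EXTRA_PREFIXES) -> str:
    """Prepend PREFIX lines for known prefixes that are used but not declared."""
    declared = set(DECLARED.findall(query))
    header = []
    for name, iri in prefixes.items():
        # Crude usage check: "name:" not preceded by a word character or hyphen
        used = re.search(rf"(?<![\w-]){re.escape(name)}:", query) is not None
        if used and name not in declared:
            header.append(f"PREFIX {name}: <{iri}>")
    return "\n".join(header + [query]) if header else query
```

For example, a query using `foaf:name` without a declaration gets `PREFIX foaf: <http://xmlns.com/foaf/0.1/>` prepended, while a query that already declares it is returned unchanged.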

Mon, Jul 19, 2:04 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Fri, Jul 16

AKhatun_WMF updated subscribers of T285465: Document and analyze the number of parsing errors for parsed WDQS queries.
Fri, Jul 16, 1:35 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T285465: Document and analyze the number of parsing errors for parsed WDQS queries.
  • For June, the average daily successful parsing rate was ~85%, ranging from 75% to 90%. Note that this only includes queries with HTTP status 200 or 500.
  • 11% of the distinct queries ran into prefix-related errors. The number of distinct queries affected by each prefix is shown below. After adding the first 4 prefixes (mwapi, geof, foaf, gas) to the query processor's prefix list, the average daily successful parsing rate rose to >96%. A few prefixes were slightly off (data instead of wdata, ref instead of wdref); these account for very few queries, but I fixed them nevertheless.
prefix_name      count
mwapi        7,419,357
geof            54,183
foaf            17,198
gas             13,753
wds              2,761
wdv                216
fn                  62
dc                  50
mediawiki           23
wdref               22
wdata                3

Total distinct queries: 67,467,327

  • Other errors included:
    • "Variable used when already in-scope": this happens when the same variable is reused in a query. Such queries return results without problems when run in WDQS. They account for 2% of the errors in distinct queries.
    • Another notable error involves the WITH clause, Blazegraph's named-subquery extension: although it runs fine in WDQS, the Jena parser doesn't accept it. These queries form 2.5% of the distinct queries.
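The prefix figures can be cross-checked with a little arithmetic. The sketch below copies the per-prefix counts from the table above and assumes each failing query is counted under a single prefix (if some were counted under several, the sum slightly overstates the total):

```python
# Distinct failing queries per prefix, copied from the table above
PREFIX_ERRORS = {
    "mwapi": 7_419_357, "geof": 54_183, "foaf": 17_198, "gas": 13_753,
    "wds": 2_761, "wdv": 216, "fn": 62, "dc": 50,
    "mediawiki": 23, "wdref": 22, "wdata": 3,
}
TOTAL_DISTINCT_QUERIES = 67_467_327

prefix_total = sum(PREFIX_ERRORS.values())
top4 = sum(PREFIX_ERRORS[p] for p in ("mwapi", "geof", "foaf", "gas"))

print(f"prefix errors: {prefix_total / TOTAL_DISTINCT_QUERIES:.1%} of distinct queries")
print(f"mwapi, geof, foaf, gas: {top4 / prefix_total:.2%} of prefix errors")
```

This reproduces the ~11% figure and shows that the first four prefixes cover more than 99.9% of the prefix-related failures, which is consistent with the jump to >96% after adding them.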
Fri, Jul 16, 1:34 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Tue, Jul 13

AKhatun_WMF moved T285465: Document and analyze the number of parsing errors for parsed WDQS queries from Analysis to Current work on the Wikidata-Query-Service board.
Tue, Jul 13, 10:22 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T285465: Document and analyze the number of parsing errors for parsed WDQS queries.
Tue, Jul 13, 10:22 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sun, Jul 11

AKhatun_WMF updated the task description for T286410: Requesting update to SSH key for Aisha Khatun.
Sun, Jul 11, 11:29 AM · SRE, SRE-Access-Requests
AKhatun_WMF created T286410: Requesting update to SSH key for Aisha Khatun.
Sun, Jul 11, 11:25 AM · SRE, SRE-Access-Requests
AKhatun_WMF added a comment to T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.

Thanks!

Sun, Jul 11, 11:05 AM · SRE, SRE-Access-Requests
AKhatun_WMF added a comment to T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.

Hi @akosiaris, I had to do a fresh OS install and lost my SSH keys. Is it possible to update my key so I can regain access? Should I post a new public key here?

Sun, Jul 11, 9:02 AM · SRE, SRE-Access-Requests
AKhatun_WMF reopened T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun as "Open".
Sun, Jul 11, 9:00 AM · SRE, SRE-Access-Requests

Jun 23 2021

AKhatun_WMF added a comment to T282790: Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.

Some of the vertical analyses were done as part of familiarizing myself with Wikidata. See the findings in Wikidata_Vertical_Analysis. I will get back to this ticket when done with T282139.

Jun 23 2021, 9:16 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jun 22 2021

AKhatun_WMF moved T282139: Provide a quantitative description of the Wikidata-triples dataset from Incoming to In Progress on the Discovery-Search (Current work) board.
Jun 22 2021, 8:23 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T282139: Provide a quantitative description of the Wikidata-triples dataset.
Jun 22 2021, 7:47 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T273854: Automate regular WDQS query parsing and data-extraction from Current work to Analysis on the Wikidata-Query-Service board.
Jun 22 2021, 7:47 AM · Discovery-Search (Current work), Wikidata-Query-Service, Analytics, Wikidata
AKhatun_WMF moved T282139: Provide a quantitative description of the Wikidata-triples dataset from Analysis to Current work on the Wikidata-Query-Service board.
Jun 22 2021, 7:46 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T283258: Provide a job regularly deleting wdqs processed query after 90 days from Current work to Analysis on the Wikidata-Query-Service board.
Jun 22 2021, 7:46 AM · Discovery-Search (Current work), Patch-For-Review, Wikidata, Wikidata-Query-Service

Jun 21 2021

AKhatun_WMF committed rWDAN58fc22bef150: Airflow dag to extract and process sparql queries (authored by AKhatun_WMF).
Jun 21 2021, 5:43 PM

Jun 4 2021

AKhatun_WMF triaged T283256: Extract operator/nodes/triples/paths/exprs list from queries as Low priority.
Jun 4 2021, 7:24 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T273854: Automate regular WDQS query parsing and data-extraction.
Jun 4 2021, 7:22 AM · Discovery-Search (Current work), Wikidata-Query-Service, Analytics, Wikidata
AKhatun_WMF closed T283255: Create CLI job extracting info from wdqs queries as Resolved.
Jun 4 2021, 7:21 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF closed T283255: Create CLI job extracting info from wdqs queries, a subtask of T280640: [EPIC] Refine WDQS queries analysis, as Resolved.
Jun 4 2021, 7:20 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jun 3 2021

AKhatun_WMF added a comment to T282139: Provide a quantitative description of the Wikidata-triples dataset.

Some of the information suggested for analysis or extraction is:

  • Top items
  • Top properties
  • Top subject, object types
  • Top property types
  • Top Wikidata vs. other predicates
  • Number of subjects, predicates and objects (S, P, O) that don't involve Wikidata
    • The aim is to find the size of the subgraph not concerning Wikidata, i.e. the size of the leaves. They are leaves because once they point to something outside of Wikidata, they are not expanded within Wikidata. Some things, like literals, are not expandable at all. If there are too many leaves, we may consider using property graphs (where leaves would be stored as properties of a node).
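A minimal sketch of how leaves could be counted, assuming triples are (subject, predicate, object) strings in N-Triples-like syntax and that an object is a leaf when it is a literal or an IRI outside the Wikidata entity namespace. Both the representation and this leaf definition are assumptions for illustration, not the definition used in the analysis:

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]

# Assumption: only objects in this namespace (items, properties, statement
# nodes) are expanded further within Wikidata.
WIKIDATA_ENTITY_NS = "<http://www.wikidata.org/entity/"


def is_leaf(obj: str) -> bool:
    """Treat literals and any non-entity IRI or blank node as a leaf."""
    if obj.startswith('"'):  # literal, e.g. "42"^^xsd:integer or "foo"@en
        return True
    # Under this simplification, blank nodes and value nodes also count as leaves
    return not obj.startswith(WIKIDATA_ENTITY_NS)


def count_leaves(triples: Iterable[Triple]) -> Tuple[int, int]:
    """Return (leaf_triples, total_triples)."""
    leaves = total = 0
    for _, _, obj in triples:
        total += 1
        if is_leaf(obj):
            leaves += 1
    return leaves, total
```

The ratio `leaves / total` would then indicate how much of the graph could move into node properties in a property-graph model.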
Jun 3 2021, 6:53 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jun 1 2021

AKhatun_WMF added a comment to T283256: Extract operator/nodes/triples/paths/exprs list from queries.

Update 1 June 2021:

Jun 1 2021, 9:59 AM · Wikidata, Wikidata-Query-Service

May 27 2021

AKhatun_WMF moved T273854: Automate regular WDQS query parsing and data-extraction from Tracking to Analysis on the Wikidata-Query-Service board.
May 27 2021, 2:07 PM · Discovery-Search (Current work), Wikidata-Query-Service, Analytics, Wikidata

May 25 2021

AKhatun_WMF removed a project from T280640: [EPIC] Refine WDQS queries analysis: Patch-For-Review.
May 25 2021, 8:47 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

May 24 2021

AKhatun_WMF added a comment to T283256: Extract operator/nodes/triples/paths/exprs list from queries.

Idea on how to store the SPARQL query as a list:
Let's make a list of instances of a generic custom class QueryElem[T], where each QueryElem contains elemType: String and elem: T.
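The proposal uses Scala-style generics (QueryElem[T]). An equivalent Python sketch with `typing.Generic` is below; the field names mirror the proposal, while the element tags ("triple", "expr", "path") and the sample elements are hypothetical:

```python
from dataclasses import dataclass
from typing import Generic, List, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class QueryElem(Generic[T]):
    """One element of a flattened SPARQL query: a type tag plus its payload."""
    elemType: str  # hypothetical tags such as "triple", "path", "expr"
    elem: T


# Hypothetical flattened representation of a small query
query_elems: List[QueryElem] = [
    QueryElem("triple", ("?item", "wdt:P31", "wd:Q5")),
    QueryElem("expr", "LANG(?label) = 'en'"),
]


def elems_of_type(elems: List[QueryElem], elem_type: str) -> List[QueryElem]:
    """Filter the list down to elements with the given type tag."""
    return [e for e in elems if e.elemType == elem_type]
```

In Scala the same shape would be a `case class QueryElem[T](elemType: String, elem: T)`; a sealed trait hierarchy is a common alternative when the set of element kinds is fixed, since it replaces the string tag with a checked type.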

May 24 2021, 10:53 AM · Wikidata, Wikidata-Query-Service

May 21 2021

dcausse awarded T282129: Test triple-analysis functions over a large dataset with Spark a Love token.
May 21 2021, 7:19 AM · Wikidata, Wikidata-Query-Service

May 20 2021

AKhatun_WMF added a comment to T282130: Provide a way to save extracted query-information in parquet format.

@AKhatun_WMF That's great! Could you please provide some info on the expected data size in parquet (for daily data, for instance)? Many thanks.

May 20 2021, 9:24 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF updated the task description for T282130: Provide a way to save extracted query-information in parquet format.
May 20 2021, 9:23 AM · Wikidata, Wikidata-Query-Service

May 19 2021

AKhatun_WMF claimed T282130: Provide a way to save extracted query-information in parquet format.
May 19 2021, 11:34 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T282129: Test triple-analysis functions over a large dataset with Spark.
May 19 2021, 11:15 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF closed T282127: Add unit-tests to WDQS analysis toolkit, a subtask of T280640: [EPIC] Refine WDQS queries analysis, as Resolved.
May 19 2021, 10:36 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF closed T282127: Add unit-tests to WDQS analysis toolkit as Resolved.
May 19 2021, 10:36 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T282127: Add unit-tests to WDQS analysis toolkit.

Unit tests done, patch merged!

  • Created one file containing queries that should parse and another containing queries that should not; the unit tests check both for correctness.
  • Checked the correctness of extracted nodes for 2 example queries written inline in the code.
May 19 2021, 10:35 AM · Wikidata, Wikidata-Query-Service

May 7 2021

AKhatun_WMF claimed T282127: Add unit-tests to WDQS analysis toolkit.
May 7 2021, 1:43 PM · Wikidata, Wikidata-Query-Service

Apr 23 2021

AKhatun_WMF added a comment to T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.

Thanks, I've updated it. The only thing that's left in the 'All of the above' section on the Requesting_access page is 'An ssh key for your shell user'. Let me know if I should add that as well.

Apr 23 2021, 12:55 PM · SRE, SRE-Access-Requests
AKhatun_WMF updated the task description for T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.
Apr 23 2021, 12:52 PM · SRE, SRE-Access-Requests
AKhatun_WMF claimed T280640: [EPIC] Refine WDQS queries analysis.
Apr 23 2021, 10:50 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF created T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.
Apr 23 2021, 10:14 AM · SRE, SRE-Access-Requests

Apr 20 2021

AKhatun_WMF updated AKhatun_WMF.
Apr 20 2021, 8:58 AM