AKhatun_WMF (Aisha Khatun)
Contract Data Analyst @ WDQS

Projects

User does not belong to any projects.

User Details

User Since
Apr 20 2021, 8:39 AM (67 w, 6 d)
Availability
Available
IRC Nick
tanny411
LDAP User
AKhatun
MediaWiki User
AKhatun (WMF) [ Global Accounts ]

Personal Accounts:

Check out my website/blog: http://tanny411.github.io/

Recent Activity

Mon, Jul 11

AKhatun_WMF added a project to T279416: Deploy Image content filtration model for Wikimedia Commons: WMF-Inspiration-Week-2022-ML-Collab.
Mon, Jul 11, 8:58 AM · WMF-Inspiration-Week-2022-ML-Collab, artificial-intelligence

Jul 8 2022

AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

As for the exact code causing this, Spark is terrible at telling us exactly where, but inferring from the Spark UI output, I think it's this join:

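import org.apache.spark.sql.DataFrame

// wikidataTriples and p31 are assumed to be members of the enclosing class (not shown in the comment).
def getTopSubgraphItems(topSubgraphs: DataFrame): DataFrame = {
  wikidataTriples
    .filter(s"predicate='<$p31>'")
    .selectExpr("object as subgraph", "subject as item")
    .join(topSubgraphs.select("subgraph"), Seq("subgraph"), "right")
  // ... (the remainder of the method was not included in the comment)
}
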
Jul 8 2022, 5:47 AM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
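
A join keyed on `subgraph` hash-partitions rows by that key, so every item of one very large subgraph lands in the same shard, which would be consistent with the 100MB to 450MB shard spread reported on Jul 7. A minimal pure-Scala model of the effect (no Spark involved; the row counts are hypothetical, and Spark SQL actually uses Murmur3 hashes rather than `hashCode`), plus key salting, one common mitigation:

```scala
object SkewSketch {
  // Simplified hash partitioning: non-negative (hashCode mod n).
  // Spark SQL's shuffle uses Murmur3 hashes, so real assignments differ.
  def partitionOf(key: String, numPartitions: Int): Int = {
    val mod = key.hashCode % numPartitions
    if (mod < 0) mod + numPartitions else mod
  }

  // Rows landing in each partition when every row is keyed by its subgraph.
  def partitionLoads(rowsPerKey: Map[String, Long], numPartitions: Int): Vector[Long] = {
    val loads = Array.fill(numPartitions)(0L)
    for ((key, rows) <- rowsPerKey) loads(partitionOf(key, numPartitions)) += rows
    loads.toVector
  }

  // Salting: split each key into sub-keys "key#0" .. "key#(saltBuckets-1)",
  // sharing its rows as evenly as possible so one hot key spreads over several partitions.
  def salt(rowsPerKey: Map[String, Long], saltBuckets: Int): Map[String, Long] =
    rowsPerKey.flatMap { case (key, rows) =>
      (0 until saltBuckets).map { i =>
        val extra = if (i < rows % saltBuckets) 1L else 0L
        s"$key#$i" -> (rows / saltBuckets + extra)
      }
    }
}
```

In a real join, salting also requires replicating the other side across every salt value; with a right join, right-side rows that match nothing would then appear once per salt value and need de-duplicating afterwards.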

Jul 7 2022

AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

Update:
I tested a few options on the stat box. I am not sure how well this represents the prod environment, but here goes:

Jul 7 2022, 12:20 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

The Airflow patch is deployed, but I only turned on the *_init DAGs and subgraph_mapping_weekly today (ran out of time, will do the rest tomorrow).

subgraph_mapping_weekly failed the first time through. I increased executor memory from 8g to 12g, but the second execution is still failing. Something is quite unbalanced about topSubgraphItems: across the 8 shards, inputs vary from 100MB to 450MB, giving execution times of ~30s on the small ones and ~8m before the final one fails.

Not specifically related to this patch, but I wonder if we could change the SparkUtils.saveTables method to take parameters in the path specifying coalesce vs. repartition and the number of partitions to save with, so that testing variations only requires updating the Airflow invocation and not the jar as well.

Jul 7 2022, 7:41 AM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
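
The suggestion could be sketched as a small write-strategy spec passed from the Airflow invocation (for example `coalesce:8` or `repartition:16`) and parsed before the tables are saved. The spec format and every name below (`WriteStrategy`, `WriteStrategySpec.parse`) are hypothetical; this is not the existing `SparkUtils.saveTables` signature.

```scala
import scala.util.Try

// Hypothetical write strategies selectable from the job arguments.
sealed trait WriteStrategy
final case class Coalesce(numPartitions: Int) extends WriteStrategy
final case class Repartition(numPartitions: Int) extends WriteStrategy
case object KeepPartitioning extends WriteStrategy

object WriteStrategySpec {
  // Parses "coalesce:N", "repartition:N" or "none" (case-insensitive, N > 0).
  def parse(spec: String): Either[String, WriteStrategy] =
    spec.trim.toLowerCase.split(":", 2) match {
      case Array("none") => Right(KeepPartitioning)
      case Array(mode, n) =>
        Try(n.toInt).toOption.filter(_ > 0) match {
          case None => Left(s"invalid partition count: $n")
          case Some(count) =>
            mode match {
              case "coalesce"    => Right(Coalesce(count))
              case "repartition" => Right(Repartition(count))
              case other         => Left(s"unknown mode: $other")
            }
        }
      case _ => Left(s"invalid write strategy: $spec")
    }
}
```

Inside the save method, the parsed value would then choose between `df.coalesce(n)`, `df.repartition(n)`, or leaving `df` untouched; returning `Either` lets a malformed spec fail the job before any Spark work starts.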

Jun 5 2022

AKhatun_WMF placed T271400: Collect analytics data such as pageview up for grabs.
Jun 5 2022, 12:56 PM · Abstract Wikipedia team

Mar 15 2022

AKhatun_WMF moved T303831: Productionize Wikidata subgraph analysis from Incoming to In Progress on the Discovery-Search (Current work) board.
Mar 15 2022, 2:10 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF moved T303831: Productionize Wikidata subgraph analysis from Incoming to Current work on the Wikidata-Query-Service board.
Mar 15 2022, 2:10 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF created T303831: Productionize Wikidata subgraph analysis.
Mar 15 2022, 2:08 PM · Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF placed T299921: Estimate benefits of splitting and federating Wikidata subgraphs up for grabs.
Mar 15 2022, 1:52 PM · Wikidata, Wikidata-Query-Service

Feb 10 2022

AKhatun_WMF updated the task description for T299453: Coordinate Wikimedia's participation in GSoC 2022 and Outreachy Round 24.
Feb 10 2022, 8:45 AM · Developer-Advocacy (Jul-Sep 2022), Outreachy (Round 24), Google-Summer-of-Code (2022)

Jan 31 2022

AKhatun_WMF moved T299921: Estimate benefits of splitting and federating Wikidata subgraphs from Analysis to Current work on the Wikidata-Query-Service board.
Jan 31 2022, 2:02 PM · Wikidata, Wikidata-Query-Service

Jan 20 2022

AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis

Jan 20 2022, 3:24 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 18 2022

AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@AKhatun_WMF: You mention on the wiki that some Items don't have an ORES score. All Items should have one 😬 Do you have an example of one that does not?

Jan 18 2022, 5:44 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@AKhatun_WMF, sorry, it's been a while since I wrote this, but I think what I meant when I wrote the question about "optimal separation" is: given some distribution of ORES scores (e.g. a normal distribution), is it clear what the threshold is for what qualifies as a "high" vs. "low" score, e.g. anything over 0.75 is a high score? But that's assuming the scores are continuous. I guess it's moot if they're binary (I don't actually know).

If this isn't a sensible way of thinking about the issue, let me know if there's a better way.

Jan 18 2022, 3:16 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
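
With continuous scores, "high vs. low" reduces to choosing a cut-off, and items with no score need their own bucket. A minimal sketch using the comment's example threshold of 0.75; `OresSplit` and `OresThreshold` are hypothetical names, the scores are made up, and this is not how ORES itself assigns labels.

```scala
// Counts of items above the threshold, at or below it, and without any score.
final case class OresSplit(high: Int, low: Int, missing: Int)

object OresThreshold {
  // Items scoring strictly above `threshold` count as high, the rest as low;
  // None stands for an item with no score and is counted separately.
  def split(scores: Seq[Option[Double]], threshold: Double): OresSplit =
    scores.foldLeft(OresSplit(0, 0, 0)) {
      case (acc, Some(s)) if s > threshold => acc.copy(high = acc.high + 1)
      case (acc, Some(_))                  => acc.copy(low = acc.low + 1)
      case (acc, None)                     => acc.copy(missing = acc.missing + 1)
    }
}
```

Re-running the split for several thresholds shows how sensitive the high/low counts are to the choice of cut-off.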
AKhatun_WMF updated subscribers of T288262: Estimate how many Wikidata items have low/no ORES score.

@MPhamWMF Hi, could you please clarify the question "Is there an optimal separation between high/low ORES scores?" Separation in what respect? What comes to mind is separating items by the subgraph they are part of.

Jan 18 2022, 6:52 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 12 2022

AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@ACraze Indeed! I was confusing the models for revision (item quality) with edits (damaging/good faith). The latest revision is all I will need. Thank you!

Jan 12 2022, 4:02 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
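
Keeping only the latest revision per item amounts to a group-by-item and max-by-revision step. A minimal sketch over in-memory rows; the `Revision` shape is hypothetical, and it assumes a higher revision ID means a later revision, as MediaWiki revision IDs are assigned in increasing order.

```scala
// Hypothetical row: one scored revision of a Wikidata item.
final case class Revision(itemId: String, revId: Long, score: Double)

object LatestRevision {
  // For each item, keep only the revision with the highest revision ID.
  def latestPerItem(revisions: Seq[Revision]): Map[String, Revision] =
    revisions.groupBy(_.itemId).map { case (item, revs) => item -> revs.maxBy(_.revId) }
}
```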

Jan 10 2022

AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from Incoming to In Progress on the Discovery-Search (Current work) board.
Jan 10 2022, 7:28 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from Analysis to Current work on the Wikidata-Query-Service board.
Jan 10 2022, 7:28 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 6 2022

AKhatun_WMF moved T288257: Get estimates for size of astronomical objects and queries in Wikidata graph from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

Counts of queries and triples for astronomical objects were computed here: Wikidata_Subgraph_Query_Analysis, along with those for the top ~300 large subgraphs.
For the specific case of astronomical objects (and only astronomical objects), a list of all subclasses of astronomical object was obtained and manually inspected for relevance. The subclass list also includes subclasses of subclasses, and so on.

Jan 6 2022, 6:12 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
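
The subclass list described above is the transitive closure of subclass of (P279) below the root class. A minimal in-memory sketch of that traversal as a breadth-first search, assuming the P279 edges have already been extracted into a map; the class IDs in the usage example are placeholders, and the analysis itself may have used a different method (e.g. a SPARQL property path).

```scala
import scala.collection.immutable.Queue

object SubclassClosure {
  // subclassOf: child class -> its direct parent classes (P279 edges).
  // Returns every class that reaches `root` through one or more P279 hops.
  def allSubclasses(root: String, subclassOf: Map[String, Set[String]]): Set[String] = {
    // Invert the edges so we can walk from a parent down to its children.
    val children: Map[String, Set[String]] =
      subclassOf.toSeq
        .flatMap { case (child, parents) => parents.toSeq.map(_ -> child) }
        .groupBy(_._1)
        .map { case (parent, pairs) => parent -> pairs.map(_._2).toSet }

    @annotation.tailrec
    def bfs(frontier: Queue[String], seen: Set[String]): Set[String] =
      frontier.dequeueOption match {
        case None => seen
        case Some((cls, rest)) =>
          // Skip already-visited classes and the root, so P279 cycles terminate.
          val next = children.getOrElse(cls, Set.empty[String]) -- seen - root
          bfs(rest ++ next, seen ++ next)
      }

    bfs(Queue(root), Set.empty)
  }
}
```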
AKhatun_WMF moved T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Details can be found here: Wikidata_Subgraph_Query_Analysis

Jan 6 2022, 5:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:45 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T288257: Get estimates for size of astronomical objects and queries in Wikidata graph from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:44 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Jan 6 2022, 5:39 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a project to T293631: Get estimates for splitting other large subgraphs from Wikidata: Discovery-Search (Current work).

With the completion of T293632 and T293636, this task is complete.

Jan 6 2022, 5:39 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Jan 6 2022, 5:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a project to T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: Discovery-Search (Current work).
Jan 6 2022, 5:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:36 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from incoming to in progress on the Wikidata board.

With the completion of all subtasks, this task is complete.

Jan 6 2022, 5:35 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

The analysis was completed and documented here: Wikidata_Subgraph_Query_Analysis

Jan 6 2022, 5:33 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 15 2021

AKhatun_WMF claimed T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake.
Nov 15 2021, 4:15 AM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, Product-Analytics

Nov 11 2021

AKhatun_WMF moved T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake from Analysis to Current work on the Wikidata-Query-Service board.
Nov 11 2021, 9:21 AM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, Product-Analytics
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from Analysis to Current work on the Wikidata-Query-Service board.
Nov 11 2021, 9:21 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 9 2021

AKhatun_WMF moved T291205: Analysis: Property usage by items' P31 from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Nov 9 2021, 1:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a project to T291205: Analysis: Property usage by items' P31: Discovery-Search (Current work).

Some analysis was done here:

Nov 9 2021, 1:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Nov 9 2021, 1:07 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a comment to T293632: Analysis of large subgraphs in Wikidata.

The analysis was completed and documented here: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis

Nov 9 2021, 1:06 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 8 2021

AKhatun_WMF added a comment to T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure.
Nov 8 2021, 9:30 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure.
Nov 8 2021, 8:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Oct 19 2021

AKhatun_WMF added a comment to T288264: Get estimates for all Wikidata statements of a specific datatype.

Basically Wikidata's Properties have a datatype.

Ah, datatype of properties.

I am not seeing that in the analysis you linked but maybe I am overlooking something.

The one I listed is for datatype of objects, so you didn't miss anything.
Thank you for clarifying! It should be fairly easy to find out as well :)

Oct 19 2021, 4:20 PM · Wikidata, Wikidata-Query-Service
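
Counting statements by the datatype of their property is a lookup followed by a group-by: map each statement's property to that property's datatype (in the RDF dump this is declared on the property entity via `wikibase:propertyType`), then count. A minimal in-memory sketch; the input shapes are assumptions, and the datatype strings in the example are only labels.

```scala
object DatatypeCounts {
  // statementProps: the property ID of each statement, one entry per statement.
  // datatypeOf: property ID -> datatype; unmapped properties are counted as "unknown".
  def countByDatatype(statementProps: Seq[String], datatypeOf: Map[String, String]): Map[String, Int] =
    statementProps
      .groupBy(p => datatypeOf.getOrElse(p, "unknown"))
      .map { case (datatype, props) => datatype -> props.size }
}
```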

Oct 18 2021

AKhatun_WMF updated subscribers of T288264: Get estimates for all Wikidata statements of a specific datatype.

@Lydia_Pintscher
Is this ticket asking for counts of the various datatypes used in Wikidata, both URIs and literals?
Does wikitech:User:AKhatun/Wikidata_Basic_Analysis#Object help?

Oct 18 2021, 5:11 PM · Wikidata, Wikidata-Query-Service
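
For the object side (datatypes of objects, as in the analysis linked above), each object term can be classified as an IRI, a blank node, or a literal with its datatype. A minimal sketch, assuming objects are stored as N-Triples-style strings (as the `'<$p31>'` filter in the subgraph code suggests for predicates); `ObjectKind` is a hypothetical helper.

```scala
object ObjectKind {
  // Classifies an N-Triples-style object term (assumed storage format).
  // Returns "iri", "blank node", or "literal <datatype>", with language-tagged literals
  // reported as rdf:langString and untyped ones as xsd:string (RDF 1.1); otherwise "unknown".
  def kind(term: String): String =
    if (term.startsWith("<")) "iri"
    else if (term.startsWith("_:")) "blank node"
    else if (term.startsWith("\"")) {
      // The datatype or language tag follows the closing quote of the lexical form.
      val suffix = term.substring(term.lastIndexOf('"') + 1)
      if (suffix.startsWith("^^")) "literal " + suffix.drop(2)
      else if (suffix.startsWith("@")) "literal rdf:langString"
      else "literal xsd:string"
    } else "unknown"
}
```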
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Oct 18 2021, 2:40 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:39 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293636: Identify and analyze queries that touch on various large subgraphs.
Oct 18 2021, 2:23 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF updated the task description for T293632: Analysis of large subgraphs in Wikidata.
Oct 18 2021, 2:20 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293632: Analysis of large subgraphs in Wikidata.
Oct 18 2021, 2:18 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293631: Get estimates for splitting other large subgraphs from Wikidata.
Oct 18 2021, 2:12 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF updated the task description for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 2:01 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF removed a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.
Oct 18 2021, 1:58 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a parent task for T288257: Get estimates for size of astronomical objects and queries in Wikidata graph: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Oct 18 2021, 1:58 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a parent task for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T288257: Get estimates for size of astronomical objects and queries in Wikidata graph: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Oct 18 2021, 1:57 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:57 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T291205: Analysis: Property usage by items' P31.
Oct 18 2021, 1:54 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T291205: Analysis: Property usage by items' P31: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:54 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:52 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a parent task for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:52 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:51 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Oct 4 2021

AKhatun_WMF added a comment to T292306: [DSE Hackathon] Sounds of the Commons: Neural Audio Mashups.

Interested in playing with autoencoders.

write a script that will randomly combine these audio files and sample the latent spaces of their combined embeddings to create new machine-generated audio files

Does this mean we train the autoencoder on the dataset we curated from Commons and then have it generate a sample audio file from random numbers? Maybe I'm a bit confused about what "randomly combining" audio files means here.

Oct 4 2021, 11:47 AM · Machine-Learning-Team
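
One reading of "randomly combine ... and sample the latent spaces of their combined embeddings" is: encode two clips, pick a random point on the line between their embeddings, and decode it. A minimal sketch of just the mixing step, assuming the encoder and decoder exist elsewhere (they are not shown) and embeddings are plain vectors; linear interpolation is one choice among several, not necessarily what the hackathon task intended.

```scala
import scala.util.Random

object LatentMix {
  // Linear interpolation between two latent vectors: (1 - t) * a + t * b.
  def mix(a: Vector[Double], b: Vector[Double], t: Double): Vector[Double] = {
    require(a.length == b.length, "latent vectors must have the same dimension")
    a.zip(b).map { case (x, y) => (1 - t) * x + t * y }
  }

  // Draws a random mixing weight in [0, 1) and returns the mixed latent vector.
  def randomMix(a: Vector[Double], b: Vector[Double], rng: Random): Vector[Double] =
    mix(a, b, rng.nextDouble())
}
```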

Sep 27 2021

AKhatun_WMF moved T291205: Analysis: Property usage by items' P31 from Analysis to Current work on the Wikidata-Query-Service board.
Sep 27 2021, 10:28 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T291205: Analysis: Property usage by items' P31.
Sep 27 2021, 10:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sep 24 2021

AKhatun_WMF added a comment to T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.

Astronomical objects are structured hierarchically, so not everything is a direct instance of Q6999 (unlike scholarly articles).

Sep 24 2021, 12:08 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF updated the task description for T291205: Analysis: Property usage by items' P31.
Sep 24 2021, 11:56 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T291190: Determine cost-benefit of doing vertical data slicing on WDQS from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Sep 24 2021, 11:44 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF edited projects for T291190: Determine cost-benefit of doing vertical data slicing on WDQS, added: Discovery-Search (Current work); removed Discovery-Search.
Sep 24 2021, 11:43 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF added a project to T291190: Determine cost-benefit of doing vertical data slicing on WDQS: Discovery-Search.
Sep 24 2021, 11:40 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF moved T291190: Determine cost-benefit of doing vertical data slicing on WDQS from Analysis to Current work on the Wikidata-Query-Service board.
Sep 24 2021, 11:32 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF added a comment to T291190: Determine cost-benefit of doing vertical data slicing on WDQS.

Query analysis report for some vertical slices of Wikidata: Wikidata_Vertical_Analysis#Query_Analysis
Summary: Wikidata_Vertical_Analysis#TL;DR

Sep 24 2021, 11:31 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF moved T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Sep 24 2021, 11:25 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

Here is the analysis done on scholarly articles in Wikidata and WDQS queries related to them: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis

Sep 24 2021, 11:23 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF updated the task description for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Sep 24 2021, 11:14 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sep 17 2021

AKhatun_WMF added a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T291190: Determine cost-benefit of doing vertical data slicing on WDQS.
Sep 17 2021, 7:31 AM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a parent task for T291190: Determine cost-benefit of doing vertical data slicing on WDQS: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Sep 17 2021, 7:31 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata

Aug 26 2021

AKhatun_WMF created T289754: Triple level deduplication.
Aug 26 2021, 6:02 AM · Wikidata, Wikidata-Query-Service
AKhatun_WMF created T289753: Optimize deduplication of triples when loading into wikibase RDF dumps.
Aug 26 2021, 5:25 AM · Wikidata, Wikidata-Query-Service

Aug 10 2021

AKhatun_WMF added a comment to T287225: Add all prefixes defined in Blazegraph.

This is now deployed; the first hour of processing it applies to should be 2021-08-10T14:00Z.

Aug 10 2021, 4:52 PM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Aug 9 2021

So9q awarded T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata a Burninate token.
Aug 9 2021, 12:10 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Aug 6 2021

AKhatun_WMF updated the task description for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Aug 6 2021, 1:18 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

@AKhatun_WMF, when you write "authors connected to other subgraphs", do you mean subgraphs within Wikidata (so, excluding external identifiers), or also graphs from other resources part of, for example, the Linked Open Data Cloud?

Aug 6 2021, 1:18 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF updated the task description for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Aug 6 2021, 12:49 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T287225: Add all prefixes defined in Blazegraph from Analysis to Current work on the Wikidata-Query-Service board.
Aug 6 2021, 6:35 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Aug 6 2021, 6:35 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jul 26 2021

AKhatun_WMF moved T287225: Add all prefixes defined in Blazegraph from Incoming to Analysis on the Wikidata-Query-Service board.
Jul 26 2021, 11:25 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T286436: Deduplicate triples when loading the wikibase RDF dumps into hive.
Jul 26 2021, 11:24 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T286436: Deduplicate triples when loading the wikibase RDF dumps into hive.

Joseph will suggest an optimization for this task when he is back. For now, a simple .distinct() has been applied to the Spark DataFrame to facilitate analysis of the Wikidata dumps.

Jul 26 2021, 11:23 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
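
Note that `.distinct()` compares entire rows, so if the loaded table carries extra columns (the `context` column below is a hypothetical example), rows with the same triple but different extra values survive. A minimal sketch of the difference in plain Scala collections (Scala 2.13+ for `distinctBy`); on a Spark DataFrame, `dropDuplicates` with the three triple columns plays the role of `distinctBy`.

```scala
// Hypothetical row shape: a triple plus one extra column.
final case class TripleRow(context: String, subject: String, predicate: String, obj: String)

object Dedup {
  // Whole-row distinct, like DataFrame.distinct(): rows differing only in context are both kept.
  def distinctRows(rows: Seq[TripleRow]): Seq[TripleRow] = rows.distinct

  // Distinct on (subject, predicate, object) only, keeping the first row seen for each triple.
  def distinctTriples(rows: Seq[TripleRow]): Seq[TripleRow] =
    rows.distinctBy(r => (r.subject, r.predicate, r.obj))
}
```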

Jul 24 2021

AKhatun_WMF added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

Some of the statistics that are wanted are listed on Scholia, currently on the front page: https://scholia.toolforge.org/ (UPDATE: now here: https://scholia.toolforge.org/statistics)

"Percentage and number of Wikidata entities that are scholarly articles":
37,246,721 scholarly articles, so 37/97 ≈ 38% (roughly 40%) of entities are scholarly articles.

Jul 24 2021, 10:24 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
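
The percentage works out as follows, taking the total of roughly 97 million entities implied by "37/97":

```latex
\frac{37{,}246{,}721}{\approx 97{,}000{,}000} \approx 0.384 \approx 38\%
```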

Jul 23 2021

AKhatun_WMF created T287225: Add all prefixes defined in Blazegraph.
Jul 23 2021, 4:26 AM · Patch-For-Review, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jul 19 2021

AKhatun_WMF added a comment to T285465: Document and analyze the number of parsing errors for parsed WDQS queries.

@dcausse: Yes, just adding the prefix declaration in Jena parser is what we want to do.
@JAllemandou: Should I add the other prefixes as well?

Jul 19 2021, 2:04 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
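
Adding missing prefix declarations can be done on the query text before it reaches the parser. A minimal sketch that does not use Jena's API: it prepends a `PREFIX` line for each known prefix a query uses but never declares. `PrefixInjector` is a hypothetical name, the detection is naive regex matching, and the foaf and gas IRIs in the example should be checked against the full WDQS prefix list.

```scala
import scala.util.matching.Regex

object PrefixInjector {
  // Prepends "PREFIX name: <iri>" for each known prefix the query uses but never declares.
  // Naive text matching: a "name:" inside a string literal also counts as a use.
  def addMissingPrefixes(query: String, known: Map[String, String]): String = {
    val missing = known.toSeq.sortBy(_._1).filter { case (name, _) =>
      val q = Regex.quote(name)
      // "name:" not preceded by a word character or '-', so "wdref:" does not count as "ref:".
      val used = s"(?<![\\w-])${q}:".r.findFirstIn(query).isDefined
      val declared = s"(?i)PREFIX\\s+${q}:".r.findFirstIn(query).isDefined
      used && !declared
    }
    missing.map { case (name, iri) => s"PREFIX $name: <$iri>\n" }.mkString + query
  }
}
```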

Jul 16 2021

AKhatun_WMF updated subscribers of T285465: Document and analyze the number of parsing errors for parsed WDQS queries.
Jul 16 2021, 1:35 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T285465: Document and analyze the number of parsing errors for parsed WDQS queries.

@JAllemandou @dcausse

  • For June, the average daily successful parsing rate was ~85%, ranging from 75% to 90%. Note that this only includes queries with status 200 or 500.
  • 11% of the distinct queries ran into errors related to prefixes. The number of distinct queries affected by each prefix is shown below. Adding the first four prefixes (mwapi, geof, foaf, gas) to the query processor's prefix list raised the average daily successful parsing rate to ~95% (93% to 97%). A few prefixes were slightly off (data instead of wdata, ref instead of wdref); these account for very few queries, but I fixed them nevertheless.
    Prefix        Distinct queries
    mwapi                7,419,357
    geof                    54,183
    foaf                    17,198
    gas                     13,753
    wds                      2,761
    wdv                        216
    fn                          62
    dc                          50
    mediawiki                   23
    wdref                       22
    wdata                        3

    Total distinct queries: 67,467,327

  • Other errors included:
    • "Variable used when already in-scope": this happened when the same variable was reused in a query. Such queries return results correctly in WDQS. These form 2% of the errors in distinct queries.
    • The WITH clause: although it runs fine in WDQS, the parser doesn't accept it. These form 2.5% of the distinct queries.
Jul 16 2021, 1:34 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
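
The per-prefix counts above can be checked against the 11% figure, and they show that the first four prefixes account for almost all prefix errors. A minimal sketch; the counts are copied from the table, `PrefixShares` is a hypothetical name, and it assumes each failing query is counted under only one prefix.

```scala
object PrefixShares {
  // Distinct failing queries per undeclared prefix, copied from the table above.
  val failures: Seq[(String, Long)] = Seq(
    "mwapi" -> 7419357L, "geof" -> 54183L, "foaf" -> 17198L, "gas" -> 13753L,
    "wds" -> 2761L, "wdv" -> 216L, "fn" -> 62L, "dc" -> 50L,
    "mediawiki" -> 23L, "wdref" -> 22L, "wdata" -> 3L
  )
  val totalDistinct: Long = 67467327L

  // Percentage of all distinct queries that failed on any of the given prefixes.
  def percentOfTotal(prefixes: Set[String]): Double =
    100.0 * failures.collect { case (p, n) if prefixes(p) => n }.sum / totalDistinct
}
```

With these counts, all prefixes together come to about 11.13% of distinct queries, and mwapi, geof, foaf and gas alone to about 11.12%.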

Jul 13 2021

AKhatun_WMF moved T285465: Document and analyze the number of parsing errors for parsed WDQS queries from Analysis to Current work on the Wikidata-Query-Service board.
Jul 13 2021, 10:22 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T285465: Document and analyze the number of parsing errors for parsed WDQS queries.
Jul 13 2021, 10:22 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jul 11 2021

AKhatun_WMF updated the task description for T286410: Requesting update to SSH key for Aisha Khatun.
Jul 11 2021, 11:29 AM · SRE, SRE-Access-Requests
AKhatun_WMF created T286410: Requesting update to SSH key for Aisha Khatun.
Jul 11 2021, 11:25 AM · SRE, SRE-Access-Requests
AKhatun_WMF added a comment to T280967: Requesting access to Wikimedia Analytics Data for Aisha Khatun.

Thanks!

Jul 11 2021, 11:05 AM · SRE, SRE-Access-Requests