
AKhatun_WMF (Aisha Khatun)
Contract Data Analyst @ WDQS

Projects

User does not belong to any projects.

User Details

User Since
Apr 20 2021, 8:39 AM (111 w, 2 d)
Availability
Available
IRC Nick
tanny411
LDAP User
AKhatun
MediaWiki User
AKhatun (WMF) [ Global Accounts ]

Personal Accounts:

Check out my website/blog: http://tanny411.github.io/

Recent Activity

Sat, Jun 3

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 29/5/23 - 4/6/23 Update:

  • Fixed template parsing to accommodate the use of the lang param in templates.
  • Parsed and saved misspellings from all language Wiktionaries.
  • Did some analysis and de-duplication, and saved the combined all_wiki Wiktionary misspellings.
  • Found more errors in template parsing (named params occurring before unnamed params cause incorrect parsing).
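
The named-before-unnamed parameter problem can be illustrated with a minimal sketch. It is not the project's parser: it assumes flat template calls with no nested templates or links, and the function names and the example parameter `nocap` are hypothetical. The idea is to separate named parameters first, so positional parameters keep their order wherever the named ones appear.

```python
from typing import Dict, List, Optional, Tuple


def parse_template(template: str) -> Tuple[str, List[str], Dict[str, str]]:
    """Split '{{name|a|k=v|b}}' into name, positional params, and named params.

    Assumption: the call is flat (no nested templates or wikilinks inside).
    """
    inner = template.strip()
    if not (inner.startswith("{{") and inner.endswith("}}")):
        raise ValueError("not a template call")
    parts = [p.strip() for p in inner[2:-2].split("|")]
    name = parts[0]
    positional: List[str] = []
    named: Dict[str, str] = {}
    for part in parts[1:]:
        key, sep, value = part.partition("=")
        if sep:
            # Named parameter: store it without shifting positional indexes.
            named[key.strip()] = value.strip()
        else:
            positional.append(part)
    return name, positional, named


def misspelling_target(template: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (language, correct spelling) from a misspelling-of style call.

    Hypothetical convention: the language is the named `lang` parameter if
    present, otherwise the first positional parameter.
    """
    _, positional, named = parse_template(template)
    if "lang" in named:
        lang: Optional[str] = named["lang"]
        rest = positional
    else:
        lang = positional[0] if positional else None
        rest = positional[1:]
    return lang, (rest[0] if rest else None)
```

With this split, `{{misspelling of|en|receive}}`, `{{misspelling of|lang=en|receive}}`, and a call with a named parameter placed before the positional ones all resolve to the same language and spelling.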
Sat, Jun 3, 5:23 AM · Research (FY2022-23-Research-April-June)

Sun, May 28

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 22/5/23 - 28/5/23 Update:

  • Updated MR9 with summary
  • Created Issue 14. Changed wiktionary parser script to make it work with all languages. Need to figure out some changes in template params.
Sun, May 28, 11:41 PM · Research (FY2022-23-Research-April-June)

Mon, May 22

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 15/5/23 - 21/5/23 Update:

  • Checked example uses of misspelling-of templates in all 16 collected languages. All languages look similar to enwiktionary except trwiktionary (small change) and viwiktionary.
Mon, May 22, 3:32 AM · Research (FY2022-23-Research-April-June)

Fri, May 12

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 8/5/23 - 14/5/23 Update:

  • Created Issues 12 and 13. Started working on them: identify misspelling of templates in other languages and find usage of these templates. The templates would be collected from Q50368067, misspelling of named templates in other languages, and their redirects.
Fri, May 12, 10:29 PM · Research (FY2022-23-Research-April-June)
AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 1/5/23 - 7/5/23 Update:

  • Incorporated feedback and had MR7 merged (refactor repo)
  • Analysis done on extracted misspellings, sent MR8. Based on feedback, some more analysis done.
Fri, May 12, 3:05 AM · Research (FY2022-23-Research-April-June)

Apr 30 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 24/4/23 - 30/4/23 Update:

  • Incorporated Isaac's feedback for MR5 and 6. All MRs merged after some editing and discussion.
  • Created MR7 to refactor the repo.
  • Extracted misspellings from all language Wikipedias.
    • Todo: analysis on extracted data
Apr 30 2023, 11:48 PM · Research (FY2022-23-Research-April-June)

Apr 24 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 10/4/23 - 16/4/23 Update:

  • Add info on language detection (language, confidence, text sent to model). Analyze examples.
  • Add proxy tables: tables that were not detected by mwparserfromhell.
  • Separate cell data of tables: each cell in table is now a node. Stuck with cell data/paragraph text to send to model.
Apr 24 2023, 4:19 PM · Research (FY2022-23-Research-April-June)

Apr 8 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 3/4/23 - 9/4/23 Update:

  • Pushed revised code that includes all additional formatting as a list (as discussed).
  • Fixed quotations detected. Added fasttext language detection.
  • Analysed collected misspellings from context. Some work needs to be done to increase the precision of the detected language.
Apr 8 2023, 8:06 PM · Research (FY2022-23-Research-April-June)

Apr 1 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 27/3/23 - 2/4/23 Update:

  • Apply additional filter information to extracted misspellings: Capitalization, word length, part of a list item, inside of quotations (in any language)
  • Still need to figure out the data's structure and add fasttext detected language information
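
The filter information listed above could be attached to each candidate roughly as follows. This is a sketch under assumptions, not the code from the merge requests: the function names, the returned field names, and the set of quotation-mark pairs are all illustrative, and only the first occurrence of the word in the line is checked.

```python
import re

# Paired quotation marks from a few scripts; illustrative, not exhaustive.
QUOTE_PAIRS = [('"', '"'), ("“", "”"), ("«", "»"), ("„", "“"), ("「", "」")]


def quoted_spans(line):
    """Return (start, end) character spans of quoted text in the line."""
    spans = []
    for open_q, close_q in QUOTE_PAIRS:
        pattern = re.escape(open_q) + "[^" + re.escape(close_q) + "]*" + re.escape(close_q)
        spans.extend(m.span() for m in re.finditer(pattern, line))
    return spans


def candidate_features(word, line):
    """Describe one candidate misspelling found in one line of wikitext."""
    start = line.find(word)
    end = start + len(word)
    in_quotes = start != -1 and any(a <= start and end <= b for a, b in quoted_spans(line))
    return {
        "capitalized": word[:1].isupper(),
        "length": len(word),
        # Wikitext list items start with '*' (bulleted) or '#' (numbered).
        "in_list_item": line.lstrip().startswith(("*", "#")),
        "in_quotation": in_quotes,
    }
```

Keeping the flags in a plain dictionary per candidate leaves the decision about which ones to filter on to the later analysis step.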
Apr 1 2023, 7:26 PM · Research (FY2022-23-Research-April-June)

Mar 25 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 20/3/23 - 26/3/23 Update:

Mar 25 2023, 7:30 PM · Research (FY2022-23-Research-April-June)

Mar 18 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 13/3/23 - 19/3/23 Update:

Mar 18 2023, 4:16 AM · Research (FY2022-23-Research-April-June)

Mar 11 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 6/3/23 - 12/3/23 Update:

  • Compared collected en and fr misspellings with AutowikiBrowser Typo list. Merge requested. Summary here
  • Started working on extracting wikipedia text to find the ratio of misspellings
Mar 11 2023, 2:46 AM · Research (FY2022-23-Research-April-June)
AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Mar 11 2023, 2:42 AM · Research (FY2022-23-Research-April-June)

Mar 4 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 27/2/23 - 5/3/23 Update:

  • Address comments for Issue #5
    • Parse sections line by line, consider templates in # items (numbered list)
    • Count the number of definitions by # count, excluding ## #: #; and #*
    • Also change the data format a bit to make it more readable
  • To address Issue 6: get list of misspellings from another Language and compare the collected lists to existing approaches
    • Collected bnwiktionary templates. It does not have many Bangla words; it is the same as what is present in enwiktionary. Will work with the existing collected Spanish misspellings instead.
    • For English, compared the collected list with enwiki Lists_of_common_misspellings
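
The definition-counting rule described under Issue #5 can be sketched as below. This is a minimal illustration rather than the repository's code; the function name is hypothetical, and the input is assumed to be the lines of a single section.

```python
def count_definitions(section_lines):
    """Count top-level definition lines in a Wiktionary section.

    A definition is a line starting with a single '#'. Lines starting with
    '##', '#:', '#;' or '#*' (sub-senses, examples, quotations) are excluded.
    """
    count = 0
    for line in section_lines:
        stripped = line.lstrip()
        if not stripped.startswith("#"):
            continue
        # The character right after the first '#' decides the line type.
        if len(stripped) > 1 and stripped[1] in "#:;*":
            continue
        count += 1
    return count
```

For example, a section with two `#` lines, one `#:` example, and one `#*` quotation counts as two definitions.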
Mar 4 2023, 1:30 AM · Research (FY2022-23-Research-April-June)

Mar 3 2023

AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Mar 3 2023, 1:59 AM · Research (FY2022-23-Research-April-June)

Feb 25 2023

AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Feb 25 2023, 6:22 AM · Research (FY2022-23-Research-April-June)
AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 20/2/23 - 26/2/23 Update:

Feb 25 2023, 6:20 AM · Research (FY2022-23-Research-April-June)

Feb 18 2023

AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Feb 18 2023, 4:40 AM · Research (FY2022-23-Research-April-June)
AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 13/2/23 - 19/2/23 Update:

Feb 18 2023, 4:40 AM · Research (FY2022-23-Research-April-June)

Feb 16 2023

AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Feb 16 2023, 9:12 PM · Research (FY2022-23-Research-April-June)
AKhatun_WMF updated the task description for T328742: Generate list of common misspellings from wiktionary.
Feb 16 2023, 6:50 PM · Research (FY2022-23-Research-April-June)
AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 6/2/23 - 12/2/23 Update:

  • Set up jupyter notebook (fix issues with getting spark3)
  • Get list of enwiktionary pages that use the misspelling_of template using the following tables:
    • mediawiki_templatelinks, mediawiki_linktarget, mediawiki_wikitext_current
  • Parsed enwiktionary pages to get heading name (typically POS: Noun, Adj, etc), language of misspelling, and the correct spelling from the template
  • Some analysis on parsed wikis to get language and heading distribution
Feb 16 2023, 6:45 PM · Research (FY2022-23-Research-April-June)

Feb 9 2023

AKhatun_WMF added a comment to T328733: Requesting access to analytics-privatedata-users for Aisha Khatun.

Thank you, accessed!

Feb 9 2023, 5:41 PM · SRE, SRE-Access-Requests

Feb 6 2023

AKhatun_WMF added a comment to T328742: Generate list of common misspellings from wiktionary.

Week 1/2/23 - 5/2/23 Update:

  • Caught up on previous work on copy editing both in research team and growth team
  • Learned about templates in Wiktionary in different languages and the possible categories they may be in
Feb 6 2023, 10:57 PM · Research (FY2022-23-Research-April-June)

Feb 3 2023

AKhatun_WMF updated the task description for T328733: Requesting access to analytics-privatedata-users for Aisha Khatun.
Feb 3 2023, 9:21 PM · SRE, SRE-Access-Requests

Jul 11 2022

AKhatun_WMF added a project to T279416: Deploy Image content filtration model for Wikimedia Commons: WMF-Inspiration-Week-2022-ML-Collab.
Jul 11 2022, 8:58 AM · WMF-Inspiration-Week-2022-ML-Collab, artificial-intelligence

Jul 8 2022

AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

In terms of the exact code causing this, spark is terrible at telling us exactly where but trying to infer from the SparkUI output i think it's this join:

def getTopSubgraphItems(topSubgraphs: DataFrame): DataFrame = {
  wikidataTriples
    .filter(s"predicate='<$p31>'")
    .selectExpr("object as subgraph", "subject as item")
    .join(topSubgraphs.select("subgraph"), Seq("subgraph"), "right")
Jul 8 2022, 5:47 AM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)

Jul 7 2022

AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

Update:
I tested a few options in the statbox, I am not sure how much this will represent the prod env, but here goes:

Jul 7 2022, 12:20 PM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF added a comment to T303831: Productionize Wikidata subgraph analysis.

the airflow patch is deployed but i only turned on *_init dags and subgraph_mapping_weekly today (ran out of time, will do rest tomorrow).

subgraph_mapping_weekly failed the first time through. I updated executor memory from 8g to 12g but the second execution is still failing. something is quite unbalanced about the topSubgraphItems, of the 8 shards they have inputs varying from 100MB to 450MB giving executions times of ~30s on the small ones and ~8m before the final one fails.

Not specifically related to this patch, but i wonder if we could change up the SparkUtils.saveTables method to somehow take parameters in the path to specify coalesce vs repartition and the number of partitions to save by, so we only have to update the airflow invocation and not the jar as well to test variations there.

Jul 7 2022, 7:41 AM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)

Jun 5 2022

AKhatun_WMF placed T271400: Collect analytics data such as pageview up for grabs.
Jun 5 2022, 12:56 PM · Abstract Wikipedia team

Mar 15 2022

AKhatun_WMF moved T303831: Productionize Wikidata subgraph analysis from Incoming to In Progress on the Discovery-Search (Current work) board.
Mar 15 2022, 2:10 PM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF moved T303831: Productionize Wikidata subgraph analysis from Incoming to Current work on the Wikidata-Query-Service board.
Mar 15 2022, 2:10 PM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF created T303831: Productionize Wikidata subgraph analysis.
Mar 15 2022, 2:08 PM · Patch-For-Review, Wikidata, Wikidata-Query-Service, Discovery-Search (Current work)
AKhatun_WMF placed T299921: Estimate benefits of splitting and federating Wikidata subgraphs up for grabs.
Mar 15 2022, 1:52 PM · Wikidata, Wikidata-Query-Service

Feb 10 2022

AKhatun_WMF updated the task description for T299453: Coordinate Wikimedia's participation in GSoC 2022 and Outreachy Round 24.
Feb 10 2022, 8:45 AM · Developer-Advocacy (Oct-Dec 2022), Outreachy (Round 24), Google-Summer-of-Code (2022)

Jan 31 2022

AKhatun_WMF moved T299921: Estimate benefits of splitting and federating Wikidata subgraphs from Analysis to Current work on the Wikidata-Query-Service board.
Jan 31 2022, 2:02 PM · Wikidata, Wikidata-Query-Service

Jan 20 2022

AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

The analysis is done here (for Q-ids): Wikidata_Item_ORES_Score_Analysis

Jan 20 2022, 3:24 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 18 2022

AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@AKhatun_WMF: You mention on the wiki that some Items don't have an ORES score. All Items should have one 😬 Do you have an example of one that does not?

Jan 18 2022, 5:44 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@AKhatun_WMF , sorry, it's been a while since I wrote this, but I think what I meant when I wrote the question about "optimal separation" is given some distribution of ORES scores (e.g. a normal distribution), is it clear what the threshold is for what qualifies as a "high" vs "low" score: e.g. anything over .75 is a high score. But that's assuming the scores are continuous. I guess it's moot if they're binary (I don't actually know).

If this isn't a sensible way of thinking about the issue, let me know if there's a better way.

Jan 18 2022, 3:16 PM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF updated subscribers of T288262: Estimate how many Wikidata items have low/no ORES score.

@MPhamWMF Hi, could you please clarify the question "Is there an optimal separation between high/low ORES scores?" Separation in what respect? What comes to my mind is the separation of items with respect to the subgraph they are part of.

Jan 18 2022, 6:52 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 12 2022

AKhatun_WMF added a comment to T288262: Estimate how many Wikidata items have low/no ORES score.

@ACraze Indeed! I was confusing the models for revision (item quality) with edits (damaging/good faith). The latest revision is all I will need. Thank you!

Jan 12 2022, 4:02 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 10 2022

AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from Incoming to In Progress on the Discovery-Search (Current work) board.
Jan 10 2022, 7:28 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T288262: Estimate how many Wikidata items have low/no ORES score from Analysis to Current work on the Wikidata-Query-Service board.
Jan 10 2022, 7:28 AM · ORES, Machine-Learning-Team, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Jan 6 2022

AKhatun_WMF moved T288257: Get estimates for size of astronomical objects and queries in Wikidata graph from Incoming to Needs Reporting on the Discovery-Search (Current work) board.

Counts of queries and triples for astronomical objects were done here: Wikidata_Subgraph_Query_Analysis, along with the top ~300 large subgraphs.
For the specific case of Astronomical objects (and only astronomical objects), a list of all its subclasses was obtained and manually inspected for relevance to astronomical objects. The subclass list also consists of subclasses of subclasses and so on.
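
Collecting "subclasses of subclasses and so on" amounts to a transitive closure over subclass-of edges (P279 on Wikidata). The sketch below shows one way to do it with a breadth-first search; it is not the analysis code, the function name is hypothetical, and the edge list is assumed to fit in memory as (child, parent) pairs.

```python
from collections import deque


def all_subclasses(root, subclass_of_edges):
    """Return every direct or indirect subclass of `root`.

    `subclass_of_edges` is an iterable of (child, parent) pairs.
    Cycles in the hierarchy are tolerated; `root` itself is not returned.
    """
    children = {}
    for child, parent in subclass_of_edges:
        children.setdefault(parent, []).append(child)

    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            if child != root and child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

The resulting set is what would then be inspected manually for relevance, as described above.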

Jan 6 2022, 6:12 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

Details can be found here: Wikidata_Subgraph_Query_Analysis

Jan 6 2022, 5:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:45 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T288257: Get estimates for size of astronomical objects and queries in Wikidata graph from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:44 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Jan 6 2022, 5:39 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a project to T293631: Get estimates for splitting other large subgraphs from Wikidata: Discovery-Search (Current work).

With the completion of T293632 and T293636, this task is complete.

Jan 6 2022, 5:39 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Jan 6 2022, 5:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a project to T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: Discovery-Search (Current work).
Jan 6 2022, 5:37 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Jan 6 2022, 5:36 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from incoming to in progress on the Wikidata board.

With the completion of all subtasks, this task is complete.

Jan 6 2022, 5:35 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from In Progress to Needs Reporting on the Discovery-Search (Current work) board.

The analysis was completed and documented here: Wikidata_Subgraph_Query_Analysis

Jan 6 2022, 5:33 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 15 2021

AKhatun_WMF claimed T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake.
Nov 15 2021, 4:15 AM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, Product-Analytics

Nov 11 2021

AKhatun_WMF moved T258834: Create a Commons equivalent of the wikidata_entity table in the Data Lake from Analysis to Current work on the Wikidata-Query-Service board.
Nov 11 2021, 9:21 AM · Data-Engineering-Kanban, Patch-For-Review, Data-Engineering, Wikidata, Wikidata-Query-Service, Structured-Data-Backlog, Product-Analytics
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from Analysis to Current work on the Wikidata-Query-Service board.
Nov 11 2021, 9:21 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 9 2021

AKhatun_WMF moved T291205: Analysis: Property usage by items' P31 from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Nov 9 2021, 1:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a project to T291205: Analysis: Property usage by items' P31: Discovery-Search (Current work).

Some analysis was done here:

Nov 9 2021, 1:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Nov 9 2021, 1:07 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a comment to T293632: Analysis of large subgraphs in Wikidata.

The analysis was completed and documented here: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Subgraph_Analysis

Nov 9 2021, 1:06 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Nov 8 2021

AKhatun_WMF added a comment to T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure.
Nov 8 2021, 9:30 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T295188: Create aggregate list of potential Blazegraph data deletion sources in case of catastrophic failure.
Nov 8 2021, 8:48 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Oct 19 2021

AKhatun_WMF added a comment to T288264: Get estimates for all Wikidata statements of a specific datatype.

Basically Wikidata's Properties have a datatype.

Ah, datatype of properties.

I am not seeing that in the analysis you linked but maybe I am overlooking something.

The one I listed is for datatype of objects, so you didn't miss anything.
Thank you for clarifying! It should be fairly easy to find out as well :)

Oct 19 2021, 4:20 PM · Wikidata, Wikidata-Query-Service

Oct 18 2021

AKhatun_WMF updated subscribers of T288264: Get estimates for all Wikidata statements of a specific datatype.

@Lydia_Pintscher
Is this ticket asking for counts of the various datatypes used in Wikidata? Both URIs and literals.
Does wikitech:User:AKhatun/Wikidata_Basic_Analysis#Object help?

Oct 18 2021, 5:11 PM · Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Analysis to Current work on the Wikidata-Query-Service board.
Oct 18 2021, 2:40 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:39 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293631: Get estimates for splitting other large subgraphs from Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293632: Analysis of large subgraphs in Wikidata from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF moved T293636: Identify and analyze queries that touch on various large subgraphs from Incoming to Analysis on the Wikidata-Query-Service board.
Oct 18 2021, 2:38 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293636: Identify and analyze queries that touch on various large subgraphs.
Oct 18 2021, 2:23 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF updated the task description for T293632: Analysis of large subgraphs in Wikidata.
Oct 18 2021, 2:20 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293632: Analysis of large subgraphs in Wikidata.
Oct 18 2021, 2:18 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293631: Get estimates for splitting other large subgraphs from Wikidata.
Oct 18 2021, 2:12 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF updated the task description for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 2:01 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF removed a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.
Oct 18 2021, 1:58 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a parent task for T288257: Get estimates for size of astronomical objects and queries in Wikidata graph: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Oct 18 2021, 1:58 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF removed a parent task for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T288257: Get estimates for size of astronomical objects and queries in Wikidata graph: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:58 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.
Oct 18 2021, 1:57 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:57 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T291205: Analysis: Property usage by items' P31.
Oct 18 2021, 1:54 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF added a parent task for T291205: Analysis: Property usage by items' P31: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:54 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a subtask for T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure: T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:52 PM · Epic, Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a parent task for T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata: T282790: [EPIC] Get estimates for dropping data from Wikidata in case of Blazegraph catastrophic failure.
Oct 18 2021, 1:52 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata
AKhatun_WMF created T293628: Get baseline measurements/expectations for splitting various subgraphs from Wikidata.
Oct 18 2021, 1:51 PM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata

Oct 4 2021

AKhatun_WMF added a comment to T292306: [DSE Hackathon] Sounds of the Commons: Neural Audio Mashups.

Interested in playing with autoencoders.

write a script that will randomly combine these audio files and sample the latent spaces of their combined embeddings to create new machine-generated audio files

Does this entail we train the autoencoder with the dataset we curated from commons and then have it generate a sample audio file from random numbers? Maybe I'm a bit confused about what 'randomly combining' audio files means here.

Oct 4 2021, 11:47 AM · Machine-Learning-Team

Sep 27 2021

AKhatun_WMF moved T291205: Analysis: Property usage by items' P31 from Analysis to Current work on the Wikidata-Query-Service board.
Sep 27 2021, 10:28 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF claimed T291205: Analysis: Property usage by items' P31.
Sep 27 2021, 10:27 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service

Sep 24 2021

AKhatun_WMF added a comment to T288257: Get estimates for size of astronomical objects and queries in Wikidata graph.

Astronomical objects are structured hierarchically, so not everything is a direct instance of Q6999 (unlike scholarly articles).

Sep 24 2021, 12:08 PM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF updated the task description for T291205: Analysis: Property usage by items' P31.
Sep 24 2021, 11:56 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF moved T291190: Determine cost-benefit of doing vertical data slicing on WDQS from Incoming to Needs Reporting on the Discovery-Search (Current work) board.
Sep 24 2021, 11:44 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF edited projects for T291190: Determine cost-benefit of doing vertical data slicing on WDQS, added: Discovery-Search (Current work); removed Discovery-Search.
Sep 24 2021, 11:43 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF added a project to T291190: Determine cost-benefit of doing vertical data slicing on WDQS: Discovery-Search.
Sep 24 2021, 11:40 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF moved T291190: Determine cost-benefit of doing vertical data slicing on WDQS from Analysis to Current work on the Wikidata-Query-Service board.
Sep 24 2021, 11:32 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF added a comment to T291190: Determine cost-benefit of doing vertical data slicing on WDQS.

Query analysis report for some vertical slices of Wikidata: Wikidata_Vertical_Analysis#Query_Analysis
Summary: Wikidata_Vertical_Analysis#TL;DR

Sep 24 2021, 11:31 AM · Discovery-Search (Current work), Wikidata-Query-Service, Wikidata Analytics, Wikidata
AKhatun_WMF moved T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
Sep 24 2021, 11:25 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service
AKhatun_WMF added a comment to T281854: Get baseline measurements/expectations for splitting scholarly articles from Wikidata.

Here is the analysis done on scholarly articles in Wikidata and WDQS queries related to them: https://wikitech.wikimedia.org/wiki/User:AKhatun/Wikidata_Scholarly_Articles_Subgraph_Analysis

Sep 24 2021, 11:23 AM · Discovery-Search (Current work), Wikidata, Wikidata-Query-Service