
tanny411 (Aisha Khatun)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 11 2020, 3:11 AM (26 w, 5 d)
Availability
Available
LDAP User
Aisha Khatun
MediaWiki User
Aisha Khatun [ Global Accounts ]

Recent Activity

Mon, Apr 5

tanny411 closed T263678: Analyze community authored functions that build Wikipedia infoboxes and more as Resolved.
Mon, Apr 5, 4:22 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 updated subscribers of T263678: Analyze community authored functions that build Wikipedia infoboxes and more.

@dr0ptp4kt @LostEnchanter
I am closing this task. Some subtasks are still pending; should we move them to a new task?

Mon, Apr 5, 4:21 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)

Mar 8 2021

tanny411 updated the task description for T271400: Collect analytics data such as pageview.
Mar 8 2021, 4:56 AM · Abstract Wikipedia team

Mar 7 2021

tanny411 updated the task description for T270827: Detect similarity in Lua sourcecodes.
Mar 7 2021, 1:59 PM · Abstract Wikipedia team
tanny411 added a comment to T273767: Detect "data" modules.

Should this be closed? @LostEnchanter

Mar 7 2021, 1:51 PM · Abstract Wikipedia team

Mar 2 2021

tanny411 added a comment to T274787: Create web service for providing results of analysis work.

@LostEnchanter Glad I could help! And really great work!

Mar 2 2021, 11:20 AM · Abstract Wikipedia team
tanny411 added a comment to T274787: Create web service for providing results of analysis work.

So I couldn't test it because I couldn't find where you populated the linked_df dataframe, sorry about that. But this snippet should be enough:
Get a list of all dbs with the chosen families:
dbs = linkage_df[linkage_df['family'].isin(chosen_families_list)]['database']
Filter the scores dataframe with the retrieved dbs list:
df = df[df['dbname'].isin(dbs)]
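The two filtering steps in this comment can be tried on toy data. This is a minimal sketch, assuming pandas is installed; the rows and the `score` column are made up for illustration, and only the column names `family`, `database` and `dbname` come from the comment itself.

```python
import pandas as pd

# Hypothetical toy data; only the column names 'family', 'database'
# and 'dbname' are taken from the comment, the rows are invented.
linkage_df = pd.DataFrame({
    "database": ["enwiki", "bnwiki", "enwiktionary", "frwikisource"],
    "family": ["wikipedia", "wikipedia", "wiktionary", "wikisource"],
})
df = pd.DataFrame({
    "dbname": ["enwiki", "enwiktionary", "bnwiki", "frwikisource"],
    "score": [0.9, 0.4, 0.7, 0.2],  # 'score' is an assumed column name
})
chosen_families_list = ["wikipedia", "wikisource"]

# Step 1: databases that belong to one of the chosen families.
dbs = linkage_df[linkage_df["family"].isin(chosen_families_list)]["database"]

# Step 2: keep only the score rows whose dbname is in that list.
# isin() must be called on the column inside the brackets so that it
# yields a boolean mask for row selection.
filtered = df[df["dbname"].isin(dbs)]
print(filtered["dbname"].tolist())
```

The mask keeps the original row order of `df`, so no re-sorting is needed after filtering.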

Mar 2 2021, 8:37 AM · Abstract Wikipedia team

Feb 26 2021

tanny411 added a comment to T270827: Detect similarity in Lua sourcecodes.

I've updated the pdf to include noise removal and tuning analysis. This should be it for now regarding similarity analysis. Feel free to send feedback.

Feb 26 2021, 11:37 AM · Abstract Wikipedia team
tanny411 closed T270827: Detect similarity in Lua sourcecodes as Resolved.
Feb 26 2021, 11:35 AM · Abstract Wikipedia team
tanny411 closed T270827: Detect similarity in Lua sourcecodes, a subtask of T263678: Analyze community authored functions that build Wikipedia infoboxes and more, as Resolved.
Feb 26 2021, 11:35 AM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)

Feb 23 2021

tanny411 updated subscribers of T270827: Detect similarity in Lua sourcecodes.

@LostEnchanter @gengh @dr0ptp4kt
I've attached my analysis and procedures so far, and added some more tasks as to-dos. Tuning takes a while because clustering is slow, but the good thing is that there's not much to tune.

Feb 23 2021, 7:09 AM · Abstract Wikipedia team
tanny411 updated the task description for T270827: Detect similarity in Lua sourcecodes.
Feb 23 2021, 7:07 AM · Abstract Wikipedia team

Feb 17 2021

tanny411 updated subscribers of T272003: Analysis of data collected from databases to identify priority modules.

I shared a short doc on how the scoring metric works, maybe something we can incorporate into our final report too.

Feb 17 2021, 12:58 PM · Abstract Wikipedia team
tanny411 closed T272003: Analysis of data collected from databases to identify priority modules, a subtask of T263678: Analyze community authored functions that build Wikipedia infoboxes and more, as Resolved.
Feb 17 2021, 12:15 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 closed T272003: Analysis of data collected from databases to identify priority modules as Resolved.
Feb 17 2021, 12:14 PM · Abstract Wikipedia team

Feb 10 2021

tanny411 updated the task description for T270827: Detect similarity in Lua sourcecodes.
Feb 10 2021, 8:06 AM · Abstract Wikipedia team
tanny411 renamed T270827: Detect similarity in Lua sourcecodes from [Abstract Wikipedia data science] Develop algorithms to detect copies in Lua sourcecodes to Detect similarity in Lua sourcecodes.
Feb 10 2021, 8:06 AM · Abstract Wikipedia team
tanny411 updated the task description for T272003: Analysis of data collected from databases to identify priority modules.
Feb 10 2021, 8:05 AM · Abstract Wikipedia team

Feb 3 2021

tanny411 added a comment to T272003: Analysis of data collected from databases to identify priority modules.

Thanks a lot, @Quiddity. These actually help!

Feb 3 2021, 4:36 AM · Abstract Wikipedia team

Feb 2 2021

tanny411 added a comment to T272003: Analysis of data collected from databases to identify priority modules.

@Quiddity, @dr0ptp4kt, and others, we do need some help deciding which features to weight when identifying important modules, especially how to combine the stats gathered on the various data. Some questions I had specifically were:

Feb 2 2021, 11:26 AM · Abstract Wikipedia team
tanny411 added a comment to T273000: Change database access code to work with replicas redesign.

@LostEnchanter That's great! I believe we connect to the shards locally, and in our scripts we match and connect to the appropriate shard? I am not sure what we would ask the toolforge library devs, since we use pymysql anyway when we connect locally. I see they have recently made some changes with respect to this; those changes may be worth a look.
Storing in the Sources table is brilliant if we want to do it ourselves!

Feb 2 2021, 6:04 AM · Abstract Wikipedia team
tanny411 added a comment to T272003: Analysis of data collected from databases to identify priority modules.

@Quiddity Thanks a lot. Your findings match mine on the Scribunto vs wikitext types, and I've checked the language links table as well: enwiki is indeed not connected to trwiki!

Feb 2 2021, 5:53 AM · Abstract Wikipedia team

Jan 29 2021

tanny411 closed T270492: Collect relevant data about the Modules for analysis as Resolved.
Jan 29 2021, 4:19 PM · Abstract Wikipedia team
tanny411 closed T270492: Collect relevant data about the Modules for analysis, a subtask of T263678: Analyze community authored functions that build Wikipedia infoboxes and more, as Resolved.
Jan 29 2021, 4:19 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 updated the task description for T272822: Mysql connection lost during query from toolforge.
Jan 29 2021, 4:19 PM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 closed T272822: Mysql connection lost during query from toolforge, a subtask of T270492: Collect relevant data about the Modules for analysis, as Resolved.
Jan 29 2021, 4:18 PM · Abstract Wikipedia team
tanny411 closed T272822: Mysql connection lost during query from toolforge as Resolved.
Jan 29 2021, 4:18 PM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 added a comment to T272822: Mysql connection lost during query from toolforge.

Thanks all, this issue is now resolved!

Jan 29 2021, 4:16 PM · Data-Services, Toolforge, Abstract Wikipedia team

Jan 28 2021

tanny411 added a comment to T272822: Mysql connection lost during query from toolforge.

Thanks @bd808 that might be it. We used analytics when we connected locally.

Jan 28 2021, 4:51 PM · Data-Services, Toolforge, Abstract Wikipedia team

Jan 27 2021

tanny411 updated the task description for T272003: Analysis of data collected from databases to identify priority modules.
Jan 27 2021, 2:45 PM · Abstract Wikipedia team
tanny411 updated the task description for T272822: Mysql connection lost during query from toolforge.
Jan 27 2021, 12:55 PM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 set Final Story Points to on T272822: Mysql connection lost during query from toolforge.
Jan 27 2021, 12:55 PM · Data-Services, Toolforge, Abstract Wikipedia team

Jan 26 2021

tanny411 added a comment to T272003: Analysis of data collected from databases to identify priority modules.

@tanny411 So, yes, my logic is something like that: they are all different, and it looks like all of them have "?" in title. Can we drop them, or there's something I miss?

Jan 26 2021, 4:48 PM · Abstract Wikipedia team
tanny411 added a comment to T272003: Analysis of data collected from databases to identify priority modules.

@LostEnchanter Yes, those with *??* create duplicate titles although they are not actually the same; these are some alphabets or symbols that I couldn't get rendered anywhere (web or notebook).
Also, if you were able to find certain groups/clusters of pages that go together (like pronunciation modules), then maybe we can find modules similar to them and start reducing our data for further analysis.

Jan 26 2021, 4:42 PM · Abstract Wikipedia team
tanny411 updated the task description for T272822: Mysql connection lost during query from toolforge.
Jan 26 2021, 5:12 AM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 updated the task description for T272822: Mysql connection lost during query from toolforge.
Jan 26 2021, 5:12 AM · Data-Services, Toolforge, Abstract Wikipedia team

Jan 25 2021

tanny411 triaged T272822: Mysql connection lost during query from toolforge as High priority.
Jan 25 2021, 7:12 AM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 updated the task description for T272822: Mysql connection lost during query from toolforge.
Jan 25 2021, 7:11 AM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 added a comment to T272822: Mysql connection lost during query from toolforge.

@dr0ptp4kt @LostEnchanter
Hi, I have described the issue in as much detail as possible; let me know if more information would be useful in debugging this situation.

Jan 25 2021, 7:09 AM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 created T272822: Mysql connection lost during query from toolforge.
Jan 25 2021, 7:08 AM · Data-Services, Toolforge, Abstract Wikipedia team
tanny411 updated the task description for T271400: Collect analytics data such as pageview.
Jan 25 2021, 5:59 AM · Abstract Wikipedia team
tanny411 renamed T270492: Collect relevant data about the Modules for analysis from [Abstract Wikipedia data science] Collect relevant data about the Modules for analysis to Collect relevant data about the Modules for analysis.
Jan 25 2021, 5:52 AM · Abstract Wikipedia team

Jan 21 2021

tanny411 added a comment to T272523: Early testing of the new Wiki Replicas multi-instance architecture.

Hi, we would love to test with our code (T263678). We already connect explicitly to the databases we need and we don't have any inter-wiki joins, so that's good. When working locally, though, we connect to meta and use all other dbs as required, because connecting to *all* the dbs over SSH is quite a hassle. I believe this shortcut will not work anymore? I think we will need to handle this hassle with mappings.
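One way to picture the "mappings" idea is to group the databases a script needs by the instance that hosts them, so that a local setup opens one connection (or SSH tunnel) per instance rather than one per database. The sketch below is only an illustration: `DB_TO_SECTION`, its placeholder section names and `group_by_section` are hypothetical and are not taken from the project code or from the Wiki Replicas configuration.

```python
from collections import defaultdict

# Hypothetical database -> section mapping. The section names are
# placeholders, not real Wiki Replicas sections.
DB_TO_SECTION = {
    "enwiki": "section_a",
    "bnwiki": "section_b",
    "trwiki": "section_b",
    "metawiki": "section_c",
}


def group_by_section(dbnames, db_to_section):
    """Group database names by the section that hosts them."""
    groups = defaultdict(list)
    for dbname in dbnames:
        groups[db_to_section[dbname]].append(dbname)
    return dict(groups)


# One connection per section is then enough for all of its databases.
groups = group_by_section(["enwiki", "bnwiki", "trwiki"], DB_TO_SECTION)
for section, dbs in groups.items():
    print(section, dbs)
```

A lookup that raises `KeyError` for an unmapped database is a deliberate choice here: a missing mapping is surfaced immediately instead of silently falling back to the wrong host.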

Jan 21 2021, 4:59 AM · Developer-Advocacy (Apr-Jun 2021), Data-Services, cloud-services-team (Kanban)

Jan 16 2021

tanny411 added a comment to T271957: Transclusion in Lua modules might not always show up.

Interesting! Does that mean the templatelinks table is updated only when a module is actually being 'used'?

Jan 16 2021, 5:51 AM · MediaWiki-Templates

Jan 14 2021

tanny411 created T272003: Analysis of data collected from databases to identify priority modules.
Jan 14 2021, 6:18 AM · Abstract Wikipedia team

Jan 13 2021

tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Jan 13 2021, 11:58 AM · Abstract Wikipedia team

Jan 12 2021

tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

Indeed, tl_from is a unique value as it is the page_id. Similarly, there were other instances where I could remove the DISTINCT. I have yet to get a number on the time improvement, but there will not be any data loss, that's for sure.

Jan 12 2021, 12:24 PM · Abstract Wikipedia team

Jan 11 2021

tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

Hi, thanks for the feedback! I was working on solving some issues with the pageviews; I am going to try out your suggestion for the templatelinks table later today.

Jan 11 2021, 6:55 AM · Abstract Wikipedia team

Jan 8 2021

tanny411 updated the task description for T271400: Collect analytics data such as pageview.
Jan 8 2021, 8:31 AM · Abstract Wikipedia team

Jan 7 2021

tanny411 updated the task description for T271400: Collect analytics data such as pageview.
Jan 7 2021, 12:49 PM · Abstract Wikipedia team
tanny411 moved T271400: Collect analytics data such as pageview from To triage to Data Science work on the Abstract Wikipedia team board.
Jan 7 2021, 9:24 AM · Abstract Wikipedia team
tanny411 triaged T271400: Collect analytics data such as pageview as High priority.
Jan 7 2021, 9:23 AM · Abstract Wikipedia team
tanny411 created T271400: Collect analytics data such as pageview.
Jan 7 2021, 9:22 AM · Abstract Wikipedia team
tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

@dr0ptp4kt It seems toolforge has gotten super slow because of the running jobs. It's really hard to continue working on other things from toolforge; should I stop the jobs for now (although they have been running for a long time)? I'm debating with myself.

Jan 7 2021, 7:21 AM · Abstract Wikipedia team

Jan 6 2021

tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

Hi @dr0ptp4kt,
So the issues I've been having involve the iwlinks table (code here) and templatelinks table (code here).

Jan 6 2021, 12:31 PM · Abstract Wikipedia team
tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Jan 6 2021, 12:15 PM · Abstract Wikipedia team
tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Jan 6 2021, 6:06 AM · Abstract Wikipedia team

Jan 5 2021

tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Jan 5 2021, 6:02 AM · Abstract Wikipedia team

Dec 31 2020

tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

My idea was that some pages are highly protected and this may mean they are important modules (therefore also used in a lot of places). Those can be prioritized to be centralized.

Dec 31 2020, 3:42 PM · Abstract Wikipedia team
tanny411 added a comment to T270492: Collect relevant data about the Modules for analysis.

@LostEnchanter Hi, I spent a couple of days going through the entire database layout and extracting as much information as I found relevant. I have listed it all out. Next, I will work out how to get pageview information, as that is not in the database, and then I will start storing all the info in the user database.

Dec 31 2020, 1:33 PM · Abstract Wikipedia team
tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Dec 31 2020, 1:28 PM · Abstract Wikipedia team

Dec 26 2020

tanny411 closed T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents, a subtask of T263678: Analyze community authored functions that build Wikipedia infoboxes and more, as Resolved.
Dec 26 2020, 6:50 AM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 closed T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents as Resolved.
Dec 26 2020, 6:50 AM · Abstract Wikipedia team
tanny411 updated the task description for T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.
Dec 26 2020, 6:47 AM · Abstract Wikipedia team
tanny411 added a comment to T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.

After clearing and scrutinizing the data more, here is the summary (taking only from ns 828 and Scribunto modules):

  • 118 pages from the DB were not found in the API allpages list. Of these, 2 are actual Scribunto modules and so were loaded into our DB. The rest are not Scribunto modules although the DB says so; these were ignored.
  • 98 pages were found from the API but not in the DB.
Dec 26 2020, 6:25 AM · Abstract Wikipedia team

Dec 25 2020

tanny411 claimed T270492: Collect relevant data about the Modules for analysis.
Dec 25 2020, 12:01 PM · Abstract Wikipedia team
tanny411 added a comment to T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.

A couple of confusions I ran into:

Dec 25 2020, 9:17 AM · Abstract Wikipedia team

Dec 24 2020

tanny411 updated the task description for T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.
Dec 24 2020, 6:43 PM · Abstract Wikipedia team

Dec 23 2020

tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Dec 23 2020, 10:34 AM · Abstract Wikipedia team
tanny411 updated the task description for T270492: Collect relevant data about the Modules for analysis.
Dec 23 2020, 10:33 AM · Abstract Wikipedia team
tanny411 added a comment to T270500: [Abstract Wikipedia data science] Move data storage to database which can be accessed from outside of Toolforge.

We could use dbname, but that was not saved by the content fetcher. When loading from the database I guess that won't matter, so yes, we can use dbname for sure.

Dec 23 2020, 1:09 AM · Abstract Wikipedia team

Dec 21 2020

tanny411 added a comment to T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.

Could you please also describe what you mean by 'length' there? The number of symbols in the Lua source code?

Dec 21 2020, 2:43 PM · Abstract Wikipedia team

Dec 20 2020

tanny411 added a comment to T270494: [Abstract Wikipedia data science] Create scripts to fetch Module contents.

I've tried to compare the pages collected from the API and the db (ids and titles only) by id. I had to work through a LOT of memory errors to run this script.
This is the output:

Number of db pages: 275154
Number of api pages: 274543
Number of unique pages in db: 740 # pages not found from API calls
Number of unique pages in api: 129 # pages not listed from db queries
Ok

It seems there are some discrepancies. I am looking into what these files are and if there's any pattern here.
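A comparison of this kind comes down to two set differences. The following is a minimal sketch, not the script used for the numbers above: `compare_page_ids` and the sample ids are hypothetical. Because a page_id is only unique within one wiki, the sketch keys each page on a (dbname, page_id) pair.

```python
def compare_page_ids(db_pages, api_pages):
    """Report which (dbname, page_id) pairs appear in only one source."""
    db_set, api_set = set(db_pages), set(api_pages)
    return {
        "db_total": len(db_set),
        "api_total": len(api_set),
        "only_in_db": sorted(db_set - api_set),    # not found via the API
        "only_in_api": sorted(api_set - db_set),   # not listed by db queries
    }


# Hypothetical sample data for illustration only.
db_pages = [("enwiki", 1), ("enwiki", 2), ("bnwiki", 1), ("bnwiki", 7)]
api_pages = [("enwiki", 2), ("bnwiki", 1), ("bnwiki", 9)]

report = compare_page_ids(db_pages, api_pages)
print("Number of db pages:", report["db_total"])
print("Number of api pages:", report["api_total"])
print("Only in db:", report["only_in_db"])
print("Only in api:", report["only_in_api"])
```

Holding only ids in the sets, rather than full rows with titles or content, is one way to keep memory use down; titles can be looked up afterwards for the small set of mismatches.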

Dec 20 2020, 10:46 AM · Abstract Wikipedia team

Dec 16 2020

tanny411 added a project to T263678: Analyze community authored functions that build Wikipedia infoboxes and more: Abstract Wikipedia team.
Dec 16 2020, 12:33 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)

Oct 30 2020

tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Yes, exactly.

Oct 30 2020, 5:49 AM · Outreachy (Round 21)
tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@SafiaKhaleel Yes. After recording a contribution you should submit a final application. That's where you will be asked to write your prospective timeline for the project.

Oct 30 2020, 3:46 AM · Outreachy (Round 21)

Oct 23 2020

tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

@Tambe Alternately you could:

Oct 23 2020, 2:18 PM · Outreachy (Round 21)

Oct 14 2020

tanny411 added a comment to T263678: Analyze community authored functions that build Wikipedia infoboxes and more.

Thanks @dr0ptp4kt. I was working with the revision API, where I wanted to get content for all the pages using a generator. But the API doesn't seem to return revision content for most pages.
Plus, I wanted to get only the latest revision content, but that seems to be possible only for single-page queries. A little help here would be appreciated.

Oct 14 2020, 5:58 PM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 added a comment to T263678: Analyze community authored functions that build Wikipedia infoboxes and more.

Hi, I am an outreachy applicant and interested in joining this project. I went through the task and I will get started with it right away.
I just need a little clarification: are we all going to solve the same task, or are there others I have to look at?
Thanks :)

Oct 14 2020, 8:28 AM · Abstract Wikipedia team, Outreach-Programs-Projects, Outreachy (Round 21)
tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Thanks @MelanieImf. @Isaac thanks for addressing the memory issue. It seems I had a bug in the code that caused high memory usage. Fixing that fixed the issue so I removed the comment. 😃

Oct 14 2020, 12:31 AM · Outreachy (Round 21)

Oct 13 2020

tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.

Hi, regarding the comparison of dump and API data, should we compare all the data or 10 randomly selected pages? I just want to be sure the API will support calls for lots of page ids.
Thanks

Oct 13 2020, 11:09 AM · Outreachy (Round 21)
tanny411 added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

Thanks @Miriam, makes sense :D

Oct 13 2020, 10:30 AM · Outreachy (Round 21), Outreach-Programs-Projects
tanny411 added a comment to T263874: Outreachy Application Task: Tutorial for Wikipedia Page Protection Data.
Oct 13 2020, 10:15 AM · Outreachy (Round 21)
tanny411 added a comment to T263646: Develop an approach to infer which countries are associated with a given Wikipedia article.

Hi everyone, I am an outreachy applicant and super excited to get on board and start contributing!

Oct 13 2020, 4:10 AM · Outreachy (Round 21), Outreach-Programs-Projects
tanny411 added a comment to T263860: Outreachy Project: Create Machine Learning datasets to measure content reliability on Wikipedia..

Hi @Miriam, I am an Outreachy applicant, excited to be a part of this project. Are there any additional steps before I start with the notebooks?
Also, I see T263874 is the same for both inferring country and this project. Can I be given some clarification as to which project I will be working with on completing the subtask? Or are they part of the same project?

Oct 13 2020, 4:10 AM · Outreachy (Round 21), Outreach-Programs-Projects