HAKSOAT
User

Projects

User does not belong to any projects.

User Details

User Since
Nov 30 2019, 1:47 PM (65 w, 6 d)
Availability
Available
LDAP User
Unknown
MediaWiki User
HAKSOAT [ Global Accounts ]

Recent Activity

Sep 26 2020

HAKSOAT updated subscribers of T249382: Scale: ORES topic models for uk, hu, hy, eu, sr (needed as soon as available).

Yes. The models have been built and there was supposed to be a trial deployment. I think it was @Pavol86 that did the work of reducing memory footprint though.

Sep 26 2020, 12:00 AM · Machine-Learning-Team (Active Tasks), Serbian-Sites, Growth-Scaling, Growth-Team

Aug 11 2020

HAKSOAT added a comment to T259829: Topic classification: comprehensive comparison of ORES (text-based) models, Wikidata-based, and link-based models.

In the last week, I have tried to understand how the various APIs for ORES, Wikidata and link-based predictions work. I have also come up with a document containing a possible structure for the table to be created before analysis.

Aug 11 2020, 3:05 PM · Research

Jul 13 2020

HAKSOAT added a comment to T254356: [Spike] Implement script-optimized tokenization.

Discussion can be found here: https://phabricator.wikimedia.org/T248480

Jul 13 2020, 4:47 PM · revscoring, Machine-Learning-Team, artificial-intelligence
HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

I decided to go with the non-cjk lexicon, but with the regex for matching cjk text much higher up the list. This strikes a balance in speed, so the non-cjk lexicon doesn't run so slowly on cjk text.
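
The ordering idea can be illustrated with a small sketch. Python's built-in re module tries the alternatives of a pattern from left to right, so a pattern listed earlier in a lexicon is tested first. The lexicon, the token names and the CJK ranges below are hypothetical and much smaller than the real lexicon in deltas; this is not the code from the pull request.

```python
import re

# Assumed, incomplete CJK ranges: kana, CJK Unified Ideographs, Hangul Syllables.
CJK = "[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]"

# Hypothetical lexicon: (token type, pattern) pairs, tried in this order.
LEXICON = [
    ("cjk", CJK),              # placed first, so CJK characters match immediately
    ("word", r"[^\W\d_]+"),    # run of letters (can absorb CJK letters that follow Latin ones)
    ("number", r"\d+"),
    ("whitespace", r"\s+"),
    ("etc", r"."),             # fallback: any single character
]

# One combined pattern with a named group per token type.
TOKEN_RE = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in LEXICON),
    re.DOTALL,
)

def tokenize(text):
    """Return (token type, token text) pairs for the given text."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

With this ordering, each CJK character becomes its own token, while Latin text still falls through to the word pattern.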

Jul 13 2020, 4:47 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Jun 12 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Pull request: https://github.com/halfak/deltas/pull/14

Jun 12 2020, 10:55 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search
HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Update: I ended up creating a lexicon to be used for cjk-dominant text and another to be used for non-cjk dominant text.
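
One way to choose between two such lexicons is to measure how much of the text is CJK. The sketch below is only an assumption about how "cjk-dominant" could be decided: the function names, the 0.5 threshold and the character ranges are hypothetical and are not taken from the pull request.

```python
import re

# Assumed, incomplete CJK ranges: kana, CJK Unified Ideographs, Hangul Syllables.
CJK_CHAR = re.compile("[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")

def is_cjk_dominant(text, threshold=0.5):
    """Return True if more than `threshold` of the non-space characters are CJK."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return False
    cjk_count = sum(1 for c in chars if CJK_CHAR.match(c))
    return cjk_count / len(chars) > threshold

def pick_lexicon(text):
    """Return the name of the (hypothetical) lexicon to use for this text."""
    return "cjk" if is_cjk_dominant(text) else "non_cjk"
```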

Jun 12 2020, 10:27 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

May 16 2020

HAKSOAT added a comment to T247000: Add features for English Language idioms to articlequality models.

I have been able to work on English idioms and am excited to say that the speed has improved by a large margin. Before now, we were simply piping together a bunch of English idioms, and this was quite inefficient.

May 16 2020, 2:22 PM · Machine-Learning-Team (Active Tasks), articlequality-modeling, artificial-intelligence

May 14 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

I checked the regex package. While it works great for the unicode scripts, it's reducing performance by about 40%. I checked the implementation of the library in order to see if I could get some inspiration and pull that into ours, but I see a lot of C code that I do not understand. Looks like I'll have to work with specifying the code ranges using the built-in re package instead.
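
Spelling out code ranges for the built-in re module could look roughly like the sketch below. The ranges are real Unicode blocks, but the selection is an illustrative assumption and far from complete; the names CJK_RANGES, CJK_CHAR and count_cjk are hypothetical and do not come from deltas or revscoring.

```python
import re

# Hypothetical, deliberately incomplete set of code-point ranges.
# A production tokenizer would need more blocks (extensions,
# compatibility forms, and so on).
CJK_RANGES = (
    "\u4e00-\u9fff"   # CJK Unified Ideographs
    "\u3040-\u309f"   # Hiragana
    "\u30a0-\u30ff"   # Katakana
    "\uac00-\ud7af"   # Hangul Syllables
)

# The third-party regex package supports script properties such as \p{Han};
# the built-in re module does not, so the ranges go into a character class.
CJK_CHAR = re.compile("[" + CJK_RANGES + "]")

def count_cjk(text):
    """Return the number of characters in text that fall inside CJK_RANGES."""
    return len(CJK_CHAR.findall(text))
```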

May 14 2020, 4:09 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

May 11 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Thanks for linking to the regex package. I spent some time working with it and it is quite amazing. I think we can make use of it.

May 11 2020, 4:15 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

May 7 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

I opened a pull request for improving the regex used by wikitext_split on the deltas package.

May 7 2020, 7:45 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

May 6 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

In the last week, I have worked extensively on improving my understanding of regex engines and how to write optimized regular expressions.

May 6 2020, 12:56 AM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Apr 28 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Thanks for this. I currently don't know a lot of Java, just Python and PHP. So I'll need as much help as I can get.

Apr 28 2020, 6:36 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search
HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Thanks for this @TJones . Yes, a lot of the time spent was due to the overhead of calling the API. I don't think combining 100 Alan Turings will help though, as that was just to profile performance. In the real use case, we will only be tokenizing a single article (or a single document, depending on the scenario). Hence, we can't combine at that point.

Apr 28 2020, 6:05 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Apr 24 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

I am currently reading up on different resources on writing efficient regex so I can figure out possible improvements to the regex in use.

Apr 24 2020, 2:54 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search
HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

I have worked to convert the regex from Python-compatible to Java-compatible, as seen here.

Apr 24 2020, 2:53 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Apr 6 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

Great. Overall, not as bad as initially thought.

Apr 6 2020, 4:41 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Apr 2 2020

HAKSOAT added a comment to T248480: Improve the performance and quality of tokenization in revscoring.

@Halfak Any update on this? The profiling script.

Apr 2 2020, 10:22 PM · Machine-Learning-Team (Active Tasks), Elasticsearch, revscoring, artificial-intelligence, Discovery-Search

Mar 5 2020

HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

I think it's valuable too. I hope the performance didn't drop either?

Mar 5 2020, 6:52 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

Feb 14 2020

HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

Hello @Halfak I think this task requires PRs on the revscoring and articlequality repos.

Feb 14 2020, 1:24 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

Feb 11 2020

HAKSOAT claimed T180822: Improve ORES articlequality feature extraction for images.
Feb 11 2020, 8:01 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

@Halfak I'd like to assign myself this task. Can I go ahead?

Feb 11 2020, 7:58 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling

Jan 3 2020

HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

@Halfak Please take a look at my Pull Request: https://github.com/wikimedia/revscoring/pull/466

Jan 3 2020, 9:14 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring
HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

Thanks for this

Jan 3 2020, 6:02 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring

Dec 27 2019

HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

Hello @Halfak I hope you are having a good time this festive season. So I'm about to parse the text here: https://en.wiktionary.org/wiki/Category:English_idioms I'd normally use the requests and beautifulsoup combo. But I believe there's a tool that does this already. I tried importing pywikibot, but it looks like it needs some initial user configurations. Is there any other method of doing this? Is there a means to use pywikibot for this purpose that I'm not aware of yet?
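
One option that needs neither HTML scraping nor pywikibot configuration is the MediaWiki Action API, whose list=categorymembers query returns the members of a category as JSON. The sketch below is only one possible approach, not the method eventually used for this task. The helper names and the User-Agent string are hypothetical, and the result can include subcategories as well as entry pages.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://en.wiktionary.org/w/api.php"

def build_params(category, cont=None):
    """Build query parameters for one page of category members."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": "500",
        "format": "json",
    }
    if cont:
        # Continuation token returned by the previous response.
        params["cmcontinue"] = cont
    return params

def parse_page(data):
    """Extract the titles and the continuation token (or None) from a response."""
    titles = [member["title"] for member in data["query"]["categorymembers"]]
    cont = data.get("continue", {}).get("cmcontinue")
    return titles, cont

def fetch_category_members(category):
    """Fetch all member titles of a category, following continuation tokens."""
    titles, cont = [], None
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(build_params(category, cont))
        # Placeholder User-Agent; replace with one that identifies your tool.
        request = urllib.request.Request(
            url, headers={"User-Agent": "idiom-fetch-sketch/0.1"}
        )
        with urllib.request.urlopen(request) as response:
            data = json.load(response)
        page_titles, cont = parse_page(data)
        titles.extend(page_titles)
        if cont is None:
            return titles
```

Calling fetch_category_members("Category:English_idioms") would then return the titles as a list of strings.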

Dec 27 2019, 4:46 PM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring

Dec 24 2019

HAKSOAT claimed T205545: Add English Language idioms to revscoring.
Dec 24 2019, 2:20 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring
HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

Hello @Halfak I'd like to claim this task.

Dec 24 2019, 2:19 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring

Dec 13 2019

HAKSOAT added a comment to T215273: Allow coordinators to see unavailable partners they are assigned to.

Thanks @AVasanth_WMF The tests now pass

Dec 13 2019, 8:37 AM · good first task, Library-Card-Platform

Dec 12 2019

HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

Oh. I think "test" is the wrong word here. I meant "run". So how do I run the code and see my changes in action?

Dec 12 2019, 11:53 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT added a comment to T203163: Fix List Applications overlapping statistics for waitlisted partners with short description panels.

Ubuntu 19.10

Dec 12 2019, 10:44 PM · good first task, Library-Card-Platform
HAKSOAT added a comment to T215273: Allow coordinators to see unavailable partners they are assigned to.

The build failed on Travis:

Dec 12 2019, 10:41 PM · good first task, Library-Card-Platform
HAKSOAT added a comment to T215273: Allow coordinators to see unavailable partners they are assigned to.

Hello @Samwalton9 I just made a PR:

Dec 12 2019, 10:37 PM · good first task, Library-Card-Platform

Dec 11 2019

HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

Great. It's all coming together in my head now. How do I get to test my code changes though, to ensure that they work? I'm about to assign myself this task and commence.

Dec 11 2019, 8:27 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

So, I'm considering adding a function to that module that fetches the idioms using mwparserfromhell and returns them probably as a list. What do you think of this approach?

Dec 11 2019, 10:26 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring
HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

I saw this https://en.wikipedia.org/wiki/Wikipedia:Extended_image_syntax I think it has all of the extended Wiki markups for images.

Dec 11 2019, 10:18 AM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT added a comment to T205545: Add English Language idioms to revscoring.

Thanks for the pointers @Halfak I have joined the channel on IRC. I'll look at the pointers and get back to you.

Dec 11 2019, 8:52 AM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring
HAKSOAT added a comment to T203163: Fix List Applications overlapping statistics for waitlisted partners with short description panels.

Starts here

Dec 11 2019, 7:23 AM · good first task, Library-Card-Platform
HAKSOAT added a comment to T203163: Fix List Applications overlapping statistics for waitlisted partners with short description panels.

Thanks @AVasanth_WMF for the pointer. I'm trying to run docker-compose build && docker-compose up on my computer, but it comes up with errors and a very long traceback.

Dec 11 2019, 7:22 AM · good first task, Library-Card-Platform

Dec 10 2019

HAKSOAT added a comment to T180822: Improve ORES articlequality feature extraction for images.

I can do some research on that. Thanks for the swift reply @Halfak I'll get back to you.

Dec 10 2019, 10:16 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT updated subscribers of T205545: Add English Language idioms to revscoring.

Hello @Harej Is it possible for me to get more guidance for this task?

Dec 10 2019, 10:04 PM · Machine-Learning-Team (Active Tasks), good first task, artificial-intelligence, articlequality-modeling, editquality-modeling, revscoring
HAKSOAT updated subscribers of T180822: Improve ORES articlequality feature extraction for images.

Hello @Harej This is something I'd like to work on.

Dec 10 2019, 9:45 PM · Machine-Learning-Team (Active Tasks), artificial-intelligence, articlequality-modeling
HAKSOAT claimed T215273: Allow coordinators to see unavailable partners they are assigned to.
Dec 10 2019, 9:26 PM · good first task, Library-Card-Platform
HAKSOAT added a comment to T203163: Fix List Applications overlapping statistics for waitlisted partners with short description panels.

Hello @Samwalton9 I'd like to work on this task. But I'm finding it a bit difficult to figure out which file causes this behavior. I'd appreciate some help.

Dec 10 2019, 9:17 PM · good first task, Library-Card-Platform

Dec 7 2019

HAKSOAT added a comment to T234133: Pywikibot-redirect ignoring -namespace.

@Dvorapa I realized that the argument for namespace doesn't even get passed to redirect. I made use of the example python pwb.py redirect double -namespace:6

Dec 7 2019, 10:19 AM · Pywikibot, Pywikibot-redirect.py
HAKSOAT added a comment to T234133: Pywikibot-redirect ignoring -namespace.
Dec 7 2019, 7:15 AM · Pywikibot, Pywikibot-redirect.py

Dec 4 2019

HAKSOAT added a comment to T236614: Page.title(as_filename=True) don't remove "\"" (quotes) forbidden character.

Thanks

Dec 4 2019, 11:38 AM · Patch-For-Review, Pywikibot
HAKSOAT added a comment to T234133: Pywikibot-redirect ignoring -namespace.

Thanks @Dvorapa I've spent some time first checking another script, then the redirect script. I see that the script has the ability to pass namespaces to RedirectGenerator. But the value is not being extracted from the arguments as expected. So I need to get it to extract the value and pass it to RedirectGenerator. Am I thinking in the right direction?

Dec 4 2019, 11:14 AM · Pywikibot, Pywikibot-redirect.py

Dec 3 2019

HAKSOAT added a comment to T234133: Pywikibot-redirect ignoring -namespace.

Hello @Dvorapa I am new here. Can I work on this? Though I do not fully understand the problem yet.

Dec 3 2019, 9:35 PM · Pywikibot, Pywikibot-redirect.py
HAKSOAT added a comment to T236614: Page.title(as_filename=True) don't remove "\"" (quotes) forbidden character.

I'd like to work on this.

Dec 3 2019, 7:10 PM · Patch-For-Review, Pywikibot