Below is a description of the project, written by Adam initially in task T272003
Here's what we discussed early this morning for a web-based dashboard, in terms of the inputs and defaults. This is a rough outline, so it's expected things may diverge somewhat for practical reasons or because of insights gained through further use and development of the dashboard.
FORM
Choose project family:
- Wikipedia
- Wiktionary
- Commons
...
- select all
- select none
Choose wikis:
- enwiki
- eswiki
- bnwiki
- ruwiki
...
- Select all for project family
- Select none
- Exclude modules that look like data
Sensible preset weights are applied to the following:
editors_count | x1 |
edits_count | x2 |
impacted_pageviews_count | x3 |
langlinks_count | x4 |
transclude_count | x5 |
similars_count | x6 |
closeness_to_50_lines_of_code | x7 |
edits_count_to_editors_count_ratio | x8 |
page_links_count | x9 |
🆗
The weights (x1..x9) should be editable, and a user can click 🆗 to regenerate results. Perhaps the simplest approach for weights is simple integers in the presets, so that users don't have to make things add up to exactly 1.00 or 100 (even if the application automatically scales these values before calculating). Most likely, different log scales will need to be applied to each ranking factor's underlying raw data, as the analysis so far indicates fairly wild ranges of values for each factor.
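The integer-weights-plus-log-scaling idea above could be sketched roughly as follows. The factor names match the form; the choice of log1p as the per-factor scale and the normalization by the weight sum are illustrative assumptions, not settled decisions.

```javascript
// Sketch: combine raw per-factor values into one score using simple
// integer weights and a log scale per factor (assumption: log1p).
function scoreModule(raw, weights) {
  let score = 0;
  for (const factor of Object.keys(weights)) {
    // log1p tames the wide value ranges seen in the raw data; missing
    // factors are treated as zero.
    score += weights[factor] * Math.log1p(raw[factor] || 0);
  }
  // Dividing by the weight sum means users never have to make the
  // integers add up to 1.00 or 100 themselves.
  const totalWeight = Object.values(weights).reduce((a, b) => a + b, 0);
  return score / totalWeight;
}
```

With this shape, the preset weights stay readable (e.g. `{edits_count: 2, editors_count: 1}`) and the 🆗 button just re-runs `scoreModule` over the current result set.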
Some means of being able to express just how close similars are in vector space in order to influence the scoring would likely be useful here as well. Maybe this is a standalone field for input that lets the user set permitted minimum similarity / distance?
Perhaps the closeness to 50 lines of code (a guess hazarded about the sorts of modules ripe for standardization) should be a separate filter as well.
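The two standalone filters suggested above (minimum similarity in vector space, and closeness to 50 lines of code) might look like this. The field names (`similarity`) and the default tolerance are assumptions for illustration.

```javascript
// Sketch: drop similars below a user-set minimum similarity before
// they influence scoring. Assumes each similar carries a `similarity`
// value in [0, 1], e.g. cosine similarity.
function filterSimilars(similars, minSimilarity) {
  return similars.filter(s => s.similarity >= minSimilarity);
}

// Sketch: a separate filter for "closeness to 50 lines of code".
// The tolerance of 20 lines is an arbitrary placeholder.
function nearFiftyLines(linesOfCode, tolerance = 20) {
  return Math.abs(linesOfCode - 50) <= tolerance;
}
```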
RESULTS
The initial result set from clicking 🆗 should list the 50 highest-scoring module entries, with a pager to fetch 50 more at a time (or maybe just all of them, if paging proves too complicated).
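If the pre-computed-JSON route is taken and ranking happens client side, the pager could be as simple as slicing the already-ranked array; a data-store-backed version would use LIMIT/OFFSET instead.

```javascript
// Sketch: page through a ranked result set 50 entries at a time.
function getPage(rankedResults, pageIndex, pageSize = 50) {
  const start = pageIndex * pageSize;
  return rankedResults.slice(start, start + pageSize);
}
```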
A "show detail" disclosure button for each entry should show the following:
- A direct link to the module
- A Wikidata link
- The source code of the module
- A button to enumerate similars. This should make it possible to access a direct link to each similar module and, ideally, show its source code (for side-by-side comparison).
It may be useful to let users order the result set by wiki, then by descending score within each wiki. It may also be worth considering how to show the top X for each wiki that's checked.
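The by-wiki-then-by-score ordering (and the top-X-per-wiki variant) could be sketched as a simple group-and-sort; the `wiki` and `score` field names are assumptions.

```javascript
// Sketch: group results by wiki, sort each group by descending score,
// and optionally keep only the top N entries per wiki.
function topPerWiki(results, topN = Infinity) {
  const byWiki = new Map();
  for (const r of results) {
    if (!byWiki.has(r.wiki)) byWiki.set(r.wiki, []);
    byWiki.get(r.wiki).push(r);
  }
  for (const [wiki, entries] of byWiki) {
    entries.sort((a, b) => b.score - a.score);
    byWiki.set(wiki, entries.slice(0, topN));
  }
  return byWiki;
}
```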
OTHER THINGS
Here were some other things discussed.
- Could simple git-diffing across the full set of modules aid in finding duplication? Might obfuscation / packing / compression help standardize the format, so similarity can be assessed via a diffing step or vector space location? A combination of such approaches may turn out to help.
- Pageviews for pages using a module for the previous 30 days is fine. Some technique to keep refreshing the data should be devised. If the pageviews are lagging for some entries as compared to others (because of the size of the data; the product of templates times pages is large), that's okay, as "ballpark" values are usually good enough for having a ranking factor that's useful.
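On the normalization idea in the git-diffing bullet above: a crude first pass might strip the cosmetic differences (comments, indentation, blank lines) from Lua module source before diffing or embedding. This regex-based sketch is an assumption about the approach; it ignores Lua block comments and comment-like text inside strings, so a real pipeline would want a proper parser.

```javascript
// Sketch: normalize Lua module source so cosmetic differences don't
// mask duplication. Crude: handles only `--` line comments, not
// `--[[ ]]` block comments or "--" inside string literals.
function normalizeLuaSource(source) {
  return source
    .split('\n')
    .map(line => line.replace(/--.*$/, '')) // drop line comments
    .map(line => line.trim())               // drop indentation
    .filter(line => line.length > 0)        // drop blank lines
    .join('\n');
}
```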
SECURITY AND PERFORMANCE
Be sure to safely encode things like page titles and source code (normally this is important to avoid XSS, but here it's principally about not breaking the UX). URL-encoding for titles in <a href>s will be important to ensure links work.
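Concretely, that means HTML-escaping anything rendered as text (titles, source code) and percent-encoding titles placed in hrefs. The URL shape below is hypothetical; the wiki-to-domain mapping isn't decided here.

```javascript
// Sketch: escape text for HTML display; covers the five characters
// that matter in text nodes and attribute values.
function escapeHtml(text) {
  return text.replace(/[&<>"']/g, c => ({
    '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;'
  }[c]));
}

// Sketch: percent-encode a page title for use in an <a href>.
// The domain argument and URL shape are placeholder assumptions.
function moduleHref(domain, title) {
  return `https://${domain}/wiki/${encodeURIComponent(title)}`;
}
```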
Different strategies may make it more or less possible to performantly query all the data. It may be easiest to have pre-computed JSON files for each wiki that contain all of the data, so that calculations can be done fairly quickly, mostly client side (if you want it to be mostly JavaScript). It may be just as easy to make this queryable via the data store (just beware of any SQL injection risks).
If using a queryable data store, consider using GET query string parameters for the queries, with a canonical URL parameter ordering. You may want some sort of caching TTL set for these URLs (e.g., fifteen minutes). For a user's first visit to the report, it would be especially nice to have results shown quickly (i.e., the result set of a query with all the defaults); caching on the tool's base URL could serve that default view.
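The canonical parameter ordering matters because two URLs with the same parameters in different orders would otherwise produce distinct cache keys. A minimal sketch, sorting parameter names alphabetically:

```javascript
// Sketch: build a canonical query string by sorting parameter names,
// so identical queries always share one URL and hence one cache entry.
function canonicalQueryString(params) {
  return Object.keys(params)
    .sort()
    .map(k => `${encodeURIComponent(k)}=${encodeURIComponent(params[k])}`)
    .join('&');
}
```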