User Details
- User Since
- Apr 10 2021, 8:11 AM (244 w, 3 d)
- LDAP User
- Aitolkyn
Dec 30 2024
Note: My contract is finishing tomorrow. Thank you to everyone involved in this project! Below, I will report updates for the past week.
Dec 20 2024
- SHAP explanations for peacock behavior can be found at the end of this notebook. Additionally, I added bar plots of the top 20 words in the explanation for each class: mostly adjectives for the "positive" class and mainly proper nouns for the "negative" class.
Dec 13 2024
Updates:
- Exploring the explainability of small language models with SHAP values
- Writing a final report on data preparation and baseline experiments
- Added our reference need work as an additional use case. Reran full-scale evaluation experiments on reference need data in the top 10 languages by active user count.
- Examples of SHAP values for peacock behavior, in which words like "greatest" or "stunning beauty" appear to affect the model's positive label the most.
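The idea behind these word-level attributions can be illustrated with exact Shapley values on a toy additive scorer. This is a stdlib-only sketch; the words and weights are hypothetical and not taken from the actual model, which was explained with the shap library.

```python
from itertools import combinations
from math import factorial

# Toy "model": scores a set of present words; "greatest" and "stunning"
# push toward the positive (peacock) label. Weights are made up.
WEIGHTS = {"greatest": 2.0, "stunning": 1.5, "the": 0.1}

def model(words):
    return sum(WEIGHTS.get(w, 0.0) for w in words)

def shapley_values(words):
    """Exact Shapley values: each word's marginal contribution to the score,
    averaged over all subsets of the remaining words."""
    n = len(words)
    values = {}
    for w in words:
        rest = [x for x in words if x != w]
        total = 0.0
        for k in range(n):
            for subset in combinations(rest, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                total += weight * (model(subset + (w,)) - model(subset))
        values[w] = total
    return values

vals = shapley_values(["the", "greatest", "stunning"])
```

For a purely additive model the Shapley value of each word equals its weight, and the values sum to the model output on the full input (the efficiency property), which is why bar plots of top SHAP words are a faithful summary for such models.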
Dec 6 2024
Updates:
- Additionally fine-tuned XLM-RoBERTa with a max length of 512 for NPOV and Peacock. XLM-R performs slightly better than mBERT with the same max length.
- Updated mBERT trained on enwiki - Peacock
- The updates are reflected in the table above and in the notebooks at: for npov, for peacock
Nov 29 2024
Updates
- Baselines for tasks 2 and 3 were updated, and evaluation on the full data was reported
- All plots & results are presented here: for npov, for peacock
Nov 22 2024
Updates:
- Trained multiple models on Task 2 and Task 3
- I used multilingual BERT and XLM-R Longformer in my experiments
- I ran testing on the full evaluation dataset and reported the results in detail in the slides here
- We also tested the performance by topic in enwiki
- Findings
- Passing the page title along with the content improves accuracy
- Increasing the context size to 4K tokens with XLM-R Longformer does not improve performance significantly
- mBERT trained on all wikis performs better on other languages than mBERT trained only on English samples
- NPOV detection shows lower accuracy
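The by-topic evaluation in enwiki boils down to grouping predictions by topic and computing per-group accuracy. A minimal sketch (the record fields and values are illustrative, not the actual evaluation code):

```python
from collections import defaultdict

def accuracy_by_topic(records):
    """records: iterable of (topic, y_true, y_pred) tuples.
    Returns a dict mapping topic -> accuracy within that topic."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for topic, y_true, y_pred in records:
        totals[topic] += 1
        hits[topic] += int(y_true == y_pred)
    return {t: hits[t] / totals[t] for t in totals}

preds = [
    ("biography", 1, 1), ("biography", 0, 1),
    ("geography", 1, 1), ("geography", 0, 0),
]
acc = accuracy_by_topic(preds)
```

Slicing accuracy this way makes topic-dependent weaknesses visible that a single aggregate score hides.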
Nov 15 2024
Data can be found on the cluster at:
- Eval data:
Task 2 at aitolkyn/ai_use_cases/npov/data_final/eval_npov_data.parquet
Task 3 at aitolkyn/ai_use_cases/peacock/data_final/eval_peacock_data.parquet
Nov 8 2024
Related Code:
- Task 3 - Peacock Behavior Detection at https://gitlab.wikimedia.org/repos/research/llm_evaluation/-/tree/ait/eval-datasets/notebooks/peacock
Nov 1 2024
Related Code:
- Task 1 - Article Categorization at https://gitlab.wikimedia.org/repos/research/llm_evaluation/-/tree/ait/eval-datasets/notebooks/categories?ref_type=heads
- Task 2 - NPOV Detection at https://gitlab.wikimedia.org/repos/research/llm_evaluation/-/tree/ait/eval-datasets/notebooks/%5Bnew%5Dnpov?ref_type=heads
Updates:
- Got templates that link to the NPOV policy using the langlinks API and page redirects
- hewiki and plwiki do not have a dedicated page for POV template
- Collected all historical revisions that contain the above-mentioned templates across 23 languages, and additionally supplemented them with metadata
- Extracted positive/negative pairs from each page following the previous approach
- 5 languages (hewiki, hiwiki, idwiki, rowiki, elwiki) have fewer than 1K pairs --> they will be discarded from the final dataset (+plwiki, which doesn't have a dedicated POV template page)
- Stratification by topic for sampling will be applied only to enwiki due to the sparse distribution by topic for non-English languages
- Checked stats, distribution plots available at the bottom of this notebook
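The enwiki-only stratification described above can be sketched as a per-topic sampler. This is an illustration with made-up topics and a hypothetical per-topic quota, not the actual pipeline code:

```python
import random

def stratified_sample(pairs, per_topic, seed=0):
    """pairs: iterable of (topic, pair_id). Draw up to `per_topic` pairs
    per topic; topics with fewer pairs keep everything they have."""
    rng = random.Random(seed)
    by_topic = {}
    for topic, pid in pairs:
        by_topic.setdefault(topic, []).append(pid)
    sample = {}
    for topic, ids in by_topic.items():
        rng.shuffle(ids)          # random order within each stratum
        sample[topic] = ids[:per_topic]
    return sample

data = [("history", i) for i in range(5)] + [("science", i) for i in range(2)]
s = stratified_sample(data, per_topic=3)
```

For non-English wikis, where the topic distribution is too sparse for strata of useful size, plain random sampling is the fallback, as noted above.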
Oct 25 2024
Updates:
- I retrieved 10 similar pages for each seed article sampled previously. The seed article dataset contains additional features to link similar pages, namely sim_page_ids and sim_page_titles; the similar pages, along with their metadata, are available in a separate file.
- Find current versions of seed articles at ai_use_cases/categories/sample_articles/seed_articles_w_similar10_v1 and similar10 articles at ai_use_cases/categories/sample_articles/similar10_metadata_v1
- After Mykola’s initial analysis, I updated the above two datasets following his suggestions. I additionally collected revision_text for the main section, converting it to plaintext using mwedittypes (https://github.com/geohci/edit-types/blob/main/mwedittypes/utils.py#L77C5-L77C26)
- I started looking into the previous code and publications for Task 2, NPOV detection
- The pipeline for extracting negative and positive samples has been discussed, and it was decided to collect samples over the full history of all articles.
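The actual wikitext-to-plaintext conversion used the mwedittypes utility linked above; purely to show the kind of cleanup involved, here is a simplified regex-based stand-in that handles only a few common constructs:

```python
import re

def rough_plaintext(wikitext):
    """Very rough wikitext -> plaintext cleanup. NOT the mwedittypes
    implementation; a toy sketch covering templates, links, refs, quotes."""
    text = re.sub(r"\{\{[^{}]*\}\}", "", wikitext)                 # simple templates
    text = re.sub(r"\[\[(?:[^|\]]*\|)?([^\]]+)\]\]", r"\1", text)  # keep link labels
    text = re.sub(r"<ref[^>/]*/>", "", text)                       # self-closing refs
    text = re.sub(r"<ref[^>]*>.*?</ref>", "", text, flags=re.S)    # full references
    text = re.sub(r"'{2,}", "", text)                              # bold/italic marks
    return re.sub(r"\s+", " ", text).strip()

sample = "'''Peafowl''' are [[bird]]s{{cn}} noted for beauty.<ref>src</ref>"
clean = rough_plaintext(sample)  # -> "Peafowl are birds noted for beauty."
```

Real wikitext has nested templates, tables, and parser functions that regexes cannot handle reliably, which is why a dedicated parser-based utility is the right tool in the pipeline.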
Oct 18 2024
Hello all, my contract for this project ended on September 30, and I would like to summarize our work here.
Status Update:
- I collected topics, categories, and section headings for all articles that are assigned a topical category (with a score > 0.5) in the 23 languages of the AYA23 model family.
- I checked the distribution by topic in each language, available here
- Note: one article could be counted for multiple topic categories
- Based on the decided thresholds from the above distributions, I sampled seed articles in each language
- decided to sample 50 articles per topical category before 2024 and max(25, number of created articles) per topic in 2024, to oversample the most recent data.
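The seed-sampling quota above can be sketched as follows. The grouping key and the reading of the 2024 rule (take every article created in 2024, which oversamples recent pages) are assumptions for illustration, not the actual sampling code:

```python
import random

def sample_seed_articles(articles, seed=0):
    """articles: iterable of (page_id, topic, year) tuples.
    Per topic: sample 50 pre-2024 articles; keep all 2024 articles
    (one reading of the 'max(25, number created)' rule above)."""
    rng = random.Random(seed)
    groups = {}
    for page_id, topic, year in articles:
        groups.setdefault((topic, year >= 2024), []).append(page_id)
    sample = []
    for (topic, recent), ids in groups.items():
        rng.shuffle(ids)
        quota = len(ids) if recent else 50
        sample.extend((topic, pid) for pid in ids[:quota])
    return sample

articles = ([(i, "history", 2023) for i in range(60)]
            + [(1000 + i, "history", 2024) for i in range(10)])
seeds = sample_seed_articles(articles)
```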
Oct 4 2024
I re-ran our latest reference-need model on a test set of 15K sentences. Our currently deployed model uses distilbert-base-multilingual-cased with torch dynamic quantization (column 2 - torch - in the plots below).
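The deployed model uses torch dynamic quantization; the gist of what that does to weights can be shown with a stdlib simulation of affine int8 quantization. This is a conceptual sketch, not the torch API (which is `torch.quantization.quantize_dynamic`):

```python
def quantize_int8(weights):
    """Affine int8 quantization round-trip for a list of float weights:
    map floats to 0..255 integers plus a scale/offset, then reconstruct.
    Storage drops to ~1/4 of float32 at a bounded precision cost."""
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / 255 or 1.0          # guard against a constant tensor
    q = [round((w - lo) / scale) for w in weights]   # ints in 0..255
    dequant = [lo + v * scale for v in q]            # reconstructed floats
    return q, dequant, scale

w = [-0.51, 0.0, 0.27, 1.02]
q, dq, scale = quantize_int8(w)
```

The reconstruction error per weight is at most half the quantization step, which is why accuracy typically degrades only slightly while inference on int8 kernels gets faster.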
Sep 27 2024
Report on ref. need model latency: https://docs.google.com/document/d/1EJbSJ7fekZvor8F-FiPVl7EGlTkckKtSIGOMXb1K2FM/edit?usp=sharing
Sep 20 2024
The ref. need model was deployed to production this week. Given the latency issue, we ran experiments to further improve the model latency.
I shared two updated binaries with the ML team: 1) a smaller model and 2) a smaller model + quantization. According to preliminary analysis on the stat machines, this should cut processing time roughly in half.
Sep 4 2024
Hello @isarantopoulos! We downgraded to match the version in the knowledge-integrity repo.
Aug 28 2024
Hi Aiko! The location on the stat1010 is /home/aitolkyn/temp/reference-quality/pretrained_models/multilingual_reference_need_128_v0.pkl
sha512: 0af0ecd12e05e7c40a0d39dd155589917130d1fa00711c3675c48d4373edca402bdc25cb85a56925deb24ebcf3c0ac01843179c86321f0991772b8963c27ed24 *multilingual_reference_need_128_v0.pkl
Aug 27 2024
Aug 23 2024
Updated the ref. need model input to
lang_code, section_name, sentence, next_sentence, prev_sentence
and reduced the input context size to 128 given the latency constraints.
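Building the model input from those five fields amounts to joining them in a fixed order and truncating to the reduced context. A sketch, where the field order, the separator token, and whitespace tokenization (as a stand-in for the real subword tokenizer) are all assumptions:

```python
def build_ref_need_input(lang_code, section_name, sentence,
                         next_sentence, prev_sentence,
                         sep=" [SEP] ", max_tokens=128):
    """Concatenate the five ref. need input fields and truncate to the
    128-token context window (whitespace tokens approximate subwords)."""
    joined = sep.join([lang_code, section_name, prev_sentence,
                       sentence, next_sentence])
    return " ".join(joined.split()[:max_tokens])

x = build_ref_need_input("en", "History", "It was founded in 1901.",
                         "It later expanded.", "The city grew fast.")
```

Keeping the surrounding sentences in the input preserves some local context even after the window shrink from 512-style lengths to 128.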
Jul 26 2024
For the second question: it is per article, so these tentative predictions were computed as the time to process an article revision.
Jul 19 2024
The ref. risk logic to compute the score follows the definition given in our WWW paper, where we account for blacklisted/deprecated sources in the perennial sources list as risky sources; this can be checked in the repo: https://gitlab.wikimedia.org/repos/research/reference-quality/-/tree/classifiers
For ref. risk, we further annotated an additional 1000 sources to use as ground truth.
Jul 12 2024
Prepared the code here: https://gitlab.wikimedia.org/repos/research/reference-quality/-/tree/classifiers
Jul 1 2024
- ref. need - model works well when tested on smaller languages (kkwiki balanced ~0.7)
- ref. need - prepared scripts to pass for production
- ref. risk - run classifier on all domains in enwiki --> save the score and pred. probability per domain
- ref. risk - article-level analysis of ref. risk score
Jun 21 2024
- ref.need - with transfer learning, results on the full enwiki dataset (F1-score 0.76, ROC-AUC 0.77, PR-AUC 0.83)
- ref. risk - results with the ground-truth on enwiki (Accuracy: 0.86, F1-score: 0.74)
The majority (58%) of misclassifications are caused by the middle 'Generally unreliable' category
Next:
- test ref. need model on smaller language editions
Jun 14 2024
- ref. need - the best-performing model so far reaches 0.76; we tested multilingual BERT and XLM-RoBERTa. The random sample has, on average, a more than twice higher ref. need (RN) score than the featured articles sample; refer to the plot below
- ref. need - compiled labeling dataset for 5 languages (en, es, de, fr, ru) and labeled ruwiki
- ref. risk - prepared three training datasets using perennial sources (PS) as ground-truth label (before PS list, after PS list, and all)
- ref. risk - binary classifier on balanced dataset, notebook with results here
Jun 2 2024
27/05-02/06:
- ref.need - ran inference with our best-performing model so far and computed the ref. need score for two samples of articles (featured and random); random articles get higher scores, meaning they are missing more references.
ref. risk - modelling approach set as below
- Binary classifier to detect risky sources
- Domain-level
- English Wikipedia
- Ground-truth labels
- Perennial source list (positive class: deprecated and blacklisted sources)
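Deriving the ground-truth labels from the perennial source list is a direct status-to-binary mapping, per the positive-class definition above. A sketch with made-up example statuses:

```python
# Positive class per the plan above: deprecated and blacklisted sources.
RISKY_STATUSES = {"deprecated", "blacklisted"}

def risky_label(status):
    """Map a perennial-source-list status to the binary 'risky' label:
    1 for deprecated/blacklisted, 0 for everything else."""
    return int(status.lower() in RISKY_STATUSES)

labels = [risky_label(s) for s in
          ("Deprecated", "Blacklisted", "Generally reliable", "No consensus")]
```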
May 24 2024
20/05-26/05:
- ref. need - experiments with additional context (such as prev/next sections, sentences, or paragraphs)
- ref. need - expanded the experiments to multilingual scenarios, training and testing on different languages with an f1-score of 0.70-0.72
- ref. risk - more experiments of the signals of reliability (on data by domain, URL, and page levels)
May 17 2024
12/05 - 19/05:
- ref.need - testing the best-performing classifier so far on the remaining languages (i.e., the 5 languages featured articles were collected from, as reported in earlier weeks); the pretrained model we're using is distilbert-base-multilingual-cased
- ref.risk - changed the target feature from the absolute number of edits survived to the survival ratio, where 1 means a reference survived all the edits after its addition and 0 means the reference didn't survive any subsequent edit. Analyzed the signals by comparing references at the article level
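The survival ratio target can be written down directly. One edge case is a reference with no edits after its addition; treating it as fully surviving is an assumption here, not something stated above:

```python
def survival_ratio(edits_survived, edits_after_addition):
    """Fraction of subsequent edits a reference survived: 1.0 means it
    survived every edit after being added, 0.0 means it was removed
    immediately. A reference with no later edits counts as surviving
    (assumed convention)."""
    if edits_after_addition == 0:
        return 1.0
    return edits_survived / edits_after_addition
```

Unlike the absolute count, the ratio is comparable across articles with very different edit volumes.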
May 10 2024
06/05 - 12/05:
- Ref. need - model improves with more context, f1-score achieved is ~0.74
- Ref. risk - experiment with assigning possible reliability labels based on the observed patterns from perennial sources with known labels (last plot here)
- Ref. risk - compared featured articles and the remaining articles in the current snapshot using our assigned labels
May 3 2024
29/04 - 05/05:
- Ref. need - add topic as a feature
- Ref. need - ran experiments with additional numerical and textual features, so far the highest performance achieved in terms of f1-score is 0.733
Apr 26 2024
22/04 - 28/04:
- Ref.need - Tested with Citation Needed data; our approach gets better results on their data
- Ref.need - Added additional inputs to the model, such as the section index, paragraph index, sentence index
- Ref.risk - Limit our data to instances of sources that are labeled in the perennial sources list, clean and preprocess the data for classification
Apr 19 2024
15/04 - 21/04:
- Ref.need - experiments with different models (refer to slide#10 for results)
- Ref.need - feature exploration; refer to slide#8 and the plots below for Logistic Regression feature importance for positive/negative labels
- Ref.risk - analysis of URL permanence vs. reliability label, example below:
Apr 16 2024
Apr 15 2024
@ssingh Thank you for checking! I get the following error when trying to access my tables:
08/04 - 14/04:
- ref.need - experiments with BERT continued with additional data cleaning; facing overfitting issues, experimenting on a sample of data following the prev. work's approach
- ref.risk - extending the data with URL permanence (lifespan of a URL on a page and number of edits)
- ref.risk - analysis of ground-truth labels from the enwiki perennial sources list
Apr 5 2024
01/04 - 07/04:
- ref.need using the dataset we prepared, started experimenting with BERT for sentence classification into cited/uncited in enwiki
Mar 30 2024
25/03 - 31/03:
- ref.need - running baseline models for classification using 1) tf-idf and 2) sentence metadata, each with logistic regression
- planning to run with LLMs next; studying Hugging Face NLP tutorials
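The tf-idf feature step of that baseline can be sketched in pure stdlib (the actual runs presumably used a library vectorizer; the documents below are toy examples):

```python
from collections import Counter
from math import log

def tfidf(docs):
    """Per-document tf-idf weights: term frequency times log inverse
    document frequency. Terms appearing in every document get weight 0."""
    n = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter()                      # document frequency per term
    for tokens in tokenized:
        df.update(set(tokens))
    return [{t: (c / len(tokens)) * log(n / df[t])
             for t, c in Counter(tokens).items()}
            for tokens in tokenized]

weights = tfidf(["citation needed here", "no citation given"])
```

These sparse weights then feed a logistic regression classifier, giving a cheap, interpretable baseline before moving to transformer models.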
Mar 22 2024
18/03-22/03:
ref.need
- found and fixed an issue with tokenized sentences -> updated the dataset
- Data: ~16M featured articles and ~4.4M extracted sentences (47% accompanied by a citation)
- brainstorming and planning for the classifier model
Mar 15 2024
11/03-15/03:
ref.need
- extend dataset with sentences from FAs of dewiki, frwiki, ruwiki, ptwiki, eswiki
- the proportion of cited sentences ranges from 40-49% for enwiki, ruwiki, ptwiki, and frwiki; it is lower for eswiki (~35%) and lowest for dewiki (~20%)
- notebook available here: https://gitlab.wikimedia.org/repos/research/reference-quality/-/blob/research-notebooks/RN/extract-FAs.ipynb?ref_type=heads
Mar 8 2024
04/03 - 08/03:
- ref.need - add featured articles from dewiki, frwiki, ruwiki, ptwiki, eswiki (choice is based on the number of currently existing FAs)
- ref.need - [for enwiki] prepare dataset for the model with the following columns:
page_id, revision_id, section_name, sentence, paragraph, citation_label
where citation_label = 0, if sentence does not include a reference,
citation_label = 1, if sentence includes a reference
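A minimal labeling rule for citation_label could look like this. The ref-tag heuristic is a simplification for illustration; the actual pipeline worked from parsed wikitext:

```python
import re

# A sentence counts as cited if it carries a <ref>...</ref> tag or a
# self-closing named reference (simplified heuristic).
REF_PATTERN = re.compile(r"<ref[^>]*/>|<ref[^>]*>.*?</ref>", re.S)

def citation_label(sentence):
    """1 if the sentence includes a reference, else 0 (per the schema above)."""
    return int(bool(REF_PATTERN.search(sentence)))

labels = [citation_label(s) for s in (
    "The city was founded in 1901.<ref>source</ref>",
    'It grew quickly.<ref name="a"/>',
    "It is a city.",
)]
```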
Mar 1 2024
26/02 - 01/03:
- onboarding
- ref. need - prepare featured articles data
- ref. risk - literature review (e.g., reverted revs examples, controversiality score) for our dataset specification
Aug 5 2023
The paper has been accepted to CIKM (short paper track).
Mar 10 2023
- We uploaded our work to arxiv: https://arxiv.org/abs/2303.05227
Jun 25 2022
20/06 ~ 24/06:
- extract user lifespan & analyze the lifespan of users vs. collaboration with experts
- manually check pages containing sources from external fake websites lists
- get dominant sources on wiki from the external lists
- finish collecting citation quality scores for the top dataset
Jun 17 2022
13/06 ~ 17/06:
- classify users into exposed and non-exposed in the new random & top datasets
- PSM (propensity score matching) on collaboration between experts and non-experts on the new random and top datasets
- topic coverage of unreliable source lists (including perennials)
- finish collecting citation quality scores for the random dataset
Jun 10 2022
- add one more external source list (3. Snopes)
- visualize the intersection of the 3 lists and coverage of #3
- started collecting citation quality scores for the top2021 dataset
Jun 4 2022
- search for external lists of unreliable sources (e.g., Melissa Zimdars' fake news websites list)
- compute the coverage of wiki by external lists (1. Zimdars and 2. Daily Dot) and compare with the perennial source list
- setting up the environment for the citation quality scores collection
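The coverage computation above reduces to a set intersection over cited domains. A sketch with toy domains (not real data from the lists):

```python
def coverage(wiki_domains, external_list):
    """Share of unique domains cited on wiki that also appear in an
    external unreliable-source list."""
    cited = set(wiki_domains)
    return len(cited & set(external_list)) / len(cited)

wiki = ["example.com", "fakenews.net", "journal.org", "fakenews.net"]
zimdars_like = {"fakenews.net", "hoax.info"}
cov = coverage(wiki, zimdars_like)  # 1 of 3 unique domains
```

Comparing this number across lists (and against the perennial source list) shows how much each external list actually overlaps with what Wikipedia cites.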
May 27 2022
23/05 ~ 27/05
- transferred collection of reference need scores to the server (API calls took too long)
- analysis of active user contributors who add perennial sources
- completed the analysis of the evolution of references in the perennial source list (the recent trend is more positive compared to earlier data)
May 20 2022
16/05 ~ 20/05
May 15 2022
It looks very useful, thank you very much!! I'll check this out
May 13 2022
09/05 ~ 13/05
- collect reference risk scores for the 2 datasets: random and top2021
- start collecting all the missing reference need scores
- pageviews for 'bad' sources before and after they are classified as 'bad'
- analyze the data collected so far (significance tests, distributions, plots)
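The significance tests mentioned can be as simple as a two-sample permutation test on a group difference; here is a stdlib sketch with toy numbers (the actual analysis and statistics used are not specified above):

```python
import random

def permutation_test(a, b, n_iter=2000, seed=0):
    """Two-sample permutation test on the absolute difference of means:
    the p-value is the share of random label shufflings with a difference
    at least as extreme as the observed one."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_iter

p = permutation_test([5.1, 5.4, 5.2, 5.3], [3.0, 3.2, 2.9, 3.1])
```

Permutation tests need no distributional assumptions, which suits skewed quantities like pageviews or quality scores.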
May 6 2022
02/05 ~ 06/05:
- re-check the pageviews data for pages in multiple namespaces (namespace_id was added to pageviews_hourly in 2017)
- analysis of the reference quality of the most viewed pages' revisions
- get pages and collect revision data for two datasets: random and top-viewed
Apr 29 2022
25/04 ~ 29/04
- extract monthly top-viewed pages and get the pages' revisions at that time
- collect reference quality scores for the revisions of top-viewed pages
- aggregate pageviews for the references from perennial source list
- prepare presentation
Apr 22 2022
18/04 ~ 22/04:
- extract and analyze data from the pageviews table
- check the extracted results with the PageviewsAPI
- check PageviewsAPI
Apr 15 2022
11/04 ~ 15/04:
- perennial source list references lifespan and pageviews
- continued exploring PySpark