Aitolkyn (Aitolkyn)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 10 2021, 8:11 AM (158 w, 3 d)
Availability
Available
LDAP User
Aitolkyn
MediaWiki User
Unknown

Recent Activity

Fri, Apr 19

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

15/04 - 21/04:

  • ref.need: experiments with different models (see slide #10 for results)
  • ref.need: feature exploration, e.g., num_words, section_name (see slide #8); the plots below show Logistic Regression feature importance for positive/negative labels

Screenshot 2024-04-19 at 10.32.30 PM.png (944×938 px, 65 KB)

  • ref.risk: analysis of how URL permanence relates to the reliability label; example below:

Screenshot 2024-04-19 at 10.45.26 PM.png (888×1 px, 175 KB)

Fri, Apr 19, 5:47 PM · Research (FY2023-24-Research-April-June)

Tue, Apr 16

Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

@ssingh Thank you for checking! I get the following error when trying to access my tables:

mysql error: SELECT command denied to user 'research'@'10.67.30.187' for table `aitolkyn`.`domain_reverted_added_2019_2023`

Maybe I need to set something up before I can access those?

Hi @Aitolkyn: Can you share the query you are trying to run? While knowing nothing about this, shouldn't the user be aitolkyn here and the table name research? I may be wrong about this! It says table aitolkyn.

The table name is "aitolkyn.domain_reverted_added_2019_2023" and my user name is "aitolkyn". I am running

SELECT *
FROM   aitolkyn.domain_reverted_added_2019_2023

The error message suggests you're trying to run that query in the analytics MariaDB replicas (using analytics-mysql or directly via mysql). This database only exists in Hive:

[urbanecm@stat1005 ~]$ hive --database=aitolkyn
[...]
hive (aitolkyn)> show tables;
OK
[...]
domain_reverted_added_2019_2023
[...]

Can you try querying the data using, e.g., Hive (see https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Systems/Hive)? Or, when in Superset, using the presto_analytics_hive database?
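
As a rough illustration of the suggestion above, the sketch below builds a Hive CLI invocation for the table instead of going through the MariaDB replicas. The `--database` option is the one shown in the session above, and `-e` is the Hive CLI option for running a single query string. The helper name `hive_command` and the `LIMIT 10` clause are assumptions added for illustration; the command is only built and printed, not executed.

```python
import shlex
from typing import List


def hive_command(database: str, table: str, limit: int = 10) -> List[str]:
    """Build (but do not run) a Hive CLI command that samples a table."""
    # LIMIT is an illustrative safeguard, not part of the original query.
    query = f"SELECT * FROM {table} LIMIT {limit}"
    # --database selects the Hive database; -e runs the query string and exits.
    return ["hive", f"--database={database}", "-e", query]


cmd = hive_command("aitolkyn", "domain_reverted_added_2019_2023")
# Print a shell-safe rendering that could be pasted on a stat host.
print(shlex.join(cmd))
```

In Superset the same `SELECT` would instead be run with presto_analytics_hive selected as the database, as suggested above.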

Tue, Apr 16, 11:19 AM · Patch-For-Review, SRE, LDAP-Access-Requests

Mon, Apr 15

Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

@ssingh Thank you for checking! I get the following error when trying to access my tables:

mysql error: SELECT command denied to user 'research'@'10.67.30.187' for table `aitolkyn`.`domain_reverted_added_2019_2023`

Maybe I need to set something up before I can access those?

Hi @Aitolkyn: Can you share the query you are trying to run? While knowing nothing about this, shouldn't the user be aitolkyn here and the table name research? I may be wrong about this! It says table aitolkyn.

Mon, Apr 15, 3:46 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

@ssingh Thank you for checking! I get the following error when trying to access my tables:

Mon, Apr 15, 2:26 PM · Patch-For-Review, SRE, LDAP-Access-Requests
Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

08/04 - 14/04:

  • ref.need: continued the BERT experiments with additional data cleaning; facing overfitting issues and experimenting on a sample of the data, following the approach of previous work
  • ref.risk: extending the data with URL permanence, measured as the lifespan of a URL on a page and the number of edits
  • ref.risk analysis of ground-truth labels from enwiki perennial sources list
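
The URL permanence measure mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions: lifespan is taken as the time between the first and last revision in which a URL appears (gaps are ignored), and "number of edits" is read as the number of revisions containing the URL. The function name, the input shape, and the example URLs are hypothetical, not the project's actual pipeline.

```python
from datetime import datetime
from typing import Dict, Iterable, Set, Tuple

# (revision timestamp, URLs cited in that revision) -- assumed input shape
Revision = Tuple[datetime, Set[str]]


def url_permanence(revisions: Iterable[Revision]) -> Dict[str, Dict[str, float]]:
    """Per URL: lifespan in days and number of revisions that contain it."""
    first_seen: Dict[str, datetime] = {}
    last_seen: Dict[str, datetime] = {}
    n_revs: Dict[str, int] = {}
    # Process revisions in chronological order.
    for ts, urls in sorted(revisions, key=lambda r: r[0]):
        for url in urls:
            first_seen.setdefault(url, ts)  # keep the earliest timestamp
            last_seen[url] = ts             # overwrite with the latest
            n_revs[url] = n_revs.get(url, 0) + 1
    return {
        url: {
            "lifespan_days": (last_seen[url] - first_seen[url]).total_seconds() / 86400,
            "n_revisions": n_revs[url],
        }
        for url in first_seen
    }


history = [
    (datetime(2023, 1, 1), {"https://a.example/x"}),
    (datetime(2023, 1, 11), {"https://a.example/x", "https://b.example/y"}),
    (datetime(2023, 1, 31), {"https://b.example/y"}),
]
stats = url_permanence(history)
print(stats["https://a.example/x"])
```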
Mon, Apr 15, 12:36 PM · Research (FY2023-24-Research-April-June)

Fri, Apr 5

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

01/04 - 07/04:

  • ref.need: using the dataset we prepared, started experimenting with BERT to classify enwiki sentences as cited/uncited
Fri, Apr 5, 1:42 PM · Research (FY2023-24-Research-April-June)

Sat, Mar 30

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

25/03 - 31/03:

  • ref.need: running baseline classification models using 1) tf-idf and 2) sentence metadata as features, with logistic regression as the classifier
  • planning to try LLMs next; studying the Hugging Face NLP tutorials
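
For readers unfamiliar with the tf-idf baseline mentioned above, here is a minimal pure-Python sketch of the idea: tf-idf features fed to a logistic regression trained with stochastic gradient descent. The comment does not say which implementation was used, so everything here (function names, smoothing, learning rate, the four toy sentences and their labels) is an assumption for illustration; sentence metadata features are left out.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tfidf_vectors(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Sparse tf-idf vectors with a smoothed idf (one common variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc))
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}
    return [
        {tok: (cnt / len(doc)) * idf[tok] for tok, cnt in Counter(doc).items()}
        for doc in docs
    ]


def score(w: Dict[str, float], b: float, x: Dict[str, float]) -> float:
    """Linear score b + w.x for a sparse feature vector."""
    return b + sum(w.get(t, 0.0) * v for t, v in x.items())


def train_logreg(X, y, epochs: int = 200, lr: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Plain SGD on the logistic loss; no regularization."""
    w: Dict[str, float] = {}
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-score(w, b, x)))
            g = p - label  # gradient of the loss w.r.t. the score
            b -= lr * g
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * g * v
    return w, b


# Toy data: 1 = sentence carries a citation, 0 = it does not (made-up labels).
sentences = [
    "the study found that rates increased by ten percent",
    "according to the report the population doubled",
    "the city is located in the north",
    "he was born in a small village",
]
labels = [1, 1, 0, 0]

X = tfidf_vectors([s.split() for s in sentences])
w, b = train_logreg(X, labels)
preds = [1 if score(w, b, x) >= 0 else 0 for x in X]
print(preds)
```

The final line only checks that the model fits its own training sentences; it says nothing about held-out performance.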
Sat, Mar 30, 9:43 AM · Research (FY2023-24-Research-April-June)

Mar 22 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

18/03-22/03:
ref.need

  • found and fixed an issue with tokenized sentences -> updated the dataset
  • Data: ~16M featured articles and ~4.4M extracted sentences (47% accompanied by a citation)
  • brainstorming and planning for the classifier model
Mar 22 2024, 2:29 PM · Research (FY2023-24-Research-April-June)

Mar 15 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

11/03-15/03:
ref.need

Mar 15 2024, 8:31 AM · Research (FY2023-24-Research-April-June)

Mar 8 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

04/03 - 08/03:

  • ref.need - add featured articles from dewiki, frwiki, ruwiki, ptwiki, eswiki (choice is based on the number of currently existing FAs)
  • ref.need - [for enwiki] prepare dataset for the model with the following columns:
page_id, revision_id, section_name, sentence, paragraph, citation_label

where citation_label = 0, if sentence does not include a reference,
citation_label = 1, if sentence includes a reference
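
A row of the dataset described above could be represented roughly like this. The column names come from the list above; the types, the `<ref` tag check used to derive `citation_label`, and the example values are assumptions for illustration, since the comment does not say how references are detected.

```python
import re
from dataclasses import dataclass


@dataclass
class SentenceRow:
    page_id: int
    revision_id: int
    section_name: str
    sentence: str
    paragraph: str
    citation_label: int  # 1 if the sentence includes a reference, else 0


# Assumption: a reference shows up as a <ref>, <ref name=...> or <ref/> tag
# in the raw wikitext of the sentence. <references /> is not matched.
REF_TAG = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def citation_label(raw_sentence: str) -> int:
    """Return 1 if the raw sentence contains a <ref> tag, otherwise 0."""
    return 1 if REF_TAG.search(raw_sentence) else 0


raw = "Paris is the capital of France.<ref>Smith 2020</ref>"
row = SentenceRow(
    page_id=1,          # hypothetical IDs
    revision_id=10,
    section_name="Geography",
    sentence="Paris is the capital of France.",
    paragraph="Paris is the capital of France. It lies on the Seine.",
    citation_label=citation_label(raw),
)
print(row.citation_label)
```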

Mar 8 2024, 2:43 PM · Research (FY2023-24-Research-April-June)

Mar 1 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need.

26/02 - 01/03:

  • onboarding
  • ref. need - prepare featured articles data
  • ref. risk - literature review (e.g., reverted revs examples, controversiality score) for our dataset specification
Mar 1 2024, 8:33 AM · Research (FY2023-24-Research-April-June)

Aug 5 2023

Aitolkyn added a comment to T333900: Submit the expanded citation quality research to a top-tier venue.

The paper has been accepted to CIKM (short paper track).

Aug 5 2023, 12:57 PM · Research-outreach, Research (FY2022-23-Research-April-June)

Mar 10 2023

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
Mar 10 2023, 1:27 AM · Research (FY2022-23-Research-January-March)

Jun 25 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

20/06 ~ 24/06:

  • extract user lifespan & analyze the lifespan of users vs. collaboration with experts
  • manually check pages containing sources from external fake websites lists
  • get dominant sources on wiki from the external lists
  • finish collecting citation quality scores for the top dataset
Jun 25 2022, 4:54 AM · Research (FY2022-23-Research-January-March)

Jun 17 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

13/06 ~ 17/06:

  • classify users into exposed and non-exposed in the new random & top datasets
  • PSM (propensity score matching) on collaboration between experts and non-experts on the new random and top datasets
  • topic coverage of unreliable source lists (including perennials)
  • finish collecting citation quality scores for the random dataset
Jun 17 2022, 10:33 AM · Research (FY2022-23-Research-January-March)

Jun 10 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
  • add one more external source list (3. Snopes)
  • visualize the intersection of the 3 lists and coverage of #3
  • start collecting citation quality scores for the top2021 dataset
Jun 10 2022, 11:04 AM · Research (FY2022-23-Research-January-March)

Jun 4 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
  • search for external lists of unreliable sources (e.g., Melissa Zimdars' fake news websites list)
  • compute the coverage of wiki by external lists (1. Zimdars and 2. Daily Dot) and compare with the perennial source list
  • setting up the environment for the citation quality scores collection
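
One way to read "coverage of wiki by external lists" is the share of cited URLs whose domain appears on a given list. The sketch below implements that reading only; the definition, the function names, and the `.example` domains are assumptions, and the real lists are not reproduced here.

```python
from typing import Iterable, Set
from urllib.parse import urlsplit


def domain(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def coverage(cited_urls: Iterable[str], source_list: Set[str]) -> float:
    """Share of cited URLs whose domain appears in source_list."""
    domains = [domain(u) for u in cited_urls]
    if not domains:
        return 0.0
    return sum(d in source_list for d in domains) / len(domains)


cited = [
    "https://www.fake-news.example/a",
    "https://reliable.example/b",
    "http://fake-news.example/c",
    "https://other.example/d",
]
list_a = {"fake-news.example"}                   # stand-in for one external list
list_b = {"fake-news.example", "other.example"}  # stand-in for another list

print(coverage(cited, list_a), coverage(cited, list_b))
# Overlap between lists is a plain set intersection.
print(list_a & list_b)
```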
Jun 4 2022, 9:23 AM · Research (FY2022-23-Research-January-March)

May 27 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

23/05 ~ 27/05

  • transfer collection of reference need scores to the server (the API took too long)
  • analysis of active contributors who add perennial sources
  • complete the evolution of references in the perennial source list (the future trend is more positive compared to previous data)
May 27 2022, 10:02 AM · Research (FY2022-23-Research-January-March)

May 20 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

16/05 ~ 20/05

  • growth of references in each category of perennial source list
  • comparison of the random and top10 ref. quality evolution
  • read Nature papers: paper1, paper2 --> good visualizations
May 20 2022, 3:17 PM · Research (FY2022-23-Research-January-March)

May 15 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

It looks very useful, thank you very much!! I'll check this out

May 15 2022, 4:16 AM · Research (FY2022-23-Research-January-March)

May 13 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

09/05 ~ 13/05

  • collect reference risk scores for the 2 datasets: random and top2021
  • start collecting all the missing reference need scores
  • pageviews for 'bad' sources before and after they are classified as 'bad'
  • analyze the data collected so far (significance tests, distributions, plots)
May 13 2022, 5:33 AM · Research (FY2022-23-Research-January-March)

May 6 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

02/05 ~ 06/05:

  • re-check the pageviews data for pages in multiple namespaces (namespace_id was added to pageviews_hourly in 2017)
  • analysis of the reference quality of the most viewed pages' revisions
  • get pages and collect revision data for two datasets: random and top-viewed
May 6 2022, 7:13 AM · Research (FY2022-23-Research-January-March)

Apr 29 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

25/04 ~ 29/04

  • extract monthly top-viewed pages and get the pages' revisions at that time
  • collect reference quality scores for the revisions of top-viewed pages
  • aggregate pageviews for the references from perennial source list
  • prepare presentation
Apr 29 2022, 7:36 AM · Research (FY2022-23-Research-January-March)

Apr 22 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

18/04 ~ 22/04:

  • extract and analyze data from the pageviews table
  • check the extracted results with the PageviewsAPI
  • check PageviewsAPI
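
For cross-checking table results against the Pageviews API, a request URL for the public per-article endpoint can be built as sketched below. The helper name, the default parameter values, and the example article and dates are assumptions for illustration; the URL is only constructed here, not fetched.

```python
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(
    project: str,
    article: str,
    start: str,
    end: str,
    access: str = "all-access",
    agent: str = "user",
    granularity: str = "daily",
) -> str:
    """Build a per-article Pageviews API URL (dates as YYYYMMDD strings)."""
    # Titles use underscores instead of spaces and must be URL-encoded.
    title = quote(article.replace(" ", "_"), safe="")
    return f"{BASE}/{project}/{access}/{agent}/{title}/{granularity}/{start}/{end}"


url = per_article_url("en.wikipedia", "Albert Einstein", "20220401", "20220407")
print(url)
```

The daily counts returned for such a URL can then be summed and compared with the totals extracted from the pageviews table for the same page and period.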
Apr 22 2022, 6:45 AM · Research (FY2022-23-Research-January-March)

Apr 15 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

11/04 ~ 15/04:

  • perennial source list references lifespan and pageviews
  • continued exploring PySpark
Apr 15 2022, 1:33 PM · Research (FY2022-23-Research-January-March)

Apr 5 2022

Aitolkyn added a comment to T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).

@Aitolkyn Can you please sign https://phabricator.wikimedia.org/L3 ? Then we're good to go.

Apr 5 2022, 11:26 AM · SRE, SRE-Access-Requests

Apr 4 2022

Aitolkyn updated the task description for T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).
Apr 4 2022, 4:06 AM · SRE, SRE-Access-Requests