Aitolkyn (Aitolkyn)
User

Projects

User does not belong to any projects.

User Details

User Since
Apr 10 2021, 8:11 AM (244 w, 3 d)
Availability
Available
LDAP User
Aitolkyn
MediaWiki User
Unknown

Recent Activity

Dec 30 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.

Note: My contract is finishing tomorrow. Thank you to everyone involved in this project! Below, I will report updates for the past week.

Dec 30 2024, 6:33 AM · Research (FY2024-25-Research-January-March)

Dec 20 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.
  • SHAP explanations for peacock behavior can be found at the end of this notebook. Additionally, I added bar plots of the top 20 words in the explanations for each class: these are mostly adjectives for the "positive" class and mainly proper nouns for the "negative" class.
Dec 20 2024, 12:05 PM · Research (FY2024-25-Research-January-March)

Dec 13 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.

Updates:

  • Exploring the explainability of small language models with SHAP values
  • Writing a final report on data preparation and baseline experiments
  • Adding our reference need work as an additional use case. Reran full-scale evaluation experiments on reference need data in the top 10 languages by active user count.
  • Examples of SHAP values for peacock behavior, in which words like "greatest" or "stunning beauty" appear to affect the model's positive label the most.

[Image: SHAP value examples for peacock behavior]

Dec 13 2024, 1:16 PM · Research (FY2024-25-Research-January-March)

Dec 6 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.

Updates:

  • Additionally fine-tuned XLM-RoBERTa with a max length of 512 tokens for NPOV and Peacock. XLM-R performs slightly better than mBERT with the same max length.
  • Updated the mBERT model trained on enwiki for Peacock
  • The updates are reflected in the table above and in the notebooks at: for npov, for peacock
Dec 6 2024, 5:58 AM · Research (FY2024-25-Research-January-March)

Nov 29 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.

Updates

  • Baselines for Tasks 2 and 3 were updated, and the evaluation on the full data was reported
  • All plots & results are presented here: for npov, for peacock
Nov 29 2024, 6:35 AM · Research (FY2024-25-Research-January-March)

Nov 22 2024

Aitolkyn added a comment to T380569: Baseline Experiments for SDS 1.2.1 B.

Updates:

  • Trained multiple models on Task 2 and Task 3
  • I used multilingual BERT and XLM-R Longformer in my experiments
  • I ran testing on the full evaluation dataset and reported the results in detail in the slides here
  • We also tested performance by topic in enwiki
  • Findings
    • Passing page title along with the content improves the accuracy
    • Increased context size of 4K tokens with XLM-R Longformer does not improve the performance significantly
    • mBERT trained on all wikis performs better on other languages than mBERT trained only on English samples
    • NPOV detection shows lower accuracy
Nov 22 2024, 10:12 AM · Research (FY2024-25-Research-January-March)
Aitolkyn created T380569: Baseline Experiments for SDS 1.2.1 B.
Nov 22 2024, 10:05 AM · Research (FY2024-25-Research-January-March)
Aitolkyn created T380567: Baseline Experiments for SDS 1.2.1 B.
Nov 22 2024, 10:03 AM · Research (FY2024-25-Research-October-December)
Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

@Aitolkyn I am inclined to close this task and create a new one for baselines. Would it be ok with you?

Nov 22 2024, 6:30 AM · Research (FY2024-25-Research-October-December)

Nov 15 2024

Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

Data can be found on the cluster at:

  • Eval data:
    • Task 2 (NPOV): aitolkyn/ai_use_cases/npov/data_final/eval_npov_data.parquet
    • Task 3 (Peacock): aitolkyn/ai_use_cases/peacock/data_final/eval_peacock_data.parquet

Nov 15 2024, 6:10 AM · Research (FY2024-25-Research-October-December)

Nov 8 2024

Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

Related Code:

Nov 8 2024, 7:47 AM · Research (FY2024-25-Research-October-December)

Nov 1 2024

Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

Related Code:

Updates:

  • Got templates that link to the NPOV policy using the langlinks API and page redirects
    • hewiki and plwiki do not have a dedicated POV template page
  • Collected all historical revisions that contain the above-mentioned templates across 23 languages, supplemented with additional metadata
  • Extracted positive/negative pairs from each page following the previous approach
    • Five languages (hewiki, hiwiki, idwiki, rowiki, elwiki) have fewer than 1K pairs and will be discarded from the final dataset, along with plwiki, which does not have a dedicated POV template page
    • Stratification by topic for sampling will be applied only to enwiki due to the sparse topic distribution in non-English languages
  • Checked stats; distribution plots are available at the bottom of this notebook
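A minimal sketch of how the langlinks lookup and redirect collection could be scripted against the MediaWiki Action API. The query parameters (`prop=langlinks`, `prop=redirects`, `lllimit`, `rdlimit`, `formatversion=2`) are standard Action API options; the starting title `Template:POV` and the helper names are illustrative assumptions, not the code used for this task.

```python
from urllib.parse import urlencode

API = "https://{wiki}.wikipedia.org/w/api.php"


def query_url(wiki: str, **params: str) -> str:
    """Build an Action API query URL with JSON output (formatversion=2)."""
    base = {"action": "query", "format": "json", "formatversion": "2"}
    return API.format(wiki=wiki) + "?" + urlencode({**base, **params})


def langlinks_url(wiki: str, title: str) -> str:
    # Interlanguage links of one page, e.g. the enwiki POV template.
    return query_url(wiki, prop="langlinks", titles=title, lllimit="max")


def redirects_url(wiki: str, title: str) -> str:
    # Redirects pointing at the page (alternative template names).
    return query_url(wiki, prop="redirects", titles=title, rdlimit="max")


def parse_langlinks(response: dict) -> dict[str, str]:
    """Map language code -> linked page title from a formatversion=2 response."""
    links = {}
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("langlinks", []):
            links[link["lang"]] = link["title"]
    return links


def parse_redirects(response: dict) -> list[str]:
    """Collect redirect titles from a formatversion=2 response."""
    return [
        r["title"]
        for page in response.get("query", {}).get("pages", [])
        for r in page.get("redirects", [])
    ]
```

Each URL can be fetched with any HTTP client (Wikimedia asks API clients to send a descriptive User-Agent); running `redirects_url` on every language's template title then gathers the aliases that also mark NPOV disputes.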
Nov 1 2024, 7:03 AM · Research (FY2024-25-Research-October-December)

Oct 25 2024

Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

Updates:

  • I retrieved 10 similar pages for each previously sampled seed article. The seed article dataset contains additional columns linking to the similar pages, namely sim_page_ids and sim_page_titles; the similar pages, along with their metadata, are available in a separate file.
    • Find current versions of seed articles at ai_use_cases/categories/sample_articles/seed_articles_w_similar10_v1 and similar10 articles at ai_use_cases/categories/sample_articles/similar10_metadata_v1
  • After Mykola’s initial analysis, I updated the above two datasets following his suggestions. I additionally collected revision_text for the main section, converting it to plaintext using mwedittypes (https://github.com/geohci/edit-types/blob/main/mwedittypes/utils.py#L77C5-L77C26)
  • I started looking into the previous code and publications for Task 2, NPOV detection
  • The pipeline for extracting negative and positive samples has been discussed, and we decided to collect samples across the full history of all articles.
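One way to resolve the `sim_page_ids` column back to rows of the separate similar-pages file, sketched in plain Python. It assumes each seed row carries `page_id` and `sim_page_ids`, and that the metadata file is keyed by a `page_id` column; those column names (apart from `sim_page_ids`) are assumptions. With pandas, `DataFrame.explode("sim_page_ids")` followed by a `merge` would do the same.

```python
from typing import Any

Row = dict[str, Any]


def link_similar(seeds: list[Row], similar_meta: list[Row]) -> list[tuple[Any, Row]]:
    """Pair each seed page_id with the metadata row of each of its similar pages."""
    meta_by_id = {row["page_id"]: row for row in similar_meta}
    pairs = []
    for seed in seeds:
        for sim_id in seed["sim_page_ids"]:
            row = meta_by_id.get(sim_id)
            if row is not None:  # skip ids absent from the metadata file
                pairs.append((seed["page_id"], row))
    return pairs
```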
Oct 25 2024, 7:49 AM · Research (FY2024-25-Research-October-December)

Oct 18 2024

Aitolkyn updated subscribers of T357036: References Model: Multilingual Reference Need .
Oct 18 2024, 4:25 PM · Research
Aitolkyn updated subscribers of T357036: References Model: Multilingual Reference Need .

Hello all, my contract for this project ended on September 30, and I would like to summarize our work here.

Oct 18 2024, 4:17 PM · Research
Aitolkyn added a comment to T377423: Collect Evaluation Data for SDS 1.2.1 B.

Status Update:

  • I collected topics, categories, and section headings for all articles that are assigned a topical category (with a score > 0.5) in the 23 languages covered by the AYA23 model family.
  • I checked the distribution by topic in each language, available here
    • Note: one article could be counted for multiple topic categories
  • Based on the decided thresholds from the above distributions, I sampled seed articles in each language
    • decided to sample 50 articles per topical category before 2024 and max(25, number of created articles) per topic in 2024, to oversample the most recent data.
Oct 18 2024, 6:21 AM · Research (FY2024-25-Research-October-December)

Oct 4 2024

Aitolkyn added a comment to T368614: Essential work - model quantization.

I re-ran our latest reference-need model on a test set of 15K sentences. Our currently deployed model uses distilbert-base-multilingual-cased with torch dynamic quantization (column 2, "torch", in the plots below).
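The deployed model relies on PyTorch's dynamic quantization (`torch.quantization.quantize_dynamic(model, {torch.nn.Linear}, dtype=torch.qint8)` is the usual call). The plain-Python sketch below only illustrates the idea behind it: weights are converted to int8 ahead of time, activations are quantized on the fly from their observed range, and the integer dot product is rescaled to float. It uses a simplified symmetric per-tensor scheme; PyTorch's actual kernels differ in the details.

```python
def quantize_int8(values: list[float]) -> tuple[list[int], float]:
    """Symmetric per-tensor int8 quantization: v ~= q * scale, q in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    return [v * scale for v in q]


def dynamic_quantized_dot(x: list[float], q_w: list[int], w_scale: float) -> float:
    """Dot product with pre-quantized weights and on-the-fly activation quantization."""
    q_x, x_scale = quantize_int8(x)  # activation scale is computed per call
    acc = sum(a * b for a, b in zip(q_x, q_w))  # integer accumulation
    return acc * x_scale * w_scale  # rescale back to float


# Weights are quantized once, ahead of inference.
weights = [0.5, -1.0, 0.25]
q_weights, weight_scale = quantize_int8(weights)
```

Storing int8 instead of float32 weights cuts the memory of the quantized layers roughly 4x, and integer kernels are typically faster on CPU, which is the usual motivation for quantizing a served model.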

Oct 4 2024, 1:24 PM · Research, Essential-Work, Research-engineering

Sep 27 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

Report on ref. need model latency: https://docs.google.com/document/d/1EJbSJ7fekZvor8F-FiPVl7EGlTkckKtSIGOMXb1K2FM/edit?usp=sharing

Sep 27 2024, 11:32 AM · Research

Sep 20 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

The ref. need model was deployed to production this week. Given the latency issue, we ran experiments to further improve model latency.
I shared two updated binaries with the ML team: 1) a smaller model and 2) a smaller model + quantization. According to preliminary analysis on the stat machines, this should roughly halve the processing time.

Sep 20 2024, 1:21 PM · Research

Sep 4 2024

Aitolkyn added a comment to T371902: Request to host the Reference Need Model on LiftWing.

Hello @isarantopoulos! We downgraded to match the version in the knowledge-integrity repo.

Sep 4 2024, 3:56 AM · Lift-Wing, Machine-Learning-Team
Aitolkyn updated the task description for T371902: Request to host the Reference Need Model on LiftWing.
Sep 4 2024, 3:51 AM · Lift-Wing, Machine-Learning-Team

Aug 28 2024

Aitolkyn added a comment to T371902: Request to host the Reference Need Model on LiftWing.

Hi Aiko! The location on stat1010 is /home/aitolkyn/temp/reference-quality/pretrained_models/multilingual_reference_need_128_v0.pkl
sha512: 0af0ecd12e05e7c40a0d39dd155589917130d1fa00711c3675c48d4373edca402bdc25cb85a56925deb24ebcf3c0ac01843179c86321f0991772b8963c27ed24 *multilingual_reference_need_128_v0.pkl
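The checksum above is in the `<digest> *<filename>` format printed by `sha512sum` (the `*` marks binary mode), so `sha512sum -c` on a file containing that line should verify the binary. A Python equivalent, as a small sketch (the helper names are illustrative):

```python
import hashlib
from pathlib import Path
from typing import Iterable, Iterator


def file_chunks(path: Path, chunk_size: int = 1 << 20) -> Iterator[bytes]:
    """Yield a (possibly large) binary file in fixed-size chunks."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk


def sha512_hex(chunks: Iterable[bytes]) -> str:
    """Hex SHA-512 digest of the concatenated chunks."""
    h = hashlib.sha512()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def parse_checksum_line(line: str) -> tuple[str, str]:
    """Split '<digest> *<filename>' into (digest, filename)."""
    digest, name = line.strip().split(maxsplit=1)
    return digest.lower(), name.lstrip("*")  # '*' marks binary mode


def verify(line: str, base_dir: Path = Path(".")) -> bool:
    """True if the file named in the checksum line matches its digest."""
    digest, name = parse_checksum_line(line)
    return sha512_hex(file_chunks(base_dir / name)) == digest
```

For example, `verify(checksum_line, Path("/home/aitolkyn/temp/reference-quality/pretrained_models"))` checks the pickled model in place.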

Aug 28 2024, 12:35 AM · Lift-Wing, Machine-Learning-Team

Aug 27 2024

Aitolkyn updated the task description for T371902: Request to host the Reference Need Model on LiftWing.
Aug 27 2024, 11:41 AM · Lift-Wing, Machine-Learning-Team

Aug 23 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

Updated the ref. need model input to

lang_code, section_name, sentence, next_sentence, prev_sentence

and reduced the input context size to 128 tokens given the latency constraints.
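A hypothetical sketch of how the five inputs could be packed into one sequence capped at 128 tokens. The separator, the field order (taken from the list above), and whitespace splitting as a stand-in for the model's subword tokenizer are all assumptions; with a Hugging Face tokenizer, passing `truncation=True, max_length=128` enforces the same cap. Placing the target sentence before its neighbours means truncation drops context first.

```python
from dataclasses import dataclass

SEP = "[SEP]"  # hypothetical separator token


@dataclass
class RefNeedInput:
    lang_code: str
    section_name: str
    sentence: str
    next_sentence: str
    prev_sentence: str


def build_model_tokens(x: RefNeedInput, max_tokens: int = 128) -> list[str]:
    """Concatenate the fields in order and keep at most max_tokens tokens.

    Whitespace splitting stands in for the real subword tokenizer (assumption),
    so the cut falls on the trailing context fields first.
    """
    fields = [x.lang_code, x.section_name, x.sentence, x.next_sentence, x.prev_sentence]
    tokens: list[str] = []
    for i, field in enumerate(fields):
        if i:
            tokens.append(SEP)
        tokens.extend(field.split())
    return tokens[:max_tokens]
```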

Aug 23 2024, 8:58 AM · Research

Jul 26 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

For the second question: it is per article, so these tentative estimates are the times needed to process one article revision.

Jul 26 2024, 5:57 PM · Research

Jul 19 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

The ref. risk score follows the definition given in our WWW paper, where blacklisted/deprecated sources in the perennial sources list are counted as risky sources; the logic can be checked in the repo: https://gitlab.wikimedia.org/repos/research/reference-quality/-/tree/classifiers
For ref. risk, we further annotate an additional 1,000 sources to use as ground truth.
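A minimal sketch of a revision-level reference risk score, assuming it is the share of a revision's references whose domain is labeled deprecated or blacklisted in the perennial sources list (the repository above has the exact definition). The `ps_status` mapping, the domain normalization, and the example domains are illustrative.

```python
from typing import Optional
from urllib.parse import urlparse

RISKY_STATUSES = {"deprecated", "blacklisted"}


def domain_of(url: str) -> str:
    """Lower-cased host name without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host.removeprefix("www.")


def reference_risk_score(ref_urls: list[str], ps_status: dict[str, str]) -> Optional[float]:
    """Fraction of references whose domain has a risky perennial-sources label."""
    if not ref_urls:
        return None  # undefined for a revision without references
    risky = sum(ps_status.get(domain_of(u)) in RISKY_STATUSES for u in ref_urls)
    return risky / len(ref_urls)
```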

Jul 19 2024, 3:11 PM · Research

Jul 12 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

Prepared the code here: https://gitlab.wikimedia.org/repos/research/reference-quality/-/tree/classifiers

Jul 12 2024, 2:39 PM · Research

Jul 1 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .
  • ref. need - the model works well when tested on smaller languages (kkwiki balanced ~0.7)
  • ref. need - prepared scripts to hand over for production
  • ref. risk - ran the classifier on all domains in enwiki and saved the score and predicted probability per domain
  • ref. risk - article-level analysis of the ref. risk score
Jul 1 2024, 5:19 AM · Research

Jun 21 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .
  • ref.need - with transfer learning, results on the full enwiki dataset (F1-score 0.76, ROC-AUC 0.77, PR-AUC 0.83)
  • ref. risk - results with the ground-truth on enwiki (Accuracy: 0.86, F1-score: 0.74)
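For reference, the reported metrics can be computed as below. This is a plain-Python sketch for small inputs (the pairwise ROC-AUC is O(n²)); in practice `sklearn.metrics.f1_score`, `roc_auc_score`, and `average_precision_score` are the usual choice. Which PR-AUC variant was used is not stated, so average precision here is an assumption.

```python
def f1(y_true: list[int], y_pred: list[int]) -> float:
    """Harmonic mean of precision and recall for the positive class (label 1)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true: list[int], scores: list[float]) -> float:
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: list[int], scores: list[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive is retrieved."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    hits, total = 0, 0.0
    for k, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / k
    return total / hits if hits else 0.0
```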

The majority (58%) of misclassifications are caused by the intermediate 'Generally unreliable' category

[Image: ref. risk misclassification results]

Next:

  • test ref. need model on smaller language editions
Jun 21 2024, 3:50 PM · Research

Jun 14 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .
  • ref. need - the best-performing model so far reaches 0.76; we tested multilingual BERT and XLM-RoBERTa. On average, the random sample has more than twice the ref. need (RN) score of the featured articles sample; refer to the plot below
    [Plot: ref. need (RN) score, featured vs. random articles]
  • ref. need - compiled labeling dataset for 5 languages (en, es, de, fr, ru) and labeled ruwiki
  • ref. risk - prepared three training datasets using perennial sources (PS) as ground-truth label (before PS list, after PS list, and all)
  • ref. risk - binary classifier on balanced dataset, notebook with results here
Jun 14 2024, 1:21 PM · Research

Jun 2 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

27/05-02/06:

  • ref.need - ran inference with our best-performing model so far and computed the ref. need score on two samples of articles, featured and random; random articles get higher scores, meaning they are missing more references.
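A sketch of an article-level ref. need score, under the assumption that it is the fraction of an article's sentences that the classifier flags as needing a citation (probability at or above 0.5, an assumed threshold) but that carry none, so a higher value means more missing references. Names and threshold are illustrative.

```python
from statistics import mean
from typing import Iterable, Optional

Sentence = tuple[bool, float]  # (has_citation, predicted probability of needing one)


def reference_need_score(sentences: Iterable[Sentence], threshold: float = 0.5) -> Optional[float]:
    """Share of sentences predicted to need a citation that have none."""
    sentences = list(sentences)
    if not sentences:
        return None
    missing = sum(not cited and p >= threshold for cited, p in sentences)
    return missing / len(sentences)


def sample_mean_score(articles: list[list[Sentence]]) -> float:
    """Average the per-article scores over a sample (e.g., featured vs. random)."""
    scores = [s for s in map(reference_need_score, articles) if s is not None]
    return mean(scores)
```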

ref. risk - modelling approach set as follows:

  • Binary classifier to detect risky sources
  • Domain-level
  • English Wikipedia
  • Ground-truth labels
    • Perennial source list (positive class: deprecated and blacklisted sources)
Jun 2 2024, 1:52 PM · Research

May 24 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

20/05-26/05:

  • ref. need - experiments with additional context (such as prev/next sections, sentences, or paragraphs)
  • ref. need - expand the experiments to multilingual scenarios, training and testing on different languages, with F1-scores of 0.70-0.72
  • ref. risk - more experiments on the signals of reliability (on data at the domain, URL, and page levels)
May 24 2024, 4:15 PM · Research

May 17 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

12/05 - 19/05:

  • ref.need - testing the best-performing classifier so far on the remaining languages (the 5 languages from which featured articles were collected, as reported in earlier weeks); the pretrained model we're using is distilbert-base-multilingual-cased
  • ref.risk - changed the target feature from the absolute number of edits survived to the survival ratio, where 1 means a reference survived all edits after its addition and 0 means it did not survive subsequent edits. Analyzed the signals by comparing references at the article level
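The survival ratio described above can be sketched as follows, assuming the page history is available as a chronologically ordered list of the reference sets present in each revision, and that the ratio is the number of consecutive later revisions still containing the reference divided by the number of later revisions. How the original computation handles re-additions, or a reference added in the latest revision, is an assumption here.

```python
from collections.abc import Hashable, Sequence
from typing import Optional


def survival_ratio(revisions: Sequence[set], ref: Hashable) -> Optional[float]:
    """Share of later revisions a reference survives after it is first added.

    revisions: chronologically ordered sets of references present in each revision.
    Returns None when the reference was added in the last revision (nothing to survive).
    """
    added = next((i for i, refs in enumerate(revisions) if ref in refs), None)
    if added is None:
        raise ValueError("reference never appears in the history")
    later = revisions[added + 1:]
    if not later:
        return None
    survived = 0
    for refs in later:
        if ref not in refs:
            break  # first removal ends the reference's life on the page
        survived += 1
    return survived / len(later)
```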
May 17 2024, 2:19 PM · Research

May 10 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

06/05 - 12/05:

  • Ref. need - model improves with more context, f1-score achieved is ~0.74
  • Ref. risk - experiment with assigning possible reliability labels based on the observed patterns from perennial sources with known labels (last plot here)
  • Ref. risk - compare featured articles and the remaining articles in the current snapshot using our assigned labels
May 10 2024, 2:14 PM · Research

May 3 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

29/04 - 05/05:

  • Ref. need - add topic as a feature
  • Ref. need - ran experiments with additional numerical and textual features; so far, the highest F1-score achieved is 0.733
May 3 2024, 5:02 PM · Research

Apr 26 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

22/04 - 28/04:

  • Ref.need - Tested on the Citation Needed data; our approach gets better results on their data
  • Ref.need - Added additional inputs to the model, such as the section index, paragraph index, and sentence index
  • Ref.risk - Limited our data to instances of sources that are labeled in the perennial sources list; cleaned and preprocessed the data for classification
Apr 26 2024, 4:42 PM · Research

Apr 19 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

15/04 - 21/04:

  • Ref.need - experiments with different models (see slide #10 for results)
  • Ref.need - feature exploration (see slide #8 for an example); the plot below shows Logistic Regression feature importance for positive/negative labels

[Plot: Logistic Regression feature importance for positive/negative labels]

  • Ref.risk - analysis of URL permanence vs. reliability label, example below:

[Plot: URL permanence vs. reliability label]

Apr 19 2024, 5:47 PM · Research

Apr 16 2024

Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

@ssingh Thank you for checking! I get the following error when trying to access my tables:

mysql error: SELECT command denied to user 'research'@'10.67.30.187' for table `aitolkyn`.`domain_reverted_added_2019_2023`

Maybe I need to set something up before I can access those?

Hi @Aitolkyn: Can you share the query you are trying to run? While knowing nothing about this, shouldn't the user be aitolkyn here and the table name research? I may be wrong about this! It says table aitolkyn.

The table name is "aitolkyn.domain_reverted_added_2019_2023" and my user name is "aitolkyn". I am running

SELECT *
FROM   aitolkyn.domain_reverted_added_2019_2023

The error message suggests you're trying to run that query in the analytics MariaDB replicas (using analytics-mysql or directly via mysql). This database only exists in Hive:

[urbanecm@stat1005 ~]$ hive --database=aitolkyn
[...]
hive (aitolkyn)> show tables;
OK
[...]
domain_reverted_added_2019_2023
[...]

Can you try querying for the data using eg. Hive (see https://wikitech.wikimedia.org/wiki/Analytics/Cluster/Systems/Hive)? Or when in Superset, using the presto_analytics_hive database?

Apr 16 2024, 11:19 AM · SRE, LDAP-Access-Requests

Apr 15 2024

Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

Apr 15 2024, 3:46 PM · SRE, LDAP-Access-Requests
Aitolkyn added a comment to T362533: Grant Access to Superset for aitolkyn.

Apr 15 2024, 2:26 PM · SRE, LDAP-Access-Requests
Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

08/04 - 14/04:

  • ref.need - continued experiments with BERT with additional data cleaning; facing overfitting issues, so experimenting on a sample of the data following the approach of previous work
  • ref.risk - extending the data with URL permanence, in terms of the lifespan of a URL on a page and its number of edits
  • ref.risk - analysis of ground-truth labels from the enwiki perennial sources list
Apr 15 2024, 12:36 PM · Research

Apr 5 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

01/04 - 07/04:

  • ref.need - using the dataset we prepared, started experimenting with BERT for sentence classification into cited/uncited in enwiki
Apr 5 2024, 1:42 PM · Research

Mar 30 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

25/03 - 31/03:

  • ref.need - running baseline classification models with logistic regression on 1) TF-IDF features and 2) sentence metadata
  • planning to try LLMs next; studying the Hugging Face NLP tutorials
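A self-contained sketch of the first baseline (TF-IDF features plus logistic regression) on toy sentences. The smoothed IDF mirrors scikit-learn's default (`log((1 + n) / (1 + df)) + 1`), and plain per-example gradient descent stands in for a library solver; in practice `TfidfVectorizer` and `LogisticRegression` from scikit-learn would be used. Sentence-metadata features (the second baseline) could be added as extra keys in the same sparse dictionaries. The toy data and hyperparameters are made up.

```python
import math
from collections import Counter

Vector = dict[str, float]


def tokenize(text: str) -> list[str]:
    return text.lower().split()


def fit_idf(docs: list[str]) -> Vector:
    """Smoothed inverse document frequency per term."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(tokenize(doc)))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}


def tfidf(doc: str, idf: Vector) -> Vector:
    """L2-normalized TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    e = math.exp(z)
    return e / (1 + e)


def predict_proba(x: Vector, w: Vector, b: float) -> float:
    return sigmoid(b + sum(w.get(t, 0.0) * v for t, v in x.items()))


def train_logreg(X: list[Vector], y: list[int], epochs: int = 200, lr: float = 0.5) -> tuple[Vector, float]:
    """Logistic regression fitted with per-example gradient descent on log loss."""
    w: Vector = {}
    b = 0.0
    for _ in range(epochs):
        for x, label in zip(X, y):
            g = predict_proba(x, w, b) - label  # gradient of log loss w.r.t. the logit
            b -= lr * g
            for t, v in x.items():
                w[t] = w.get(t, 0.0) - lr * g * v
    return w, b


docs = [
    "according to the study the population grew",
    "the study reported a decline in exports",
    "she enjoyed painting landscapes",
    "he liked music",
]
labels = [1, 1, 0, 0]  # 1 = cited sentence, 0 = uncited (toy labels)
idf = fit_idf(docs)
w, b = train_logreg([tfidf(d, idf) for d in docs], labels)
```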
Mar 30 2024, 9:43 AM · Research

Mar 22 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

18/03-22/03:
ref.need

  • found and fixed an issue with tokenized sentences -> updated the dataset
  • Data: ~16K featured articles and ~4.4M extracted sentences (47% accompanied by a citation)
  • brainstorming and planning for the classifier model
Mar 22 2024, 2:29 PM · Research

Mar 15 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

11/03-15/03:
ref.need

Mar 15 2024, 8:31 AM · Research

Mar 8 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

04/03 - 08/03:

  • ref.need - add featured articles from dewiki, frwiki, ruwiki, ptwiki, eswiki (choice is based on the number of currently existing FAs)
  • ref.need - [for enwiki] prepare dataset for the model with the following columns:
page_id, revision_id, section_name, sentence, paragraph, citation_label

where citation_label = 0, if sentence does not include a reference,
citation_label = 1, if sentence includes a reference
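A hypothetical sketch of deriving `citation_label` from a sentence's wikitext, assuming a sentence "includes a reference" when it contains a `<ref>` tag or an `{{sfn}}`-style short-footnote template. Labels have to be computed before the wikitext is stripped to plain text; the regular expression is only an illustration and would miss other citation templates.

```python
import re

# Assumption: a sentence includes a reference if its wikitext has a <ref> tag
# (plain, named, or self-closing) or an {{sfn}}/{{sfnm}} short-footnote template.
REF_PATTERN = re.compile(r"<ref[\s>/]|\{\{\s*sfnm?\s*\|", re.IGNORECASE)


def citation_label(sentence_wikitext: str) -> int:
    """1 if the sentence carries a reference, else 0."""
    return 1 if REF_PATTERN.search(sentence_wikitext) else 0


def label_rows(rows: list[dict]) -> list[dict]:
    """Add citation_label to rows with page_id, revision_id, section_name, sentence, paragraph."""
    return [{**row, "citation_label": citation_label(row["sentence"])} for row in rows]
```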

Mar 8 2024, 2:43 PM · Research

Mar 1 2024

Aitolkyn added a comment to T357036: References Model: Multilingual Reference Need .

26/02 - 01/03:

  • onboarding
  • ref. need - prepare featured articles data
  • ref. risk - literature review (e.g., reverted revs examples, controversiality score) for our dataset specification
Mar 1 2024, 8:33 AM · Research

Aug 5 2023

Aitolkyn added a comment to T333900: Submit the expanded citation quality research to a top-tier venue. .

The paper has been accepted to CIKM (short paper track).

Aug 5 2023, 12:57 PM · Research-outreach, Research (FY2022-23-Research-April-June)

Mar 10 2023

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
Mar 10 2023, 1:27 AM · Research (FY2022-23-Research-January-March)

Jun 25 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

20/06 ~ 24/06:

  • extract user lifespan & analyze the lifespan of users vs. collaboration with experts
  • manually check pages containing sources from external lists of fake news websites
  • get dominant sources on wiki from the external lists
  • finish collecting citation quality scores for the top dataset
Jun 25 2022, 4:54 AM · Research (FY2022-23-Research-January-March)

Jun 17 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

13/06 ~ 17/06:

  • classify users into exposed and non-exposed in the new random & top datasets
  • propensity score matching (PSM) on collaboration between experts and non-experts on the new random and top datasets
  • topic coverage of unreliable source lists (including perennials)
  • finish collecting citation quality scores for the random dataset
Jun 17 2022, 10:33 AM · Research (FY2022-23-Research-January-March)

Jun 10 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
  • add one more external source list (3. Snopes)
  • visualize the intersection of the 3 lists and coverage of #3
  • start collecting citation quality scores for the top2021 dataset
Jun 10 2022, 11:04 AM · Research (FY2022-23-Research-January-March)

Jun 4 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.
  • search for external lists of unreliable sources (e.g., Melissa Zimdars' fake news websites list)
  • compute the coverage of wiki by external lists (1. Zimdars and 2. Daily Dot) and compare with the perennial source list
  • setting up the environment for the citation quality scores collection
Jun 4 2022, 9:23 AM · Research (FY2022-23-Research-January-March)

May 27 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

23/05 ~ 27/05

  • move collection of reference need scores to the server (the API was too slow)
  • active user contributors to add perennial sources analysis
  • complete the evolution of references in the perennial source list (the future trend is more positive compared to previous data)
May 27 2022, 10:02 AM · Research (FY2022-23-Research-January-March)

May 20 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

16/05 ~ 20/05

  • growth of references in each category of perennial source list
  • comparison of the random and top10 ref. quality evolution
  • read Nature papers: paper1, paper2 --> good visualizations
May 20 2022, 3:17 PM · Research (FY2022-23-Research-January-March)

May 15 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

It looks very useful, thank you very much!! I'll check this out

May 15 2022, 4:16 AM · Research (FY2022-23-Research-January-March)

May 13 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

09/05 ~ 13/05

  • collect reference risk scores for the 2 datasets: random and top2021
  • start collecting all the missing reference need scores
  • pageviews for 'bad' sources before and after they are classified as 'bad'
  • analyze the data collected so far (significance tests, distributions, plots)
May 13 2022, 5:33 AM · Research (FY2022-23-Research-January-March)

May 6 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

02/05 ~ 06/05:

  • re-check the pageviews data for pages in multiple namespaces (namespace_id was added to pageviews_hourly in 2017)
  • analysis of the reference quality of the most viewed pages' revisions
  • get pages and collect revision data for two datasets: random and top-viewed
May 6 2022, 7:13 AM · Research (FY2022-23-Research-January-March)

Apr 29 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

25/04 ~ 29/04

  • extract monthly top-viewed pages and get the pages' revisions at that time
  • collect reference quality scores for the revisions of top-viewed pages
  • aggregate pageviews for the references from perennial source list
  • prepare presentation
Apr 29 2022, 7:36 AM · Research (FY2022-23-Research-January-March)

Apr 22 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

18/04 ~ 22/04:

  • extract and analyze data from the pageviews table
  • check the extracted results with the PageviewsAPI
  • check PageviewsAPI
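For cross-checking the pageviews table, the Wikimedia Pageviews REST API exposes per-article counts at `/metrics/pageviews/per-article/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}`. A small sketch that builds such a URL and sums daily counts per month (the helper names and default arguments are illustrative choices):

```python
from collections import defaultdict
from urllib.parse import quote

BASE = "https://wikimedia.org/api/rest_v1/metrics/pageviews/per-article"


def per_article_url(title: str, start: str, end: str, project: str = "en.wikipedia",
                    access: str = "all-access", agent: str = "user",
                    granularity: str = "daily") -> str:
    """Build a per-article pageviews URL; start/end are YYYYMMDD strings."""
    article = quote(title.replace(" ", "_"), safe="")  # encode '/', '?', etc.
    return f"{BASE}/{project}/{access}/{agent}/{article}/{granularity}/{start}/{end}"


def monthly_totals(response: dict) -> dict[str, int]:
    """Sum the 'views' of each item by month (YYYYMM prefix of the timestamp)."""
    totals: dict[str, int] = defaultdict(int)
    for item in response.get("items", []):
        totals[item["timestamp"][:6]] += item["views"]
    return dict(totals)
```

The monthly sums can then be compared with the same aggregation over the analytics pageviews table; as with the Action API, requests should carry a descriptive User-Agent.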
Apr 22 2022, 6:45 AM · Research (FY2022-23-Research-January-March)

Apr 15 2022

Aitolkyn added a comment to T305888: Reference Quality in English Wikipedia.

11/04 ~ 15/04:

  • perennial source list references lifespan and pageviews
  • continued exploring PySpark
Apr 15 2022, 1:33 PM · Research (FY2022-23-Research-January-March)

Apr 5 2022

Aitolkyn added a comment to T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).

@Aitolkyn Can you please sign https://phabricator.wikimedia.org/L3 ? Then we're good to go.

Apr 5 2022, 11:26 AM · SRE, SRE-Access-Requests

Apr 4 2022

Aitolkyn updated the task description for T305299: Requesting access to Analytic Cluster for Research Intern (Aitolkyn).
Apr 4 2022, 4:06 AM · SRE, SRE-Access-Requests