Scope project and explore options for developing tools for improving readability
Closed, Resolved, Public

Description

We have been successfully developing a model to measure the readability of Wikipedia articles (project page on Meta-wiki). As a next step, we would like to develop a model that could improve the readability of Wikipedia articles (along the lines of text simplification), taking advantage of recent advances in the availability and performance of large language models.

As a first step, in this task, we want to scope the project in more detail. Specifically, we would like to get a better overview of potential approaches for implementation.

  • Reviewing recent literature
  • Reviewing existing models for text summarization and text simplification and comparing them with the available infrastructure (e.g., LiftWing) to train/host a model
  • Reviewing approaches for evaluation
  • Identifying relevant benchmark/evaluation datasets
  • From the above, synthesizing a work plan for implementing and testing an exploratory model for simplification

Event Timeline

weekly update:

  • discussed current and planned options for hosting LLMs in LiftWing with Ilias from the ML Team (e.g. Bloom, T333861)
  • started to read through some recent papers on text simplification using Wikipedia data

weekly update:

  • reviewed a bunch of papers on text simplification with a focus on document-level simplification (in contrast to sentence-level simplification, which has been much more widely studied but is less relevant for our use case). This gave a good overview of
    • how to evaluate (benchmarks such as D-Wikipedia, evaluation metrics such as SARI)
    • proposed solutions (especially those that are multilingual and have long-enough context windows)
    • different facets of simplification that are relevant for human evaluation: in addition to simplification of the text, it is also important to consider preservation of meaning and whether the output is still grammatical
  • as a next step I will take a closer look at the available models

weekly update:

  • organized documentation of project
  • Read in more detail about available models for simplification:
    • One of the most promising is the recently released mLongT5 family of models: it is multilingual (~100 languages); accepts longer input sequences, which is crucial for simplification at the paragraph/section level (beyond the sentence level); is of moderate size, so it seems realistic that we can train and run it on our current infrastructure; and has been successfully used (i.e. fine-tuned) in summarization tasks.
    • Reference: Uthus, D., Ontañón, S., Ainslie, J., & Guo, M. (2023). mLongT5: A Multilingual and Efficient Text-To-Text Transformer for Longer Sequences. In arXiv [cs.CL]. arXiv. http://arxiv.org/abs/2305.11129
  • Read in more detail about how to systematically evaluate simplification systems:
    • The standard way to quantify the degree of simplification is the SARI score; it correlates reasonably well with human judgement
    • For simplification systems, we also need to ensure grammaticality and meaning preservation. It was shown that the BLEU score from machine translation has the highest correlation with human judgement on these dimensions
    • Both scores require parallel corpora (original vs simplified text)
    • Reference: Xu, W., Napoles, C., Pavlick, E., Chen, Q., & Callison-Burch, C. (2016). Optimizing statistical machine translation for text simplification. Transactions of the Association for Computational Linguistics, 4, 401–415. https://doi.org/10.1162/tacl_a_00107
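
To make the BLEU part of the evaluation above concrete, here is a minimal sketch in plain Python. It assumes whitespace tokenization, a single reference text, and no smoothing; the function names `ngrams` and `bleu` are illustrative and not taken from any library or from the project's code. For reported numbers an established implementation would be preferable. SARI is not shown; unlike BLEU, it also takes the original (source) text as input.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU against one reference, without smoothing.

    Assumption: both inputs are plain strings tokenized on whitespace.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        overlap = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or overlap == 0:
            # Unsmoothed BLEU is zero as soon as one n-gram precision is zero
            # (this includes candidates shorter than max_n tokens).
            return 0.0
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    # Geometric mean of the n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

For example, `bleu(simplified, reference)` returns 1.0 when the system output matches the reference exactly and 0.0 when the two texts share no words.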

weekly update:

  • Identified existing benchmarks focusing on document-level simplification. There are many datasets on sentence-level simplification (see, e.g., Data-Driven Sentence Simplification: Survey and Benchmark), but they are not relevant for our use case. For the task of document-level simplification, I identified only two variants of an English dataset based on Simple Wikipedia (SWiPE, D-Wikipedia). The corresponding papers provide baselines for the performance of simplification models on these datasets.
  • I also came across Klexikon (German) and Vikidia (only English, French). These have been used for measuring readability rather than for a simplification task, so there are no simplification baselines available for these datasets.
  • Our datasets from the multilingual readability project cover those datasets as well (with slightly different pre-processing) and additionally provide document-aligned simplification datasets for 10 more languages: https://meta.wikimedia.org/wiki/Research:Multilingual_Readability_Research/Evaluation_multilingual_model#Data

weekly update:

  • started writing a synthesis of the literature review on text simplification (tasks, data, evaluation, models, etc.).
  • should be finished next week; planning to add it to Meta-wiki.

weekly update: