
Analyse the distribution of translated articles across the standard quality criteria
Closed, Resolved (Public)

Description

During T356765 (Correlation between article length, number of translations within a time period, experience of users, and deletion rate), we observed that 20% of the articles created through the Content Translation tool meet the standard quality criteria, which are:

  • It is at least 8 kB long
  • It has at least 1 category
  • It has at least 7 sections
  • It is illustrated with 1 or more images
  • It has at least 4 references
  • It has 2 or more intra-wiki links
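
As a minimal sketch, the six criteria can be written as a single check. The `ArticleStats` fields and function names are hypothetical and not taken from the analysis code, and reading "8kB" as 8,000 bytes is an assumption based on the 8,000-byte threshold mentioned in the summary.

```python
from dataclasses import dataclass

# Thresholds from the criteria list. Treating 8 kB as 8,000 bytes is an assumption.
MIN_BYTES = 8000
MIN_CATEGORIES = 1
MIN_SECTIONS = 7
MIN_IMAGES = 1
MIN_REFERENCES = 4
MIN_WIKILINKS = 2


@dataclass
class ArticleStats:
    """Hypothetical per-article measurements; field names are illustrative."""
    page_bytes: int
    categories: int
    sections: int
    images: int
    references: int
    wikilinks: int


def criteria_met(a: ArticleStats) -> dict[str, bool]:
    """Return one pass/fail flag per criterion."""
    return {
        "length": a.page_bytes >= MIN_BYTES,
        "categories": a.categories >= MIN_CATEGORIES,
        "sections": a.sections >= MIN_SECTIONS,
        "images": a.images >= MIN_IMAGES,
        "references": a.references >= MIN_REFERENCES,
        "wikilinks": a.wikilinks >= MIN_WIKILINKS,
    }


def meets_standard(a: ArticleStats) -> bool:
    """An article is of standard quality only if all six criteria hold."""
    return all(criteria_met(a).values())
```

Keeping the per-criterion flags separate from the overall verdict makes it possible to ask which criteria an article is missing, not only whether it qualifies.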

As part of the follow-up analysis (T389676), we would like to know how the remaining articles, which do not meet the criteria, are distributed across the standard quality criteria. Some of the exploratory questions are:

  • How many articles meet, for example, 4 of the criteria, and which criteria are missing?
  • Have articles been improved post creation or not?
  • Did the distribution change from article creation date to the latest date?
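
One possible way to approach the first question is to tally, for each article, how many criteria it meets and which ones it lacks. The sketch below assumes each article has already been reduced to the set of criterion names it meets; the names and the input shape are illustrative, not the schema used in the analysis.

```python
from collections import Counter

# Illustrative labels for the six criteria.
CRITERIA = ("length", "categories", "sections", "images", "references", "wikilinks")


def distribution(met_per_article):
    """Tally articles by number of criteria met and by the exact set missing.

    met_per_article: iterable of sets, each holding the criteria one article meets.
    Returns (by_count, missing_combo), both collections.Counter objects.
    """
    by_count = Counter()
    missing_combo = Counter()
    for met in met_per_article:
        missing = tuple(c for c in CRITERIA if c not in met)
        by_count[len(CRITERIA) - len(missing)] += 1
        missing_combo[missing] += 1
    return by_count, missing_combo
```

Running the same tally on the revision at creation and on the latest revision gives two distributions that can be compared directly, which addresses the third question.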

Event Timeline

Summary

The analysis explores the quality of Wikipedia articles that were created using the Content Translation (CX) tool and have not been deleted, focusing on how these articles align with the six standard quality criteria: length (minimum 8 kB), number of categories, number of sections, presence of images, number of references, and intra-wiki links. As of March 2025, about 20% of all translated articles are of standard quality. That 20% consists of 13% that met the standard at the time of creation and another 7% that were improved after creation. About 78% of the translated articles do not meet the standard quality. Interestingly, about 1% of articles met the standard quality initially but later did not, possibly due to content removal.

Among the small percentage of articles that saw a drop in quality, a reduction in length, a decrease in wikilinks, and the removal of references were the most common causes. The most common reasons for improvement in article quality are growth in length, additional sections, and more wikilinks. In nearly 60% of the articles that were improved after creation, the addition of media (at least one image) contributed to meeting the standard quality.
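
The grouping into articles that met the standard from creation, improved later, or declined can be sketched as a comparison of two flags per article. This is an illustration only: the labels and function names are hypothetical, and it assumes the standard-quality check has already been evaluated on both the creation revision and the latest revision.

```python
from collections import Counter


def quality_trajectory(met_at_creation: bool, meets_now: bool) -> str:
    """Label an article by comparing its status at creation and at the latest revision."""
    if met_at_creation and meets_now:
        return "standard since creation"
    if meets_now:
        return "improved after creation"
    if met_at_creation:
        return "declined after creation"
    return "never standard"


def trajectory_shares(pairs):
    """Return the share of articles per label.

    pairs: iterable of (met_at_creation, meets_now) booleans, one pair per article.
    """
    pairs = list(pairs)
    counts = Counter(quality_trajectory(c, n) for c, n in pairs)
    total = len(pairs)
    return {label: count / total for label, count in counts.items()}
```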

[Figure: newplot.png]

Among articles that do not meet the standard quality, about 23% meet four of the six criteria, 40% meet three, and 30% meet two. This suggests that many articles are within reach of meeting the standard quality with a few targeted improvements. For instance, nearly 80,000 articles fail to qualify solely because they lack an image. Similarly, expanding articles that already meet four of the six criteria by just 100 to 2,300 bytes could bring roughly 109,000 articles up to the standard quality. For articles that meet three criteria, expanding them by 3,000 to 4,500 bytes, especially when combined with added references or structural improvements (such as dividing the text into sections), could bring 170,000 more up to the standard quality.

Among all criteria, page length was the most frequently missing and the most varied. Articles that meet only one of the six criteria tend to be short, often under 2,000 bytes, while those that meet more of the criteria trend toward the 8,000-byte threshold. Many of the articles that meet four of the six criteria are already close to that threshold, at between 5,700 and 8,000 bytes, so expanding them by a few hundred to a couple of thousand bytes would enable them to meet the standard quality. The same goes for articles that meet three criteria: most are between 1,700 and 5,000 bytes, and plenty are already over 3,000 bytes.
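
The arithmetic behind these ranges is the gap between an article's current length and the 8,000-byte threshold. The following is a small sketch of that calculation; the function names are hypothetical, and it considers length in isolation, ignoring the other five criteria.

```python
# 8,000-byte length threshold, as cited in the paragraph above.
STANDARD_BYTES = 8000


def bytes_needed(page_bytes: int) -> int:
    """Bytes an article must grow by to reach the length threshold (0 if already there)."""
    return max(0, STANDARD_BYTES - page_bytes)


def within_reach(lengths, max_expansion: int) -> int:
    """Count articles below the threshold whose shortfall is at most max_expansion bytes."""
    return sum(1 for n in lengths if 0 < bytes_needed(n) <= max_expansion)
```

For example, an article of 5,700 bytes has a shortfall of 2,300 bytes, which matches the upper end of the expansion range quoted for articles that meet four criteria.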


PWaigi-WMF moved this task from Backlog to Product Signoff on the LPL Hypothesis board.