
Text complexity scoring
Closed, Declined · Public

Description

Design an AI that flags complicated articles or complicated sections of articles -- perhaps using reading grade level or some such.

Wiki thing it helps with:

  • Make our information available to our readers, who often don't understand what they're reading
  • Helps editors identify articles that either could use clearer intro/summary sections or should perhaps be split into two articles (one simplified, one with all the complex science/math/etc.)

Things that might help us get this AI built:

  • Readability measures (LIX, Flesch–Kincaid, etc.)
  • Lists of specialized terms not used by the general public

Event Timeline

So I'm thinking that we could have a scorer that, given a rev_id, can return a complex JSON document of scores using a library like textstat.

Here's what I'd expect to get back from a query for a recent revision of the article :en:Waffle, e.g. https://ores.wikimedia.org/v2/scores/enwiki/fleschkincade/759009699:

{
  "grade": 8.1,
  "words": 3073,
  "sections": [
    {"level": 0, "words": 203, "grade": 6.8},
    {"level": 1, "words": 231, "grade": 9.5},
    {"level": 1, "words": 150, "grade": 9.0},
    {"level": 2, "words": 350, "grade": 5.2},
    {"level": 2, "words": 275, "grade": 7.3},
    {"level": 2, "words": 150, "grade": 8.1},
    {"level": 2, "words": 83, "grade": 9.5},
    {"level": 1, "words": 102, "grade": 9.5},
    {"level": 2, "words": 299, "grade": 9.5},
    {"level": 2, "words": 324, "grade": 9.5},
    {"level": 1, "words": 97, "grade": 9.5},
    ...
  ]
}

This format uses an array for "sections", with the assumption that the first item is section 0 (the lead) and the rest proceed in document order.

This would be pretty easy to put together, I think.
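
To make the format above concrete, here's a minimal sketch (not the eventual ORES scorer) of how such a document could be produced with textstat. It assumes requests and mwparserfromhell are available, and the section handling is deliberately simplified.

import requests
import mwparserfromhell
import textstat

API = "https://en.wikipedia.org/w/api.php"

def fetch_wikitext(rev_id):
    """Fetch the wikitext of a single revision via the MediaWiki API."""
    params = {
        "action": "parse",
        "oldid": rev_id,
        "prop": "wikitext",
        "format": "json",
        "formatversion": 2,
    }
    response = requests.get(API, params=params)
    response.raise_for_status()
    return response.json()["parse"]["wikitext"]

def score_revision(rev_id):
    """Return a document shaped like the example JSON above."""
    wikicode = mwparserfromhell.parse(fetch_wikitext(rev_id))
    full_text = wikicode.strip_code()
    sections = []
    for section in wikicode.get_sections(include_lead=True, flat=True):
        headings = section.filter_headings()
        # The lead has no heading; a "== Heading ==" has level 2, which we map to 1.
        level = headings[0].level - 1 if headings else 0
        text = section.strip_code()
        if not text.strip():
            continue
        sections.append({
            "level": level,
            "words": textstat.lexicon_count(text),
            "grade": round(textstat.flesch_kincaid_grade(text), 1),
        })
    return {
        "grade": round(textstat.flesch_kincaid_grade(full_text), 1),
        "words": textstat.lexicon_count(full_text),
        "sections": sections,
    }

print(score_revision(759009699))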

Yeah. I think so. There's a well-defined library for doing it. We might set our thresholds differently for "Easy" on the #revision-scoring-as-a-service team. If you'd like to pick it up, I'd be really happy to talk to you about it. FWIW, this would be immensely easier than implementing a stochastic prediction model.

The textstat package looks like a good idea for English. For other languages it might be a bit more difficult to use. Maybe overall word frequency within a language (its Wikipedia version) could be used to determine how complex the terms in an article are (what percentage of the article is top-1,000 words, how much is top-10,000, how much is top-100,000, and how much falls outside of that).
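
As an illustration of that frequency-bucket idea (the ranked word-list file and its format are assumptions here, not an existing dataset), something along these lines could produce those percentages:

import re
from collections import Counter

def load_ranks(path):
    """Map word -> rank from a file with one word per line, most frequent first."""
    with open(path, encoding="utf-8") as f:
        return {line.strip().lower(): rank for rank, line in enumerate(f, start=1)}

def frequency_profile(text, ranks, buckets=(1000, 10000, 100000)):
    """Percentage of tokens in each frequency band, plus everything outside them."""
    tokens = [t.lower() for t in re.findall(r"\w+", text, re.UNICODE)]
    counts = Counter()
    for token in tokens:
        rank = ranks.get(token)
        for bucket in buckets:
            if rank is not None and rank <= bucket:
                counts["top_%d" % bucket] += 1
                break
        else:
            counts["outside"] += 1
    total = len(tokens) or 1
    return {band: round(100 * n / total, 1) for band, n in counts.items()}

# e.g. frequency_profile(article_text, load_ranks("dewiki_word_ranks.txt"))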

I worked with @Tdcan at the Wikimedia Hackathon to build https://github.com/wiki-ai/flesch_complexity

We'll probably want to extend this to include other text readability scores before deploying it in production, but for now, you can test out the model at https://ores.wmflabs.org/v2/scores/enwiki/flesch

We should add more scores before we call this done.
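
For instance, an extended scorer could bundle several of textstat's other indexes alongside Flesch. This is just a sketch of the shape; which indexes actually belong in the deployed model is still open:

import textstat

def readability_scores(text):
    """Collect several readability indexes for one revision's text."""
    return {
        "flesch_reading_ease": textstat.flesch_reading_ease(text),
        "flesch_kincaid_grade": textstat.flesch_kincaid_grade(text),
        "smog_index": textstat.smog_index(text),
        "coleman_liau_index": textstat.coleman_liau_index(text),
        "automated_readability_index": textstat.automated_readability_index(text),
        "dale_chall": textstat.dale_chall_readability_score(text),
        "gunning_fog": textstat.gunning_fog(text),
    }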

Hello @Halfak. I am a new contributor to Wikimedia and would love to help out with this issue. What are the restrictions on the model to be used to detect complexity, and would you prefer a regression-based model or a classifier?

Thanks,
Chaitanya

Hello @Chtnnh! We've had a group of newcomers take this task on once before. We were able to get their work deployed for a period of time, but I think we need to do some more work to make text complexity scoring useful on its own.

But I think we can do a lot with adding text complexity to our article quality models. I created a task for you with a bunch of detail about how I'd approach the problem. See T246438: Add text complexity scoring to article quality models. Let me know if that would interest you.

@Halfak What happened to the solutions deployed before? Were they ineffective in solving the issue or were there other constraints?

The AI proposed in the original description could be replaced by a script that parses the JSON output of other NLP models we deploy, thus abstracting the process of flagging a complicated article away from the NLP models themselves.

The benefits of this are twofold. First, it makes the NLP models easier to write and replace. Second, it separates the two tasks and clearly demarcates responsibilities for new contributors.

What do you think?

As far as T246438 is concerned, I would love to help out with that and see it through to deployment.
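
As a rough illustration of the flagging-script idea above: a thin client would only consume the JSON a complexity scorer emits and apply a threshold. The endpoint and response shape here just mirror the earlier :en:Waffle example; neither is a deployed API.

import requests

ENDPOINT = "https://ores.wikimedia.org/v2/scores/enwiki/fleschkincade"  # hypothetical

def flag_complex(rev_id, threshold=12.0):
    """Flag a revision if the whole article or any section exceeds the grade threshold."""
    doc = requests.get("%s/%d" % (ENDPOINT, rev_id)).json()
    hard_sections = [s for s in doc.get("sections", []) if s["grade"] > threshold]
    return {
        "rev_id": rev_id,
        "flagged": doc["grade"] > threshold or bool(hard_sections),
        "hard_sections": hard_sections,
    }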

The biggest limitation of the solution deployed previously is that it didn't have a clear use-case. E.g. let's say that an article scores an 8.3 Flesch reading ease. What does that mean for the article? Is it too high? Too low? Is it merely an artifact of the article's general topic space? The nice thing about incorporating these signal sources into our article quality models is that the model can work out these details for us and give us actionable feedback that could direct work.
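
In the simplest case, "incorporating these signal sources into our article quality models" means the readability numbers become extra columns in the quality model's feature matrix, so the model can learn how complexity interacts with topic and quality. The sketch below is illustrative, not the production revscoring configuration:

import textstat
from sklearn.ensemble import GradientBoostingClassifier

def quality_features(text, base_features):
    """Extend an existing quality feature vector with text-complexity signals."""
    return list(base_features) + [
        textstat.flesch_reading_ease(text),
        textstat.flesch_kincaid_grade(text),
        textstat.lexicon_count(text),
    ]

# X = [quality_features(text, base) for text, base in training_examples]
# model = GradientBoostingClassifier().fit(X, quality_labels)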

Right, so what we are doing in T246438 is essentially incorporating complexity into article quality, leaving this task redundant.

@Halfak Should we mark this task as duplicate?

I think we should decline this task, since it doesn't look like we want to deploy this scorer on its own. But we would like to do something different with T246438.