EditCheck: Investigate a heuristic to detect and exclude quoted text
Closed, ResolvedPublic

Description

Editors shouldn't usually alter the contents of quotations. So for many or most Edit Checks, it would be desirable for them to ignore quoted text.

However, quoted text can be difficult to detect. Quotations can be represented in different ways: wrapped in a template (e.g. {{Blockquote ...}} on enwiki or {{Zitat ...}} on dewiki), wrapped in <blockquote>...</blockquote>, or enclosed in plain quotation marks (e.g. “...” or "..." in English, 「...」 in Traditional Chinese, etc.).

It is infeasible to detect all quoted text reliably, because of the ambiguity of quotation marks, but we may be able to minimize false positives at the cost of having more false negatives.

Event Timeline

Change #1227867 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] BaseEditCheck: add ignoreQuotedContent config

https://gerrit.wikimedia.org/r/1227867

^ patch is an attempted approach to this:

  1. anything in a blockquote is a quote
  2. any point inside a block node with an odd number of quotation marks preceding it is considered to be quoted

There's an attempt to match up types of quotation marks, so you can nest quotations with different styles and it won't get confused. It also tries to cope with distinguishing between single-quotes and apostrophes.

This is going to completely ignore templates that're used for wrapping quotes. However, so long as the content that's "quoted" is a template parameter, those weren't going to trigger most checks anyway. If there are templates that're being used as {{open_quote}} foo {{close_quote}} then the text between them won't count as quoted.
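To make the second rule concrete, here is a minimal sketch of the "unclosed quotation mark before this point" idea, operating on the plain text of a single block. It is not the code from the patch: the names QUOTE_PAIRS, isApostrophe and isOffsetQuoted are hypothetical, the list of mark pairs is an assumption, and rule 1 (blockquotes) is not covered. Tracking the expected closing mark on a stack is one way to let different quote styles nest.

```javascript
// Hypothetical opening -> closing mark pairs; an assumption, not the patch's list.
const QUOTE_PAIRS = {
  '"': '"',
  '“': '”',
  '「': '」',
  "'": "'",
  '‘': '’'
};

// Assumption: a single quote with a letter on both sides is an apostrophe
// (e.g. "don't"), so it neither opens nor closes a quotation.
function isApostrophe(text, i) {
  const isLetter = (ch) => ch !== undefined && /\p{L}/u.test(ch);
  return isLetter(text[i - 1]) && isLetter(text[i + 1]);
}

// Returns true if some quotation opened before `offset` is still unclosed there.
function isOffsetQuoted(text, offset) {
  const stack = []; // closing marks we are still waiting for, innermost last
  for (let i = 0; i < offset && i < text.length; i++) {
    const ch = text[i];
    if ((ch === "'" || ch === '’') && isApostrophe(text, i)) {
      continue;
    }
    if (stack.length > 0 && stack[stack.length - 1] === ch) {
      stack.pop(); // closes the innermost open quotation
    } else if (Object.prototype.hasOwnProperty.call(QUOTE_PAIRS, ch)) {
      stack.push(QUOTE_PAIRS[ch]); // opens a new (possibly nested) quotation
    }
  }
  return stack.length > 0;
}
```

A sketch like this still has the ambiguity the description warns about. For example, a trailing possessive apostrophe ("the dogs' bones") is read as an opening quote, which marks the rest of the block as quoted.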

Change #1227867 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] BaseEditCheck: add ignoreQuotedContent config

https://gerrit.wikimedia.org/r/1227867

Change #1229619 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck should call isRangeValid not isRangeInValidSection

https://gerrit.wikimedia.org/r/1229619

DLynch removed a project: Editing QA.

QA note for after that other patch merges: this adds a new ignoreQuotedContent config which, if you set it, stops checks from matching ranges that're inside quotes. The easiest way to test this is probably to make a textmatch rule that includes ignoreQuotedContent:true and that'll find some word ("FOO"), and then verify that it matches on just FOO, but not on "FOO".
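The expected outcome of that QA test can be written down as a small standalone simulation. This is only an illustration of the behaviour described in the note, not the VisualEditor API: findMatches and insideStraightQuotes are hypothetical helpers, only straight double quotes are considered, and the only name taken from this task is the ignoreQuotedContent flag.

```javascript
// Hypothetical helper: an odd number of straight double quotes before
// `offset` is taken to mean the offset is inside a quotation.
function insideStraightQuotes(text, offset) {
  let count = 0;
  for (let i = 0; i < offset; i++) {
    if (text[i] === '"') {
      count++;
    }
  }
  return count % 2 === 1;
}

// Hypothetical stand-in for a text-match rule: returns the start offsets of
// `term`, skipping quoted occurrences when config.ignoreQuotedContent is set.
function findMatches(text, term, config = {}) {
  const starts = [];
  if (term.length === 0) {
    return starts; // avoid looping forever on an empty search term
  }
  let from = 0;
  while (true) {
    const at = text.indexOf(term, from);
    if (at === -1) {
      break;
    }
    if (!(config.ignoreQuotedContent && insideStraightQuotes(text, at))) {
      starts.push(at);
    }
    from = at + term.length;
  }
  return starts;
}

const sample = 'FOO and "FOO"';
console.log(findMatches(sample, 'FOO', { ignoreQuotedContent: true })); // [ 0 ]
console.log(findMatches(sample, 'FOO')); // [ 0, 9 ]
```

With the flag set, only the bare FOO is reported. Without it, both occurrences are.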

Change #1229619 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] TextMatchEditCheck should call isRangeValid not isRangeInValidSection

https://gerrit.wikimedia.org/r/1229619

I am still seeing the textmatch check match ranges that're inside quotes, even after setting ignoreQuotedContent:true here: https://en.wikipedia.beta.wmcloud.org/wiki/MediaWiki:Editcheck-config.json

Screenshot 2026-01-26 at 4.38.23 PM.png (474×2 px, 553 KB)

Change #1233304 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/VisualEditor@master] Edit check: fix an inversion of the ignoreQuotedContent config logic

https://gerrit.wikimedia.org/r/1233304

Annoyingly layered problem. There was an error in the config check, masked by an error in the test logic. The patch should fix it.

Change #1233304 merged by jenkins-bot:

[mediawiki/extensions/VisualEditor@master] Edit check: fix an inversion of the ignoreQuotedContent config logic

https://gerrit.wikimedia.org/r/1233304

Ryasmeen edited projects, added Verified; removed Editing QA.