Auto copyeditor
Open, Stalled, Low, Public, Feature

Description

I would love to see a Transformer such as GPT.x that has learned how to copyedit articles.

I am a member of WP's Guild of Copy Editors. We copyedit thousands of articles per year, mostly making routine improvements to the corpus. WP has many bots that algorithmically modify articles, but transformers have advanced text processing to a far more sophisticated level and seem like a better approach.

The basic idea would be to train on before-and-after revision pairs in which the later revision's edit summary is labeled "copy edit" or some variant. The transformer learns the art and can then be set loose, initially on articles tagged for copyediting. I am sure that Guild members (including me) would love to provide monitoring and feedback along the way to ensure a quality outcome.
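
As a rough illustration of how such training pairs might be collected, here is a minimal sketch. It assumes an article's revision history is already available in memory as an ordered list. The `Revision` class, the `copyedit_pairs` function, and the summary pattern are all hypothetical names chosen for this example, not part of any existing MediaWiki tool, and the pattern would certainly need tuning against real edit summaries.

```python
import re
from dataclasses import dataclass

# Assumed pattern for "copy edit" and a few common variants
# (copyedit, copy-edit, copyediting, c/e). Purely illustrative.
COPYEDIT_RE = re.compile(
    r"\b(?:copy[\s-]?edit(?:s|ed|ing)?|c/e)\b", re.IGNORECASE
)


@dataclass
class Revision:
    """Hypothetical in-memory stand-in for one article revision."""
    text: str      # full article text at this revision
    summary: str   # edit summary entered by the editor


def copyedit_pairs(history):
    """Return (before, after) text pairs for copyedit-labeled revisions.

    `history` is a list of Revision objects ordered oldest to newest.
    A pair is kept only when the later revision's summary matches the
    pattern and the text actually changed.
    """
    pairs = []
    for prev, curr in zip(history, history[1:]):
        if COPYEDIT_RE.search(curr.summary) and prev.text != curr.text:
            pairs.append((prev.text, curr.text))
    return pairs
```

Matching on the edit summary alone is a weak label: some copyedits carry no summary, and some edits labeled "copy edit" also add or remove content, so any real dataset would likely need further filtering and human spot checks.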

This would allow copyeditors to focus their efforts on tasks that the transformer leaves undone, improving the corpus and giving copyeditors a better quality of life.

Event Timeline

Aklapper changed the task status from Open to Stalled. Sep 5 2022, 6:28 PM

Hi @Lfstevens, thanks for taking the time to report this. What/who is a "transformer" or "GPT.x"? I assume "WP" means "English Wikipedia"? Could you please use the feature request form, and fill in the sections in the template, and provide more context? Thanks a lot:

Feature summary (what you would like to be able to do and where):

Use case(s) (list the steps that you performed to discover that problem, and describe the actual underlying problem which you want to solve. Do not describe only a solution):

Benefits (why should this be implemented?):

Feature summary: I would like to have an automated copyeditor that can handle the bulk of the hundreds of copyedit requests that the Guild of Copy Editors receives each month.

Use case: Many of the articles for which we receive copyedit requests consist of only a few paragraphs, often without subsections. The necessary edits are often trivial, but they still take editors' time and attention to address. This offers low-hanging fruit for copyediting automation. Once the automated edit is complete, a human editor can review it before the edit is finalized.

Benefits: Saves the time and attention of overworked copyeditors, with possible additional benefits in edit consistency and compliance with WP:MOS, etc.

A Transformer is a type of neural network that has achieved spectacular results in text processing tasks ranging from language translation to question answering to conversing with humans. GPT.x refers to a series of such models, built by OpenAI, that have set benchmarks in these areas, even generating working software from plain language requests. Transformers' ability to handle languages other than English should offer a much-appreciated boost to non-English language wikis.

Thanks for the quick response.

What is the "Guild of Copy Editors"? Where to see examples for requests? Where to see a statement about "spectacular results" and which languages apart from English does that refer to? Please provide more context via links and references for statements - thanks a lot. :)