
Compile list of templates, jargon and policies relevant to Copyvio
Closed, Resolved, Public

Description

This task involves the work of compiling the list of language/phrases/shorthand/codes/etc. volunteers at the wikis listed below use to signal that a given edit is being reverted on the grounds of a copyright violation and/or related policies.

Knowing the above will enable @MNeisler – as part of T376064 – to approximate the current rate (read: baseline) at which new content edits are reverted on the grounds of suspected/potential copyright violations.
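
As a rough illustration of the baseline described above, the rate could be estimated as the share of new content edits that were reverted with a copyright-related summary. This is only a sketch: the function name, the `is_copyvio` predicate, and the sample figures are assumptions made for illustration and are not taken from T376064.

```python
from typing import Callable, Iterable


def copyvio_revert_rate(
    revert_summaries: Iterable[str],
    total_new_content_edits: int,
    is_copyvio: Callable[[str], bool],
) -> float:
    """Share of new content edits reverted with a copyvio-looking summary."""
    if total_new_content_edits <= 0:
        raise ValueError("total_new_content_edits must be positive")
    # Count only the reverts whose summary the predicate flags.
    flagged = sum(1 for summary in revert_summaries if is_copyvio(summary))
    return flagged / total_new_content_edits


# Hypothetical sample data and a deliberately naive stand-in predicate.
sample_summaries = ["rv copyvio", "rvv", "Copyvio from site"]
rate = copyvio_revert_rate(
    sample_summaries,
    total_new_content_edits=100,
    is_copyvio=lambda s: "copyvio" in s.casefold(),
)
```

Passing the predicate in as a parameter keeps the rate calculation separate from the per-wiki term lists collected in this task.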

Deadline to participate: September 9. Feel free to edit this task to add the relevant information that concerns your wiki.

Requirements

For each of the wikis listed below, document all of the language/phrases/shorthand/codes/abbreviations/etc. volunteers use in edit summaries that are relevant to copyright violations.

Wikis

Pilot wikis
Wiki      Jargon¹ used in edit summaries to tag a copyright violation (comma separated)
fr.wiki   copyvio, violation du droit d'auteur, droit d'auteur, cv, plag
es.wiki   copyvio, plagio, violación copyright, copyright, posible plagio, posible copyvio, sin wikificar/copyvio, violación derechos de autor, viola copyright
ar.wiki   انتهاك الحقوق، خرق حقوق التأليف والنشر، خرق حقوق الطبع
en.wiki   copyvio, copy-vio, copy vio, copyright violation, close para, close paraphrase, remove copying, cv-revdel, CLOP

¹ Any language/phrases/shorthand/codes/abbreviations/etc. editors use in edit summaries to identify that they removed a copyright violation.
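
One possible way to apply a term list like the one above is a per-wiki pattern that is tested against each edit summary. The sketch below uses only a subset of the pilot-wiki terms. The dictionary layout, the function names, and the choice to require word boundaries only for ASCII terms are assumptions for illustration, not how the analysis is actually implemented.

```python
import re

# Subset of the pilot-wiki table; the structure is illustrative only.
TERMS = {
    "fr.wiki": ["copyvio", "violation du droit d'auteur", "droit d'auteur", "cv", "plag"],
    "es.wiki": ["copyvio", "plagio", "violación copyright", "copyright"],
    "ar.wiki": ["انتهاك الحقوق", "خرق حقوق التأليف والنشر", "خرق حقوق الطبع"],
    "en.wiki": ["copyvio", "copy-vio", "copy vio", "copyright violation",
                "close paraphrase", "cv-revdel", "CLOP"],
}


def build_pattern(terms):
    """Compile one case-insensitive alternation for a wiki's term list."""
    parts = []
    for term in terms:
        escaped = re.escape(term)
        if term.isascii():
            # Assumption: ASCII terms must not sit inside a longer word,
            # so "cv" does not match inside an unrelated token.
            escaped = rf"(?<!\w){escaped}(?!\w)"
        # Non-ASCII terms fall back to plain substring matching.
        parts.append(escaped)
    return re.compile("|".join(parts), re.IGNORECASE)


PATTERNS = {wiki: build_pattern(terms) for wiki, terms in TERMS.items()}


def is_copyvio_summary(wiki, summary):
    """True if the summary contains any listed term for that wiki."""
    pattern = PATTERNS.get(wiki)
    return bool(pattern and pattern.search(summary))
```

Short terms such as "cv" still match whenever they appear as a standalone word, so a matcher like this will produce some false positives.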

Other prioritized Wikipedias

These wikis have been prioritized as they will be part of an upcoming experiment.

Wiki      Jargon used in edit summaries to tag a copyright violation (comma separated)
fa.wiki
ja.wiki
pt.wiki   VDA, Violação de direitos autorais, G3, copyvio
de.wiki   Urheberrechtsverletzung, URV
it.wiki   violazione di copyright, copyviol
zh.wiki   copyvio, 侵權, 侵权, 侵犯版权, 侵犯版權, 侵犯著作權, 侵犯著作权, 抄襲, 抄袭, cv rv, cvrv
ko.wiki
id.wiki   copyvio, teks berisi pelanggaran hak cipta, [[WP:PHC|pelanggaran hak cipta]]
uk.wiki
pl.wiki   [[WP:NPA|NPA]], npa, [[WP:NPA]], copyvio
nl.wiki   copyvio, auteursrechten(schending)
he.wiki
cs.wiki   copyvio, copy vio, porušení autorských práv, porušení aut. práv, okopírované, zkopírované, okopírováno, zkopírováno, doslovná kopie, cv
vi.wiki   vi phạm bản quyền, VPBQ, copyvio
tr.wiki

Other Wikipedias

Other Wikipedias will be part of future experiments.

Wiki      Jargon used in edit summaries to tag a copyright violation (comma separated)

Event Timeline

I'm assigning this over to Benoît to coordinate this work.

You appear to have carried them over from the parent task without reviewing them as you should have. As one of those (unnecessary) subscribers, I was notified when you created the task when I otherwise wouldn't have been.
VisualEditor does not apply to this task, and it does not appear to be a Goal.

Got it. Thank you for sharing this context. I think everything is sorted now :)

Can you please say more about what you mean here, @Trizek-WMF? Asked another way: what aspect(s) of T389445 are you thinking we could consider "reusing" in this context?

en.wiki
Documenting a couple of patterns @Sdkb shared offline about en.wiki...

  • Rather than being reverted, many revisions that introduce perceived copyright violations are deleted and reference the RD1 redaction reason: "RD1. Blatant violations of the copyright policy. Best practices for copyrighted text removal can be found at WP:Copyright problems and should take precedence over this criterion. Usernames should not be hidden under RD1."
  • In many instances, copyvio reverts don't have specific language in the edit summary; they're just reverted with the default summary and then {{copyvio-revdel}} is applied. Example.

Added to Tech News:

In order to identify how many edits are reverted due to copyright issues, the Editing team want to compile a list of templates, jargon, and policies used in edit summaries when a copyright violation is removed. We invite community members to list these terms in T402601, or to share their list with Trizek_(WMF). We will collect terms from the following wikis: Persian Wikipedia, French Wikipedia, Spanish Wikipedia, Arabic Wikipedia, English Wikipedia, Japanese Wikipedia, Portuguese Wikipedia, German Wikipedia, Italian Wikipedia, Chinese Wikipedia, Korean Wikipedia, Indonesian Wikipedia, Ukrainian Wikipedia, Polish Wikipedia, Dutch Wikipedia, Hebrew Wikipedia, Czech Wikipedia, Vietnamese Wikipedia, Turkish Wikipedia. This participative project is open until September 9.

I suspect a number of edits reverted as "copied", "plagiar*", as well as deletions under the enwiki criterion "G11", "spam", "advertising", are also copyvios.

On en.wiki, typical edit summaries include "copyvio", "copy-vio", "copy vio", "copyright violation", and "cv". Note that "cv" is ambiguous, as it could also refer to "curriculum vitae", but "cv" together with a URL is probably a hit. All of these can be pluralised (+s). "G12" is also used.
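
The heuristic in the comment above (optional plural forms, and "cv" counted only when a URL is also present) could be sketched as follows. The pattern names and the function name are hypothetical, and the URL check is a simplification that only recognises `http://` and `https://` links.

```python
import re

# Unambiguous terms, each with an optional plural "s", plus G12.
STRONG = re.compile(
    r"\b(?:copy[- ]?vios?|copyright violations?|G12)\b",
    re.IGNORECASE,
)
# "cv" alone is ambiguous (it may mean curriculum vitae).
CV = re.compile(r"\bcvs?\b", re.IGNORECASE)
# Simplified URL detector; an assumption, not an exhaustive one.
URL = re.compile(r"https?://\S+", re.IGNORECASE)


def looks_like_copyvio(summary):
    """Flag strong terms directly; flag "cv" only alongside a URL."""
    if STRONG.search(summary):
        return True
    return bool(CV.search(summary) and URL.search(summary))
```

For instance, `looks_like_copyvio("cv https://example.com/page")` is flagged, while `looks_like_copyvio("added CV of the author")` is not.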

On en.wiki, I use the following jargon in my edit summaries when removing copyvio:

copyvio
close para
close paraphrase
remove copying

I also use combined variations of the phrases above, but these are the main ones.

With the revdel script:
Requesting copyvio revdel (cv-revdel)

Question: Should we also list terms and jargon that don't directly refer to copyright, but their usage is correlated strongly with copyright issues? Plagiarism is a different concept from copyvio but things that are plagiarized are frequently copyvios too, for instance.

Pl.wiki: [[WP:NPA|NPA]] (automatic), npa, [[WP:NPA]], copyvio.

I think "copyvio" might be used sometimes for images, but rarely. We have a tool for adding common phrases to summary that adds NPA with a link, so that would be the most popular.
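
Because the pl.wiki summaries above mix wikilink markup (`[[WP:NPA|NPA]]`, `[[WP:NPA]]`) with the bare abbreviation, a matcher may want to look at link targets as well as plain text. The sketch below is one way to do that. The function names are hypothetical, and the wikilink pattern is a simplification that ignores nested or malformed links.

```python
import re

# Simplified wikilink pattern: [[target]] or [[target|label]].
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")
# Bare "npa" as a standalone word, in any letter case.
BARE_NPA = re.compile(r"(?<!\w)npa(?!\w)", re.IGNORECASE)


def link_targets(summary):
    """Return the targets of all wikilinks found in an edit summary."""
    return [match.group(1).strip() for match in WIKILINK.finditer(summary)]


def mentions_npa(summary):
    """True if the summary links to WP:NPA or contains a bare "npa"."""
    if any(target.upper() == "WP:NPA" for target in link_targets(summary)):
        return True
    return BARE_NPA.search(summary) is not None
```

This check is meant for pl.wiki only, since the same abbreviation can mean something unrelated on other wikis.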

On en.wp "G12" and "F9" are the speedy deletion reasons for copyright violations of text and files respectively (the "G11" noted above is a typo for G12, G11 is related to advertising and promotion, not copyright). Do note that these can appear in other edit summaries that are not removal of content (e.g. a deletion discussion may be closed with an edit summary like "close as speedy delete G12" with the nominated content then being deleted under that criterion).

On enwiki, I use "rmv copyvio", "rv copyvio", "rmv close paraphrasing", "rv close paraphrasing". When requesting revdel, "Requesting copyvio revdel (cv-revdel — Red-tailed hawk's version)"

  • Above it is suggested that spam/advertising are quite likely to be copyvios. If the spammer/advertiser is an agent of the copyright holder, they have arguably relaxed the copyright by publishing it with a CC license. Perhaps more to the point the reason for the revert is not copyvio.
  • I'm not sure that we should distinguish between deletions of revisions and reverts, since an effective revert is still required - except where a page is deleted (in which case it should probably be blanked anyway before deletion, as a belt-and-braces measure).

On he.wiki, common edit summaries are הז"י and הפרת זכויות יוצרים. Less common, and somewhat ambiguous but still often suggestive of copyright infringement, are העתקה and הועתק.

he.wiki relevant templates:
files:
תבנית:אישור OTRS = No permission on Commons
תבנית:תמונה מוגנת = Copyvio on Commons
תבנית:תמונה חופשית = Not fair use
תבנית:תמונה בשימוש הוגן = Not free file, should be fair use
תבנית:תמונה בשימוש הוגן ללא מקור = fair use file without source
תבנית:כללי מדיניות ישנים לקבצים = Grandfathered old file
תבנית:אין להעביר לוויקישיתוף = don't move to Commons
ויקיפדיה:תמונות/אולם דיונים = files notice board
ויקיפדיה:ויקישיתוף/קבצים לאישור בקר-רישיון בוויקישיתוף = open license review for Hebrew requests

articles:
תבנית:הפרת זכויות יוצרים - for copyvios

also:
bot checking copyvios - ויקיפדיה:בוט/בדיקת הפרת זכויות יוצרים

handling copyright:
ויקיפדיה:זכויות יוצרים/שאלות ותשובות - for questions
עזרה:תמונות/ניטור - help page for file patrollers

On enwiki, I use CLOP for close paraphrasing, and the other ones mentioned here.

A quick note to say: THANK YOU! Thank you all for contributing to and partnering with us in this work.

It's been a delight to see the stream of log entries in this task that read "USERNAME updated the task description" ❤️

@Thryduulf I listed G11 because pure spam is often done by copy-pasting, so even if it isn't flagged as G12 or copyvio it often is one.

But G11 is also used for a lot of things that are not copyright violations so it is not a reliable indicator of copyright status. This task (as I understand it) is seeking to identify jargon that unambiguously identifies content removed for being a copyright violation.

Hi @Trizek-WMF, does this list also include the logs (e.g. the reasons given for pages deleted due to copyright infringement and/or RevDel), or is it based only on the reasons given for reverted edits? Thanks!

A bit more (he.wiki):
מדיה ויקי:Filedelete-reason-dropdown - reasons for deletion for admins
ויקיפדיה:מדיניות המחיקה --> Deletion policy
ויקיפדיה:מדיניות המחיקה#סיבות למחיקה מהירה --> reasons for speedy deletion

CLOP, CV, Copyvio, copypaste, cutpaste move (reverting cut and paste moves without attribution), G12, close paraphrasing, copyright violations.

This is quite niche, but English Wikipedia will use PDEL, or presumptive deletion/removal, in certain cases: where we have confirmed previous instances of copyright violations from a user and have judged the case not worth the editor time to review carefully, due to either volume or difficulty accessing sources.

We may also refer to the excessive use of quotations (i.e. a whole section made up of quotes) as overquoting. Some people use "blanking" to refer to sending an article to Copyright problems.

Japanese Wikipedia

  • Uses the exact wording from MediaWiki:Deletereason-dropdown for copyright-related deletion reasons:
    • 著作権侵害のおそれ (“Possible copyright infringement”) → e.g. 著作権侵害のおそれ: Wikipedia:削除依頼/Article name
    • GFDLまたはCC-BY-SA違反 (“GFDL or CC-BY-SA violation”)
    • [[WP:CSD#全般9]] 明白な著作権侵害 (“Obvious copyright infringement”)
  • Unlike other wikis (e.g. “cv”, “copyvio”), no abbreviations are used.
  • Related policies: WP:COPY, WP:DP#B-1, and WP:CSD#全般9

Note: I only noticed the Wikimedia-wide announcement (Tech News 2025/36) today, so I apologize for the late submission.

Sorry for the late reply. The list covers only edit summaries, written after an edit is made. We aren't exploring deletions, as we can't reuse them for comparisons.