Per T108422, it would be nice to have a suite of pages that could be used to consistently test Copyvio Detector's violation confidence algorithm. These could be set up as subpages under https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/.
The test suite should include:
- A page that consists almost entirely of plagiarism, like https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioExample
- A page that includes a single paragraph of plagiarism, with the rest unplagiarized
- A page that includes numerous plagiarized sentences mixed with unplagiarized text
- A page that is closely paraphrased from another source (a few words or phrases changed in each sentence)
- A couple of pages that are not plagiarized at all, like https://en.wikipedia.org/wiki/Mary_Wollstonecraft
For pages that actually contain plagiarism, make sure they are plagiarized from public domain sources that are not listed at https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Exclusions.
For pages based on existing Wikipedia articles, make sure that you attribute the articles to their original URL in the edit summary so that the CC license's attribution requirement isn't violated.
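As a rough sketch of how the suite above could be tracked, the test pages might be recorded in a small manifest alongside the confidence range each one is expected to produce. Everything here is an assumption for illustration: the subpage names, the `example.org` source URLs, the confidence bounds, and the idea that exclusions can be checked as plain hostnames are all hypothetical and not taken from Copyvio Detector itself.

```python
from dataclasses import dataclass
from typing import Iterable, Optional
from urllib.parse import urlsplit

# Base title suggested in the task; the subpage names below are hypothetical.
BASE_TITLE = "User:EarwigBot/Copyvios"


@dataclass(frozen=True)
class SuitePage:
    subpage: str               # hypothetical subpage name under BASE_TITLE
    description: str           # which case from the list this page covers
    source_url: Optional[str]  # public domain source copied from, if any
    min_confidence: float      # illustrative lower bound, not a real threshold
    max_confidence: float      # illustrative upper bound, not a real threshold


# Placeholder sources and bounds; real values would need to be chosen
# once the pages exist and the algorithm's behavior is known.
SUITE = [
    SuitePage("Tests/Full", "almost entirely plagiarized",
              "https://example.org/pd-text-1", 0.90, 1.00),
    SuitePage("Tests/Paragraph", "one plagiarized paragraph",
              "https://example.org/pd-text-2", 0.40, 0.90),
    SuitePage("Tests/Mixed", "plagiarized sentences mixed in",
              "https://example.org/pd-text-3", 0.40, 0.90),
    SuitePage("Tests/Paraphrase", "closely paraphrased",
              "https://example.org/pd-text-4", 0.30, 0.80),
    SuitePage("Tests/Clean", "not plagiarized at all", None, 0.00, 0.20),
]


def full_title(page: SuitePage) -> str:
    """Return the full wiki title for a suite page."""
    return f"{BASE_TITLE}/{page.subpage}"


def in_expected_band(page: SuitePage, confidence: float) -> bool:
    """Check whether a reported confidence falls in the page's expected range."""
    return page.min_confidence <= confidence <= page.max_confidence


def is_excluded(url: str, excluded_hosts: Iterable[str]) -> bool:
    """Check a source URL against a set of excluded hostnames.

    Assumes exclusions are plain hostnames; the real exclusions page
    may use a different format.
    """
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in excluded_hosts)
```

A runner could then fetch the confidence for each `full_title(page)` from the tool, call `in_expected_band` on the result, and use `is_excluded` to flag any fixture whose source would be ignored by the detector.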