If a file has been uploaded to Commons, it's free (otherwise it would have been deleted already); if a file has been uploaded to a Wikipedia, it's non-free (otherwise it would have been moved to Commons already).
Jan 21 2016
Jan 20 2016
This is... done, I think. I want to hack on the visual output further, but it works.
Dec 19 2015
Nov 11 2015
Okay, so Coren's been the point of contact in the past between me and the WMF with regard to managing the Yahoo! BOSS API keys that are necessary to use that service. As far as I know, he still has that role. I was suggesting that he could create a new key for Fhocutt for developing/testing this new feature (since sharing keys doesn't sound like a good idea, although we could do that too, I guess).
Nov 5 2015
@kaldari Still useful to test how the results look when combined with the regular BOSS hits, I guess?
Nov 4 2015
Sorry Coren, I didn't really mean to add you as a subscriber...!
Hmm... I guess you can ask @coren for a BOSS key for testing? Alternatively, disable part of EarwigBot: in earwigbot/wiki/copyvios/__init__.py, comment out line 116 and change line 133 to "if True:". That should make it just report "no match" for everything. I might add a more graceful fallback in the future.
Nov 3 2015
You probably didn't put it in the "wiki" section.
To be honest, I'm struggling with free time right now. Not sure of the best way for you to approach this.
Sep 27 2015
All done now.
Sounds fine. I'm not sure about putting the Turnitin results above the main result summary, but that's a nitpick.
Sep 26 2015
Oh, good point on that last one. I can definitely use posts from my own blog. Will try that.
Now at https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Tests. Did some cleanup and added a few new tests.
Sep 22 2015
This is very useful. Thanks!
Sep 21 2015
I will likely work on this on my own over the next couple of weeks. It'll be useful for other improvements that I plan to make to the comparison engine.
Sep 17 2015
I don't understand. What kind of work are you doing that requires so much memory?
Sep 11 2015
For https://en.wikipedia.org/w/index.php?title=Clinoch_of_Alt_Clut&diff=prev&oldid=680314774, I believe the correct parsing is "Clinoch of Alt Clut" rather than "Clut, Clinoch of Alt", per WP:PEER. This is strange to me because WP:AWB/GF indicates it should be doing this already. For the second page, I think "Byzantine Master of the Crucifix of Pisa" without any modification is correct.
Aug 25 2015
Yes, this is a good idea. I already use https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioExample and https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioPDFExample as basic sanity checks, but a more comprehensive suite would be much better.
Aug 22 2015
It is custom-written. You are right that the particular result there is poor; my first thought is to work on the confidence algorithm a bit to value large contiguous blocks more than lots of disjoint trigrams. For quotes, I'm not so sure; if that issue was fixed I think it might not be so important. I can look into that.
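A rough illustration of that idea, not the tool's actual confidence code: the sketch below compares a plain trigram-overlap ratio with a score that weights each contiguous matching run superlinearly. The function names, the `exponent` parameter, and the use of `difflib.SequenceMatcher` to find matching runs are all assumptions made for the example.

```python
from difflib import SequenceMatcher


def trigrams(words):
    """Return the set of word trigrams in a list of words."""
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def trigram_overlap(article, source):
    """Fraction of the article's trigrams that also appear in the source."""
    a = trigrams(article.split())
    s = trigrams(source.split())
    return len(a & s) / len(a) if a else 0.0


def block_weighted_score(article, source, exponent=2.0):
    """Score that rewards long contiguous matches over scattered ones.

    Each matching run of at least three words contributes
    size ** exponent, normalised so that an article matching the source
    as a single run scores 1.0. (Hypothetical scoring, for illustration.)
    """
    a_words, s_words = article.split(), source.split()
    if not a_words:
        return 0.0
    matcher = SequenceMatcher(None, a_words, s_words, autojunk=False)
    sizes = [b.size for b in matcher.get_matching_blocks() if b.size >= 3]
    return sum(size ** exponent for size in sizes) / len(a_words) ** exponent


source = "a b c d e f g h i j k l"
contiguous = "a b c d e f g h i j k l"
scattered = "a b c x d e f y g h i z j k l"

print(trigram_overlap(scattered, source), block_weighted_score(scattered, source))
print(trigram_overlap(contiguous, source), block_weighted_score(contiguous, source))
```

With an exponent above 1, four separate three-word matches count for much less than one twelve-word match, even though the same source words are involved in both cases.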
Aug 19 2015
Regarding l10n, the tool works fine for non-English content from a technical perspective (logs show many successful requests involving Korean and other wikis; people have added German and Russian mirrors...).
Aug 15 2015
There is only one outstanding bug with the tool that comes to mind. I have a memory leak that I've been unable to get to the bottom of for about a year now. It happens so slowly and unpredictably that progress on it is difficult, especially given the lack of urgency and questions about why Python's internal memory management isn't working. I could probably fix it if I devoted enough time to extra debugging.
Jul 30 2015
Okay! I released mwparserfromhell 0.4.1 (and 0.4.2, because I made a mistake...) just an hour ago, which fixes the Python 3.5 issue. I also have Windows binary releases working properly thanks to AppVeyor.
Jul 24 2015
Sorry, I forgot I had an unreleased fix for the Python 3.5 issue that you've been waiting on. I'm back to working on the parser after a little break so it should come soon.
Jun 4 2015
May 24 2015
mwparserfromhell 0.4.1 should come soon (next few days, just need to work out the Windows binary situation) - I'll add Python 3.5 support along with it, so we shouldn't have any problems with this.
Mar 30 2015
Mar 9 2015
Interesting. It seems like the problem has been resolved, although 16 pages still result from the above query. However, that looks like accurate replication of a corrupted database rather than the other way around, so I'm deferring to T92046.
There appear to be sixteen affected pages in total, given by the following query. Note that NS 3 is User_talk and NS 4 is Wikipedia.
Jan 9 2015
I thought that issue was caused by an incorrectly set-up Windows build environment, which @valhallasw and I resolved by distributing binaries with releases? I'm not clear on this since I haven't looked in a while, but it seems the underlying problem is https://github.com/earwig/mwparserfromhell/issues/78 blocking py3.3 and py3.4 builds. Would fixing that and releasing v0.4 along with the necessary binaries be sufficient to fix this?