
Earwig (Ben Kurtovic)
User

Projects

User does not belong to any projects.

User Details

User Since: Oct 25 2014, 1:47 AM (238 w, 6 d)
Availability: Available
IRC Nick: Earwig
LDAP User: BenKurtovic
MediaWiki User: The Earwig

Recent Activity

Feb 24 2019

zhuyifei1999 awarded T216340: Raise memory limit for copyvios tool's k8s webservice a Like token.
Feb 24 2019, 7:32 AM · Toolforge
Earwig closed T216340: Raise memory limit for copyvios tool's k8s webservice as Resolved.
Feb 24 2019, 5:33 AM · Toolforge
Earwig added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

I've managed to fix a couple more bugs and poor design choices in the tool, and it looks like memory usage has fallen to more reasonable levels, so I'm closing this ticket. Thanks for the help earlier!

Feb 24 2019, 5:33 AM · Toolforge

Feb 19 2019

Earwig added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

Did some investigating with my tool of choice guppy and found a potential "leak" (really shouldn't be, but apparently a stack frame was living longer than intended and keeping a bunch of things alive with it). With that cleaned up, the pure-Python tools no longer seem to be reporting any leak candidates, but memory usage still seems kinda high. I'll follow up.

Feb 19 2019, 6:52 AM · Toolforge
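
As an illustration of how a stack frame can outlive its function in CPython: a stored exception keeps its traceback, the traceback references the frame, and the frame references every local. This is a sketch of that general mechanism only; whether it matches the case in the comment above is an assumption, and `Payload` and `work` are hypothetical names.

```python
import gc
import weakref


class Payload:
    """Stand-in for a large object held in a local variable."""


def work():
    big = Payload()
    ref = weakref.ref(big)  # lets us observe whether `big` is still alive
    try:
        raise ValueError("boom")
    except ValueError as exc:
        # Returning the exception keeps its traceback, which references
        # this function's frame, which in turn references its locals.
        return exc, ref


held, ref = work()
alive_while_held = ref() is not None  # the frame still pins `big`

held.__traceback__ = None  # drop the traceback, and with it the frame
gc.collect()
alive_after_clear = ref() is not None
```

Heap profilers such as guppy report objects pinned this way as live, which is why such a case can look like a leak even though nothing is leaking in the strict sense.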

Feb 17 2019

Earwig added a comment to T216340: Raise memory limit for copyvios tool's k8s webservice.

uWSGI logs the following every several hours, which I assume is the OOM-killer:

Feb 17 2019, 5:20 PM · Toolforge
Earwig created T216340: Raise memory limit for copyvios tool's k8s webservice.
Feb 17 2019, 12:30 AM · Toolforge

Feb 16 2019

Earwig closed T216312: Copyvio detection tool is down as Resolved.

It's working now.

Feb 16 2019, 8:37 PM · Tools

May 20 2018

Earwig added a comment to T194541: Investigation: Why is there a Google Proxy API usage spike every 5 days?.

Looks like there was no increase in tool usage on the 18th, but I don't have the exact number of Google API queries made readily available.

May 20 2018, 7:30 PM · Tools, Community-Tech

May 13 2018

Earwig added a comment to T194541: Investigation: Why is there a Google Proxy API usage spike every 5 days?.

OK, I'll try blocking the bot user agents from above and in @MusikAnimal's comment. If that doesn't reduce the rate on the 16th, or we still want to implement additional protections, we'll go for @Niharika's suggestion of requiring logins. (I'm not sure how this would integrate with the API, though.)

May 13 2018, 8:42 PM · Tools, Community-Tech

May 12 2018

Earwig added a comment to T194541: Investigation: Why is there a Google Proxy API usage spike every 5 days?.

OK, so starting around 2018-05-11 at 07:40, someone hammers the tool for two hours copyvio-checking a bit over a thousand AfC drafts. They're not using the API, but the sheer rate definitely makes it look like an automated process. They're checking mostly active drafts, but some declined submissions that haven't been touched in months as well. The URLs all have the same format as the copyvio check link in the submission template, a format which probably wouldn't arise if you were generating the URLs yourself, so I suspect it's some web crawler with a predictable activity pattern. I can't imagine why a person would behave in this manner, nor a real Wikipedia bot.

May 12 2018, 9:13 PM · Tools, Community-Tech

May 2 2018

Earwig added a comment to T193559: Copyvio detection tool cannot use Google search engine.

I don't have access to request IPs on Toolforge. Other methods of tracking are creepy/error-prone (or maybe even disallowed?), and I don't want logging in to be required, so it's difficult.

May 2 2018, 3:14 AM · Tools, Community-Tech

Apr 11 2018

Earwig added a comment to T191861: mwparserfromhell ParserError on Premier League.

This was fixed in mwparserfromhell v0.5 (latest stable is 0.5.1, this bug existed in versions 0.4.4 and earlier). Please upgrade.

Apr 11 2018, 1:51 AM · Scoring-platform-team (Current), ORES

Aug 9 2017

Earwig closed T172397: Tool "copyvios" loads assets from code.jquery.com, a subtask of T172065: Hunt for Toolforge tools that load resources from third party sites, as Resolved.
Aug 9 2017, 8:40 PM · Toolforge-standards-committee, Tools, Privacy
Earwig closed T172397: Tool "copyvios" loads assets from code.jquery.com as Resolved.

I fixed it. Thanks.

Aug 9 2017, 8:40 PM · Tools

Nov 18 2016

Earwig added a comment to T149542: Stats page shows misplaced Draft that isn't actually misplaced.

Yep. It might be covered by another ticket. These kinda things often are. I'm not sure.

Nov 18 2016, 10:05 AM · MediaWiki-General-or-Unknown
Earwig added a comment to T149542: Stats page shows misplaced Draft that isn't actually misplaced.

It's a database desync issue. (I thought I mentioned that to primefac, guess it got miscommunicated?)

Nov 18 2016, 10:02 AM · MediaWiki-General-or-Unknown

Aug 19 2016

Earwig closed T113287: new_discussions.py does not work right at all as Resolved.
Aug 19 2016, 5:34 AM · Reports-bot

Aug 4 2016

Earwig closed T113316: members.py should follow redirects when generating membership lists as Resolved.

Redirects are followed, and cards are updated (T120695), as long as the new project title is used in wikiproject.json.

Aug 4 2016, 1:28 PM · Reports-bot
Earwig closed T120695: Script to track WikiProject renames and update WikiProjectCards as Resolved.

Implemented in 64aaa1d. Should work as expected, as long as the new project name is configured in wikiproject.json and the old one isn't (since that's how the bot determines which project names are valid).

Aug 4 2016, 1:28 PM · Reports-bot
Earwig renamed T139645: Per-site configuration system for reports_bot from Per-project configuration system for reports_bot to Per-site configuration system for reports_bot.
Aug 4 2016, 12:24 PM · Reports-bot
Earwig added a comment to T139645: Per-site configuration system for reports_bot.

We need some form of per-site configuration anyway. For example, sites have custom names for things like the wikiproject.json file, and there are localization questions with the bot's messages.

Aug 4 2016, 12:24 PM · Reports-bot
Earwig closed T119358: Tool that automatically creates list of newly created/improved articles created based on Wikidata criteria as Resolved.

Added support for category trees. The configuration allows using a list of categories exclusively, mixing them with Wikidata, or using the project index. Should be good enough for most purposes.

Aug 4 2016, 8:38 AM · Reports-bot
Earwig closed T119358: Tool that automatically creates list of newly created/improved articles created based on Wikidata criteria, a subtask of T116831: Women in Red workflow automation / optimization (tracking), as Resolved.
Aug 4 2016, 8:38 AM · Tracking-Neverending, WikiProject-X
Earwig added a comment to T139209: Include scores in CopyPatrol interface for each source URL.

It wasn't a user-agent issue, but something else that's hard to explain. Anyway, I fixed it.

Aug 4 2016, 5:08 AM · Community-Tech, CopyPatrol

Jul 7 2016

Earwig closed T137646: Inexplicable delay between a WikiProjectCard being created and it being posted to the corresponding WikiProject member list as Resolved.

I checked my logs from that time, and it turns out the Labs databases were experiencing a bit of replication lag, which I had coincidentally happened to notice:

Jul 7 2016, 7:43 PM · Reports-bot
Earwig closed T116668: Track more than main namespace and draft namespace in project index. as Resolved.
Jul 7 2016, 3:53 PM · Reports-bot
Earwig closed T116668: Track more than main namespace and draft namespace in project index., a subtask of T106876: Refactor reports_bot, as Resolved.
Jul 7 2016, 3:53 PM · Goal, Reports-bot, WikiProject-X

Jun 29 2016

Earwig removed a subtask for T66539: Issues related to the Draft namespace (tracking): T116668: Track more than main namespace and draft namespace in project index..
Jun 29 2016, 10:33 PM · Tracking-Neverending, Wikimedia-Site-requests
Earwig removed a parent task for T116668: Track more than main namespace and draft namespace in project index.: T66539: Issues related to the Draft namespace (tracking).
Jun 29 2016, 10:33 PM · Reports-bot

Jun 23 2016

Earwig closed T116664: Store page titles as IDs instead of titles as Resolved.

This is done in the schema, and will be deployed as soon as the new update_project_index script is finished.

Jun 23 2016, 6:47 PM · Reports-bot
Earwig closed T116664: Store page titles as IDs instead of titles, a subtask of T106876: Refactor reports_bot, as Resolved.
Jun 23 2016, 6:47 PM · Goal, Reports-bot, WikiProject-X
Earwig moved T106877: Readme file and documentation for reports_bot from In Progress to Done on the Reports-bot board.
Jun 23 2016, 6:43 PM · Reports-bot
Earwig closed T106877: Readme file and documentation for reports_bot as Resolved.

All documentation is now in the README or module docstrings.

Jun 23 2016, 6:43 PM · Reports-bot

Jun 9 2016

Earwig closed T132949: Create an output API for Earwig's Copyvio Detector Tool, a subtask of T132832: Show the comparison from Earwig's detector on the CopyPatrol interface, as Resolved.
Jun 9 2016, 9:49 PM · Community-Tech, CopyPatrol
Earwig closed T132949: Create an output API for Earwig's Copyvio Detector Tool as Resolved.

This should work now. Simply pass detail=true when using action=compare.

Jun 9 2016, 9:49 PM · Community-Tech, CopyPatrol
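
A minimal sketch of building such a request URL. Only `action=compare` and `detail=true` come from the comment above, and the base URL is the one quoted elsewhere in this feed; the `title` and `url` parameters and their values are assumptions for illustration, and `build_compare_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

API_BASE = "http://tools.wmflabs.org/copyvios/api"  # URL quoted elsewhere in this feed


def build_compare_url(extra_params):
    """Return a compare request URL with detailed output enabled."""
    params = {"action": "compare", "detail": "true"}
    params.update(extra_params)  # caller-supplied parameters (hypothetical)
    return API_BASE + "?" + urlencode(params)


# Parameter names and values below are illustrative assumptions.
example_url = build_compare_url({"title": "Example", "url": "http://example.com/"})
```
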
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

@kaldari According to my logs, (human) tool usage has remained normal, but API usage completely stopped after Jun 8 at ~22:45 UTC — does this match with your info? If so, it would indicate that the German API users are responsible for the high usage rate. I don't know why they would suddenly stop using it, though, so we can't assume anything.

Jun 9 2016, 8:09 PM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

There are two links:

Jun 9 2016, 4:58 PM · Community-Tech, Developer-Advocacy

Jun 7 2016

Earwig added a comment to T132949: Create an output API for Earwig's Copyvio Detector Tool.

I can do the implementation, but it would be helpful to get some suggestions for the output format.

Jun 7 2016, 6:38 PM · Community-Tech, CopyPatrol
Earwig closed T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? as Resolved.

Yes, it looks good now. Cheers.

Jun 7 2016, 8:54 AM · Community-Tech, Developer-Advocacy
Earwig closed T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?, a subtask of T131169: Help CorenBot migrate to a new API, as Resolved.
Jun 7 2016, 8:54 AM · Community-Tech
Earwig closed T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?, a subtask of T116957: Plagiarism detection tools for text (tracking), as Resolved.
Jun 7 2016, 8:54 AM · CopyPatrol

Jun 6 2016

Earwig reopened T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? as Open.

Google works, but unfortunately, it seems we are having some issues with the results themselves.

Jun 6 2016, 4:00 AM · Community-Tech, Developer-Advocacy
Earwig reopened T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?, a subtask of T116957: Plagiarism detection tools for text (tracking), as Open.
Jun 6 2016, 4:00 AM · CopyPatrol
Earwig reopened T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?, a subtask of T131169: Help CorenBot migrate to a new API, as Open.
Jun 6 2016, 4:00 AM · Community-Tech

May 23 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

The copyvio text has been deleted so I can't really investigate this.

May 23 2016, 5:18 PM · Community-Tech, Developer-Advocacy

May 20 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

This question was asked and answered above for me. I don't think Eranbot uses anything besides Turnitin; did you mean CSB? At the moment, it looks like usage has dropped from the previous estimate, perhaps because people are less satisfied with the current quality of results. Ballpark is between 1,000 and 4,000 per day.

May 20 2016, 12:57 AM · Community-Tech, Developer-Advocacy

May 17 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

About half of all queries.

May 17 2016, 2:58 AM · Community-Tech, Developer-Advocacy

May 13 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Is anyone gonna answer my question first?

May 13 2016, 12:50 AM · Community-Tech, Developer-Advocacy

May 10 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

@kaldari Probably—it's not a big deal to implement—but what about the API?

May 10 2016, 9:32 PM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

@Compassionate727 It's funny, I asked nearly the exact same question...

May 10 2016, 5:44 PM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

I've got Yandex up and running for now. I set up a proxy on a personal server, since I can't use the Labs one due to the IP thing.

May 10 2016, 10:06 AM · Community-Tech, Developer-Advocacy

May 9 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

I don't think the copyvios tool actually takes advantage of any Labs-specific features (IOW, the DB replicas). It might be cheaper for everyone if I self-host it and do some sketchy stuff on my end—like scraping Bing directly—so the Labs folks aren't held responsible.

May 9 2016, 11:12 PM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

@Ricordisamoa As a service, it seems fairly limited. Maybe in the future? Is there a timeframe?

May 9 2016, 11:08 PM · Community-Tech, Developer-Advocacy

May 5 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Agreed, we can handle this without panic.

May 5 2016, 3:00 AM · Community-Tech, Developer-Advocacy

May 4 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Re point #2, can we argue that CSB is user-initiated on the principle that a user submitting an article implicitly triggers a check? Maybe bury it in an edit notice when you create a page?

May 4 2016, 12:10 AM · Community-Tech, Developer-Advocacy

May 2 2016

Earwig added a comment to T132949: Create an output API for Earwig's Copyvio Detector Tool.

Probably not too crazy, but it depends on the way you want the results presented.

May 2 2016, 9:07 PM · Community-Tech, CopyPatrol

May 1 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Yes. Bing was shut off at the end of the month.

May 1 2016, 6:50 PM · Community-Tech, Developer-Advocacy

Apr 28 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

If our usage remained at 300,000 queries per month [...]

Apr 28 2016, 11:57 PM · Community-Tech, Developer-Advocacy

Apr 20 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

2.2. Subject to your strict compliance with these Terms, General Policies and Site ToS, Yandex grants you a non-exclusive, non-assignable, non-transferrable right to use the Service for the following purposes: (i) display Yandex search results on your website and in application; (ii) make temporary copies of Yandex search results for the use on your website or in application.
2.5. You shall be entitled to use the Service solely for the purpose of providing Yandex search results at your website or in application without alteration of order of Yandex search results impression, unless otherwise provided herein.
2.7. [...] You hereby further undertake at any time to refrain from, as well assist or permit any third parties performing the following actions:
2.7.7. Reorder, intermix, obscure, filter, replace the text, images or other information in Yandex search results obtained through the Service, unless otherwise required by applicable legislation and provided herein;
2.7.11. Modify the display of any website or webpage accessed by the links through the Service.

Apr 20 2016, 6:40 AM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

I'm hoping there isn't a limit on how many accounts can register the same IP address!

Apr 20 2016, 5:45 AM · Community-Tech, Developer-Advocacy

Apr 18 2016

Earwig added a comment to T132949: Create an output API for Earwig's Copyvio Detector Tool.

So http://tools.wmflabs.org/copyvios/api, but a solution for caveat #1?

Apr 18 2016, 10:04 PM · Community-Tech, CopyPatrol

Apr 17 2016

Earwig lowered the priority of T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? from Unbreak Now! to High.
Apr 17 2016, 7:18 PM · Community-Tech, Developer-Advocacy

Apr 13 2016

Earwig reopened T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool? as Open.

Nope.

Apr 13 2016, 7:02 AM · Community-Tech, Developer-Advocacy
Earwig reopened T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?, a subtask of T116957: Plagiarism detection tools for text (tracking), as Open.
Apr 13 2016, 7:02 AM · CopyPatrol

Apr 3 2016

Earwig closed T131175: Help Copyvio Detector migrate to a new API as Resolved.

It works. Hallelujah.

Apr 3 2016, 4:04 AM · Community-Tech
Earwig closed T131175: Help Copyvio Detector migrate to a new API, a subtask of T116957: Plagiarism detection tools for text (tracking), as Resolved.
Apr 3 2016, 4:04 AM · CopyPatrol

Mar 31 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

EarwigBot doesn't tag revisions at the moment, and hasn't for a while; there's only the web interface which is run on demand. Its log of checks is not public, and cross-referencing at scale would be a bit difficult because it's not designed to retain results for more than a few days.

Mar 31 2016, 5:06 AM · Community-Tech, Developer-Advocacy

Mar 30 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

@Doc_James Would be useful, though I'm not sure where we'd get the data, but comparing it against our test cases might be a good start.

Mar 30 2016, 1:41 PM · Community-Tech, Developer-Advocacy
Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Yes, it makes multiple queries per check, each with different chunks of sentences distributed somewhat uniformly throughout the article. The chunks have a size limit we can adjust, and the maximum number of queries (currently 10) can be changed as well, though it needs to be large enough to generate useful results when only a portion of an article is copied.

Mar 30 2016, 1:04 AM · Community-Tech, Developer-Advocacy
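
The sampling described in the comment above can be sketched roughly as follows: split the article into sentence chunks, then pick up to a fixed number of them spaced evenly through the text. Only the cap of 10 queries comes from the comment; the naive sentence splitting, the chunk size measured in sentences, and the name `pick_query_chunks` are assumptions, not the tool's actual code.

```python
import re

MAX_QUERIES = 10  # the cap mentioned in the comment


def pick_query_chunks(text, sentences_per_chunk=2, max_queries=MAX_QUERIES):
    """Return up to max_queries sentence chunks spread across the text."""
    # Naive sentence split on ., !, ? followed by whitespace (an assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = [
        " ".join(sentences[i:i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]
    if len(chunks) <= max_queries:
        return chunks
    # Take evenly spaced indices so the picks span the whole article.
    step = len(chunks) / max_queries
    return [chunks[int(i * step)] for i in range(max_queries)]


article = " ".join(f"Sentence number {n}." for n in range(1, 61))
queries = pick_query_chunks(article)
```

Spreading the picks matters for the reason given in the comment: if only one section of an article is copied, a few of the sampled chunks still need to land inside it.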

Mar 28 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Also, the WMF should have more accurate/long-term information on usage stats through BOSS's own interface, which I can't access myself. I don't know if said info would be per-user or including Coren's stuff, but either way it would be useful information.

Mar 28 2016, 5:52 PM · Community-Tech, Developer-Advocacy

Mar 23 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

The ballpark is 6,000–12,000 queries/day, based on the past few days. We might be hitting up against the CSE limit then, but just barely. There's a (fairly conservative) bound of 300,000 queries/month...

Mar 23 2016, 7:34 AM · Community-Tech, Developer-Advocacy
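
For scale, the figures in the comment above can be compared directly. This is rough arithmetic only, and the 30-day month is an assumption.

```python
DAYS_PER_MONTH = 30  # assumption for a rough estimate

low_daily, high_daily = 6_000, 12_000  # ballpark from the comment
monthly_bound = 300_000                # bound from the comment

low_monthly = low_daily * DAYS_PER_MONTH            # 180,000 queries/month
high_monthly = high_daily * DAYS_PER_MONTH          # 360,000 queries/month
daily_equivalent = monthly_bound // DAYS_PER_MONTH  # 10,000 queries/day
```

Under that assumption, the 300,000 queries/month bound works out to 10,000 queries/day, which falls inside the 6,000 to 12,000 ballpark.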

Mar 3 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Clock is ticking. Any updates?

Mar 3 2016, 7:44 AM · Community-Tech, Developer-Advocacy

Feb 27 2016

Earwig added a comment to T128247: Broken section editing link.

A null edit fixed it. Here's a screenshot from before:

Feb 27 2016, 4:05 AM · MediaWiki-Interface

Feb 7 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Experience indicates search engines are miles better than Turnitin at detecting copyright violations as done by my tool.

Feb 7 2016, 5:56 AM · Community-Tech, Developer-Advocacy

Feb 2 2016

Earwig added a comment to T125459: Investigation: Can we find a new search API for CorenSearchBot and Copyvio Detector tool?.

Google would be ideal, if we can work out a thing with them. I've looked into DuckDuckGo a bit and I'm not sure their setup is right for us; they seem more concerned with providing semantic search results than having a large text database, which is what we really need. Having dealt with Yahoo for (seven?) years at this point, I am not terribly impressed by them and suggest we look elsewhere.

Feb 2 2016, 1:45 AM · Community-Tech, Developer-Advocacy

Jan 31 2016

Earwig added a comment to T125325: Allow administrators to disable access to a blocked user's watchlist.

Not a huge fan of this. None of the other admin blocking powers can disable "reader-focused" features that only affect the user directly (e.g., we can't stop people from browsing pages). Also, it doesn't seem particularly useful.

Jan 31 2016, 5:23 AM · MediaWiki-Watchlist

Jan 21 2016

Earwig added a comment to T124225: PageImages should never return non-free images.

If a file has been uploaded to Commons, it's free (otherwise it would have been deleted already); if a file has been uploaded to a Wikipedia, it's non-free (otherwise it would have been moved to Commons already).

Jan 21 2016, 6:46 AM · MW-1.27-release (WMF-deploy-2016-03-08_(1.27.0-wmf.16)), Reading-Web-Sprint-68-"Java and JavaScript are basically the same", Reading-Admin, Readers-Community-Engagement, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Patch-For-Review, WMF-Legal, PageImages

Jan 20 2016

Restricted Application updated subscribers of T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

This is... done, I think. I want to hack on the visual output further, but it works.

Jan 20 2016, 8:57 AM · Community-Tech, CopyPatrol

Dec 19 2015

Earwig updated the title for P2442 {{cite pmid/*}} from untitled to {{cite pmid/*}}.
Dec 19 2015, 7:32 AM

Nov 11 2015

Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

Okay, so Coren's been the point of contact in the past between me and the WMF with regards to managing the Yahoo! BOSS API keys that are necessary to use that service. As far as I know, he still has that role. I was suggesting that he could create a new key for Fhocutt for developing/testing this new feature (since sharing of keys doesn't sound like a good idea, although we could do that too, I guess).

Nov 11 2015, 12:30 AM · Community-Tech, CopyPatrol

Nov 5 2015

Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

@kaldari Still useful to test how the results look when combined with the regular BOSS hits, I guess?

Nov 5 2015, 1:40 AM · Community-Tech, CopyPatrol

Nov 4 2015

Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

Sorry Coren, I didn't really mean to add you as a subscriber...!

Nov 4 2015, 6:43 AM · Community-Tech, CopyPatrol
Earwig updated subscribers of T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

Hmm... I guess you can ask @coren for a BOSS key for testing? Alternatively, disable part of EarwigBot: in earwigbot/wiki/copyvios/__init__.py, comment out line 116 and change 133 to if True:. That should make it just report "no match" for everything. I might add a more graceful fallback in the future.

Nov 4 2015, 6:42 AM · Community-Tech, CopyPatrol

Nov 3 2015

Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

You probably didn't put it in the "wiki" section.

Nov 3 2015, 6:52 AM · Community-Tech, CopyPatrol
Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

To be honest, I'm struggling with free time right now. Not sure the best way for you to approach this.

Nov 3 2015, 3:49 AM · Community-Tech, CopyPatrol

Sep 27 2015

Earwig closed T110778: [AOI] Create a test suite for Copyvio Detector as Resolved.
Sep 27 2015, 9:02 AM · Community-Tech
Earwig added a comment to T110778: [AOI] Create a test suite for Copyvio Detector.

All done now.

Sep 27 2015, 9:02 AM · Community-Tech
Earwig added a comment to T110144: Integrate Turnitin (as used in Plagiabot) into Copyvio Detector tool [AOI].

Sounds fine. I'm not sure about putting the Turnitin results above the main result summary, but that's a nitpick.

Sep 27 2015, 5:22 AM · Community-Tech, CopyPatrol

Sep 26 2015

Earwig added a comment to T110778: [AOI] Create a test suite for Copyvio Detector.

Oh, good point on that last one. I can definitely use posts from my own blog. Will try that.

Sep 26 2015, 6:59 PM · Community-Tech
Earwig added a comment to T110778: [AOI] Create a test suite for Copyvio Detector.

Now at https://en.wikipedia.org/wiki/User:EarwigBot/Copyvios/Tests. Did some cleanup and added a few new tests.

Sep 26 2015, 6:56 AM · Community-Tech

Sep 22 2015

Earwig added a comment to T110778: [AOI] Create a test suite for Copyvio Detector.

This is very useful. Thanks!

Sep 22 2015, 6:39 AM · Community-Tech

Sep 21 2015

Earwig added a comment to T110778: [AOI] Create a test suite for Copyvio Detector.

I will likely work on this on my own over the next couple of weeks. It'll be useful for other improvements that I plan to make to the comparison engine.

Sep 21 2015, 5:20 PM · Community-Tech

Sep 17 2015

Earwig added a comment to T112881: Create Cyberbot Project on Labs.

I don't understand. What kind of work are you doing that requires so much memory?

Sep 17 2015, 11:20 PM · The-Wikipedia-Library, Cloud-Services, VPS-Projects

Sep 11 2015

Earwig added a comment to T112227: Incorrect {{DEFAULTSORT}} additions.

For https://en.wikipedia.org/w/index.php?title=Clinoch_of_Alt_Clut&diff=prev&oldid=680314774, I believe the correct parsing is "Clinoch of Alt Clut" rather than "Clut, Clinoch of Alt", per WP:PEER. This is strange to me because WP:AWB/GF indicates it should be doing this already. For the second page, I think "Byzantine Master of the Crucifix of Pisa" without any modification is correct.

Sep 11 2015, 6:44 AM · AutoWikiBrowser, WorkType-Maintenance

Aug 25 2015

Earwig added a comment to T108422: [AOI] Investigation: Can we improve Copyvio Detector?.

Yes, this is a good idea. I already use https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioExample and https://en.wikipedia.org/wiki/User:The_Earwig/Sandbox/CopyvioPDFExample as basic sanity checks, but a more comprehensive suite would be much better.

Aug 25 2015, 3:49 AM · Community-Tech

Aug 22 2015

Earwig added a comment to T108422: [AOI] Investigation: Can we improve Copyvio Detector?.

It is custom-written. You are right that the particular result there is poor; my first thought is to work on the confidence algorithm a bit to value large contiguous blocks more than lots of disjoint trigrams. For quotes, I'm not so sure; if that issue was fixed I think it might not be so important. I can look into that.

Aug 22 2015, 12:29 AM · Community-Tech
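
One hypothetical way to express the idea in the comment above, valuing large contiguous blocks over many disjoint trigrams, is to weight each run of consecutive matches superlinearly. This is a sketch of that idea only, not the tool's confidence algorithm; the input representation (one boolean per trigram position), the exponent, and the name `run_weighted_score` are all assumptions.

```python
from itertools import groupby


def run_weighted_score(matched, exponent=2.0):
    """Score in [0, 1] that rewards long runs of consecutive matches."""
    if not matched:
        return 0.0
    # Length of each maximal run of consecutive True values.
    runs = [len(list(group)) for hit, group in groupby(matched) if hit]
    return sum(r ** exponent for r in runs) / len(matched) ** exponent


contiguous = [True] * 6 + [False] * 6  # one block of six matches
scattered = [True, False] * 6          # six isolated matches
```

Both inputs have the same number of matches, but `contiguous` scores 0.25 while `scattered` scores well below it, since six runs of length one contribute far less than one run of length six.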

Aug 19 2015

Earwig added a comment to T108422: [AOI] Investigation: Can we improve Copyvio Detector?.

Regarding l10n, the tool works fine for non-English content from a technical perspective (logs show many successful requests involving Korean etc wikis; people have added German and Russian mirrors...).

Aug 19 2015, 12:33 PM · Community-Tech

Aug 15 2015

Earwig added a comment to T108422: [AOI] Investigation: Can we improve Copyvio Detector?.

There is only one outstanding bug with the tool that comes to mind. I have a memory leak that I've been unable to get to the bottom of for about a year now. It happens so slowly and unpredictably that progress on it is difficult, especially given the lack of urgency and questions about why Python's internal memory management isn't working. I could probably fix it if I devoted enough time to extra debugging.

Aug 15 2015, 2:54 AM · Community-Tech

Jul 30 2015

Earwig added a comment to T106763: Mandatory dependency on mwparserfromhell.

Okay! I released mwparserfromhell 0.4.1 (and 0.4.2, because I made a mistake...) just an hour ago, which fixes the Python 3.5 issue. I also have Windows binary releases working properly thanks to Appveyor.

Jul 30 2015, 7:27 AM · Pywikibot-textlib.py, Pywikibot

Jul 24 2015

Earwig added a comment to T106763: Mandatory dependency on mwparserfromhell.

Sorry, I forgot I had an unreleased fix for the Python 3.5 issue that you've been waiting on. I'm back to working on the parser after a little break so it should come soon.

Jul 24 2015, 2:38 PM · Pywikibot-textlib.py, Pywikibot