Huggle, STiki and other tools do the same, and I think we should too.
Amendment: I think we should have a fully protected on-wiki page that lists the sites and editors that should be whitelisted. We can format the page however we want and use API:Links to get the editor names and external links on the page. The results could be cached in Redis so we aren't parsing the page every time someone uses the tool. Keeping the whitelist on-wiki lets admins and interface editors keep it up to date. Between us and Diannaa, I think we'll quickly see better results in CopyPatrol.
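To make the idea concrete, here is a rough sketch of how the lookup and caching could work. It is only an illustration in Python, not a proposal for the actual implementation. The page title, cache key, TTL and User-Agent string are all made up, and I'm assuming the whitelist page links to editors' user pages and to the whitelisted sites. It queries `prop=links|extlinks` with `formatversion=2`, ignores API continuation (so it would miss entries past the first batch), and expects a cache object with `get` and `setex` methods, such as a redis-py client.

```python
import json
from urllib.parse import urlencode, urlparse
from urllib.request import Request, urlopen

API_URL = "https://en.wikipedia.org/w/api.php"
WHITELIST_PAGE = "Project:CopyPatrol/Whitelist"  # hypothetical page title
CACHE_KEY = "copypatrol:whitelist"               # hypothetical cache key
CACHE_TTL = 3600                                 # seconds; arbitrary choice


def parse_whitelist(response):
    """Pull user names and external hosts out of an API response dict."""
    users, domains = set(), set()
    for page in response.get("query", {}).get("pages", []):
        for link in page.get("links", []):
            if link.get("ns") == 2:  # namespace 2 is User:
                # Drop the namespace prefix and any subpage suffix.
                name = link["title"].split(":", 1)[1].split("/", 1)[0]
                users.add(name)
        for ext in page.get("extlinks", []):
            host = urlparse(ext.get("url", "")).hostname
            if host:
                domains.add(host)
    return {"users": sorted(users), "domains": sorted(domains)}


def fetch_whitelist():
    """Fetch the links on the whitelist page (first batch only)."""
    params = {
        "action": "query",
        "prop": "links|extlinks",
        "titles": WHITELIST_PAGE,
        "plnamespace": 2,
        "pllimit": "max",
        "ellimit": "max",
        "format": "json",
        "formatversion": 2,
    }
    request = Request(
        API_URL + "?" + urlencode(params),
        headers={"User-Agent": "whitelist-sketch/0.1"},  # placeholder UA
    )
    with urlopen(request) as reply:
        return json.load(reply)


def get_whitelist(cache, fetch=fetch_whitelist):
    """Return the whitelist, hitting the API only on a cache miss."""
    cached = cache.get(CACHE_KEY)
    if cached is not None:
        return json.loads(cached)
    data = parse_whitelist(fetch())
    cache.setex(CACHE_KEY, CACHE_TTL, json.dumps(data))
    return data
```

With a real Redis connection this would be called as `get_whitelist(redis.Redis())`. Since the page would rarely change, a fairly long TTL should be fine, or we could clear the key whenever the page is edited.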
Note that there is already a blacklist for excluding certain sites from being added to the EranBot database: https://en.wikipedia.org/wiki/User:EranBot/Copyright/Blacklist