To help with copyvio patrolling on Commons, it would be helpful to have a model that scores each file by its probability of being a copyright violation (copyvio). The model should score new uploads so that patrollers can focus on the new files that most need extra attention.
A rule-based model could, for example, flag files uploaded by non-trusted users (e.g. users who are not autopatrolled) and files that lack EXIF data.
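A minimal sketch of such a rule-based scorer follows, assuming each upload exposes two booleans: whether the uploader is autopatrolled and whether the file carries EXIF data. The `Upload` class, its field names, and the equal weighting of the two rules are hypothetical illustrations, not an existing Commons API.

```python
from dataclasses import dataclass


@dataclass
class Upload:
    """Hypothetical view of a new Commons upload; field names are illustrative."""
    title: str
    uploader_autopatrolled: bool  # does the uploader hold the autopatrolled right?
    has_exif: bool                # does the file carry any EXIF metadata?


def rule_score(upload: Upload) -> int:
    """Count how many copyvio warning signs apply (0 = none, 2 = both)."""
    score = 0
    if not upload.uploader_autopatrolled:
        score += 1  # rule 1: upload from a non-trusted user
    if not upload.has_exif:
        score += 1  # rule 2: file lacks EXIF data
    return score


def triage(uploads: list[Upload]) -> list[Upload]:
    """Order uploads so patrollers see the most suspicious ones first."""
    # sorted() is stable, so uploads with equal scores keep their original order.
    return sorted(uploads, key=rule_score, reverse=True)
```

A patrol queue would call `triage` on the latest uploads and review from the top. More rules can be added as further `if` checks, though every rule then counts equally unless weights are introduced.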
An ML-based model could work similarly to a rule-based one, but would combine such signals as weighted features, and may also take advantage of other properties that hand-written rules don't take into account.
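The "features and weights" idea can be sketched as plain logistic regression trained by batch gradient descent. This is one simple choice of model, not a prescribed one. It assumes the same two signals as binary features and a labeled history of past uploads (for example, 1 = deleted as copyvio, 0 = kept); the `features` helper and that labeling scheme are hypothetical. Everything runs locally, with no call to any external search engine.

```python
import math
from typing import Sequence


def features(uploader_autopatrolled: bool, has_exif: bool) -> list[float]:
    """Hypothetical feature vector: 1.0 marks a warning sign, 0.0 its absence."""
    return [0.0 if uploader_autopatrolled else 1.0, 0.0 if has_exif else 1.0]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Estimated probability that a file with feature vector x is a copyvio."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train(X: list[list[float]], y: list[int], lr: float = 0.5,
          epochs: int = 2000) -> tuple[list[float], float]:
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features, m = len(X[0]), len(X)
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # derivative of log-loss w.r.t. the logit
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b
```

Unlike fixed rules, the learned weights reflect how strongly each signal is associated with past copyvio decisions, and further features can be appended to the vector without rewriting any rules.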
The ML-based model SHOULD NOT depend on external commercial search systems (Bing, Baidu, Google, etc.) and should be free to use (no paying $$$ for commercial services). Its scores may then be used further, either by patrollers who search manually with their own favourite search engine, or by other tool(s) that interact with commercial systems.