I (@Halfak) talked to some OSM folks about how they are building something that is very similar to ORES. Here are my notes:
* Working with OSM for a while -- engineering priorities.
* Engineer. Started working on building dev applications for validation. Focusing on detection and ML stuff.
* Engineer. Recently started working on OSM. Validation.
* Thinking about validation and tools for about a year. Manual labeling. Rule-based things.

-----

25,000 changesets per day. Review changesets as they happen using tools.
- How long does it take to review the average changeset?
- How do people coordinate reviews? (Think patrolled flag)
  - "My area" using a geo filter
  - Are there areas that are under-patrolled?
  - Watchlists?
    - Bounding box --> RSS feed (low adoption, bad user experience)
- Tools are outside the OSM infrastructure
- Have some cron jobs that look for constraint violations --> micro-tasking managers
- OSM discourages automatic edits
  - How many people do this, and do they need rights/permissions?
- What are y'all doing with ML? How have you formalized the problem?
My general sense is that what they're building is so similar to ORES it's absurd. I think we should explore creating a "changeset-oriented" feature tree and a revscoring.extractors.osm.Extractor (rough sketch below).
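For concreteness, here's a minimal, self-contained sketch of what a changeset-oriented datasource/feature pair and an OSM extractor could look like. This deliberately avoids the real revscoring API: the class names, feature names, and the OSM API URL/response handling below are all placeholder assumptions, not a proposal for actual signatures.

```python
# A minimal sketch, NOT real revscoring API.  It mimics the Datasource/Feature
# dependency-injection pattern in plain Python so the shape of a
# "changeset oriented" feature tree is visible.  The OSM API URL and the
# changeset XML attributes used below are assumptions for illustration.
import requests
import xml.etree.ElementTree as ET


class Datasource:
    """A named value fetched from the environment (here: an OSM changeset)."""
    def __init__(self, name, extract):
        self.name = name
        self.extract = extract


class Feature:
    """A named value computed from a datasource (or other features)."""
    def __init__(self, name, compute, depends_on):
        self.name = name
        self.compute = compute
        self.depends_on = depends_on


# Root datasource: the changeset document itself.
changeset_doc = Datasource(
    "osm.changeset.doc",
    lambda changeset_id: ET.fromstring(
        requests.get(
            f"https://api.openstreetmap.org/api/0.6/changeset/{changeset_id}"
        ).text
    ).find("changeset"),
)

# Features hang off the datasource, forming the changeset-oriented tree.
num_changes = Feature(
    "osm.changeset.num_changes",
    lambda cs: int(cs.get("changes_count", "0")),
    depends_on=[changeset_doc],
)
comment_length = Feature(
    "osm.changeset.comment_length",
    lambda cs: max(
        (len(t.get("v", "")) for t in cs.findall("tag") if t.get("k") == "comment"),
        default=0,
    ),
    depends_on=[changeset_doc],
)


class OSMExtractor:
    """Stand-in for a hypothetical revscoring.extractors.osm.Extractor."""
    def extract(self, changeset_id, features):
        doc = changeset_doc.extract(changeset_id)
        return [feature.compute(doc) for feature in features]


# e.g. OSMExtractor().extract(12345, [num_changes, comment_length])
```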
We might also want to refactor the whole library so that the MediaWiki-specific bits are moved elsewhere (sketched below):
- revscoring --> (scoring, mwscores, osmscores)
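To make that split slightly more concrete, here is one way the layers could factor. Everything below (the package descriptions, class names, and the score() helper) is hypothetical and just echoes the arrow above; it is not a commitment to any particular layout.

```python
# Hypothetical factoring of the proposed split -- none of these packages exist.
#
#   scoring    : platform-agnostic core (Feature/Datasource machinery,
#                dependency solving, model training/serialization)
#   mwscores   : MediaWiki-specific extractors, datasources, and features
#                (the bits that live in revscoring today)
#   osmscores  : OSM-specific extractors and features (e.g. the changeset
#                sketch earlier in this task)


# --- scoring: knows nothing about MediaWiki or OSM ---------------------------
def score(extractor, model, features, target_id):
    """Solve feature values with whatever extractor the platform provides,
    then hand them to the model."""
    values = extractor.extract(target_id, features)
    return model.score(values)


# --- mwscores / osmscores: thin platform layers -------------------------------
class MWAPIExtractor:
    """Would wrap the MediaWiki API (revisions, diffs, user info)."""
    def extract(self, rev_id, features): ...


class OSMChangesetExtractor:
    """Would wrap the OSM changeset API."""
    def extract(self, changeset_id, features): ...


# Callers pick a platform layer; the core stays identical, e.g.:
#   score(MWAPIExtractor(), damaging_model, wikitext_features, 123456)
#   score(OSMChangesetExtractor(), vandalism_model, changeset_features, 98765)
```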