
Meeting 14 - Fri 29 July 2016 - 12:30 UTC
Closed, Resolved, Public

Description

Date: 29 July 2016
Time: 12:30 UTC
Type: Skype, fallback IRC (channel #gsoc-catimages)

Description: 14th meeting (Week 10): set the MVP (and the "OVP") and continue working ;)

Agenda:

  • Find a "magic" number that is "sexy":
    • Number of NewImages uploads to Commons per day: ~11k (see T141189)
    • Share of files not categorized directly at upload, per day: ~10-20%, varying from 5% to 100%. It is hard to give an accurate number, since we lack stats from Wikimedia (see Z441#6202).
    • Does it make sense to run the bot on NewImages, or should we go for uncategorized images (as catimages did), given that we only want to categorize files that have not yet been categorized by humans?
    • Proposals for the MVP by @DrTrigon:
      • Given we consider all NewImages (including already categorized ones): the MVP is to categorize at least 5% of new files into non-file-type categories over the course of at least 7 consecutive days in automatic mode. This categorization must use high-quality categories (e.g. leaf categories, templates), and at least 3 categories shall be placed. The rate of obvious errors (e.g. a color paint drawing categorized as a monochrome photograph) must not exceed 10% of the categorized files during that campaign. [@DrTrigon feels quite confident about this.]
      • Given we consider uncategorized files only: the MVP is to categorize at least 10% of uncategorized files (about 1% of that day's NewImages) into non-file-type categories over the course of at least 7 consecutive days in automatic mode. This categorization must use high-quality categories (e.g. leaf categories, templates), and at least 3 categories shall be placed. The rate of obvious errors (e.g. a color paint drawing categorized as a monochrome photograph) must not exceed 10% of the categorized files during that campaign. [@DrTrigon needs some test runs on uncategorized pages, and maybe more stats from Wikimedia, to decide on this.]
    • The MVP must also include:
      • How do we want to close this GSoC project and maintain the code in the future?
      • A bot request on Commons, along with a bot set up and running on Labs (automatic mode).
      • A bot script for "home" users (manual mode).
      • A Docker setup for future development of the code.
      • Documentation, including a feature comparison between the old bot (catimages) and this one, and an outlook on future development (training, learning, CV algorithms, audio/video media support, storing additional data e.g. in a DB on Labs with an API, fudo?, more?).
      • Maybe something about beta test(er)s...?
      • Agreements with uploaders of large file sets (eth-bib, etc.), and agreements with OSM and other services we rely on.
      • (more?)
    • Proposal for "OVP": All we ever discussed... ;))
    • Input from @AbdealiJK and @jayvdb on these proposals is needed before the meeting, so that we can agree on the final MVP proposal at the meeting at the latest.
  • Continue with other tasks again
  • Beta testers?
  • video2copyright?
  • ...
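To make the two MVP proposals easier to compare, here is a minimal sketch of the daily targets they imply and of a check for the stated criteria. It only uses the approximate figures quoted above (~11k NewImages per day, ~10-20% uncategorized). The names `daily_target`, `DayStats` and `meets_mvp` are hypothetical helpers, not part of the bot. It also assumes that the coverage threshold must hold on every one of the consecutive days, that a file only counts as "categorized" if it received at least 3 high-quality, non-file-type categories, and that the 10% error limit applies to the whole campaign.

```python
from dataclasses import dataclass
from typing import Sequence

# Approximate figures from the agenda (see T141189 and Z441#6202).
NEW_IMAGES_PER_DAY = 11_000
UNCATEGORIZED_SHARE_RANGE = (0.10, 0.20)


def daily_target(pool_size: float, share: float) -> float:
    """Number of files to categorize per day for a given pool and required share."""
    return pool_size * share


@dataclass
class DayStats:
    """Hypothetical per-day campaign statistics."""
    pool: int            # files in the considered pool that day (NewImages or uncategorized)
    categorized: int     # assumption: only files that got >= 3 high-quality, non-file-type cats
    obvious_errors: int  # categorized files with an obviously wrong category


def meets_mvp(days: Sequence[DayStats], min_share: float,
              min_days: int = 7, max_error_rate: float = 0.10) -> bool:
    """Check the MVP criteria over a run of consecutive days in automatic mode."""
    if len(days) < min_days:
        return False
    # Assumption: the coverage threshold must be reached on every single day.
    for day in days:
        if day.pool == 0 or day.categorized / day.pool < min_share:
            return False
    # The error rate is measured over all files categorized during the campaign.
    total_categorized = sum(day.categorized for day in days)
    total_errors = sum(day.obvious_errors for day in days)
    return total_categorized > 0 and total_errors / total_categorized <= max_error_rate


if __name__ == "__main__":
    # Proposal 1: 5% of all NewImages.
    print("All NewImages, 5%:", round(daily_target(NEW_IMAGES_PER_DAY, 0.05)))
    # Proposal 2: 10% of the uncategorized files, for both ends of the estimated range.
    for share in UNCATEGORIZED_SHARE_RANGE:
        uncategorized = daily_target(NEW_IMAGES_PER_DAY, share)
        print(f"Uncategorized ({share:.0%} of uploads), 10%:",
              round(daily_target(uncategorized, 0.10)))
```

With these numbers the first proposal means roughly 550 files per day, while the second means roughly 110 to 220 files per day (about 1-2% of NewImages), which is why the second proposal's threshold looks higher but is actually the smaller absolute target.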

Minutes of the Meeting: