Session Themes and Topics
- Theme: Defining our products, users and use cases
- Topic: How do we integrate Machine Learning into our products?
Session Leader
- Aaron Halfaker
Facilitator
- Kate Chapman
Description
This session looks at the use of machine learning and other types of automated assessments in the Wikimedia ecosystem. We’ll discuss what Wikimedia needs to do in order to embrace the challenges of operating infrastructure for machine learning. We’ll also discuss the interface between long-term maintenance of AI services and new product development.
Keep in mind:
- We'll be covering a wide range of topics, from ecosystems to funding technology teams. Machine learning is a subject of discussion, but we'll have no time to discuss technical details.
Desired Outcomes:
- A common understanding of what investments in AI will cost us
- Alignment on which AIs to invest in next
Questions to answer during this session
Question | Significance: Why is this question important? What is blocked by it remaining unanswered? |
What is an ecosystem? What’s a technology ecosystem? What makes an ecosystem healthy? | We talk about our “technology ecosystem”, but does anyone really understand what an ecosystem is, how it operates, and what its constraints are? This question is important so that we can develop a common language and a common understanding of what technical ecosystem health looks like. |
Where has ML been used within the Wikimedia ecosystem? What are some successes we can be inspired by? What kinds of predictions/assessments/rankings do we want to have access to next? | Machine learning is a relatively new technology. Most people don’t understand what it is or what it can do for them. By discussing the impacts that ML has already had, participants will gain a grasp of what ML has to offer and why it may be worth a substantial investment of time and resources. Examples include simple classifiers (ORES), similarity indexes (Elasticsearch), and the merging of the two (LTR). Is the next step general recommender infrastructure? Image processing? Knowledge integrity? What do we need to do in the next 5 years? |
What does ML cost? What kind of time and resources do we need to make ML sustainable? | ML might seem like magic, but it’s definitely not free. ORES and the Scoring Platform team are an example of what it takes to invest in ML infrastructure. Knowing what it costs to maintain an ML service can help us know how to plan our investments wisely. It can also help us avoid under-investing and thus creating weak foundations on which to build. |
How do we integrate automated assessments into the wiki interface? What concerns present themselves when machines begin to encroach on subjective judgement? | Automated analysis isn’t particularly useful on its own; it’s a tool that is designed to make life easier for the wiki communities. To achieve this, the outputs of these tools need to be meaningful and need to be embedded in human-machine processes that our users will act out. How do we fit machines into current workflows or use them to enable new workflows? How do we deal with the issues that will inevitably arise from having a machine take on roles that were once purely human? These are questions we must answer in order to proceed with augmented product development. |
Facilitator and Scribe notes
https://docs.google.com/document/d/1Q7FvPUw6S1SNLkAbPnwKsDKyKa4ggR4BHhSL6VQJw3Y
Facilitator reminders
Resources:
- https://phabricator.wikimedia.org/tag/artificial-intelligence/
- What is an ecosystem?
- Where has ML been used, and where should it be used, within the Wikimedia ecosystem?
- What does ML cost?
- How do we integrate automated assessments into human-led processes?
Session Structure
- Brief keynote by @Halfak (Ecosystems, AIs, ORES, and investments in the Technology dept.)
- Break out groups to tackle questions
- What is an ecosystem? What’s a technology ecosystem? What makes an ecosystem healthy?
- Where has ML been used within the Wikimedia ecosystem? What next?
- What does ML cost? What kind of time and resources do we need to make ML sustainable?
- How do we integrate automated assessments into the wiki processes?
Session Leaders please:
- Add more details to this task description.
- Coordinate any pre-event discussions (here on Phab, IRC, email, hangout, etc).
- Outline the plan for discussing this topic at the event.
- Optionally, include what it will not try to solve.
- Update this task with summaries of any pre-event discussions.
- Include ways for people not attending to be involved in discussions before the event and afterwards.
Post-event Summary:
- ...
Action items:
- ...