
Develop and write the AI Strategy Brief FY26-28
Open, High, Public

Description

Goals: Write an AI strategy paper to set WMF's direction for AI work in the coming few years.

Output: a strategy brief focused on supporting editors with AI (roughly six pages long, excluding the appendix)

Open Questions: In this section, we will post our open questions. If you are following this task and you have an answer for them, please let us know by leaving a comment in the task and referring to the question number.
We currently do not have any open questions.

Details

Due Date
Wed, Feb 5, 12:00 AM
Other Assignee
calbon

Event Timeline

leila set Due Date to Sep 29 2023, 12:00 AM.
leila triaged this task as Medium priority.Jul 10 2023, 6:49 PM

@leila: Hi, the Due Date set for this open task passed a while ago.
Could you please either update or reset the Due Date (by clicking Edit Task), or set the status of this task to resolved in case this task is done? Thanks!

leila changed Due Date from Sep 29 2023, 12:00 AM to Apr 30 2024, 12:00 AM.Feb 28 2024, 2:14 PM

Preliminary meeting: Marshall Miller, Chris Albon, Miriam Redi, Xiao Xiao and I met on May 16th to identify areas for improvement in collaborations between Research, ML and Product teams on the AI front. Marshall, Chris and I met separately to discuss possible solutions for some of the challenges identified.

leila raised the priority of this task from Medium to High.Jul 18 2024, 10:41 PM
leila updated Other Assignee, added: calbon.
leila changed Due Date from Apr 30 2024, 12:00 AM to Aug 30 2024, 12:00 AM.
leila added subscribers: Miriam, XiaoXiao-WMF.

Update:

Chris and I have started working more intensively on this (as part of a larger effort to bring more clarity about what the WMF Product and Technology Strategic Pillars mean in practice for specific facets, including AI). We expect our collaboration to continue in the coming month. We have a deadline of (currently) August 20th to deliver a document to the organization that will communicate this more clearly.

The August 30th deadline I set for the task is very tentative. We have an August 20th deadline, and I have a sense that the work will not be entirely done at that point, so I left 10 days beyond that as padding. We will update this task with a new due date if this changes.

Update:

  • Chris and I met with part of the leadership team and gathered some of the information we needed from that group in order to be able to start working on the AI paper that will help define the organization's direction for AI work.
  • We are going to work on a draft in the week of July 22nd with a goal of having something specific enough to be able to share for input from a few people.
leila updated the task description. (Show Details)
leila changed Due Date from Aug 30 2024, 12:00 AM to Aug 20 2024, 12:00 AM.

A comment on language support:

It would be great to understand whether we want to cover as many languages as possible with a "pretty good" user experience, or a minimal set of languages with a "very, very good" user experience, keeping the 90% target in mind.

This has implications for model training (choice of framework, data availability) and model evaluation, as well as for later maintenance.

Good point, Xiao (and something for us to consider commenting on in the final doc from the lens of this particular paper, which is AI). The current open question about the minimum number of languages needed can help us think about this topic more seriously. If the minimum is very large vs. relatively small, we may take different positions.

Some of my past writing around this space:

And a few open questions from me:

  • Language is raised so text-based models are clearly part of the thinking. I would assume there's some discussion of image models as well as image-text models. What about video and audio? Are they considered core technologies to invest in?
  • Are we focusing specifically on ML/AI models that we may or may not host, or thinking more broadly about all the related components -- e.g., datasets we're releasing, feedback mechanisms for models, benchmarks, etc.?

What is the minimum number of languages that we must support in order to reach 90% of the world's population with Wikipedia (encyclopedic) content? What are the languages in that list?

Some of my thinking:

  • Not all tasks are needed by all languages, but there are a few that are core, and for those we should push heavily for greater multilinguality. I would be explicit about which core functionality should meet this threshold though. Machine translation is clearly one of these to me, as all languages should be able to benefit from reading content in other languages and receiving edit assistance via translation. Obviously we aren't there yet, as many languages have almost no digital text presence outside of Wikipedia that could be used to train a model, but there have been impressive improvements in coverage in the last few years, and it remains important to continue to invest in these technologies and have the capacity to support improvements. Identifying the other core tasks is harder -- I'd argue that OCR is a basic tool that should be available to all language communities so that they can more easily do the important work of digitizing more of their knowledge. Tokenization too, to support basic Search functionality, and maybe named-entity recognition? OCR and tokenization are perhaps less exciting than LLMs, but if we've learned anything it's that LLMs often need supports like retrieval-augmented generation (RAG) to actually be useful as tools, and that requires good search.
  • I appreciate the approach taken by the Language team for Content Translation. They initially had a task where language support was poor and existing models were either too resource-intensive to host or not open-source. But they built a general framework for supporting any third-party model, established ground rules around openness of the translation data, put proxies in place to protect user privacy, and then worked on partnerships to provide editors with various options. And when the technologies had reached a state where there were very good open-source options that we could self-host, they put those in place to expand how they could use these translation technologies. If we're approaching the question of LLMs for various tasks, I suspect this is a good framework to consider: it doesn't force us or any language community to choose one model, and it allows us to provide many of the benefits while we learn how to properly scaffold these models.
  • For other task-specific language technologies, I think the important thing is having the capacity to provide support if a language community requests it. For example, the article-description model is currently available in 25 languages; more would be better, but it's probably not a priority for that model to extend to 200+ languages. The important thing is that the model is designed so it can be easily extended: we have Wikidata descriptions and Wikipedia text available for many more languages, so extending the model largely just requires choosing a base model with greater language support. Another example: many small language editions likely do not need vandalism-detection models, but importantly, the way the revert-risk models were designed, there is always a language-agnostic option available, and the multilingual model has capacity for more languages and is still small enough that the mBERT base model could likely be swapped out for a bigger model with more language support if one became available.

Some experiences are perhaps farther away from the "encyclopedic experience" and while they may involve learning, they may not be things we (WMF) want to prioritize building for.

I would add that identifying a potential AI-assisted reading experience as important does not mean that we need to build the front-end for it ourselves. And predicting what these experiences will be is likely quite difficult. But we can invest in making Wikimedia more friendly to folks wishing to build these experiences while nudging on the aspects that we believe are important. I would focus on defining those aspects that we see as core to the reading experience (certainly attribution to the whole article, probably citations / citation-needed tags, maybe an edit button or other calls-to-action, others?) and the core back-end technologies that Wikimedia can provide to enable these high-quality reading experiences (natural-language search, perhaps a better back-end for citations so they're easier to structure and package up with the content, etc.).

One clear caveat to me where we should be building ourselves: I think sustainable futures for the Wikimedia projects require that readers (and editors) be able to easily access content translated from other languages on the fly. There is no way every language community (and very likely not even English Wikipedia) can create and maintain the whole extent of knowledge on Wikipedia. Translation for readers has to fill in many of these gaps so language communities can focus on the content that is more local to their readers, editors, and sources. Outside translation services reduce the privacy guarantee we give to readers and rely on the reader first discovering the content in another language that they're less fluent in. I'm very excited about the MinT for readers initiative, and we need to think about what other big investments we should be making to enable this to be more widespread and to make the translated content more discoverable from inside and outside of Wikipedia. I wrote about this more in my thoughts around multi-generational sustainability: https://docs.google.com/document/d/15RYUXU6uxlgvF3LIBW8A3o3ih2j3zuWqZjrFq3o21hI/edit#heading=h.qonfksz68hf4

@Isaac Thank you for this detailed response. We read all of it, it informed our thinking, and we have captured (in preliminary form) some parts of it. Once you see the draft you will see traces of it. (Sorry, because of time limitations I won't elaborate more here and now; bear with me. :)

All: we added a third question in the task description. If you have thoughts about it, let us know. Thank you!

@leila thanks for the acknowledgment. One question about the new open question: when you say "I will argue that investing in models that use NLP is something we should avoid for the coming few years unless we have very strong reasons to do so," what do you mean by "investing"? What sorts of activities are you suggesting that we avoid?

I would expect that if we include that kind of statement, each team reflects on their end and sees what it means for them. For Research, this may mean that we don't invest in developing models using NLP (unless we have strong reasons to). There may be other things you or others come up with that may no longer make sense for us to do given this focus.

(And to confirm: that was an example to help with the thinking. It's not a decision.:)


Ahh okay because that would have been a huge shift in direction so I was pretty confused. Something like half of our Research model development at the moment is NLP (e.g., Diego's reference-need and peacock work, Martin's text simplification and add-a-link improvements), we're engaging more in their conferences (ACL/EMNLP), and ML Platform is also heavily invested at the moment in hosting LLMs.

Update:

We have "a first bad draft" of the prioritized strategy ready for your feedback. If you want to help us improve our thinking, please read the guidance for feedback and proceed. We accept feedback until August 12, 11:59 AM UTC. Thank you in advance for any time you spend on it.

And to be clear: the invitation above is for everyone subscribed to this task, not just the Research team members. (A couple of folks messaged privately and asked if it's okay if they share thoughts. Thanks for checking, and of course it is.:)

Update: thanks to those of you who provided feedback. We also received relatively extensive feedback on key open questions during Wikimania. We are now closing the feedback window and have started an overhaul to improve the document towards V2.

leila renamed this task from Align on high level direction of ML/AI development in the coming years to Develop and write the AI Strategy Brief FY26-28.Sep 13 2024, 11:08 PM
leila updated the task description. (Show Details)

Some updates on this task:

  • I renamed the task to better describe what the work in this task actually is.
  • We presented the AI Strategy Brief (updated name, formerly the "6-pager") to WMF leadership on August 20th and got good feedback during that presentation. We further presented it to the Research team and gathered the team's feedback, and did a few other internal presentations and feedback-gathering exercises.
  • We received feedback from Selena (CPTO) and Marshall (Senior Director of Product) that they want us to focus the current brief on Editors. We will write an AI strategy for readers later in the year (likely in March-June 2025).
  • For the coming two weeks: Chris and I will be working hard on adapting the current brief to the updated and reduced (in a good way :) scope. We are confident that we can make this work, though we know there is quite a bit of work ahead of us. We did an extensive coordination session today and have a plan of action.
leila changed Due Date from Aug 20 2024, 12:00 AM to Dec 20 2024, 12:00 AM.Dec 13 2024, 11:19 PM
leila changed Due Date from Dec 20 2024, 12:00 AM to Wed, Feb 5, 12:00 AM.Sat, Feb 8, 1:20 AM

The strategy is at a stable place. There are some open comments in the doc that Chris and I intend to review on February 10th and address or resolve. We will then add an acknowledgements section and call the document stable. We have worked with the department/org on a communication plan and we will follow it (which includes sharing the doc on Meta or some other public place). Thanks to all of you who have been on this task with us so far. We are close to being able to share something stable. :)