Given the concerns around licensing of data models raised on T333856: Cloud VPS open exception request, let's discuss how best to support and enable work in this space on Wikimedia Cloud Services (WMCS) offerings. I believe it is in the interest of the Wikimedia movement, and of its health, to support this emerging field and to use our existing platforms and resources, namely WMCS, to do so.
However, some questions remain:
Licensing
- Given licensing that is unclear and/or incompatible with OSI-approved licenses, what can users run on WMCS? Is any work being done in the data community to create OSI-compatible models? Can something like The Pile, which is free with no other apparent license, be used on WMCS?
- According to @Isaac, T333856#8805764, "a model has at least three separate pieces that are often treated independently: the final model artifact (a bunch of numbers essentially), the code used to train the model, and the data that was fed into the model". Interpreting this, what requirements are imposed by WMCS upon each piece of the model? Must each piece used also be OSI-compatible? If not, which pieces must be? Is this scenario similar to utilizing non-free hardware and software to create an image which is then openly licensed?
- For previous discussion of licenses that are in the spirit of, but not compliant with, the TOU on WMCS, see T152581: Expand the Toolforge definition of "free license" to include FSF-approved and DFSG-compatible licenses
- As noted, CC BY-SA isn't OSI-approved, despite being very much aligned with the Wikimedia movement (note that it isn't intended to be a software license)
- What requirements does WMCS impose on non-code objects stored in WMCS? Are models code? If not, then what requirements should we place on them?
Hardware Requirements
- How much disk, RAM, and CPU might be needed? Can we meet those needs with our existing hardware?
- Are GPUs required? If so, how many? How would access be controlled?
Non-blocking questions
- What projects exist that wish to explore these fields? What goals / outcomes do they have?
- Are multiple independent projects needed, created on request by any party? Or could the collective work be consolidated into a few primary projects?
The goal of this ticket is to discuss and collect feedback on the listed questions, and then to update Wikitech, WMCS policies, etc. as required in accordance with any decisions made.