
What do we release and support? / What is Wikibase Suite?
Closed, Resolved · Public · Spike

Description

Context:
It is strange that we require users to clone the pipeline to then copy the "example" directory in order to run the Wikibase Suite "example".

Do we still want to consider this the "example", or is it more correct to call it a "distribution" or, better yet, a "Reference Implementation"?

Our example configuration lacks clear instructions, and potentially an adequate setup, for moving it into production (HTTPS configuration and/or a proxy server?).

Proposal: Move our distributions (the "example" directory) to their own home (GitHub? tbd), and clearly version each release, including a semantic versioning scheme which tracks in some informative way against MediaWiki and/or Wikibase versions. This is the most critical step we can take to clarify and streamline our installation and maintenance/upgrade process. This will also help set things up to better focus and "tidy up" the build pipeline repository itself, allowing us to get focused and produce better and more reliable distributions. (See T343349)

Acceptance Criteria:

  • Consider this proposal, and any others as a team, and come to decisions.
  • We know how to test and have tested the distribution on an actual Internet-facing VPS and identified any actual or potential issues in doing so
  • How we communicate to the community about it and its implications for our users should be part of the discussion
  • From the topics to tackle and decide:
    • which ones need to be tackled now and create tasks for completing that work
    • which ones we want to get to at a later point in time and create tasks for these too
    • which ones we want to let go

Goal: Create a clear and more formal Wikibase Suite Distribution story.

Some things we may want to do:

  • Deprecate the "base" Wikibase image in favour of only distributing wikibase-bundle.
  • Change example/docker-compose.yml to run all services, and create a new optional docker-compose.minimal.yml which inherits its service definitions from example/docker-compose.yml, deprecating example/docker-compose.extra.yml
  • Consider moving extra-install.sh into Docker/Wikibase
  • 🚨 Make GitHub versioned release packages of what currently goes in the example directory
  • 🚨 Consider moving these releases to a new GitHub repo called wikibase-suite (i.e. wikibase-suite/releases)

Goal: We want to make the upgrade process easier for users. As part of that we want to improve clarity in configuration of our distributions (Suite and individual Docker images), as well as code readability and maintainability.

Some things we may want to do:

  • Rework the configuration so that it's in one place and editable by users
  • Clarify and normalize use of default.env, template.env, local.env, and .env in both the distribution and build pipeline/test contexts to enable making env var naming consistent.
  • ⚠️ If two environment variable names in different places in the code are set to the same value and serve the same purpose, decide on one name and apply it throughout
  • 🚨 Rename all *_SCHEME_AND_HOST env vars to be *_URL (and *_SERVER when protocol is not expected to be at the beginning):
    • Remove hard coded “http” anywhere but in environment files in preparation for HTTPS options
  • 🚨 Clarify *_PUBLIC_URL vs internal *_URL and carefully consider naming:
    • Distinguish whether a full server URL or server name+port is particular to the Docker network or must be public. This should be further clarified in the distribution.
    • Consider optional *_PUBLIC_URLs and standard *_URLs (e.g. WIKIBASE_URL and WIKIBASE_PUBLIC_URL, with only WIKIBASE_URL required)
  • 🚨 Rename QS_* to QUICKSTATEMENTS_* to match naming used in Docker files, and reduce confusion with WDQS_* variables
  • 🚨 Rename WB_PROPERTY_NAMESPACE to QUICKSTATEMENTS_PROPERTY_NAMESPACE, etc. These variables are specific to QuickStatements, which is only one Wikibase service (vs Elasticsearch, etc).
  • 🚨 Rename all MW_WG_* variables to simply MW_*, unless there is a compelling case for keeping the WG
  • 🚨 Create an “Upgrading from X to Y version” section in example/README.md and the relevant Docker/*/README.mds detailing the environment variable name changes
  • 🚨 Note the change of some public-facing env var names under a “BREAKING CHANGE” note in CHANGELOG for this version, either pointing to the related READMEs for details or directly documenting there
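
The renames proposed in the list above could be captured in a small migration helper that users run against their existing environment file when upgrading. The following is only a minimal sketch under stated assumptions: the rule set mirrors the bullets above, but the function names (`rename_var`, `migrate_env`) and the sample variable names in the usage note are hypothetical and are not taken from the actual distribution.

```python
import re

# Hypothetical rename rules drawn from the proposals above; the exact
# variable names in a real release may differ.
EXACT_RENAMES = {
    "WB_PROPERTY_NAMESPACE": "QUICKSTATEMENTS_PROPERTY_NAMESPACE",
}
PATTERN_RENAMES = [
    (re.compile(r"^QS_"), "QUICKSTATEMENTS_"),   # anchored, so WDQS_* is untouched
    (re.compile(r"^MW_WG_"), "MW_"),
    (re.compile(r"_SCHEME_AND_HOST$"), "_URL"),
]


def rename_var(name: str) -> str:
    """Return the proposed new name for one environment variable."""
    if name in EXACT_RENAMES:
        return EXACT_RENAMES[name]
    for pattern, replacement in PATTERN_RENAMES:
        name = pattern.sub(replacement, name)
    return name


def migrate_env(text: str) -> str:
    """Rewrite variable names in .env-style text, keeping values and comments."""
    out = []
    for line in text.splitlines():
        stripped = line.strip()
        # Blank lines, comments, and lines without an assignment pass through.
        if not stripped or stripped.startswith("#") or "=" not in line:
            out.append(line)
            continue
        name, value = line.split("=", 1)
        out.append(f"{rename_var(name.strip())}={value}")
    return "\n".join(out)
```

For example, a made-up name such as `QS_PUBLIC_SCHEME_AND_HOST` would become `QUICKSTATEMENTS_PUBLIC_URL`, while `WDQS_HOST` is left alone because the `QS_` rule only matches at the start of the name. The sketch does not decide between `*_URL` and `*_SERVER`; that choice depends on whether the value carries a protocol and would still need a human decision.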

Event Timeline

Restricted Application changed the subtype of this task from "Task" to "Spike". · Aug 7 2023, 9:26 AM
lojo_wmde renamed this task from "What is the best way to distribute wikibase suite" to "What is the best way to distribute Wikibase Suite?". · Aug 7 2023, 9:27 AM

Consider the notes in the first comment on this ticket: https://phabricator.wikimedia.org/T343429

This comment was removed by lojo_wmde.
lojo_wmde updated the task description. (Show Details)

The questions this spike needs to answer:

  1. Do we release and support our Docker Compose example configuration as a valid basis for a production configuration?
  2. Do we release and support in some way official upgrades to our Docker Compose example configuration?
  3. Do we release and support our Docker Images to be utilised outside of our provided "example" configuration?
  4. Do we release and support our Docker Images to be utilised independent of each other? (e.g. QuickStatements or WDQS Images used against a non-Docker Wikibase installation, or vice versa)
  5. Do we release and support our Docker Images to work only together all at the same version, or can Images be upgraded independently of each other?
  6. Do we support and/or recommend "local" testing using the example Wikibase Suite configuration to try it out?
  7. Do we release a "vanilla" MediaWiki + Wikibase Image?
  8. Do we provide a reverse proxy service and configuration in the Example?
  9. Do we maintain a "basic" and "extra" Docker Compose configuration in our Example, or only one configuration with comments and notes on how to selectively disable services not wanted or used?

After coming to clear "for now" answers on each of the above, I think we will have a clearer sense for focusing and ordering subsequent work, and will be clear on what we need to complete the versioning scheme discussion and work.

lojo_wmde renamed this task from "What is the best way to distribute Wikibase Suite?" to "What do we release and support? / What is Wikibase Suite?". · Feb 20 2024, 10:58 AM

| topic | current | PM first gut check | thinking |
| 1. docker compose as valid production basis | n | y | required to config. assumption: if we provide it, people will try to use it |
| 2. official docker compose upgrades | n | y but | we should make it clear when improvements happen. not sure what best way to do this is. |
| 3. Docker images Oo Example Config | n/a | ? | don't understand this. are we talking about people being able to use their own config? are there a range of options here? ultimately configurability is a core element that sets us apart from cloud so we are generally interested in opportunities to make good there (particular resource and priorities aside). |
| 4. docker images used independently of ea other | n | n | very appealing eventually but not a proven need yet. and supporting this would increase complexity away from core offering. |
| 5. docker images up-gradable independently | n | n | again just seems out of scope. also feels like an edge case but either way i think loses re increasing complexity. |
| 6. local testing and config | n | n | same thinking. see entry in product decision sheet |
| 7. (new) do we ship a vanilla/basic version of just wikibase + mediawiki | y | n | the wikibase extension isn't particularly useful on its own. if people want to adapt their install we should provide this through config of base offering instead of increasing number of products we support. i say this assuming it's less work. am i wrong? |

i see an underlying question of 'how much do we want to treat individual components as succinct, free-standing products?' and in general i would say we don't. we're trying right now to slim down to do the most essential version of our product well. when we learn about clear needs, we can address them with intention. until then i dont want to spread us thin supporting a really wide set of deliverables.

core offering: docker package which contains all the essential features* required to run a production instance of wikibase focused on LOD.

*this is a definition we need to pursue more clarity on through research, for the time being the working assumption is the current core offering (wikibase/wikibase-bundle) is self defining.

questions:

A. my understanding is wikibase/wikibase-bundle contains wdqs and qs and not just MW + extensions. is that right?
B. re continuing to distribute individual docker images... can we see any stats on dockerhub or github re usage of individual images? forks on gh?

#3 : Implicitly the Docker Images can be used outside of our Example Docker Compose configuration. They are currently documented in a way that doesn't assume they are used together or only in our example configuration. We have the opportunity to make this clearer one way or another. By "support" I mean: do we provide documentation and support for configurations not based upon our own example?

#4 : It is something we should keep an eye out for. Currently our images are not particularly coupled to each other, if at all. They are brought together through a (hopefully) clear configuration API via environment variables. This decoupling is healthy technically, and likely something we'll want to continue to maintain regardless. However, it is good to know that we don't need to resist making reference to the other Images in the stack in the related documentation/READMEs for each image.

#5 : This is the one I started talking about on MM. I do actually think that in terms of operations it's important to consider being able to upgrade each service in the stack independently. It is a bit laborious and potentially misleading, both for our release process and for operations, to be releasing all images every time even when there are no changes to the underlying services. @jon_amar-WMDE let's sync on this one more to make sure we're seeing each other.

#6 : Our current example configuration is, debatably, somewhat oriented around local testing. If we no longer support that and gear the example towards a production configuration "out-of-the-box", we can make the whole example more focused and clear. I'm for this. In the end I think I've framed this unnecessarily as "do we support local testing". We simply don't focus on that scenario, and when someone is testing locally they are implicitly testing-out a production configuration. We may put a comment or a mention in the docs about the 1-2 things to pay attention to when the services are not behind a reverse-proxy (locally or on production).

#7 : I think on the dev side we'd all be delighted to deprecate the Wikibase Base image. It would help focus some things in the code, documentation, and example, and overall be a win in my book. Is this in the "Product Decision Sheet" as well? Can we go ahead with deprecating it?

#8 : @jon_amar-WMDE please add your thoughts (maybe just as an edit above?) re. the reverse proxy. DM me on MM if you have questions or want a walkthrough of the implications technically on that.


Answers to your questions:

A. Yes, Wikibase Bundle includes all extensions needed to work in concert with the provided additional services (WDQS, QuickStatements, and Elasticsearch). The base image doesn't.

B. In the end we distribute individual Docker Images for each service for architectural reasons. They are run independently, implicitly and on purpose. However, I don't think our stats would give us clear information as to whether an image is being used independently, without the other images, at the same time. For instance, if there are more downloads of the Wikibase Bundle image, that wouldn't on its own be clear information as to whether it was being used independently.

Another question:

  1. Is the Example an example or a Reference Implementation, Suggested Configuration, Configuration Template, or something else? Calling it Example implies to me that it is not to be used for anything other than to create your own version. I think that isn't actually entirely incorrect, but we have been increasingly talking about Example as something that is more fixed and upgradable.

first off i want to make clear that i offered my notes as conversation starters. if devs come to feel differently in your initial alignment than my "first gut checks" that's totally cool. i suspect mostly if you do it will be because youre operating with better information than me here. regardless, we'll sort it out when we align all together.

to that end, im offering another pass in response to loren

  1. "Do we release and support our Docker Images to be utilised outside of our provided "example" configuration?"

Implicitly the Docker Images can be used outside of our Example Docker Compose configuration

thanks for the clarification. after you showing me around the config today that's a lot more clear to me.

By "support" I mean do we provide documentation and support of configurations not based upon our own example.

for now i would avoid this. the intent of the example is for it to be a sensible starting point. im wary of branching support paths.

that said, as customization is a ksp i would like to understand more about what sort of breadth of departure we might encounter from our example dir. e.g. people using a different collection of docker images, different extensions, different variables which they need available to set up their env?

  1. sounds good
  1. i think i misunderstood the original question, i was thinking of upgrading from an end user perspective. if it saves our team's time to be able to upgrade and ship services separately then that seems fine. would be interested to hear if there are pros and cons here or if it's a simple win.
  1. after getting a walkthrough of the example dir today (thanks for that!) the local and "tire kicking" (aka not production) orientation of current example dir is a lot more clear to me. ultimately im trying to orient us towards getting people onto production. based on our convo that sounds like a more normal setup for these example configs. if so then perfect. let's re-orient to that as our new default and then consider separately decisions around expanding support to other use-cases. e.g. reverse proxy support (now added to our product decision sheet as pending)
  1. deprecating the Wikibase Base image. it's in the product decisions sheet as pending. i would love to understand how that overlaps w the reality that you can grab single images and use them on their own already. like how much do we offer a sort of hacky or more expert path here by default. that said, i will put my cards on the table which is that im still feeling bullish on deprecating this. i wouldnt remove any code until we all sync and make it official but if it helps to assume it will go in discussions (esp if that is where you all land as being your preference) then i think it's a pretty safe bet.
  1. i added it as pending to the product decision sheet. in addition to notes in #6, im thinking we should document a tip here to lead people in the right direction but in terms of approaching this as a feature i would need to get more tangible on technical options, costs, and robustness here. i would see it in the realm of config overall and something we could approach on its own merits in that scope.
  1. these are great questions. if you have time i would love to get a list of definitions from the dev team on this... as i think these are the sorts of categories that can mean different things to different people.

riffing for a minute, holding loosely to this: i have been thinking of this as a sort of copy paste, change the variables to suit your environment, and put it on production offering. like when i do a wordpress install, im expected to make my edits to the main config file. anything else is more advanced usage and right now while there is minimum bar of expertise required we're also not expecting people to be able to DIY this stuff as a baseline skill set. so it should be available to use as-is (required env changes aside).

would be very interested to hear how that lands.

#1: docker compose as valid production basis?
Surely, from my limited experience, I'd say "why not?", since it sounds like something very useful that would lower the entry barrier for users, not only to test the product locally but also to install it on production in a more streamlined manner.

Open questions:

  • have users talked or asked about this ever?
  • how expensive would it be to make this happen?
  • how much impact would it really bring?

#2: official upgrades to our docker compose example configuration?
I only see this as beneficial if we were to put such a big focus on the example configuration that we foresee it changing much more often than the rest of the code. Is that so?

Open questions:

  • if so, how much is the impact on users we can foresee ?

#3: users can use docker images outside our example config
I am not 100% sure I understand this one fully, so bear with me and keep that in mind. I would go and say that supporting "any" configuration would be a deep rabbit hole, where I see a difficult balance between the work it can involve and the impact on our users.

Open questions:

  • have users ever mentioned this? (I would like @Tarrow to answer this if he remembers from his time :) )

#4: release and support docker images to be used independently of each other
This makes a lot of sense in my eyes for our current extensions knowing that we don't have any control on their development. This way we could ship newer versions of QS or WDQS without needing to provide a whole release where the rest of the code has barely changed. This would make the update process easier for users - as in they would not need to do a full update, they could just replace the old QS or WDQS images with the new ones. We, of course, need to provide guarantees and documentation on which images work with which wbsuite version(s).
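
One way to picture the "which images work with which wbsuite version(s)" guarantee is a per-release manifest that pins one image tag per service. The sketch below is only an illustration under stated assumptions: the manifest structure, the function names, and every version number in it are made up, and nothing here reflects an actual release.

```python
# All version numbers and tags below are made up for illustration.
RELEASES = {
    "1.0.0": {"wikibase": "1.0.0", "wdqs": "1.0.0", "quickstatements": "1.0.0"},
    "1.1.0": {"wikibase": "1.1.0", "wdqs": "1.0.0", "quickstatements": "1.0.1"},
}


def images_to_update(current: str, target: str) -> dict[str, str]:
    """Return only the images whose tag differs between two suite releases."""
    old, new = RELEASES[current], RELEASES[target]
    return {name: tag for name, tag in new.items() if old.get(name) != tag}


def is_supported(suite_version: str, deployed: dict[str, str]) -> bool:
    """True if the deployed image tags exactly match a published release."""
    return RELEASES.get(suite_version) == deployed
```

With this made-up data, moving from 1.0.0 to 1.1.0 would only require pulling new `wikibase` and `quickstatements` images, while `wdqs` stays untouched. The same manifest also gives a simple answer to "is this combination supported?" without having to test every possible mix of tags.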

This may get very useful when we start releasing user features catering for specific types of users each time. The users not interested in feature X could just ignore the updates on the X feature image. To be super concrete, not everyone is interested in merging data so if we release a merging data feature only those will need to download it and update it and such.

Also this is a core aspect that I am thinking of when we start creating user facing features: we want (this we have already discussed) to give the opt-out option - this mechanism would be great for that. I would also go for building a mechanism (this one inside our core bundle) where users could not only add the features that we provide but we would also provide an API so that they could make their own enhancements in a more encapsulated/easy way.

Also, thinking of the development cycle this would make our life easier since we could release updates to user feature images again without needing to release the whole product, hence, sparing the users going through the update process.

Open questions:

  • how easy or difficult would it be to revert this process, say if we decide one thing now and want to change our decision later on?

#5. upgrade docker images independently from each other

already answered above, under 4

#6. support local testing using the example config
From what we know, many users first try the product out by themselves on their own local machines before even bringing the idea to their teams or superiors. We also know (not sure now whether this comes from new or old surveys) that many desist due to both the difficulty of the installation process and the fact that they cannot fully test it on their local machines. Based on this I would say yes to this and put some of our time into it.

Open questions:

  • to what extent do users want to test it?
  • what about providing some example content that they can play around with, like a basic set of items, properties, etc.

#7. release a vanilla wikibase + mediawiki
yes. This would be the wbSuite core, although I would add the mechanism I mentioned in #4 to add/remove user features. Users could then opt out of the QS and WDQS extensions, as well as the user features, if they wished. That would make a much slimmer package, easier to handle, manage and maintain.

#8: provide a reverse proxy service and configuration in the Example?
Without this the QS cannot be tested locally.

#9: with what purpose was the example directory really created?
I happily ping @Tarrow to enlighten us here

Thanks @darthmon_wmde!

A few comments back / clarifications:

#6: Agreed on this one. Configuring and running WBS on one's own machine before transcribing that configuration and trying to run it on a server is a likely first-mile story for many if not most users. It is my understanding from @jon_amar-WMDE that they currently are leaning towards us "not supporting" a local installation. I personally think as we come to more mutual understanding on the things that we're talking about that this position will shift. The distinction of "local" vs "server" with regards to a Docker setup is not necessarily describing anything of substance other than the possibility of using the loopback address (127.0.0.1) or localhost.
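
To make that distinction concrete: whether a configuration counts as "local" can be reduced to checking if its configured URL points at the loopback interface. The following is a minimal sketch, assuming the public address is available as a full URL (for instance via a variable like the proposed WIKIBASE_PUBLIC_URL); the function name is hypothetical and the check deliberately performs no DNS lookup.

```python
import ipaddress
from urllib.parse import urlsplit


def is_loopback_url(url: str) -> bool:
    """True if the URL's host is 'localhost' or a loopback IP address."""
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost" or host.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Not an IP literal; treat any other name as non-loopback (no DNS lookup).
        return False
```

A check like this could be used in documentation or startup scripts to print the one or two caveats that apply when the services are reached via `127.0.0.1` or `localhost`, rather than maintaining a separate "local" configuration.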

#7: Clarification on this one: I think here what was being referred to is our current Wikibase Base image. This image is not currently configurable to support the auxiliary services you mention, but Wikibase Bundle is. The Bundle image probably wants refinement to make it more configurable and clear how to disable some services (e.g. Elasticsearch in particular), but as-is the question here was meant to ask whether there is any case for keeping the current Wikibase Base image. The presumption is that there isn't, and this is currently a "draft" decision on the Product Decision Google Sheet.

#8: The reverse proxy is a service that in some form would need to exist above the WBS services in any configuration. The product feature here is about whether we provide an optional "ready to go" reverse proxy in our Docker Compose services for WBS. After having recently configured things that way and finding it fairly unobtrusive and highly functional, I do tend to think we should add this to our stack.

It is also correct that without a reverse proxy and routed URLs that QuickStatements will not Auth, but this is somewhat unnecessarily tied to being a "local" configuration requirement. It is just that in a Production configuration we're assuming there will already be a reverse proxy of some sort routing public addresses to the services which can be used for the QuickStatements OAuth configuration.

#9: I have researched this and found the conversations here on Phabricator that led to its creation. It clearly started as simply an "example" of getting everything working together, and all the current disclaimers about it not being "ready for production" came along with that. It is clear to me that it is and will be used as a starting place for production configurations, and I think that is correct, and we should be supporting it. Most of the current considerations on the example are, for me, about how to make that configuration a more appropriate, clearer, and better documented starting place for a production configuration. But before that work can be completed we need alignment and understanding across the team of this intention and the tactics for getting there.

  1. Do we release and support our Docker Compose example configuration as a valid basis for a production configuration?

Yes, this is what I would expect when planning to host a wikibase.

  2. Do we release and support in some way official upgrades to our Docker Compose example configuration?

Technically not trivial, but should be the goal in order to allow users to follow along our developments with as little maintenance as possible while keeping their customization.

  3. Do we release and support our Docker Images to be utilized outside of our provided "example" configuration?

I think we have to. Basically any customization of the example config will be that.

  4. Do we release and support our Docker Images to be utilized independent of each other? (e.g. QuickStatements or WDQS Images used against a non-Docker Wikibase installation, or vice versa)

We are close to having the ENV var interface properly documented in image specific READMEs. I think this is good and should be kept and improved. I think it is also good practice to not tie the containers together in any way. So technically it should be a goal to make the containers independent of each other. Still, I would not officially support or advertise this in order to keep it simple for us now. Is this a requirement our users have? Is cloud still using one of our containers?

  5. Do we release and support our Docker Images to work only together all at the same version, or can Images be upgraded independently of each other?

We can only test so many combinations. Therefore I'd say we only support one set of images per release. I see the point to optimize and not wanting to rebuild and redistribute unchanged containers. Happy to do some analysis there, not urgent imho. I think redownloading all images on release is acceptable.

  6. Do we support and/or recommend "local" testing using the example Wikibase Suite configuration to try it out?

I would like to target production setups only. I think it is ok if a publicly routable FQDN is a requirement. Good documentation can still guide users to test on a local machine.

  7. Do we release a "vanilla" MediaWiki + Wikibase Image?

I personally cannot see why we should do that. What is it used for? Who uses that? It might be a technical vehicle to allow extension set customization, but I think we are not there yet.

  8. Do we provide a reverse proxy service and configuration in the Example?

I like it, as it makes a "real" production setup easier. See 1.

  9. Do we maintain a "basic" and "extra" Docker Compose configuration in our Example, or only one configuration with comments and notes on how to selectively disable services not wanted or used?

I'd like to combine them in order to reduce the choices users have to make when getting started. AFAIK wikibase without "extra" is not usable anyway and the separation is quite academic.

Update: So far, team discussions under this heading have focused on getting on the same page about our product and the related user journeys described by the "release", "adoption", and "operation" categories coming from the roadmap work by @jon_amar-WMDE and me. From that perspective we've been reviewing what we currently release, and have focused on identifying what needs to be fixed or improved in the setup as-is, before zooming out to these larger questions.

lojo_wmde changed the task status from Open to In Progress.Mar 5 2024, 12:47 PM
So technically it should be a goal to make the containers independent of each other. Still, I would not officially support or advertise this in order to keep it simple for us now. Is this a requirement our users have?

From a user perspective, I may wish to use low-cost (or free-tier) hosting for peripheral elements like QS, Cradle, WDQS UI and perhaps even WDQS and its updater, as these are likely to be used only intermittently, and with an anticipated low volume of edits. Think Cloud Run, App Service, etc. This also lowers the attack surface on our dedicated servers. As such it'd be a "nice to have" for these to [continue to] be minimally linked and usable between datacenters, without a commitment to fully support with examples, etc.

WDQS and updater share a store and are heavier, so might need to be co-located, but I could (for example) see them being on a dedicated server in Texas to make good use of available SSD and RAM (it's also right next to the codfw datacenter for federated queries), while WDQS UI and an instance of the schema validation tool are on GCP Iowa on App Service and wikibase repo/UI/ES is on a server in Quebec (likely running from Git as part of a wiki farm like Wikipedia has traditionally, rather than a Docker setup).

For others the Wikibase is the whole setup, maybe even with WDQS as the main application, and in that case the containerised suite with a single co-located install is ideal.

We have had a few discussions about these - we have looped the whole cross team in them and we have made the relevant product decisions.
Those decisions are in our internal Product Decisions Sheet and the notes of that discussion are in our internal Meeting Notes Document.

darthmon_wmde claimed this task.
darthmon_wmde moved this task from Doing to Done on the Wikibase Suite Team (Sprint-∞) board.