
[Idea] Introduce a way to empty a Wikibase whilst preserving it
Open, Needs Triage, Public

Description

Use case: People sign up on Wikibase.cloud to explore Wikibase, experiment with how to model their data, and evaluate whether Wikibase(.cloud) is the solution for them. This experimentation can result in needing a fresh start to re-build the Wikibase with the newly gained experience and do it properly this time.
Secondly, even without the explicit intent of experimentation, people might end up making errors in their Wikibase that are too complicated, too large in scale, etc. to be corrected, and so need a fresh start as well.

Idea: We've had multiple requests from users to "reset" their Wikibase. Currently the work-around is to delete the Wikibase and start over with a new one, but the user then loses the domain they used. Therefore, it was suggested to give users a feature that allows them to empty their Wikibase of all entities whilst preserving the Wikibase domain. This does not include selective removal, only emptying the entire thing.

Concern: Fully deleting data might remove links that others have created in the future Linked Open Data web; one person's experimental data might be another's foundational data. Furthermore, where does the ability to mass delete in such a way end?

Solutions to consider:

  1. Explicitly making the decision to fully delete the data, potentially severing links, based on the knowledge that this data is fully experimental and the deletion was a very conscious decision made by the owner of the Wikibase (potentially add some education towards the user as to what impact this might have, and a limit to when this is possible, e.g. Wikibase age, size etc.)
  2. Allow users to delete the data while preserving it in history. This would result in users re-starting from e.g. Q50 instead of Q1, but would otherwise not affect the experience
  3. Don't introduce the feature, instead introduce a heads up to users when creating the Wikibase, that it's not possible to re-start. We'd recommend them making use of the limit of 6 Wikibases, so that they can have an experimental Wikibase and the "real deal" running in parallel
  4. Don't introduce the feature, instead allow users to create a separate Sandbox Wikibase during the first onboarding. This will allow people to play around as much as they want without it being part of the ecosystem just yet
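
The practical difference between options 1 and 2 is what happens to the entity ID counter. A minimal sketch, assuming a hypothetical in-memory `ItemStore` (an illustration only, not the Wikibase data model or API), shows a full delete handing out Q1 again while a history-preserving empty continues from the next unused ID:

```python
class ItemStore:
    """Hypothetical in-memory stand-in for a Wikibase's items; not a real Wikibase API."""

    def __init__(self):
        self.live = {}        # item ID -> label, currently visible
        self.history = {}     # item ID -> label, deleted but still recorded
        self.next_number = 1  # numeric part of the next Q-ID

    def create(self, label):
        item_id = f"Q{self.next_number}"
        self.next_number += 1
        self.live[item_id] = label
        return item_id

    def hard_reset(self):
        """Option 1: drop everything, including history, and restart at Q1."""
        self.live.clear()
        self.history.clear()
        self.next_number = 1

    def empty_keep_history(self):
        """Option 2: hide all items but keep them in history; IDs are never reused."""
        self.history.update(self.live)
        self.live.clear()


store = ItemStore()
for n in range(49):
    store.create(f"experiment {n}")

store.empty_keep_history()
print(store.create("first real item"))  # Q50

store.hard_reset()
print(store.create("first real item"))  # Q1
```

Under option 2 any external link to Q1 through Q49 still resolves to a recorded (deleted) item, whereas under option 1 the same IDs can later point at unrelated data.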

AC:

  • Users can easily remove all entities of the Wikibase themselves
  • The feature should live in the platform UI
  • There should be a double-check to confirm "are you sure.."
  • The count of deleted entities should be trackable for internal analytics purposes
  • This feature is only available to the admin of the Wikibase
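
Read together, the criteria above describe a guarded bulk delete. The following is only a sketch under assumptions: `Wikibase`, `empty_wikibase`, the example domain, and the choice of re-typing the domain as the confirmation step are all hypothetical illustrations, not part of the wikibase.cloud platform.

```python
from dataclasses import dataclass, field


@dataclass
class Wikibase:
    """Hypothetical model of one hosted Wikibase; not a wikibase.cloud type."""
    domain: str
    admin: str
    entities: set = field(default_factory=set)


def empty_wikibase(wiki, requested_by, typed_confirmation):
    """Remove every entity and return how many were deleted.

    Only the admin may call this, and they must re-type the domain
    as the "are you sure" step (an assumed confirmation mechanism).
    """
    if requested_by != wiki.admin:
        raise PermissionError("only the Wikibase admin can empty it")
    if typed_confirmation != wiki.domain:
        raise ValueError("confirmation does not match the Wikibase domain")
    deleted = len(wiki.entities)
    wiki.entities.clear()  # the domain itself is preserved
    return deleted         # count that could be reported to internal analytics


wiki = Wikibase("example.wikibase.cloud", "alice", {"Q1", "Q2", "P1"})
print(empty_wikibase(wiki, "alice", "example.wikibase.cloud"))  # 3
```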

Event Timeline

I think this feature would be quite useful. I am currently experimenting with spinning up a Wikibase instance based on "rules" from a git repo and would really like to be able to start over every now and then. This should be possible via an API call, and it should also reset the ID counters. For self-hosted Wikibase instances there is the following guide: https://www.wikibase.consulting/deleting-all-wikibase-items-and-or-properties/. I have not tried it, but it seems to indicate some demand.

In the refinement session we talked at length about this issue. I brought up that I saw a possible conflict between providing the reset ability and wikibase.cloud wikis being full members of the Linked Open Data web.

If we allow people to wipe their whole wiki without even trying to use the Wikibase tooling to show what happened, it could be hard for people to feel comfortable linking to these Wikibases. I proposed some alternative solutions, such as allowing people to delete their entities but not reset the counters. This may inconvenience the wiki owner, since from that point on they would next be creating Q12432 rather than Q1, which could be jarring.

@Charlie_WMDE was making the point that if the user wants to delete this data then it is probably experimental, and therefore other people shouldn't be linking to it or relying on it anyway. She was pointing out that for the owner who wants their Wikibase to be reset this seems like an unfair burden to carry, given that they just thought they were experimenting and perhaps didn't realise that they were forever "losing" Q1 or P1 of that specific domain.

I then failed to go on to say that this might be a user education issue, and that if we were to offer this feature as described we might want to make it clear to the Wikibase owner how they might be inconveniencing the community if other people were relying on that data.

I also suggested, as we have discussed before, that perhaps we could implement a one-time amnesty for people who really don't want to lose their domain but do want to do this kind of "hard" reset.

A few thoughts from my side,
Defining "emptying" would be useful here. Does this mean just entities? Users and wikitext pages as well? Other stuff too? Which of these is actually most useful for users?

We should also consider at least allowing users to reuse their own domains. This is how things would be "outside of wikibase.cloud": folks would be free to set up a Wikibase on a domain, trash it and start again.
Originally, disallowing domain name reuse was mainly meant to stop other people taking over the domains of deleted Wikibases, i.e. User A makes foo.cloud, then deletes it, then User B makes it, "hijacking" the name.
I agree that it makes sense to state why this may not be wanted in some cases, but I can also see why creating and trashing a Wikibase 10 times in a day when you first set it up sounds fine, even with the data reuse topic flagged up.
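
The reuse rule suggested here can be stated as a simple check: a deleted domain stays reserved for the user who last owned it. This is a sketch with a hypothetical `DomainRegistry` and made-up user names; it does not reflect how wikibase.cloud actually stores domains.

```python
class DomainRegistry:
    """Hypothetical registry that reserves deleted domains for their former owner."""

    def __init__(self):
        self.active = {}    # domain -> current owner
        self.released = {}  # deleted domain -> former owner

    def create(self, domain, user):
        if domain in self.active:
            raise ValueError(f"{domain} is already in use")
        former_owner = self.released.get(domain)
        if former_owner is not None and former_owner != user:
            raise PermissionError(f"{domain} is reserved for its former owner")
        self.released.pop(domain, None)
        self.active[domain] = user

    def delete(self, domain, user):
        if self.active.get(domain) != user:
            raise PermissionError("only the owner can delete a Wikibase")
        del self.active[domain]
        self.released[domain] = user


registry = DomainRegistry()
registry.create("foo.cloud", "user_a")
registry.delete("foo.cloud", "user_a")
registry.create("foo.cloud", "user_a")  # allowed: the same user starts over
```

With this rule, User B trying to create foo.cloud after User A deleted it would get a `PermissionError`, which covers the "hijacking" case without blocking the original owner.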

Some middle ground between these ideas would perhaps be to have a button that, after 1 week, after 1 month, or maybe just right away, warns the user why they might not want to delete everything OR why they might not want to reuse a domain, depending on whether emptying or reusing is chosen as the path forward.

Generally I'd compare this to something like the GitHub or WordPress platform.
Both platforms allow you to reuse names (repositories or domains) within a user account.
GitHub then has a rule for users that reads something like "Once you delete, that name can never be used again for security reasons." (similar to not allowing users to use each other's domains)

Yeah, I think I agree in general that there's a careful balance to be struck.

To provide another external comparison though: npm - this doesn't
allow deletion after a certain (rather short) period of time. I would
suggest this is because people specifically depend on the names;
whereas in an ideal world git is meant to be a distributed store.

Also git will inherently warn you if you try to "pull" from a data
source that is not following a similar historical path to your data.
This versioning system isn't there for an LOD client, like another
Blazegraph instance pointing at this query service, so the
"dependency user" would just see any of their federated queries break.

Given that users can currently make and delete an unlimited number of
wikibases we should really be nudging them towards this if they want
to constantly create and then trash a wikibase rather than reuse.

tl;dr I think we should probably allow users to do this, but we need to
make it clear to them why, in general, this isn't a great idea and is
sort of "selfish" if other people are (or they hope people will be)
depending on their data, and that a better alternative would be to
create some temporary test wikibases.

After my experience trying to learn how to make real use of Wikibase, I find the need to destroy instances, especially to restart the item/property numbering. Today I found I can't reuse the subdomain name of a destroyed instance, so now I think it would be helpful to be able to blank/reset an instance. For example, in our current project we'll have a lot of testing to do.

So, I'd welcome a simple feature for the complete reset of an instance.

Thanks.