Investigate options for auto-save feature
Closed, Resolved · Public · 4 Estimated Story Points · Spike

Description

For the auto-save feature wish, engineers have previously discussed the work involved in this project. Review any investigations, discussions, and patches that have happened around this wish.

Acceptance criteria

Provide information about any docs, discussions, patches around this wish so that we can better determine how to move forward with this project.

Event Timeline

HMonroy renamed this task from Epic: gather the auto-saving information to Epic: Review and gather the Auto-save feature wish information. · Mar 16 2023, 5:46 PM
HMonroy set the point value for this task to 4.
HMonroy renamed this task from Epic: Review and gather the Auto-save feature wish information to [4 hour spike] Gather Auto-save feature information. · Mar 16 2023, 5:57 PM
HMonroy added a project: Spike.
HMonroy updated the task description.
Restricted Application changed the subtype of this task from "Task" to "Spike". · Mar 16 2023, 5:57 PM

This wish has been submitted multiple times over the years: https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2016/Categories/Editing#Auto-saving_edits, https://meta.wikimedia.org/wiki/Community_Wishlist_Survey_2022/Editing/Autosave_edited_or_new_unpublished_article.

In 2012, a patch was started to add an autosave feature for edit textareas using localStorage. localStorage is limited to about 5MB and can store only strings. In 2017, @TheDJ abandoned this patch, stating "No progress on this for the past year, abandoning."
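
To illustrate the two localStorage constraints mentioned here, this is a minimal sketch in Python (standing in for browser JavaScript) that flattens a draft into one JSON string, as a string-only store requires, and checks it against a size budget. The 5 MiB figure, the counting in UTF-16 code units, and every name below are assumptions for illustration; real quotas and how they are measured vary by browser.

```python
import json

# Assumed budget: roughly 5 MiB, counted in UTF-16 code units.
# Browsers differ in both the limit and how they count it.
ASSUMED_QUOTA_UNITS = 5 * 1024 * 1024


def serialize_draft(text, summary, base_rev_id):
    """Flatten a draft into one JSON string, since the store only holds strings."""
    return json.dumps({"text": text, "summary": summary, "baseRevId": base_rev_id})


def utf16_units(value):
    """Count UTF-16 code units, which is how JavaScript measures string length."""
    return len(value.encode("utf-16-le")) // 2


def fits_in_quota(serialized, already_used=0):
    """Return True if the string fits in what is left of the assumed budget."""
    return already_used + utf16_units(serialized) <= ASSUMED_QUOTA_UNITS


# A short draft fits easily; a 6 MiB page body does not.
small = serialize_draft("Some wikitext", "fix typo", 12345)
large = serialize_draft("x" * (6 * 1024 * 1024), "", 12345)
```

The budget is shared by everything stored for the origin, which is why `already_used` is a parameter rather than assumed to be zero.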

In 2014, a ticket was submitted for this auto-save feature and there was an attempt to add auto save using IndexedDB. In 2016, @TheDJ abandoned the patch with reason: "For anyone who has time (unlike /me)".

It has also been discussed that sessionStorage is scoped to a single tab, so its contents are not available when you switch to a new tab; localStorage would be an improvement.

In 2022, CommTech opened an investigation to look into ApiStashEdit as a possible solution.

@TheDJ We appreciate any insight you can provide on this problem. Auto-save has been a popular wish over the years, and we are currently trying to figure out if there is a viable solution. We are investigating the complexity so that we can decide whether we (CommTech) can move forward with this project. Thank you!!

Is the idea with using stashedit that it'd be something like the following?

  1. Every 3 seconds the current page text is stashed, and each stashing returns a texthash response. This is the existing behaviour. (It's 1 second while the summary field is focused.)
  2. The base revision wgCurRevisionId and the returned hash is stored in localStorage.
  3. When a page is opened for editing, localStorage is checked for these values and if they're there (and the revision hasn't changed?) a request is made to get the stashed text, with action=stashedit&title=z&stashedtexthash=x&baserevid=y.
  4. If stashed text is returned, it's put into the textbox and we tell the user. If the base revision has changed, we could tell the user and give them the option to overwrite.
  5. When a page is saved, the localStorage item is deleted. It'd also be deleted if, on opening for editing, the saved revision ID doesn't match the current one.
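
The steps above could be modelled roughly as follows. This is a sketch in Python rather than the JavaScript a real implementation would use: a plain dict stands in for localStorage, `FakeStashServer` stands in for the server, and its `fetch` method is hypothetical (it is the text lookup that the stashedit API would need to gain). On a revision mismatch the sketch follows step 4 (offer the text to the user) rather than the deletion mentioned in step 5.

```python
import hashlib


class FakeStashServer:
    """Stand-in for the server side of action=stashedit."""

    def __init__(self):
        self._by_hash = {}

    def stash(self, text):
        # Step 1: store the text and return its SHA-1 hex digest as the texthash.
        text_hash = hashlib.sha1(text.encode("utf-8")).hexdigest()
        self._by_hash[text_hash] = text
        return text_hash

    def fetch(self, text_hash):
        # Hypothetical: returning the text for a hash is not existing behaviour.
        return self._by_hash.get(text_hash)


def remember_stash(local_store, title, base_rev_id, text_hash):
    """Step 2: keep the base revision and hash on the client."""
    local_store[title] = {"baseRevId": base_rev_id, "hash": text_hash}


def recover_on_open(local_store, server, title, current_rev_id):
    """Steps 3 and 4: decide what to do when the edit form is opened."""
    entry = local_store.get(title)
    if entry is None:
        return ("nothing", None)
    text = server.fetch(entry["hash"])
    if text is None:
        # The stash expired or was never stored, so drop the stale pointer.
        del local_store[title]
        return ("nothing", None)
    if entry["baseRevId"] != current_rev_id:
        # The page changed since the stash; the user decides whether to overwrite.
        return ("conflict", text)
    return ("restore", text)


def forget_on_save(local_store, title):
    """Step 5: a successful save makes the stored pointer obsolete."""
    local_store.pop(title, None)
```

For example, after `remember_stash(store, "Sandbox", 100, server.stash("Draft text"))`, calling `recover_on_open(store, server, "Sandbox", 100)` returns `("restore", "Draft text")`, while passing a different current revision ID returns `("conflict", "Draft text")`.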

The third point is where I'm confused: the stashedit API would need to be changed to return the text (at the moment it only returns the status when a texthash is requested) and that's not too hard — but there's nothing that links it to the current user. Do we assume that if someone knows the hash then they're allowed to be given the original text? (The hash is exactly that, a sha1 hash of the submitted text, so that might well make sense.)

I think there are some other flows too, such as for new pages or pages that have been renamed since being stashed.

The other issue is that the parser cache is not very long-lived (24 hours by default I think). Is that long enough? I guess it is, because this feature is meant to be an accident-recovery thing rather than any sort of draft system.


I understand that we don't want to look at server-side storage of page text, because of the issues with people saving private text and passing the user account to someone else who retrieves the data — and so the wiki becomes a means to share private data. However, we do want to store the data server-side because of the size limitations with localStorage, so is this okay? I feel a bit like we'd be just doing the same thing, and effectively creating an API for storing and retrieving arbitrary private data.

Correction: the max stashed edit lifetime is PageEditStash::MAX_CACHE_TTL, currently 5 minutes.

We have had advice from WMF legal that a server-side autosave feature should a) have a limit on how long text is stored (up to 90 days, in line with other data retention); and b) provide access to the stored text in case WMF needs to respond to warrants or subpoenas.


To clarify the key data-storage bits of how this could work as a feature building on top of the existing edit stash system (and all in MediaWiki core):

  • Increase the stashed edit cache lifetime to e.g. half an hour.
  • Change the action=stashedit API to return the text when a hash is passed.
  • Store the hashes (of the main text, summary, and section title) and base revision ID in localStorage.
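
As a rough illustration of these three points, here is a minimal in-memory model in Python. It is not MediaWiki code: `ExpiringStash`, `stash_form`, and the record layout are invented for this sketch, the 30-minute lifetime is the "half an hour" suggested above, and the hash is assumed to be a SHA-1 of the submitted text as described earlier in this task.

```python
import hashlib
import time

ASSUMED_TTL_SECONDS = 30 * 60  # "half an hour" from the proposal above


class ExpiringStash:
    """Text store keyed by hash, where entries stop being readable after a TTL."""

    def __init__(self, ttl_seconds=ASSUMED_TTL_SECONDS, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # hash -> (expiry time, text)

    def put(self, text):
        text_hash = hashlib.sha1(text.encode("utf-8")).hexdigest()
        self._entries[text_hash] = (self._clock() + self._ttl, text)
        return text_hash

    def get(self, text_hash):
        """Return the text for a hash, or None if unknown or expired."""
        entry = self._entries.get(text_hash)
        if entry is None:
            return None
        expires_at, text = entry
        if self._clock() >= expires_at:
            del self._entries[text_hash]
            return None
        return text


def stash_form(stash, base_rev_id, text, summary, section_title):
    """Stash each field and return the record the client would keep locally."""
    return {
        "baseRevId": base_rev_id,
        "text": stash.put(text),
        "summary": stash.put(summary),
        "sectionTitle": stash.put(section_title),
    }
```

Note that the record kept on the client contains only hashes and a revision ID, so its size does not grow with the page text.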

This doesn't tie the stored text to a specific user, so there might be an issue with the requirement for access. Perhaps that's less of a concern if the lifetime is only half an hour rather than three months.

The other thing that we haven't discussed here yet is the possibility of using IndexedDB to store the autosave data locally. I think I'd been thinking that this was out of the question due to browser support requirements, but that's not the case, is it? IndexedDB doesn't have the size limitation of localStorage. It looks like it'd be the easier way to go, and it avoids the legal issues of server-side storage.

This is the first time I've even heard of IndexedDB! https://caniuse.com/?search=indexeddb suggests we are safe to use it with respect to browser compatibility, so this seems like a promising solution! From what I read, IndexedDB is rather low-level. There are libraries that make it easier to work with, but I'm unsure whether they're worthwhile, as they may first need to undergo security review.

More feedback from WMF legal:

I think building upon the edit stash feature provides good privacy safeguards.
The 30-min data retention, combined with the fact that the text hash isn't associated with user metadata significantly reduces the likelihood of potential privacy abuse.

I have a few questions. It is my understanding that the wikitext hash will be sent to the server in chunks. Through a yet-to-be-created API, it will be possible to retrieve the wikitext by submitting the hash.

  • Do you intend to inform end-users through the UI that what they're typing will be stored, albeit for a short period?
  • Is the hash size large enough that it will be highly improbable that a user retrieves a wikitext which they did not author?
  • Within the 30-min window, who would have access to the stored hash?
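
On the hash-size question, a back-of-the-envelope calculation may help. The sketch below assumes the hash is a 160-bit SHA-1 digest (as described earlier in this task) and bounds the chance that blind guessing hits any live stash entry; the load figures are hypothetical, and the bound says nothing about someone who already knows the text and can therefore compute its hash.

```python
from fractions import Fraction

SHA1_BITS = 160  # SHA-1 digests are 160 bits, i.e. 40 hex characters


def guess_hit_probability(live_entries, guesses):
    """Upper bound on the chance that any of `guesses` random hashes hits a live entry."""
    space = 2 ** SHA1_BITS
    return min(Fraction(1), Fraction(live_entries * guesses, space))


# Hypothetical load: one million live stashes, one billion guesses in the window.
p = guess_hit_probability(10**6, 10**9)
print(float(p))
```

Even with those generous numbers the bound comes out at roughly 7 × 10^-34, so retrieving someone else's text by guessing a hash is not a realistic risk under these assumptions.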

It's an interesting point about telling users. I suspect not very many people know that edit stash already does this!


And yeah, let's figure out if cross-device recovery is needed, and if not then maybe IndexedDB is the way to go.

MusikAnimal renamed this task from [4 hour spike] Gather Auto-save feature information to Investigate options for auto-save feature. · Jun 28 2023, 6:22 PM

It looks like client-side storage is the best way to go, using IndexedDB (comments welcome on T340541). It's going to be simpler than server-side storage, and it will work in more useful situations. This feature should be implemented in core.

Proposed next tasks:

  • Decide on a name for this feature ('edit recovery'?)
  • Set up a Phabricator project
  • Add feature flag e.g. $wgEnableEditRecovery
  • Create basic functionality to save in-progress form contents to IndexedDB and restore it when opening the edit form
  • Figure out the rest of the feature's design and operation…

Although, regarding server- vs client-side storage, the former was asked for in the original wish:

Saving these edits online (to Wiki servers), so as to allow the user to carry on working on one page in multiple sittings/across multiple devices. (Just to clarify: until published by the user, these edits should remain private and not visible to anyone else than the user in question).

Thank you @Samwilson! I agree. I read through the implementation options and T340541: Investigate IndexedDB as a data store for the auto-save wish, and the client-side IndexedDB solution sounds viable. It's a great starting point and it would also eliminate the legal implications.
We can iterate on this solution as we receive feedback.

We've renamed the feature; got a new Phab board, Edit Recovery; and further work will be happening in: T341844: Create MVP for Edit Recovery

Cross-device recovery will not be supported at the moment.

Other follow-up work should be created as separate tasks; this one is, I think, all done.