
Moving associated content out of the wikitext
Open, Needs Triage, Public

Description

Type of activity: Pre-scheduled session
Main topic: Handling wiki content beyond plaintext

The problem

There is a need to associate information with certain parts of a page without actually storing this information in the wikitext. Such information would need to remain correctly associated if the contents of the page are rearranged, and be invalidated when the source text has changed sufficiently.

Examples of this include:

  • ContentTranslation (CX): the translation restore feature, which allows translators to continue the translation they saved earlier.
  • MediaWiki-extensions-Translate: instead of the current solution of <!--T:1--> comments in the wikitext.
  • Wikispeech: For storing annotations related to the pronunciation of individual words or phrases.

Such information may be unsuitable for storage in the wikitext because it makes editing harder for users who are not interested in it (and who edit the wikitext directly), and because it risks being deleted or de-associated by users who are unaware of its purpose.

Enabling this would most likely require stable element IDs.
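As a point of contrast with stored element IDs, here is a minimal sketch of a content-derived key that meets the two requirements from the problem statement: an annotation follows its paragraph when paragraphs are reordered, and is dropped once that paragraph's text changes. It assumes paragraphs are separated by blank lines and that *any* edit to a paragraph invalidates its annotations (a stricter rule than "changed sufficiently"). The helper names are hypothetical and not part of any MediaWiki API.

```python
import hashlib


def paragraph_key(text: str) -> str:
    """Content-derived key for one paragraph, with runs of whitespace collapsed."""
    normalised = " ".join(text.split())
    return hashlib.sha256(normalised.encode("utf-8")).hexdigest()[:16]


def split_paragraphs(wikitext: str) -> list[str]:
    """Naive splitter: paragraphs are separated by blank lines."""
    return [p for p in wikitext.split("\n\n") if p.strip()]


def reattach(annotations: dict[str, str], new_wikitext: str) -> dict[int, str]:
    """Map annotations (keyed by paragraph_key) to paragraph indices in new text.

    Annotations whose paragraph no longer occurs verbatim are dropped, i.e.
    treated as invalidated. If a paragraph occurs twice, the later index wins.
    """
    paragraphs = split_paragraphs(new_wikitext)
    index_by_key = {paragraph_key(p): i for i, p in enumerate(paragraphs)}
    return {index_by_key[k]: note for k, note in annotations.items() if k in index_by_key}
```

Reordering the two paragraphs of `"Alpha beta.\n\nGamma delta."` moves an annotation on "Gamma delta." to index 0, while changing it to "Gamma epsilon." drops the annotation. A fuzzier similarity threshold would be needed to implement "changed sufficiently" rather than "changed at all".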

Expected outcome

A roadmap for how to implement this and clear stewardship of the issue, or alternatively a clear decision that these types of features will not be supported (in the foreseeable future).

Current status of the discussion

This was last discussed prior to WikiDev16, but no conclusion was reached, in part due to an unclear use case.

Links

Event Timeline

There might be a partial overlap with T147896.

This has come up over and over, and most recently @cscott has been talking about an external annotation service at https://meta.wikimedia.org/wiki/Grants:IdeaLab/Amazing_Article_Annotations.

In my mind, the critical question to figure out is: given a DOM A and a DOM B (derived from A after edits), if nodes nA (from A) and nB (from B) have the same id, what does that tell us about the properties of nA and nB?

The reason this project has not moved ahead is that different applications have different requirements for these properties. Anyway, I think it would be useful to identify clear application-specific requirements (for multiple applications) and then figure out which parts need to be part of a service, or even part of Parsoid, and which parts will be application-specific and reside in the application.

Here is a Nov 2013 discussion from the erstwhile Parsoid team, which shows that we've been stuck on this same problem since then. Over the next week, I am going to put together a wiki page with notes pulled from all over, so we have a base doc to work from.

Here are some suggested applications for out-of-band annotations:

  • Chat/suggested edits in reading view and VE
  • Translations (replacement of backend of <translate> extension)
  • Original-to-translated text correspondences in CX
  • More fine-grained (phrase-by-phrase) correspondences in CX generated by machine-assisted translation.
  • Pronunciation annotations for wikispeech.
  • Citation regions (there's also an inline proposal, using a heredoc template)
  • Storing proposed edits/Resolving edit conflicts (the edit is stored as an annotation on the base document, then the annotation is migrated to the current version before applying it)
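The "migrate the annotation to the current version" step in the last item can be sketched with Python's standard `difflib`: map a character range in the base text onto the new text, or report that the edit touched it. This is a minimal sketch assuming annotations are plain `(start, end)` offsets into wikitext; a real service would more likely anchor on Parsoid DOM nodes, and `migrate_range` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Optional


def migrate_range(old: str, new: str, start: int, end: int) -> Optional[tuple[int, int]]:
    """Map the half-open range [start, end) of `old` onto `new`.

    Returns the shifted range if the annotated text lies entirely inside a
    single unchanged ('equal') block of the diff, otherwise None.
    """
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal" and i1 <= start and end <= i2:
            offset = j1 - i1  # equal blocks have the same length on both sides
            return start + offset, end + offset
    return None
```

For example, an annotation on "mat" in `"the cat sat on the mat"` is shifted by six characters when the text becomes `"look: the cat sat on the mat"`, but is reported as touched (`None`) when "mat" is replaced by "rug". Treating every touched range as lost is deliberately conservative; a conflict-resolution use case might instead try to re-apply the proposed edit.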

I've made a simple prototype based on the hypothes.is service.

I submitted T149660: Glossaries, pronunciations, and dictionaries and T149667: Build an article annotation service which overlap with this proposal quite a bit. But in particular I believe that "stable IDs" are *not* the preferred implementation approach here, after a bunch of failed attempts to come up with a stable ID definition which is acceptable to all use cases.

Instead I'm proposing (in T149667) a generalized "annotation" service, which associates the annotations with the revision they were initially applied to, allowing the different use cases to implement their preferred (and varying) mechanisms for pulling these annotations forward to the latest revision. That seems more fruitful than trying to come up with a single definition for "stable" which satisfies everyone.
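To make this proposal concrete, here is a sketch of "associate each annotation with the revision it was made on, and let each use case bring it forward". Every name here (`Annotation`, `AnnotationStore`, the migrator signature, `find_again`) is hypothetical and is not the API described in T149667; the page history is stood in for by a plain dict of revision texts.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional


@dataclass(frozen=True)
class Annotation:
    kind: str    # owning application, e.g. "cx" or "wikispeech"
    rev_id: int  # revision the annotation was originally made against
    start: int   # character offsets into that revision's text
    end: int
    body: str


# A migrator maps (text of the original revision, text of the target revision,
# annotation) to an updated annotation, or None if it should be dropped.
Migrator = Callable[[str, str, Annotation], Optional[Annotation]]


class AnnotationStore:
    def __init__(self, revisions: dict[int, str]) -> None:
        self.revisions = revisions  # rev_id -> page text; stands in for page history
        self.annotations: list[Annotation] = []
        self.migrators: dict[str, Migrator] = {}

    def register(self, kind: str, migrator: Migrator) -> None:
        self.migrators[kind] = migrator

    def add(self, ann: Annotation) -> None:
        # Stored as-is: always relative to the revision it was made on.
        self.annotations.append(ann)

    def for_revision(self, rev_id: int) -> list[Annotation]:
        """Project every stored annotation onto rev_id via its owner's migrator."""
        target = self.revisions[rev_id]
        result = []
        for ann in self.annotations:
            if ann.rev_id == rev_id:
                result.append(ann)
                continue
            moved = self.migrators[ann.kind](self.revisions[ann.rev_id], target, ann)
            if moved is not None:
                result.append(replace(moved, rev_id=rev_id))
        return result


def find_again(old: str, new: str, ann: Annotation) -> Optional[Annotation]:
    """Toy migrator: relocate the annotated substring if it occurs exactly once."""
    snippet = old[ann.start:ann.end]
    if new.count(snippet) != 1:
        return None
    pos = new.index(snippet)
    return replace(ann, start=pos, end=pos + len(snippet))
```

The store never rewrites what was saved; migration happens on read and is entirely up to the registered migrator, so CX, Translate and Wikispeech could each plug in a different notion of "still the same place" without agreeing on a shared definition of stability.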

Who would be the person facilitating this session? Please assign this task to that person if you are aiming to have this session pre-scheduled. Thank you!

@Lokal_Profil -- are you planning to attend the dev summit? Assign yourself if so.

I will probably be combining this in a joint session with one or more of:

So it wouldn't be completely ignored if @Lokal_Profil can't present it, but I'm going to get bored hearing myself speak, and I've got a dog in the race (T149667), so I'd really like to solicit other presenters and opinions here.

[...] I think it would be useful to identify clear application-specific requirements [...]

Below is a description of how Wikispeech would use an annotation service and what would be required of it, based on initial discussion.

Goal

One or more words, in sequence, can be annotated. An annotation attaches a label to a target, which is a string in the article text. There are two types of annotations: foci and scopes. A focus is a string that will be labeled with some information relevant to Wikispeech, e.g. a non-standard pronunciation. A scope is a string containing a focus and represents text which could affect the validity of the focus; e.g. the scope of a focus that annotates a word could be a sentence.
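A minimal sketch of the focus/scope model, assuming targets are character ranges in the article text and each scope is tied to exactly one focus that it must contain. The class and field names are hypothetical, chosen only to illustrate the definitions above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int  # inclusive character offset into the article text
    end: int    # exclusive character offset

    def contains(self, other: "Span") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class Focus:
    target: Span
    label: str  # e.g. a non-standard pronunciation


@dataclass(frozen=True)
class Scope:
    target: Span
    focus: Focus

    def __post_init__(self) -> None:
        # By definition a scope contains its focus.
        if not self.target.contains(self.focus.target):
            raise ValueError("scope target must contain focus target")
```

For the sentence "The city of Reading is in Berkshire.", a focus on "Reading" (offsets 12 to 19) could carry a pronunciation label, with the whole sentence as its scope.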

Use Cases

Here are some use cases and likely outcomes:

  1. If a focus’ target is changed, including complete removal, remove focus and scope.
  2. If a change leaves a scope’s target unchanged, keep focus and scope as they are.
  3. If a scope’s target is changed, remove focus and scope.

All changes should be logged so that Wikispeech can determine what to do with them. For example, it may become apparent that some cases of 3 don’t require removal.
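The three rules and the logging requirement can be expressed as a small decision function. This sketch assumes "changed" means the text covered by a target differs before and after the edit, and that locating a target's new text is the annotation service's job; `decide` and the log message format are hypothetical.

```python
import logging
from typing import Optional

log = logging.getLogger("wikispeech.annotations")


def decide(focus_before: str, focus_after: Optional[str],
           scope_before: str, scope_after: Optional[str]) -> str:
    """Return "keep" or "remove" for a focus/scope pair after an edit.

    A `*_after` value of None means the target was removed entirely.
    """
    if focus_after != focus_before:
        decision, reason = "remove", "focus target changed or removed"  # rule 1
    elif scope_after == scope_before:
        decision, reason = "keep", "scope target unchanged"             # rule 2
    else:
        decision, reason = "remove", "scope target changed"             # rule 3
    # Every outcome is logged, so cases of rule 3 can later be reviewed.
    log.info("%s: %s (scope %r -> %r)", decision, reason, scope_before, scope_after)
    return decision
```

Rule 1 is checked first because it applies regardless of what happened to the scope; the log lines give Wikispeech the data needed to decide whether some rule 3 removals could be relaxed.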

API Requirements

  1. It should be possible to both add and remove annotations.
  2. When a change is made, the annotation service should give each annotation a confidence score for how likely it is that the annotation was changed. This will be used in the use cases above, for both foci and scopes.
  3. To maintain the connection between foci and scopes, annotations need to have some kind of id.
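One cheap way to produce the confidence score in requirement 2 is `difflib.SequenceMatcher.ratio()` between the old and new text of a target, reported as `1 - ratio` so that higher means "more likely changed". The sketch below combines that with requirements 1 and 3 (add/remove by id, foci linked to scopes by id). The class, its methods and the choice of `ratio()` are assumptions for illustration, not a Wikispeech design.

```python
import itertools
from difflib import SequenceMatcher
from typing import Optional


class AnnotationService:
    """Hypothetical in-memory service: add/remove annotations, score changes."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self._targets: dict[int, str] = {}   # annotation id -> annotated text
        self._scope_of: dict[int, int] = {}  # focus id -> scope id

    def add(self, target: str, scope_id: Optional[int] = None) -> int:
        ann_id = next(self._ids)
        self._targets[ann_id] = target
        if scope_id is not None:
            self._scope_of[ann_id] = scope_id
        return ann_id

    def remove(self, ann_id: int) -> None:
        """Remove an annotation; removing a scope also removes its foci."""
        self._targets.pop(ann_id, None)
        self._scope_of.pop(ann_id, None)
        for focus_id in [f for f, s in self._scope_of.items() if s == ann_id]:
            self.remove(focus_id)

    def scope_of(self, focus_id: int) -> Optional[int]:
        return self._scope_of.get(focus_id)

    def change_scores(self, new_targets: dict[int, str]) -> dict[int, float]:
        """Score in [0, 1] of how likely each annotation's target was changed.

        `new_targets` maps id -> text now at that annotation's location
        (finding that location is out of scope here). Ids missing from
        `new_targets` are treated as removed and score 1.0.
        """
        scores: dict[int, float] = {}
        for ann_id, old in self._targets.items():
            new = new_targets.get(ann_id)
            if new is None:
                scores[ann_id] = 1.0
            else:
                scores[ann_id] = 1.0 - SequenceMatcher(None, old, new).ratio()
        return scores
```

An unchanged target scores 0.0, a removed one 1.0, and a small wording change in a sentence scope lands in between, which is the kind of graded signal the use cases above could threshold.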

@Lokal_Profil, are you planning to attend the dev summit?

@cscott Sorry for not getting back to you earlier. Crazy busy with other things on my end.

Yes, both @Sebastian_Berlin-WMSE and I will be attending.
Combining this into a general session on Annotations (per T147602#2832989 / T148734#2789416) seems like by far the best solution.
For our (Wikispeech) part, we are not married to any particular solution/implementation, but T148734#2798199 summarizes our needs/thoughts so far.

@cscott @Lokal_Profil Hey! As the developer summit is less than four weeks away, we are working on a plan to incorporate the ‘unconference sessions’ that have been proposed so far and those that will be generated on the spot. Could you confirm whether you plan to facilitate this session at the summit? Also, if your answer is 'YES', I would encourage you to update/arrange the task description fields to appear in the following format:

Session title
Main topic
Type of activity
Description: Move ‘The Problem’, ‘Expected Outcome’, ‘Current status of the discussion’ and ‘Links’ to this section
Proposed by: Your name linked to your MediaWiki URL, or profile elsewhere on the internet
Preferred group size
Any supplies that you would need to run the session, e.g. post-its
Interested attendees (sign up below)

  1. Add your name here

We will be reaching out to the summit participants next week asking them to express their interest in unconference sessions by signing up.

To maintain consistency, please consider following the template of this task description: https://phabricator.wikimedia.org/T149564.

@srishakatux: It was my understanding that (per T147602#2832989) this had been folded into a general "Annotations" session? Do you still need the above?

@Micru: That could be one solution for storing the annotations. I believe that is an easier issue to solve than correctly handling how annotations are affected by updates to the "real" page.

The general session I mentioned can be found at T151958

Ping to all subscribers: this has been proposed as an unconference session today (under Annotations / T151958).

This task has been assigned to the same task owner for more than two years. Resetting task assignee due to inactivity, to decrease task cookie-licking and to get a slightly more realistic overview of plans. Please feel free to assign this task to yourself again if you still realistically work or plan to work on this task - it would be welcome!

For tips on how to manage individual work in Phabricator (noisy notifications, lists of tasks, etc.), see https://phabricator.wikimedia.org/T228575#6237124 for available options.
(For the records, two emails were sent to assignee addresses before resetting assignees. See T228575 for more info and for potential feedback. Thanks!)