
Read UI content: Page toolbar
Open, Needs Triage | Public | 16 Estimated Story Points

Description

Now that it is time for the frontend to play the UI, we have decided to start by reading the page toolbar, as mentioned in T402617.
That is:

image.png (58×998 px, 11 KB)

There has already been some experimenting with this in a patch, which could be useful to continue from: it reads the toolbar ("Main Page", "Discussion"...) on mouseover or focus (tabbing). This was also mentioned in T401183.

Event Timeline

Here are the relevant i18n-messages for the page-toolbar:

"vector-view-create": "Create",
"vector-view-edit": "Edit",
"vector-view-history": "View history",
"vector-view-view": "Read",
"vector-view-viewsource": "View source",
"nstab-talk": "",
"nstab-main": "Page",
"visualeditor-ca-editsource": "Edit source" <---- requires the VisualEditor extension

"vector-view-create": "Create",

There's also "Create source"

"nstab-talk": "",

Is this missing?

"nstab-main": "Page",

This one depends on what namespace the page is in. For instance on ENWP it's "Article" in the main namespace and "Help page" in the "Help" namespace.

"vector-view-create": "Create",

There's also "Create source"

Ah thanks! Found it: visualeditor-ca-createsource


"nstab-talk": "",

Is this missing?

Yes, this was weird! I could not find the right message key for "Talk" in the i18n files. The only other thing I found was this (using uselang=qqx):
"tooltip-ca-talk": "Discussion about the content page"

image.png (43×138 px, 2 KB)
title="(tooltip-ca-talk)(word-separator)(brackets: (accesskey-ca-talk))(word-separator)(brackets: Alt+Shift+(accesskey-ca-talk))"


"nstab-main": "Page",

This one depends on what namespace the page is in. For instance on ENWP it's "Article" in the main namespace and "Help page" in the "Help" namespace.

Hmm not sure what you mean? Or at least I can't find the correct i18n-message for this?

This one depends on what namespace the page is in. For instance on ENWP it's "Article" in the main namespace and "Help page" in the "Help" namespace.

Hmm not sure what you mean? Or at least I can't find the correct i18n-message for this?

Now I get it!
nstab-main (and similar like nstab-talk) are system messages that represent namespace tab labels. So the actual value depends on the namespace (e.g. “Article” for main, “Help page” for Help namespace), and the labels are defined via MediaWiki:Nstab-main, etc.

That’s why it showed “Page” on my local wiki and “Artikel” on SVWP: it depends on what’s set on the wiki, not on the i18n files. Makes sense now!
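
For reference, a minimal sketch of how the wiki-specific value of a message like nstab-main could be requested, assuming the standard Action API query `meta=allmessages` is what we want here. The helper name and the SVWP endpoint are only illustrations, and the code just builds the URL without calling the wiki.

```python
from typing import List, Optional
from urllib.parse import urlencode


def allmessages_url(api_endpoint: str, keys: List[str], lang: Optional[str] = None) -> str:
    """Build an Action API URL asking a wiki for the current value of some messages."""
    params = {
        "action": "query",
        "meta": "allmessages",
        "ammessages": "|".join(keys),  # several keys are separated by "|"
        "format": "json",
    }
    if lang is not None:
        params["amlang"] = lang  # optional: ask for a specific language
    return f"{api_endpoint}?{urlencode(params)}"


# Assumption: the Swedish Wikipedia endpoint, used only as an example.
url = allmessages_url("https://sv.wikipedia.org/w/api.php", ["nstab-main", "nstab-help"])
print(url)
```

Since the answer comes from the wiki itself, it should reflect whatever is set in MediaWiki:Nstab-main there, rather than the default in the i18n files.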

If we only want to get one message at a time, when needed (e.g. when a user hovers over an element), how can we get the message key from the HTML? The HTML output does not expose the message keys directly, only the rendered text.
Is there a way to make an API call with uselang=qqx and collect the message keys from the response, or does that only work when rendering full pages (e.g. via action=parse)?
@Sebastian_Berlin-WMSE
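
If the qqx route turns out to be usable, collecting the keys from the rendered text could look roughly like this. It is a sketch only: it assumes qqx renders a message as "(key)" or "(key: parameters)", as in the title attribute above, and the regex is a simplification that only handles lowercase keys. The function name is hypothetical.

```python
import re

# In qqx output a message shows up as "(key)" or "(key: parameters)".
QQX_KEY = re.compile(r"\(([a-z][a-z0-9_-]*)(?=[):])")


def message_keys(qqx_text: str) -> list:
    """Return the message keys found in qqx-rendered text, in order, without repeats."""
    seen = []
    for key in QQX_KEY.findall(qqx_text):
        if key not in seen:
            seen.append(key)
    return seen


title = (
    "(tooltip-ca-talk)(word-separator)(brackets: (accesskey-ca-talk))"
    "(word-separator)(brackets: Alt+Shift+(accesskey-ca-talk))"
)
print(message_keys(title))
# ['tooltip-ca-talk', 'word-separator', 'brackets', 'accesskey-ca-talk']
```

Note that this also picks up helper messages such as word-separator and brackets, so the result would still need filtering before it could be used to look up an utterance.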

Maybe we don't need message keys? The segment hash should still work to identify the utterance. You should be able to use the segment API to get them from the text in the UI.

I think the main reason we added page ID for utterances was to avoid hash collisions. With all the sentences on all the pages on Wikipedia that could be an issue. Just the messages should be a much smaller set so hopefully that's not a worry.

There's still the issue with dynamic messages, but I don't know if using message keys would solve that. Or if it even makes sense to put those in the utterance storage at all.

After discussion, we considered the following approach for playing UI elements, specifically the page toolbar:

  1. When the user hovers over an element, the visible text content of that element is extracted.
  2. This text content is sent to the 'wikispeech-listen' API using the text parameter.
  3. The backend then tries to find a matching pre-synthesized utterance, using the segment hash stored in the 'wikispeech_utterance' table in the database.
  4. If no match is found, it falls back to synthesizing the input text at runtime via segmentation and synthesis.
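
To make steps 3 and 4 concrete, here is a small sketch of the lookup-then-fallback idea. Everything in it is a stand-in: the dictionary replaces the 'wikispeech_utterance' table, SHA-256 is an assumed hash (not necessarily what the extension uses), the synthesizer is fake, and the text is treated as a single segment. Storing the result after a fallback is also an assumption.

```python
import hashlib
from typing import Callable, Dict


def segment_hash(text: str) -> str:
    # Assumption: SHA-256 over the UTF-8 text stands in for the real segment hash.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


class UtteranceStore:
    """In-memory stand-in for the 'wikispeech_utterance' table, keyed by segment hash."""

    def __init__(self, synthesize: Callable[[str], bytes]) -> None:
        self._synthesize = synthesize
        self._utterances: Dict[str, bytes] = {}
        self.synthesis_calls = 0

    def listen(self, text: str) -> bytes:
        key = segment_hash(text)
        audio = self._utterances.get(key)  # step 3: look for a stored utterance
        if audio is None:
            # Step 4: nothing stored, so synthesize at runtime and keep the result.
            audio = self._synthesize(text)
            self.synthesis_calls += 1
            self._utterances[key] = audio
        return audio


def fake_synthesize(text: str) -> bytes:
    # Hypothetical synthesizer; a real one would call the speech backend.
    return b"audio:" + text.encode("utf-8")


store = UtteranceStore(fake_synthesize)
first = store.listen("View history")   # not stored yet, so it is synthesized
second = store.listen("View history")  # found by hash, no new synthesis
print(store.synthesis_calls)  # 1
```

Since the toolbar only has a handful of labels, most hovers should end up in the "found by hash" branch after the first request.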

While we already have an implementation for sending message keys to the API, it may no longer be relevant, at least not for this task, since the message keys turned out to be inconsistent. Some of the message keys exist in the i18n files, but others are dynamic and depend on which namespace the current page is in.

I will investigate pros and cons for both of these solutions, and come up with something that would best fit our goal.

  1. When the user hovers over an element, the visible text content of that element is extracted.
  2. This text content is sent to the 'wikispeech-listen' API using the text parameter.
  3. The backend then tries to find a matching pre-synthesized utterance, using the segment hash stored in the 'wikispeech_utterance' table in the database.
  4. If no match is found, it falls back to synthesizing the input text at runtime via segmentation and synthesis.

This may be more of a question to myself, but isn't this how you normally get utterances? 😅 @Sebastian_Berlin-WMSE Except for the hovering in step 1, of course.

For page content we don't send the actual text in 2. We send the segment hash. Don't remember if there was any reason we couldn't do that here.
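
Roughly, the difference between the two kinds of requests could be sketched like this. Only the action name and the text parameter come from the discussion above; the parameter names "lang", "revision" and "segment", the helper names, and the example values are assumptions for illustration and should be checked against the actual API module.

```python
from urllib.parse import urlencode


def listen_by_text(text: str, lang: str) -> str:
    """Query string for a UI label: the visible text itself is sent."""
    return urlencode({
        "action": "wikispeech-listen",
        "format": "json",
        "lang": lang,  # assumed parameter name
        "text": text,
    })


def listen_by_segment(revision: int, segment: str, lang: str) -> str:
    """Query string for page content: only the segment hash is sent."""
    return urlencode({
        "action": "wikispeech-listen",
        "format": "json",
        "lang": lang,          # assumed parameter name
        "revision": revision,  # assumed parameter name
        "segment": segment,    # assumed parameter name, holds the segment hash
    })


print(listen_by_text("View history", "en"))
print(listen_by_segment(12345, "abc123", "en"))  # made-up revision and hash
```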

For page content we don't send the actual text in 2. We send the segment hash. Don't remember if there was any reason we couldn't do that here.

Yes I see!

Another thought: I think it makes sense to add this as a setting for reading parts of the content. But thinking more about it, I wonder if it would be better to have ONE setting for EACH new type of content that we want to read.

I mean like this:

Read extra information for certain elements
  • Read links
  • Read menus
  • Deactivate highlighting
  • ... etc.

Do you think this is a relevant thought? @Sebastian_Berlin-WMSE We can also discuss this at our check-in tomorrow.

We decided to create only ONE setting for now, for reading the page toolbar. Later, we can revisit how to structure these settings in a more intuitive way for users, for example by grouping them into sections such as “Article content,” “UI,” and “Other customization.”

Change #1226163 had a related patch set uploaded (by Viktoria Hillerud WMSE; author: Viktoria Hillerud WMSE):

[mediawiki/extensions/Wikispeech@master] Read UI content: Page toolbar

https://gerrit.wikimedia.org/r/1226163

I am having some issues while trying to target and read this element:

image.png (87×525 px, 10 KB)

If I hover on "English" directly, it isn't read. But if I hover a bit below the actual text "English", still within the same node, it is read. I can't seem to target the right node.

If it's just that item that's causing issues, I think you can skip it for now. It seems to only appear on local installations. I don't think I've ever seen it on a "proper" wiki.

In fact, this seems to be an easter egg. The only other alternative is "Igpay Atinlay" which changes the content into Pig Latin.

Ah, interesting! However, the "Tools" tab behaves the same way, and that one appears on the other wikis. Should we still skip this for now? @Sebastian_Berlin-WMSE