
Step 4 : Add dynamic recording type selector
Closed, Resolved, Public

Description

The same vue.store parameter as in T370618 can actually be selected at two steps :

  • At Step 4 (List) : it affects text segmentation.
  • At Step 5 (Recording) : it affects audio parameters.

image.png (218×756 px, 17 KB)

Step 4 : in list loader

Convert text into array of objects :

| Item marker | Item's label | Item marker: matches | Item marker: regex | Regex: demo | Object format | Comment |
|---|---|---|---|---|---|---|
| # | words | #; \n#; \n | r(/[\n#]+\s*)/g,"# ") | https://regex101.com/r/piieBT/1 | {'item':'…',type:'word'} | prepend \n# to input text first line or first line fails |
| ## | sentence | ##; \n##; \n | r(/[\n##]+\s*)/g,"# ") | https://regex101.com/r/piieBT/1 | {'item':'…',type:'sentence'} | prepend \n## |
| ### | poems | \n\s*###+\s*; \n\n\n+ | r(/(\n[\s\W]*){3,}/g, "# ") | https://regex101.com/r/r6n0LN/1 | {'item':'…',type:'paragraph'} | prepend \n### |

Note: once this is properly coded, if a regex proves incomplete, we will be able to refine it later.

Complementary to

Event Timeline

Yug updated the task description.
Yug triaged this task as High priority. Nov 15 2024, 11:10 AM
Yug renamed this task from Step 4 : Move recording type selector to List step to Step 4 : Add recording type selector. Nov 16 2024, 7:42 PM
Yug updated the task description.

Hi @Yug, I am taking up this task. I want to confirm that, right now, users can use a double hash and a triple hash to create separate paragraphs and poems. Here is my understanding of the task; can you please confirm it?

  1. In step 5, identify every term as a word/sentence/poem, and then automatically switch the recording type accordingly.
  2. Apart from the default hash separators, I also need to add the separators given in the table, including "\n", "*", etc.

*Question 1*
What if the user doesn't use # (hash) while adding the word list? How do we then identify each term as a word, sentence or poem?

Potential solutions

  1. Keep all terms as words by default
  2. Specify the type of each term based on its length:
    • < 3 words: word
    • 3 to 15 words: sentence
    • > 15 words: poem

*Question 2*
The separators matched for sentences and words look the same, which seems wrong.

Yug updated the task description.

Hello Pushkar,
It's almost that.

  1. Yes, default type== "word".
  2. Specify type : we cannot use the string's length, since the definition of a word depends on the language and writing system; some don't put spaces between words (ex: Chinese, Japanese, etc.).
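
To illustrate the second point, here is a minimal sketch assuming a naive whitespace-based word counter (the helper name `countWords` is hypothetical, not part of the recorder). Two phrases with roughly the same meaning get very different counts, so length thresholds would misclassify the Chinese one:

```javascript
// Hypothetical helper: counts "words" by splitting on whitespace.
const countWords = (str) => str.trim().split(/\s+/).filter(Boolean).length;

const english = "I like to drink water";
const chinese = "我喜欢喝水"; // roughly the same meaning, written without spaces

console.log(countWords(english)); // 5
console.log(countWords(chinese)); // 1
```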

Overall though

Adding support for recording long texts such as poems means :

  • the system must toggle between audio recording settings.
  • filenames that take the full multi-line poem as suffix are likely to be problematic, so the system should accept shortcut declarations in the user input and in local lists.
  • with all this, enriching the underlying Vue data model is necessary. The current list of items is simply an array of strings (words). The new model should be an array of objects (words, sentences or "poems") with multiple properties: read:"[string to read]", type:"[word|sentence|poem]", saveAs:"[some short title standing for the long text read]".
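
As a rough sketch of that model change, the snippet below normalizes a list that mixes legacy strings with new-style objects. The helper name `toItem` is hypothetical; only the property names `read`, `type` and `saveAs` come from the description above.

```javascript
// Hypothetical helper: accepts a legacy string or a new-style object
// and always returns an object with read, type and saveAs.
function toItem(entry) {
  if (typeof entry === "string") {
    // Legacy entries are plain words; the word itself names the file.
    return { read: entry, type: "word", saveAs: entry };
  }
  return entry;
}

const legacyList = ["water", "tree"];
const mixedList = [
  ...legacyList,
  { read: "This big red stone", type: "sentence", saveAs: "This big red stone" },
];

const items = mixedList.map(toItem);
console.log(items);
```

A normalizer of this kind could let code that still produces arrays of strings coexist with code that already expects objects while the side effects are being fixed.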

This change will have side effects; edits will be required at least on :

Also the places where word lists are handled :

See also previous input cleaners:

Formatting

I've thought about that and checked the current lists : all or nearly all of them use \n# as item separator. Let's consolidate this convention, push it further and build on it:

Splits from raw text to items

Raw text inputs are either local lists (ex: List:Cmn/HSK_2012) or user-contributed text submitted in Step 4 via its input field.

Raw text input must be cleaned into an array of objects :

  • str.replace(/(#+)/g, '\n$1') // Adds a line break before each group of #
  • str.split(/^(?=#)/m) // Splits at the start of each line that begins with #
  • code1 or code2 // Adds the string and type properties to the object depending on the number of #.
    • no # announces a word
    • # separator announces a word
    • ## separator announces up to a few sentences
    • ###+ separator announces a poem.
  • str.replace(/#+/g,'') // remove #s
  • str.trim() // trim

Example input :

text
# water # tree # earth ## This big red stone ## That small blue fire

Regexes:

js
const rawInput = `###line1 of poem\n line 2##this is my paragraph and second item#a word
  #### Item 1
  ### Item 2
  # Item 3
  ## Item4
  # Item 5
`;

// 1) Put each group of # at the start of its own line.
const addLineJump = (text) => text.replace(/(#+)/g, '\n$1');
console.log("1)", addLineJump(rawInput));

// 2) Split at each line-initial #, restore the # consumed by the split,
//    and drop empty items (e.g. the empty chunk before the first #).
const split = (text) =>
  text.split(/^#/m).map((str) => `#${str.trim()}`).filter((str) => str !== '#');
console.log("2)", split(addLineJump(rawInput)));

// 3) Derive the type from the number of # in each item.
const addItemType = split(addLineJump(rawInput)).map((str) => {
  const count = (str.match(/#/g) || []).length;
  const type = count <= 1 ? "word" : count === 2 ? "sentence" : "poem";
  return { displayAs: str, type };
});
console.log("3)", addItemType);

// 4) Remove the # markers and trim the text.
const removeSharps = addItemType.map((obj) => ({
  displayAs: obj.displayAs.replace(/#+/g, '').trim(),
  type: obj.type,
}));
console.log("4)", removeSharps);

Wanted output:

json
[
  { read:'water', type:'word' },
  { read:'tree', type:'word' },
  { read:'earth', type:'word' },
  { read:'This big red stone', type:'sentence'  },
  { read:'That small blue fire', type:'sentence' },
  { read:'..............', type:'poem' }
  ...
]

Rich input implies additional splits

Special arrow characters exist. These arrows help split each line, clarifying the part read and the part used for filename creation.

  • The ignore arrow allows data to be stored for a minimalist, low-tech dictionary service to come later; the current recorder must ignore that data.
    • A → B (the ignore arrow): A is the item to read in step 5, B is there as the L2 translation and is ignored. { read:'A', type:'word', saveAs:'A' }
  • The rename arrow allows the text read (which could be very long) to differ from the filename title (shorter).
    • A ⇐ B (the rename arrow): B is the item to read in step 5, A is the title to save, so we get saveAs:'A', which is used as a shorter string for file naming before uploading the file to Wikimedia Commons. { read:'B', type:'word', saveAs:'A' }

Examples
text
# water
# water → aqua, aqua
## Invictus§Title ⇐ Invictus (William Ernest Henley, 1931)
json
[
  { read:'water', type:'word', saveAs:'water' },
  { read:'water', type:'word', saveAs:'water' },        // just remove " → aqua, aqua"
  { read:'Invictus (William Ernest Henley, 1931)', type:'sentence', saveAs:'Invictus§Title' }
]
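
The two arrow rules can be sketched as a small function. This is only an illustration under stated assumptions: the name `parseArrows` is hypothetical, each item is assumed to carry at most one arrow, and the rename arrow is checked first.

```javascript
// Sketch: split one cleaned item on the rename (⇐) or ignore (→) arrow.
// Assumption: at most one arrow per item; the rename arrow is checked first.
function parseArrows(text, type = "word") {
  const renameAt = text.indexOf("⇐");
  if (renameAt !== -1) {
    // A ⇐ B: B is read aloud, A names the file.
    const saveAs = text.slice(0, renameAt).trim();
    const read = text.slice(renameAt + 1).trim();
    return { read, type, saveAs };
  }
  const ignoreAt = text.indexOf("→");
  // A → B: keep A, drop B. Without an arrow, keep the whole text.
  const read = (ignoreAt === -1 ? text : text.slice(0, ignoreAt)).trim();
  return { read, type, saveAs: read };
}

console.log(parseArrows("water"));
console.log(parseArrows("water → aqua, aqua"));
console.log(parseArrows("Invictus§Title ⇐ Invictus (William Ernest Henley, 1931)", "sentence"));
```

The first two calls both yield read 'water' and saveAs 'water'; the third yields saveAs 'Invictus§Title' with the longer string as read, matching the examples above.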

And finally for texts:

md
\n### Invictus ⇐ 
Invictus,
(William Ernest Henley, 1931)

Out of the night that covers me,
Black as the pit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.

In the fell clutch of circumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.

Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds and shall find me unafraid.

It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate :
I am the captain of my soul.

Then the JSON object:

json
{ saveAs:"Invictus", read:" 
Invictus,
(William Ernest Henley, 1931)

Out of the night that covers me,
Black as the pit from pole to pole,
I thank whatever gods may be
For my unconquerable soul.

In the fell clutch of circumstance
I have not winced nor cried aloud.
Under the bludgeonings of chance
My head is bloody, but unbowed.

Beyond this place of wrath and tears
Looms but the Horror of the shade,
And yet the menace of the years
Finds and shall find me unafraid.

It matters not how strait the gate,
How charged with punishments the scroll,
I am the master of my fate :
I am the captain of my soul.",
type:'poem' }

Effects on sound settings

At step 5, when recording, sound settings will vary according to type's value.
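
A minimal sketch of such a lookup, assuming the settings are time values keyed by type. The setting name `maxDurationSec`, the helper `settingsFor` and all numbers are placeholders for illustration, not the recorder's real parameters.

```javascript
// Placeholder values for illustration only; not the recorder's real settings.
const SETTINGS_BY_TYPE = new Map([
  ["word", { maxDurationSec: 3 }],
  ["sentence", { maxDurationSec: 15 }],
  ["poem", { maxDurationSec: 120 }],
]);

// Unknown or missing types fall back to the "word" settings.
const settingsFor = (item) =>
  SETTINGS_BY_TYPE.get(item.type) ?? SETTINGS_BY_TYPE.get("word");

console.log(settingsFor({ read: "water", type: "word" }));
console.log(settingsFor({ read: "Out of the night that covers me", type: "poem" }));
console.log(settingsFor({ read: "tree" })); // no type: falls back to "word"
```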

Discussion

I want to ask: will the regexes only be applied to text that people enter through the word list input, or also to items coming from generators?

Items from generators are already strictly formatted and clean. Their items are always considered to have type: word.

Can we put a textarea with multiple inputs instead of plain inputs,
instead of making complicated regexes for the users?

The end user never sees the regexes; those are under-the-hood cleaner functions that convert raw text into item objects with different properties.

What are saveAs: and type: default values ?

If saveAs: is undefined, use read: value as filename suffix.
If type: is undefined, use type:'word' value.
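
These two defaults can be sketched as small accessors (the names `typeOf` and `filenameSuffix` are hypothetical):

```javascript
// type defaults to 'word' when undefined.
const typeOf = (item) => item.type ?? "word";
// saveAs defaults to the read value when undefined.
const filenameSuffix = (item) => item.saveAs ?? item.read;

const bare = { read: "water" };
const titled = { read: "Out of the night that covers me", type: "poem", saveAs: "Invictus" };

console.log(typeOf(bare), filenameSuffix(bare));     // word water
console.log(typeOf(titled), filenameSuffix(titled)); // poem Invictus
```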

User interface

Select type

In step 5, the user can edit the time values for each type.
(Ideally these values should be persistent for each type, but persistence could be a later task.)

Input field (later task)

Will be a later task.

As for the input field, we don't want it to become an editing area. Content should be prepared earlier and elsewhere. So let's keep that input field as is, or as a minimalist textarea :

html
<cdx-text-area class="list-input" v-model="textareaValue" placeholder="Input list with # for word, ## for sentence, and ### for long text." rows="2" />
<style>
.list-input {
  width: 60ch; /* Width of 60 characters */
  resize: vertical; /* Only allow vertical resizing */
  min-height: calc(2em + 0.5rem); /* Minimum height for 2 lines */
  max-height: calc(8em + 2.5rem); /* Maximum height for 8 lines */
  overflow-y: auto; /* Add scrollbar if content exceeds max height */
}
</style>

Source: WM Codex > Textarea

Long string bug

UI fixes will be a later task.

This can be tested with the following raw text : List:Test/Rich_format

Yug closed this task as Resolved. Edited Jan 27 2025, 9:49 AM

Hello @Pushkar7077, let's close this ticket : the dynamic type selector has been implemented. This is done.

As for the side effects and smaller bugs, let's open smaller tasks.

Yug renamed this task from Step 4 : Add recording type selector to Step 4 : Add dynamic recording type selector. Jan 27 2025, 9:53 AM