Enotif notification emails in some languages have excess unneeded newlines (due to wordwrap counting bytes instead of characters)
Open, LowPublic

Assigned To
None
Authored By
Nasirkhan
Oct 23 2013, 7:45 PM
Referenced Files
F41784781: image.png
Feb 5 2024, 7:17 PM
F32375700: Screenshot from 2020-10-06 15-23-00.png
Oct 6 2020, 1:32 PM
F12383: Original_Content_of_the_email.txt
Nov 22 2014, 2:28 AM
F12382: file_56063.txt
Nov 22 2014, 2:28 AM
F12381: Email_contents_how_it_should_be.png
Nov 22 2014, 2:28 AM
F12380: received_email.png
Nov 22 2014, 2:28 AM

Description

I am Nasir, an active contributor to Bengali Wikipedia. I use email notifications for changes to pages on my watchlist, and they are also enabled for all other notifications. It is helpful to get the notifications by email, but the emails are not formatted as they should be. I faced the following issues with the Bengali emails and I think these should be fixed soon.

  • Sometimes a new line appears between the words of a page name, and sometimes a new line appears after 4/5 words in a sentence. I think we are using a fixed DIV width for these texts. If that is removed, the problem may be fixed.

Screenshot from 2020-10-06 15-23-00.png (1×1 px, 197 KB)

  • Unicode URLs are not readable at all. For Bengali wiki URLs we could make some changes to make the URL more readable, e.g. the URL format could be like this: <a href="PAGE_LINK">http://bn.wikipedia.org/wiki/PAGE_NAME</a> - not covered in this ticket; see T72245 instead.

Version: 1.22.0
Severity: minor

Details

Reference
bz56063

Event Timeline

bzimport raised the priority of this task from to Low. Nov 22 2014, 2:28 AM
bzimport set Reference to bz56063.
bzimport added a subscriber: Unknown Object (MLST).

(In reply to comment #0)

  • Sometimes a new line appears between the words of a page name, and sometimes new line appears after 4/5 words in a sentence. I think we are using a fixed DIV width for these texts. If it will be removed then it may be fixed.

Attaching or pasting one such email would be helpful.

  • Unicode URLs are not readable at all. For the Bengali wiki urls we can make some changes to make the url more readable. like url format could be like this <a href="PAGE_LINK">http://bn.wikipedia.org/wiki/PAGE_NAME</a>

HTML enotifs are done by [[mw:Echo]]; maybe ShortUrl could also be used in text enotifs, but that should be a separate bug.

Created attachment 13552
Received email content format

Attached:

received_email.png (1×1 px, 87 KB)

Created attachment 13553
Email content format (how it should be)

Attached:

Email_contents_how_it_should_be.png (954×1 px, 77 KB)

Uploaded two attachments: the first one (Attachment 13552, Received email content format) is a screenshot of the received email, and the second one (Attachment 13553, Email content format (how it should be)) shows how the email contents should be displayed.

Nasir Khan Saikat

I was not asking about the ShortUrl. The link we receive in the email are not hu PAGE_NAME are displayed properly in the email the email we
<a href="PAGE_LINK">http://bn.wikipedia.org/wiki/PAGE_NAME</a>

As written in comment 1, attaching one such example email (with header lines) would be very helpful (please make sure that no confidential data is included).

(In reply to comment #4)

Uploaded two attachments first one (Attachment 13552 [details], Received email content format) is the screenshot of the received email and the second one (Attachment 13553 [details], Email content format (how it should be)) shows how the email contents should be displayed.

The URLs seem functional in both examples, so this bug seems to be about newlines in the non-URL text of the email notifications. How did you get Gmail to show the message correctly? How can we see it ourselves and how do we know it's a problem in MediaWiki?

On plain text and long URLs, see comment 1.

@Andre Klapper, are you asking me to export the original email content and attach that here?

@Nemo, if you open a Gmail email in print mode you can get the email text in this format. The first screenshot is the original message; I did nothing there, just captured the screenshot. After opening the email in print mode, I edited the source and took the second screenshot. There is a <br> (line break) after every 3/4 words in that source; I cleaned those up and the text looks OK.

Yes, it is a problem in MediaWiki, because there is no way to add a line break between the words of a page name (example: উইকিপিডিয়া:প্রশাসকদের আলোচনাসভা), but I got these 3 words, "উইকিপিডিয়া:প্রশাসকদের আলোচনাসভা", on 2 lines.

--Nasir Khan

Yes, full email content or export in .eml format would help: then everyone can open the email in Thunderbird. :)

Created attachment 13592
enotif from te.wiki

I remembered receiving some enotifs with weird newlines, so I retrieved one from te.wiki: the newlines are wrong there too, right?
(Again, for long URLs please file another report.)

Created attachment 13594
Original Email Content bn.wiki

@Nemo, yes, the problem is similar to the one on te.wiki. There are some additional newlines there. I have attached a copy of the bn.wiki email.
I do not know how to export as .eml from Gmail. Here I have just attached the 'Original Contents' of that email. Please let me know whether it is helpful or not.

Nasir Khan

(In reply to comment #11)

@Nemo, Yes the problem is similar as the te.wiki. There are some additional new line there.

Ok great. Now that we've clarified the issue: Santhosh, do you know what may be causing this weird wrapping in plain text enotifs?

Is there any update in this bug?

Unfortunately not.

Somebody needs to investigate here what may be causing that weird wrapping in plain text enotifs...

Is there any update on this issue?

Is there any update on this issue?

MediaWiki email notifications are unmaintained, there are no updates in this component.

@Nemo_bis
so, what should be the proposed solution for this?

Silent patience, or pro-active work.

Likely caused by this piece of code:

EmailNotification.php, line 395:
$this->body = wordwrap( strtr( $body, $postTransformKeys ), 72 );

wordwrap counts bytes and not characters.
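
The byte-versus-character difference can be checked with a small standalone script. This is only an illustration, not MediaWiki code: it reuses the page name quoted earlier in this task and assumes UTF-8 input and the mbstring extension.

```php
<?php
// Illustrative only: the sample string is the page name quoted earlier in this task.
$title = 'উইকিপিডিয়া:প্রশাসকদের আলোচনাসভা';

// strlen() counts bytes; each Bengali code point takes 3 bytes in UTF-8.
$bytes = strlen( $title );
// mb_strlen() counts Unicode code points.
$chars = mb_strlen( $title, 'UTF-8' );

// wordwrap() measures line length in bytes, so a title well under
// 72 characters can still exceed the 72-byte limit and get a newline
// inserted at its only space.
$wrapped = wordwrap( $title, 72 );

echo "bytes: $bytes, characters: $chars\n";
echo ( strpos( $wrapped, "\n" ) !== false ) ? "wrapped\n" : "not wrapped\n";
```

The same effect applies, to a lesser degree, to any script that uses 2 bytes per character in UTF-8, such as Cyrillic.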

Perhaps word wrapping is not needed, or a Unicode-aware replacement needs to be created.
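
A Unicode-aware replacement could look roughly like the following. This is a minimal sketch and not an existing MediaWiki or PHP function: the name mbWordwrapSketch is made up, it assumes UTF-8 input and the mbstring extension, it only breaks at ASCII spaces, and, like wordwrap() without its cut flag, it never splits a word that is longer than the width.

```php
<?php
/**
 * Hypothetical helper (not part of MediaWiki): wrap text at $width
 * characters rather than bytes. Existing newlines are preserved.
 */
function mbWordwrapSketch( string $text, int $width = 72 ): string {
    $outLines = [];
    foreach ( explode( "\n", $text ) as $line ) {
        $current = null;
        foreach ( explode( ' ', $line ) as $word ) {
            if ( $current === null ) {
                // First word of the line always starts the output line.
                $current = $word;
            } elseif ( mb_strlen( $current . ' ' . $word, 'UTF-8' ) <= $width ) {
                // The word still fits when counted in characters.
                $current .= ' ' . $word;
            } else {
                // Too long: close the current line, start a new one.
                $outLines[] = $current;
                $current = $word;
            }
        }
        $outLines[] = $current ?? '';
    }
    return implode( "\n", $outLines );
}

// The 10-character Russian string fits in a width of 10 when counted in
// characters, whereas wordwrap() sees 19 bytes and breaks it.
echo mbWordwrapSketch( 'привет мир', 10 ), "\n";
echo wordwrap( 'привет мир', 10 ), "\n";
```

Counting code points is still an approximation of display width, since combining marks in Indic scripts take no extra column, but it is much closer than counting bytes.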

@Nikerabbit is there any plan to implement HTML emails in core? then it might be easier to resolve this and present the text in a more formatted way.

@Nikerabbit is there any plan to implement HTML emails in core? then it might be easier to resolve this and present the text in a more formatted way.

Well. We had one project for that for last GSoC, which never ended up well. https://phabricator.wikimedia.org/T15303

I wonder if we should introduce an option for a maximum character length of enotif links. When links get longer than that, the &title parameter in URLs would be skipped and replaced by curid or oldid (of course that's not possible for the special pages).
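
The idea could be sketched as below. Everything here is hypothetical: the function name, its parameters, and the example.org URL are invented for illustration, and real MediaWiki URL building goes through its own Title and URL helpers rather than plain string concatenation.

```php
<?php
/**
 * Hypothetical helper: use the title-based URL unless it is longer than
 * $maxLength, in which case fall back to a curid-based URL.
 */
function enotifLinkSketch( string $scriptUrl, string $title, int $pageId, int $maxLength ): string {
    // Percent-encoding turns every non-ASCII byte into three ASCII
    // characters, which is why Bengali titles give very long URLs.
    $titleUrl = $scriptUrl . '?title=' . rawurlencode( $title );
    if ( strlen( $titleUrl ) <= $maxLength ) {
        return $titleUrl;
    }
    return $scriptUrl . '?curid=' . $pageId;
}

// Made-up script URL and page ID, for illustration only.
echo enotifLinkSketch( 'https://example.org/w/index.php', 'Test', 42, 80 ), "\n";
echo enotifLinkSketch( 'https://example.org/w/index.php', 'উইকিপিডিয়া:প্রশাসকদের আলোচনাসভা', 42, 80 ), "\n";
```

As noted in the comment above, this fallback is not available for special pages, which have no page ID.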

@Jony the people who are members of the MediaWiki-Email project are not automatically the best people to actually work on tasks; I'm personally a member for my own tracking purposes, and don't do any MediaWiki development work myself (beyond occasionally filing or commenting on Phab tasks).

Aklapper renamed this task from Email notifications in Indian language wikis have excess newlines to Enotif notification emails in some languages have excess unneeded newlines (due to wordwrap counting bytes instead of characters). Oct 6 2020, 1:31 PM
Aklapper updated the task description.

Note that I am not a PHP developer so this is likely wrong and non-performant:

I'm wondering if the hardcoded 72 in EmailNotification.php's line $this->body = wordwrap( strtr( $body, $postTransformKeys ), 72 ); could be replaced by a ratio of chars vs bytes, minus any URLs (which are currently still ASCII anyway, hence byte=char due to T72245). Something like this:

# Byte and character counts of the whole body:
$totalbytes = strlen( $body );
$totalchars = mb_strlen( $body, 'UTF-8' );
$linkbytes = 0;
$linkchars = 0;
# TODO: There might be more tokenizers in different languages to cover?
# Collect tokens that include the substring '://', which should be URLs.
# The first token is checked before fetching the next one, so it is not skipped:
$token = strtok( $body, " \n\t" );
while ( $token !== false ) {
    if ( strpos( $token, '://' ) !== false ) {
        $linkbytes += strlen( $token );
        $linkchars += mb_strlen( $token, 'UTF-8' );
    }
    $token = strtok( " \n\t" );
}
$nonlinkbytes = $totalbytes - $linkbytes;
$nonlinkchars = $totalchars - $linkchars;
# Average bytes per non-link character, scaled to 72 characters per line.
# Fall back to 72 if there is no non-link text, to avoid dividing by zero:
$bytesperline = $nonlinkchars > 0 ? (int)( 72 * ( $nonlinkbytes / $nonlinkchars ) ) : 72;

$this->body = wordwrap( strtr( $body, $postTransformKeys ), $bytesperline );

The same problem remains for the Russian language in Russian Wikipedia, it's not just Bengali that is affected.

The screenshot shows part of my Outlook app window on Windows 11, at 2K screen resolution.

image.png (1×1 px, 147 KB)