Page MenuHomePhabricator

"(last change)" in new messages box should link to combined diff of all changes since last visit
Closed, ResolvedPublic

Description

When your user talk page is changed, a box shows up saying "You have new messages (last change)" where "(last change)" links to the diff of the last revision of your user talk page. This should be changed to "(last changes)", linking to a combined diff of all changes you haven't seen yet (URL: index.php?diff=cur&oldid=NNNN). This would require a schema change, adding a revid field to the user_newtalk table. That field would record the revid of the last revision the user has seen.

To clarify:

  • Someone edits my talk page, creating revision 12345
  • A row is added to the user_newtalk table, with 12333 (the previous revision) as revid
  • More edits are made; the user_newtalk table isn't touched
  • I view a page and see a banner linking to diff=cur&oldid=12333
  • I view my user talk page, causing the row in the user_newtalk page to be deleted

Version: unspecified
Severity: enhancement

Details

Reference
bz12701

Event Timeline

bzimport raised the priority of this task from to Medium.Nov 21 2014, 10:06 PM
bzimport set Reference to bz12701.

gutza wrote:

How does the current version of the code know when you last visited your talk page? There has to be a timestamp somewhere storing that information. Can't that timestamp be cross-checked with the talk page's revision history, thus determining the previous revision number? That would be your NNNN, and you wouldn't need any schema changes.

(In reply to comment #1)

How does the current version of the code know when you last visited your talk
page? There has to be a timestamp somewhere storing that information.

There isn't. Whenever someone edits my talk page, a row is added to the user_newtalk table (see http://www.mediawiki.org/wiki/User_newtalk_table ), triggering the new messages box. When I view my talk page, that row is deleted again.

I think this is already done by enotif?
http://www.mediawiki.org/wiki/Enotif suggest it's completely done by enotif, but if it wasn't the Update markers would simplify much of the work.

ayg wrote:

A timestamp would probably be better to store than a rev_id. It contains more information than rev_id, and also avoids some conceivable difficulties with referential integrity (e.g., revision is oversighted, now how do you get the diff?).

gutza wrote:

(In reply to comment #4)

A timestamp would probably be better to store than a rev_id.

True, but storing the rev_id is slightly less expensive, because you don't have to perform the extra step of actually identifying the rev_id when the time comes to show the diff (you already have it).

Also, a timestamp indicating what event?

ayg wrote:

(In reply to comment #5)

True, but storing the rev_id is slightly less expensive, because you don't have
to perform the extra step of actually identifying the rev_id when the time
comes to show the diff (you already have it).

One extra query that involves B-tree lookup of a single index entry is trivial.

Also, a timestamp indicating what event?

The last time the viewer looked at the most recent revision of the page.

gutza wrote:

(In reply to comment #6)

(In reply to comment #5)

Also, a timestamp indicating what event?

The last time the viewer looked at the most recent revision of the page.

Then, if comment #2 is correct (which I assume it is), you're basically proposing a rewrite of this feature (currently there's no entry in table user_newtalk at the moment when a viewer looks at the most recent revision of the page -- on the contrary, that's when the entry is deleted).

Maybe we can use a rev_id like this: If the rev_id is there, voila! Otherwise, find the biggest rev_id which is smaller than the expecte one (i.e. the last remaining revision before the one stored in user_newtalk). For certain, this algorithm should be implemented in the diff code... which sounds... irksome! Sorry for my flight of idea.

ayg wrote:

(In reply to comment #7)

Then, if comment #2 is correct (which I assume it is), you're basically
proposing a rewrite of this feature (currently there's no entry in table
user_newtalk at the moment when a viewer looks at the most recent revision of
the page -- on the contrary, that's when the entry is deleted).

Well, yes, since you point it out. More conveniently, we could make the timestamp be the timestamp of the insert. This would be almost completely equivalent, since the interval between the last view and the first unviewed change is irrelevant to this feature. The only catch would be if two pages were merged, or revisions were otherwise inserted out-of-sequence. In that case the current schema fails however you look at it, unless the merge alters the row (which it could be arranged to do, if desired).

Either way, it's cleaner to use a timestamp than a rev_id, since the important information is not really dependent on the specific rev_id, and so using a rev_id could *conceivably* cause referential-integrity problems as a consequence. Not that I view it as likely, but we may as well do The Right Thing to start with, if it costs nothing more.

(In reply to comment #8)

Maybe we can use a rev_id like this: If the rev_id is there, voila! Otherwise,
find the biggest rev_id which is smaller than the expecte one (i.e. the last
remaining revision before the one stored in user_newtalk).

Also possible, if you assume rev_id is strictly increasing (I'm not sure it is, but it's certainly fairly close). But it's easier to just use a timestamp.

gutza wrote:

(In reply to comment #9)

Also possible, if you assume rev_id is strictly increasing (I'm not sure it is,
but it's certainly fairly close). But it's easier to just use a timestamp.

From all I've seen, rev_id /is/ strictly increasing (I didn't look into the code, but I'm making this assumption in related code, and I never saw a failure).

I was thinking about the same solution as the one suggested by Huji in comment #8, but with a twist: instead of implementing that "smartness" in the specific "you have new messages" code, implement it in the algorithm that shows the actual diff. That way you solve all referential integrity problems with diffs.

Simetrical, your proposal does make sense, but I think it incurs unnecessary penalties for both coding (redesign as opposed to a minor hack) and execution times (if you think it through, you'll probably find that you need to make timestamp comparisons between two tables on every page being served, as opposed to checking whether a field exists in a single table based on a primary key). But then again, that's just my 2c.

Yes. The rev_id is strictly increasing because it is created like this (pasting SQL code):

CREATE TABLE /*$wgDBprefix*/revision (

rev_id int unsigned NOT NULL auto_increment,

(In reply to comment #11)

Yes. The rev_id is strictly increasing because it is created like this (pasting
SQL code):

CREATE TABLE /*$wgDBprefix*/revision (

rev_id int unsigned NOT NULL auto_increment,

That doesn't necessarily hold when two histories are merged: in that case, a rev with a lower rev_id could be more recent.

I think the timestamp is probably the cleanest way to go, even though it requires an extra query (which is indexed anyway).

ayg wrote:

(In reply to comment #11)

Yes. The rev_id is strictly increasing because it is created like this (pasting
SQL code):

CREATE TABLE /*$wgDBprefix*/revision (

rev_id int unsigned NOT NULL auto_increment,

Depending on the configuration, under high insert load autoincrements might not be increasing in time due to race conditions, at least conceivably. Probably MySQL-side race conditions are rare enough to be negligible for minor UI details like this, but I'm not sure offhand that there are no application-side race conditions. If the timestamp is calculated near the beginning of the request, say, then processing before the insert could easily cause reordering in some cases. I tried checking for such possibilities by querying a simplewiki dump on localhost, but the only query I could think of was a Cartesian join on the revision table, which promptly ate up all my RAM, started swapping, and froze my computer (had to use Alt-SysRq-K), so my empirical attempt to test this failed. :)

Anyway, probably anything has corner cases (using timestamps runs into issues with two revisions of the same page in the same second). It's really not terribly important if there's a rare UI oddity, and if there is we can always fix it later. There's no point in talking about it, someone should just do it, using whatever method they feel like.

*** Bug 16314 has been marked as a duplicate of this bug. ***

Wiki.Melancholie wrote:

*** Bug 8216 has been marked as a duplicate of this bug. ***

Wiki.Melancholie wrote:

When watching a page, it gets changed and you visit its history before viewing itself, you get that cool green 'updated' marker. This means that an oldid info is somewhere in the recentchanges table or probably watchlist table (also used by ENotif!).

It would be nice to have that available for the notification box too, as with SUL you just leave a message somewhere, not knowing if the user will actually see/read it (some users get literally "spammed" with talk messages etc.).

ahmad.m.sherif wrote:

proposed patch

Maybe this patch would help. It also adds how many unreaded messages you have in the notification.

Attached:

(In reply to comment #18)

Created an attachment (id=6129) [details]
proposed patch

Maybe this patch would help. It also adds how many unreaded messages you have
in the notification.

Adding Title::getPreviousRevisionIDAtOffset() is of no use, since we've got Revision::getPrevious() already. Also, there's only one entry per user in the user_newtalk table AFAIK, so returning the number of entries is useless as well in that case.

Also, instead of using the 'newmessage' and 'newmessages' messages, you should use {{PLURAL:}}.

ahmad.m.sherif wrote:

(In reply to comment #19)

Adding Title::getPreviousRevisionIDAtOffset() is of no use, since we've got
Revision::getPrevious() already. Also, there's only one entry per user in the
user_newtalk table AFAIK, so returning the number of entries is useless as well
in that case.

There is an entry for every edit for user's talk page. At least that's what in user_newtalk in my local db. So, there will be one entry or more, so i don't think that Revision::getPrevious() would help me that much.

Also, instead of using the 'newmessage' and 'newmessages' messages, you should
use {{PLURAL:}}.

This could be done, but after reviewing the patch to see if it can be committed to the trunk or thrown in the trash :).

(In reply to comment #20)

(In reply to comment #19)

Adding Title::getPreviousRevisionIDAtOffset() is of no use, since we've got
Revision::getPrevious() already. Also, there's only one entry per user in the
user_newtalk table AFAIK, so returning the number of entries is useless as well
in that case.

There is an entry for every edit for user's talk page. At least that's what in
user_newtalk in my local db. So, there will be one entry or more,

Right.

so i don't
think that Revision::getPrevious() would help me that much.

Those two aren't related. My point is that you're duplicating code from Revision::getPrevious() to Title::getPreviousRevisionIDAtOffset() for no good reason. To get the previous revision ID for a certain revision ID you should really just create a Revision object for it (if you don't have one already) with Revision::newFromID() and call getPrevious() on it.

Also, instead of using the 'newmessage' and 'newmessages' messages, you should
use {{PLURAL:}}.

This could be done, but after reviewing the patch to see if it can be committed
to the trunk or thrown in the trash :).

... or you could do it now; developers are typically more hesitant to commit patches that need fixing up.

ahmad.m.sherif wrote:

(In reply to comment #21)

Those two aren't related. My point is that you're duplicating code from
Revision::getPrevious() to Title::getPreviousRevisionIDAtOffset() for no good
reason. To get the previous revision ID for a certain revision ID you should
really just create a Revision object for it (if you don't have one already)
with Revision::newFromID() and call getPrevious() on it.

AFAIU, Revision::getPrevious(), just like Title::getPreviousRevisionID, gives me *only* the previous revision, I need, for example, 5th revision before the current revision, so I will have to use a loop or something using Revision::getPrevious() to get only the rev id to put it in the link. Title::getPreviousRevisionIDAtOffset() gives me the rev id directly or I'm missing something.

... or you could do it now; developers are typically more hesitant to commit
patches that need fixing up.

On a second thought, will {{PLURAL:...}} allow us to make a link from 'a new message' or 'new messages'? Sorry if the question is stupid :).

(In reply to comment #22)

(In reply to comment #21)

Those two aren't related. My point is that you're duplicating code from
Revision::getPrevious() to Title::getPreviousRevisionIDAtOffset() for no good
reason. To get the previous revision ID for a certain revision ID you should
really just create a Revision object for it (if you don't have one already)
with Revision::newFromID() and call getPrevious() on it.

AFAIU, Revision::getPrevious(), just like Title::getPreviousRevisionID, gives
me *only* the previous revision, I need, for example, 5th revision before the
current revision, so I will have to use a loop or something using
Revision::getPrevious() to get only the rev id to put it in the link.
Title::getPreviousRevisionIDAtOffset() gives me the rev id directly or I'm
missing something.

Hmm. This is probably justified, but it still gives me the creeps. Implementing this more cleanly would require a schema change though.

... or you could do it now; developers are typically more hesitant to commit
patches that need fixing up.

On a second thought, will {{PLURAL:...}} allow us to make a link from 'a new
message' or 'new messages'? Sorry if the question is stupid :).

It'll allow something like {{PLURAL:$1|a new message|$1 new messages}} where $1 is substituted with the number of messages.

ahmad.m.sherif wrote:

(In reply to comment #23)

It'll allow something like {{PLURAL:$1|a new message|$1 new messages}} where $1
is substituted with the number of messages.

Yes I know, but I'm talking about 'a new message' or '$1 new messages' being a link to the talk page.

(In reply to comment #24)

Yes I know, but I'm talking about 'a new message' or '$1 new messages' being a
link to the talk page.

That should still work; those two are independent.

Patch needs to be updated, doesn't apply cleanly anymore

sumanah wrote:

Ahmad Sherif: Thanks again for the patch. I've marked it "reviewed" because it no longer cleanly applies to our codebase. Are you interested in updating it, and then in using developer access to directly suggest it into our Git source control system?

https://www.mediawiki.org/wiki/Developer_access

https://www.mediawiki.org/wiki/Git/Workflow#How_to_submit_a_patch

(In reply to comment #27)

Sumana, I think it's unlikely Ahmad will reply. His last contribution on bugzilla seems to date back to July 2009, and, assuming he is User:AhmadSherif, he appears to be inactive at the Arabic Wikipedia (apparently his home wiki) since September 2010.

In any case, it appears Aaron had begun implementing this back in 2008. At
least he added a user_last_timestamp column to table user_newtalk (see r33520
and two follow-ups r83137 and r854185). But that appears to be all; I don't see
that Db field used anywhere.

With this field present, it is, however, relatively simple to implement the
feature requested here. (Although the name is a bit awkward: one cannot store
the time the user last viewed his or her talk page there, as that's when the
row is deleted, but it needs to store the timestamp of the first revision made
after the user last saw his or her talk page.)

But given that nothing happened for 3 - 4 years on this issue, I'd like to have
some input from Aaron or Roan as to whether that is still desired. If so, I
could invest some time to do this.

Also, there's two issues here on which I'd like to have Roan's or Aaron's
input:

  • First, counting revisions as "number of messages" as Ahmad did in his patch

strikes me as patently incorrect. Consider the same editor amending his last
message: that's two revisions, but one message. Or a vandal editing a talk page
and getting reverted: 2 revisions, zero messages. So I'd prefer to not attempt
counting revisions or messages at all, but just go for the plural in the "(last
change)" part, which then becomes "(last changes)" and a diff to the correct
revision, as originally suggested.

  • Second, Roan said in comment 19 that there was only one row per user talk

page in table user_newtalk. That may have been the intention (the code does an
INSERT IGNORE), but that's not what really happens. The table gets a new row
for each revision. The reason is, I think, that the table is lacking a primary
key; it should have a PRIMARY KEY (user_id, user_ip). I would suggest adding
such a primary key.

(In reply to comment #29)

But given that nothing happened for 3 - 4 years on this issue, I'd like to have
some input from Aaron or Roan as to whether that is still desired. If so, I
could invest some time to do this.

The lack of a PRIMARY KEY can be worked around. I put up a DRAFT change in gerrit for this. If you a gerrit/ldap account, I can make it visible to you to work on. It still needs polish/i18n tweaks.

(In reply to comment #30)

I put up a DRAFT change in
gerrit for this. If you a gerrit/ldap account, I can make it visible to you to
work on. It still needs polish/i18n tweaks.

Yes, I do. Please make it visible for me, then, and I'll go take a stab at this.

Code review: Gerrit change #10545

Bug assigned to patch submitter.

Hi Niklas, assigning to you as a possible review task. Feel free to unassign yourself if you don't think the assignment is appropriate. Aaron's not the best person to review since this was originally his code.

Looks like I missed this patch set before it was merged. The current solution suffers heavily from patchwork disease. It's bound to be pretty much impossible to translate in some languages.

Could this be rewritten to use only 2 messages instead of the current 4.

Current:

  1. You have $1 from {{PLURAL:$3|another user|$3 users}} ($2).
  2. You have $1 from many users ($2).
  3. {{PLURAL:$1|a new message|new messages}}
  4. last {{PLURAL:$1|change|changes}}

Should be:

  1. You have [$1 {{PLURAL:$2|a new message|new messages}}] from {{PLURAL:$3|another user|$3 users}} ([$4 last {{PLURAL:$1|change|changes]}}).
  2. You have [$1 {{PLURAL:$2|a new message|new messages}}] from many users ([$4 last {{PLURAL:$1|change|changes]}}).

Parameters:

  • $1: link to user talk page
  • $2: number of talk page edits since last read
  • $3: number of unique editors of user talk page since last read
  • $4: link to diff for talk page (last visited - to current)

The message probably has to be wrapped in a span so that the external link icon doesn't pop up.

Also, in this case I would have reused newmessageslink and newmessageslinkdiff (despite the fact that it is generally better to create new ones), since the context is the same, and it's preferable to get a non-pluralised message in your language that a patchset one with plural support. In this case, it's backwards compatible..

(In reply to comment #37)

Patch by Siebrand to revert this:

https://gerrit.wikimedia.org/r/#/c/18482/

Is now abandoned, closing this bug again