Page MenuHomePhabricator

After opening a diff, entry on Special:Watchlist sometimes stays unread (bold)
Open, HighPublicBUG REPORT

Assigned To
None
Authored By
JoKerozen
Mar 17 2019, 1:11 PM
Referenced Files
F31655418: Snímek z 2020-03-02 15-10-23.png
Mar 2 2020, 2:10 PM
F31655407: Snímek z 2020-03-02 15-03-16.png
Mar 2 2020, 2:05 PM
F31476967: Snímek z 2019-12-15 21-36-28.png
Dec 15 2019, 8:36 PM
F30050556: Screen Shot 2019-08-20 at 11.09.07 AM.png
Aug 20 2019, 6:10 PM
F30024388: Screen Shot 2019-08-13 at 8.09.14 PM.png
Aug 14 2019, 3:11 AM
F30024191: Visited page shown as unvisited 2019-08-14T02:16
Aug 14 2019, 2:25 AM
Tokens
"Burninate" token, awarded by Thgoiter."Burninate" token, awarded by SerDIDG."Burninate" token, awarded by IKhitron."Burninate" token, awarded by AntiCompositeNumber."Burninate" token, awarded by JJMC89."Burninate" token, awarded by Dvorapa.

Description

on march 15th (around 18:00 (CET) ) a brief bug was noticed concerning special:watchlist on fr:wikipedia.

Steps to Reproduce:
open a diff of your watchlist, reload the watchlist.

Actual Results:
the article title stay in bold.

Expected Results:
the article title should not be bold anymore.

the bug was "resolved" 25-30 minutes after I first spotted it.

Today (march 17th) around 13:30 (CET) I noticed the bug was strinking again. This time, it lasted around 20 mn.

Hewiki comment:
We already saw it every couple of hours. The API query for unread edits returns all rhe bold ones all the waiting time.

Related Objects

Mentioned In
T320865: Watchlist "unviewed" flag incorrectly updated when viewing old revisions or diffs
T321277: DiscussionTools replies are not marked as read
T307173: Edits are incorrectly marked as "updated since your last visit" in page history on betacommons
T275437: Jobs are not getting executed or executed really slowly
T144679: "updated since my last visit" markers not updated when I visited a page
T265513: Viewing pages in Watchlist with "Live updates" on doesn't remove bolding
T250770: Watchlist: `updateWatchlistNotification` job incorrectly marks revisions as "seen"
T188801: Migrate wl_notificationtimestamp updates to the job queue
T219497: Special:Watchlist "Live updates" should update status to read (unbold) upon reading, rather than wait for a new edit.
Mentioned Here
T320865: Watchlist "unviewed" flag incorrectly updated when viewing old revisions or diffs
T250775: Watchlist: While loading the page displays solid (black) bullet points
T250770: Watchlist: `updateWatchlistNotification` job incorrectly marks revisions as "seen"
T250025: Slow response times and 504 Gateway timeouts accross all wiki projects
T240518: Some jobs are not being processed / are processed slowly
T239368: Users on dewiki have been automatically switched to use the enhanced recent changes and watchlist
T226109: Write incident report for jobs not being executed on 1.34.0-wmf.10
T171717: Allow seeing the diff of all changes since last visit directly from Special:Watchlist
T222431: Watchlist doesn't properly track my visits

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

I've checked the JobQueue performance for clearUserWatchlist job and the delay is less then one second. I could find any evidence the problem is in the job queue itself, so I'm not sure what CPT can help with here.

I've checked the JobQueue performance for clearUserWatchlist job and the delay is less then one second. I could find any evidence the problem is in the job queue itself, so I'm not sure what CPT can help with here.

You can try it now, on cswiki I have bold almost all edits from today, even though all diff links are colored as visited by my browser, see screenshot:

Snímek z 2020-03-02 15-03-16.png (743×1 px, 345 KB)

After 5 minutes from the original screenshot and 15 minutes from me visiting that particular page, still one left:

Snímek z 2020-03-02 15-10-23.png (743×1 px, 344 KB)

The last link took 2 hours to be marked as read. It seems to me like the clear user watchlist job leaves some revisions out

Yes, it's become more prominent again in the last day or two on en.wikipedia.org.

Happening a lot in the last few days, including right now, on en.wikipedia.org

Happening a lot in the last few days, including right now, on en.wikipedia.org

Same for me, not just enwiki

I've been having general connection issues here and there today where Wikipedia doesn't load (to various failings; some of it was "hit go in browser and got nothing" and some of it was "got HTML but no CSS/JS") and also experienced this issue at the same time. Maybe related?

It happened to me as well in the last few days, and I noticed that while it is happening I can not upload files on Commons (I am using the upload wizard, and the upload gets queued forever and then says something like "server did not respond on time".

@Izno: General connection issues are a different topic than the watchlist staying bold, hence it is off-topic here. See T250025 - thanks.
@Ymblanter: "Uploading files to Commons is a different topic than the watchlist staying bold, hence it is off-topic here. Please file separate issues - thanks.

My only point was that the uploading problems occur consistently at the same time as the watchlist problems. In the situation when the issues are happening over a year and nobody understands why it might be helpful. If it is not, fine with me.

Happening a lot in the last few days, including right now, on en.wikipedia.org

It is also still happening very often in German Wikipedia. I am disappointed that the developers still did not find a solution for this quite old problem.

This problem is mainly due to JobQueue stalling in production which happens from time to time, but usually resolves itself within a few minutes.

However, in practice this should not be an issue given we only use the JobQueue for long-term storing of the seen markers. There is a 1 hour limited storage of upto 300 reviewed diffs in MW's "MainStash", which should have solved thid problem.

But, I too see this probelm still from time to time and it is consistently "forgetting" items within a minute. Not after an hour, not after 300 items.

This suggests to me what regardless of whether the JobQueue is working, there is likely a bug in the WatchedItemStore's logic for the stashed "1 hour / upto 300 item" value.

kostajh added subscribers: Krinkle, kostajh.

Here's a way to reproduce this locally with MediaWiki-Docker, you can put this in LocalSettings.php (h/t to @Krinkle for the tips)

$wgJobRunRate = 0;
$wgMainCacheType = CACHE_ACCEL;
$wgMainStash = 'db-replicated';

Then, create TestOne and TestTwo articles with an anonymous user, and as another user add them to your watchlist. Now edit the articles with the anonymous user and reload the watchlist. You'll see both appear bold (unseen), but then if you click to open them you'll start seeing the silliness noted above where sometimes the revisions are bold, sometimes they're not, etc. As far as I can tell, running php maintenance/runJobs.php always brings the revisions back to the expected seen/unseen state.

In debugging this, I ended up in WatchedItemStore.php, in the resetNotificationTimestamp() method, where eventually you get to this bit of code

// Mark the item as read immediately in lightweight 
$this->stash->merge(
	$this->getPageSeenTimestampsKey( $user ),
		function ( $cache, $key, $current ) use ( $title, $seenTime ) {
				$value = $current ?: new MapCacheLRU( 300 );

Some odd things about this: the documentation for merge() says the arguments to the method are BagOStuff, cache key, current value, TTL., we don't know what types current value could be, but OK, let's look at at the specific implementation in MediumSpecificBagOStuff. There we end up in mergeViaCas() where current value (that gets passed as an argument to merge() is defined as

$currentValue = $this->resolveSegments(
	$key,
	$this->doGet( $key, $flags, $token )
);

And resolveSegments() says it returns * @return string|null|bool The combined string, false if missing, null on error.

But, back to the code above:

function ( $cache, $key, $current ) use ( $title, $seenTime ) {
	$value = $current ?: new MapCacheLRU( 300 );

When the unseen/seen bug occurs, $current is not string|false|null, but is instead an instance of MapCacheLRU, and the catch is that its maxCacheKeys value is set to null, instead of 300 (as we set it when $current is null). So, if the MapCacheLRU is holding the seen data of a recent item, it is promptly expunged when we do $value->set( $subKey, $seenTime ); later in that method, because MapCacheLRU's set() implementation will empty the cache if the new size is greater than $this->maxCacheKeys, which is null in this case.

This doesn't totally explain the all the problems noted here but seems to be part of the puzzle anyway.

$current is the value stored in the cache key, which can be anything. In this case, it looks like a MapCacheLRU object is stored in the cache key, which presumably gets serialized when stored in the cache and unserialized when retrieved from the cache. MapCacheLRU::serialize() does not return, and MapCacheLRU::unserialize() does not restore, the maxKeys property, so it would seem that that's (part of) the problem.

Change 589183 had a related patch set uploaded (by Catrope; owner: Catrope):
[mediawiki/core@master] MapCacheLRU: Serialize maxCacheKeys property

https://gerrit.wikimedia.org/r/589183

Change 589183 merged by jenkins-bot:
[mediawiki/core@master] MapCacheLRU: Serialize maxCacheKeys property

https://gerrit.wikimedia.org/r/589183

Change 589262 had a related patch set uploaded (by Kosta Harlan; owner: Kosta Harlan):
[mediawiki/core@master] Allow updating notification timestamp where timestamp is not null

https://gerrit.wikimedia.org/r/589262

Change 589358 had a related patch set uploaded (by Krinkle; owner: Krinkle):
[mediawiki/core@master] objectcache: Add regression test for MapCacheLRU serialization

https://gerrit.wikimedia.org/r/589358

Change 589358 merged by jenkins-bot:
[mediawiki/core@master] objectcache: Add regression test for MapCacheLRU serialization

https://gerrit.wikimedia.org/r/589358

A quick update for those watching this task; https://gerrit.wikimedia.org/r/589183 was merged and will ride the next train. Next week there are no deployments so it should be in group 2 wikis on Thursday April 30. At that point I would expect that the bug in this task description will no longer be present.

There is a remaining issue, that might be deserving of its own task:

  • you are watching an article
  • someone makes three edits to that article with revisions 2, 3 and 4
  • revisions 2, 3 and 4 show up on your watchlist because you have the "Expand watchlist to show all changes, not just the most recent" preference enabled
  • revisions 2, 3 and 4 are in bold / unseen, as expected
  • you click on the diff for revision 2
  • you reload the watchlist, before the updateWatchlistNotification job has executed
  • expected behavior: when reloading watchlist, revision 2 is marked as "seen" and revisions 3 and 4 remain "unseen"
  • actual behavior: revisions 2, 3 and 4 are marked as "seen"
    • addendum to actual behavior: when the updateWatchlistNotification job (which was created when you clicked on the diff for revision 2) runs, the watchlist is now in the correct, expected state where revision 2 is marked as "seen" and revisions 3 and 4 are marked as "unseen"

Change 589262 abandoned by Kosta Harlan:
Allow updating notification timestamp where timestamp is not null

https://gerrit.wikimedia.org/r/589262

There is a remaining issue, that might be deserving of its own task: [snip]

Yes please, file one! This one has also been driving me nuts and I've been just on the wrong side of the lazy mark to file it myself. (It also occurs a lot due to interaction with RevisionTimeline.)

There is a remaining issue, that might be deserving of its own task: [snip]

Yes please, file one! This one has also been driving me nuts and I've been just on the wrong side of the lazy mark to file it myself. (It also occurs a lot due to interaction with RevisionTimeline.)

Filed as T250770: Watchlist: `updateWatchlistNotification` job incorrectly marks revisions as "seen" - @Izno, please add your comments re interaction issues with RevisionTimeline to that ticket (and anything else that seems to be relevant).

Checked in betalabs the failing cases mentioned in the comments above. Also checked the overall Watchlist functionality - all seems to be normal. I'll follow up with testing in production after the deployment.

Filed a minor bug - T250775: Watchlist: While loading the page displays solid (black) bullet points

Note: Additional case to check

  • wikidata edits on Watchlist still displayed "Unseen" after visiting the wikidata diff page (clicking on diff)
Etonkovidova claimed this task.

Checked in frwiki and enwiki wmf.30 - did not see the issue happening (my watchlist is rather small, so there might be some issues with bigger watchlists).

Two cases when seen status is still not applied

  • when a diff is viewed in incognito window
  • when wikidata update is seen, but not the wiki page where the wikidata item is used

It happened yesterday for the last time.

It happened yesterday for the last time.

So, the ticket should should be re-open?

@IKhitron - since my testing did not show any obvious problems, it seems that I am missing something. Could you please add some info on the following?

  • how big the watchlist is
  • do you have filters in place when you reload watchlist
  • and do filters, like 'unseen', also display some items with solid circles/bold titles
  • About 4,000 pages.
  • A categorization filter only.
  • Sorry, I do not understand. How can one see filters?
  • About 4,000 pages.
  • A categorization filter only.
  • Sorry, I do not understand. How can one see filters?

@IKhitron can you please post detailed steps of what you did, what you expected to see, and what you saw?

@IKhitron can you please post detailed steps of what you did, what you expected to see, and what you saw?

Sure. But there will be a lot of irrelevant details there.

  1. Open special:watchlist, grouped by days, let's say that now the hour is T.
  2. Choose some line (page A).
  3. Somebody else edits this page since T.
  4. Click on the golden key gadget button. It takes the page name from this line, finds the bottom entry of this page in the shown watchlist, gets the last seen version from diff URL, and opens in another browser tab a diff between this version and cur. So, the user can see a diff of all unseen version at once, those that are in watchlist "7 days, 1000 edits", and those after the watchlist page reload.
  5. Open some page (B) in a third tab that is watched by me, but was not edited from the last time I visited it.
  6. Edit this page.
  7. Return to the watchlist tab.
  8. Click on "What's new?" gadget button. It makes api calls to print all the pages in watchlist edited from T, with links to diffs befween the last seen version and cur.
  9. Expected always and got most of the time: page A and page B are not in the list shown, beacause I just saw the current versions of both.
  10. Got sometimes: page A is shown with a link to the diff from the paragraph 2 above as unseen. Page B is shown with a link to my last edit as unseen.

I made this edit at 2020-05-08 03:08. As of 2020-05-08 03:25, it was shown as unseen on my watchlist, but it shouldn't be since I've clearly seen the page because I made the edit.

Watchlist URL after navigating to https://en.wikipedia.org/wiki/Special:Watchlist: https://en.wikipedia.org/w/index.php?watchlistactivity=unseen&hidepreviousrevisions=1&limit=1000&days=30&damaging__verylikelybad_color=c5&goodfaith__verylikelybad_color=c5&userExpLevel__unregistered_color=c4&userExpLevel__newcomer_color=c4&userExpLevel__learner_color=c3&title=Special:Watchlist&urlversion=2

Taken 2020-05-08 03:20:20 with scripts enabled: F31807414

Taken 2020-05-08 03:25 with scripts disabled (&safemode=1) after a hard refresh: F31807415

Thanks, @IKhitron and @JJMC89 - I will be looking into it today.

I'm not sure if we can reopen this, but the issue still occurs: I have visited https://cs.wikipedia.org/w/index.php?title=Diskuse_s_wikipedistou:Martin_Urbanec&curid=1041170&diff=18497394&oldid=18497390 4 hours ago, but still see it bold in my watchlist

@IKhitron - as far as I could see the gadgets you mention are not on Special:Preferences#mw-prefsection-gadgets list?

It seems that the issue happens when the Watchlist has a large list of pages. However, I'm still struggling to reproduce to reproduce the issue.

@IKhitron - as far as I could see the gadgets you mention are not on Special:Preferences#mw-prefsection-gadgets list?

It is there. WLM, the last one in Watchlist subsection.

@IKhitron - as far as I could see the gadgets you mention are not on Special:Preferences#mw-prefsection-gadgets list?

It is there. WLM, the last one in Watchlist subsection.

Thx!

Doing it on en.wiki right now (Tue Jun 2 2020 7:53 PM PDT). My watchlist is 2,492 entries.

Doing it on en.wiki right now (Tue Jun 2 2020 7:53 PM PDT). My watchlist is 2,492 entries.

@Joeyconnick Could you please provide specific steps to about what you did, what you saw, and what you expected to see? And also let us know if you have any watchlist gadgets enabled?

What I did:
looked at pages that my watchlist indicated had been edited since last time I saw them.

What I saw:
when I returned to the watchlist after either editing said pages, or just viewing their current version, I either saw:
a) the page in its same previous location in the watchlist, but unbolded and with a hollow bullet to its left (for pages I had simply viewed) OR
b) the page at the top of my watchlist, with me listed as the last person having edited it, UNbold and with a hollow bullet

What I expected:
a) the page should not have appeared anywhere in my watchlist as it had no unseen edits OR
b) the page should not have appeared anywhere in my watchlist as I had just edited it

Apparently I have these gadgets:

  • Geonotice: display notices on your watchlist about events in your region
  • Display watchlist notices (This loads the base style for the watchlist. Please do not disable this option.)
  • Display green collapsible arrows and green bullets for changed pages in your watchlist, page history and recent changes (unavailable with the improved Watchlist user interface)
  • Display pages on your watchlist that have changed since your last visit in bold (see customizing watchlists for more options)
  • Subtle update marker: Tone down the "Changed since last visit" indicator on history pages. (By default it renders as a green-filled bar, enabling this gadget changes it to green text.)

In the last twenty minutes this was happened on many wikis at the same time, thanks @DannyS712 for his global watchlist script. And stopped at the same time, so it is not local.

Thanks @Joeyconnick. Could you please also either screenshot your active filters and/or paste the URL you see when you access Special:Watchlist? e.g. for me it's Special:Watchlist?watchlistactivity=unseen&limit=250&days=7&enhanced=1&urlversion=2

Sure:

watchlistactivity=unseen&hidebots=1&hidepreviousrevisions=1&limit=500&days=7&title=Special:Watchlist&urlversion=2

And we're back where it's happening again, right now, en.wikipedia.org

It's much worse in last week. And there is a partial way to recognize the "wrong" diffs. If you open the Special:Watchlist with the filter "unseen only", all the edits should be bolded by definition. Some of the "wrong" ones are not bolded.

I just had the bug again 3 times today (with the same page).

I just had the bug again 3 times today (with the same page).

Yes there was some massive regression or issue today. My watchlist was affected for at least 3 or 4 hours on en.wikipedia.org

And again around noon pacific yesterday (Sun Feb 21) on en.wikipedia.org... and again now (9am pacific Mon Feb 22).

This has been happening for the last two days or so on en.WP.

@Jonesey95 The cause seems to be the same as always. There is a huge backlog of jobs. This causes even high prio tasks like registering that you viewed a page and reflecting that in the watchlist, to be delayed. https://grafana.wikimedia.org/d/LSeAShkGz/jobqueue?orgId=1

it also seems that at 10:00 AM UTC someone changed something as the rate of job processing seems to have dramatically increased from that time. Although it seems that even with this new rate it will still take several days to bring the queue down to 0 again.

@Jonesey95 The cause seems to be the same as always. There is a huge backlog of jobs. This causes even high prio tasks like registering that you viewed a page and reflecting that in the watchlist, to be delayed. https://grafana.wikimedia.org/d/LSeAShkGz/jobqueue?orgId=1

Sometimes, the page is well marked as read, then “updated since your last visit” notice comes back. I just experienced it, and @AlexisJazz experienced it in April 2022. How a job delay can explain that phenomenon?

There is also a report at T320865 about the wrong revisions of pages being marked as read, no idea if it's related.

Sometimes, the page is well marked as read, then “updated since your last visit” notice comes back. I just experienced it, and @AlexisJazz experienced it in April 2022. How a job delay can explain that phenomenon?

I'm only guessing, but this could be caused by replication lag. If the second visit is served by a replica database server that didn't yet receive the "mark as read" update, you might see it reappear even though it is marked as read in the master database.

[…]
Sometimes, the page is well marked as read, then “updated since your last visit” notice comes back. I just experienced it, and @AlexisJazz experienced it in April 2022. How a job delay can explain that phenomenon?

It is explained as follows:

  • Visit a page. This immediately marks it as read in the upto-1-hour upto-300-items temporary stash, and queues a job to persist the mark state to the database.
  • An hour passes and/or the stash exceeds 300 items. The job has not yet run. The temporary stash expires.
  • The item now appears unread again until the job runs.

If the mark state information were to be stored as well on the client side, e.g. in a cookie, then the user experience wouldn't be dependent (or would be considerably less dependent) on stashes and job queues, am I right? Since there is no "mark as unread", you can just do a logical OR of the local and remote information.

How a job delay can explain [notice come back after having disappeared]?

I'm only guessing, but this could be caused by replication lag.

[…] The item now appears unread again until the job runs.

In both explanations (thank you, those are clear!), the revisions would be marked as read after a while. I’m checking a case on fr.wiktionary: 22 hours later, it is still “unread”. What is the earliest time I can be sure the job has actually failed? Is that kind of job abortion logged?

e.g. in a cookie

https://en.wikipedia.org/wiki/LocalStorage would be used in that case. The date for the diff entry doesn't appear to be machine-readable so this would have to done based on the revid.

But technically feasible. Question is, does this happen often enough to justify some localStorage use and the effort to write a script that does this?

For anyone interested in taking it on, I would do it like this (simplified):

When the user views a page (mw.config.get('wgAction')):

  1. Read the localStorage object this gadget maintains or create an empty object ( window.DejaDiffObj = JSON.parse((mw.storage.get('DejaDiff')||'{}')) )
  2. Put mw.config.get('wgCurRevisionId'), the current date ( new Date().getTime() ) and a CRC32 checksum of page title (see https://en.wikipedia.org/w/index.php?title=MediaWiki:Gadget-EditNoticesOnMobile.js for a function) in the object.
  3. Loop over existing entries and remove those older than a few days.
  4. Write the object. ( mw.storage.setItem('DejaDiff',JSON.stringify(window.DejaDiffObj) )

When the user views a history page:

  1. Check if any updated marker exists ( $('.mw-history-line-updated')[0] ). If not, end script.
  2. Read the localStorage object.
  3. Create CRC32 checksum for the page title, check if there's an entry in the localStorage object, obtain the stored revision ID.
  4. If yes, loop over entries in $('.mw-history-line-updated'), read the revision ID ( $('.mw-history-line-updated')[0].attributes['data-mw-revid'].value ).
  5. If it's lower than the stored revision ID, remove the updatedmarker ( $('.mw-history-line-updated')[0].querySelectorAll('.updatedmarker')[0].remove() )

I’m checking a case on fr.wiktionary: 22 hours later, it is still “unread”. What is the earliest time I can be sure the job has actually failed? Is that kind of job abortion logged?

Still in “not seen” status. The job surely aborted, but I don’t notice any issue in ActivityUpdateJob graph. Is this the concerned job? Are job abortion logged?