Investigate instances of successfully posted comments without an associated logged saveSuccess event in EditAttemptStep
Closed, Resolved (Public)

Description

There are some edit sessions that are logged as successfully posted comments (recorded in talk_page_edit as being published); however, there is not an associated save event (event.action = 'saveSuccess') in EditAttemptStep.

While reviewing data for the New Topic Tool AB test in T277825, we found that this affected only discussiontools sessions, and about 14% of the AB test sessions in which a new topic was published. It may be related to the connection issues mentioned in T304771, but further investigation is needed to confirm.

The purpose of this ticket is to confirm the extent and specific type of events where this issue occurs and identify the cause.

Requirements:

  • Review data to confirm the extent of impact and the specific types of events where this issue occurs. This includes confirming whether those sessions come only from DiscussionTools (event.integration = discussiontools), whether they all include init events, and whether either an abort or saveFailure action was logged.
  • Identify the possible source of the issue
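
The checks in the first requirement could be sketched roughly as below. This is only an illustration in Python, not the query used for the analysis: the flat record shape (`session_id`, `action`, `integration`), the function name, and the sample data are all assumptions made for the example, and the set of published sessions is assumed to have been pulled from talk_page_edit beforehand.

```python
from collections import defaultdict


def summarize_missing_save_success(events, published_session_ids):
    """Summarize published sessions that have no saveSuccess event.

    `events` is a list of dicts with the hypothetical keys
    "session_id", "action" and "integration".
    """
    actions = defaultdict(set)
    integrations = defaultdict(set)
    for event in events:
        actions[event["session_id"]].add(event["action"])
        integrations[event["session_id"]].add(event["integration"])

    published = set(published_session_ids)
    # Published sessions for which no saveSuccess action was ever logged.
    missing = sorted(sid for sid in published if "saveSuccess" not in actions[sid])
    return {
        "published": len(published),
        "missing": missing,
        "missing_share": len(missing) / len(published) if published else 0.0,
        # True if every affected session logged only discussiontools events.
        "only_discussiontools": all(
            integrations[sid] <= {"discussiontools"} for sid in missing
        ),
        "all_have_init": all("init" in actions[sid] for sid in missing),
        # Affected sessions that logged an abort or saveFailure instead.
        "abort_or_failure": [
            sid for sid in missing if actions[sid] & {"abort", "saveFailure"}
        ],
    }


# Made-up sample data, for illustration only.
sample_events = [
    {"session_id": "s1", "action": "init", "integration": "discussiontools"},
    {"session_id": "s1", "action": "saveSuccess", "integration": "discussiontools"},
    {"session_id": "s2", "action": "init", "integration": "discussiontools"},
    {"session_id": "s2", "action": "saveIntent", "integration": "discussiontools"},
    {"session_id": "s3", "action": "init", "integration": "discussiontools"},
    {"session_id": "s3", "action": "abort", "integration": "discussiontools"},
]
summary = summarize_missing_save_success(sample_events, {"s1", "s2", "s3"})
print(summary)
```

A published session with no logged events at all would also count as missing here, so it may be worth reporting those separately.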

QA

To QA this:

  • be in a state that will log events (trackdebug=1 or similar), with console state preserved between pageloads (because a lot of the logging happens alongside a reload).
  • verify that a saveSuccess event is sent when:
    • you leave a normal reply, to make sure I didn't break what was already working
    • you post a new topic on a blank page (which will reload the page)
    • you leave a reply while on the mobile version of the page (which will reload the page)

You can't easily test the final case, since it relies on an unlikely API failure.

NOTE: the work to verify the extent to which gerrit:777833 addresses the issue this ticket is naming will happen in T305595.

Event Timeline

Copying in a few things from a discussion thread:

Is it common? If it's not (or if it is and we have bigger problems), this could be timeouts, where we submit the edit and then the connection breaks such that we can't tell whether the save succeeded or not.
Looking at the logging, technically it could also be edits being saved to other pages (e.g. through templates) if we couldn't purge the cache for the current page. Not sure how easy that'd be to work out, but if there's a common API failure there it'd cause the same issue.

@MNeisler other possibly useful things to know:

  • is this only happening in new-topic sessions?
  • is this happening only when creating a new page?

Change 777833 had a related patch set uploaded (by DLynch; author: DLynch):

[mediawiki/extensions/DiscussionTools@master] Log saveSuccess more consistently

https://gerrit.wikimedia.org/r/777833

Cases where saveSuccess wasn't logged:

  • creating a new page with the New Topic tool
  • any replies on mobile
  • successful replies made through transclusions which then couldn't purge the current page

These were all cases where we abandoned the post-save process early to reload the page.

With this patch saveSuccess will either mean "the comment successfully posted and we have finished inline-reloading the page" or "the comment successfully posted and we are about to fully-reload the page".
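
To make the two meanings concrete, here is a toy model of the post-save ordering. It is not the DiscussionTools code (the actual patch is gerrit:777833); the function and callback names are hypothetical and exist only for this sketch.

```python
def finish_successful_save(needs_full_reload, log_event, update_inline, reload_page):
    """Toy model of the post-save step after a comment was posted."""
    if needs_full_reload:
        # New page, mobile reply, or the current page could not be purged:
        # log first, because the reload ends the rest of the post-save process.
        log_event("saveSuccess")
        reload_page()
        return
    # Normal reply: refresh the page in place, then log.
    update_inline()
    log_event("saveSuccess")


# Record the order of calls for both paths.
full_reload_calls = []
finish_successful_save(
    True,
    lambda action: full_reload_calls.append("log:" + action),
    lambda: full_reload_calls.append("inline"),
    lambda: full_reload_calls.append("reload"),
)

inline_calls = []
finish_successful_save(
    False,
    lambda action: inline_calls.append("log:" + action),
    lambda: inline_calls.append("inline"),
    lambda: inline_calls.append("reload"),
)
print(full_reload_calls, inline_calls)
```

In both paths exactly one saveSuccess is logged. The only difference is whether it comes after the inline update or just before the full reload.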

Change 777833 merged by jenkins-bot:

[mediawiki/extensions/DiscussionTools@master] Log saveSuccess more consistently

https://gerrit.wikimedia.org/r/777833

DLynch added a project: Editing QA.

To QA this:

  • be in a state that will log events (trackdebug=1 or similar), with console state preserved between pageloads (because a lot of the logging happens alongside a reload).
  • verify that a saveSuccess event is sent when:
    • you leave a normal reply, to make sure I didn't break what was already working
    • you post a new topic on a blank page (which will reload the page)
    • you leave a reply while on the mobile version of the page (which will reload the page)

You can't easily test the final case, since it relies on an unlikely API failure.

@DLynch Cases:

1:

Screenshot 2022-04-11 at 07.09.22.png (574×2 px, 213 KB)

2:

Screenshot 2022-04-11 at 07.12.15.png (1×1 px, 460 KB)

  1. No event was triggered. Is that normal?

You can't easily test the final case, since it relies on an unlikely API failure.

Does this mean I won't see any event get triggered?

No event was triggered. Is that normal?

On mobile? It should have an event, but I haven't verified. I'll double check in mobile mode myself.

Does this mean I won't see any event get triggered?

I mean I don't think you can deliberately reproduce the scenario. If you could it'd now trigger the event.

On mobile? It should have an event, but I haven't verified. I'll double check in mobile mode myself.

Testing locally, I see a saveSuccess event when leaving a reply on a useformat=mobile page with DT. I'd need to know more details about how you're testing to know why you might not be seeing one.

Please take a look at https://photos.app.goo.gl/WEupmznN6G7PFVgQ6 and https://photos.app.goo.gl/ATGkoPe2DKwsjGVe6. I might be doing something wrong, thanks.

The second one isn't testing DiscussionTools -- that's a Flow page. Totally different system.

I figured. The experience was different. I was exploring what the Structured Discussions on user talk pref did. What am I doing wrong in the first though?

Oh, sorry. On the first one that's not DiscussionTools either -- that's the MobileFrontend new topic tool. If DT was enabled, it would replace it, but it seems that it's not.

DT would look like this:

image.png (1×1 px, 258 KB)

Finally got past this. However,

Screenshot 2022-04-13 at 20.33.07.png (412×3 px, 434 KB)
is what my console looks like.

I mean I don't think you can deliberately reproduce the scenario. If you could it'd now trigger the event.

Given the above, I can verify this, since the other cases work fine, as seen in T305541#7843982.

@EAkinloose that screenshot cuts off halfway through the event before where I'd expect saveSuccess to be -- is that because nothing followed it?

If I go to: https://en.m.wikipedia.beta.wmflabs.org/wiki/User_talk:TestEsther/Archive_1?dtenable=1&trackdebug=1 and post via the DT form, I see a saveSuccess event:

image.png (518×874 px, 185 KB)

Screenshot 2022-04-19 at 11.41.34.png (754×3 px, 574 KB)
makes more sense as it is similar to your experience. We're good here.

Is the 502 Error something to worry about?

Not sure -- it's probably just something not being correctly configured on the beta cluster. 502 is Bad Gateway, so I'd guess whatever intake-analytics does isn't passing things through to some other server correctly. Note that just visiting https://intake-analytics.wikimedia.beta.wmflabs.org/ shows a 502.