Page MenuHomePhabricator

Edit counter gives wrong result in replace.py
Closed, ResolvedPublic

Assigned To
Authored By
Xqt
Oct 3 2016, 10:20 AM
Referenced Files
F9168394: withoutchanges.png
Aug 24 2017, 7:13 PM
F9168393: wiki.png
Aug 24 2017, 7:13 PM
F8944579: date script.py
Aug 4 2017, 9:27 AM
F8944572: picture1.png
Aug 4 2017, 9:24 AM
F8857221: botovanje.png
Jul 26 2017, 3:06 PM
F8856555: Snímek z 2017-07-26 14-55-17.png
Jul 26 2017, 12:55 PM
F8856520: Snímek z 2017-07-26 14-51-54.png
Jul 26 2017, 12:52 PM
F8856503: Snímek z 2017-07-26 14-49-42.png
Jul 26 2017, 12:51 PM

Description

When editing a page no page save message is given and the page count remains 0:

C:\pwb\GIT\core>pwb.py replace -fix:isbn -page:"Bernhard Windscheid"
Retrieving 1 pages from wikipedia:de.


>>> Bernhard Windscheid <<<
@@ -52 +52 @@
- * Gabor Hamza: Entstehung und Entwicklung der modernen Privatrechtsordnungen u
nd die römischrechtliche Tradition, Budapest 2009, S. 193-200. <nowiki>ISBN 978
963 284 095 5</nowiki>
+ * Gabor Hamza: Entstehung und Entwicklung der modernen Privatrechtsordnungen u
nd die römischrechtliche Tradition, Budapest 2009, S. 193-200. ISBN 978-963-284-
095-5

Do you want to accept these changes? ([y]es, [N]o, [e]dit, open in [b]rowser, [a
]ll, [q]uit): y

0 pages changed.

But the page was changed by the previous run:

C:\pwb\GIT\core>pwb.py replace -fix:isbn -page:"Bernhard Windscheid"
Retrieving 1 pages from wikipedia:de.
No changes were necessary in [[Bernhard Windscheid]]

0 pages changed.

Event Timeline

Xqt triaged this task as Medium priority.Oct 3 2016, 10:20 AM
Xqt added a project: Regression.
Xqt renamed this task from replace.py gives a from edit result to replace.py gives a wrong edit result.Oct 3 2016, 5:36 PM

Yes, I had problem like this. Thank you @Dvorapa

In T147178#3467784, @Zoranzoki21 wrote:

Yes, I had problem like this. Thank you @Dvorapa

I now have not this problem. Now I have clear real result.

pajton.png (342×597 px, 82 KB)

https://sr.wikipedia.org/w/index.php?title=%D0%9E%D0%B7%D0%B8_%D0%9E%D0%B7%D0%B1%D0%BE%D1%80%D0%BD&diff=prev&oldid=14763727

@Zoranzoki21 Per previous discussions (also in duplicate tasks) the wrong behavior appears only sometimes and the right cause is still unknown

I found a different behaviour when answering "y" vs. "a". With "a" I got the right result.

Hmm. For previous I used custom script.. Now I used python pwb.py replace "online" "editng" -namespace:User -page:Корисник:Zoranzoki21

pajviki.png (342×597 px, 61 KB)

But, it is changed: https://sr.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D0%BD%D0%B8%D0%BA:Zoranzoki21&diff=14763753&oldid=14762743

@Zoranzoki21 Per previous discussions (also in duplicate tasks) the wrong behavior appears only sometimes and the right cause is still unknown

Ok.

I found a different behaviour when answering "y" vs. "a". With "a" I got the right result.

@Mpaa in duplicate task got right result with y too

I found a different behaviour when answering "y" vs. "a". With "a" I got the right result.

Yes, I now get real result with a.

ScreenShot_20170726142413.png (342×597 px, 61 KB)

https://sr.wikipedia.org/w/index.php?title=%D0%9A%D0%BE%D1%80%D0%B8%D1%81%D0%BD%D0%B8%D0%BA:Zoranzoki21&diff=14763768&oldid=14763758

@Xqt For y there is _replace_async_callback called and page is saved asynchronously, but for a there is _replace_sync_callback called and page is saved synchronously. The counter is inside the callback. I think Page.save in async mode is maybe returning its callback too late and the last increment is made after page count is outputted. Print bot.changed_pages after callback and compare with output should give a clue.

I added some print messages to see how it works:

Snímek z 2017-07-26 14-55-17.png (462×734 px, 43 KB)

I do not know, how I got real result with y with custom script in user-fixes.py.

With replace without using custom script in user-fixes.py i got 0 pages changed.

Could you provide that custom fix? NOT the screenshot, rather the text.

In T147178#3500033, @Zoranzoki21 wrote:

I do not know, how I got real result with y with custom script in user-fixes.py.

With replace without using custom script in user-fixes.py i got 0 pages changed.

Now I not got real result with custom script in user-fixes.py

picture1.png (342×597 px, 68 KB)

Could you provide that custom fix? NOT the screenshot, rather the text.

Custom script is for changing from English way of writing dates on Serbian.. Script is in file:

I updated bot using git/gerrit. Now he does not make changes and I have real result: 0 changes. QQQ.. Username of my bot is ZoranBot

withoutchanges.png (800×1 px, 209 KB)

wiki.png (800×1 px, 209 KB)

Per my investigation in T147178#3474439 and T147178#3474606: For option yes (y) there is async callback returned after page count is outputted. Still needs an investigation, why async callback is returned so late and how to make async callback return earlier.

(a)sync_callback is executed after a response from the wiki is received. When the last page from the generator is being treated, an asynchronous save action will launch a second thread, which handles the response. However, the main thread continues running and immediately exits the script (with printing the final message), before the second thread is finished.

T151727 & T171713 are two ways how to improve this.

The funniest fact at the end: this script has sync/async saving swapped, compared to other standard bots. In other words, with -always ([a]) it saves synchronously, in manual mode ([y]) asynchronously.

AFAIK, there is only one _putthread, where all _async requests are sent. Am I wrong?

If you change stopme() to use _flush(True) instead of _flush(False), it works.
Maybe it can help to understand?

Framawiki renamed this task from replace.py gives a wrong edit result to Edit counter gives wrong result in replace.py.Aug 30 2017, 9:27 PM

As this bug is still alive a year later, and it only concerns an output that is not essential, I'll submit a patch to temporary disable it.

Change 374894 had a related patch set uploaded (by Framawiki; owner: Framawiki):
[pywikibot/core@master] replace.py: Temporary comment unessential wrong count output

https://gerrit.wikimedia.org/r/374894

If you change stopme() to use _flush(True) instead of _flush(False), it works.
Maybe it can help to understand?

I think the reason is that by the time we try to _putthread.join(1) in _flush(), the main thread is already dead.
Using _flush(True) probably just gives some time to be able to join the thread before the main dies ...

@Framawiki Personally I don't think this should be disabled. Better approximate count than nothing. And this bug is noticeable only when a really low count of pages was edited. For hundreds of pages or more one extra or one missing is not a big deal.

@matej_suchanek @Mpaa Apart from what has been already mentioned (mainly T151727), there are some other possibilities, how to deal with this issue. First of all maybe something like this: 0 pages changed, 1 pages still in progress?? I would prefer this solution and it seems to me as good first task one. Another option could be to pass n pages changed summary on the second thread (and return that output after all jobs of second thread are done after callback is out somehow, but I don't really know how this could work here). Or just make first thread wait for async callback, but this approach is not really async at all.

The funniest fact at the end: this script has sync/async saving swapped, compared to other standard bots. In other words, with -always ([a]) it saves synchronously, in manual mode ([y]) asynchronously.

@matej_suchanek Maybe the funniest, but completely logical. When reviewing one page after another using y or n options, user doesn't want to wait for script to process and save a page. User just wants to see diff and make a decision, see another diff and make another decision, process/save stuff should run in the background (second thread) and not waste user's time. In always mode user usually doesn't care, when script saves the page, it can be synced or asynced, whatever (but better synced in order to have a clean and readable terminal/log output.

Using _flush(True) probably just gives some time to be able to join the thread before the main dies ...

This seems to me like the right direction

This comment was removed by Mpaa.

As this bug is still alive a year later, and it only concerns an output that is not essential, I'll submit a patch to temporary disable it.

Please don't! Something is not always perfect, so do we throw it away totally? Better warn the users that result may be approximate. Some people use feature A, some people use feature B. To mark features used by others as unneccessary or not essential is not the right direction.

The funniest fact at the end: this script has sync/async saving swapped, compared to other standard bots. In other words, with -always ([a]) it saves synchronously, in manual mode ([y]) asynchronously.

That's on purpose. With always it runs without intervention, at its best own speed that can be reached with synchron mode. (It won't be faster from async.) In manual mode the best final speed of the human work is reached with asynchronous saving, although the price is that after pressing q or getting to the end of the list it may take several minutes to save the cached pages while the human in front of the computer can go and have his lunch.
(Sorry for changing the modes at first, I corrected my comment.)

The funniest fact at the end: this script has sync/async saving swapped, compared to other standard bots. In other words, with -always ([a]) it saves synchronously, in manual mode ([y]) asynchronously.

@matej_suchanek Maybe the funniest, but completely logical.

That's on purpose. With always it runs without intervention, at its best own speed that can be reached with synchron mode. (It won't be faster from async.) In manual mode the best final speed of the human work is reached with asynchronous saving, although the price is that after pressing q or getting to the end of the list it may take several minutes to save the cached pages while the human in front of the computer can go and have his lunch.
(Sorry for changing the modes at first, I corrected my comment.)

The point was that the rest of the framework does the opposite (or at least BaseBot extensions).

I didn't contribute to bot.py, but someone will surely explain the reason.

As this bug is still alive a year later, and it only concerns an output that is not essential, I'll submit a patch to temporary disable it.

Please don't! Something is not always perfect, so do we throw it away totally? Better warn the users that result may be approximate. Some people use feature A, some people use feature B. To mark features used by others as unneccessary or not essential is not the right direction.

I've updated my patch : I purpose to say Approximately %s pages changed.

Relying on page_put_queue.qsize() > 0 only, as a condition to keep the while loop alive is not safe as the last item in the queue can be fetched by async_manager() before page_put_queue.qsize() is checked and then found empty. In this case _flush() returns before all pending requests are served.
I thought about an additional Queue as form of synchronization. If some other form of feedback of the state of async_manager() is found, that would be also feasible.
Posting my patch for comments.

Change 375448 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] Make _flush aware of _putthread ongoing tasks

https://gerrit.wikimedia.org/r/375448

I've updated my patch : I purpose to say Approximately %s pages changed.

Don't see the advantage but might be a difference in interpreting the counter meaning.
Couldn't this be solved using the default BaseBot Counter behaviour=

Change 374894 abandoned by Framawiki:
replace.py: Add a warning about the approximating count

https://gerrit.wikimedia.org/r/374894

Change 375448 had a related patch set uploaded (by Mpaa; owner: Mpaa):
[pywikibot/core@master] Make _flush aware of _putthread ongoing tasks

https://gerrit.wikimedia.org/r/375448

Change 375448 merged by jenkins-bot:
[pywikibot/core@master] Make _flush aware of _putthread ongoing tasks

https://gerrit.wikimedia.org/r/375448

Xqt claimed this task.