pdlarchiver.py issue
Closed, Resolved | Public | BUG REPORT

Description

Steps to replicate the issue (include links if applicable):

  • run "python3 /data/project/shared/pywikibot/stable/pwb.py archivebot -page:PotsdamLamb User:MiszaBot/config"
  • User:MiszaBot/config is set up correctly and uses the %d variable format

What happens?:

When I run pdlarchiver.py with the MiszaBot/config (which is set up correctly), it tells me, "Error messages with ‘%’ style is deprecated in favour for str.format() style".
On Thursday, July 21, 2022, I ran it as a test with %d, and it worked, except that it did not archive the threads correctly: it put them in the 1st archive instead of completing archive 3 and continuing into archive 4.
After the new release on Friday, %d did not work. I switched it to %s, and then it worked again.

What should have happened instead?:
It should have archived with %d and put the threads in the correct archive. I am developing this for the Simple English Wikipedia, and I cannot expect everyone to change their variables.

I have made changes to my TP to figure it out. If you need to undo the changes, please feel free to do so.
Software version (skip for WMF-hosted wikis like Wikipedia):

Other information (browser name/version, screenshots, etc.):

Event Timeline

I ran tests on simplewiki:

D:\pwb\GIT\core>pwb archivebot -page:PotsdamLamb User:MiszaBot/config -simulate -lang:simple
Processing [[simple:User talk:PotsdamLamb]]
3 thread(s) found on [[simple:User talk:PotsdamLamb]]
Looking for: {{User:MiszaBot/config}} in [[simple:User talk:PotsdamLamb]]
Processing 3 threads
12 thread(s) found on [[simple:User talk:PotsdamLamb/Archives/2022/July]]
Archiving 1 thread(s).
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives/2022/July]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb]] saved

D:\pwb\GIT\core>

This works as expected, but changing the archive string to use the counter, starting with 4:

{{User:MiszaBot/config
|archive = User talk:PotsdamLamb/Archives/%(counter)s
|algo = old(1d)
|counter = 4
|maxarchivesize = 150K
|archiveheader = {{Automatic archive navigator}}
|minthreadstoarchive = 1
|minthreadsleft = 0
}}

fails obviously:

D:\pwb\GIT\core>pwb archivebot -page:PotsdamLamb User:MiszaBot/config -simulate -lang:simple
Processing [[simple:User talk:PotsdamLamb]]
3 thread(s) found on [[simple:User talk:PotsdamLamb]]
Looking for: {{User:MiszaBot/config}} in [[simple:User talk:PotsdamLamb]]
Processing 3 threads
Archiving 1 thread(s).
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives/1]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb]] saved

D:\pwb\GIT\core>
Xqt changed the task status from Open to In Progress.Jul 24 2022, 1:54 PM
Xqt claimed this task.
Xqt triaged this task as Medium priority.

OK, I think I know what happened. You probably changed the archive string as shown here. If the counter is higher than 1 and an archive is not found, the bot decreases the counter until it finds an archive page. Here is the code for this:

while counter > 1 and not archive.exists():
    # This may happen when either:
    # 1. a previous version of the bot run and reset
    #    the counter without archiving anything
    #    (number #3 above)
    # 2. era changed between runs.
    # Decrease the counter.
    # TODO: This can be VERY slow, use preloading
    # or binary search.
    counter -= 1
    params = self.get_params(thread.timestamp, counter)
    archive = self.get_archive_page(
        pattern % params, params)

I found these archives:

image.png (289×486 px, 27 KB)

For example, if you archive with archive = User talk:PotsdamLamb/Archive%(counter)d and the counter is 4, the bot will count down until it finds the User talk:PotsdamLamb/Archive1 page.
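The countdown can be reproduced outside the bot. This is only a minimal sketch, not Pywikibot's code: `resolve_archive` and `existing_pages` are hypothetical names standing in for the loop quoted above and for the pages that exist on the wiki.

```python
def resolve_archive(pattern, counter, existing_pages):
    """Mimic the quoted loop: step the counter down until a page exists.

    `existing_pages` is a hypothetical stand-in for archive.exists().
    """
    title = pattern % {'counter': counter}
    while counter > 1 and title not in existing_pages:
        counter -= 1
        title = pattern % {'counter': counter}
    return counter, title


# Assumed wiki state: only archive 1 exists under this pattern.
existing_pages = {'User talk:PotsdamLamb/Archive1'}
pattern = 'User talk:PotsdamLamb/Archive%(counter)d'

# Starting at 4, the loop walks back to the first page that exists.
print(resolve_archive(pattern, 4, existing_pages))
```

If the pattern is changed so that none of the old archive titles match it, the loop falls all the way back to counter 1, which would explain threads landing in the first archive.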

By the way, you can always use %(<variable>)s instead of %(<variable>)d. The d variant is for numbers only and will fail for strings, whereas s works for both.
Regarding "Error messages with ‘%’ style is deprecated": it looks like your copy of archivebot.py (and perhaps also the framework) is a bit outdated, by about one year I guess.
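The difference between the s and d conversions can be checked in plain Python. This sketch assumes the counter may arrive as a string (for example, when read from wikitext); the `render` helper is hypothetical and only wraps the % operator.

```python
def render(pattern, params):
    """Apply %-style formatting, as the archive pattern is applied."""
    return pattern % params


as_number = {'counter': 4}
as_string = {'counter': '4'}  # assumption: value parsed from the template

# The s conversion accepts both numbers and strings.
print(render('Archive %(counter)s', as_number))
print(render('Archive %(counter)s', as_string))

# The d conversion accepts numbers only.
print(render('Archive %(counter)d', as_number))
try:
    render('Archive %(counter)d', as_string)
except TypeError as error:
    print('TypeError:', error)
```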

Xqt raised the priority of this task from Medium to Needs Triage.

Xqt, I am using the shared repository and am still in the process of setting it up. As I stated, it ran fine on Thursday night in a test, but on Friday morning I had not yet made the change on my page, as I was trying to figure out what was going on with the error about %d. So from what I am reading above, I have no choice but to use %s. I saw the new release on Friday, and going through the release notes is where I first realized that d was being removed in the next release.

@PotsdamLamb: I found the problem with %d and created this task: T313692. Here I just focused on the behaviour of the wrong archive location. The failure was introduced with the last stable release. Sorry for that. I will try to fix it soon.

No worries! Please let me know when you have it fixed, and I will run mine and see what happens. Should I change my variables back to "d"?

@PotsdamLamb: I made a patch in https://gerrit.wikimedia.org/r/816323. Are you able to test it? It should now work with s and d, but it would be good to test it with d.

Hey, I ran into another issue after your patch. It should be saving them as User talk:PotsdamLamb/Archives 1; instead, it is saving them as User talk:PotsdamLamb/Archives/1 or User talk:PotsdamLamb/Archives 1 (when I added a space between /Archives and /%(counter)s). It should be a space per MiszaBot/config. You can see the ones it created at https://simple.wikipedia.org/wiki/Special:PrefixIndex?prefix=PotsdamLamb&namespace=3

To save the archive under User talk:PotsdamLamb/Archives 1, the archive entry of the template has to be

|archive = User talk:PotsdamLamb/Archives %(counter)s
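As a quick illustration, the separator in the resulting page title is taken literally from the pattern string. The sketch below uses plain %-formatting with an assumed counter value of 1; it does not call Pywikibot.

```python
params = {'counter': 1}  # assumed counter value for illustration

# A slash in the pattern produces a subpage-style title.
with_slash = 'User talk:PotsdamLamb/Archives/%(counter)s' % params

# A space in the pattern produces the "Archives 1" style title.
with_space = 'User talk:PotsdamLamb/Archives %(counter)s' % params

print(with_slash)
print(with_space)
```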

You can test your bot's work with the -simulate option; no real changes will be made then.

So I did a test and now I got "TypeError: %d format: a number is required, not str"

I made the change and put the d back in after the patch; now it is failing again.

I have it set as

archive = User talk:PotsdamLamb/Archive %(counter)d

It works for me:

D:\pwb\GIT\core>pwb archivebot -page:PotsdamLamb User:MiszaBot/config -simulate -lang:simple
Processing [[simple:User talk:PotsdamLamb]]
315 thread(s) found on [[simple:User talk:PotsdamLamb]]
Looking for: {{User:MiszaBot/config}} in [[simple:User talk:PotsdamLamb]]
Processing 315 threads
Archiving 304 thread(s).
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives 1]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives 2]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives 3]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb/Archives 4]] saved
SIMULATION: edit action blocked.
Page [[User talk:PotsdamLamb]] saved

D:\pwb\GIT\core>

I guess your Pywikibot is not the current one. It must be 7.6.0.dev0 or 7.5.1. It seems your version is still 7.5.0. Run pwb version to verify your release. Is your bot running on PAWS or Toolforge?

I am on toolforge. Do I need to do a git pull again?

The preinstalled code needs a few hours to be updated on Toolforge, but it will be done within one day, I guess. If you have installed your own Pywikibot repository, you have to git pull it again if it is 7.5.0 and you are using the stable release. For master, the release number is not changed. To verify the current release, run pwb version. You should get something like Pywikibot: [ssh] pywikibot-core (517de0a, g16831, 2022/07/24, 19:36:50, master). You should see 517de0a, which refers to rPWBC517de0a.
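To compare the revision hash without reading the line by eye, the version output can be parsed. This is a rough sketch: the field layout is inferred only from the two version lines quoted in this thread, not from Pywikibot documentation, and `parse_version_line` and the group names are hypothetical.

```python
import re

# Assumed layout: (hash, gNNN, YYYY/MM/DD, HH:MM:SS, tag)
VERSION_RE = re.compile(
    r'\((?P<revision>[0-9a-f]+), (?P<count>g\d+), '
    r'(?P<date>\d{4}/\d{2}/\d{2}), (?P<time>\d{2}:\d{2}:\d{2}), '
    r'(?P<tag>\w+)\)'
)


def parse_version_line(line):
    """Return the parenthesized fields as a dict, or None if no match."""
    match = VERSION_RE.search(line)
    return match.groupdict() if match else None


line = ('Pywikibot: [ssh] pywikibot-core '
        '(517de0a, g16831, 2022/07/24, 19:36:50, master)')
info = parse_version_line(line)
print(info['revision'], info['tag'])
```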

So I did both, including the pull, and it still shows I have the outdated version.

Pywikibot: [https] r-pywikibot-core.git (8d74c1f, g1, 2022/07/22, 08:17:47, OUTDATED)
Release version: 7.5.0

It seems you are somehow using the preinstalled stable release. This should change to 7.5.1 in a few hours. I am AFK and cannot check it now.

Yes, if I try to use master, I get fatal: remote error: pywikibot/master unavailable

Change 816703 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@master] [BUGFIX] Add localized "archive" variables to archivebot.py

https://gerrit.wikimedia.org/r/816703

master is rPWB517de0a and stable is 7.5.1 on toolforge.

I changed my mind and reverted the patch I made yesterday. Instead, I also reverted a very old patch which caused this bug and went undetected for 7 years. In addition, I introduced several new variable fields for the "archive" template parameter.

Change 816703 merged by jenkins-bot:

[pywikibot/core@master] [BUGFIX] Add localized "archive" variables to archivebot.py

https://gerrit.wikimedia.org/r/816703

Hey, so I did a test on one of our admin's talk page subpages, and it changes the actual variables in the user's config on the page:

Before I ran it: "|archive = User talk:Auntof6/Newsletters/Archives/%(year)d %(month)d"
After I ran it: "|archive = User talk:Auntof6/Newsletters/Archives/%(year)s %(month)s"

Is this the expected behavior from the patch this morning? I was under the impression it was supposed to accept the "s" as the "d".

Thanks,
Chuck

Change 817363 had a related patch set uploaded (by Xqt; author: Xqt):

[pywikibot/core@stable] [backport] backport archivebot.py from master

https://gerrit.wikimedia.org/r/817363

Change 817363 merged by Xqt:

[pywikibot/core@stable] [backport] backport archivebot.py from master

https://gerrit.wikimedia.org/r/817363