archivebot.py didn't archive a thread whose signature is wrapped in an HTML element
Closed, Resolved | Public | BUG REPORT

Description

Steps to reproduce:

Expected behavior
The bot should parse a timestamp that sits inside an HTML element.

Current behavior
The thread may be treated as unsigned, so it is never archived.

Event Timeline

I am sorry, I don't follow. Can you point me to the diff links and show me what should happen instead?

Diff link. Since this talk page has the marker template in place, that message (left at 2022-01-31T15:48:04 (UTC)) should already have been archived.

{{User:Wcam/ArchiveConfig
|archive = User talk:Sunny00217/存檔/%(year)d-%(quarter)d
|algo = old(7d)
|counter = 1
|archiveheader = {{User:Sunny00217/Template:User_talk_save/%(year)d}}
|minthreadsleft = 0
|minthreadstoarchive = 1
}}
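
To make the expectation concrete, here is a minimal sketch of what algo = old(7d) and the archive pattern imply for that message. The helper names (parse_old_algo, should_archive, archive_title) are hypothetical and are not archivebot.py functions; the sketch also assumes the archive name is derived from the thread's own timestamp.

```python
import re
from datetime import datetime, timedelta, timezone


def parse_old_algo(algo):
    """Turn a string such as 'old(7d)' into a timedelta (days only, by assumption)."""
    match = re.fullmatch(r'old\((\d+)d\)', algo.strip())
    if match is None:
        raise ValueError('unsupported algo: %r' % algo)
    return timedelta(days=int(match.group(1)))


def should_archive(last_timestamp, now, algo):
    """A thread qualifies once its newest timestamp is older than the limit."""
    return now - last_timestamp > parse_old_algo(algo)


def archive_title(pattern, timestamp):
    """Expand %(year)d and %(quarter)d in the archive pattern."""
    quarter = (timestamp.month - 1) // 3 + 1
    return pattern % {'year': timestamp.year, 'quarter': quarter}


# Timestamp quoted in the report.
last = datetime(2022, 1, 31, 15, 48, 4, tzinfo=timezone.utc)
```

Under these assumptions the message qualifies at any check time more than seven days after 2022-01-31, and it would land in the 2022-1 archive page, but only if the bot can read its timestamp in the first place.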

Ah, that template might be the problem. The timestamp was probably not recognized. See the example at T170034.

I'm not sure this is the problem: if the wikilove message could be archived successfully (both messages contain a signature inside a table), an ordinary template should not be the cause.

| style="vertical-align:middle; padding-left:20px;" | <div style="font-size:x-large; padding-bottom:5px;">'''新年快樂!'''</div>感謝您過去一年來對中文維基百科的貢獻!祝閣下[[新年]]快樂,萬事如意!—— '''[[使用者:Ericliu1912|Eric Liu]]'''<sub> 創造は生命('''[[使用者討論:Ericliu1912|留言]].[[使用者:Ericliu1912#訪客芳名錄|留名]].[[維基百科:維基學生會|學生會]]''')</sub> 2022年1月31日 (一) 18:48 (UTC)<div style="font-size:x-small; text-align:right; padding-top:5px;">{{color|grey|(模板使用方法參見[[使用者:Ericliu1912/維基友愛模板|此處]])}
(Pdb) dateDict
{'time': {'value': '18:48', 'start': 309, 'end': 314}, 'hour': {'value': '18', 'start': 309, 'end': 311}, 'minute': {'value': '48', 'start': 312, 'end': 314}, 'tzinfo': {'value': 'UTC', 'start': 316, 'end': 319}, 'year': {'value': '2022', 'start': 294, 'end': 298}, 'month': {'value': '1月', 'start': 299, 'end': 301}, 'day': {'value': '5', 'start': 381, 'end': 382}}

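The timestamp is not recognized because the bot picks 5 as the day (start position 381, apparently the 5 in padding-top:5px of the trailing <div>). That position is too far from the other date parts, so the candidate is filtered out by _valid_date_dict_positions().
By design, for each pattern the bot takes the rightmost match in the line.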
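
The effect can be reproduced with a simplified sketch. This is not the TimeStripper code: the patterns, the max_gap value, and the helper names (last_match, rightmost_fields, positions_are_close) are assumptions chosen only to illustrate "take the rightmost match, then reject candidates whose parts are too far apart".

```python
import re

# Simplified stand-ins for the real locale-specific patterns (assumptions).
PATTERNS = [
    ('time', r'\d\d:\d\d'),
    ('year', r'\d{4}'),
    ('day', r'\d{1,2}'),
]


def last_match(pattern, text):
    """Return the rightmost match of pattern in text, or None."""
    found = None
    for match in re.finditer(pattern, text):
        found = match
    return found


def rightmost_fields(text):
    """Collect the rightmost match for each pattern, masking what was used."""
    fields = {}
    for name, pattern in PATTERNS:
        match = last_match(pattern, text)
        if match is None:
            return None
        start, end = match.span()
        fields[name] = {'value': match.group(), 'start': start, 'end': end}
        # Mask the consumed characters so later patterns cannot reuse them.
        text = text[:start] + '@' * (end - start) + text[end:]
    return fields


def positions_are_close(fields, max_gap=20):
    """Reject candidates whose parts are separated by more than max_gap chars."""
    spans = sorted((f['start'], f['end']) for f in fields.values())
    return all(nxt[0] - cur[1] <= max_gap for cur, nxt in zip(spans, spans[1:]))


signature = '2022年1月31日 (一) 18:48 (UTC)'
with_div = signature + '<div style="padding-top:5px;">'
```

With the bare signature, the rightmost day candidate is 31 and all parts sit close together. Once the <div> follows, the rightmost day candidate becomes the 5 from the style attribute, and the position check rejects the whole timestamp.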

Maybe we could also call removeHTMLParts() in addition to removeDisabledParts().
It seems unlikely that dates appear inside HTML tags.
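
As a rough illustration of the idea, the sketch below drops HTML tags (and with them any digits inside attributes) before the line is searched for a timestamp. It uses only the standard library; strip_html_tags is a hypothetical stand-in and does not reproduce what removeHTMLParts() actually does or which tags it keeps.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect text content and discard every tag with its attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def strip_html_tags(text):
    """Return text with HTML tags removed; wikitext markup is left untouched."""
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return ''.join(parser.chunks)


line = ("'''新年快樂!''' 2022年1月31日 (一) 18:48 (UTC)"
        '<div style="font-size:x-small; padding-top:5px;">note</div>')
cleaned = strip_html_tags(line)
```

After stripping, the stray 5 from padding-top:5px is gone, so the rightmost day candidate can only come from the visible text.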

Change 821799 had a related patch set uploaded (by Mpaa; author: Mpaa):

[pywikibot/core@master] [bugfix] timestripper should timestamps in HTML elements

https://gerrit.wikimedia.org/r/821799

Change 821799 merged by jenkins-bot:

[pywikibot/core@master] [bugfix] timestripper should skip HTML elements

https://gerrit.wikimedia.org/r/821799

Mpaa claimed this task.