
Frequent exception while trying to extract anchors from task
Closed, Resolved · Public · Bug Report

Description

Stuff like this keeps showing up in tools-bastion-03:/data/project/wikibugs/wikibugs.log:

2018-07-06 22:30:13,786 - wikibugs.wb2-phab - ERROR - Could not retrieve anchor for OrderedDict([('PHID-XACT-TASK-t42yllvmbvbxyhm', 'PHID-XACT-TASK-t42yllvmbvbxyhm')])
Traceback (most recent call last):
  File "/data/project/wikibugs/wikibugs2/wikibugs.py", line 227, in process_event
    if self.raise_errors:
  File "/data/project/wikibugs/wikibugs2/wikibugs.py", line 189, in get_lowest_anchor_for_task_and_XACTs
    if anchors:
  File "/data/project/wikibugs/wikibugs2/wikibugs.py", line 176, in get_anchors_for_task
IndexError: list index out of range

Event Timeline

bd808 triaged this task as Medium priority. (Feb 13 2024, 9:52 PM)
bd808 changed the subtype of this task from "Task" to "Bug Report".
bd808 added a subscriber: Volans.

The get_anchors_for_task method that is raising the error is:

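import json

def get_anchors_for_task(self, task_page):
    """
    :param task_page: HTML source of a task page
    :type task_page: basestring
    :returns dict(phid => anchor)
    """
    # Take the text between the mergeData() call and the following
    # JX.onload. If the opening marker is missing from the page, split()
    # returns a one-element list and the [1] index raises IndexError.
    data_dict_str = task_page.split(
        '<script type="text/javascript">JX.Stratcom.mergeData(0,'
    )[1].split(
        ");\nJX.onload"
    )[0]

    # The blob is a JSON list; keep only entries that carry both keys.
    data_dict = json.loads(data_dict_str)
    return {x[u'phid']: x[u'anchor'] for x in data_dict if u'phid' in x and u'anchor' in x}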

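The task_page input to this function is the HTML source of a page such as https://phabricator.wikimedia.org/T199007 (this task), as retrieved by the requests library. The code is doing something a bit cryptic, but it is essentially trying to extract a JSON blob from the HTML source so that it can parse it and return some cherry-picked data from the result. When the page does not contain the '<script type="text/javascript">JX.Stratcom.mergeData(0,' marker at all, split() returns a list holding only the original string, so indexing it with [1] raises the IndexError seen in the traceback.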
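A minimal sketch of that failure mode, using two made-up sample pages rather than real Phabricator output. extract_merge_data is a hypothetical helper (not part of wikibugs) that reuses the same marker strings as get_anchors_for_task but checks the split result before indexing it:

```python
import json

# Marker strings copied from get_anchors_for_task above.
START = '<script type="text/javascript">JX.Stratcom.mergeData(0,'
END = ");\nJX.onload"


def extract_merge_data(task_page):
    """Return the parsed JSON blob, or None if the old marker is absent."""
    parts = task_page.split(START)
    if len(parts) < 2:
        # str.split returns [task_page] when START is not found, so
        # indexing [1] (as the original code does) raises IndexError.
        return None
    return json.loads(parts[1].split(END)[0])


# Hypothetical sample pages: one in the old <script> layout, one without it.
old_style = START + '[{"phid": "PHID-XACT-TASK-a", "anchor": "1"}]' + END
new_style = '<data data-javelin-init-kind="merge" data-javelin-init-data="..."></data>'

print(extract_merge_data(old_style))  # [{'phid': 'PHID-XACT-TASK-a', 'anchor': '1'}]
print(extract_merge_data(new_style))  # None
```

Returning None instead of raising lets a caller fall back to linking the plain task, which is roughly what the bot ends up doing today after logging the error.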

Grepping through the source of this task's page and a handful of others I picked semi-randomly leads me to believe that Phabricator's (now Phorge's) upstream source was changed at some point so that the desired data is no longer embedded in a JavaScript source block. Instead, I believe that current versions of Phorge embed this data in the data-javelin-init-data attribute of a <data data-javelin-init-kind="merge"/> tag in the page. I also think this may have been broken for 6 years now without too much notice except when folks are staring at the bot's logs for some other reason.
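A rough sketch of reading the data from the <data> tag instead of the <script> block, using only the standard library's html.parser. This is not the code from the patch below. It assumes (unverified) that the JSON in data-javelin-init-data has the same list-of-objects shape as the old mergeData() argument; the class JavelinMergeDataParser, the function get_anchors_from_init_data, and the sample page are all hypothetical:

```python
import json
from html.parser import HTMLParser


class JavelinMergeDataParser(HTMLParser):
    """Collect the JSON payloads of <data data-javelin-init-kind="merge"> tags."""

    def __init__(self):
        super().__init__()
        self.blobs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; HTMLParser has already
        # unescaped entities such as &quot; in the values.
        attributes = dict(attrs)
        if tag == "data" and attributes.get("data-javelin-init-kind") == "merge":
            raw = attributes.get("data-javelin-init-data")
            if raw:
                self.blobs.append(json.loads(raw))


def get_anchors_from_init_data(task_page):
    """Return {phid: anchor}; an empty dict if no merge data is found."""
    parser = JavelinMergeDataParser()
    parser.feed(task_page)
    parser.close()
    anchors = {}
    for blob in parser.blobs:
        # ASSUMPTION: each blob is a list of objects shaped like the old
        # JX.Stratcom.mergeData() argument; anything else is skipped.
        if not isinstance(blob, list):
            continue
        for item in blob:
            if isinstance(item, dict) and "phid" in item and "anchor" in item:
                anchors[item["phid"]] = item["anchor"]
    return anchors


# Hypothetical page fragment with an HTML-escaped JSON attribute value.
page = ('<data data-javelin-init-kind="merge" data-javelin-init-data="'
        '[{&quot;phid&quot;:&quot;PHID-XACT-TASK-a&quot;,'
        '&quot;anchor&quot;:&quot;1&quot;}]"></data>')
print(get_anchors_from_init_data(page))  # {'PHID-XACT-TASK-a': '1'}
```

Using an HTML parser rather than string splitting means attribute escaping and attribute order are handled for us, and a page without the tag yields an empty mapping instead of an exception.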

Change 1003127 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[labs/tools/wikibugs2@master] wikibugs: Extract XACT to page anchor mappings from data-javelin-init-data

https://gerrit.wikimedia.org/r/1003127

bd808 changed the task status from Open to In Progress. (Feb 14 2024, 12:29 AM)
bd808 claimed this task.
bd808 moved this task from To Do to Needs Review/Feedback on the User-bd808 board.

T1177: Get anchors from API instead of screen scraping is a better long-term fix than the patch I have proposed, since the code being patched is exactly the screen scraping that task proposes to replace.

> I also think this may have been broken for 6 years now without too much notice except when folks are staring at the bot's logs for some other reason.

The corollary of that is that maybe linking to a specific comment/action isn't that relevant to begin with, so just removing the functionality might be a sensible option as well :-)

Change 1003127 merged by jenkins-bot:

[labs/tools/wikibugs2@master] wikibugs: Extract XACT to page anchor mappings from data-javelin-init-data

https://gerrit.wikimedia.org/r/1003127

Mentioned in SAL (#wikimedia-cloud) [2024-02-17T00:02:45Z] <wmbot~bd808@tools-sgebastion-11> Restarted wikibugs-phab job to pick up fix for T199007

I'm not seeing any new files created in $HOME/errors/XACT-anchor, so I think this may have worked. If it did then this comment should trigger an IRC message that includes a link to this comment rather than just the general task.

[00:23]  < wikibugs> Wikibugs, Patch-For-Review, User-bd808: Frequent exception while trying to extract anchors from task - https://phabricator.wikimedia.org/T199007#9552605 (bd808) I'm not seeing any new files created in $HOME/errors/XACT-anchor, so I think this may have worked. If it did then this comment should trigge...
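The IRC line above shows the link format the fix is meant to restore: the task URL plus a #anchor for the specific comment, with the plain task URL as the fallback when no anchor is found. A small hypothetical helper illustrating that fallback is sketched below. It is only loosely modeled on the name get_lowest_anchor_for_task_and_XACTs from the traceback, not on its actual implementation, and it assumes anchors are numeric strings so that "lowest" means numerically smallest. The XACT PHIDs in the usage lines are made up:

```python
from typing import Mapping, Sequence

PHAB_BASE = "https://phabricator.wikimedia.org"


def task_link(task_id: int, xact_phids: Sequence[str],
              anchors: Mapping[str, str]) -> str:
    """Link to the lowest matching timeline anchor, else to the task itself."""
    url = f"{PHAB_BASE}/T{task_id}"
    found = [anchors[phid] for phid in xact_phids if phid in anchors]
    if not found:
        # No anchor could be extracted: degrade to the general task link.
        return url
    # ASSUMPTION: anchors are numeric strings, so compare them as integers.
    return f"{url}#{min(found, key=int)}"


print(task_link(199007, ["PHID-XACT-TASK-x"], {"PHID-XACT-TASK-x": "9552605"}))
# https://phabricator.wikimedia.org/T199007#9552605
print(task_link(199007, ["PHID-XACT-TASK-y"], {}))
# https://phabricator.wikimedia.org/T199007
```

Keeping the fallback path means a scraping failure only costs the precise comment link rather than the whole notification.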

Change 1004314 had a related patch set uploaded (by BryanDavis; author: Bryan Davis):

[labs/tools/wikibugs2@master] test_wikibugs2: update test_add_project assertion

https://gerrit.wikimedia.org/r/1004314

Change 1004314 merged by jenkins-bot:

[labs/tools/wikibugs2@master] test_wikibugs2: update test_add_project assertion

https://gerrit.wikimedia.org/r/1004314