Page MenuHomePhabricator

log_page not populated on overwrite log item during race condition
Open, MediumPublic

Description

From https://lists.wikimedia.org/pipermail/mediawiki-api/2014-December/003402.html

I found similar cases that UPLOAD update is lack of page_id. recentchange
page
<http://commons.wikimedia.org/w/api.php?action=query&format=xml&rccontinue=20141201163618|142850059&continue=-||&list=recentchanges&rclimit=500&rcnamespace=0%7C2%7C4%7C6%7C10%7C14%7C100%7C828&rcprop=comment%7Cflags%7Cids%7Cloginfo%7Csizes%7Ctimestamp%7Ctitle%7Cuser>

<rc type="log" ns="6" title="File:Siegelmarke Der Persönliche Adjutant
Seiner Königlichen Hoheit des Prinzen Georg von Preussen W0204563.jpg"
pageid="0" revid="0" old_revid="0"rcid="142847103" user="Nicolas Rück (WMDE)
" oldlen="0" newlen="0" timestamp="2014-12-01T16:29:40Z"
comment="[[Commons:GLAMwiki
Toolset Project|GWToolset]]: Creating mediafile for Nicolas Rück (WMDE)."
logid="103976391" logtype="upload" logaction="*overwrite*" img_sha1="
swlwd4u41zhuaa0sq0x2pzuhzkr7xna" img_timestamp="2014-12-01T16:29:40Z"/>

<rc type="log" ns="6" title="File:Siegelmarke Glogau - Saganer Fürstenthums
Landschaft W0204559.jpg" pageid="0" revid="0" old_revid="0" rcid="142847096"
 user="Nicolas Rück (WMDE)" oldlen="0" newlen="0" timestamp="
2014-12-01T16:29:40Z" comment="[[Commons:GLAMwiki Toolset
Project|GWToolset]]: Creating mediafile for Nicolas Rück (WMDE)."logid="
103976394" logtype="upload" logaction="*overwrite*" img_sha1="
3o7mqqobcibbnnwj8x166gmrxj5rbtb" img_timestamp="2014-12-01T16:29:40Z"/>

<rc type="log" ns="6" title="File:Siegelmarke Königlich Preussisches Haus -
Archiv W0204564.jpg" pageid="0" revid="0" old_revid="0" rcid="142847109"
user="Nicolas Rück (WMDE)"oldlen="0" newlen="0" timestamp="
2014-12-01T16:29:40Z" comment="[[Commons:GLAMwiki Toolset
Project|GWToolset]]: Creating mediafile for Nicolas Rück (WMDE)." logid="
103976393"logtype="upload" logaction="*overwrite*" img_sha1="
sx4uuswlcbh3itwrr4dy67u0s9ca18h" img_timestamp="2014-12-01T16:29:40Z"/>

It seems the lack of page_id problem only shows on *logaction="overwrite"*
situation. Could you please take a look and see if this is a systematic bug?

Thanks,
Yijun

Investigated, appears to be a race condition (Same file given to gwtoolset twice, uploaded within 1 second of each other). What I think is happening is:

in LocalFile::recordUpload2 during the second (overwrite) upload:

`
$exists = $descTitle->exists();
`

returns false due to slave lag, or perhaps the other image save hasn't even created the page yet

Eventually we get to the create the description page code:

                        $content = ContentHandler::makeContent( $pageText, $descTitle );
                        $status = $wikiPage->doEditContent(
                                $content,
                                $comment,
                                EDIT_NEW | EDIT_SUPPRESS_RC,
                                false,
                                $user
                        );

                        $dbw->begin( __METHOD__ ); // XXX; doEdit() uses a transaction
                        // Now that the page exists, make an RC entry.
                        $logEntry->publish( $logId );
                        if ( isset( $status->value['revision'] ) ) {
                                $dbw->update( 'logging',
                                        array( 'log_page' => $status->value['revision']->getPage() ),
                                        array( 'log_id' => $logId ),
                                        __METHOD__
                                );
                        }
                        $dbw->commit( __METHOD__ ); // commit before anything bad can happen
`

At this point the page exists in the db, so $wikiPage->doEditContent fails (due to the EDIT_NEW). Note that in the examples of this bug, there is only one entry in the page history of the file. As a result, $status->value['revision'] is not set, so we do not update the logging table entry's log_page. The recent changes entry also has wrong rc_cur_id, since that comes directly from $descTitle->getArticleId(), which if we're at this code branch, would be cached and 0, and since the doEditContent failed, $this->mTitle->resetArticleID( $newid ); was not executed.

Suggested solution:
In the case that doEditContent fails with "edit-already-exists", we should probably have some code that does something along the lines of $descEdit->getArticleID( Title::GAID_FOR_UPDATE ); before the rc publish line, to force reloading $descTitle from master, and then use that value in the logging table UPDATE code.

Event Timeline

Bawolff raised the priority of this task from to Needs Triage.
Bawolff updated the task description. (Show Details)
Bawolff changed Security from none to None.
Bawolff added subscribers: Bawolff, dan-nl, Gilles.
Gilles triaged this task as Medium priority.Dec 2 2014, 9:00 AM

Solution above makes sense to me.