Deletion log excerpt (mw-warning-with-logexcerpt) not shown when only curid given and page has been deleted
Open, NormalPublic

Description

With this <https://de.wikipedia.org/w/index.php?curid=1774043 Permalink> (the original link text is missing) you get the "Ungültiger Titel" (Bad title) page instead of a page containing the missing "mw-warning-with-logexcerpt mw-content-ltr" log info section.

This behavior is a bad idea for permalinks.

Hint: curid 1774043 was :de:Diskussion:Holocaust/Archiv2, https://de.wikipedia.org/wiki/Diskussion:Holocaust/Archiv2


Version: 1.24rc
Severity: minor
URL: https://de.wikipedia.org/w/index.php?curid=1774043

Details

Reference
bz71578

Event Timeline

bzimport raised the priority of this task to Normal.
bzimport set Reference to bz71578.
bzimport added a subscriber: Unknown Object (MLST).
Boshomi created this task.Oct 2 2014, 10:17 PM

Thanks for taking the time to report this!

Is that a general problem "on deleted pages"? Are there more examples?

I tried to rephrase the summary; let me know if I'm misstating something.

Tgr added a subscriber: Tgr.Dec 28 2015, 4:41 PM

The nice behavior would be to look up the title in MediaWiki::parseTitle() and from there MediaWiki would handle it the same way as if a title was specified. Or maybe even to make it redirect to the title.

Unfortunately, looking up the title from a deleted page ID is messy. Page table records are physically deleted; the page ID is stored in the revision table, but it is not indexed and it is overwritten with the new ID on undeletion (cf. T28123). The only reliable method seems to be to find the last delete log event with the given page ID.

Not sure how efficient that is, though. The query would be something like SELECT log_namespace, log_title FROM logging WHERE log_page = $page_id AND log_type = 'delete' AND log_action = 'delete' ORDER BY log_timestamp DESC LIMIT 1 (or maybe order by log_page), and the relevant indexes are (log_page, log_timestamp) and (log_type, log_action, log_timestamp), so there is no single covering index, and a single page can have lots of log records. (Tens of thousands in extreme cases, I'd guess? The number of patrol log records would be proportional to the number of revisions.)
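To make the lookup above concrete, here is a minimal sketch in Python against an in-memory SQLite table. It is not MediaWiki code: the table is a cut-down stand-in for logging that keeps only the columns named in the query plus log_namespace and log_title, the index names are made up, and the demo rows (timestamps, the older title) are invented for illustration. Only the page ID and the final title come from this report.

```python
import sqlite3


def last_deleted_title(conn, page_id):
    """Return (namespace, title) from the newest delete log entry for page_id, or None."""
    return conn.execute(
        """
        SELECT log_namespace, log_title
        FROM logging
        WHERE log_page = ?
          AND log_type = 'delete'
          AND log_action = 'delete'
        ORDER BY log_timestamp DESC
        LIMIT 1
        """,
        (page_id,),
    ).fetchone()


def demo_connection():
    """Build a toy logging table (assumed, simplified schema) with a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE logging (
            log_id        INTEGER PRIMARY KEY,
            log_type      TEXT,
            log_action    TEXT,
            log_timestamp TEXT,     -- 14-digit YYYYMMDDHHMMSS string
            log_namespace INTEGER,
            log_title     TEXT,
            log_page      INTEGER
        )
        """
    )
    # The two index shapes mentioned in the comment; the names are hypothetical.
    conn.execute("CREATE INDEX demo_page_time ON logging (log_page, log_timestamp)")
    conn.execute(
        "CREATE INDEX demo_type_action ON logging (log_type, log_action, log_timestamp)"
    )
    rows = [
        # Invented rows: same page ID, mixed log types, two deletions.
        ("patrol", "patrol", "20100101000000", 1, "Holocaust/Archiv2", 1774043),
        ("delete", "delete", "20110101000000", 1, "Holocaust/Old_title", 1774043),
        ("delete", "restore", "20110102000000", 1, "Holocaust/Old_title", 1774043),
        ("delete", "delete", "20140901000000", 1, "Holocaust/Archiv2", 1774043),
    ]
    conn.executemany(
        "INSERT INTO logging (log_type, log_action, log_timestamp,"
        " log_namespace, log_title, log_page) VALUES (?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(last_deleted_title(conn, 1774043))
    print(last_deleted_title(conn, 42))
```

With either index alone, the database still has to filter the remaining conditions row by row, which is the cost concern raised above for pages with very many log entries.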

Tgr added a subscriber: jcrespo.Jan 3 2016, 8:31 PM

@jcrespo what do you think of the previous comment? Is it feasible to use that query, or to add a new index that covers it (although I don't think the impact of this bug would justify adding a new index to one of our largest tables)? This would only get invoked for URLs with ?curid=XXX and no page name, so the query would not be invoked often.

Please note patch https://gerrit.wikimedia.org/r/#/c/239319/

Please keep performance / security in the loop for changes related to this,
they will have more information regarding ongoing concerns. Once they
provide feedback, I will be happy to help with schema changes if needed,
although by the look of it, T64615 may be a soft-blocker (although probably
easier to fix for a specific query).

Prtksxna removed a subscriber: Prtksxna.Jan 7 2016, 12:21 AM
Tgr added a comment.Jan 12 2016, 12:13 PM

Thanks for pointing that out. 404 pages can easily be reached by bots, while curid URLs cannot, so I don't think DoS by well-meaning but badly behaved bots is a concern here... I'll make sure to add you and the perf team for code review though.