
Create a repository for last-visited revisions until they are visited. proposing "LVR" as an acronym for "last-visited-revision"
Closed, Invalid
Public

Description

References: http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete
cycle does not preserve oldid)
http://bugzilla.wikipedia.org/show_bug.cgi?id=454 (Enotif 1.33)

Introducing an abbreviation: last-visited revision (LVR; lvr)

I would like to ask you something with respect to a suggestion to improve the
recent-changes and page-history behaviour.

Status:

As far as I understand the software and Brion, old_id is *not* a permanent and
fixed identifier for a certain revision of a page. It is only valid for a while
and under certain circumstances. However, the Email-Notification patch (Enotif)
and some users request a "(diff-to-my-last-visited-revision)" (lvr-diff) link.
This link *is* implemented in the Enotif 1.33 patch coming this weekend. It is
based on old_id and works unless that lvr revision is deleted from the
database, e.g. by scheduled RC pruning.

Problems:

  1. old_id cannot be used as a 100% secure pointer to a certain revision
     (e.g. the LVR) of a page.
  2. Currently, any older revision of a page is deleted after a while.

Question to you and proposal:

Given that the RC History may be pruned after a while and that old_id can
change due to a delete/undelete cycle of that page, I propose to build an
LVR-REPOSITORY (last-visited-revision), which can be compressed.

If a certain pageX is watched by a UserZ, I propose to permanently save
the "last visited revision (lvr) of pageX". This is the page revision just
before UserZ got an enotif because someone else edited pageX to
revision (lvr+1). Enotif 1.33 already implements this and knows (lvr), but
currently does not save the page content. This pageX(lvr) must be saved and
must be touched neither by regular RC history pruning nor by delete/undelete
cycles. To free memory resources, it theoretically only needs to be saved
*until* the watching UserZ visits the *current* revision of the page, as this
action automatically clears the notification flag (this mechanism being open
for further improvements).

In the worst case, we need a repository of size "total number of watched pages
of all watching users". For example, if 1,000 users have 50 pages in each of
their watchlists, we need a repository for 50,000 pages, which stores the
"last-visited-revisions" for all watched pages for all users.

Please let me know what you think about my proposal. Enotif could manage the
repository, as it keeps track of users visiting their watch-listed pages. The
repository can be a separate database or realized as a flag in the old and rc
databases, which forbids the RC pruning or other routines to manipulate (e.g.
delete) that particular LVR.
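
The lifecycle proposed above (save the lvr when someone else edits a watched
page, keep it until the watcher visits the current revision) could look roughly
like the following in-memory sketch. This is not Enotif code: the class
LvrRepository and its method names are hypothetical, and a real implementation
would live in the database. The last lines reproduce the worst-case size from
the example of 1,000 users with 50 watched pages each.

```python
from typing import Dict, Optional, Tuple


class LvrRepository:
    """Hypothetical in-memory store: (user, page) -> (revision id, saved text)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], Tuple[int, str]] = {}

    def on_edit_by_other(self, user: str, page: str, lvr_id: int, lvr_text: str) -> None:
        # Keep only the first snapshot: further edits made before the user
        # returns must not overwrite the revision the user actually saw.
        self._entries.setdefault((user, page), (lvr_id, lvr_text))

    def on_visit_current(self, user: str, page: str) -> None:
        # Visiting the current revision clears the notification flag,
        # so the saved lvr is no longer needed.
        self._entries.pop((user, page), None)

    def get(self, user: str, page: str) -> Optional[Tuple[int, str]]:
        return self._entries.get((user, page))

    def __len__(self) -> int:
        return len(self._entries)


repo = LvrRepository()
repo.on_edit_by_other("UserZ", "pageX", 41, "text of revision 41")
repo.on_edit_by_other("UserZ", "pageX", 42, "text of revision 42")  # ignored
print(repo.get("UserZ", "pageX"))  # (41, 'text of revision 41')
repo.on_visit_current("UserZ", "pageX")
print(len(repo))  # 0

# Worst case from the example: 1,000 users watching 50 pages each.
for u in range(1000):
    for p in range(50):
        repo.on_edit_by_other(f"user{u}", f"page{p}", 1, "")
print(len(repo))  # 50000
```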

Invitation

If you have another idea, or if I have overlooked something, which can happen,
please let me know at mailto:mail@tgries.de?Subject=LVR .

Thanks in advance
Tom
Berlin


Version: 1.4.x
Severity: enhancement

Details

Reference
bz804

Event Timeline

bzimport raised the priority of this task to Low. (Nov 21 2014, 7:03 PM)
bzimport set Reference to bz804.

rowan.collins wrote:

I don't really see the need for this; as far as I know, old revisions are
*never* deleted automatically in MediaWiki, and it is perfectly reasonable for
manual deletions to prevent diffs being made against the deleted revisions
(indeed, it would be something of a security glitch if it didn't, since it would
allow any user to view deleted pages, which are normally considered restricted
data).

So all that is actually needed for an lvr-enabled watchlist (which is bug 536)
is revision IDs that are guaranteed to last; the only currently identified
impediments are therefore bug 181 (current revisions do not have a lasting ID in
current DB schema), and bug 603 (old_id not preserved across delete-undelete
cycle).

Unless I'm missing something, such as plans to introduce automatic pruning of
old revisions, I suggest closing this as invalid, or as a duplicate of bug 536.
If there are plans for pruning, perhaps this could be rephrased as a blocker for
that - i.e. if you are developing a pruning system, you need to mark lv
revisions with a do_not_prune flag.
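
As a rough illustration of that last point, a pruning pass that respects such a
flag might look like the sketch below. It is purely hypothetical: MediaWiki has
no such pruning routine, and the Revision class, the age_days field and the
prune() function are invented here; only the name do_not_prune comes from the
comment above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    old_id: int
    age_days: int
    do_not_prune: bool = False  # would be set for last-visited revisions


def prune(revisions: List[Revision], max_age_days: int) -> List[Revision]:
    """Return the revisions that survive a (hypothetical) pruning pass."""
    return [r for r in revisions if r.do_not_prune or r.age_days <= max_age_days]


history = [
    Revision(old_id=1, age_days=400),
    Revision(old_id=2, age_days=300, do_not_prune=True),  # someone's lvr
    Revision(old_id=3, age_days=10),
]
print([r.old_id for r in prune(history, max_age_days=90)])  # [2, 3]
```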

(In reply to comment #1)

I don't really see the need for this; as far as I know, old revisions are
*never* deleted automatically in MediaWiki

I am not sure whether you are right. How can one get to older revisions of a
page, let's say (worst case) to the initial version? This appears to be
limited to a certain maximum "go-back" number (say, 500 as a maximum). When I
use page history, I see that I can have 500 at maximum.

But I see what you mean. And if you are right that currently all revisions can
be retrieved, then only the "keep permanent old_id" problem as in
http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete cycle does
not preserve old_id) needs to be solved.

This weekend, I'll have a look at the 1.3.7 and CVS code to see whether
everything fits together. Thank you for your valuable comments!
Then I can close this bugzilla and track only the other one,
http://bugzilla.wikipedia.org/show_bug.cgi?id=536 . Provisionally, as it looks
now, this 804 depends at least on 536, so I decided to set this dependency flag now.

Tom

Dear Brion,

instead of invalidating this bug so ultra-quickly and without any comment,
which could be regarded as unfriendly, could you, as the master brain, please
answer the question stated here:

are really ALL OLD VERSIONS kept?

I am not sure. If, and only if, really all old versions are kept, then this 804
is a 100% duplicate of bugzilla536 and can be deleted.

We mini-developers cannot keep track of all brion-vibber-features of MediaWiki,
and I would kindly encourage you to give understandable explanations, from
which many mini-developers can certainly learn, don't you think? I guess that
everyone admires you and your co-workers and your admirable results, as I do,
but the documentation of the MediaWiki code sometimes leaves doubts about its
meaning and mechanisms, at least for me.

Tom

Yes, every revision is kept in the old table.

(In reply to comment #4)

Yes, every revision is kept in the old table.

Can you then please program (a.s.a.p.) a quick ad-hoc fix for the
http://bugzilla.wikipedia.org/show_bug.cgi?id=603 (delete/undelete cycle does
not preserve old_id) problem?

Then everyone would be happy:

  • The Enotif patch can permanently point to a certain revision ("lvr") of a page
  • As stated elsewhere, the Enotif patch will very soon display the requested
    marker (on the "lvr" revision), regardless of whether the user has enabled
    or disabled receiving mails (I hope that this is clear: every user can
    define whether he/she wants to receive such MAILS. The "lvr" (or updated)
    marker is shown independently of sending the mail)
  • Several bugzillas can be closed when I come up with the "lvr" marker as a
    by-product of Enotif.

I guess that closing this bugzilla is fine now, as question a) is dealt
with in bugzilla603 (old_id) and question b) is now answered by Brion (yes, all
revisions are kept).

I am happy now, really.

So to summarise:
Can you, Brion, program or propose a QUICK solution to the
http://bugzilla.wikipedia.org/show_bug.cgi?id=603 problem ?

We need permanent IDs. What's about the usage of md5() in this context, perhaps
this leads to a solution: use md5(namespace:page_title:revision-id) as unique
number ? I have had excellent experiences using md5() in several of my
other programs. I also know the md5() collision paper
http://eprint.iacr.org/2004/199/ , but this discovery shouldn't be a problem for
us. (May I say "us" now?)

Tom

rowan.collins wrote:

Just thought I'd clear up some misunderstandings:

(In reply to comment #5)

We need permanent IDs. What's about the usage of md5() in this context, perhaps
this leads to a solution: use md5(namespace:page_title:revision-id) as unique
number ?

There isn't really any need for a new ID: every revision currently has a unique
key in the "old" table of the database (which in a future version will also
include the current revision of each page). The only reason bug 603 exists is
that the "archive" table doesn't include this information, only the timestamp
and contents - so when a revision is undeleted, it is simply given a new,
unique, value as its old_id.
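
The effect described here can be reproduced with a small simulation. It is only
a sketch under simplifying assumptions: the dictionaries below are stand-ins
for the "old" and "archive" tables, not the real schema, and the function names
and the timestamp value are invented for illustration.

```python
import itertools
from typing import Dict, List, Tuple

# Hypothetical, heavily simplified stand-ins for the "old" and "archive" tables.
next_old_id = itertools.count(1)
old: Dict[int, Tuple[str, str]] = {}   # old_id -> (timestamp, text)
archive: List[Tuple[str, str]] = []    # (timestamp, text) only, no old_id


def save_revision(timestamp: str, text: str) -> int:
    old_id = next(next_old_id)
    old[old_id] = (timestamp, text)
    return old_id


def delete_page() -> None:
    # The old_id is lost here because "archive" has no place to keep it.
    archive.extend(old.values())
    old.clear()


def undelete_page() -> List[int]:
    # Each restored revision receives a new, unique old_id.
    restored = [save_revision(ts, text) for ts, text in archive]
    archive.clear()
    return restored


first = save_revision("20041101120000", "first text")
delete_page()
print(first, undelete_page())  # 1 [2]
```

Any link that still points at old_id 1 is now dangling, which is the symptom
reported in bug 603.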

When I use page history, I see that I can have 500 at maximum.

I think you're misunderstanding the interface here: if you click the "next 50"
link enough times, you will simply continue through the history until you reach
the very first revision; the other links "(20 | 50 | 100 | 250 | 500)" are for
setting how many revisions to show *on each page*. So clicking the "500" will go
to a page with the first 500 revisions and - if there are more - a link labelled
"next 500". [e.g.
http://meta.wikimedia.org/w/wiki.phtml?title=Main+Page&action=history&limit=500&offset=0
- that page actually has somewhere over 1000 revisions stored]
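
The paging behaviour can be sketched as follows. The parameter names limit and
offset are taken from the URL above; the history_page() function, the
newest-first ordering and the figure of 1200 revisions are assumptions made for
the example, not the actual MediaWiki code.

```python
from typing import List


def history_page(revisions: List[int], limit: int, offset: int) -> List[int]:
    """Return one page of the history; revisions are assumed newest first."""
    return revisions[offset:offset + limit]


all_revisions = list(range(1200, 0, -1))  # 1200 made-up revisions, newest first

offset = 0
seen: List[int] = []
while True:
    page = history_page(all_revisions, limit=500, offset=offset)
    if not page:
        break
    seen.extend(page)
    offset += 500  # what following the "next 500" link amounts to

print(len(seen), seen[-1])  # 1200 1
```

The limit only sets the page size; repeated paging still reaches the very first
revision.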

[btw, were it valid this would have blocked bug 536, not been blocked by it]

(In reply to comment #6)

(In reply to comment #5)
There isn't really any need for a new ID: every revision currently has a unique
key in the "old" table of the database (which in a future version will also
include the current revision of each page). The only reason bug 603 exists is
that the "archive" table doesn't include this information, only the timestamp
and contents - so when a revision is undeleted, it is simply given a new,
unique, value as its old_id.

Thank you, Rowan, for explaining. I also think that you fully understand what I
need, and perhaps others do, too:

a permanent revision number for a (namespace:page_title;revision), regardless of
where the content is actually saved in the database tables.

After an accidental or intended deletion, this revision id must be flagged as
"invisible, but in use" (and thus must not be released for reuse!), so that
normal user access to that specific revision is prohibited until this page
revision is possibly undeleted later by a WikiAdmin. In that case, exactly that
id needs to be restored.
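
A minimal sketch of this request, assuming a single store in which an id is
only hidden on deletion and never released; the class RevisionStore and its
methods are hypothetical and do not correspond to existing MediaWiki code:

```python
from typing import Dict, Optional, Set


class RevisionStore:
    """Hypothetical store in which a revision id is never given free."""

    def __init__(self) -> None:
        self._text: Dict[int, str] = {}
        self._hidden: Set[int] = set()  # ids flagged "invisible, but in use"
        self._next_id = 1

    def save(self, text: str) -> int:
        rev_id = self._next_id
        self._next_id += 1
        self._text[rev_id] = text
        return rev_id

    def delete(self, rev_id: int) -> None:
        self._hidden.add(rev_id)      # the id stays reserved

    def undelete(self, rev_id: int) -> None:
        self._hidden.discard(rev_id)  # the same id becomes visible again

    def read(self, rev_id: int) -> Optional[str]:
        if rev_id in self._hidden:
            return None               # normal users cannot access it
        return self._text.get(rev_id)


store = RevisionStore()
lvr = store.save("revision text")
store.delete(lvr)
print(store.read(lvr))      # None
store.undelete(lvr)
print(store.read(lvr))      # revision text
print(store.save("newer"))  # 2
```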

Can someone program this for the next release ? It would close many bugzillas at
once ...

Anyway, thank you so much for explaining.
Tom
Berlin

P.S. Have you tried my Enotif patch, see
http://bugzilla.wikipedia.org/show_bug.cgi?id=454 ?