
Implement revisionized properties table
Closed, Declined, Public

Description

Right now there is no way to attach information to an article without losing it over time (which is what happens with the "page_props" table).

By having a revisionized (versioned) properties table (like page_props), many (if not all) of the following would become possible:

  1. Store page protection settings with the revision, so that undoing or rolling back an edit brings back the protection info. Deletion and undeletion would work the same way.
  1. Move categories out of wikitext. This has been proposed before (i.e. store them only in categorylinks and report changes in a null-revision edit summary, as is currently done for protection); however, that is prone to abuse, since undoing a revision would mean manually copying and pasting categories from the edit summaries on the history page.
  1. Maybe move langlinks out of wikitext?
  1. File properties [2]
  1. Custom data for extensions (the prop_type column can be used by extensions to store information that would otherwise need a new table. Some extensions currently appear to be doing this, which could result in many tables serving the same purpose on a single wiki).

Using a versioned properties table would solve these problems. Each revision is connected to a set of properties, and undoing the revision re-uses the previous set of properties (just like a rollback re-uses the same text row, mw_text.old_id, referenced from mw_revision.rev_text_id).

It also saves database storage, since a set is only saved again when something has actually changed.

In other words, if the user only changed the article text, the same old propid is used. If only the props are modified, the same textid/oldid stays in use.
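A minimal in-memory sketch of this reuse rule, assuming hypothetical names (`RevisionStore`, `save_edit`, `rollback_to`) that are not MediaWiki APIs: a text row or property set is only written when its content changed, otherwise the previous id is reused, and a rollback is a null revision that points at the old ids.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Revision:
    rev_id: int
    text_id: int
    props_id: int


@dataclass
class RevisionStore:
    """Hypothetical model of mw_revision, mw_text and a versioned props table."""
    texts: dict = field(default_factory=dict)      # text_id -> wikitext
    prop_sets: dict = field(default_factory=dict)  # props_id -> frozenset of (type, value)
    revisions: list = field(default_factory=list)

    def save_edit(self, text, props):
        props = frozenset(props)
        prev = self.revisions[-1] if self.revisions else None
        # Reuse the previous text row if the wikitext did not change.
        if prev and self.texts[prev.text_id] == text:
            text_id = prev.text_id
        else:
            text_id = len(self.texts) + 1
            self.texts[text_id] = text
        # Reuse the previous property set if no property changed.
        if prev and self.prop_sets[prev.props_id] == props:
            props_id = prev.props_id
        else:
            props_id = len(self.prop_sets) + 1
            self.prop_sets[props_id] = props
        rev = Revision(len(self.revisions) + 1, text_id, props_id)
        self.revisions.append(rev)
        return rev

    def rollback_to(self, rev_id):
        # A rollback stores nothing new: it reuses both ids of the target revision.
        target = self.revisions[rev_id - 1]
        rev = Revision(len(self.revisions) + 1, target.text_id, target.props_id)
        self.revisions.append(rev)
        return rev
```

In this sketch a text-only edit keeps the previous props_id, a props-only edit (for example, removing a hypothetical `("protect", "edit=sysop")` entry) keeps the previous text_id, and rolling back restores protection together with the text because both ids come from the target revision.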

The properties table would have its sets identified by a unique id, stored in a column in the mw_revision table [1]. That id would either be its own incrementing integer or reuse the revision id. Comparison:

  • properties-id
    • Since multiple rows belong to one set, the id spans multiple rows. An auto-increment ID that spans multiple rows is not supported in MySQL, and the only solutions I can think of are keeping track of the id elsewhere, or reading the last row and using the next number. Neither is clean.
  • revision-id
    • Using the revision-id is a lot easier. The revision data is saved, the revision-id is known and used to store the properties. This also makes it easy to track which revision last modified the properties (since the id matches the revision-id that created the set of properties).

I think using the revision-id is probably the better choice. The only downside could be that it might cause confusion, since the set id would look like a revision-id; I'm not sure that's an issue.
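The two allocation strategies compared above can be sketched with Python's built-in sqlite3 module. The table and column names (`revision`, `props`, `prop_set_id`, `rev_props_id`) are hypothetical. The point is that "MAX + 1" needs a separate read before the write (and two concurrent writers can pick the same id without extra locking), while the revision-id variant reuses the id the revision insert already produced.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript(
        """
        CREATE TABLE revision (rev_id INTEGER PRIMARY KEY AUTOINCREMENT,
                               rev_page INTEGER, rev_props_id INTEGER);
        -- One set spans several rows, so prop_set_id cannot be an auto-increment key.
        CREATE TABLE props (prop_set_id INTEGER, prop_type TEXT, prop_val TEXT);
        CREATE INDEX props_set ON props (prop_set_id);
        """
    )
    return db


def save_with_max_plus_one(db, page_id, props):
    # Option 1: read the current maximum and use the next number.
    (set_id,) = db.execute(
        "SELECT COALESCE(MAX(prop_set_id), 0) + 1 FROM props"
    ).fetchone()
    db.executemany("INSERT INTO props VALUES (?, ?, ?)",
                   [(set_id, t, v) for t, v in props])
    db.execute("INSERT INTO revision (rev_page, rev_props_id) VALUES (?, ?)",
               (page_id, set_id))
    return set_id


def save_with_rev_id(db, page_id, props):
    # Option 2: insert the revision first, then reuse its id as the set id.
    rev_id = db.execute("INSERT INTO revision (rev_page) VALUES (?)",
                        (page_id,)).lastrowid
    db.execute("UPDATE revision SET rev_props_id = ? WHERE rev_id = ?",
               (rev_id, rev_id))
    db.executemany("INSERT INTO props VALUES (?, ?, ?)",
                   [(rev_id, t, v) for t, v in props])
    return rev_id
```

With option 2, the value of rev_props_id directly names the revision that last changed the properties; later revisions that leave the props untouched would simply copy that value.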

Highlights:

  • Store data in a properties table, versioned, with each set having its own id. If only props change, the same text.oldid is used; if only text changes, the same propid is used.
  • Connect each revision to a set of properties, like the text id (i.e. a rollback re-uses the oldid of that revision, and the same would go for properties: rolling back an edit creates a null-revision with the same old text id and properties id).

Krinkle

[1] Adding a column to mw_revision is expensive, to say the least, but I'm not sure there's a clean and efficient long-term way around it.
[2] bug 25624 and http://www.mediawiki.org/wiki/License_integration

See also:

  • (bug 167) Use a dedicated interface for adding interwiki/category links, not wikitext
  • (bug 25624) Making license and author information api accessible
  • (bug 835) Syntax to transclude a page without categories and langlinks
  • (bug 22293) Show previous protection level in protection log
  • more...

Version: unspecified
Severity: enhancement
See Also:
https://bugzilla.wikimedia.org/show_bug.cgi?id=4433

Details

Reference
bz28488

Event Timeline

bzimport raised the priority of this task to Medium (Nov 21 2014, 11:37 PM).
bzimport set Reference to bz28488.
bzimport added a subscriber: Unknown Object (MLST).

Also, this might help tackle:

  • (bug 28476) Rejecting a page move does not undo the change made to the title.
  • (bug 4433) rollback link for a page move should revert the move

Hopefully this isn't too stupid a question ;).

So in this scheme we would have a table with a categorylinks entry roughly like:

revision id: 123
prop_type: categorylink
cl_to: some category
cl_from: some page_id

And say you wanted to grab everything in category Foo. How would you do that, given that it's now hard to distinguish between current entries and historical entries?


As an aside, a versioned links table would also help with bug 7148 (show category additions/removals on watchlist)
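Because each revision points at a complete property set, the category changes made by an edit (for a watchlist entry, or for an auto-generated "Re-categorized [...]" summary) come from a plain set difference between the old and new sets. A sketch over `(prop_type, prop_val)` pairs; the function names are hypothetical.

```python
def category_changes(old_set, new_set):
    """Return (removed, added) category names between two property sets."""
    old_cats = {val for (ptype, val) in old_set if ptype == "categorylink"}
    new_cats = {val for (ptype, val) in new_set if ptype == "categorylink"}
    return sorted(old_cats - new_cats), sorted(new_cats - old_cats)


def summary(old_set, new_set):
    """Build an edit summary in the style of the protection null edits."""
    removed, added = category_changes(old_set, new_set)

    def fmt(names):
        return ", ".join(f"[[Category:{n}|{n}]]" for n in names)

    return f"Re-categorized [removed: {fmt(removed)}; added: {fmt(added)} ]"
```

Non-category entries (such as protection settings) are ignored here, so a watchlist could show category changes separately from other property changes.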

FYI:

This request (at least the way I intended it) does not suggest deprecating any tables (including categorylinks) at all.
A central, efficient, clean categorylinks table is perfect (aggregated to contain only the current state, which it does now).

However, if you would want to go that route (I didn't mean to suggest that, but it's an interesting thought nonetheless), it doesn't have to be a problem:

-- example start --

Article [[Page]] was categorized in Lorem and Foo. In Foo it is sorted under "Mysort".

mw_revision:

-- example row of an edit that changed categories --

rev_id: 123
rev_text_id: 120
rev_comment: "Re-categorized [removed: [[Category:Foo|Foo]]; added: [[Category:Bar|Bar]] ]"
// the comment is like the null edits for changing protection settings
rev_props_id: 8

mw_magicpropsthingtable:
prop_id: 7 | prop_type: categorylink | prop_val: 'Lorem'
prop_id: 7 | prop_type: categorylink | prop_val: 'Foo'
prop_id: 8 | prop_type: categorylink | prop_val: 'Lorem'
prop_id: 8 | prop_type: categorylink | prop_val: 'Bar'

-- example end --
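One possible answer to the earlier question about finding everything currently in a category, sketched with sqlite3 over the example above: only the property set referenced by each page's latest revision (page_latest) counts as current, while older sets remain available for history queries. Revision 122 (using set 7), page "Other" and its revision/set 200 are hypothetical additions for illustration. Because this joins through each page's latest revision, the flat categorylinks table stays the cheaper way to list large categories.

```python
import sqlite3

SCHEMA = """
CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER);
CREATE TABLE revision (rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_props_id INTEGER);
CREATE TABLE props (prop_id INTEGER, prop_type TEXT, prop_val TEXT);
CREATE INDEX props_type_val ON props (prop_type, prop_val, prop_id);
"""


def load_example(db):
    db.executescript(SCHEMA)
    # [[Page]]: hypothetical revision 122 uses set 7 (Lorem, Foo); revision 123 uses set 8.
    db.execute("INSERT INTO page VALUES (1, 'Page', 123)")
    db.executemany("INSERT INTO revision VALUES (?, ?, ?)", [(122, 1, 7), (123, 1, 8)])
    # A hypothetical second page that is still in Foo.
    db.execute("INSERT INTO page VALUES (2, 'Other', 200)")
    db.execute("INSERT INTO revision VALUES (200, 2, 200)")
    db.executemany(
        "INSERT INTO props VALUES (?, 'categorylink', ?)",
        [(7, "Lorem"), (7, "Foo"), (8, "Lorem"), (8, "Bar"), (200, "Foo")],
    )


def current_members(db, category):
    # Current view: only sets referenced by a page's latest revision.
    rows = db.execute(
        """
        SELECT page.page_title FROM props
        JOIN revision ON revision.rev_props_id = props.prop_id
        JOIN page ON page.page_latest = revision.rev_id
        WHERE props.prop_type = 'categorylink' AND props.prop_val = ?
        ORDER BY page.page_title
        """,
        (category,),
    )
    return [title for (title,) in rows]


def ever_members(db, category):
    # Historical view: any page with any revision whose set contained the category.
    rows = db.execute(
        """
        SELECT DISTINCT page.page_title FROM props
        JOIN revision ON revision.rev_props_id = props.prop_id
        JOIN page ON page.page_id = revision.rev_page
        WHERE props.prop_type = 'categorylink' AND props.prop_val = ?
        ORDER BY page.page_title
        """,
        (category,),
    )
    return [title for (title,) in rows]
```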

While writing this I realized that the sort key would have to be stored as well, and also that this value doesn't have to be indexed, only retrieved when needed. So it may be better to use a single row per set [1] and serialize it into a blob:

mw_magicpropsthingtable:
prop_id: 7 | prop_blob: serialize( array(
    'categorylinks' => array(
        array( 'Lorem', '' ),
        array( 'Foo', 'Mysort' ),
    ),
) )
prop_id: 8 | prop_blob: serialize( array(
    'categorylinks' => array(
        array( 'Lorem', '' ),
        array( 'Bar', '' ),
    ),
) )

The prop_blob would be either multi-line text (like log_params) or serialized PHP (like old_flags, as in the example above).

Krinkle

[1]: This would also solve the problem of getting an id for prop_id, since it can now be a plain auto-increment.
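A sketch of the single-row variant: one row per property set, so prop_id can be an ordinary auto-increment key, and the blob is only decoded when a revision's properties are needed. The proposal uses PHP serialize(); this Python sketch substitutes JSON (the standard json module) as the serialization format, reuses the placeholder table name from the example, and the helper names are hypothetical.

```python
import json
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    # One row per property set, so prop_id can be an ordinary auto-increment key.
    db.execute(
        "CREATE TABLE mw_magicpropsthingtable ("
        " prop_id INTEGER PRIMARY KEY AUTOINCREMENT, prop_blob TEXT NOT NULL)"
    )
    return db


def save_props(db, props):
    # The blob is not indexed; it is stored as-is and decoded on demand.
    cur = db.execute(
        "INSERT INTO mw_magicpropsthingtable (prop_blob) VALUES (?)",
        (json.dumps(props, sort_keys=True),),
    )
    return cur.lastrowid


def load_props(db, prop_id):
    (blob,) = db.execute(
        "SELECT prop_blob FROM mw_magicpropsthingtable WHERE prop_id = ?", (prop_id,)
    ).fetchone()
    return json.loads(blob)
```

Lists are used instead of tuples for the (category, sort key) pairs because JSON round-trips tuples as lists.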

(In reply to comment #0)

  1. Move categories out of wikitext. It has been proposed to do this before

(ie. store only in categorylinks and report changes in a null-revision edit
summary, like with protection currently) - however that is prone to abuse since
undoing a revision would mean having to manually copy/paste categories from the
history page edit summaries.

I'd be skeptical of this one. It would take up a lot of space and many categories are only included via templates. Not sure how this would work out.

(In reply to comment #4)


See bug 167.

The original use case (https://www.mediawiki.org/wiki/License_integration_MediaWiki) has effectively been solved since then by WikibaseMediaInfo ("Structured data on Commons").

That implementation did not end up indexing the data for past versions (that would require ad-hoc re-parsing), but that seems fine.

The other use cases described, for categories and langlinks, are now solvable with MCR (Multi-Content Revisions) instead.