
Add page creator index to MediaWiki core
Open, Low, Public

Description

There's currently no way to easily get the creator of a page. This makes it difficult to write reports about who wrote the most articles or make a list of the pages a person created.

I think it'd be nice to have a page_creator column in the page table to track who has created a page. It'll have some edge cases like page moves, but I think this is acceptable.

So this would have four pieces, I guess:

  • add a page_creator column to the page table (255 bytes, has to support anon page creations);
  • add an index to this column (so that you can sort by user);
  • write a maintenance script to populate this column; and
  • fix up Extension:RenameUser to make sure it accounts for this column in user renames.
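The first three pieces above might look roughly like the following sketch. It uses Python's built-in sqlite3 module against a toy schema, not MediaWiki's real DDL or its maintenance-script framework. The function name `add_and_populate_page_creator`, the index name, and the sample rows are hypothetical, and the backfill assumes that the lowest `rev_id` of a page is its creating revision.

```python
import sqlite3

# Toy stand-ins for the page and revision tables; not MediaWiki's real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_user_text TEXT
    );
    INSERT INTO page VALUES (1, 'Foo'), (2, 'Bar');
    INSERT INTO revision VALUES
        (10, 1, 'Alice'),
        (11, 1, 'Bob'),
        (12, 2, '192.0.2.7');  -- anonymous creation, stored as an IP string
""")

def add_and_populate_page_creator(conn):
    # Piece 1: the new column, wide enough for user names and IP addresses.
    conn.execute("ALTER TABLE page ADD COLUMN page_creator VARCHAR(255)")
    # Piece 2: an index so pages can be sorted or filtered by creator.
    conn.execute("CREATE INDEX page_creator_idx ON page (page_creator)")
    # Piece 3: backfill, assuming the lowest rev_id of a page is its creation.
    conn.execute("""
        UPDATE page
        SET page_creator = (
            SELECT rev_user_text FROM revision
            WHERE rev_page = page.page_id
            ORDER BY rev_id ASC LIMIT 1
        )
    """)
    conn.commit()

add_and_populate_page_creator(conn)
creators = dict(conn.execute("SELECT page_title, page_creator FROM page"))
```

The fourth piece is not shown; under this design a user rename would also have to update every matching page_creator value.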

Version: unspecified
Severity: enhancement
URL: https://www.mediawiki.org/wiki/Manual:Page_table

Details

Reference
bz42135

Event Timeline

bzimport raised the priority of this task to Low. Nov 22 2014, 1:07 AM
bzimport set Reference to bz42135.
bzimport added a subscriber: Unknown Object (MLST).

Another solution is to log the creation of pages on Special:Log (bug 10331).

Or we could store the first revision.

(In reply to comment #2)

Or we could store the first revision.

Store the first revision? You mean a separate table that just stores the data from revision, but limited to the first revision only?

Store the first revision's ID? Then you can go look up the ID.

Actually maybe you could do something like this instead:
SELECT rev_user FROM revision WHERE rev_parent_id = 0 AND rev_page = $pageid;
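As a minimal sketch of that lookup, the query can be run with a bound parameter in place of $pageid. The example uses Python's sqlite3 module and a toy revision table holding only the columns the query touches; the helper name `page_creator_user_id` and the sample rows are made up for illustration.

```python
import sqlite3

def page_creator_user_id(conn, page_id):
    """Return rev_user of the page's parentless revision, or None if absent."""
    row = conn.execute(
        "SELECT rev_user FROM revision WHERE rev_parent_id = 0 AND rev_page = ?",
        (page_id,),
    ).fetchone()
    return row[0] if row else None

# Toy data: page 1 was created by user 42 and later edited by user 7.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_user INTEGER,
        rev_parent_id INTEGER
    );
    INSERT INTO revision VALUES (10, 1, 42, 0), (11, 1, 7, 10);
""")

creator = page_creator_user_id(conn, 1)
missing = page_creator_user_id(conn, 99)
```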

(In reply to comment #2)

Or we could store the first revision.

Okay, let's index the first revision. It should be simple enough to add a page_first column (or similar) to complement page_latest.
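A rough sketch of what a page_first column could look like, again with Python's sqlite3 module and a toy schema rather than MediaWiki's real tables. The helper names are hypothetical, and the backfill assumes the smallest `rev_id` for a page is its first revision, which imports or history merges could violate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (
        page_id INTEGER PRIMARY KEY, page_title TEXT, page_latest INTEGER
    );
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY, rev_page INTEGER, rev_user_text TEXT
    );
    INSERT INTO page VALUES (1, 'Foo', 11), (2, 'Bar', 12);
    INSERT INTO revision VALUES
        (10, 1, 'Alice'), (11, 1, 'Bob'), (12, 2, 'Carol');
""")

def add_page_first(conn):
    # New column next to page_latest, holding the first revision's ID.
    conn.execute("ALTER TABLE page ADD COLUMN page_first INTEGER")
    # Backfill with the smallest revision ID belonging to each page.
    conn.execute("""
        UPDATE page
        SET page_first = (
            SELECT MIN(rev_id) FROM revision WHERE rev_page = page.page_id
        )
    """)
    conn.commit()

def creator_via_page_first(conn, title):
    # One join from page to its first revision yields the creator.
    row = conn.execute("""
        SELECT rev_user_text FROM page
        JOIN revision ON rev_id = page_first
        WHERE page_title = ?
    """, (title,)).fetchone()
    return row[0] if row else None

add_page_first(conn)
first_ids = dict(conn.execute("SELECT page_title, page_first FROM page"))
```

Storing an ID rather than a name means user renames would not need to touch the page table.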

(In reply to MZMcBride from comment #0)

There's currently no way to easily get the creator of a page. This makes it
difficult to write reports about who wrote the most articles or make a list
of the pages a person created.

FYI, the latter can be done with rev_parent_id=0 and the username in the where clause.
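For illustration, both reports from the description can be sketched on top of rev_parent_id = 0. The example uses Python's sqlite3 module with a toy schema; the function names and sample rows are hypothetical, and it assumes the username lives in a `rev_user_text` column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (page_id INTEGER PRIMARY KEY, page_title TEXT);
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_user_text TEXT,
        rev_parent_id INTEGER
    );
    INSERT INTO page VALUES (1, 'Foo'), (2, 'Bar'), (3, 'Baz');
    INSERT INTO revision VALUES
        (10, 1, 'Alice', 0),
        (11, 1, 'Bob', 10),
        (12, 2, 'Alice', 0),
        (13, 3, 'Bob', 0);
""")

def pages_created_by(conn, username):
    """Titles of pages whose parentless revision was made by username."""
    rows = conn.execute("""
        SELECT page_title FROM revision
        JOIN page ON page_id = rev_page
        WHERE rev_parent_id = 0 AND rev_user_text = ?
        ORDER BY page_title
    """, (username,))
    return [title for (title,) in rows]

def creation_counts(conn):
    """(username, pages created) pairs, most prolific creator first."""
    rows = conn.execute("""
        SELECT rev_user_text, COUNT(*) FROM revision
        WHERE rev_parent_id = 0
        GROUP BY rev_user_text
        ORDER BY COUNT(*) DESC, rev_user_text
    """)
    return list(rows)

alice_pages = pages_created_by(conn, "Alice")
counts = creation_counts(conn)
```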

@MZMcBride thanks for the ping. A page_creator column would work, but I don't see why we can't just use the logging table. It wouldn't need any schema change, and we would get the UI for listing pages by creator for free.

@PiRSquared17 rev_parent_id=0 is a good idea, I never thought of that. I wonder how reliable it is... May fail for older revisions that existed before rev_parent_id was introduced.
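One way to hedge against that, sketched with Python's sqlite3 module: try rev_parent_id = 0 first, and fall back to the oldest revision by timestamp when no such row exists. This assumes that legacy rows simply carry a NULL rev_parent_id, which the thread does not confirm; the function name `first_revision_id` and the sample data are made up for the example.

```python
import sqlite3

def first_revision_id(conn, page_id):
    """Best-effort first revision ID for a page, or None if it has none."""
    # Preferred path: the revision explicitly marked as having no parent.
    row = conn.execute(
        "SELECT rev_id FROM revision WHERE rev_page = ? AND rev_parent_id = 0",
        (page_id,),
    ).fetchone()
    if row:
        return row[0]
    # Fallback (assumption): the oldest revision by timestamp, then by ID.
    row = conn.execute(
        """
        SELECT rev_id FROM revision
        WHERE rev_page = ?
        ORDER BY rev_timestamp ASC, rev_id ASC
        LIMIT 1
        """,
        (page_id,),
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE revision (
        rev_id INTEGER PRIMARY KEY,
        rev_page INTEGER,
        rev_timestamp TEXT,
        rev_parent_id INTEGER
    );
    INSERT INTO revision VALUES
        (10, 1, '20100101000000', 0),
        (11, 1, '20100102000000', 10),
        (20, 2, '20030101000000', NULL),  -- legacy rows without parent data
        (21, 2, '20030102000000', NULL);
""")

modern = first_revision_id(conn, 1)
legacy = first_revision_id(conn, 2)
absent = first_revision_id(conn, 3)
```

The fallback is still only a guess for pages whose history was changed by imports or undeletions.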

@MZMcBride thanks for the ping. A page_creator column would work, but I don't see why we can't just use the logging table. It wouldn't need any schema change, and we would get the UI for listing pages by creator for free.

Wouldn't using the logging table mean that any page creations for the past 15 years would be ignored? If so, that seems like a non-starter.

I currently maintain https://en.wikipedia.org/wiki/Wikipedia:List_of_Wikipedians_by_article_count. When I re-implemented it after the death of the German Toolserver, I used @Krenair's suggestion of storing the first revision ID instead of the author name, as the revision ID is a lot less likely to change and is a lot shorter to store.

TameeshB removed a subscriber: TameeshB.
TameeshB added a subscriber: TameeshB.

It's unclear which of the approaches is preferred, so I'm removing the good first task tag. Feel free to re-add it once it's clear to a contributor how to proceed.

With T12331 solved, it seems the scope of this task is effectively reduced to the issue of historical data and whether or not we want to backfill it.

The only purpose of backfilling it would be that, for old pages, you could save one or two clicks by finding the creation on Special:Log instead of looking up the oldest revision on action=history.

  • Benefit: Consistent user experience - There will be an entry in the log for all created pages.
  • Downside: For all data before T12331, the backfilled entries could be wrong, since we simply do not have accurate historical data to verify them. "Import", "undelete" and other actions may have altered the history.

If this does get used, perhaps there should be a way to update this value to reflect things like page splits/merges/etc?