Specify PageTypeHandler
Open, Needs Triage, Public

Description

The "page type" mechanism is intended to provide a compact way for extensions to control the behavior (display, editing, move, delete, etc.) of certain wiki pages. The concept of page types is already present in MediaWiki, but not explicitly modeled or specified. Rather, the behavior is hard-coded into special cases.

Examples of existing "page types" in MediaWiki core:

  • article (the default)
  • talk page (talk namespaces)
  • category page (the Category namespace)
  • file description page (the File namespace)
  • script page (JS or CSS, user or global, active content to be interpreted by the browser) (.js and .css suffix in the MediaWiki and the User namespace)
  • system message (MediaWiki namespace)
  • conversion table (MediaWiki namespace with Conversiontable/ prefix)
  • ...

All of these trigger special behavior during display, editing, purging, moving, etc.

PageTypeHandler would model this concept, allowing extensions to easily define their own page types. It should control at least:

  • layout for the view action. The PageTypeHandler may be aware of certain content slots, and may show their content as appropriate. It may or may not show additional slots using a generic layout mechanism.
  • editing mechanism. The PageTypeHandler may be aware of certain slots, and may provide an integrated editing experience for their content. It should provide some way to access editing interfaces for any additional slots.
  • which slots are allowed, required, and desired on pages of this type.
  • Generic action overrides (to replace ContentHandler::getActionOverrides)
  • behavior to be triggered upon creation, modification, and deletion, as currently encoded in WikiPage::doEditUpdates and WikiPage::doDeletionUpdates
  • Constraints on page moves. Since page moves change the title, they may change the page type. In such a case, the old and the new page type handler both have to agree that the move is possible. This replaces ContentHandler::canBeUsedOn.

Ideally, a page's "type" is determined solely by its title, regardless of database state. It would typically be based on the namespace, but in some cases, a title suffix or prefix may also trigger a certain page type.
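
To make the title-only rule concrete, here is a minimal sketch in Python (not MediaWiki code). The `Title` stand-in, the `MATCHERS` list, the page type names, and `page_type_for` are all hypothetical; only the namespace IDs and the matching rules (namespace, suffix, prefix) come from MediaWiki core and from the examples listed above. The script rule assumes the `.js`/`.css` suffix only counts in the MediaWiki namespace or on a User subpage.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Namespace IDs as used by MediaWiki core; talk namespaces have odd, positive IDs.
NS_MAIN, NS_USER, NS_FILE, NS_MEDIAWIKI, NS_CATEGORY = 0, 2, 6, 8, 14


@dataclass(frozen=True)
class Title:
    """Hypothetical stand-in for a page title: namespace ID plus title text."""
    namespace: int
    text: str


def is_script(title: Title) -> bool:
    # Assumption: the suffix only matters in the MediaWiki namespace
    # or on a subpage in the User namespace.
    if not title.text.endswith((".js", ".css")):
        return False
    if title.namespace == NS_MEDIAWIKI:
        return True
    return title.namespace == NS_USER and "/" in title.text


# Hypothetical matcher list. The first matching rule wins, so the more
# specific rules (suffix, prefix) come before the namespace-wide ones.
MATCHERS: List[Tuple[str, Callable[[Title], bool]]] = [
    ("script", is_script),
    ("conversion-table", lambda t: t.namespace == NS_MEDIAWIKI
        and t.text.startswith("Conversiontable/")),
    ("system-message", lambda t: t.namespace == NS_MEDIAWIKI),
    ("category", lambda t: t.namespace == NS_CATEGORY),
    ("file", lambda t: t.namespace == NS_FILE),
    ("talk", lambda t: t.namespace > 0 and t.namespace % 2 == 1),
]


def page_type_for(title: Title) -> str:
    """Resolve the page type from the title alone, without database state."""
    for name, matches in MATCHERS:
        if matches(title):
            return name
    return "article"  # the default page type
```

Under these assumptions, `Title(NS_USER, "Foo/Bar.js")` resolves to the script type, while `Title(NS_USER, "Foo/Bar")` and `Title(NS_MAIN, "Node.js")` both fall through to the default.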

However, several extensions exist that cause special behavior to be triggered based on the content model (of the main slot). It may be necessary to retain this behavior, at least for a transition period. Whether it may be unavoidable in principle or even desirable remains under discussion as of this writing.

Event Timeline

Copied from https://gerrit.wikimedia.org/r/c/mediawiki/core/+/434544/32/includes/Title.php#1023 for further discussion here:

@Anomie

This should be deprecated somehow since the Title doesn't have *a* content model anymore. But I'm not sure if making it be based on the page type actually makes sense.

The calls in core seem geared towards preventing someone from changing an existing page('s main slot) to CONTENT_MODEL_CSS, CONTENT_MODEL_JAVASCRIPT, or CONTENT_MODEL_JSON without the user rights necessary to edit pages with those content models. Extension hooks might do other model-based checks.

If you want a separate page type for those pages, then the PageTypeRegistry matchers will have to allow inspecting the type of the main slot so it won't depend directly on the title text and namespace as you state here. If you base the page type only on title text and namespace, then you break everything that's using a non-default content model.

@Tgr

BTW, I don't see that page moves actually call this method. MovePage calls ContentHandler::canBeUsedOn() instead, since it moves the page without changing the model.

That's probably a bug though. MovePage calls $newTitle->getUserPermissionsErrors('edit') and that can trigger content type checks, except at that point the new title has no content type. (But to handle that correctly, not just the content type would have to be faked but stuff like exists() as well.)

@daniel

If you want a separate page type for those pages, then the PageTypeRegistry matchers will have to allow inspecting the type of the main slot so it won't depend directly on the title text and namespace as you state here. If you base the page type only on title text and namespace, then you break everything that's using a non-default content model.

I don't see what that would break. The page type would determine what slots can exist, and what models they can have. As always, this needs to be done in a way that allows old content that does not conform to not cause any critical issues, so they can still be viewed, deleted, restored, etc.

The content model used to be influenced directly by page moves, because rev_content_model would usually be null, so the actual model would be derived from the title. With the MCR schema, this is no longer the case, and page moves do not impact the model at all. Moves may in the future change the page type; in that case, the old and the new page type handler would both have to agree that the move is possible (e.g. the page type handler for wikibase entities would deny page moves into or out of entity namespaces).

As for the permission checks: they should be moved out of Title entirely, as per T208768. Once that exists, we can deprecate this method. Until then, we have no replacement for this method, so we can't deprecate it.

@Anomie

I don't see what that would break. The page type would determine what slots can exist, and what models they can have. As always, this needs to be done in a way that allows old content that does not conform to not cause any critical issues, so they can still be viewed, deleted, restored, etc.

Which "page type" should User:Foo/Bar have? Which "page type" should User:Foo/Bar.js have?

If you didn't give a specific answer for one or both, then your statement that the page type depends directly on namespace+title isn't possible. It would also have to depend on the reason you couldn't give a specific answer.

If the answers are not the same for both, you're breaking existing behavior if someone set the main slot's content model to 'javascript' for the former or 'wikitext' for the latter. For the things this method is used for, MediaWiki determines the treatment based on the current model rather than the default model for these pages.

If the answers are the same, then you can't determine whether a move or content model change should be allowed just on the basis of the page type. It also needs knowledge of the specific models involved.

Moves may in the future change the page type

By that do you mean "until that maybe-future moves that would change the page type would be forbidden"? Depending on how granular your page types are, that could break stuff. Would "FooBar" and "User:Example/FooBar" have the same page type? "User:Example/FooBar" and "Template:FooBar"?

@Anomie wrote

Which "page type" should User:Foo/Bar have? Which "page type" should User:Foo/Bar.js have?

  • User:Foo/Bar would have the default page type ("regular" or "article" or something).
  • User:Foo/Bar.js would have the "script" type (or "active content" or something)

If the answers are not the same for both, you're breaking existing behavior if someone set the main slot's content model to 'javascript' for the former or 'wikitext' for the latter.

JavaScript content in a "regular" page would be fine, it would just not have the special "active" behavior that it has as a user script or site script. It would be rendered as JS, and edited as JS, but not used as JS.

The "wikitext" content model would indeed be unsupported on User:Foo/Bar.js, though it wouldn't break existing pages. Maybe they could even remain editable. But they should trigger a warning, and it should not be possible to create more pages like that.

For the things this method is used for, MediaWiki determines the treatment based on the current model rather than the default model for these pages.

But since we no longer have *one* model, this is now dubious. We could still determine the page type based on the main slot's content model, but I'd really really like to avoid that. It just causes confusion and nasty edge cases.

If the answers are the same, then you can't determine whether a move or content model change should be allowed just on the basis of the page type. It also needs knowledge of the specific models involved.

Whether a content model change is ok would be decided by the page's PageTypeHandler, and that decision may indeed be based on knowledge about the current and the future model, and may involve asking the ContentHandlers about it. But it no longer has anything to do with page moves.

With the MCR schema, a page move *never* automatically triggers a content model change. A page move may cause the page *type* to change - to which both the old and the new PageTypeHandler would have to agree. But this has nothing to do with the content model.

Moves may in the future change the page type

By that do you mean "until that maybe-future moves that would change the page type would be forbidden"?

No, page moves just no longer influence the content model. Because the default model is no longer used after page creation, since the MCR model always stores the content model for every revision.

Am I mistaken about this? Did I miss something, and page moves still do impact the model somehow?

A page's "type" is determined solely by its title, regardless of database state. It's typically based on the namespace, but in some cases, a title suffix or prefix may also trigger a certain page type.

That's how content models used to be determined. Then you added the ability to have the content model be determined explicitly in the database rather than having to be determined based on the namespace+title.

Why do the reasons for making content model able to be specified in the database not apply to page type? Or why was making content model able to be specified in the database a mistake?

Examples of existing "page types" in MediaWiki core:

  • article (the default)
  • talk page (talk namespaces)
  • category page (the Category namespace)
  • file description page (the File namespace)
  • script page (JS or CSS, user or global, active content to be interpreted by the browser) (.js and .css suffix in the MediaWiki and the User namespace)
  • system message (MediaWiki namespace)
  • conversion table (MediaWiki namespace with Conversiontable/ prefix)
  • ...

What is the difference between "article" and "talk page"?

"script page", "system message", and "conversion table" seem to me like they should be content models for the main slot rather than being separate page types. Other than that, what would be the difference from "article"?

It might be possible to rework "file description page" as being a regular page with multiple slots (for the file and description), although that would probably be a fair bit of work.

PageTypeHandler would model this concept, allowing extensions to easily define their own page types. It should control at least:

  • layout for the view action. The PageTypeHandler may be aware of certain content slots, and may show their content as appropriate. It may or may not show additional slots using a generic layout mechanism.

Seems generally sensible, but I'd be wary about any type taking advantage of the "or may not show additional slots".

  • editing mechanism. The PageTypeHandler may be aware of certain slots, and may provide an integrated editing experience for their content. It should provide some way to access editing interfaces for any additional slots.

Editing interfaces should probably be determined by the content model or slot rather than the page type.

  • which slots are allowed, required, and desired on pages of this type.

Why should a "template" or "scribunto module" page type have to know about a "documentation" or "templatestyles stylesheet" slot in order to declare it as allowed/desired/required?

It seems to me that page types and allowed slots should be kept separate. An extension could still have the same level of control if it needs it.

  • Generic action overrides (to replace ContentHandler::getActionOverrides)

IMO we should first evaluate the use cases for action overrides and see whether the concept still makes sense. If so, then yes, PageTypeHandler seems a decent place for it.

  • behavior to be triggered upon creation, modification, and deletion, as currently encoded in WikiPage::doEditUpdates and WikiPage::doDeletionUpdates

If a page already has a page type, sure, but I'm skeptical of using this as a reason for creating a page type that would otherwise be identical to "article" or the like.

  • Constraints on page moves. Since page moves change the title, they may change the page type. In such a case, the old and the new page type handler both have to agree that the move is possible. This replaces ContentHandler::canBeUsedOn.

If a move changes the page type, sure. But again this shouldn't be a reason for creating a page type that would otherwise be identical to "article" or the like.

Examples of existing "page types" in MediaWiki core:

  • article (the default)
  • talk page (talk namespaces)

What is the difference between "article" and "talk page"?

If nothing else, talk pages get the new section tab by default, and article pages don't.

It might be possible to rework "file description page" as being a regular page with multiple slots (for the file and description), although that would probably be a fair bit of work.

Yeah, that's the long-term dream.

JavaScript content in a "regular" page would be fine, it would just not have the special "active" behavior that it has as a user script or site script. It would be rendered as JS, and edited as JS, but not used as JS.

If I'm understanding you correctly, this change would mean that such pages would suddenly no longer require the editmyuserjs or edituserjs rights to be moved/edited/etc, and loading such pages via importScript() or the like would suddenly stop working.

The "wikitext" content model would indeed be unsupported on User:Foo/Bar.js, though it wouldn't break existing pages. Maybe they could even remain editable. But they should trigger a warning, and it should not be possible to create more pages like that.

It would mean that any such existing pages would suddenly become subject to needing the editmyuserjs or edituserjs rights.

If the answers are the same, then you can't determine whether a move or content model change should be allowed just on the basis of the page type. It also needs knowledge of the specific models involved.

Whether a content model change is ok would be decided by the page's PageTypeHandler, and that decision may indeed be based on knowledge about the current and the future model, and may involve asking the ContentHandlers about it. But it no longer has anything to do with page moves.

With the MCR schema, a page move *never* automatically triggers a content model change. A page move may cause the page *type* to change - to which both the old and the new PageTypeHandler would have to agree. But this has nothing to do with the content model.

You seem to be overlooking the "or content model change" part. That's referring to the operation currently performed by Special:ChangeContentModel or by specifying a different model to ApiEditPage. EditPage can do it too, I think, although there's currently no UI for that so some form fields would have to be added by a user script or something.

You also seem to be either overlooking the fact that the new title might not allow certain slots/models that are present at the old title, or else you're assuming that every page with a different set of slots or allowed models must have an arbitrarily different page type.

Moves may in the future change the page type

By that do you mean "until that maybe-future moves that would change the page type would be forbidden"?

No, page moves just no longer influence the content model. Because the default model is no longer used after page creation, since the MCR model always stores the content model for every revision.

Am I mistaken about this? Did I miss something, and page moves still do impact the model somehow?

You missed that we're talking about page types here, not content models.

A page's "type" is determined solely by its title, regardless of database state.

How would that handle

  • CSS/JS/JSON pages which have the wrong extension and/or namespace? (In an ideal world those would not exist, but they do, and are sometimes used heavily.) If the answer is "those would still handle editing/rendering according to their content type", that sounds like introducing a page type for the sake of having it, without it doing anything actually useful.
  • User subpages which can be used to simulate various namespaces like template or module?
  • Extensions like LiquidThreads, StructuredDiscussions and CollaborationKit which have a "convert this page" option, sometimes for the sake of fine-grained migration, sometimes because they are only meant to handle specific workflows / use cases (which can by no means be determined from title)?
  • Also things like PageForms or Translate which might or might not be applied to any given page depending on which workflow is more convenient to the user? (Maybe those should not be page types, but they do fit squarely into the "overriding editing behavior" bucket.)
  • Things integrating into the MediaWiki namespace to reduce cognitive load (they are rare, most users don't need to understand them and it makes clear their technical and potentially sensitive nature) despite being conceptually unrelated to messages, like blacklists, definition pages, campaigns, gadgets?
  • Fairly different uses of the File: namespace, depending on whether the file is image/video/audio/tabular data/etc. Usually the type can be determined from the extension but there are edge cases like .ogx.

I have changed the description to say that the page type would "ideally" only depend on the title, not db state, but it's still unclear whether this is feasible.

A page's "type" is determined solely by its title, regardless of database state. It's typically based on the namespace, but in some cases, a title suffix or prefix may also trigger a certain page type.

That's how content models used to be determined. Then you added the ability to have the content model be determined explicitly in the database rather than having to be determined based on the namespace+title.

Recording the content model in the database was part of the initial design; the "default" model was intended to be used during page creation. Nulling the field in the database was an optimization to save space, which caused trouble when pages were moved in a way that would change the model. The initial idea was to simply forbid that.

Why do the reasons for making content model able to be specified in the database not apply to page type? Or why was making content model able to be specified in the database a mistake?

Recording the model for each revision is a good idea (tm), because it is necessary to know the model in order to interpret the blob, to know how to render it. This needs to be preserved through import/export, deletion/undeletion, history merge, etc. So recording the model was not a mistake, making it nullable was.

Recording the page's content model was indeed a mistake, I think. And letting people choose the model freely when creating a page was never the goal. Maybe it would be nice to allow people to choose whether they want to use wikitext or markdown, but it would likely cause a lot of confusion. I don't see any good use. But you do need to record the revision's model, so you can e.g. handle syntax versions, or do something like "old talk pages are wikitext, new talk pages are whatever new thing".

What is the difference between "article" and "talk page"?

They behave differently during page moves, for instance. There are about 20 calls to Title::isTalkPage in core, each of them encoding some kind of special case treatment on the page level.

"script page", "system message", and "conversion table" seem to me like they should be content models for the main slot rather than being separate page types. Other than that, what would be the difference from "article"?

System messages should be several content models (wikitext, plain text, HTML, ...), and so should script pages (JS, CSS). Conversion tables should have a dedicated model with nice rendering and editing.

But the idea here is that page level behavior should not be controlled by content models. It's misplaced in the ContentHandler.

For instance, pages in the MediaWiki namespace trigger an update of the localization cache when edited. That has nothing to do with their content model (which, currently, is wikitext). We currently have this behavior hard-coded to be triggered by the namespace. I propose to formalize the concept of a page type to control this kind of behavior. There are quite a few things in the MediaWiki namespace that do not belong in the localization cache at all (we recently had trouble with global notice banners sitting there, but also conversion tables, JS and CSS for gadgets and site-wide scripts, etc).
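
As a rough illustration of that proposal, the following Python sketch keys post-edit updates on the page type instead of on a namespace check. Everything here is hypothetical: the `PageTypeHandler` base class, its `on_edit` hook, the `HANDLERS` registry, and the page type names are illustrative stand-ins, not existing MediaWiki APIs, and the "cache update" is only recorded as a string.

```python
from typing import Dict, List


class PageTypeHandler:
    """Hypothetical base class for page-level behavior, independent of content model."""

    def on_edit(self, title: str, log: List[str]) -> None:
        # By default, a page type needs no extra update after an edit.
        pass


class SystemMessageHandler(PageTypeHandler):
    def on_edit(self, title: str, log: List[str]) -> None:
        # Stand-in for refreshing the localization cache entry for this message.
        log.append(f"update localization cache: {title}")


# Hypothetical registry keyed by page type name. Script pages live in the
# MediaWiki namespace too, but their type does not touch the localization cache.
HANDLERS: Dict[str, PageTypeHandler] = {
    "article": PageTypeHandler(),
    "system-message": SystemMessageHandler(),
    "script": PageTypeHandler(),
}


def do_edit_updates(page_type: str, title: str) -> List[str]:
    """Run the page-type specific updates after an edit and return what was done."""
    log: List[str] = []
    HANDLERS[page_type].on_edit(title, log)
    return log
```

With this split, an edit to a system message triggers the cache update, while an edit to `MediaWiki:Common.js` (page type "script" in this sketch) does not, even though both sit in the same namespace.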

It might be possible to rework "file description page" as being a regular page with multiple slots (for the file and description), although that would probably be a fair bit of work.

Doubtful. While the file itself could be managed by a slot (probably as a file system path), and EXIF data could be in a "derived" slot (if we introduce that concept), you would still want to show file usage info (and upload history, though that would also be available through the general page history).

The idea here is to replace the subclasses of Article and WikiPage we have for File and Category pages with a better system (especially since both Article and WikiPage should go away in the not-too-far future).

I'd be wary about any type taking advantage of the "or may not show additional slots".

As long as such additional slots are exposed *somewhere* (e.g. via action=info), I think it would be ok. I'd expect most page types to show all slots, though (unless the slot itself says it wants to be hidden, of course).

  • editing mechanism. The PageTypeHandler may be aware of certain slots, and may provide an integrated editing experience for their content. It should provide some way to access editing interfaces for any additional slots.

Editing interfaces should probably be determined by the content model or slot rather than the page type.

Every slot should provide an editing mechanism, and most should provide an editing UI that can be used stand-alone. But a page type handler should be able to override this to provide a nicely integrated interface for atomic editing that does not consist of a concatenation of independent forms. This would provide a lot of flexibility for interactive editing workflows that span multiple slots. Writing such integrated editing UIs is quite a bit of work, but for high profile use cases, may well be worthwhile. Why would you want to prevent that?

  • which slots are allowed, required, and desired on pages of this type.

Why should a "template" or "scribunto module" page type have to know about a "documentation" or "templatestyles stylesheet" slot in order to declare it as allowed/desired/required?

It shouldn't. But TemplateData slots should be able to declare that they are allowed only on template pages (the type, not the namespace). Talk pages could declare that they don't support quality assessments, and file pages could declare that they require a media slot (when we start handling the actual media files in slots).

It seems to me that page types and allowed slots should be kept separate. An extension could still have the same level of control if it needs it.

I think in practice, I want page types to be able to require certain slots, and slot handlers to restrict the slot to certain types of pages (and not rely on the namespace for that).
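
A two-sided check of this kind could look roughly like the Python sketch below. The `PageType` and `SlotRole` classes, their fields, and `slot_problems` are hypothetical names chosen for illustration; the example declarations mirror the cases mentioned above (TemplateData only on template pages, no quality assessments on talk pages, a required media slot on file pages).

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Set


@dataclass(frozen=True)
class PageType:
    """Hypothetical page type declaration."""
    name: str
    required_slots: FrozenSet[str] = frozenset({"main"})
    unsupported_slots: FrozenSet[str] = frozenset()


@dataclass(frozen=True)
class SlotRole:
    """Hypothetical slot role declaration."""
    name: str
    # None means the slot role may be used on any page type.
    allowed_page_types: Optional[FrozenSet[str]] = None


def slot_problems(page_type: PageType, slots: Set[str],
                  roles: Dict[str, SlotRole]) -> List[str]:
    """List the reasons why this set of slots is not acceptable on the page type."""
    problems = [f"missing required slot '{s}'"
                for s in sorted(page_type.required_slots - slots)]
    for slot in sorted(slots):
        role = roles.get(slot)
        if slot in page_type.unsupported_slots:
            # The page type refuses the slot.
            problems.append(
                f"page type '{page_type.name}' does not support slot '{slot}'")
        elif (role is not None and role.allowed_page_types is not None
              and page_type.name not in role.allowed_page_types):
            # The slot role refuses the page type.
            problems.append(
                f"slot '{slot}' is not allowed on page type '{page_type.name}'")
    return problems


# Example declarations (all names are assumptions).
ROLES = {"templatedata": SlotRole("templatedata", frozenset({"template"}))}
TEMPLATE = PageType("template")
TALK = PageType("talk", unsupported_slots=frozenset({"quality-assessment"}))
FILE = PageType("file", required_slots=frozenset({"main", "media"}))
```

Note that the template page type never mentions the TemplateData slot; the restriction lives entirely on the slot role's side, which is the separation argued for above.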

  • Generic action overrides (to replace ContentHandler::getActionOverrides)

IMO we should first evaluate the use cases for action overrides and see whether the concept still makes sense. If so, then yes, PageTypeHandler seems a decent place for it.

We use them quite a bit, both in core and in extensions, so there seems to be a need for control over what "move" or "undo" or "purge" means for a given page.

  • behavior to be triggered upon creation, modification, and deletion, as currently encoded in WikiPage::doEditUpdates and WikiPage::doDeletionUpdates

If a page already has a page type, sure, but I'm skeptical of using this as a reason for creating a page type that would otherwise be identical to "article" or the like.

So you prefer we keep hard-coded special cases for namespaces and title suffixes scattered across the code base? Why? This makes things really hard to maintain. You seem to feel that the introduction of a page type is something expensive, while I think that it removes tangles from the code base.

  • Constraints on page moves. Since page moves change the title, they may change the page type. In such a case, the old and the new page type handler both have to agree that the move is possible. This replaces ContentHandler::canBeUsedOn.

If a move changes the page type, sure. But again this shouldn't be a reason for creating a page type that would otherwise be identical to "article" or the like.

Why not? And - what's the alternative? Hard coded special cases? Hooks? How are they better?

(tangent: Perhaps this comes down to a basic disagreement about "declarative overhead". To some, code that declares but does not do anything is overhead that should be minimized. To me, declarations provide guarantees that prevent bugs and provide freedom for modification. They convey intent and meaning to the runtime and to IDEs, rather than instructions - to me, that seems quite valuable. This may be worth exploring some more at some other venue).

If I'm understanding you correctly, this change would mean that such pages would suddenly no longer require the editmyuserjs or edituserjs rights to be moved/edited/etc, and loading such pages via importScript() or the like would suddenly stop working.

Yes, indeed. It should be obvious (at least to the adept) whether a certain page contains "active" script or not.

But I'm not saying this should happen "suddenly". We'd have to determine how disruptive this is, how pages can be migrated, whether we need some kind of compat mode, for what time, etc.

The "wikitext" content model would indeed be unsupported on User:Foo/Bar.js, though it wouldn't break existing pages. Maybe they could even remain editable. But they should trigger a warning, and it should not be possible to create more pages like that.

It would mean that any such existing pages would suddenly become subject to needing the editmyuserjs or edituserjs rights.

Indeed. To be honest, I cannot think of any good reason for such pages to exist. Is there any need for them?

But see above for "suddenly".

You seem to be overlooking the "or content model change" part. That's referring to the operation currently performed by Special:ChangeContentModel or by specifying a different model to ApiEditPage. EditPage can do it too, I think, although there's currently no UI for that so some form fields would have to be added by a user script or something.

Explicit content model changes would work exactly as before, except that the page type handler (and the slot role handler) could veto them. The main slot on a "normal" page could have pretty much any model. Most slots would be restrictive about which models they allow. Some page types (e.g. Wikibase Entity pages) would restrict the main slot to a specific model.

You also seem to be either overlooking the fact that the new title might not allow certain slots/models that are present at the old title, or else you're assuming that every page with a different set of slots or allowed models must have an arbitrarily different page type.

This is why both the old and the new page type handler have to approve of the rename. The new handler could veto a rename that would put content models in slots where they are not supported. We already do this kind of thing e.g. to prevent moves from or to Wikibase Entity namespaces, by overriding the move action - which is rather hacky.

I expect most page type handlers would be pretty lenient about what slots/models they accept, though.
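
The "both handlers must approve" rule for moves can be sketched in Python as follows. The class, its constructor parameters, and the method names are hypothetical; the sketch assumes that a move which keeps the page type needs no extra approval, that a restrictive handler (like the one for Wikibase Entity pages) refuses any change of type, and that the new handler may veto content models it does not support.

```python
from typing import Dict, FrozenSet, Optional


class PageTypeHandler:
    """Hypothetical handler with just enough state to decide on moves."""

    def __init__(self, name: str,
                 supported_models: Optional[FrozenSet[str]] = None,
                 allows_type_change: bool = True) -> None:
        self.name = name
        # None means lenient: any content model is accepted in any slot.
        self.supported_models = supported_models
        self.allows_type_change = allows_type_change

    def approves_move_out(self) -> bool:
        return self.allows_type_change

    def approves_move_in(self, slot_models: Dict[str, str]) -> bool:
        if not self.allows_type_change:
            return False
        if self.supported_models is None:
            return True
        # Veto if any slot holds a model this page type does not support.
        return all(m in self.supported_models for m in slot_models.values())


def can_move(old: PageTypeHandler, new: PageTypeHandler,
             slot_models: Dict[str, str]) -> bool:
    """A move that changes the page type needs approval from both handlers."""
    if old.name == new.name:
        return True  # the page type does not change
    return old.approves_move_out() and new.approves_move_in(slot_models)


ARTICLE = PageTypeHandler("article")
SCRIPT = PageTypeHandler("script", frozenset({"javascript", "css"}))
ENTITY = PageTypeHandler("entity", allows_type_change=False)
```

Here `can_move(ARTICLE, SCRIPT, {"main": "wikitext"})` is refused by the new handler, while the same move with a `javascript` main slot is allowed; any move between `ENTITY` and another type is refused. The content model itself is never changed by the move.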

You missed that we're talking about page types here, not content models.

We were talking about replacing ContentHandler::canBeUsedOn.

The "wikitext" content model would indeed be unsupported on User:Foo/Bar.js, though it wouldn't break existing pages. Maybe they could even remain editable. But they should trigger a warning, and it should not be possible to create more pages like that.

It would mean that any such existing pages would suddenly become subject to needing the editmyuserjs or edituserjs rights.

Indeed. To be honest, I cannot think of any good reason for such pages to exist. Is there any need for them?

But see above for "suddenly".

Correct me if I misunderstood, but does that mean that the English Wikipedia's article about Node.js couldn't be created with the wikitext content model under the suggested changes? Or that I could not start an article about a different JavaScript framework as a subpage of my user page, because creating *.js pages with the wikitext content model would be disallowed?

A page's "type" is determined solely by its title, regardless of database state.

How would that handle

  • CSS/JS/JSON pages which have the wrong extension and/or namespace? (In an ideal world those would not exist, but they do, and are sometimes used heavily.) If the answer is "those would still handle editing/rendering according to their content type", that sounds like introducing a page type for the sake of having it, without it doing anything actually useful.

They would render as before, and be edited as before. It would however not be possible to *use* them as JS or CSS pages.

The idea is to split content behavior (rendering, editing) from page behavior (layout, caching, purging, use of the content for some "magical" purpose).

  • User subpages which can be used to simulate various namespaces like template or module?

If that is to be supported by the page type mechanism, it would have to follow a clear naming convention that MediaWiki knows about.

  • Extensions like LiquidThreads, StructuredDiscussions and CollaborationKit which have a "convert this page" option, sometimes for the sake of fine-grained migration, sometimes because they are only meant to handle specific workflows / use cases (which can by no means be determined from title)?

In *theory*, this would simply mean using a different content model for the talk page, and keeping the same page type handler, because the page's behavior doesn't change, just the behavior of the page's content. In practice, some extensions, like Flow, have a need to override actions - which would become the domain of a page type handler.

If page type is determined by title, conversion to another type can only be done by a rename. Since talk pages are not free to change their name, this is problematic, unless we implement a more flexible notion of "associated namespaces".

If this cannot be overcome, this would mean that I'd have to let go of the idea of letting page type depend *only* on the title, and allow for it to be derived from the main slot's model (at least for B/C) or be stored in the database explicitly. I'm not totally opposed to that approach, but I'd really like to avoid it.

  • Also things like PageForms or Translate which might or might not be applied to any given page depending on which workflow is more convenient to the user? (Maybe those should not be page types, but they do fit squarely into the "overriding editing behavior" bucket.)

Not sure, to be honest. Could be done with page types, could be done with content models. Could perhaps even be done by using extra slots to cause a change in edit behavior. Maybe we need yet another way to override EditPage.

I'm thinking that Translate should probably be a page type and maybe also use a dedicated content model. The page type could be triggered by a title suffix, such as /fr, in certain namespaces. But I have not looked at this closely.

  • Things integrating into the MediaWiki namespace to reduce cognitive load (they are rare, most users don't need to understand them and it makes clear their technical and potentially sensitive nature) despite being conceptually unrelated to messages, like blacklists, definition pages, campaigns, gadgets?

I'd either move them to a different namespace, or give them a prefix or suffix to identify them. Doesn't have to happen right away - until it happens, they will be treated like interface messages, just like they are now.

  • Fairly different uses of the File: namespace, depending on whether the file is image/video/audio/tabular data/etc. Usually the type can be determined from the extension but there are edge cases like .ogx.

I don't see that the page behaves differently with respect to things like renaming. For now, the differences are covered by MediaHandler. In the future, they could be different content models (subclasses of a generic "media" model that records a file system path, or uses the file system directly for blob storage).

Correct me if I misunderstood, but that means that the article about Node.js of the English Wikipedia couldn't be created with the suggested changes with the wikitext content model?

No, the .js suffix would only trigger the special page type when used on a user sub-page or in the MediaWiki namespace.

Or that I could not start an article about a different JavaScript framework as a subpage of my user page, because creating *.js pages with the wikitext content model would be disallowed?

Yes, true: User:Agabi10/Node.js would be treated as an "active" page, and the "script" page type may not like wikitext there (it *could* allow wikitext as well, but that would be confusing and partially defeat the point). You would have to rename it to User:Agabi10/Node-js or User:Agabi10/Node.js.wikitext to avoid this. Do you think that would be terrible?

I just remembered that some people create pages like User:Foo/xzy.js with wikitext content, because these are "user protected" so only they themselves can edit them. That's quite a nasty hack that could easily be made obsolete by introducing another prefix or suffix like User:Foo/own/xzy or User:Foo/xzy.protected or something that would trigger the same "only this user can edit this" behavior.

I don't think that would be terrible, but I don't see any benefit in making the page type depend on the page title either. I'd rather have the page type defined in the database and allow setting it the same way content models can be set. At least from my understanding, page types mostly define how the information is displayed and which actions are available for that page. I don't see how making it depend on the title of the page is more intuitive than letting the user select the desired type from the list of available options on page creation.

Based on the namespace, there could be a list of allowed page types, and if more than one page type is configured for that namespace, the user could be asked on creation which one they want to create, with a short description of what each available type is used for.

The page type is intended to define what actions are available, and how they operate. The most visible of them are certainly view and edit, but things like "this page can be loaded like a module" or "this page cannot be moved into another namespace" or "this page cannot contain text in the main slot" are also quite relevant to the user.

The main reason I think it would be nice to tie this to the title is that people are used to the fact that pages in different namespaces behave differently. It would be a lot more surprising to find that some pages can be moved to the user namespace but others cannot, based on some flag set during page creation. Also, this way, no special operation or UI needs to be defined for "changing the page type". Though one could argue that tying that operation to page moves is a problem, not a feature.

Anyway, my intention was to formalize an existing concept: different behavior based on namespace and title suffix. We have a lot of that, hard coded in core and hooked in by extensions. I'd like to keep this idea the same as far as users are concerned, and make it clearer and more flexible in code.
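
To make the "namespace plus title suffix" idea a bit more concrete, here is a rough sketch of how a page type could be derived from the title alone. This is only an illustration under assumptions: `resolve_page_type`, the type labels, and the exact rule set are hypothetical and do not correspond to any existing MediaWiki API. It only models the cases mentioned in this thread (script suffixes counting only on user subpages and in the MediaWiki namespace).

```python
# Hypothetical sketch: none of these names exist in MediaWiki.
SCRIPT_SUFFIXES = (".js", ".css")


def resolve_page_type(title: str) -> str:
    """Derive a page type from the title alone, without any database lookup."""
    namespace, sep, text = title.partition(":")
    if not sep:
        # No namespace prefix: main namespace, so a suffix like ".js" is ignored.
        return "article"
    if namespace == "Talk" or namespace.endswith(" talk"):
        return "talk"
    if namespace == "MediaWiki":
        if text.endswith(SCRIPT_SUFFIXES):
            return "script"
        return "system-message"
    if namespace == "User" and "/" in text and text.endswith(SCRIPT_SUFFIXES):
        # Assumption: only user *subpages* with a script suffix count as scripts.
        return "script"
    if namespace == "Category":
        return "category"
    if namespace == "File":
        return "file"
    # Anything else falls back to the default type.
    return "article"


for example in ("Node.js", "User:Agabi10/Node.js", "User:Agabi10/Node-js"):
    print(example, "->", resolve_page_type(example))
```

With these assumed rules, the Node.js article in the main namespace stays an ordinary article, while User:Agabi10/Node.js is treated as a script page unless it is renamed to something like User:Agabi10/Node-js.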

daniel renamed this task from Introduce PageTypeHandler to Specify PageTypeHandler. Dec 10 2018, 3:48 PM

@Krinkle the topic of "page types" came up in a conversation we had a while ago, and you asked me if this was written down somewhere. Took me a while to find this old ticket, but here it is :)

I feel biased against this for the sole reason that having namespaces, and content models, and content formats, and revision slots, and slot roles is already far more abstraction than I feel is reasonably understandable or indeed helpful and necessary. I find it hard to motivate myself to entertain adding another concept, no matter the rationale or context, until we can first remove some of the existing ones.

You will be happy to know that content formats have been removed :) Any mention is merely backwards compatibility.

Also... slots and roles are very closely bound together, to the point of the terms being used interchangeably... which is unfortunate.

Which of the existing abstractions seem suitable to address the needs outlined in this task? In the past, there have been attempts to attach such behaviors to the content model of the main slot, or to namespaces, or to title suffixes. What would you suggest?

[…] What would you suggest?

I think it depends on what problem we're trying to solve. I recognise that some of the things in the task description can be interpreted and make a reasonable case in the abstract that a "page type" concept could be one way of thinking about some of these things. But that's imho insufficient by itself given we're talking about changing something in an existing platform, not creating something that is actually a new feature.

Once we have one or more concrete problem descriptions, we can think about whether and how to solve them, possibly not with a single high-level solution per se.

From the task description:

Ideally, a page's "type" is determined solely by its title, regardless of database state. It would typically be based on the namespace, but in some cases, a title suffix or prefix may also trigger a certain page type.

This is something I'm definitely in favour of. Especially as otherwise we get into scenarios that are imho incompatible with end-user expectations or behaviour that our current interface can model and explain. Eg. moving a category page elsewhere would not take the page type and that layout with it, both because it can't (it would no longer be connected to a category and thus have no members) but also because it would break the operational control of the feature (the page type must only be used in certain contexts, it's not merely a default for the category namespace, it must only ever be used there). This is notably different from content models, which are indeed reasonable to exist anywhere, with title namespace/suffix merely offering a default.
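
As a sketch of the move constraint described here, combined with the task description's requirement that the old and the new page type handler both agree to a move: the class, its `bound_to_title` flag, and the method names below are hypothetical, chosen only for illustration, and the rule that a category page can neither leave nor be entered from another type is an assumption taken from the example above.

```python
from dataclasses import dataclass


# Hypothetical model: not an existing MediaWiki class.
@dataclass(frozen=True)
class PageTypeHandler:
    name: str
    # Assumption: if True, the type is tied to its titles, so a move may
    # neither take a page out of this type nor bring a foreign page into it.
    bound_to_title: bool = False

    def allows_move_out(self, target: "PageTypeHandler") -> bool:
        return target.name == self.name or not self.bound_to_title

    def allows_move_in(self, source: "PageTypeHandler") -> bool:
        return source.name == self.name or not self.bound_to_title


def can_move(old: PageTypeHandler, new: PageTypeHandler) -> bool:
    """A move is possible only if both the old and the new handler agree."""
    return old.allows_move_out(new) and new.allows_move_in(old)


ARTICLE = PageTypeHandler("article")
CATEGORY = PageTypeHandler("category", bound_to_title=True)

print(can_move(ARTICLE, ARTICLE))    # rename within the same type
print(can_move(CATEGORY, ARTICLE))   # category page moved elsewhere
```

The point of asking both sides is that neither handler needs to know the other's rules; each only answers for its own type.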

Use cases:

  • pageview layout

This suggests to me that control over choosing the type would likely be the result of an (ideally cheap) runtime "filter" hook (ref T212482), based on a TitleValue. If this is only for pageview layout, though, then I'm not sure a new abstraction is warranted. Whether the hook is for a page type, connected to a subclass, where the logic lies, or whether the logic is elsewhere and explicitly invoked by the hook without a central registry and new set of names in the middle, is not particularly better or worse either way. My bias in that case tends to be for fewer moving parts and more direct connections between the two code paths instead of hiding it and making the connection less obvious.

  • action overrides
  • editing mechanism (action=edit)

It's not obvious to me that these generally wouldn't be associated with the content model. It seems a reasonable fit and seems like it would work for Wikibase, LQT, and Flow. An overview of exceptions would help here. Similar for the other listed use cases, examples of where this currently feels like a bad fit would help, as well as what problem or cost this is perceived as causing long-term.