
Create design for Pywikibot class mappings for Flow objects
Closed, ResolvedPublic

Assigned To
Authored By
happy5214
May 12 2015, 2:31 AM
Referenced Files
F173219: Phase 1 diagram.png
Jun 2 2015, 7:34 PM
F167782: Pywikibot-Flow class diagram.v4.png
May 22 2015, 9:30 AM
F166294: Pywikibot-Flow class diagram.v3.png
May 19 2015, 8:01 AM
F165531: Pywikibot-Flow class diagram.v2.png
May 17 2015, 8:48 AM
F164178: Pywikibot-Flow class diagram.v1.png
May 13 2015, 2:23 AM
F164177: Design.xmi
May 13 2015, 2:23 AM

Description

The first step in creating a Pywikibot mapping to Flow objects is recognizing what needs to be done and devising a plan. A design must be created to represent the hierarchy of the new Pywikibot classes in relation to the existing code and the relations between these new classes and the Flow API and internal architectural design. This design will take the form of a chart or diagram, perhaps using UML, with the following rules:

  1. There will be classes representing Flow boards and topics, which will both subclass the existing Page class.
  2. There will be a class representing posts to Flow topics, which will not subclass the Page class.
  3. Boards will have descriptions (internally known as "headers"), while each topic will have a title and summary.
  4. The diagram will show the relationships between operations on the Pywikibot objects and Flow API calls where they exist.
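As a rough illustration of rules 1 to 3, the hierarchy could be sketched as the skeleton below. This is only a sketch, not the eventual design: `Page` here is a minimal stand-in stub rather than Pywikibot's real class, and the attribute names `header`, `topic_title`, `summary`, and `content` are assumptions made for the example.

```python
class Page:
    """Stand-in stub for Pywikibot's existing Page class (not the real one)."""

    def __init__(self, site, title):
        self.site = site
        self._title = title

    def title(self):
        # Page title on the wiki.
        return self._title


class Board(Page):
    """A Flow board; its description is known internally as the "header"."""

    def __init__(self, site, title, header=""):
        super().__init__(site, title)
        self.header = header  # assumed attribute name


class Topic(Page):
    """A Flow topic, which has a title and a summary."""

    def __init__(self, site, title, topic_title, summary=""):
        super().__init__(site, title)
        self.topic_title = topic_title  # assumed attribute name
        self.summary = summary          # assumed attribute name


class Post:
    """A post to a Flow topic; deliberately not a Page subclass."""

    def __init__(self, topic, content):
        self.topic = topic
        self.content = content
```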

This task is due by the start of coding.

Event Timeline

happy5214 claimed this task.
happy5214 raised the priority of this task from to Medium.
happy5214 updated the task description.
happy5214 added a project: Pywikibot-Flow.
happy5214 added subscribers: Aklapper, happy5214, Legoktm and 13 others.

I have created an early first draft of a design. The method names in this version reflect their respective API calls, and these will probably not be their final names. I must stress that this design will certainly not be the design actually used in the implementation, and that this is just a first step. Much work remains in creating a more natural and Pythonic interface for these API calls and ensuring the use of appropriate data types for fields. I just wanted everyone to see what I've accomplished so far.

XMI format:

PNG format:

Pywikibot-Flow class diagram.v1.png (416×381 px, 11 KB)

PS I know Topic.title should have unicode as its type, but my UML editor has already switched to Python 3 types and I've been too lazy to change them. I also will add the header/description and topic summary fields in the next revision.

I just wanted everyone to see what I've accomplished so far.

Neat! From a quick look, I think the undo_* functions aren't needed, the bot would just want to fetch the previous content and then update it.

PS I know Topic.title should have unicode as its type, but my UML editor has already switched to Python 3 types and I've been too lazy to change them. I also will add the header/description and topic summary fields in the next revision.

Python 3 is preferred by some of us ;-)

Much work remains in creating a more natural and Pythonic interface for these API calls and ensuring the use of appropriate data types for fields.

I agree, this is an area that will need refinement. I need to find the current Pywikibot API documentation so I can see what the current style they use is.

Also, a board can have 0 or more topics (it can exist with only a header), but a topic indeed will always have at least one post.

Also, you provide no way to go from the topic to the posts, or from posts to parents/replies (latter may not be needed).

Description (header) and summary may deserve to be their own entities, particularly because they are versioned.

Is any class actually a subclass of Page?

Is any class actually a subclass of Page?

Category, FilePage, and User are Page subclasses. The Wikibase page classes inherit WikibasePage, which in turn inherits BasePage. Looking at it, BasePage might be the better option as a superclass for a FlowPage superclass. I'll have to look into it more.

Version 2 is up, and a version 3 should be posted by Monday night/Tuesday morning.

Pywikibot-Flow class diagram.v2.png (466×807 px, 21 KB)

I created a new superclass for revision-type objects (TopicSummary, Header, and Post), which I named FlowRevision. It probably isn't necessary due to Python's duck typing, and I'll probably remove it in v3. I also changed method names to remove redundant wording. I redid the inheritance for Topic and Board, making them subclass FlowPage, which inherits from BasePage instead of Page.
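A minimal sketch of what the v2 inheritance and the duck-typing remark amount to, under stated assumptions: `BasePage` is a stub standing in for the Pywikibot class, the `get()` method on the revision-type objects is assumed for illustration, and `current_contents` is a hypothetical helper. The point is that code reading these objects works without any shared FlowRevision superclass.

```python
class BasePage:
    """Stub standing in for Pywikibot's BasePage (not the real class)."""

    def __init__(self, site, title):
        self.site = site
        self._title = title


class FlowPage(BasePage):
    """Common superclass for Flow boards and topics."""


class Board(FlowPage):
    pass


class Topic(FlowPage):
    pass


# Revision-type objects with no common superclass. Each one simply
# offers the same (assumed) get() method.
class Header:
    def __init__(self, text):
        self._text = text

    def get(self):
        return self._text


class TopicSummary:
    def __init__(self, text):
        self._text = text

    def get(self):
        return self._text


class Post:
    def __init__(self, text):
        self._text = text

    def get(self):
        return self._text


def current_contents(objects):
    """Hypothetical helper: collect contents from anything with get()."""
    return [obj.get() for obj in objects]
```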

This is still very much a WIP. I will still have to research how API calls are made in Pywikibot. It looks like the site objects handle that.

Is topic locking implemented using page protection? Or ... Is page protection used anywhere in the Flow design?

Version 2 is up, and a version 3 should be posted by Monday night/Tuesday morning.

Pywikibot-Flow class diagram.v2.png (466×807 px, 21 KB)

I created a new superclass for revision-type objects (TopicSummary, Header, and Post), which I named FlowRevision.

Note we have a Revision class for Page revisions.

It probably isn't necessary due to Python's duck typing, and I'll probably remove it in v3. I also changed method names to remove redundant wording. I redid the inheritance for Topic and Board, making them subclass FlowPage, which inherits from BasePage instead of Page.

Looks good as a foundation.

I will still have to research how API calls are made in Pywikibot. It looks like the site objects handle that.

Yes. Page objects are the API for the script writer, and Site objects contain Python-ised wrappers for each endpoint in the MediaWiki API.

Is topic locking implemented using page protection?

No. Lock is a separate moderation state implemented by Flow.

Or ... Is page protection used anywhere in the Flow design?

We respect protection for the topic and board pages. If a board page is protected, you can't post on the page or in any topic within it. If a topic page is protected, you can't post on that topic.

@Mattflaschen How well do you think 'ordinary', core revisions represented by the Revision class match up with the subclasses of FlowRevision as I've described them? Thinking about it, I can't really find a direct use for FlowRevision, and TopicSummary, etc. will probably have multiple Revision members, one for each revision.

Version 3 is up.

Pywikibot-Flow class diagram.v3.png (506×847 px, 27 KB)

I ditched the old FlowRevision superclass in favor of another class of the same name, representing individual Flow revisions.

Each attribute with plural name represents a generator method returning the implied objects.

Topic.title is a property returning the contents of the title post, or Topic.root.get(). .get() on the Header, Post, and TopicSummary objects returns the current contents of the object as a Unicode string. .get() on Board and Topic is not yet defined. Come to think of it, do topics and boards even have their own set of revisions?

I plan on making at most two more revisions before my self-imposed deadline for this task, which is Saturday night/Sunday morning. Comments are needed.

Regarding 'unicode', it is OK the way it is; however, it is important to note that we always use unicode (Python 2)/str (Python 3) for wiki content, for obvious reasons. We tend to refer to the Python 2 'str' as bytes, as that is the Python 3 name for the equivalent datatype.

Page has a 'text' property, which is the wiki content.

I think Topic.title is problematic, as str.title() and BasePage.title() exist as methods and do different things (already confusing enough), so adding an attribute (not a method) will make this situation even more confusing, especially as BasePage.title() refers to a public unique identifier of the instance (the page title on the wiki).

Which classes have a unique public identifier? Where will this be stored, and how will it be accessed? WikibasePage stores its unique public identifier in an attribute called id, and has a method getID to obtain that identifier.

@Mattflaschen How well do you think 'ordinary', core revisions represented by the Revision class match up with the subclasses of FlowRevision as I've described them? Thinking about it, I can't really find a direct use for FlowRevision, and TopicSummary, etc. will probably have multiple Revision members, one for each revision.

If Revision is the Pywikibot class for core revisions, you will probably not be able to use it at all. Although Flow has full revisioning, and from a user point of view it's similar, it's implemented totally differently (e.g. the revision IDs are UUIDs).

Topic.title is a property returning the contents of the title post, or Topic.root.get(). .get() on the Header, Post, and TopicSummary objects returns the current contents of the object as a Unicode string. .get() on Board and Topic is not yet defined. Come to think of it, do topics and boards even have their own set of revisions?

Internally, a topic is represented as a topic title post. That is versioned, can be moderated, and has replies (the topic's top-level posts). Boards are not versioned; the board history just shows history of objects that belong to the board.

Which classes have a unique public identifier?

Everything (board, topic, post, revision) is identified by a public UUID.

Version 4 is now available for hopefully prompt review:

Pywikibot-Flow class diagram.v4.png (520×947 px, 31 KB)

I included UUID attributes and accessor methods for FlowPage (inherited by Board and Topic), Post, and FlowRevision. I struggled with how to handle the topic title API calls, first creating a Post subclass for topic roots before just deciding to create methods on Topic to handle editing topic titles. I would have just used the edit process for Post objects (modify the text attribute and call save()), but I figured there's a reason why topic title edits are handled separately in the Flow API.

Topic has two additional convenience methods: get_replies(), which returns root.replies; and get_topic_title(), which returns root.get(). I would not be surprised if all of the "attributes" end up as Python property-style function sets or something similar. getUUID() could just be the getter of a UUID property. The revisions "attribute" on Header, Post, and TopicSummary will probably be implemented by generator methods.
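A small sketch of how the property-style accessors and the two convenience methods described above might look. The constructor signatures and the private attribute names are assumptions for illustration, and `replies` is written as a generator-backed property only to show one possible shape.

```python
class Post:
    def __init__(self, uuid, text, replies=()):
        self._uuid = uuid
        self._text = text
        self._replies = list(replies)

    @property
    def uuid(self):
        # Getter of a UUID property, in place of a getUUID() method.
        return self._uuid

    def get(self):
        # Current contents of the post.
        return self._text

    @property
    def replies(self):
        # Generator over direct replies; a real implementation would
        # presumably load these from the API lazily.
        for reply in self._replies:
            yield reply


class Topic:
    def __init__(self, uuid, root):
        self._uuid = uuid
        self.root = root  # the topic title post

    @property
    def uuid(self):
        return self._uuid

    def get_replies(self):
        # Convenience method: the topic's top-level posts.
        return self.root.replies

    def get_topic_title(self):
        # Convenience method: contents of the title post.
        return self.root.get()
```

With these definitions, `Topic("t", Post("r", "Title")).get_topic_title()` returns the contents of the root post.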

Finally, a question. Are all Flow workflows represented by pages and, if so, could FlowPage be renamed to Workflow?

It looks good. However, part of the task is to show which Flow API calls will be used.

One of the issues we'll have to deal with (I don't remember if we discussed this before; if not, my bad) is format. Basically:

  • Topic titles are always plaintext (no HTML or wiki markup allowed)
  • Anything else can be in one of three formats, "html" (standard Parsoid), "wikitext", or "fixed-html" (only for views)

Not sure how to represent that; maybe a Content object with wikitext and HTML, that sends the correct type for POSTs, and (optionally) could convert behind the scene for GETs (using flow-parsoid-utils).
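One possible shape for such a Content object, as a sketch only. The class, its method names, and the `format`/`content` keys are assumptions made for this example; the conversion step is passed in as a plain callable because the exact interface of flow-parsoid-utils is not described here.

```python
class Content:
    """Holds content in wikitext and/or HTML (hypothetical design)."""

    def __init__(self, wikitext=None, html=None, converter=None):
        if wikitext is None and html is None:
            raise ValueError("either wikitext or html is required")
        self.wikitext = wikitext
        self.html = html
        # Assumed callable: converter(text, from_format, to_format) -> text
        self._converter = converter

    def post_params(self):
        """Return the format and content to send for a POST."""
        if self.wikitext is not None:
            return {"format": "wikitext", "content": self.wikitext}
        return {"format": "html", "content": self.html}

    def as_wikitext(self):
        """Return wikitext, converting from HTML if necessary."""
        if self.wikitext is not None:
            return self.wikitext
        if self._converter is None:
            raise ValueError("no converter available for html -> wikitext")
        return self._converter(self.html, "html", "wikitext")
```

Topic titles, being plaintext only, would not need this wrapper.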

I like the idea of using properties for uuid, get_replies, etc. (if a field doesn't suffice)

Topic title should probably work the same as posts (with save() method). The API difference can be behind the scenes.

Pywikibot itself internally uses wikitext almost entirely. Now I don't know what Flow internally uses, but if we are going to support HTML, it should send the HTML to the server, which parses it back into wikitext. Otherwise I don't think it's useful to handle HTML. And as wikitext is probably the basic format, HTML doesn't really provide advantages that would justify supporting it. And Pywikibot has support for any API calls, so if someone needs HTML for some reason they can build the queries themselves.

Pywikibot should be able to interface with wikitext-only Flow instances.

Pywikibot itself internally uses wikitext almost entirely. Now I don't know what Flow internally uses, but when we are going to support HTML it should send the HTML to the server which parses it back into wikitext.

Flow internally uses Parsoid HTML on the WMF setup (which we also use locally); the version storing wikitext internally is not actively used by developers (though Jenkins uses it). Sending HTML to the server is fine for the Parsoid/store HTML setup. Sending either wikitext or HTML to the server is supported when Flow is using Parsoid.

Pywikibot should be able to interface with wikitext-only Flow instances.

Maybe, if we don't drop Parsoid support. However, I don't consider this a hard requirement for the summer GSOC.

In T98819#1314112, @Mattflaschen wrote:

Pywikibot should be able to interface with wikitext-only Flow instances.

Maybe, if we don't drop Parsoid support.

Why do you keep insisting?

I don't consider this a hard requirement for the summer GSOC.

If the mapping is well designed, it shouldn't be a problem to support wikitext, now or later.

Given that Pywikibot is heavily designed around using wikitext, and all existing 'add notice to talk page' code in scripts uses wikitext, I think this project has a higher chance of avoiding bikeshedding and -2's in code review if it focuses on implementing (only) wikitext support first.

We know what properties and methods look like for wikitext, and how they should behave in corner cases. Also, script writers are mostly going to prefer to interact with wikitext, as that is what they are most comfortable with. We need to make their transition from Discussion wikitext pages to 'Flow things' as simple as possible.

The design should still encompass HTML support, and should put an implementation for it up for review. Semantics around two different/alternative content formats is going to be one of the interesting problems to solve.

In T98819#1314112, @Mattflaschen wrote:

Pywikibot should be able to interface with wikitext-only Flow instances.

Maybe, if we don't drop Parsoid support. However, I don't consider this a hard requirement for the summer GSOC.

Is there a phab task about dropping Parsoid support?

Is there a phab task about dropping Parsoid support?

Sorry for the typo. I meant to write "drop non-Parsoid support". Parsoid is currently the best-supported Flow setup.

It's T88908: Drop support for non-Parsoid configuration. That task resolution is not really final. It only reflects the short-term view.

Thanks for clarifying.

Recapping: in our last meeting we agreed that the Flow support in Pywikibot would focus on implementing wikitext, with HTML support as a nice-to-have that can be a subsequent patch.

I went for a phase-based approach to the design this time. This is the design for the first phase of coding:

Phase 1 diagram.png (402×1 px, 36 KB)

It includes provisions for handling content format, and focuses on loading boards, topics, and posts; creating new topics; and replying to existing posts. It also includes API calls to be added in this phase. I hope it is obvious which calls will be made by which methods. More phase-based designs will follow.

An amendment to the above Phase 1 design is to add a method to APISite, named new_topic, to use the new-topic Flow API submodule to create new topics. This method will return a dict and have four parameters:

  • page (Board): The board this topic will belong to.
  • title (unicode): The topic title.
  • content (unicode): The contents of the first post.
  • format (unicode; either 'wikitext' or 'html'): The content format of the initial post. Defaults to 'wikitext'.

It will be called by Board.new_topic() to create a new topic on that board. That method will then create a new Topic object using the information returned by the API call and then return the object to the calling code. A Post object will be returned by Post.reply() in a similar fashion.
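The call flow described above could look roughly like the following sketch. `FakeAPISite` is a stand-in that makes no network request, and the keys of the returned dict (`uuid`, `title`) are invented for the example; the actual shape of the new-topic API response is not specified here.

```python
class FakeAPISite:
    """Stand-in for APISite; records calls instead of hitting the API."""

    def __init__(self):
        self.calls = []

    def new_topic(self, page, title, content, format="wikitext"):
        if format not in ("wikitext", "html"):
            raise ValueError("format must be 'wikitext' or 'html'")
        self.calls.append((page, title, content, format))
        # Invented response shape, for illustration only.
        return {"uuid": "fake-uuid-%d" % len(self.calls), "title": title}


class Topic:
    def __init__(self, board, uuid, title):
        self.board = board
        self.uuid = uuid
        self.topic_title = title


class Board:
    def __init__(self, site, name):
        self.site = site
        self.name = name

    def new_topic(self, title, content, format="wikitext"):
        # Delegate the API call to the site, then wrap the result.
        data = self.site.new_topic(self, title, content, format)
        return Topic(self, data["uuid"], data["title"])


site = FakeAPISite()
board = Board(site, "Talk:Sandbox")
topic = board.new_topic("Hello", "First post in ''wikitext''.")
```

Post.reply() would follow the same pattern, returning a Post built from the API response.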

Another amendment is that all **kwargs parameters have been removed. They will be added back on a case-by-case basis if needed.