Notes from meeting with @MattFlaschen, @Ebernhardson, and @MatthiasMullie (originally at https://etherpad.wikimedia.org/p/flow-db-dumps)
Do we need this, or is it redundant to all history?
This is more denormalized data that will exist within the full revision dump, but it could still be useful to spell out explicitly.
* For this, we discussed having a list of revisions inside each object (this part needs a little more discussion).
What about moderating and unmoderating topics?
Creates a revision of the topic title.
How do we represent topics that were part of a board at one time but not later, e.g. moved between boards, or added to additional boards (topic in multiple boards simultaneously)?
Haven't decided how to represent that in the db.
Board could just be a query
Or it could be a human-curated list of topics.
Since neither move nor "topic in multiple boards" is implemented yet, the "all history" version can include the end-state list of topics (which
should also be the super-set); to see when they were created/deleted/etc., you just check the history of the topic title.
Attaching and detaching from a board (those two actions should be able to represent moving as well as simultaneous attachment)
could be represented as part of the history of the topic title.
This should allow us to just add it to the <topic> part of the "all history" later when it's implemented.
Do we have to worry about orphan topics, or should we forbid that?
However, we definitely need to revisit this before adding support for either of those features.
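As a rough illustration of the attach/detach idea above, here is a minimal Python sketch that replays such events from a topic title's history to work out which boards the topic was on at a given time. Nothing here reflects a decided format: the `BoardEvent` record, its field names, and the board ids are all hypothetical, and timestamps are assumed to be ISO 8601 strings so that string order matches time order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoardEvent:
    """One hypothetical attach/detach entry in a topic title's history."""
    timestamp: str  # ISO 8601 string, so string order matches time order
    action: str     # "attach" or "detach"
    board_id: str


def boards_at(events, when):
    """Return the set of board ids the topic was attached to at time `when`."""
    boards = set()
    # sorted() is stable, so events sharing a timestamp keep their given order.
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.timestamp > when:
            break
        if event.action == "attach":
            boards.add(event.board_id)
        elif event.action == "detach":
            boards.discard(event.board_id)
        else:
            raise ValueError(f"unknown action: {event.action!r}")
    return boards


def is_orphan(events, when):
    """A topic attached to no board at `when` would be an orphan."""
    return not boards_at(events, when)


# A move from boardA to boardB is a detach plus an attach;
# the later attach to boardC is a simultaneous second attachment.
history = [
    BoardEvent("2014-01-01T00:00:00Z", "attach", "boardA"),
    BoardEvent("2014-02-01T00:00:00Z", "detach", "boardA"),
    BoardEvent("2014-02-01T00:00:00Z", "attach", "boardB"),
    BoardEvent("2014-03-01T00:00:00Z", "attach", "boardC"),
]
```

With this shape, a move and a simultaneous attachment need no separate event types, and the orphan question becomes whether a detach that empties the set should be rejected.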
Should the same mechanism be used for deleted topics, or should that just show the moderation on the topic title?
* Don't need to represent full history of moderation, but should only include visible items and show current moderation/lock status.
* Same format as the document topic XML
<board id="sasd..." title="...">
<topic id="sasd3423" />
<topic id="sasd3424" />
</board>
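A minimal sketch of how a board listing like the one above might be generated while including only visible items. It assumes each topic is available as a dict with hypothetical `id` and `visible` keys; how visibility is actually determined was not settled in the meeting, and the board id and title used here are placeholders.

```python
import xml.etree.ElementTree as ET


def board_element(board_id, title, topics):
    """Build a <board> element that lists only currently visible topics.

    `topics` is a list of dicts with hypothetical keys "id" and "visible".
    """
    board = ET.Element("board", {"id": board_id, "title": title})
    for topic in topics:
        if topic["visible"]:
            ET.SubElement(board, "topic", {"id": topic["id"]})
    return board


topics = [
    {"id": "sasd3423", "visible": True},
    {"id": "sasd3424", "visible": True},
    {"id": "sasd3425", "visible": False},  # not visible, so left out entirely
]
xml_text = ET.tostring(
    board_element("boardA", "Example board", topics), encoding="unicode"
)
```

Because topics are referenced by id only, the same `<topic id="..."/>` reference could appear under more than one `<board>`.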
<topic id="sasd3423" dumpVersion="1">
<!-- summary also contains categories -->
<post id="sasd3423" timestamp="..." user="..." lastEditUser="" lastModerationUser="">
<p data-parsoid="..."></p> <!-- Should we inline this as XML, or as CDATA? Parsoid guarantees HTML, but not any version, nor XML. Maybe XHTML? Or just CDATA. -->
<status isHidden="1" user="" reason="" />
<moderatedPost isSuppressed="1" moderatingUser="" title="Something"> <- as long as we properly document all nodes, I'm happy with either (<moderatedPost/> or <post><moderation></post>)
Suppression shouldn't reveal anything at all, so it just vanishes from the dump.
If you suppress a board, it should suppress all the topics, so they should also not be in dumps.
If you suppress a topic (directly or indirectly), all posts should be suppressed.
Need to look into what happens to post visibility if you replied to a moderated post before it was moderated (i.e. I reply, then the parent post
is later hidden, deleted, or suppressed); we think it varies between types, but needs to be checked.
<moderatedTopic isDeleted="1" moderatingUser="" title="Topic title">
By doing it this way, we can allow topics to be part of multiple boards.
data-parsoid and data-mw will probably be stripped (data-parsoid already has been). But they both might be useful to reusers. Add them back in?
How to handle moderated posts? Idea should be that current version exposes all data that an anonymous user would see (for example, you can still see the title of a deleted post)
It's public who created a deleted topic, and posted to it, but not clear if it should be in the current version.
moderatedPost/moderatedTopic vs. status/moderated sub-element. Not sure; there are advantages to both.
status might be better given that some moderations like lock might be fairly common.