
display a list of backed up etherpads somewhere
Closed, Resolved · Public · Feature

Description

Why

  • useful
  • since the pads were gathered from public places such as Phab tickets, mailing lists, and wiki pages, they are already public, so we probably don't need to obscure them

What

Notes

Details

Related Changes in GitLab:
Title | Reference | Author | Source Branch | Dest Branch
Draw the rest of the owl | toolforge-repos/etherpad-backup!1 | bd808 | work/bd808/draw-the-owl | main

Event Timeline

Tkarcher claimed this task.
Tkarcher added subscribers: bd808, Tkarcher.

Done: I generated a pad_data.json file from the directory listing and created an index.html file that presents the data in a searchable table. Both are in the public_html folder and accessible at https://etherpad-backup.toolforge.org/.

This backup is meant to be mostly static and will most likely not be updated after the new Etherpad instance goes live (there will be a new solution based on S3 storage), so I didn't implement automatic updates. After more pads are added to the backup, the data has to be refreshed manually:

  • output the current directory listing to a text file (ls -l --time-style=full-iso >listing.txt)
  • convert this text file to JSON with a couple of manual search & replace actions
  • replace pad_data.json
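The manual search & replace step could be scripted. Below is a minimal sketch, assuming each backup is a regular file whose name is the pad title (the optional `suffix` parameter is hypothetical, for the case where files carry an extension) and that pad_data.json is a top-level array of objects like the ones shown further down in this task. It parses GNU `ls -l --time-style=full-iso` lines (permissions, links, owner, group, size, date, time, UTC offset, name) and drops the fractional seconds.

```python
import json


def parse_listing(text, suffix=""):
    """Turn `ls -l --time-style=full-iso` output into a list of pad records."""
    pads = []
    for line in text.splitlines():
        # Skip the "total N" header, directories, symlinks and blank lines.
        if not line.startswith("-"):
            continue
        # perms, links, owner, group, size, date, time, offset, name;
        # maxsplit=8 keeps file names that contain spaces intact.
        fields = line.split(maxsplit=8)
        if len(fields) < 9:
            continue
        size, date, clock, name = fields[4], fields[5], fields[6], fields[8]
        pads.append({
            "size": int(size),
            # "13:43:16.123456789" -> "13:43:16"
            "last_modified": f"{date} {clock.split('.')[0]}",
            # Hypothetical: strip a file extension if the backups have one.
            "title": name.removesuffix(suffix) if suffix else name,
        })
    return pads


def convert(listing_path, json_path, suffix=""):
    """Read listing.txt and write pad_data.json (paths are up to the caller)."""
    with open(listing_path, encoding="utf-8") as f:
        pads = parse_listing(f.read(), suffix)
    with open(json_path, "w", encoding="utf-8") as f:
        json.dump(pads, f, ensure_ascii=False)
```

Usage would be `convert("listing.txt", "pad_data.json")`. If index.html expects a wrapper object (for example a `data` property) rather than a bare array, the `json.dump` call would need to be adjusted accordingly.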

Feel free to reopen the task in case I missed anything.

(Pinging @bd808 to let him know about the update)

Thanks. A slightly different solution will be needed when we get the object storage backend working, but this seems like a great interim fix. @Novem_Linguae I will count on you to poke us if we forget to do something to support your use case in the next generation system.

I just tweaked the deployed solution so that it no longer loads JS and CSS from 3rd-party servers. The DataTables files were not available on the cdnjs mirror that Toolforge hosts, so we are now serving them directly from the tool.

I was hoping that the next generation solution would be lighter on the client by using server-side paging. Unfortunately, client-side pagination of a big JSON list still looks like the most reasonable option: the object storage API supports pagination, but it has no ordering or server-side search support. I should be able to have the app cache a JSON version of the upstream bucket contents so that each landing page view does not trigger multiple object storage API calls to build the listing.
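One way to picture that cache is a time-based wrapper around the paginated listing call. This is only a sketch: `list_page(marker)` is a hypothetical stand-in for whatever the real storage client provides, assumed to return `(objects, next_marker)` with `next_marker` set to `None` on the last page, and each object is assumed to be a dict with a `title` key. Because the API cannot order results, the sketch sorts once after fetching all pages.

```python
import json
import time


class ListingCache:
    """Cache a JSON listing of a bucket so page views don't each hit the API."""

    def __init__(self, list_page, ttl_seconds=300, clock=time.monotonic):
        # list_page is hypothetical: list_page(marker) -> (objects, next_marker)
        self._list_page = list_page
        self._ttl = ttl_seconds
        self._clock = clock
        self._json = None
        self._fetched_at = None

    def _fetch_all(self):
        objects, marker = [], None
        while True:
            page, marker = self._list_page(marker)
            objects.extend(page)
            if marker is None:  # last page reached
                break
        # The API has no ordering support, so sort once here on the server.
        objects.sort(key=lambda o: o["title"])
        return json.dumps(objects)

    def get_json(self):
        """Return the cached listing, refetching every page once the TTL expires."""
        now = self._clock()
        if self._json is None or now - self._fetched_at >= self._ttl:
            self._json = self._fetch_all()
            self._fetched_at = now
        return self._json
```

With this shape, a landing page view costs a full walk of the bucket at most once per `ttl_seconds`; every other request reuses the stored string. The sketch is not thread-safe, so a multi-worker web app would need a lock or a shared cache.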

Thanks for improving the handling of the external resources. Regarding the size of the JSON list: this was a quick solution without optimization in mind. If the size becomes a problem in the future, we can easily cut it roughly in half by switching from JSON objects to JSON arrays (see https://datatables.net/manual/data/#Data-source-types) and reducing the date precision:

Current solution (752KB):

{"size": 6237, "last_modified": "2026-03-06 13:43:16", "title": ","},
{"size": 5091, "last_modified": "2026-02-14 09:24:35", "title": "-1-to-100-Team"},
{"size": 647, "last_modified": "2026-02-12 20:26:59", "title": "-lpi-9dwqfFbYfOt-pde"},

Array format with reduced date precision (383KB):

[6237,"2026-03-06",","],
[5091,"2026-02-14","-1-to-100-Team"],
[647,"2026-02-12","-lpi-9dwqfFbYfOt-pde"],
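The conversion between the two formats above could look like the following sketch, assuming the object rows have exactly the `size`, `last_modified` and `title` keys shown. Keeping only the first 10 characters of `last_modified` leaves the `YYYY-MM-DD` part, and `separators=(",", ":")` drops the spaces that `json.dumps` inserts by default.

```python
import json


def to_compact_rows(pads):
    """Convert object rows to [size, date, title] arrays, keeping only the date."""
    return [[p["size"], p["last_modified"][:10], p["title"]] for p in pads]


def compact_json(pads):
    # No whitespace after separators; keep non-ASCII titles readable.
    return json.dumps(to_compact_rows(pads), separators=(",", ":"), ensure_ascii=False)
```

With array rows, DataTables maps columns by position instead of by property name, so the column definitions in index.html would presumably need to refer to indexes 0, 1 and 2.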

And if that's still not enough, we could also split the JSON file by decade and let the user pick a period first: pads backed up before 2027, between 2027 and 2037, and so on.
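A per-decade split could be sketched as below. It assumes the compact `[size, date, title]` rows from the array format, reads "between 2027 and 2037" as the half-open range 2027 to 2036 (so 2037 starts the next file), and uses a hypothetical `pad_data_<period>.json` naming scheme. The landing page would also need some small index of the available periods, which is not shown here.

```python
import json
from collections import defaultdict

FIRST_SPLIT_YEAR = 2027  # boundary taken from the proposal above
PERIOD_YEARS = 10


def period_label(date_str):
    """Map 'YYYY-MM-DD' to a label such as 'before-2027' or '2027-2036'."""
    year = int(date_str[:4])
    if year < FIRST_SPLIT_YEAR:
        return f"before-{FIRST_SPLIT_YEAR}"
    offset = (year - FIRST_SPLIT_YEAR) // PERIOD_YEARS * PERIOD_YEARS
    start = FIRST_SPLIT_YEAR + offset
    return f"{start}-{start + PERIOD_YEARS - 1}"


def split_by_period(rows):
    """Group [size, date, title] rows by period label, preserving row order."""
    groups = defaultdict(list)
    for row in rows:
        groups[period_label(row[1])].append(row)
    return dict(groups)


def write_period_files(rows, prefix="pad_data"):
    # Hypothetical file names: pad_data_before-2027.json, pad_data_2027-2036.json, ...
    for label, group in split_by_period(rows).items():
        with open(f"{prefix}_{label}.json", "w", encoding="utf-8") as f:
            json.dump(group, f, separators=(",", ":"), ensure_ascii=False)
```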