
Consider whether we want to dump private and/or closed wikis
Closed, Resolved · Public

Description

On T346046: [Search Update Pipeline] Source streams for private wikis, there is a proposal to have separate streams for private wikis.

From the Dumps 2.0 point of view, we need to decide whether there is a need to dump these private wikis.

Some points:

A quick discussion with @Ottoandry and @Milimetric suggests we do not want to dump these wikis, but we should discuss more broadly.

Event Timeline

Hm. We may want to include them in the intermediate table in the Data Lake, but definitely not in any public dumps we might expose.

Maybe let's just not do it for now and pick them up later if we want to?

xcollazo renamed this task from "Consider whether we want to dump private wikis" to "Consider whether we want to dump private and/or closed wikis". Mar 17 2025, 7:20 PM
xcollazo claimed this task.

We have had the Data Lake table wmf_content.mediawiki_content_history for a while, and now we also have wmf_content.mediawiki_content_current.

There have been no requests to add data from closed and/or private wikis.

Being bold and closing this.