
WS Export: Missing subpages (quote in page name?)
Closed, Resolved · Public · BUG REPORT

Description

The following book is not exporting its subpages: https://en.wikisource.org/wiki/The_Shoemaker%27s_Apron.

It does have a table with ws-summary set.

This is what was produced:

I seem to recall a bug before with a quote in the name (was it T275870?) - is this a recurrence?

Event Timeline

Restricted Application added a subscriber: Aklapper.

It's a bit strange: it looks like this source HTML:

<td colspan="2">
    <div style="position:relative; width:100%;">
        <div style="text-align:left; text-indent:-1.5em; margin-left:1.5em;">
            <div class="toc-line-entry-text wst-toc-dot-bg" style="display:inline; position:relative; text-align:left; padding:0.0em 0.5em 0.0em 0.0em; z-index:2;">
                <span style="font-size: 83%;">
                    <a href="/wiki/The_Shoemaker%27s_Apron/The_Twelve_Months" title="The Shoemaker's Apron/The Twelve Months">
                        <span class="smallcaps" style="font-variant:small-caps">The Twelve Months</span>
                    </a>: The Story of Marushka and the Wicked Holena
                </span>
            </div>
        </div>
        <div class="ws-noexport wst-toc-dot-bg" style="position:absolute; left:0px; bottom:0px; width:1.5em; height:1.00em; z-index:1;"></div><div class="ws-noexport" style="position:absolute; right:0px; bottom:0px; width:100%; overflow:hidden; white-space:nowrap; text-align:right; z-index:0;"><div style="display:inline; float:right;">.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .</div>
        </div>
    </div>
</td>

is ending up in the epub as this:

<td colspan="2">
    <div style="position:relative; width:100%;">
        <div style="text-align:left; text-indent:-1.5em; margin-left:1.5em;">
            <div class="toc-line-entry-text wst-toc-dot-bg" style="display:inline; position:relative; text-align:left; padding:0.0em 0.5em 0.0em 0.0em; z-index:2;"> <span style="font-size: 83%;"> : The Story of Marushka and the Wicked Holena</span></div></div>
    </div>
</td>

i.e. without the link to the subpage.

But it's weirder: when I do the export locally, it works fine! I'm not sure what's going on.

Actually, for some reason, the HTML that we're getting says that the subpages don't exist! That's why the links are getting stripped out. I think there's some sort of caching going on somewhere that's not being cleared properly, and that this HTML is just stale. It's not what one gets with curl https://en.wikisource.org/api/rest_v1/page/html/The_Shoemaker's_Apron, but it is what \App\Util\Api::getAsync() is returning from that same URL:

<span style="font-size: 83%;">
  <a href="/w/index.php?title=The_Shoemaker%27s_Apron/The_Twelve_Months&amp;action=edit&amp;redlink=1" class="new" title="The Shoemaker's Apron/The Twelve Months (page does not exist)">
    <span class="smallcaps" style="font-variant:small-caps">The Twelve Months</span>
  </a>: The Story of Marushka and the Wicked Holena
</span>
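
For anyone wanting to check whether a fetched copy of the page HTML is stale in this way, here is a minimal sketch in Python. It is not the exporter's own code. It assumes that a link to a missing page can be recognised the way it appears in the snippet above, by class="new" or by redlink=1 in the href. The names RedlinkFinder and find_redlinks are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlparse


class RedlinkFinder(HTMLParser):
    """Collect the targets of links marked as pointing to missing pages."""

    def __init__(self):
        super().__init__()
        self.redlinks = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href") or ""
        classes = (attr.get("class") or "").split()
        # Attribute values arrive already unescaped, so "&amp;" is "&" here.
        query = parse_qs(urlparse(href).query)
        # Assumption: a redlink carries class="new" or redlink=1 in its href.
        if "new" in classes or query.get("redlink") == ["1"]:
            # Prefer the decoded title parameter, fall back to the raw href.
            self.redlinks.append(query.get("title", [href])[0])


def find_redlinks(html):
    """Return the titles of all redlinks found in an HTML fragment."""
    finder = RedlinkFinder()
    finder.feed(html)
    return finder.redlinks


if __name__ == "__main__":
    stale = (
        '<a href="/w/index.php?title=The_Shoemaker%27s_Apron/The_Twelve_Months'
        '&amp;action=edit&amp;redlink=1" class="new">The Twelve Months</a>'
    )
    print(find_redlinks(stale))
```

Running this over both the curl output and the HTML the tool actually received should show whether only the latter contains redlinks for subpages that do exist.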

It looks like this was a Parsoid caching issue, as the above work now exports in full without obvious errors.

@Inductiveload can you confirm?

Inductiveload claimed this task.

Yes, that's working.

Is this something we should worry about, or is it just an occasional transient thing that will always resolve itself?

I've seen it a few times, but never been able to track down what's causing it. So I guess we just ignore it. :)