
Save articles linked in an article, too
Closed, Duplicate
Public

Description

When you save an article, it would be great to save the articles linked from it, too.

Event Timeline

Florian raised the priority of this task to Low.
Florian updated the task description.
Florian added subscribers: Krenair, Deskana, Aklapper, Florian.
Deskana renamed this task from "Save articles linked in an article, to" to "Save articles linked in an article, too". May 20 2015, 5:34 AM
Deskana set Security to None.

I think this is done in ~150 lines in the Crawler sample of the OkHttp library. However, that sample uses a library called Jsoup for anchor selection, and we would probably want to limit the crawl depth.
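To make the depth limit concrete, here is a minimal sketch of a breadth-first walk that stops after a fixed number of hops. It is not taken from the OkHttp sample or from the app: `LinkedArticleCollector`, `collect`, and the `linksOf` callback are hypothetical names, and `linksOf` stands in for "fetch the page and extract its anchors", which in practice would be the network and parsing code.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of the app or of OkHttp. */
final class LinkedArticleCollector {
    private LinkedArticleCollector() { }

    /**
     * Collects startTitle plus every title reachable within maxDepth link hops.
     * linksOf is an assumed callback that returns the titles linked from a page.
     */
    static Set<String> collect(String startTitle, int maxDepth,
                               Function<String, List<String>> linksOf) {
        // LinkedHashSet keeps discovery order and prevents revisiting a page.
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> current = new ArrayDeque<>();
        seen.add(startTitle);
        current.add(startTitle);

        // Each loop iteration expands exactly one level of links.
        for (int depth = 0; depth < maxDepth && !current.isEmpty(); depth++) {
            Deque<String> next = new ArrayDeque<>();
            for (String title : current) {
                for (String linked : linksOf.apply(title)) {
                    if (seen.add(linked)) {
                        next.add(linked);
                    }
                }
            }
            current = next;
        }
        return seen;
    }
}
```

With `maxDepth` set to 1 this saves the article and only its direct links, which is probably the sensible default here; anything deeper grows very quickly on a heavily linked wiki.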

We kind of have the framework for this functionality now. From my comment in T108706:

This should be a lot easier to fix with the new saved page architecture. Just add another parser for anchors, like ImageTagParser, filter out the ones you want, and wire it into PageImageUrlParser.

PageImageUrlParser should probably be renamed or split up though.
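
A rough sketch of what the "parse anchors, then filter" step could look like is below. This is not the app's ImageTagParser API, which I have not reproduced here: `AnchorHrefParser`, `parseHrefs`, and `filterArticleLinks` are hypothetical names, the regex is a simplification that a real HTML parser would handle more robustly, and the filter assumes article links look like `/wiki/Title` with namespaced pages (such as `File:`) containing a colon.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical anchor parser for illustration; names and rules are assumptions. */
final class AnchorHrefParser {
    // Matches the quoted href value of an <a ...> tag. Requiring whitespace
    // before "href" avoids matching attributes such as data-href.
    private static final Pattern ANCHOR_HREF = Pattern.compile(
            "<a\\b[^>]*?\\shref\\s*=\\s*([\"'])(.*?)\\1", Pattern.CASE_INSENSITIVE);

    // Assumed path prefix for article links.
    private static final String ARTICLE_PREFIX = "/wiki/";

    private AnchorHrefParser() { }

    /** Returns every href value found in anchor tags, in document order. */
    static List<String> parseHrefs(String html) {
        List<String> hrefs = new ArrayList<>();
        Matcher matcher = ANCHOR_HREF.matcher(html);
        while (matcher.find()) {
            hrefs.add(matcher.group(2));
        }
        return hrefs;
    }

    /** Keeps main-namespace article links and returns their (still URL-encoded) titles. */
    static List<String> filterArticleLinks(List<String> hrefs) {
        List<String> titles = new ArrayList<>();
        for (String href : hrefs) {
            if (!href.startsWith(ARTICLE_PREFIX)) {
                continue; // external link or non-article path
            }
            String title = href.substring(ARTICLE_PREFIX.length());
            int fragment = title.indexOf('#');
            if (fragment >= 0) {
                title = title.substring(0, fragment); // drop section anchors
            }
            // A colon is treated as a namespace marker (File:, Special:, ...).
            if (!title.isEmpty() && !title.contains(":") && !titles.contains(title)) {
                titles.add(title);
            }
        }
        return titles;
    }
}
```

The output of `filterArticleLinks(parseHrefs(html))` is the list that would then be handed to whatever replaces or extends PageImageUrlParser. The colon check is deliberately crude, since some article titles legitimately contain colons, so the real filter would want a proper namespace lookup.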