random-pdf-rendering-es.wikipedia.org

Authored By
Jgiannelos
Nov 6 2020, 2:09 PM
Size
2 KB

In [10]: import requests
In [11]: import io
In [12]: from pdfminer.high_level import extract_text
In [13]: valid = 0
    ...: invalid = 0
    ...: for i in range(20):
    ...:     # Pick a random article title from Spanish Wikipedia.
    ...:     random = requests.get('https://es.wikipedia.org/api/rest_v1/page/random/title')
    ...:     title = random.json()['items'][0]['title']
    ...:     print('Random page title: {}'.format(title))
    ...:     # Note: the title comes from es.wikipedia.org, but the PDF is
    ...:     # requested from en.wikipedia.org.
    ...:     pdf = requests.get('https://en.wikipedia.org/api/rest_v1/page/pdf/{}'.format(title))
    ...:     file = io.BytesIO(pdf.content)
    ...:     try:
    ...:         # Count the response as a valid PDF if pdfminer can parse it.
    ...:         extract_text(file)
    ...:         valid += 1
    ...:         print('Valid pdf: {}'.format(title))
    ...:     except Exception:
    ...:         invalid += 1
    ...:         print('Invalid pdf: {}'.format(title))
    ...:
Random page title: John_Maynard_Smith
Valid pdf: John_Maynard_Smith
Random page title: Ruta_Estatal_de_California_4
Invalid pdf: Ruta_Estatal_de_California_4
Random page title: Eduardo_Caballero_Calderón
Valid pdf: Eduardo_Caballero_Calderón
Random page title: Wavignies
Valid pdf: Wavignies
Random page title: Aristarj_Lentúlov
Invalid pdf: Aristarj_Lentúlov
Random page title: Consejo_Internacional_para_la_Ciencia
Invalid pdf: Consejo_Internacional_para_la_Ciencia
Random page title: Mantra
Invalid pdf: Mantra
Random page title: Saxifragaceae
Valid pdf: Saxifragaceae
Random page title: Carretera_Panamericana
Valid pdf: Carretera_Panamericana
Random page title: General_Conesa_(Buenos_Aires)
Invalid pdf: General_Conesa_(Buenos_Aires)
Random page title: Cazuela_(comida)
Invalid pdf: Cazuela_(comida)
Random page title: Los_Cerralbos
Valid pdf: Los_Cerralbos
Random page title: Puerta_de_Brandeburgo
Invalid pdf: Puerta_de_Brandeburgo
Random page title: Glos-la-Ferrière
Valid pdf: Glos-la-Ferrière
Random page title: Glucólisis
Invalid pdf: Glucólisis
Random page title: Joan_Crawford
Invalid pdf: Joan_Crawford
Random page title: Festhalle
Valid pdf: Festhalle
Random page title: Divisoria_de_aguas
Invalid pdf: Divisoria_de_aguas
Random page title: Museo_Arqueológico_Nacional_de_Tarragona
Invalid pdf: Museo_Arqueológico_Nacional_de_Tarragona
Random page title: Arenal_(río)
Invalid pdf: Arenal_(río)
In [14]: valid
Out[14]: 8
In [15]: invalid
Out[15]: 12
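
The loop above counts any response that pdfminer cannot parse as invalid, so an HTTP error page and a genuinely broken PDF look the same. The following is a minimal sketch of how the two could be told apart, assuming the PDF is requested from the same host the random title came from. The helper names `pdf_url` and `classify_response` are hypothetical and not part of the original session, and the paste itself does not establish why the 12 responses failed.

```python
from urllib.parse import quote

# Every PDF file starts with this marker.
PDF_MAGIC = b"%PDF-"


def pdf_url(title, host="es.wikipedia.org"):
    """Build the REST PDF URL for a title (hypothetical helper).

    The host defaults to es.wikipedia.org, an assumption made so that the
    PDF is requested from the wiki the random title was drawn from.
    """
    # Percent-encode everything that is not URL-safe, including "(" and ")".
    return "https://{}/api/rest_v1/page/pdf/{}".format(host, quote(title, safe=""))


def classify_response(status_code, content_type, body):
    """Classify an HTTP response without parsing the PDF (hypothetical helper)."""
    if status_code != 200:
        return "http-error"   # e.g. the page does not exist on this host
    if not content_type.lower().startswith("application/pdf"):
        return "not-pdf"      # some other payload, such as a JSON error
    if not body.startswith(PDF_MAGIC):
        return "bad-header"   # labelled as PDF but missing the PDF marker
    return "pdf"              # plausible PDF; a parser can still reject it
```

With requests, this could be used as `r = requests.get(pdf_url(title))` followed by `classify_response(r.status_code, r.headers.get('Content-Type', ''), r.content)`, passing only the responses classified as `"pdf"` on to `extract_text`.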
