S9a8m (Samuel Clark)
User

Projects

User does not belong to any projects.

User Details

User Since
Oct 26 2018, 8:42 AM (43 w, 1 h)
Availability
Available
LDAP User
Unknown
MediaWiki User
S9a8m [ Global Accounts ]

Recent Activity

Oct 27 2018

S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Oct 27 2018, 1:28 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

The final database!!

Oct 27 2018, 12:51 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

This isn't even my final form...

Oct 27 2018, 12:38 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Oct 27 2018, 12:30 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Oct 27 2018, 12:23 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Oct 27 2018, 12:14 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Oct 27 2018, 12:14 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

And finally, one that actually works:

Oct 27 2018, 11:51 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

So the last one still had horrible bugs, this one is better(-ish) I promise:

Oct 27 2018, 11:42 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Updated code without the horrible horrible bugs below:

Oct 27 2018, 11:31 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

I'll run 0-500

Oct 27 2018, 10:49 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Using the code below, running in blocks of 500. Make sure to change the destination path for the CSV file.
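
The snippet this comment refers to is not preserved in this feed. As a rough illustration of what "running in blocks of 500" could look like, here is a minimal sketch. The function name `id_blocks`, the half-open range convention, and the per-block file name are all assumptions, not the original code.

```python
def id_blocks(start, stop, size=500):
    """Yield (first, last) ID pairs covering start..stop-1 in chunks of `size`.

    Hypothetical helper: the original block boundaries are not known.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for first in range(start, stop, size):
        # The last block may be shorter than `size`.
        yield first, min(first + size, stop) - 1


if __name__ == "__main__":
    for first, last in id_blocks(0, 1200):
        # Assumed naming scheme: one CSV per block, so a failed run
        # only loses a single block of results.
        destination = f"chemicals_{first}_{last}.csv"
        print(first, last, destination)
```

Writing one file per block keeps each run independent, so several people can take different blocks and the results can be merged afterwards.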

Oct 27 2018, 10:47 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

The current program runs like so:

Oct 27 2018, 8:39 AM · Wikistorm-2019, Wikidata, patch-welcome

Oct 26 2018

S9a8m renamed T208036: Scrape chemical names and data for WikiData from chemical databases from Scrape chemical names and data for WikiData from ChemSpider to Scrape chemical names and data for WikiData from chemical databases.
Oct 26 2018, 2:45 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

The scraper now works well. However, extracting the relative molecular mass is difficult; it is easier to build a function that calculates it ourselves, using the data found here: https://www.science.co.il/elements/?s=Weight
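
A function along those lines might look like the sketch below. It assumes a flat molecular formula with no brackets or charges (for example `C6H12O6`), and the small `ATOMIC_WEIGHTS` table is an illustrative subset with approximate values rather than the data from the linked page. The names `ATOMIC_WEIGHTS` and `molecular_mass` are hypothetical.

```python
import re

# Approximate standard atomic weights for a few common elements.
# Illustrative subset only; a full table would be needed in practice.
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
    "Na": 22.990, "P": 30.974, "S": 32.06, "Cl": 35.45,
}

# One element symbol (capital letter plus optional lowercase letter)
# followed by an optional count.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def molecular_mass(formula):
    """Sum atomic weights for a flat formula such as 'H2O' or 'C6H12O6'."""
    if not formula:
        raise ValueError("empty formula")
    total = 0.0
    pos = 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"cannot parse {formula!r} at position {pos}")
        symbol, count = match.groups()
        if symbol not in ATOMIC_WEIGHTS:
            raise ValueError(f"unknown element symbol {symbol!r}")
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
        pos = match.end()
    return total


if __name__ == "__main__":
    for formula in ("H2O", "C6H12O6", "NaCl"):
        print(formula, round(molecular_mass(formula), 3))
```

Formulas with parentheses, hydrates, or isotopes would need a fuller parser; this version rejects anything it cannot read instead of guessing.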

Oct 26 2018, 1:29 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

The code below scrapes chemical names from PubChem really nicely; I will try to get more data out. Could someone make a CSV writer module?
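
For the CSV writer request, a minimal sketch using Python's standard `csv` module is shown here. The column names `cid` and `name` and the function `write_rows` are assumptions for illustration; the actual fields collected by the scraper are not recorded in this feed.

```python
import csv
import io

# Assumed column layout; adjust to whatever the scraper really collects.
FIELDS = ["cid", "name"]


def write_rows(handle, rows, fields=FIELDS):
    """Write dict rows to an open text file object as CSV with a header."""
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)


if __name__ == "__main__":
    # Demonstration with an in-memory buffer and made-up rows. For a real
    # file use: open(path, "w", newline="", encoding="utf-8")
    buffer = io.StringIO()
    write_rows(buffer, [
        {"cid": 1, "name": "example compound"},
        {"cid": 2, "name": "name, with a comma"},
    ])
    print(buffer.getvalue())
```

Taking a file object rather than a path keeps the writer easy to test and leaves the choice of destination to the caller. The `csv` module quotes fields that contain commas, which matters for chemical names.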

Oct 26 2018, 12:52 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Okay, we can now print names from ChemSpider, which seems to slow down after 11 entries. Will try to apply a similar approach to PubChem.

Oct 26 2018, 12:47 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Currently working in the main hacking room, at the small table close to the door

Oct 26 2018, 12:28 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

An alternative place to get data would be https://www.accessdata.fda.gov/scripts/cder/daf/index.cfm?event=overview.process&ApplNo={X} (again {X} being an integer) which lists FDA-approved medicinal compounds

Oct 26 2018, 12:27 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Any help would be much appreciated. I have no experience scraping with Python, so I am likely to be a bit slow.

Oct 26 2018, 12:26 PM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a comment to T208036: Scrape chemical names and data for WikiData from chemical databases.

Studying the HTML file of a typical page (http://www.chemspider.com/Chemical-Structure.175.html) shows that the tag <h1 class="h4"> is only used once, in close proximity to the chemical name. Need to learn how to open HTML and extract characters following this tag. Will approach with Python.
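
One way to pull out the text inside that tag, sketched with only the standard library's `html.parser`. The sample HTML string is made up for the demonstration and the real page markup may differ; `HeadingExtractor` and `extract_name` are hypothetical names. Downloading the page is left out here.

```python
from html.parser import HTMLParser


class HeadingExtractor(HTMLParser):
    """Collects the text inside the first <h1 class="h4"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False   # currently between <h1 class="h4"> and </h1>
        self._done = False     # first matching heading already captured
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self._done:
            classes = (dict(attrs).get("class") or "").split()
            if "h4" in classes:
                self._inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        # Text in nested tags (e.g. <span>) is collected as well.
        if self._inside:
            self._parts.append(data)

    @property
    def name(self):
        # Collapse runs of whitespace; None if no heading was found.
        text = " ".join("".join(self._parts).split())
        return text or None


def extract_name(html):
    parser = HeadingExtractor()
    parser.feed(html)
    parser.close()
    return parser.name


if __name__ == "__main__":
    # Made-up sample markup, not copied from the real page.
    sample = '<html><body><h1 class="h4"> Example <span>compound</span></h1></body></html>'
    print(extract_name(sample))
```

A parser is more robust than searching for the characters after the tag by hand, since it copes with nested tags and extra whitespace.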

Oct 26 2018, 11:56 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a project to T208036: Scrape chemical names and data for WikiData from chemical databases: Wikidata.
Oct 26 2018, 11:53 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m created T208036: Scrape chemical names and data for WikiData from chemical databases.
Oct 26 2018, 11:53 AM · Wikistorm-2019, Wikidata, patch-welcome
S9a8m added a watcher for Wikistorm-2018: S9a8m.
Oct 26 2018, 11:37 AM