Help needed with encoding the statement value before passing it into formatter URLs for three SMILES-related properties
Closed, Resolved · Public

Description

The values of the Wikidata SMILES properties (canonical SMILES, isomeric SMILES, CXSMILES) all contain characters that need encoding before they are passed as $1 in the formatter URL (for 'canonical SMILES' it is https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=$1&zoom=2.0&annotate=none). Currently this causes broken links out to CDK Depict. Now, I have seen (I cannot remember for which property) a similar problem where the statement value needs preprocessing before it links out. I would like to set this up for the SMILES properties, but it is not entirely clear to me how to start this effort (besides finding which property was doing something like what I think may be the solution).

A simpler alternative would be for the 'formatter URL' approach to allow encoding the statement value before it is passed as $1.

Example CXSMILES where the simple 'formatter URL' does not work:

https://www.wikidata.org/wiki/Q46328873
statement value: CC([N+])C(=O)OC(CO)COP(=O)([O-])OCC(COC(=O)[*])OC(=O)[*] |$;;;;;;;;;;;;;;;;;;;;;_R1;;;;_R2$|

Example SMILES that does not work:

https://www.wikidata.org/wiki/Q133145
statement value: C#C
problem: the # messes up the simple $1 formatter URL
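
To make the failure mode concrete, here is a minimal Python sketch of how a standard URL parser splits the broken link. Python is only my choice for illustration; nothing here is how Wikidata itself builds the link. An unencoded # starts the URL fragment, so everything after it drops out of the query string.

```python
from urllib.parse import urlsplit

# Formatter URL for 'canonical SMILES' with $1 replaced verbatim by "C#C".
broken = "https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C#C&zoom=2.0&annotate=none"

parts = urlsplit(broken)
print(parts.query)     # smi=C
print(parts.fragment)  # C&zoom=2.0&annotate=none
```

The zoom and annotate parameters end up in the fragment as well, so they are lost along with the rest of the SMILES.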

BTW, another alternative is that the open source CDKDepict would just be part of the WMF portfolio stack and (CX)SMILES would be visualized inline. CDKDepict would take the SMILES statement value as input (properly encoded, of course) and return SVG, which could be embedded.

Event Timeline

Hi @EgonWillighagen, thanks for taking the time to report this. Is this a bug report or a feature request? If it's either of them, could you please use the corresponding template? Thanks! :)

Yeah, I wasn't really sure either. It's a bug in the sense that the current practical implementation doesn't work. But to some extent it was perhaps not designed to be pushed this far, which makes it more like a feature request. So my choice could be driven by how urgently I want it solved: urgently (I think it would help a great deal), so then a bug. But I also know how hard the team is already working.

But yeah, if I report it as a bug, it's still up to the team to decide how to prioritize it.

Now, the problem is that the bug template is specifically for software bugs. But this is a software/data interaction bug... so, still use the "software bug" template?

the # messes up the simple $1 formatter URL

This is a URL encoding problem then. Do you have a link where this is actually occurring?

Property: https://www.wikidata.org/wiki/Property:P233

Correct url: https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=CC([N+])C(=O)OC(CO)COP(=O)([O-])OCC(COC(=O)[R1])OC(=O)[R2]&zoom=2.0&annotate=none from https://www.wikidata.org/wiki/Q46328873

Broken url: https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C#C&zoom=2.0&annotate=none from https://www.wikidata.org/wiki/Q133145. It should be https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C%23C&zoom=2.0&annotate=none
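
A minimal sketch of the expected behaviour, assuming the statement value is percent-encoded before it replaces $1. The helper name expand_formatter is hypothetical and only for illustration; it is not an existing Wikibase function.

```python
from urllib.parse import quote

FORMATTER = "https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=$1&zoom=2.0&annotate=none"

def expand_formatter(formatter: str, value: str) -> str:
    """Substitute $1 with the percent-encoded statement value (hypothetical helper)."""
    # safe="" so that reserved characters such as '#', '+', '=' and '/' are all escaped.
    return formatter.replace("$1", quote(value, safe=""))

acetylene_url = expand_formatter(FORMATTER, "C#C")
print(acetylene_url)
# https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C%23C&zoom=2.0&annotate=none
```

The same call also escapes the spaces, pipes and dollar signs that occur in CXSMILES values.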

Doesn't have anything to do with our math projects. Recat'ing under wikidata

@TheDJ, that Math-Chemistry-Support is not (also) about chemistry?

This is essentially: T160281

yes, same issue, but maybe not the same solution.

@TheDJ, that Math-Chemistry-Support is not (also) about chemistry?

Math-Chemistry-Support is a project specifically about defining these symbols using our Math/LaTeX wikicode extension.

Ah, got it. Yeah, theoretically possible, as there is a LaTeX package for drawing chemical structures, but I'm not aware of a really good, open source tool to convert SMILES (and its variants) into TeX. Yes, agreed it doesn't fit there.

This is a URL encoding problem then. Do you have a link where this is actually occurring?

I will write up some examples later today using the "bug" template, to highlight some issues.

List of steps to reproduce (step by step, including full links if applicable):

What happens?:
CDKDepict, which the canonical SMILES links to, shows methane

image.png (42×88 px, 770 B)

instead of hydrogen cyanide:

image.png (42×88 px, 712 B)

What should have happened instead?:
The canonical SMILES should be URL encoded before being added as $1 in the formatter URL, giving https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C%23N
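
Why methane? A small Python sketch of what the depiction service would receive in both cases, assuming it decodes the query string in the usual way. The helper smi_seen_by_server is hypothetical and only mimics standard query parsing; it is not CDK Depict code.

```python
from urllib.parse import parse_qs, quote, urlsplit

BASE = "https://www.simolecule.com/cdkdepict/depict/bow/svg?smi="
smiles = "C#N"  # hydrogen cyanide

def smi_seen_by_server(url: str) -> str:
    """Return the 'smi' query parameter as a standard parser would decode it."""
    return parse_qs(urlsplit(url).query)["smi"][0]

raw_url = BASE + smiles                      # value pasted in verbatim
encoded_url = BASE + quote(smiles, safe="")  # value percent-encoded first

print(smi_seen_by_server(raw_url))      # C   (a lone carbon, drawn as methane)
print(encoded_url)                      # https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C%23N
print(smi_seen_by_server(encoded_url))  # C#N
```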

I will write up some examples later today using the "bug" template, to highlight some issues.

One done: https://phabricator.wikimedia.org/T307662#7911276

Thanks for the ping! That page was indeed the lead I had at the time and the reason to file this issue, because I could not work out (in the time I had) how to update that.

But the solution turned out to be a lot easier for Wikidata: https://www.wikidata.org/w/index.php?title=MediaWiki%3AGadget-AuthorityControl.js&type=revision&diff=1694196586&oldid=1409657932
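
I am not reproducing the gadget's JavaScript here. The Python sketch below only illustrates the general idea of escaping the value for selected properties before it is substituted into the formatter URL. The function name, the property set and the pass-through for other properties are my assumptions, not the gadget's actual code.

```python
from urllib.parse import quote

# Assumption: the set of properties whose values get percent-encoded.
# P233 (canonical SMILES) is the property linked earlier in this task;
# the other SMILES properties would be added here as well.
ENCODED_PROPERTIES = {"P233"}

def build_link(property_id: str, value: str, formatter: str) -> str:
    """Expand $1 in the formatter URL, escaping the value for selected properties."""
    if property_id in ENCODED_PROPERTIES:
        value = quote(value, safe="")
    return formatter.replace("$1", value)

formatter = "https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=$1&zoom=2.0&annotate=none"
print(build_link("P233", "C#C", formatter))
# https://www.simolecule.com/cdkdepict/depict/bow/svg?smi=C%23C&zoom=2.0&annotate=none
```

Limiting the escaping to a known set of properties leaves links for all other identifiers unchanged.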

This was done by Nikky only last weekend (https://chem-bla-ics.blogspot.com/2022/08/wikidata-now-escapes-smiles-and-cxsmiles.html) and since I had to focus on student report grading, I forgot to update this ticket.

matej_suchanek assigned this task to Nikki.