Wikipedia:WikiProject Chemistry/IRC discussions/29 Jan 2008

--- Log opened Tue Jan 29 11:04:02 EST 2008

11:04 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has joined #wikichem

11:04 <+Rifleman_82> All Rise

11:04 <+Rifleman_82> :)

11:04 <ChemSpiderMan> hi

11:04 -!- mode/#wikichem [+o Rifleman_82] by ChanServ

11:05 -!- mode/#wikichem [+oo ChemSpiderMan walkerma] by Rifleman_82

11:05 <@walkerma> Hi, sorry I'm a couple of minutes late, we had some freezing rain

11:05 -!- mode/#wikichem [-o Rifleman_82] by Rifleman_82

11:05 <+Rifleman_82> i think today there'll be less people present

11:05 <+Rifleman_82> walkerma: would you like to begin?

11:05 <+Rifleman_82> dmacks is probably away, but he's logging

11:05 <@walkerma> Should we try and get PC?

11:06 <+Rifleman_82> he's not on irc

11:07 <@walkerma> I just sent him a quick email, hopefully he'll come. He & I talked a bit with DMacks last week on IRC informally

11:07 <@walkerma> OK, let's start?

11:07 <+Rifleman_82> ok

11:07 <@ChemSpiderMan> ok

11:07 <@walkerma> "How can we handle structural identifiers such as InChIs and SMILES properly? These are designed for machine-reading, but people may often use our visible info to "copy and paste" into a search engine."

11:08 <@walkerma> Hopefully you had a chance to look over the survey I did

11:08 <+Rifleman_82> results are quite disappointing

11:08 <@walkerma> In what way?

11:09 <+Rifleman_82> finding a primary key is tough, the standards (SMILES & InChI) aren't really that standard

11:09 <@walkerma> Oh, yes!

11:09 <+Rifleman_82> i agree with the point too that we can't really hope to replace the CRC handbook

11:09 <+Rifleman_82> we can cover all the important compounds, but not the more obscure

11:09 <+Rifleman_82> which i guess, is fair enough in that they probably aren't all that encyclopedic

11:10 <@ChemSpiderMan> there are a lot on WP that are FAR from encyclopedic...

11:10 <@ChemSpiderMan> in fact some I can ONLY find on WP

11:10 <@walkerma> I don't know that CRC has too many obscure ones

11:10 <@ChemSpiderMan> it doesn't

11:10 <+Rifleman_82> http://en.wikipedia.org/wiki/2-Phenylhexane

11:10 <@walkerma> But we sometimes have "hot" new compounds

11:11 <@walkerma> Phenylhexane not being an example of that!

11:11 <+Rifleman_82> i think the best we can hope for is for us to complement, not replace CRC

11:11 <@walkerma> That's fine!

11:11 <+Rifleman_82> a quick ref, instead of digging out a hefty tome

11:11 <+Rifleman_82> walkerma: elaborate on phenylhexane?

11:11 <+Rifleman_82> a quick, *reliable* ref

11:12 <@walkerma> In case people don't know, it was nominated for deletion. I tried to find some uses for it, but couldn't

11:12 <+Rifleman_82> it seems a common catalysis target

11:12 <@walkerma> I only found lots of studies on FC alkylation of benzene, reporting ratios

11:12 <+Rifleman_82> i fired up scifinder and copied a few refs for the catalysts... that's all

11:13 <@walkerma> But I don't think it has any major applications

11:13 <+Rifleman_82> there are more, but i plucked the low-lying fruit, those where the catalysts are stated instead of being listed as C:asdlfkajsd;

11:13 <@ChemSpiderMan> That's true for a lot of the PiHKAL collection I think

11:13 <+Rifleman_82> what about pihkal?

11:13 <@walkerma> I suspect it was someone testing out

11:14 <@walkerma> how to write a compound article

11:14 <+Rifleman_82> hexylbenzene?

11:14 <+Rifleman_82> erm, phenylhexane?

11:14 <@ChemSpiderMan> there are dozens of compounds from PiHKAL...it's a series of compounds all from the same publication but do they all need to be in there? What value?

11:15 <+Rifleman_82> abused substances/

11:15 <+Rifleman_82> ?

11:15 <@walkerma> Because things from PiHKAL are inherently fascinating to a certain group of people

11:16 <@walkerma> Anyway, I think I'd like to get back to InChIs etc, is that OK?

11:16 <@ChemSpiderMan> sure

11:16 <@walkerma> My impression from the responses was that:

11:16 <@walkerma> (a) Sometimes people do want to copy/paste from WP but

11:17 <@walkerma> (b) They say they don't need to SEE the InChI or whatever explicitly on the main article page

11:17 <@ChemSpiderMan> that includes my view

11:17 <@walkerma> (c) But they would very much like the links to be just a click away

11:17 <@walkerma> So how should we best do that?

11:18 <+Rifleman_82> linkfarm?

11:18 <@walkerma> Explain

11:18 <+Rifleman_82> click out to a linkfarm like special:booksources

11:18 <+Rifleman_82> click on any ISBN (number) and it jumps to a list of possible sources

11:19 <@walkerma> Would that linkfarm have (say) all the InChIs for all of the organic compounds in our collection?

11:19 <@ChemSpiderMan> that's if they want to search...not if they want to copy

11:19 <+Rifleman_82> the linkfarm could include chemspider, emolecules, or anything else which accepts smiles/inchi search strings
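
The linkfarm idea above (one identifier, many outgoing search links, in the manner of Special:BookSources) can be sketched roughly as follows. This is only an illustration: the site names and URL templates are hypothetical placeholders on `example.org`, not the real search endpoints of ChemSpider, eMolecules, or any other service, and `link_farm` is a name invented here.

```python
from urllib.parse import quote

# Hypothetical URL templates (assumption): placeholder hosts and query formats,
# not the real endpoints of any site mentioned in the discussion.
SEARCH_TEMPLATES = {
    "Example structure search A": "https://search-a.example.org/find?q={id}",
    "Example structure search B": "https://search-b.example.org/lookup/{id}",
}


def link_farm(identifier):
    """Return (site name, URL) pairs with the identifier percent-encoded."""
    # SMILES and InChI strings contain characters such as '=', '/', '(' and '#',
    # so every character outside the unreserved set is escaped.
    encoded = quote(identifier, safe="")
    return [(name, tpl.format(id=encoded)) for name, tpl in SEARCH_TEMPLATES.items()]


# SMILES for hexylbenzene, the compound discussed earlier in the log.
for name, url in link_farm("CCCCCCc1ccccc1"):
    print(f"{name}: {url}")
```

Keeping the templates in one table means a new search site is a one-line addition, which is roughly how a BookSources-style page is maintained.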

11:20 <@ChemSpiderMan> that has value but is different from copying the InChI to paste into a converter to generate the structure

11:20 <@ChemSpiderMan> same thing as with SMILES...you can lead them to a link farm but "I" copied SMILES to generate structures

11:20 <+Rifleman_82> oh, i do that quite often, copy smiles and inchi to chemsketch to generate a structure

11:20 <+Rifleman_82> if you want that, it's gotta be visible, or at least 1 click away

11:21 <+Rifleman_82> visible is fine, i like the "International Chemical Identifier" box (InChI=, InChIKey=, CASRN=, PIN=)

11:21 <@walkerma> I think what one respondent suggested was an excellent idea - if it can be done. You have the word InChI in the box, then if you click on that it brings up a Google search (or something like that) for that InChI.

11:22 <@walkerma> The click puts the actual InChI (sorry, InChIKey) into the search engine for you
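
A one-click search of this kind amounts to building a search URL with the InChIKey as the query. A minimal sketch, assuming a plain Google web search and an exact-phrase query; `google_search_url` is a hypothetical helper, and the key used is the one quoted later in this log:

```python
from urllib.parse import urlencode


def google_search_url(inchikey):
    """Build a web-search URL that queries the InChIKey as an exact phrase."""
    # Double quotes ask the search engine for an exact-phrase match, so the
    # two hyphen-separated blocks of the key are kept together.
    return "https://www.google.com/search?" + urlencode({"q": f'"{inchikey}"'})


print(google_search_url("QHARIOGGLZILKP-UHFFFAOYAR"))
```

Because the key contains only letters and a hyphen, it survives URL encoding unchanged, which is one reason it is easier to search for than a full InChI string.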

11:23 <@ChemSpiderMan> I don't like showing InChI...

11:23 <@ChemSpiderMan> Erythromycin: InChI=1/C37H67NO13/c1-14-25-37(10,45)30(41)20(4)27(39)18(2)16-35(8,44)32(51-34-28(40)24(38(11)12)15-19(3)47-34)21(5)29(22(6)33(43)49-25)50-26-17-36(9,46-13)31(42)23(7)48-26/h18-26,28-32,34,40-42,44-45H,14-17H2,1-13H3/t18-,19-,20+,21+,22-,23+,24+,25-,26+,28-,29+,30-,31+,32-,34+,35-,36-,37-/m1/s1

11:23 <+Rifleman_82> what is inchikey? is it a hash function of the inchi?

11:23 <@walkerma> You wouldn't see the InChI on the page at all

11:23 <@ChemSpiderMan> yes

11:23 <@ChemSpiderMan> hash ...cannot be converted to structure

11:23 <@ChemSpiderMan> has to be used to lookup

11:24 <@walkerma> You would only see things like InChI when you clicked on the word "InChI" in the ChemBox.

11:24 <+Rifleman_82> can we talk about how inchis are not unique?

11:24 <+Rifleman_82> are they or are they not unique?

11:24 <@walkerma> In a minute? Is that OK?

11:24 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html

11:24 <@walkerma> I'd like to resolve the display problem first

11:25 <@ChemSpiderMan> I like your approach Martin

11:25 <@walkerma> Can it be done, Beetstra?

11:25 <@ChemSpiderMan> see InChI ONLY when clicking on "InChI" in the ChemBox

11:26 <@walkerma> Is Beetstra awake?

11:26 <+Rifleman_82> don't think so

11:27 * Beetstra awakes a bit

11:28 <@walkerma> While he looks over things - let me mention one of the alternatives: the "International Chemical Identifier" box (InChI=, InChIKey=, CASRN=, PIN=)

11:28 <@walkerma> This is what PC wrote - clever code, but requires a separate "data box" at the bottom of the page

11:28 <+Beetstra> Ah, the linkfarm-solution. Yes, that can be done, I have written a wikipedia extension once .. but it has never reached application, they are not happy with those pages for the 'smaller' things

11:28 <@walkerma> See http://en.wikipedia.org/wiki/Tributylphosphine

11:29 <@walkerma> It seems to me we have three options on the table:

11:29 <@walkerma> 1. The linkfarm idea

11:29 <+Rifleman_82> oh, that's ugly, the "International Chemical Identifier" box (InChI=, InChIKey=, CASRN=, PIN=) method. results look great, but too much work!

11:29 <@walkerma> 2. The "click to see or search on InChI" idea

11:30 <@walkerma> 3. The "International Chemical Identifier" box (InChI=, InChIKey=, CASRN=, PIN=) approach

11:30 <@walkerma> I wanted Dirk to comment on the tech feasibility of #2

11:31 <+Beetstra> Searching on that InChI is difficult .. seeing, maybe

11:31 <+Beetstra> nah, that is going to be the same problem I think

11:31 <@ChemSpiderMan> why is searching on it difficult?

11:31 <+Beetstra> you would have to feed the inchi to the new page .. which would be similar to option 1

11:32 <+Beetstra> Otherwise it should just be incorporated in a search link .. www.google.com/ ..

11:32 <+Rifleman_82> we can have #1 and #2 at the same time

11:32 <+Beetstra> And you can make that show something else

11:32 <+Beetstra> Not without developers

11:32 <+Rifleman_82> InChI show search

11:32 <@walkerma> Yes, that would be great!

11:32 <+Beetstra> Yes, something like that

11:32 <@walkerma> Can we do it?

11:32 <+Beetstra> but where does the search have to point to?

11:33 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html

11:33 <+Rifleman_82> we can either have the Special:ChemSearch parallel to Special:Booksources (yes, there are problems), or we can set up our own link farm?

11:33 <@ChemSpiderMan> this is a google search from the InChIKey

11:33 <+Beetstra> no, show does not work, that needs the special page

11:33 <+Beetstra> I did it with an own linkfarm, like special:booksources does

11:33 <@ChemSpiderMan> can do the same with the InChIString BUT know that google will COMMONLY fail you

11:34 <+Rifleman_82> dmoz

11:34 <@walkerma> So apparently ChemSpider offers this type of one-click Google search on the InChIKey, right?

11:35 <@ChemSpiderMan> yes

11:35 <@ChemSpiderMan> and on the InChI string

11:35 <+Rifleman_82> can we convert InChI to InChIkeys on the fly using wikipedia?

11:35 <+Rifleman_82> can we code a template?

11:35 <@ChemSpiderMan> Notice this comment about InChI string....read the last line on this "The condensed, 25 character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.[2] Most chemical structures on the Web up to 2007 have been represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy sear

11:35 <@ChemSpiderMan> read the last line...

11:36 <@ChemSpiderMan> searching InChiString is problematic...the indexers BREAK it

11:36 <@ChemSpiderMan> generating InChis on the fly is VERY fast...but you will have to pass a connection table from WP to the InChI DLL

11:36 <@ChemSpiderMan> so you will need to STORE the connection table

11:36 <@ChemSpiderMan> or pass the SMILES to the InChI DLL or OpenBabel DLL

11:37 <@ChemSpiderMan> blah, blah, blah..

11:37 <+Rifleman_82> walkerma?

11:38 <@walkerma> I can't comment on the technical feasibility. But the benefits are immense if we can get it to work

11:38 <@walkerma> If we can do this, and ChemSpider is doing it, then people will be able to start using Google and actually finding things

11:39 <@walkerma> So you can start to say, "I want to see if there is any info on hexylbenzene"

11:39 <@walkerma> And you can search by InChIKey (I think these will become the de facto standard for organics for online searches)

11:39 <@ChemSpiderMan> why don't we simply set up a webservice on ChemSPider

11:40 <+Rifleman_82> can inchikeys be a permanent replacement for inchis ? such that we can think of structure --> inchikey, skipping inchi directly?

11:40 <@walkerma> Explain

11:40 <@ChemSpiderMan> WP can hit the appropriate search button and pass a CSID over to ChemSPider to spawn the search.

11:40 <@ChemSpiderMan> But this is making WP dependent on CS and I don't think you should.

11:41 <@ChemSpiderMan> InCHIs will be around for a long time...

11:41 <@ChemSpiderMan> no guarantee that CS will be

11:41 <@ChemSpiderMan> We have a whole lot of web services for InChI already

11:41 <@ChemSpiderMan> http://www.chemspider.com/InChI.asmx

11:41 <+Rifleman_82> we have pubchemid linking to pubchem, your chemspider id can be another... but we might have some complaints about conflict of interest

11:42 <@ChemSpiderMan> I'll guarantee that

11:42 -!- egonw [n=egonw@kokosnoot.wur.nl] has joined #wikichem

11:42 -!- mode/#wikichem [+v egonw] by ChanServ

11:42 <+Rifleman_82> hi egon

11:42 <+egonw> hi

11:42 <+egonw> I made it :)

11:42 <@walkerma> Great!

11:42 <+egonw> hi ChemSpiderMan!

11:42 <@ChemSpiderMan> hi

11:42 <+egonw> hi walkerma

11:43 <+egonw> 15 minutes, right?

11:43 <+Rifleman_82> no, you're 45 minutes late :P

11:43 <+egonw> no?!

11:43 <+egonw> really?

11:43 <+egonw> 17:00 UTC, not?

11:43 <@walkerma> Yes, sorry! 1600h UTC

11:43 <+CheMoBot> user:Cherry blossom tree has edited monitored page Wikipedia talk:WikiProject Chemistry - diff - (+880)- summary: /* Reminder of the Philip Greenspun Illustration project */ new section

11:44 <+Rifleman_82> you missed the beginning of our discussion

11:45 <+Rifleman_82> we talked about external commenters' comments about whether they use wikipedia, whether they need the inchis to be shown

11:45 <+egonw> ok, is the channel logged? then I'll read up...

11:45 <+Rifleman_82> yes it is, dmacks will publish later

11:45 <@walkerma> I just sent you a log

11:45 <+egonw> thanx

11:45 <@walkerma> Read your email

11:46 <@walkerma> We're wondering how we can best give people access to info like InChIs on WP, without cluttering up pages

11:46 <@walkerma> and causing display problems

11:47 <@walkerma> Main ideas on the table:

11:47 <@walkerma> 1. The linkfarm idea

11:47 <@walkerma> 2. The "click to see or search on InChI" idea

11:47 <@walkerma> 3. The "International Chemical Identifier" box (InChI=, InChIKey=, CASRN=, PIN=) approach

11:48 <@walkerma> So Rifleman_82, which is the best option that is workable? And how should we proceed?

11:50 <+Rifleman_82> i'll give a conditional answer, because there are some issues i think need to be worked out

11:50 <+Rifleman_82> imho, i think it is best that the inchi be displayed, with a search link

11:51 <+Rifleman_82> if we treat inchi as plain text, and get it to break every 20 chars or so, we can avoid having it stretch across the screen

11:51 <+Rifleman_82> they will be soft breaks, not hard breaks

11:51 <+Rifleman_82> soft break = "text wrapping"
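
The "break every 20 chars or so" idea can be sketched as below. This is a minimal illustration, not a tested wiki solution: it assumes the rendering side accepts the HTML `<wbr>` element (a line-break opportunity that adds no character to the copied text), and `add_soft_breaks` is a hypothetical helper name.

```python
def add_soft_breaks(text, every=20, marker="<wbr>"):
    """Insert a break opportunity after every `every` characters of `text`."""
    # Slice the string into fixed-width chunks; the final chunk may be shorter.
    chunks = [text[i:i + every] for i in range(0, len(text), every)]
    # Joining with the marker leaves the visible characters untouched, so
    # removing the marker always restores the original string.
    return marker.join(chunks)


inchi = "InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2"
print(add_soft_breaks(inchi))
```

The sketch does not HTML-escape the identifier; a real template or extension would need to do that before inserting any markup.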

11:51 <+Rifleman_82> i don't think we should hide them

11:51 <+Rifleman_82> and i don't think it'll be easy to have a "show" to show the thing

11:51 <+egonw> there would be a start and end 'codon

11:51 <+egonw> '?

11:52 <+Rifleman_82> and if we can let it "show" without breaking the page, we can have it shown by default

11:52 <+Rifleman_82> am i making sense?

11:52 <+Rifleman_82> sheesh

11:52 <@walkerma> Yes, though I don't know how to do soft line breaks

11:53 <+egonw> a space?

11:53 <+Rifleman_82> perhaps you can give me til our next meeting to find out? i'm sure it can be done, just a matter of how...

11:53 <+Rifleman_82> no, a space will break the string

11:53 <+Rifleman_82> render it unsearchable

11:53 <+Rifleman_82> it is conceptually identical to a manually added line break

11:53 <+egonw> right

11:53 <+Rifleman_82> the other problem is presentation - arbitrary spaces will look odd when the browser windows are non-identically sized

11:54 <@walkerma> It must be doable, because if you use

things wrap OK

11:54 <+Rifleman_82> they usually wrap on existing spaces?

11:54 <@walkerma> I think....

11:54 <+egonw> I think so too

11:55 <+egonw> played with CSS to do this kind of thing, but never found something satisfying

11:56 <@walkerma> So for action, can Rifleman_82 agree to look into that?

11:56 <+Rifleman_82> yeah, i'll look into that

11:56 <+Rifleman_82> as a side issue

11:56 <+Rifleman_82> if we can use this method to make long IUPAC names break nicely, it will be an added bonus

11:56 <@walkerma> And Beetstra, do you have contacts we could ask about link farms, or do you think that idea is dead in the water?

11:57 <+Rifleman_82> break at hyphens preferably, break arbitrarily at 20 characters if need be

11:57 <@walkerma> Sounds good, R82
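
The rule proposed for long IUPAC names (break at hyphens where possible, otherwise at a fixed width) could look roughly like this. It is a sketch under the same assumption that `<wbr>` is usable as the break marker; `break_name` is a hypothetical helper.

```python
import re


def break_name(name, limit=20, marker="<wbr>"):
    """Offer a break after each hyphen; chunk hyphen-free runs longer than `limit`."""
    # Split into pieces that each end with a hyphen (the last piece may not).
    pieces = re.findall(r"[^-]*-|[^-]+$", name)
    out = []
    for piece in pieces:
        # Fall back to fixed-width chunks for any piece longer than the limit.
        out.extend(piece[i:i + limit] for i in range(0, len(piece), limit))
    return marker.join(out)


print(break_name("2-phenylhexane", limit=6))
```

Since the marker is only a break opportunity, short names still render on one line; the browser uses the extra break points only when the text would otherwise overflow.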

11:58 <+Beetstra> Walkerma, it runs on chemistry.poolspares.com (don't ask about the domain, it is a pure test-wiki; and spam bots have found it already) ..

11:58 <+Rifleman_82> robots.txt?

11:58 <+Beetstra> No, they just scan for wikis and edit it

11:59 <+Beetstra> I should have blocked everything but administrators ..

11:59 <+Beetstra> See the chemical sources from the main page

11:59 <+Beetstra> and one of the example pages (water e.g.)

12:00 <+Beetstra> in the chembox are some links, like name, formula

12:01 <+Beetstra> But the developers were not really in to such special pages .. then there will be more etc.

12:02 <+Rifleman_82> heh, or we could migrate to poolspares :)

12:02 <@walkerma> OK, I will talk to my friend Kelson (Emmanuel) from the French WP, he's a full time developer

12:03 <@walkerma> I'll see if he can come up with a solution

12:03 <@walkerma> Either 1, 2 or 3!

12:03 <@walkerma> But in the meantime, the soft line breaks would make things much better...

12:04 <@walkerma> Can we move on in the agenda?

12:04 * Beetstra has to leave .. see you all later!

12:04 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit ["Bye Bye"]

12:04 <@walkerma> OK, thanks

12:04 <@ChemSpiderMan> bye Drik

12:04 <@ChemSpiderMan> Dirk

12:04 <@walkerma> I hope we can finish fairly soon

12:05 <@walkerma> But I'd like to ask if we can "officially" adopt InChIKeys

12:06 <@walkerma> i.e., we'd agree to support them as a standard format

12:06 <@walkerma> for representing structures

12:06 <@walkerma> I accept that full InChIs are already somewhat established

12:06 <+egonw> sounds good to me

12:06 <+Rifleman_82> are they unique?

12:06 <@walkerma> but I think for a quick copy, paste and search they are the way to go, and much better than SMILES

12:07 <+Rifleman_82> if they are, they can be primary key?

12:07 <+egonw> theoretically not, practically no clashes have been found

12:07 <+Rifleman_82> walkerma's sandbox has complaints of inchis not being unique

12:07 <+Rifleman_82> and since inchikeys are hashed from inchis...

12:07 <+egonw> inchis *are* unique

12:07 <@ChemSpiderMan> read the InCHi article on WP

12:07 <@walkerma> The chance of a clash has been estimated as 1 in <<total no. of molecules known

12:07 <+egonw> at least when the stereochemistry layer is taken into account

12:08 <+Rifleman_82> rather, can there be more than one inchi for the same molecule?

12:08 <@walkerma> The comment on my Sandbox refers to the fact that

12:08 <+egonw> no

12:08 <@ChemSpiderMan> but the probability for duplication of only the first 14 characters has been estimated as only one duplication in 75 databases each containing one billion unique structures.

12:08 <@ChemSpiderMan> there are no collisions in PubChem or ChemSpider

12:08 <@walkerma> you can get a different InChI if you draw the stereochemistry differently

12:08 <@ChemSpiderMan> probability is SOOOOOOO low

12:08 <@ChemSpiderMan> yes

12:08 <+Rifleman_82> i see, i was quite concerned about http://en.wikipedia.org/wiki/User:Walkerma/Sandbox5#Comments_1

12:08 <+egonw> walkerma: right, because those molecules are different

12:08 <@ChemSpiderMan> different InChI

12:08 <@walkerma> The R is different from the S is different from the unspecified
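
The point that differently specified stereochemistry yields different InChIs can be illustrated by stripping the stereo layers and comparing what is left. This is a simplified sketch: it assumes that dropping the `/b`, `/t`, `/m` and `/s` layers is enough, and it ignores stereo sublayers that can follow the isotopic or fixed-hydrogen layers. `strip_stereo` is a hypothetical helper, and the two strings are illustrative examples that differ only in their stereo layers.

```python
STEREO_PREFIXES = ("b", "t", "m", "s")  # double-bond, tetrahedral, inversion, stereo type


def strip_stereo(inchi):
    """Return the InChI with its stereo layers removed (simplified)."""
    parts = inchi.split("/")
    # parts[0] is the version prefix and parts[1] the formula; both are kept as-is.
    head, layers = parts[:2], parts[2:]
    kept = [layer for layer in layers if not layer.startswith(STEREO_PREFIXES)]
    return "/".join(head + kept)


one = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1"
other = "InChI=1/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1"
print(one == other)                              # the full strings differ
print(strip_stereo(one) == strip_stereo(other))  # the connectivity matches
```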

12:09 <+Rifleman_82> but if there are no issues, then that's fine

12:09 <@ChemSpiderMan> no issues

12:09 <@walkerma> And I really like the way ChemSpider uses InChIKeys

12:09 <@ChemSpiderMan> do not worry about clashes

12:09 <@walkerma> to do "related compounds" searches

12:09 <@ChemSpiderMan> There are about 5000-6000 structures on WP

12:10 <+Rifleman_82> is there a way to remove stereochemistry from inchikeys?

12:10 <@ChemSpiderMan> yes

12:10 <@ChemSpiderMan> It's the second layer

12:10 <+Rifleman_82> or to use inchi/inchikeys to do substructure searches?

12:10 <+Rifleman_82> similarity searches?

12:10 <+Rifleman_82> oh, i got it, it's the second half of the inchikey?

12:10 <+egonw> no

12:10 <+egonw> inchi(-leus) cannot be used directly for substructure searching

12:10 <+egonw> leus==keys

12:11 <@ChemSpiderMan> Third time...please read quickly

12:11 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html

12:11 <+Rifleman_82> you'll have to get some other program to take it, expand it, and search, and pass the search results back?

12:11 <+Rifleman_82> sorry tony, i'll go read

12:11 <@ChemSpiderMan> No...InChIkeys can only be looked up...CANNOT be reversed

12:11 <@ChemSpiderMan> It's a hash...cannot reverse it
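
Although the key cannot be reversed, its first block (the 14 characters before the hyphen, as mentioned above) can still be compared on its own to find records that share a skeleton regardless of stereochemistry. A minimal sketch, with hypothetical helper names and one made-up key for contrast:

```python
def connectivity_block(inchikey):
    """Return the part of the key before the first hyphen."""
    return inchikey.split("-", 1)[0]


def same_skeleton(key_a, key_b):
    """True when two keys agree on the first (connectivity) block."""
    return connectivity_block(key_a) == connectivity_block(key_b)


real_key = "QHARIOGGLZILKP-UHFFFAOYAR"   # key quoted later in this log
made_up = "QHARIOGGLZILKP-XXXXXXXXXX"    # hypothetical second block, for illustration only
print(same_skeleton(real_key, made_up))
```

This is a plain string comparison, so it works in any database or search engine that indexes the key as text; it is not a substructure or similarity search.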

12:11 <@walkerma> But you can use the ChemSpider lookup table for 20 million compounds, right?

12:12 <@ChemSpiderMan> absolutely

12:12 <+Rifleman_82> it's a static table, isn't it?

12:12 <@walkerma> http://www.chemspider.com/InChI.asmx?op=InChIKeyToInChI

12:12 <+egonw> hope not :)

12:12 <+Rifleman_82> and it can be defined for any arbitrary compound... whether or not it exists

12:12 <+egonw> I hope it keeps growing

12:12 <@ChemSpiderMan> everYDAY

12:13 <@walkerma> Yes, R82, it can

12:13 <@walkerma> For anything drawable

12:13 <@ChemSpiderMan> http://www.chemspider.com/InChI.asmx?op=InChIKeyToInChI is a search.....

12:13 <@ChemSpiderMan> it's NOT a conversion

12:13 <+Rifleman_82> got it, tony

12:13 <+egonw> Rifleman_82: any organic molecule at least

12:13 <@ChemSpiderMan> Internally searches InChIKey against ChemSpider database (>17M unique compounds). Returns empty string in case of failure.
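
The distinction between a search and a conversion can be shown with a toy lookup table. This is not the ChemSpider web service, only an in-memory sketch of the same behaviour (known key returns the stored InChI, unknown key returns an empty string); the single entry pairs the key and InChI quoted later in this log, and the names are hypothetical.

```python
# Toy table standing in for a database of known structures (assumption).
KNOWN_STRUCTURES = {
    "QHARIOGGLZILKP-UHFFFAOYAR": "InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2",
}


def inchikey_to_inchi(inchikey, table=KNOWN_STRUCTURES):
    """Look the key up; return "" when the structure is not in the table."""
    # Nothing is computed from the key itself: a hash cannot be reversed, so a
    # key that was never registered simply has no answer.
    return table.get(inchikey, "")


print(inchikey_to_inchi("QHARIOGGLZILKP-UHFFFAOYAR"))
print(repr(inchikey_to_inchi("AAAAAAAAAAAAAA-BBBBBBBBBB")))
```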

12:13 <+Rifleman_82> though System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.

12:13 <@ChemSpiderMan> :-)

12:13 <+Rifleman_82> tried to search morphine

12:14 <+Rifleman_82> okay let's not discuss my problems now

12:14 <+Rifleman_82> :P

12:14 <@ChemSpiderMan> note Egon's comments...

12:14 <+Rifleman_82> ok

12:14 <@ChemSpiderMan> must be able to generate an InChI...

12:14 <@ChemSpiderMan> many things cannot generate InChIs

12:14 <@ChemSpiderMan> markush, polymers, organometallics, inorganics ...there are exceptions of course

12:14 <+Rifleman_82> can inchi/keys be able to handle inorganics, organometallics, abnormal oxidation states, abnormal coordination modes?

12:15 <+egonw> generally not

12:15 <@ChemSpiderMan> there are lots of things it can handle and InChI itself is being extended by the InChI team

12:15 <+Rifleman_82> so it's a matter of time?

12:15 <@ChemSpiderMan> but today it's primarily for organic molecules

12:15 <@ChemSpiderMan> yes

12:15 <+Rifleman_82> ok

12:15 <@ChemSpiderMan> They ARE working on extending it

12:15 <+egonw> but anything which involves C,H,N,O,P,S,Si, ... it handles just fine

12:15 <@ChemSpiderMan> But it's the same issue I have....

12:15 <+Rifleman_82> octahedral carbon? :)

12:16 <@ChemSpiderMan> I am creating MOLFILES...

12:16 <+egonw> octahedral carbon??

12:16 <@ChemSpiderMan> and SDF

12:16 <+Rifleman_82> yeah, carbon with 6 ligands

12:16 <@ChemSpiderMan> and many things on WP cannot be captured that way

12:16 <+egonw> WP entry?

12:16 <+Rifleman_82> i'll drop you a note when i find it

12:16 <+egonw> you're not referring to some transition state?

12:16 <+Rifleman_82> no, no

12:17 <@ChemSpiderMan> InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2

12:17 <+Rifleman_82> my friend was telling me that togni made some weird carbon possessing an abnormal structure, carbon center with octahedral symmetry

12:17 <+Rifleman_82> i'll send you the paper when i find it

12:17 <@ChemSpiderMan> This is a carbon with 6 NH2 groups

12:17 <@ChemSpiderMan> and the InCHI key

12:17 <+Rifleman_82> wow, that's nice :)

12:17 <@ChemSpiderMan> QHARIOGGLZILKP-UHFFFAOYAR

12:18 <+egonw> oi, that's something my atom typing is going to fail on :)

12:18 <+Rifleman_82> inchi does not attempt to assume valence, like smiles does?

12:18 <+egonw> to some extent it does... e.g. for determining tautomerism etc

12:18 <+Rifleman_82> ok

12:19 <@ChemSpiderMan> my suggestion...let's NOT analyze InChI here...that's been done in many other places

12:19 <+egonw> :)

12:19 <+egonw> agreed, it's the best out there, and should be used whenever possible

12:19 <+Rifleman_82> ok

12:20 <@walkerma> OK, I think I'll post something on the wiki about InChIKeys, and we can go with the consensus from there

12:20 <@ChemSpiderMan> I suggest focusing on the issue at hand of whether to adopt it for structures where we can generate it

12:20 <@ChemSpiderMan> and I say yes

12:20 <+egonw> same here

12:20 <@walkerma> I'd like to ask one final thing

12:20 <@ChemSpiderMan> ok

12:21 <@walkerma> Can we convert ChemSpiderMan's SDF file into wiki syntax easily?

12:21 <@walkerma> In other words

12:21 <@walkerma> Can we upload the data from his validated file onto the wiki semi-automatically?

12:21 <@ChemSpiderMan> Rifleman...It's a text file

12:21 <@walkerma> If so, how will it be done?

12:21 <@ChemSpiderMan> Easy to separate each structure in the file.

12:21 <@walkerma> Will we need a bot?

12:22 <@ChemSpiderMan> I have the article name for each record so you CAN link it and stream in the text

12:22 <@ChemSpiderMan> for each record

12:22 <@ChemSpiderMan> I can name the fields for you

12:22 <+Rifleman_82> i don't think chem-awb can do it

12:22 <@walkerma> ok

12:22 <@ChemSpiderMan> SMILES, Name, InCHIString, InCHIkey etc
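
Separating the records and reading the named fields could be sketched as follows. The parser relies only on the general SDF layout (records end with a `$$$$` line, data items start with `> <FieldName>` and run to the next blank line). The sample record is made up for illustration: its structure block is a stub, and the field names `URL` and `Name` are assumed to match those in the actual file.

```python
import re


def parse_sdf_fields(sdf_text):
    """Return one dict of data fields per SDF record (structure block ignored)."""
    records = []
    # Records are terminated by a line consisting of four dollar signs.
    for block in re.split(r"^\$\$\$\$[ \t]*$", sdf_text, flags=re.MULTILINE):
        if not block.strip():
            continue
        fields = {}
        lines = block.splitlines()
        i = 0
        while i < len(lines):
            line = lines[i]
            if line.startswith(">") and "<" in line:
                # Header looks like "> <FieldName>"; take the text between < and >.
                name = line[line.index("<") + 1:line.rindex(">")]
                i += 1
                value = []
                # The value runs until the next blank line.
                while i < len(lines) and lines[i].strip():
                    value.append(lines[i])
                    i += 1
                fields[name] = "\n".join(value)
            i += 1
        records.append(fields)
    return records


sample = (
    "5-MeO-DET\n  (structure block omitted in this sketch)\n\nM  END\n"
    "> <URL>\nhttp://en.wikipedia.org/wiki/5-MeO-DET\n\n"
    "> <Name>\n5-MeO-DET\n\n$$$$\n"
)
for record in parse_sdf_fields(sample):
    print(record["URL"], record["Name"])
```

With the article URL as one of the fields, each parsed record already carries the key a bot would need to find the matching page.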

12:22 <+Rifleman_82> you can pay grad students to be data entry clerks... 10 cents for each article...? :)

12:23 <@ChemSpiderMan> Egon will get how easy this is "theoretically"

12:23 <@ChemSpiderMan> transcription errors..

12:23 <@ChemSpiderMan> let's NOT have someone type it in....for 5000 articles

12:23 <@ChemSpiderMan> one slipped key :-(

12:23 <+Rifleman_82> haha

12:23 <+Rifleman_82> and you pay another grad student 10 cents per article to check them :P

12:24 <+egonw> a bot sounds like the way to go

12:24 <+Rifleman_82> LOL

12:24 <+egonw> however... that assumes ChemSpiderMan has the mapping...

12:24 <+egonw> also, consider overwriting fields...

12:24 <+Rifleman_82> if we're going to do this on a regular basis, perhaps we can write a dedicated maintenance bot

12:24 <@ChemSpiderMan> I can provide names to map...this is easy from my side (I think)

12:24 <@ChemSpiderMan> yes....would be ideal...

12:24 <+Rifleman_82> which runs through the entire list every now and then to check?

12:25 <+Rifleman_82> apart from ideas i'm not sure how to do it

12:25 <+egonw> " I can provide names to map"

12:25 <+egonw> based on compound names?

12:25 <@ChemSpiderMan> provided things have a ChemBox available this is easy I think

12:25 <@walkerma> Sounds good to me

12:25 <@ChemSpiderMan> We take the ChemBox fields and I map the names in my SDF to the ChemBox names

12:25 <+egonw> I'd have the bot cross check against other identifiers available in the ChemBox

12:25 <+egonw> like PubChem ID

12:26 <@ChemSpiderMan> Primary key is article name

12:26 <+egonw> and only if consistent add it, otherwise reports as 'difficult case'
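
The cross-check described here (add new values only when nothing already in the ChemBox contradicts the incoming record, otherwise flag the page) can be sketched as a small decision function. The field names and the plain string comparison are assumptions for illustration; identifiers such as SMILES would need normalising before a real comparison, and `plan_update` is a hypothetical name.

```python
def plan_update(chembox, record):
    """Decide what a bot should do with one record for one article.

    Returns ("difficult case", [conflicting field names]) when any field present
    in both places disagrees, otherwise ("add", {fields missing from the ChemBox}).
    """
    conflicts = sorted(k for k in record if k in chembox and chembox[k] != record[k])
    if conflicts:
        # Never overwrite: a mismatch is reported for a human to resolve.
        return "difficult case", conflicts
    additions = {k: v for k, v in record.items() if k not in chembox}
    return "add", additions


existing = {"Name": "water", "SMILES": "O"}
incoming = {"SMILES": "O", "InChIKey": "PLACEHOLDER-KEY"}  # placeholder value, not a real key
print(plan_update(existing, incoming))
```

Returning a plan instead of editing directly keeps the check separate from the page-writing step, so the "difficult case" list can be reviewed before anything is uploaded.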

12:26 <@ChemSpiderMan> it's one of the fields in the SDF for each structure

12:26 <+egonw> ChemSpiderMan: ok, so you do have an "inchi <-> WP page" map

12:26 <+egonw> 1:1 that is

12:26 <@ChemSpiderMan> I am NOT copying PubChem IDs into my file...I am not checking them

12:26 <@ChemSpiderMan> hold on..

12:27 <@ChemSpiderMan> I have a field in the SDF called URL

12:27 <@ChemSpiderMan> This is one of them

12:27 <@ChemSpiderMan> http://en.wikipedia.org/wiki/5-MeO-DET

12:27 <+egonw> that map is curated? if so, that could go in automatically...

12:27 <@ChemSpiderMan> EXACTLY!!!

12:27 <@ChemSpiderMan> it's a breeze I believ

12:27 <@ChemSpiderMan> believe

12:27 <+egonw> I'd still cross check with info already present in the ChemBox

12:28 <@ChemSpiderMan> yes...for more checking

12:28 <+Rifleman_82> ChemSpiderMan: you want to tell us about the problems from the last 10 or 50?

12:28 <@ChemSpiderMan> :-( must I? I've sent the files and they are annotated to the maximum with the issues

12:28 <@ChemSpiderMan> I have stepped out of a meeting to do this...

12:28 <@ChemSpiderMan> I need to get back

12:28 <+Rifleman_82> ok

12:29 <+Rifleman_82> next time then

12:29 <@walkerma> The main issue is with CAS, right? I think that will be the agenda for another meeting

12:29 <@ChemSpiderMan> the big issue is the one I suggested...CAS NUMBERS

12:29 <@ChemSpiderMan> I think 3/7 did not agree with the structure drawn

12:29 <+egonw> CAS numbers are intrinsically difficult

12:29 <@ChemSpiderMan> so what's wrong...the structure or the CAS?

12:29 <@ChemSpiderMan> yes...I agree Egon...they need validating..

12:29 <@ChemSpiderMan> and what actually needs to be done

12:29 <+egonw> and would not suggest checking against CAS in the ChemBox

12:30 <@ChemSpiderMan> is the structure needs to be searched in the registry to FIND the right CAS number

12:30 <+egonw> we don't have access to the CAS database, so can't even (formally) validate those

12:30 <@walkerma> Yes

12:30 <@ChemSpiderMan> exactly.

12:30 <@walkerma> OK, shall we agree to meet next week - same time, same place, to talk about CAS nos?

12:30 <@ChemSpiderMan> ok

12:30 <+Rifleman_82> sure

12:30 <+Rifleman_82> quick one before we finish?

12:30 <@walkerma> OK

12:30 <+Rifleman_82> request, more than anything else

12:30 <@ChemSpiderMan> ok

12:30 <+Rifleman_82> take a look at http://en.wikipedia.org/wiki/User:Rifleman_82/Functional_groups_style_guidelines and edit as you see fit!

12:31 <+Rifleman_82> once it's finalized i'll start making navboxes and rewriting articles to fit

12:31 <+Rifleman_82> i mean, that is if you guys agree with the whole idea in the first place

12:31 <+egonw> one quick comment

12:31 <@walkerma> OK, I was planning on giving feedback on that today

12:31 <+egonw> functional groups down there -> azo compound

12:32 <+egonw> please be careful about mixing up 'contains the functional group' and 'compound classes'

12:32 <@walkerma> Yes, I was going to say the same thing

12:32 <+Rifleman_82> it started off as functional groups

12:32 <@walkerma> (I'm really bad at mixing them up myself!)

12:32 <+Rifleman_82> and i stole the functional groups template

12:32 <+egonw> in the sense that: don't mix them up

12:32 <+Rifleman_82> but i thought about it and i think we can include more than that, which i spelled out in the scope

12:32 <@walkerma> Hydroxy= functional group, Alcohol = family

12:32 <+Rifleman_82> at the top

12:33 <+egonw> alcohol is actually both :)

12:33 <+Rifleman_82> amine too

12:33 <+Rifleman_82> amine group, amine...ssss

12:33 <+egonw> but azo compound certainly not

12:33 <@walkerma> OK, shows how confused I am...! My text says otherwise...

12:33 <+egonw> and alkane is typically not considered a functional group

12:33 <+egonw> but more like the 'unfunctional' part of the molecule

12:33 <+Rifleman_82> haha

12:34 <+Rifleman_82> true true

12:34 <@ChemSpiderMan> gotta go guys

12:34 <@ChemSpiderMan> bye

12:34 <@walkerma> Yes, byt

12:34 <+egonw> bye ChemSpiderMan

12:34 <@walkerma> bye

12:34 <+Rifleman_82> nite tony

12:34 <+egonw> nice chatting with you for once (instead of blogging :)

12:34 -!- ChemSpiderMan [n=ChemSpid@c-68-33-151-242.hsd1.md.comcast.net] has left #wikichem []

12:34 <@walkerma> See people next week! Thanks for coming, Egon

12:34 <+egonw> it's a really horrible time for me to attend

12:34 <+egonw> both 16 and 17 UTC

12:34 <+Rifleman_82> what time is it for you, egonw?

12:35 <+egonw> I'm UTC+1

12:35 <+Rifleman_82> close to the end of office hours?

12:35 <+egonw> so dinner time, and/or being-in-traffic time

12:35 <+egonw> just after it, actually...

12:35 <+Rifleman_82> walkerma: tomorrow's my last day at work, so next week onward i can stay much later... would that help anyone?

12:35 <@walkerma> I could perhaps meet one hour earlier - but we have to be careful, many people in western US then would get excluded

12:35 <+egonw> same goes for those second nature meetings :(

12:36 <@walkerma> 2nd life?

12:36 <+egonw> walkerma: maybe alternate every week

12:36 <+egonw> right... see that nature blog

12:36 <@walkerma> It's always much better if you can keep the same time

12:36 <+egonw> walkerma: true

12:36 <@walkerma> Otherwise people show up 45 mins late....!

12:36 <+egonw> :)

12:37 <+egonw> had 17 in mind... apparently already shifted that to my time zone... so did that twice

12:37 <@walkerma> Well, I think we should keep the same time, at least for next week. I'll poll people about other poss times. Henry Rzepa often has a seminar at this time

12:37 <+Rifleman_82> don't feel bad... i screwed up the timezones and gave walkerma a lot of trouble for our first meeting too

12:37 <+Rifleman_82> :P

12:37 <@walkerma> OK, I'd better go

12:37 <+Rifleman_82> ok

12:38 <+Rifleman_82> good night!

12:38 <@walkerma> Bye!

12:38 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has quit ["ChatZilla 0.9.80 [Firefox 2.0.0.11/2007112718]"]

--- Log closed Tue Jan 29 12:38:58 2008