Wikipedia:WikiProject Chemistry/IRC discussions/29 Jan 2008
--- Log opened Tue Jan 29 11:04:02 EST 2008
11:04 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has joined #wikichem
11:04 <+Rifleman_82> All Rise
11:04 <+Rifleman_82> :)
11:04 <ChemSpiderMan> hi
11:04 -!- mode/#wikichem [+o Rifleman_82] by ChanServ
11:05 -!- mode/#wikichem [+oo ChemSpiderMan walkerma] by Rifleman_82
11:05 <@walkerma> Hi, sorry I'm a couple of minutes late, we had some freezing rain
11:05 -!- mode/#wikichem [-o Rifleman_82] by Rifleman_82
11:05 <+Rifleman_82> i think today there'll be less people present
11:05 <+Rifleman_82> walkerma: would you like to begin?
11:05 <+Rifleman_82> dmacks is probably away, but he's logging
11:05 <@walkerma> Should we try and get PC?
11:06 <+Rifleman_82> he's not on irc
11:07 <@walkerma> I just sent him a quick email, hopefully he'll come. He & I talked a bit with DMacks last week on IRC informally
11:07 <@walkerma> OK, let's start?
11:07 <+Rifleman_82> ok
11:07 <@ChemSpiderMan> ok
11:07 <@walkerma> "How can we handle structural identifiers such as InChIs and SMILES properly? These are designed for machine-reading, but people may often use our visible info to "copy and paste" into a search engine."
11:08 <@walkerma> Hopefully you had a chance to look over the survey I did
11:08 <+Rifleman_82> results are quite disappointing
11:08 <@walkerma> In what way?
11:09 <+Rifleman_82> finding a primary key is tough, the standards (SMILES & INCHI)aren't really that standard
11:09 <@walkerma> Oh, yes!
11:09 <+Rifleman_82> i agree with the point too that we can't really hope to replace the CRC handbook
11:09 <+Rifleman_82> we can cover all the important compounds, but not the more obscure
11:09 <+Rifleman_82> which i guess, is fair enough in that they probably aren't all that encyclopedic
11:10 <@ChemSpiderMan> there are a lot on WP that are FAR from encyclopedic...
11:10 <@ChemSpiderMan> in fact some I can ONLY find on WP
11:10 <@walkerma> I don't know that CRC has too many obscure ones
11:10 <@ChemSpiderMan> it doesn't
11:10 <+Rifleman_82> http://en.wikipedia.org/wiki/2-Phenylhexane
11:10 <@walkerma> But we sometimes have "hot" new compounds
11:11 <@walkerma> Phenylhexane not being an example of that!
11:11 <+Rifleman_82> i think the best we can hope for is for us to complement, not replace CRC
11:11 <@walkerma> That's fine!
11:11 <+Rifleman_82> a quick ref, instead of digging out a hefty tome
11:11 <+Rifleman_82> walkerma: elaborate on phenylhexane?
11:11 <+Rifleman_82> a quick, *reliable* ref
11:12 <@walkerma> In case people don't know, it was nominated for deletion. I tried to find some uses for it, but couldn't
11:12 <+Rifleman_82> it seems a common catalysis target
11:12 <@walkerma> I only found lots of studies on FC alkylation of benzene, reporting ratios
11:12 <+Rifleman_82> i fired up scifinder and copied a few refs for the catalysts... that's all
11:13 <@walkerma> But I don't think it has any major applications
11:13 <+Rifleman_82> there are more, but i plucked the low-lying fruit, those where the catalysts are stated instead of being listed as C:asdlfkajsd;
11:13 <@ChemSpiderMan> That's true for a lot of the PIKHAL collection I think
11:13 <+Rifleman_82> what about pikhal?
11:13 <@walkerma> I suspect it was someone testing out
11:14 <@walkerma> how to write a compound article
11:14 <+Rifleman_82> hexylbenzene?
11:14 <+Rifleman_82> erm, phenylhexane/
11:14 <@ChemSpiderMan> there are dozens of compounds from PIKHAL...it's a series of compounds all from the same publication but do they all need to be in there? What value?
11:15 <+Rifleman_82> abused substances/
11:15 <+Rifleman_82> ?
11:15 <@walkerma> Because things from PIKHAL are inherently fascinating to a certain group of people
11:16 <@walkerma> Anyway, I think I'd like to get back to InChIs etc, is that OK?
11:16 <@ChemSpiderMan> sure
11:16 <@walkerma> My impression from the responses was that:
11:16 <@walkerma> (a) Sometimes people do want to copy/paste from WP but
11:17 <@walkerma> (b) They say they don't need to SEE the InChI or whatever explicitly on the main article page
11:17 <@ChemSpiderMan> that includes my view
11:17 <@walkerma> (c) But they would very much like the links to be just a click away
11:17 <@walkerma> So how should we best do that?
11:18 <+Rifleman_82> linkfarm?
11:18 <@walkerma> Explain
11:18 <+Rifleman_82> click out to a linkfarm like special:booksources
11:18 <+Rifleman_82> click on any ISBN (number) and it jumps to a list of possible sources
11:19 <@walkerma> Would that linkfarm have (say) all the InChIs for all of the organic compounds in our collection?
11:19 <@ChemSpiderMan> that's if they want to search...not if they want to copy
11:19 <+Rifleman_82> the linkfarm could include chemspider, emolecules, or anything else which accepts smiles/inchi search strings
11:20 <@ChemSpiderMan> that has value but is different from copying the InChI to paste into a converter to generate the structure
11:20 <@ChemSpiderMan> same thing as with SMILEs...you can lead them to a link farm but "I" copied SMILES to generate structures
11:20 <+Rifleman_82> oh, i do that quite often, copy smiles and inchi to chemsketch to generate a structure
11:20 <+Rifleman_82> if you want that, it's gotta be visible, or at least 1 click away
11:21 <+Rifleman_82> visible is fine, i like [International Chemical Identifier box: InChI, InChIKey, CASRN, PIN]
11:21 <@walkerma> I think what one respondent suggested was an excellent idea - if it can be done. You have the word InChI in the box, then if you click on that it brings up a Google search (or something like that) for that InChI.
11:22 <@walkerma> The click puts the actual InChI (sorry, InChIKey) into the search engine for you
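The one-click idea amounts to building a search URL with the identifier already filled in. A minimal sketch in Python, assuming Google's public `search?q=` URL form; the function name is hypothetical, and the key is the one ChemSpiderMan quotes later in this log.

```python
from urllib.parse import quote_plus


def inchikey_search_url(inchikey: str) -> str:
    """Return a Google search URL that looks for the InChIKey as an exact phrase."""
    # Wrapping the key in double quotes asks the search engine to match it whole.
    return "https://www.google.com/search?q=" + quote_plus(f'"{inchikey}"')


print(inchikey_search_url("QHARIOGGLZILKP-UHFFFAOYAR"))
```

A ChemBox template could emit such a link behind the word "InChIKey", so the key itself never has to be displayed on the page.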
11:23 <@ChemSpiderMan> I don't like showing InChI...
11:23 <@ChemSpiderMan> Erythromycin: InChI=1/C37H67NO13/c1-14-25-37(10,45)30(41)20(4)27(39)18(2)16-35(8,44)32(51-34-28(40)24(38(11)12)15-19(3)47-34)21(5)29(22(6)33(43)49-25)50-26-17-36(9,46-13)31(42)23(7)48-26/h18-26,28-32,34,40-42,44-45H,14-17H2,1-13H3/t18-,19-,20+,21+,22-,23+,24+,25-,26+,28-,29+,30-,31+,32-,34+,35-,36-,37-/m1/s1
11:23 <+Rifleman_82> what is inchikey? is it a hash function of the inchi?
11:23 <@walkerma> You wouldn't see the InChI on the page at all
11:23 <@ChemSpiderMan> yes
11:23 <@ChemSpiderMan> hash ...cannot be converted to structure
11:23 <@ChemSpiderMan> has to be used to lookup
11:24 <@walkerma> You would only see things like InChI when you clicked on the word "InChI" in the ChemBox.
11:24 <+Rifleman_82> can we talk about how inchis are not unique?
11:24 <+Rifleman_82> are they or are they not unique?
11:24 <@walkerma> In a minute? Is that OK?
11:24 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html
11:24 <@walkerma> I'd like to resolve the display problem first
11:25 <@ChemSpiderMan> I like your approach Martin
11:25 <@walkerma> Can it be done, Beetstra?
11:25 <@ChemSpiderMan> see InChI ONLY when clicking on "InChI" in the ChemBox
11:26 <@walkerma> Is Beetstra awake?
11:26 <+Rifleman_82> don't think so
11:27 * Beetstra awakes a bit
11:28 <@walkerma> While he looks over things - let me mention one of the alternatives: [International Chemical Identifier box: InChI, InChIKey, CASRN, PIN]
11:28 <@walkerma> This is what PC wrote - clever code, but requires a separate "data box" at the bottom of the page
11:28 <+Beetstra> Ah, the linkfarm-solution. Yes, that can be done, I have written a wikipedia extension once .. but it has never reached application, they are not happy with those pages for the 'smaller' things
11:28 <@walkerma> See http://en.wikipedia.org/wiki/Tributylphosphine
11:29 <@walkerma> It seems to me we have three options on the table:
11:29 <@walkerma> 1. The linkfarm idea
11:29 <+Rifleman_82> oh, that's ugly, the [International Chemical Identifier box: InChI, InChIKey, CASRN, PIN] method. results look great, but too much work!
11:29 <@walkerma> 2. The "click to see or search on InChI" idea
11:30 <@walkerma> 3. The [International Chemical Identifier box: InChI, InChIKey, CASRN, PIN] approach
11:30 <@walkerma> I wanted Dirk to comment on the tech feasibility of #2
11:31 <+Beetstra> Searching on that InChI is difficult .. seeing, maybe
11:31 <+Beetstra> nah, that is going to be the same problem I think
11:31 <@ChemSpiderMan> why is searching on it difficult?
11:31 <+Beetstra> you would have to feed the inchi to the new page .. which would be similar to option 1
11:32 <+Beetstra> Otherwise it should just be incorporated in a search link .. www.google.com/ ..
11:32 <+Rifleman_82> we can have #1 and #2 at the same time
11:32 <+Beetstra> And you can make that show something else
11:32 <+Beetstra> Not without developers
11:32 <+Rifleman_82> InChI show search
11:32 <@walkerma> Yes, that would be great!
11:32 <+Beetstra> Yes, something like that
11:32 <@walkerma> Can we do it?
11:32 <+Beetstra> but where does the search have to point to?
11:33 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html
11:33 <+Rifleman_82> we can either have the Special:ChemSearch parallel to Special:Booksources (yes, there are problems), or we can set up our own link farm?
11:33 <@ChemSpiderMan> this is a google search from the InChIKey
11:33 <+Beetstra> no, show does not work, that needs the special page
11:33 <+Beetstra> I did it with an own linkfarm, like special:booksources does
11:33 <@ChemSpiderMan> can do the same with the InChIString BUT know that google will COMMONLY fail you
11:34 <+Rifleman_82> dmoz
11:34 <@walkerma> So apparently ChemSpider offers this type of one-click Google search on the InChIKey, right?
11:35 <@ChemSpiderMan> yes
11:35 <@ChemSpiderMan> and on the InChI string
11:35 <+Rifleman_82> can we convert InChI to InChIkeys on the fly using wikipedia?
11:35 <+Rifleman_82> can we code a template?
11:35 <@ChemSpiderMan> Notice this comment about InChI string....read the last line on this "The condensed, 25 character InChIKey is a hashed version of the full InChI (using the SHA-256 algorithm), designed to allow for easy web searches of chemical compounds.[2] Most chemical structures on the Web up to 2007 have been represented as GIF files, which are not searchable for chemical content. The full InChI turned out to be too lengthy for easy sear
11:35 <@ChemSpiderMan> read the last line...
11:36 <@ChemSpiderMan> searching InChiString is problematic...the indexers BREAK it
11:36 <@ChemSpiderMan> generating InChis on the fly is VERY fast...but you will have to pass a connection table from WP to the InChI DLL
11:36 <@ChemSpiderMan> so you will need to STORE the connection table
11:36 <@ChemSpiderMan> or pass the SMILES to the InChI DLL or OpenBabel DLL
11:37 <@ChemSpiderMan> blah, blah, blah..
11:37 <+Rifleman_82> walkerma?
11:38 <@walkerma> I can't comment on the technical feasibility. But the benefits are immense if we can get it to work
11:38 <@walkerma> If we can do this, and ChemSpider is doing it, then people will be able to start using Google and actually finding things
11:39 <@walkerma> So you can start to say, "I want to see if there is any info on hexylbenzene"
11:39 <@walkerma> And you can search by InChIKey (I think these will become the de facto standard for organics for online searches)
11:39 <@ChemSpiderMan> why don't we simply set up a webservice on ChemSPider
11:40 <+Rifleman_82> can inchikeys be a permanent replacement for inchis ? such that we can think of structure --> inchikey, skipping inchi directly?
11:40 <@walkerma> Explain
11:40 <@ChemSpiderMan> WP can hit the appropriate search button and pass a CSID over to ChemSPider to spawn the search.
11:40 <@ChemSpiderMan> But this is making WP dependent on CS and I don't think you should.
11:41 <@ChemSpiderMan> InCHIs will be around for a long time...
11:41 <@ChemSpiderMan> no guarantee that CS will be
11:41 <@ChemSpiderMan> We have a whole of web services for InChI already
11:41 <@ChemSpiderMan> http://www.chemspider.com/InChI.asmx
11:41 <+Rifleman_82> we have pubchemid linking to pubchem, your chemspider id can be another... but we might have some complaints about conflict of interest
11:42 <@ChemSpiderMan> I'll guarantee that
11:42 -!- egonw [n=egonw@kokosnoot.wur.nl] has joined #wikichem
11:42 -!- mode/#wikichem [+v egonw] by ChanServ
11:42 <+Rifleman_82> hi egon
11:42 <+egonw> hi
11:42 <+egonw> I made it :)
11:42 <@walkerma> Great!
11:42 <+egonw> hi ChemSpiderMan!
11:42 <@ChemSpiderMan> hi
11:42 <+egonw> hi walkerma
11:43 <+egonw> 15 minutes, right?
11:43 <+Rifleman_82> no, you're 45 minutes late :P
11:43 <+egonw> no?!
11:43 <+egonw> really?
11:43 <+egonw> 17:00 UTC, not?
11:43 <@walkerma> Yes, sorry! 1600h UTC
11:43 <+CheMoBot> user:Cherry blossom tree has edited monitored page Wikipedia talk:WikiProject Chemistry - diff - (+880)- summary: /* Reminder of the Philip Greenspun Illustration project */ new section
11:44 <+Rifleman_82> you missed the beginning of our discussion
11:45 <+Rifleman_82> we talked about external commenters' comments about whether they use wikipedia, whether they need the inchis to be shown
11:45 <+egonw> ok, is the channel logged? then I'll read up...
11:45 <+Rifleman_82> yes it is, dmacks will publish later
11:45 <@walkerma> I just sent you a log
11:45 <+egonw> thanx
11:45 <@walkerma> Read your email
11:46 <@walkerma> We're wondering how we can best give people access to info like InChIs on WP, without cluttering up pages
11:46 <@walkerma> and causing display problems
11:47 <@walkerma> Main ideas on the table:
11:47 <@walkerma> 1. The linkfarm idea
11:47 <@walkerma> 2. The "click to see or search on InChI" idea
11:47 <@walkerma> 3. The [International Chemical Identifier box: InChI, InChIKey, CASRN, PIN] approach
11:48 <@walkerma> So Rifleman_82, which is the best option that is workable? And how should we proceed?
11:50 <+Rifleman_82> i'll give a conditional answer, because there are some issues i think need to be worked out
11:50 <+Rifleman_82> imho, i think it is best that the inchi be displayed, with a search link
11:51 <+Rifleman_82> if we treat inchi as plain text, and get it to break every 20 chars or so, we can avoid having it stretch across the screen
11:51 <+Rifleman_82> they will be soft breaks, not hard breaks
11:51 <+Rifleman_82> soft break = "text wrapping"
11:51 <+Rifleman_82> i don't think we should hide them
11:51 <+Rifleman_82> and i don't think it'll be easy to have a "show" to show the thing
11:51 <+egonw> there would be a start and end 'codon
11:51 <+egonw> '?
11:52 <+Rifleman_82> and if we can let it "show" without breaking the page, we can have it shown by default
11:52 <+Rifleman_82> am i making sense?
11:52 <+Rifleman_82> sheesh
11:52 <@walkerma> Yes, though I don't know how to do soft line breaks
11:53 <+egonw> a space?
11:53 <+Rifleman_82> perhaps you can give me til our next meeting to find out? ii'm sure it can be done, just a matter of how...
11:53 <+Rifleman_82> no, a space will break the string
11:53 <+Rifleman_82> render it unsearchable
11:53 <+Rifleman_82> it is conceptually identical to a manually added line break
11:53 <+egonw> right
11:53 <+Rifleman_82> the other problem is presentation - arbitrary spaces will look odd when the browser windows are non-identically sized
11:54 <@walkerma> It must be doable, because if you use
11:54 <+Rifleman_82> they usually wrap on existing spaces?
11:54 <@walkerma> I think....
11:54 <+egonw> I think so too
11:55 <+egonw> played with CSS to do this kind of thing, but never found something satisfying
11:56 <@walkerma> So for action, can Rifleman_82 agree to look into that?
11:56 <+Rifleman_82> yeah, i'll look into that
11:56 <+Rifleman_82> as a side issue
11:56 <+Rifleman_82> if we can use this method to make long IUPAC names break nicely, it will be an added bonus
11:56 <@walkerma> And Beetstra, do you have contacts we could ask about link farms, or do you think that idea is dead in the water?
11:57 <+Rifleman_82> break at hyphens preferably, break arbitrarily at 20 characters if need be
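The rule Rifleman_82 describes (break after hyphens where possible, otherwise after every 20 characters) can be sketched as below. This is only an illustration: it assumes the HTML `<wbr>` element is an acceptable break hint in the target page, which the log does not confirm, and the function name is hypothetical. Because the marker is markup rather than a character, stripping it recovers the original string, so the identifier stays searchable.

```python
def add_soft_breaks(text: str, max_run: int = 20, marker: str = "<wbr>") -> str:
    """Insert a break hint after each hyphen, or after max_run characters without one."""
    out = []
    run = 0  # characters emitted since the last break hint
    for i, ch in enumerate(text):
        out.append(ch)
        run += 1
        if i == len(text) - 1:
            break  # never add a trailing hint
        if ch == "-" or run >= max_run:
            out.append(marker)
            run = 0
    return "".join(out)


inchi = "InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2"
wrapped = add_soft_breaks(inchi)
print(wrapped)
# Removing the hints gives back the exact identifier.
print(wrapped.replace("<wbr>", "") == inchi)
```

The same function could be applied to long IUPAC names, where hyphens are the natural break points.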
11:57 <@walkerma> Sounds good, R82
11:58 <+Beetstra> Walkerma, it runs on chemistry.poolspares.com (don't ask about the domain, it is a pure test-wiki; and spam bots have found it already) ..
11:58 <+Rifleman_82> robots.txt?
11:58 <+Beetstra> No, they just scan for wikis and edit it
11:59 <+Beetstra> I should have blocked everything but administrators ..
11:59 <+Beetstra> See the chemical sources from the main page
11:59 <+Beetstra> and one of the example pages (water e.g.)
12:00 <+Beetstra> in the chembox are some links, like name, formula
12:01 <+Beetstra> But the developers were not really in to such special pages .. then there will be more etc.
12:02 <+Rifleman_82> heh, or we could migrate to poolspares :)
12:02 <@walkerma> OK, I will talk to my friend Kelson (Emmanuel) from the French WP, he's a full time developer
12:03 <@walkerma> I'll see if he can come up with a solution
12:03 <@walkerma> Either 1, 2 or 3!
12:03 <@walkerma> But in the meantime, the soft line breaks would make things much better...
12:04 <@walkerma> Can we move on in the agenda?
12:04 * Beetstra has to leave .. see you all later!
12:04 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit ["Bye Bye"]
12:04 <@walkerma> OK, thanks
12:04 <@ChemSpiderMan> bye Drik
12:04 <@ChemSpiderMan> Dirk
12:04 <@walkerma> I hope we can finish fairly soon
12:05 <@walkerma> But I'd like to ask if we can "officially" adopt InChIKeys
12:06 <@walkerma> i.e., we'd agree to support them as a standard format
12:06 <@walkerma> for representing structures
12:06 <@walkerma> I accept that full InChIs are already somewhat established
12:06 <+egonw> sounds good to me
12:06 <+Rifleman_82> are they unique?
12:06 <@walkerma> but I think for a quick copy, paste and search they are the way to go, and much better than SMILES
12:07 <+Rifleman_82> if they are, they can be primary key?
12:07 <+egonw> theoretically not, practically no clashes have been found
12:07 <+Rifleman_82> walkerma's sandbox has complaints of inchis not being unique
12:07 <+Rifleman_82> and since inchikeys are hashed from inchis...
12:07 <+egonw> inchis *are* unique
12:07 <@ChemSpiderMan> read the InCHi article on WP
12:07 <@walkerma> The chance of a clash has been estimated as 1 in <<total no. of molecules known
12:07 <+egonw> at least when the stereochemistry layer is taken into account
12:08 <+Rifleman_82> rather, can there be more than one inchi for the same molecule?
12:08 <@walkerma> The comment on my Sandbox refers to the fact that
12:08 <+egonw> no
12:08 <@ChemSpiderMan> but the probability for duplication of only the first 14 characters has been estimated as only one duplication in 75 databases each containing one billion unique structures.
12:08 <@ChemSpiderMan> there are no collisions in PubChem or ChemSpider
12:08 <@walkerma> you can get a different InChI if you draw the stereochemistry differently
12:08 <@ChemSpiderMan> probability is SOOOOOOO low
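The "one duplication in 75 databases" figure can be sanity-checked with a birthday-bound estimate. The sketch below assumes the first 14-character block of the key carries 65 bits of hash; that bit count is an assumption brought in for illustration, not something stated in this log.

```python
def expected_collisions(n_items: int, hash_bits: int) -> float:
    """Birthday-bound estimate of the expected number of colliding pairs."""
    pairs = n_items * (n_items - 1) / 2  # number of distinct pairs of structures
    return pairs / 2 ** hash_bits        # each pair collides with probability 2^-bits


# One database of a billion unique structures, 65-bit first block (assumed).
per_db = expected_collisions(10**9, 65)
print(f"expected collisions per database: {per_db:.4f}")
print(f"databases per expected collision: {1 / per_db:.0f}")
```

Under that assumption the estimate lands in the same range as the quoted figure, on the order of one collision per several dozen billion-structure databases.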
12:08 <@ChemSpiderMan> yes
12:08 <+Rifleman_82> i see, i was quite concerned about http://en.wikipedia.org/wiki/User:Walkerma/Sandbox5#Comments_1
12:08 <+egonw> walkerma: right, because those molecules are different
12:08 <@ChemSpiderMan> different InChI
12:08 <@walkerma> The R is different from the S is different from the unspecified
12:09 <+Rifleman_82> but if there are no issues, then that's fine
12:09 <@ChemSpiderMan> no issues
12:09 <@walkerma> And I really like the way ChemSpider uses InChIKeys
12:09 <@ChemSpiderMan> do not worry about clashes
12:09 <@walkerma> to do "related compounds" searches
12:09 <@ChemSpiderMan> There are about 5000-6000 structures on WP
12:10 <+Rifleman_82> is there a way to remove stereochemistry from inchikeys?
12:10 <@ChemSpiderMan> yes
12:10 <@ChemSpiderMan> It's the second layer
12:10 <+Rifleman_82> or to use inchi/inchikeys to do substructure searches?
12:10 <+Rifleman_82> similarity searches?
12:10 <+Rifleman_82> oh, i got it, it's the second half of the inchikey?
12:10 <+egonw> no
12:10 <+egonw> inchi(-leus) cannot be used directly for substructure searching
12:10 <+egonw> leus==keys
12:11 <@ChemSpiderMan> Third time...please read quickly
12:11 <@ChemSpiderMan> http://www.chemspider.com/news/searching-inchikeys-by-connectivities-only-with-and-without-stereo.html
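A connectivity-only search, as the linked post's title describes, can be approximated by comparing only the first block of the key (the 14 characters before the hyphen) and ignoring the rest. A minimal sketch: the function names are hypothetical, and only the first key below comes from this log; the other two are made-up placeholders, not real InChIKeys.

```python
from collections import defaultdict


def connectivity_block(inchikey: str) -> str:
    """Return the part of the key before the first hyphen."""
    return inchikey.split("-", 1)[0]


def group_by_connectivity(inchikeys):
    """Group keys that share a first block, i.e. the same skeleton regardless of stereo."""
    groups = defaultdict(list)
    for key in inchikeys:
        groups[connectivity_block(key)].append(key)
    return dict(groups)


keys = [
    "QHARIOGGLZILKP-UHFFFAOYAR",  # quoted in this log
    "QHARIOGGLZILKP-XXXXXXXXXX",  # placeholder: same first block, different second block
    "AAAAAAAAAAAAAA-UHFFFAOYAR",  # placeholder: different first block
]
for block, members in group_by_connectivity(keys).items():
    print(block, members)
```

This is still a lookup on exact hash blocks, so it finds stereo variants of one skeleton but gives no substructure or similarity search.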
12:11 <+Rifleman_82> you'll have to get some other program to take it, expand it, and search, and pass the search results back?
12:11 <+Rifleman_82> sorry tony, i'll go read
12:11 <@ChemSpiderMan> No...InChIkeys can only be looked up...CANNOT be reversed
12:11 <@ChemSpiderMan> It's a hash...cannot reverse it
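Since a hash cannot be reversed, "key to structure" only works as a table lookup over structures that someone has already registered. The sketch below illustrates this with a stand-in hash: `toy_key` is a truncated SHA-256 hex digest and is explicitly NOT the real InChIKey algorithm, and all function names are hypothetical. The InChI string is one quoted elsewhere in this log.

```python
import hashlib


def toy_key(inchi: str) -> str:
    """Stand-in one-way key (truncated SHA-256 hex); not the real InChIKey encoding."""
    return hashlib.sha256(inchi.encode("utf-8")).hexdigest()[:16].upper()


def build_lookup(inchis):
    """Index known InChIs by their key so a key can later be resolved."""
    return {toy_key(s): s for s in inchis}


def key_to_inchi(table, key: str) -> str:
    """Resolve a key by lookup; return an empty string when the key is unknown."""
    return table.get(key, "")


known = ["InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2"]
table = build_lookup(known)

print(key_to_inchi(table, toy_key(known[0])))  # resolves, because it was registered
print(repr(key_to_inchi(table, "0" * 16)))     # unknown key, nothing to return
```

The lookup is only as complete as the table behind it, which is why the size and growth of the database matter.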
12:11 <@walkerma> But you can use the ChemSpider lookup table for 20 million compounds, right?
12:12 <@ChemSpiderMan> absolutely
12:12 <+Rifleman_82> it's a static table, isn't it?
12:12 <@walkerma> http://www.chemspider.com/InChI.asmx?op=InChIKeyToInChI
12:12 <+egonw> hope not :)
12:12 <+Rifleman_82> and it can be defined for any arbitrary compound... whether or not it exists
12:12 <+egonw> I hope it keeps growing
12:12 <@ChemSpiderMan> everYDAY
12:13 <@walkerma> Yes, R82, it can
12:13 <@walkerma> For anything drawable
12:13 <@ChemSpiderMan> http://www.chemspider.com/InChI.asmx?op=InChIKeyToInChI is a search.....
12:13 <@ChemSpiderMan> it's NOT a conversion
12:13 <+Rifleman_82> got it, tony
12:13 <+egonw> Rifleman_82: any organic molecule at least
12:13 <@ChemSpiderMan> Internally searches InChIKey against ChemSpider database (>17M unique compounds). Returns empty string in case of failure.
12:13 <+Rifleman_82> though System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
12:13 <@ChemSpiderMan> :-)
12:13 <+Rifleman_82> tried to search morphine
12:14 <+Rifleman_82> okay let's not discuss my problems now
12:14 <+Rifleman_82> :P
12:14 <@ChemSpiderMan> note Egon's comments...
12:14 <+Rifleman_82> ok
12:14 <@ChemSpiderMan> must be able to generate an InChI...
12:14 <@ChemSpiderMan> many things cannot generate InChIs
12:14 <@ChemSpiderMan> markush, polymers, organometallics, inorganics ...there are exceptions of course
12:14 <+Rifleman_82> can inchi/keys be able to handle inorganics, organometallics, abnormal oxidation states, abnormal coordination modes?
12:15 <+egonw> generally not
12:15 <@ChemSpiderMan> there are lots of things it can handle and InChI itself is being extended by the InChI team
12:15 <+Rifleman_82> so it's a matter of time?
12:15 <@ChemSpiderMan> but today it's primarily for organic molecules
12:15 <@ChemSpiderMan> yes
12:15 <+Rifleman_82> ok
12:15 <@ChemSpiderMan> They ARE working on extending it
12:15 <+egonw> but anything which involves C,H,N,O,P,S,Si, ... it handles just fine
12:15 <@ChemSpiderMan> But it's the same issue I have....
12:15 <+Rifleman_82> octahedral carbon? :)
12:16 <@ChemSpiderMan> I am creating MOLFILES...
12:16 <+egonw> octahedral carbon??
12:16 <@ChemSpiderMan> and SDF
12:16 <+Rifleman_82> yeah, carbon with 6 ligands
12:16 <@ChemSpiderMan> and many things on WP cannot be captured that way
12:16 <+egonw> WP entry?
12:16 <+Rifleman_82> i'll drop you a note when i find it
12:16 <+egonw> you're not referring to some transition state?
12:16 <+Rifleman_82> no, no
12:17 <@ChemSpiderMan> InChI=1/CH12N6/c2-1(3,4,5,6)7/h2-7H2
12:17 <+Rifleman_82> my friend was telling me that togni made some weird carbon possessing an abnormal structure, carbon center with octahedral symmetry
12:17 <+Rifleman_82> i'll send you the paper when i find it
12:17 <@ChemSpiderMan> This is a carbon with 6 NH2 groups
12:17 <@ChemSpiderMan> and the InCHI key
12:17 <+Rifleman_82> wow, that's nice :)
12:17 <@ChemSpiderMan> QHARIOGGLZILKP-UHFFFAOYAR
12:18 <+egonw> oi, that's something my atom typing is going to fail on :)
12:18 <+Rifleman_82> inchi does not attempt to assume valence, like smiles does?
12:18 <+egonw> to some extent it is... e.g. for determining tautomerism etc
12:18 <+Rifleman_82> ok
12:19 <@ChemSpiderMan> my suggestion...let's NOT analyze InChI here...that's been done in many other places
12:19 <+egonw> :)
12:19 <+egonw> agreed, it's the best out there, and should be used whenever possible
12:19 <+Rifleman_82> ok
12:20 <@walkerma> OK, I think I'll post something on the wiki about InChIKeys, and we can go with the consensus from there
12:20 <@ChemSpiderMan> I suggest focusing on the issue at hand of whether to adopt it for structures where we can generate it
12:20 <@ChemSpiderMan> and I say yes
12:20 <+egonw> same here
12:20 <@walkerma> I'd like to ask one final thing
12:20 <@ChemSpiderMan> ok
12:21 <@walkerma> Can we convert ChemSpiderMan's SDF file into wiki syntax easily?
12:21 <@walkerma> In other words
12:21 <@walkerma> Can we upload the data from his validated file onto the wiki semi-automatically?
12:21 <@ChemSpiderMan> Rifleman,,,It's a text file
12:21 <@walkerma> If so, how will it be done?
12:21 <@ChemSpiderMan> Easy to separate each structure in the file.
12:21 <@walkerma> Will we need a bot?
12:22 <@ChemSpiderMan> I have the article name for each record so you CAN link it and stream in the text
12:22 <@ChemSpiderMan> for each record
12:22 <@ChemSpiderMan> I can name the fields for you
12:22 <+Rifleman_82> i don't think chem-awb can do it
12:22 <@walkerma> ok
12:22 <@ChemSpiderMan> SMILES, Name, InCHIString, InCHIkey etc
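Because an SDF file is plain text, pulling the named fields out of each record takes little code. A minimal sketch, assuming the usual SDF layout (records terminated by a `$$$$` line, data items introduced by a `> <FieldName>` header and ended by a blank line). The sample record is invented for illustration: its structure block is omitted and its InChIKey value is a placeholder; only the `URL` field name and the article URL appear in this log.

```python
import re

FIELD_HEADER = re.compile(r"^>.*<([^<>]+)>")  # e.g. "> <URL>"


def parse_sdf_fields(sdf_text: str):
    """Return one dict of data fields per '$$$$'-terminated record (structure blocks are skipped)."""
    records, fields, current = [], {}, None
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":  # end of record
            records.append(fields)
            fields, current = {}, None
            continue
        header = FIELD_HEADER.match(line)
        if header:
            current = header.group(1)
            fields[current] = ""
        elif current is not None:
            if not line.strip():  # a blank line ends the field value
                current = None
            else:
                fields[current] = f"{fields[current]}\n{line}" if fields[current] else line
    return records


sample = """\
example record
  (hypothetical; structure block omitted)

M  END
> <URL>
http://en.wikipedia.org/wiki/5-MeO-DET

> <InChIKey>
PLACEHOLDER-KEY

$$$$
"""

records = parse_sdf_fields(sample)
print(records)
```

Each resulting dict can then be mapped field by field onto ChemBox parameter names.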
12:22 <+Rifleman_82> you can pay grad students to be data entry clerks... 10 cents for each article...? :)
12:23 <@ChemSpiderMan> Egon will get how easy this is "theoretically"
12:23 <@ChemSpiderMan> transcription errors..
12:23 <@ChemSpiderMan> let's NOT have someone type it in....for 5000 articles
12:23 <@ChemSpiderMan> one slipped key :-(
12:23 <+Rifleman_82> haha
12:23 <+Rifleman_82> and you pay another grad student 10 cents per article to check them :P
12:24 <+egonw> a bot sounds like the way to go
12:24 <+Rifleman_82> LOL
12:24 <+egonw> however... that assumes ChemSpiderMan has the mapping...
12:24 <+egonw> also, consider overwriting fields...
12:24 <+Rifleman_82> if we're going to do this on a regular basis, perhaps we can write a dedicated maintenance bot
12:24 <@ChemSpiderMan> I can provide names to map...this is easy from my side (I think)
12:24 <@ChemSpiderMan> yes....would be ideal...
12:24 <+Rifleman_82> which runs through the entire list every now and then to check?
12:25 <+Rifleman_82> apart from ideas i'm not sure how to do it
12:25 <+egonw> " I can provide names to map"
12:25 <+egonw> based on compound names?
12:25 <@ChemSpiderMan> provided things have a ChemBox available this is easy I think
12:25 <@walkerma> Sounds good to me
12:25 <@ChemSpiderMan> We take the ChemBox fields and I map the names in my SDF to the ChemBox names
12:25 <+egonw> I'd have the bot cross check against other identifiers available in the ChemBox
12:25 <+egonw> like PubChem ID
12:26 <@ChemSpiderMan> Primary key is article name
12:26 <+egonw> and only if consistent add it, otherwise reports as 'difficult case'
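egonw's rule (cross-check against identifiers already in the ChemBox, add only when consistent, otherwise flag the article) might look like the sketch below. The field names, values, and the function are hypothetical; this is not an existing bot. It also never overwrites a field that is already filled in, which addresses the earlier concern about overwriting.

```python
def plan_update(chembox: dict, incoming: dict, check_fields=("PubChem",)):
    """Decide what a bot should do for one article.

    Returns ("add", new_fields) when nothing conflicts, or
    ("difficult case", conflicting_field_names) for manual review.
    """
    conflicts = [
        name for name in check_fields
        if name in chembox and name in incoming and chembox[name] != incoming[name]
    ]
    if conflicts:
        return "difficult case", conflicts
    # Only fill in fields the ChemBox does not have yet; never overwrite.
    new_fields = {k: v for k, v in incoming.items() if k not in chembox}
    return "add", new_fields


# Hypothetical data: identifiers agree, so the missing key can be added.
print(plan_update({"PubChem": "111"}, {"PubChem": "111", "InChIKey": "PLACEHOLDER-KEY"}))
# Hypothetical data: identifiers disagree, so the article is flagged.
print(plan_update({"PubChem": "111"}, {"PubChem": "222", "InChIKey": "PLACEHOLDER-KEY"}))
```

When the two sources share no identifier at all, this sketch still adds the data; a stricter bot could require at least one matching check field first.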
12:26 <@ChemSpiderMan> it's one of the fields in the SDF for each structure
12:26 <+egonw> ChemSpiderMan: ok, so you do got a "inchi <-> WP page" map
12:26 <+egonw> 1:1 that is
12:26 <@ChemSpiderMan> I am NOT copying PubChem IDs into my file...I am not checking them
12:26 <@ChemSpiderMan> hold on..
12:27 <@ChemSpiderMan> I have a field in the SDF called URL
12:27 <@ChemSpiderMan> This is one of them
12:27 <@ChemSpiderMan> http://en.wikipedia.org/wiki/5-MeO-DET
12:27 <+egonw> that map is curated? if so, that could go in automatically...
12:27 <@ChemSpiderMan> EXACTLY!!!
12:27 <@ChemSpiderMan> it's a breeze I believ
12:27 <@ChemSpiderMan> believe
12:27 <+egonw> I'd still cross check with info already present in the ChemBox
12:28 <@ChemSpiderMan> yes...for more checking
12:28 <+Rifleman_82> ChemSpiderMan: you want to tell us about the problems from the last 10 or 50?
12:28 <@ChemSpiderMan> :-( must I? I've sent the files and they are annotated to the maximum with the issues
12:28 <@ChemSpiderMan> I have stepped out of a meeting to do this...
12:28 <@ChemSpiderMan> I need to get back
12:28 <+Rifleman_82> ok
12:29 <+Rifleman_82> next time then
12:29 <@walkerma> The main issue is with CAS, right? I think that will be the agenda for another meeting
12:29 <@ChemSpiderMan> the big issue is the one I suggested...CAS NUMBERS
12:29 <@ChemSpiderMan> I think 3/7 did not agree with the structure drawn
12:29 <+egonw> CAS numbers are intrinsically difficult
12:29 <@ChemSpiderMan> so what's wrong...the structure or the CAS?
12:29 <@ChemSpiderMan> yes...I agree Egon...they need validating..
12:29 <@ChemSpiderMan> and what actually needs to be done
12:29 <+egonw> and would not suggest checking against CAS in the ChemBox
12:30 <@ChemSpiderMan> is the structure needs to be searched in the registry to FIND the right CAS number
12:30 <+egonw> we don't have access to the CAS database, so can't even (formally) validate those
12:30 <@walkerma> Yes
12:30 <@ChemSpiderMan> exactly.
12:30 <@walkerma> OK, shall we agree to meet next week - same time, same place, to talk about CAS nos?
12:30 <@ChemSpiderMan> ok
12:30 <+Rifleman_82> sure
12:30 <+Rifleman_82> quick one before we finish?
12:30 <@walkerma> OK
12:30 <+Rifleman_82> request, more than anything else
12:30 <@ChemSpiderMan> ok
12:30 <+Rifleman_82> take a look at http://en.wikipedia.org/wiki/User:Rifleman_82/Functional_groups_style_guidelines and edit as you see fit!
12:31 <+Rifleman_82> once it's finalized i'll start making navboxes and rewriting articles to fit
12:31 <+Rifleman_82> i mean, that is if you guys agree with the whole idea in the first place
12:31 <+egonw> one quick comment
12:31 <@walkerma> OK, I was planning on giving feedback on that today
12:31 <+egonw> functional groups down there -> azo compound
12:32 <+egonw> please be careful about mixing up 'contains the functional group' and 'compound classes'
12:32 <@walkerma> Yes, I was going to say the same thing
12:32 <+Rifleman_82> it started off as functional groups
12:32 <@walkerma> (I'm really bad at mixing them up myself!)
12:32 <+Rifleman_82> and i stole the functional groups template
12:32 <+egonw> in the sense that: don't mix them up
12:32 <+Rifleman_82> but i thought about it and i think we can include more than that, which i spelled out in the scope
12:32 <@walkerma> Hydroxy= functional group, Alcohol = family
12:32 <+Rifleman_82> at the top
12:33 <+egonw> alcohol is actually both :)
12:33 <+Rifleman_82> amine too
12:33 <+Rifleman_82> amine group, amine...ssss
12:33 <+egonw> but azo compound certainly not
12:33 <@walkerma> OK, shows how confused I am...! My text says otherwise...
12:33 <+egonw> and alkane is typically not considered a functional group
12:33 <+egonw> but more like the 'unfunctional' part of the molecule
12:33 <+Rifleman_82> haha
12:34 <+Rifleman_82> true true
12:34 <@ChemSpiderMan> gotta go guys
12:34 <@ChemSpiderMan> bye
12:34 <@walkerma> Yes, byt
12:34 <+egonw> bye ChemSpiderMan
12:34 <@walkerma> bye
12:34 <+Rifleman_82> nite tony
12:34 <+egonw> nice chatting with you for once (instead of blogging :)
12:34 -!- ChemSpiderMan [n=ChemSpid@c-68-33-151-242.hsd1.md.comcast.net] has left #wikichem []
12:34 <@walkerma> See people next week! Thanks for coming, Egon
12:34 <+egonw> it's a really horrible time for me to attend
12:34 <+egonw> both 16 and 17 UTC
12:34 <+Rifleman_82> what time is it for you, egonw?
12:35 <+egonw> I'm UTC+1
12:35 <+Rifleman_82> close to the end of office hours/
12:35 <+egonw> so dinner time, and/or being-in-traffic time
12:35 <+egonw> just after it, actually...
12:35 <+Rifleman_82> walkerma: tomorrow's my last day at work, so next week onward i can stay much later... would that help anyone?
12:35 <@walkerma> I could perhaps meet one hour earlier - but we have to be careful, many people in western US then would get excluded
12:35 <+egonw> same goes for those second nature meetings :(
12:36 <@walkerma> 2nd life?
12:36 <+egonw> walkerma: maybe alternate every week
12:36 <+egonw> right... see that nature blog
12:36 <@walkerma> It's always much better if you can keep the same time
12:36 <+egonw> walkerma: true
12:36 <@walkerma> Otherwise people show up 45 mins late....!
12:36 <+egonw> :)
12:37 <+egonw> had 17 in mind... apparently already shifted that to my time zone... so did that twice
12:37 <@walkerma> Well, I think we should keep the same time, at least for next week. I'll poll people about other poss times. Henry Rzepa often has a seminar at this time
12:37 <+Rifleman_82> don't feel bad... i screwed up the timezones and gave walkerma a lot of trouble for our first meeting too
12:37 <+Rifleman_82> :P
12:37 <@walkerma> OK, I'd better go
12:37 <+Rifleman_82> ok
12:38 <+Rifleman_82> good night!
12:38 <@walkerma> Bye!
12:38 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has quit ["ChatZilla 0.9.80 [Firefox 2.0.0.11/2007112718]"]
--- Log closed Tue Jan 29 12:38:58 2008