Wikipedia:WikiProject Chemistry/IRC discussions/15 Apr 2008

--- Log opened Tue Apr 15 11:59:59 EDT 2008

12:00 <+Physchim62> OK, it's 1600 UTC

12:01 <+Beetstra> I think that is still a minute away .. :-p

12:02 <+Beetstra> Oops .. my clock was 3 minutes wrong ..

12:02 <+Physchim62> either your watch is wrong or the Spanish time signal is two minutes *in advance*

12:03 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has joined #wikichem

12:03 <+Physchim62> ah, Martin!

12:03 <walkerma> Hi! I can't stay long but I wanted to quickly update you

12:03 <+Physchim62> you have the floor

12:03 <walkerma> Thanks!

12:04 <walkerma> I just had a phone call with CAS (about 5 minutes ago), and the bad news is that it will be hard for them to check our individual substances

12:05 <+Physchim62> and the good news?

12:05 <walkerma> The good news is that they will give us their own collection of 5000-1000 substances that THEY have determined to be "high traffic substances"

12:05 <+Physchim62> I'm annoyed

12:05 <+Physchim62> but carry on

12:06 <walkerma> They have a collection which they have found gets lots of hits in their own database

12:06 <+Physchim62> unacceptable

12:06 <walkerma> which they believe to be substances of general public interet

12:06 <walkerma> interest

12:06 <walkerma> That seems like a good idea for two reasons:

12:07 <walkerma> (wait please PC!)

12:07 * Physchim62 slpas the Chair with a Large Wet Haddock

12:07 <+Physchim62> slaps even

12:07 <walkerma> 1. They have a concern - as raised by Egon - that someone will create WP pages just to get validated CAS #s into the public domain

12:08 <walkerma> This approach ensures that obscure compounds don't creep in

12:08 <walkerma> 2. Perhaps the most important point - they can give us such a collection in the next day or two

12:08 <+Physchim62> people already do that

12:09 <+Physchim62> I propose that CAS publish the list off their own website

12:10 <walkerma> I'm going to have to go in a minute. Whoever is logging - please can you hold off on publishing this, until we have received the file? They have been swamped with people contacting them since the announcement, and we want to keep things under control

12:10 <+Physchim62> all the machinery that we have put in place for CAS verification can be used to compare the numbers with articles, that is not a problem

12:10 <walkerma> PC: The reality is that it would be a LOT of work for them to slog through our list, at least that's what they are telling us

12:10 <+Physchim62> I will hold off publishing my log, yes, of course

12:11 <+Beetstra> Indeed .. this sucks

12:11 <walkerma> And my guess is that we can get most of our collection validated very quickly

12:11 <+Beetstra> So we have 1000 reliable numbers .. and 6000 non-reliable numbers ..

12:11 <+Beetstra> sure .. that is making a difference

12:11 <+Physchim62> I do not accept that. they have had a lot of publicity from their offer to WP and now they are renouncing it

12:12 <+Physchim62> and CAS get to choose which numbers they give us

12:12 <+Beetstra> I concur .. this is not good

12:12 <walkerma> We will go from zero to perhaps 6000 validated CAS nos in a week - that's good I think

12:12 <+Beetstra> Yes .. but please keep CAS out of the publicity here

12:12 <+Physchim62> no, it is bad, it is worse than the situation we had before

12:12 <walkerma> The change will be announced openly

12:12 <walkerma> The good news longer term - they are willing to work with us.

12:12 <+Physchim62> the change will be attacked openly (ie, not here and now)

12:13 <walkerma> I said that we could probably try to check our collection for obscure compounds, and then send them a list, I think they are willing to look at these

12:13 <walkerma> But it is simply much more efficient for them to work this way

12:14 <walkerma> We do have a genuine good relationship here, I can say more later, and I think we should be very glad for getting this collection - it's the first time in their history that they have made such a collection open to the public

12:14 <+Physchim62> bullshit!

12:15 <walkerma> Must go! I'll leave IRC on but I'll be away

12:15 <+Physchim62> sorry Martin, but these are published data

12:15 <+Physchim62> I'll email you later or tomorrow

12:17 -!- ChemSpiderman [n=tony@c-68-33-211-217.hsd1.md.comcast.net] has joined #wikichem

12:18 <+Physchim62> ChemSpiderMan, you have just missed the bad news from CAS

12:18 <ChemSpiderman> uh-oh.

12:18 <ChemSpiderman> what news?

12:18 <+Physchim62> [18:04] <walkerma> I just had a phone call with CAS (about 5 minutes ago), and the bad news is that it will be hard for them to check our individual substances

12:18 <+Physchim62> [18:05] <Physchim62> and the good news?

12:18 <+Physchim62> [18:05] <walkerma> The good news is that they will give us their own collection of 5000-1000 substances that THEY have determined to be "high traffic substances"

12:19 <ChemSpiderman> hmmm..interesting

12:19 <+Beetstra> Heh .. so we have to add ref to those 1000-5000 saying "this CAS number has been verified by CAS" ..

12:19 <+Physchim62> Martin is away from the computer at the moment, so I can't get him to explain further

12:21 <+Physchim62> for me, this is unacceptable for several reasons: 1) CAS is going back on the agreement that they made with Martin less than a month ago to validate about 7000 compounds

12:21 <+Physchim62> 2) CAS gets to choose which numbers it publishes, not us

12:21 <ChemSpiderman> This is not surprising...

12:22 <ChemSpiderman> where are you on our planet? You up for a phone call? I can call you...I was at ACS last week and sensed something...

12:22 <+Physchim62> 3) CAS is trying to enforce a right which it doesn't actually possess, ie its copyright in CASRNs (TM)

12:23 <+Physchim62> ChemSpiderman, I'm in Spain, and I would love to have a quick chat, but I'm not at home at the minute

12:23 <ChemSpiderman> no probs...

12:23 <+Beetstra> 4) CAS dictates which 5000 compounds are important .. which may be that those are the 5000 we already have .. but it may also be that it is 4000 are there, and 1000 not ..

12:23 <ChemSpiderman> you might end up with a series of new articles with no info other than a structure and a CAS number if it goes this way

12:23 <+Beetstra> .. within the 7000 we already have ..

12:24 <ChemSpiderman> I find it hard to believe that comparing 5000 structures with CAS numbers is difficult.

12:25 <ChemSpiderman> Don't forget we host 20 million...we do stuff like this all the time

12:25 <ChemSpiderman> and we don't have a boatload of people to throw at the problem

12:25 <+Physchim62> yes, but WP is intended to have text as well

12:28 <ChemSpiderman> Sorry? Don't understand the comment...

12:28 <+Beetstra> For the compounds that don't exist yet in Wikipedia

12:28 <+Physchim62> no, it was me who misunderstood your comment

12:29 <+Beetstra> They are going to be stubs: {{chembox new|name=something|section3={{chembox identifiers|CAS=123-45-6}} }}{{stub}}
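
Note: the CAS number in the stub template above (123-45-6) reads as a placeholder, and it would not pass the standard check-digit rule for CAS Registry Numbers. A minimal sketch of that rule follows, as one cheap test that could run before any comparison against a validated list. The function name is hypothetical and is not part of the verification tooling mentioned in the log. The check only catches malformed or mistyped numbers; it says nothing about whether a number belongs to the compound in the article.

```python
import re


def is_valid_cas_format(cas: str) -> bool:
    """Return True if `cas` looks like a CAS RN and its check digit matches.

    Hypothetical helper for illustration only: it tests the layout
    (2-7 digits, 2 digits, 1 check digit) and the checksum, not whether
    the number is actually assigned to a given substance.
    """
    match = re.fullmatch(r"(\d{2,7})-(\d{2})-(\d)", cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the rightmost body digit by 1, the next by 2, and so on,
    # then compare the sum modulo 10 with the final digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


if __name__ == "__main__":
    for number in ("7732-18-5", "71-43-2", "123-45-6"):
        print(number, is_valid_cas_format(number))
```

Water (7732-18-5) and benzene (71-43-2) pass this test, while the placeholder 123-45-6 does not.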

12:29 <+Physchim62> it seems to be a political problem in CAS: some people want to be open, some people want to be closed

12:29 <ChemSpiderman> By the way...take a look at http://www.wichempedia.org/Chemical-Structure.10194104.html

12:30 <ChemSpiderman> this is ChemSpider's approach to subsetting the Wikipedia:Chemistry data. We scrape the first paragraph and link back to Wikipedia for the full article

12:31 <ChemSpiderman> all done under FDL as noted on the site.

12:31 <+Physchim62> impressive!

12:32 <ChemSpiderman> I welcome your comments here or on the blog : http://www.chemspider.com/blog/wichempedia-very-early-beta-is-released-using-new-chemspider-dedicated-website-approach.html

12:32 <ChemSpiderman> Thanks

12:32 <ChemSpiderman> It was a long night getting this ready for today to share

12:32 <ChemSpiderman> there are teething problems but it's pretty good

12:33 <ChemSpiderman> It makes WP:Chem structure searchable now...but from outside WP. Structure/substructure

12:34 <+Beetstra> Oh wait .. ChemSpiderman, did Walkerma talk to you about the extended functions of the mediawiki interface?

12:34 <+Beetstra> You scrape the first paragraph, right .. from the html?

12:35 <ChemSpiderman> don't think so...

12:35 <ChemSpiderman> Walkerma didn't talk Mediawiki interface as I recall...

12:35 <+Beetstra> OK .. how do you get the data from wikipedia .. you load the page?

12:36 <+Physchim62> structure searchability IS Good News. I don't think we'll be able to persuade the developers to include it in MediaWiki for quite some time yet, but it is nice to have some portals that can do it.

12:37 <ChemSpiderman> if there's some details you want to share please do...

12:37 <ChemSpiderman> we use http://en.wikipedia.org/wiki/Special:Export/Tylenol for example

12:38 <+Beetstra> OK, that is already quite OK

12:38 <+Beetstra> I was suggesting to use ChemSpiderman: see http://en.wikipedia.org/w/index.php?title=benzene&action=raw

12:38 <ChemSpiderman> the text content is then parsed with a home-built algorithm
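
Note: the two URL forms mentioned above (Special:Export and action=raw) can be built mechanically from a page title. The sketch below shows that, plus one simple way to pull a first paragraph out of raw wikitext. ChemSpider's home-built parser is not described in the log, so the parsing here is an assumption made for illustration: it drops anything inside {{...}} templates and returns the first non-empty block of text. All function names are hypothetical, and the http:// base URL is copied from the links in the log.

```python
from urllib.parse import quote, urlencode

BASE = "http://en.wikipedia.org"  # as written in the log


def export_url(title: str) -> str:
    """URL of the XML export for one page (Special:Export/<title>)."""
    return f"{BASE}/wiki/Special:Export/{quote(title.replace(' ', '_'))}"


def raw_url(title: str) -> str:
    """URL that returns the page's wikitext (index.php?action=raw)."""
    return f"{BASE}/w/index.php?" + urlencode({"title": title, "action": "raw"})


def strip_templates(wikitext: str) -> str:
    """Remove {{...}} templates, including nested ones."""
    kept = []
    depth = 0
    i = 0
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            i += 2
        else:
            if depth == 0:
                kept.append(wikitext[i])
            i += 1
    return "".join(kept)


def first_paragraph(wikitext: str) -> str:
    """Return the first non-empty paragraph left after removing templates."""
    for block in strip_templates(wikitext).split("\n\n"):
        text = " ".join(block.split())
        if text:
            return text
    return ""


if __name__ == "__main__":
    sample = (
        "{{chembox new|name=Benzene|section3="
        "{{chembox identifiers|CAS=71-43-2}} }}\n"
        "'''Benzene''' is an organic compound.\n\nIt is a liquid."
    )
    print(raw_url("benzene"))
    print(first_paragraph(sample))
```

Wiki markup such as bold quotes and links is left in the returned text; a real scraper would need to handle that, as well as tables, images and comments.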

12:38 <+Beetstra> Yes

12:39 <+Beetstra> Another thing you might want to have a look at is the api .. http://en.wikipedia.org/w/api.php

12:40 <+Beetstra> That can give you e.g. category-content in an easy format ..

12:40 <ChemSpiderman> will review

12:40 <+Beetstra> So you can see quickly if there are things added
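
Note: a rough sketch of the category check suggested above, using the API's `list=categorymembers` query. The category name, the sample response and the function names are assumptions for illustration; the script only builds the request URL and compares two sets of titles, so fetching the URL and paging through large categories are left out.

```python
import json
from urllib.parse import urlencode

API = "http://en.wikipedia.org/w/api.php"  # as written in the log


def category_members_url(category: str, limit: int = 500) -> str:
    """Build a query URL that lists the pages in one category as JSON."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmlimit": limit,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def member_titles(response_text: str) -> set:
    """Extract page titles from a categorymembers JSON response."""
    data = json.loads(response_text)
    members = data.get("query", {}).get("categorymembers", [])
    return {member["title"] for member in members}


def newly_added(previous: set, current: set) -> list:
    """Titles present now that were absent from the previous snapshot."""
    return sorted(current - previous)


if __name__ == "__main__":
    # Hypothetical response, shaped like the API's JSON output.
    sample = json.dumps({
        "query": {
            "categorymembers": [
                {"pageid": 1, "ns": 0, "title": "Methane"},
                {"pageid": 2, "ns": 0, "title": "Ethane"},
            ]
        }
    })
    print(category_members_url("Category:Alkanes", limit=10))
    print(newly_added({"Methane"}, member_titles(sample)))
```

Keeping the previous set of titles between runs is enough to spot additions; removals can be found the same way with the sets swapped.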

12:40 <ChemSpiderman> thanks

12:40 <+Physchim62> I'm going to have to disappear for a while, partly to fuel my tobacco addiction and partly to consider what should be done about the CAS announcement (when it's publicly made)

12:41 <ChemSpiderman> when will the announcement be made? I'll call Martin to chat

12:41 <ChemSpiderman> Very disappointing but NOT a surprise...

12:41 <ChemSpiderman> I did meet with ACS pubs last week..."interesting"

12:42 <+Physchim62> CAS should be giving us their list in the next couple of days: I would repeat Martin's request NOT to make things public before he has the list of numbers

12:42 <ChemSpiderman> I will say nothing until Martin and I have talked...

12:42 <+Physchim62> me niether!

12:42 <ChemSpiderman> You might be interested in this

12:42 <+Physchim62> neither, even

12:42 <ChemSpiderman> http://www.chemspider.com/blog/intention-to-scrape-crystaleye-content-and-staying-in-relationship-with-publishers.html

12:42 <ChemSpiderman> After 6 months

12:43 <ChemSpiderman> and a committee meeting between ACS and CAS

12:43 <ChemSpiderman> to discuss my question

12:43 <ChemSpiderman> they chatted with me in New Orleans...and I still left without an answer

12:43 <ChemSpiderman> Please don't publish this on WP!!!

12:44 <+Physchim62> don't worry, it's not the first time we've had sensitive stuff discussed here ;)

12:45 <ChemSpiderman> If we have more questions about hooking WiChempedia to Wikipedia in a facile manner who should I come to?

12:45 <+Physchim62> I got involved in WP by telling Martin off for telling people how to make chemical weapons :P

12:46 <+Physchim62> I can try to get you in touch with people, but the first name out of my hat would be Beetstra

12:46 <+Physchim62> :P

12:47 <ChemSpiderman> how did I know..

12:47 <+Physchim62> I don't have that sort of technical competence

12:47 * Beetstra hides again

12:47 <+Physchim62> Beetstra, if you don't want to do it, I can try to find other people

12:48 <+Beetstra> No, it is fine

12:48 * Beetstra blocks another spammer

12:49 * Physchim62 salutes Beetstra for his commitment to the Cause :P

12:49 <ChemSpiderman> thanks Beetstra..I will approach you should we hit a wall.

12:49 <+Beetstra> There is a lot built in the interface to make interfacing easier

12:50 <ChemSpiderman> Would appreciate either of you posting about WiChempedia on WP to garner feedback. I only know certain pages....

12:50 <ChemSpiderman> we need the help!

12:50 <+Beetstra> Just use the wikiprojects .. that is most neutral

12:50 <+Physchim62> OK, I must go and find tobacco now, thank you all for your contributions, speak to you soon!

12:50 -!- Physchim62 [n=Physchim@unaffiliated/physchim62] has quit ["What did you say this button does?"]

12:51 <+Beetstra> And keeps you from being accused of COI ..

12:51 <ChemSpiderman> thanks!

12:52 <+Beetstra> I have to leave soon as well ..

12:52 <ChemSpiderman> bye all

12:52 -!- ChemSpiderman [n=tony@c-68-33-211-217.hsd1.md.comcast.net] has quit []

13:17 -!- Beetstra [n=djbeetst@Wikimedia/Beetstra] has quit [Connection timed out]

13:43 -!- walkerma [n=chatzill@admin-151-108.potsdam.edu] has quit [Remote closed the connection]

--- Log closed Tue Apr 15 13:44:26 EDT 2008