Wikipedia talk:WikiProject Chemistry/CAS validation
CAS Discourages Using SciFinder for curating 3rd party databases (e.g. Wikipedia)
Chemical Abstracts Service (CAS) objects to anyone encouraging the use of SciFinder® and STN® to curate third-party databases or chemical substance collections, including the one found in Wikipedia. SciFinder and STN are provided to researchers under formal license agreements, under which the researchers agree to refrain from using these tools to build databases. We urge and expect those researchers to respect the explicit terms of the agreements they have entered into. CAS is a division of the American Chemical Society. Please contact CAS if you have questions. Eric Shively, CAS, eshivelyATcas.org Eshively (talk) 20:56, 5 March 2008 (UTC)
- Thanks for your reply, I contacted CAS last month and was hoping to hear from someone there. Thanks for making contact and clarifying the limitations. I'll be in touch soon, Walkerma (talk) 03:43, 7 March 2008 (UTC)
- I have blogged this (thanks to Chemspiderman) and am appalled by it. Unless Chemical Abstracts changes its policies, I think the only logical and safe thing to do is to boycott the use of CAS numbers anywhere in Wikipedia. (There should of course be factual entries about CAS and the CAS number system.) There is no way of proving to CAS that information in Wikipedia has not been scraped from SciFinder, but as Wikipedia rightly honours copyright it should assume that with an aggressively unhelpful copyright holder all information comes from third-party sources. Petermr (talk) 05:02, 8 March 2008 (UTC)
- Well, does this mean that there is no way of correcting structures according to their CAS numbers? And this indeed raises the question of whether CAS numbers are of any use in the public domain. Looks like a classic license compatibility problem. And this raises the following points. 1. It is not allowed to use SciFinder® and STN® for curating CAS numbers. 2. Is it allowed to use scientific publications containing CAS numbers for curating CAS numbers? JKW (talk) 08:53, 8 March 2008 (UTC)
- CAS numbers are not subject to copyright. Wikipedia did not enter an agreement with the parties involved with CAS. Any individual should check whether he/she is bound by any contract that would prohibit entering CAS numbers in Wikipedia. Anyone else should continue to improve the articles, which includes validating CAS numbers. -- 89.12.246.73 (talk) 15:51, 8 March 2008 (UTC)
- FYI: /CAS-German Wikipedia -- 89.12.246.73 (talk) 16:01, 8 March 2008 (UTC)
- I agree with Eric Shively wholeheartedly on this issue. CAS databases' terms and conditions should be respected (unless they conflict with each other, which makes this impossible). Equally, I would not be in favour of a boycott or removal of any information wrt CAS or CAS numbers from Wikipedia. This would damage the Wikipedia information resource and undermine the contributions of users. There are many sources of CAS numbers (e.g. some chemical catalogues) from which I think (without being a lawyer) it is perfectly legal to obtain these identifiers (as I see them), though I accept CAS databases must be the most reliable source. Will-ocw (talk) 20:10, 8 March 2008 (UTC)
- IMHO, the ideal solution is for us to work together with CAS, and that is what I want WP:Chem to do. At one of the recent Wikipedia IRC meetings where we discussed CAS numbers, that was the approach we agreed to take, and the one I have been pursuing. Whatever opinions people may hold on the copyright issue, most would agree that CAS registry numbers are more established and more reliable than the alternatives, while InChIKeys (while valuable) are based on molecular structure (microscopic), not chemical composition (macroscopic). I think if we lose CAS from the chemical information community, it would be a great loss to all, so instead I would rather find a way to break down the walls. If ACS and CAS can fully participate in the new web-based world of open information, then everybody gains a huge amount - except perhaps for ACS's commercial competitors. We just need ACS and CAS to see it that way! Walkerma (talk) 03:41, 9 March 2008 (UTC)
- Comment--Please examine the exact wording of their notice. Even they did not explicitly say not to use the registry numbers. And they referred only to SciFinder and STN, the electronic services, and in terms of the licensing, not the copyright. CAS registry numbers are available from other sources--including their printed Chemical Abstracts. Most libraries in the US no longer get it in print, but some of the large public libraries still do--including the New York Public Library, and many libraries in other countries also. The numbers can also be found for many compounds through a variety of other sources.
CAS has claimed in the past that the registry numbers are copyright, though to the best of my knowledge this has never been tested in court. They would be hard put to show that the use of a limited number of them, obtained individually in observance of the license, was not a fair use. They do not make that claim here. DGG (talk) 10:17, 9 March 2008 (UTC)
I trust that the chemical community, on Wikipedia and elsewhere, will treat this missive from CAS with the contempt it deserves.
- It is clear that CAS places the maximization of its revenue above the provision of chemical information. What does CAS object to here? That researchers use its products to find chemical information, or that this information is published? In either case, its stance is both ludicrous and profoundly anti-scientific.
- In a discussion about CAS registry numbers, it should be pointed out that these are used by many governments and international organizations (see, e.g., 29 CFR 1910.1000) and innumerable commercial firms (e.g. chemical suppliers). Indeed, they would not be interesting for WP if they were not so widely used! CAS tacitly admits that it cannot control this use through copyright law, as has been discussed at length on WP, which is why it has to resort to contract law in the form of the draconian license terms it imposes for access to its databases.
- However, CAS is effectively a monopoly supplier of much chemical information, as can be seen from the prices it manages to charge for access to its databases. The restrictions it purports to impose on the reuse of its “product” would appear to breach anti-trust legislation on both sides of the Atlantic. Users of CAS databases in the European Union can take heart from Art. 8.1 of the Database Directive (96/9/EC):
- “The maker of a database which is made available to the public in whatever manner may not prevent a lawful user of the database from extracting and/or re-utilising insubstantial parts of its contents, evaluated qualitatively and/or quantitatively, for any purposes whatsoever.”
I call on CAS to make it clear that the information contained in its databases may be freely reused in accordance with the principles of chemical science and the laws of the jurisdictions in which it operates. Physchim62 (talk) 12:24, 10 March 2008 (UTC)
- I'm curious about how much is insubstantial. I don't think we aspire to include even 1% of the CAS registry numbers, but is that "insubstantial" enough? --Itub (talk) 12:29, 10 March 2008 (UTC)
I have blogged about this and received considerable feedback. There is clearly a difference of opinion in the WP community about whether to try to deal with CAS or not. I side with Physchim62, but I think there is also a dispassionate reason why WP should withdraw any mention of CAS numbers, and I have outlined this in my blog. Simply:
- Wikipedia requires authoritative sources for its information.
- The assignment of a CAS number to one or more WP entries requires the authority of CAS
- CAS forbids WP to use this authority
- Therefore WP cannot include CAS numbers if it wishes to uphold its principles of authoritative sources - there are NONE available to it.
So I think WP would violate its own principles by including CAS numbers Petermr (talk) 14:56, 10 March 2008 (UTC)
- With all due respect to Petermr, his argument fails on the second point: CAS numbers are used (and often misused) without the intervention of CAS. Both the US federal government and the European Union quote them in their regulations, for example. The people who would benefit from "correct" CAS numbers being included in WP are CAS themselves. That is the irony of it all! Physchim62 (talk) 17:51, 10 March 2008 (UTC)
- Two more aspects: 1. The question must be raised whether there are not already plenty of examples of copyright misuse in 3rd party databases and especially journal publications. Just imagine X publications publishing Y CAS numbers each, and the total would be higher than 10000. This is a copyright violation, isn't it? We need a lawyer to answer how valid CAS copyright claims are from a public domain and misuse point of view. 2. I honestly think that no information is better than wrong information, which here means CAS numbers. Since we cannot use CAS services for validation, can we include CAS numbers at all in Wikipedia? How do we verify that they are correct? JKW (talk) 21:14, 10 March 2008 (UTC)
I'll quote from my blogpost "I say let’s not abandon hope regarding CAS opening their numbers to the world just yet. This dialog is likely sparking discussions already. Let’s keep it out there and establish a groundswell of concern and support and hope that the right thing can happen for our good and for CAS. I have great respect for many of their people and their work and want the resolution to be appropriate for all parties." --ChemSpiderMan (talk) 00:25, 11 March 2008 (UTC)
- To reply to JKW, it's not a lawyer we need to determine whether the claims of copyright in CAS Registry Numbers® are valid, but a judge! The issue has been discussed at length over the last two years on Wikipedia (English and German) and the consensus seems to be that there is no copyright for lack of creativity. For example, see Matthew Bender v. West Pub. Co. (158 F.3d 693). Physchim62 (talk) 15:48, 11 March 2008 (UTC)
New announcement from CAS
CAS, a division of the American Chemical Society, is pleased to announce that it will contribute to the Wikipedia project. CAS will work with Wikipedia to help provide accurate CAS Registry Numbers® for current substances listed in Wikiprojects-Chemicals section of the Wikipedia Chemistry Portal that are of widespread general public interest.
The CAS Registry is the world’s most comprehensive collection of chemical substances and the CAS Registry Number is the recognized global standard for chemical substance identification.
CAS views Wikipedia as an important societal tool for the general public, and this collaboration with Wikipedia is in line with CAS’ mission as a Division of the American Chemical Society.
We look forward to working with the Wikipedia volunteers over the next few weeks to make this happen. Eshively (talk) 13:40, 12 March 2008 (UTC)
- Well that's great news, both for CAS and for Wikipedia. I can only hope that the same constructive spirit is shown to other legitimate users of information from CAS databases ;) Physchim62 (talk) 14:19, 12 March 2008 (UTC)
- This is a terrific outcome. I think the collaboration between CAS and Wikipedia will definitely assist us in producing a high quality validated dataset of chemical compounds as well as demonstrate to the community the intentions for both parties to collaborate for the public good. This is a good outcome! I have blogged it here. --ChemSpiderMan (talk) 15:33, 12 March 2008 (UTC)
- Congratulations and thanks for initiating this discussion, contributing, and bringing it to a broader audience! My personal special thanks go to the very constructive contributions of ChemSpiderMan, User:Walkerma, and Physchim62. JKW (talk) 17:12, 15 March 2008 (UTC)
Blog Feedback
- http://www.chemspider.com/blog/cas-discourages-using-scifinder-to-help-curate-wikipedia-structures-and-cas-numbers.html --ChemSpiderMan (talk) 00:25, 11 March 2008 (UTC)
- http://wwmm.ch.cam.ac.uk/blogs/murrayrust/?p=1003
- http://opendotdotdot.blogspot.com/2008/03/worlds-leading-anti-scientific-society.html
- http://www.earlham.edu/~peters/fos/2008/03/acs-blocks-use-of-industry-standard.html
- http://www.chemspider.com/blog/enforcing-copyright-of-cas-numbers.html
- http://www.chemspy.com/chemistry-news/copyright-and-cas-numbers.html
- http://miningdrugs.blogspot.com/2008/03/cas-numbers-are-not-public-domain-are.html
--Historiograf (talk) 21:49, 10 March 2008 (UTC) and JKW (talk) 23:48, 10 March 2008 (UTC)
First block of 25
I've looked at the first block, and there appear to be some problems with stereochemistry already. A few comments. Checking the PDF against SciFinder as-is (for someone using only one monitor/laptop screen) requires a lot of alt-tab window switching. Copying individual CAS numbers from the PDF into the SciFinder window (which can accept a text file or multiple lines) is not very efficient.
Tony, is there a way to generate the list in an Excel file, CAS on left, structure on right, which I can print out? Then I can copy 10 or 25 CAS #s into SciFinder, print out, and check one by one. It'll save a lot of paper compared with printing the 25 pages as 25 pages.
Walkerma, I'll send you a summary of the Scifinder output by email. --Rifleman 82 (talk) 09:30, 26 January 2008 (UTC)
- Yes, I can generate an Excel like file for you and send it forward. More work for me but it will reduce the work for you all so I'll do it. I'm not sure when I'll get to it but I might get it done today. If not today then it's unlikely to be until next week because of some meetings I am hosting next week.--ChemSpiderMan (talk) 13:44, 26 January 2008 (UTC)
- Rifleman...before I wade too deeply into the issues, can you comment on the problems you are seeing? Lack of stereochemistry? Incorrect stereochemistry...I assume the latter. What we have to agree on is whether the primary key is the structure name, so that we find the CAS reg number that matches that and adjust the structure to that. The problem is if the name itself is "too general". What we need to make sure of is that the CAS Number doesn't become the primary key and things are adjusted around that too far. This is much easier to explain in an interactive dialog if we can have one. --ChemSpiderMan (talk) 15:43, 26 January 2008 (UTC)
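The batch workflow discussed in this section (copy 10 or 25 CAS numbers at a time into a window that accepts multiple lines) could be scripted rather than done by alt-tabbing. The following is only a minimal sketch, assuming the CAS numbers have already been exported as plain text; the function names and the regular expression are illustrative, not part of any tool mentioned here, and whether a given lookup service may be used this way depends on its license terms.

```python
import re

# Assumed pattern for a CAS-like string: 2-7 digits, 2 digits, 1 check digit.
# It will also match other hyphenated digit groups of the same shape.
CAS_PATTERN = re.compile(r"\b\d{2,7}-\d{2}-\d\b")


def extract_cas_numbers(text):
    """Return CAS-like strings in order of first appearance, without duplicates."""
    seen = set()
    found = []
    for match in CAS_PATTERN.finditer(text):
        cas = match.group(0)
        if cas not in seen:
            seen.add(cas)
            found.append(cas)
    return found


def batches(items, size=25):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def batch_text(batch):
    """Join one batch into a multi-line string, one CAS number per line."""
    return "\n".join(batch)


if __name__ == "__main__":
    sample = "Water (7732-18-5), ethanol 64-17-5; water again 7732-18-5."
    for number, batch in enumerate(batches(extract_cas_numbers(sample), size=25), start=1):
        print(f"--- batch {number} ---")
        print(batch_text(batch))
```

Each batch can then be pasted or saved as its own text file, so a block of 25 is checked in one pass instead of one number at a time.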
Looking for a Status Report
I am presently finishing up a paper for submission tomorrow. My hope is to get back to Wikipedia curation in the next few days. Question: What is the progress with the validation of the 150 structures I have posted so far? Any feedback? I don't want to jump back into the project until those have been looked at in detail and there's feedback on the progress to date and the process I'm using. I welcome your comments, all. --ChemSpiderMan (talk) 05:36, 15 February 2008 (UTC)
- I did one block, I think, but then I'm not sure exactly what I was supposed to do. :) For the entries that had a CAS number, I searched for that CAS number using SciFinder and checked if it matched the structure in the PDF file. There was one case where the stereochemistry of the structure in the PDF was wrong, but the structure in the WP article was correct! This was Adenosine thiamine triphosphate; however, the CAS number is for the neutral chloride so it does not match the figure exactly. I added a parenthetical note in the infobox to that effect; I don't know what the standard practice should be, but I think we will need these types of annotations often, especially if we want to provide more than one CAS number per "substance" (i.e., WP article).
- Another question is whether we will go ahead with using a different infobox field for CAS numbers that have been checked. In this case I did nothing with the entries that were correct.
- The case of articles that have no CAS number yet, and especially those with no infobox, is more complicated. I would prefer handling those as a separate project, as it is much more time-consuming. In those cases I searched for the compound by name, or sometimes by structure or formula, and added the new CAS number to the infobox if there was one. In the articles with no infobox I just added the CAS number somewhere in the article or talk page, as I was not in the mood to start creating infoboxes at that moment.
- Besides fixing the articles that need to be fixed, are we keeping some sort of log? For example, there were some pages in the PDF with comments and questions. In some cases I could answer the question if desired, but where? Especially, should we start a list of "unfinished fixes"? For example, the article on Aciclovir seems to have the wrong tautomer in the figure, but structure drawing is not my specialty, so I just left a note in the talk page. --Itub (talk) 10:08, 15 February 2008 (UTC)
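Before an entry is looked up at all, its CAS number can be screened for transcription errors, because the last digit of a CAS Registry Number is a check digit: the remaining digits are weighted 1, 2, 3, ... from right to left, summed, and reduced modulo 10. The sketch below is illustrative only (the function names are made up for this example), and passing the check says nothing about whether the number belongs to the structure in the article.

```python
import re

# Format: 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit.
CAS_FORMAT = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_check_digit(body_digits):
    """Compute the check digit for the digits that precede it (hyphens removed)."""
    # The rightmost body digit gets weight 1, the next one weight 2, and so on.
    total = sum(int(d) * w for w, d in enumerate(reversed(body_digits), start=1))
    return total % 10


def is_well_formed_cas(cas):
    """Return True if the string has CAS format and a matching check digit."""
    match = CAS_FORMAT.match(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    return cas_check_digit(body) == int(match.group(3))


if __name__ == "__main__":
    # 7732-18-5 (water): 8*1 + 1*2 + 2*3 + 3*4 + 7*5 + 7*6 = 105, and 105 % 10 = 5.
    print(is_well_formed_cas("7732-18-5"))  # True
    print(is_well_formed_cas("7732-18-4"))  # False
```

Entries that fail this test are certainly wrong as typed; entries that pass still need to be matched against a structure or name from a permitted source.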
- In terms of what to do, I would hope to receive feedback/comments regarding errors identified in the files as they exist. You have provided one piece, and the stereochemistry has been changed for the adenosine thiamine triphosphate structure. I believe the CAS number should be found for that compound if possible. The note that the CAS number is for the chloride is exactly what I was hoping for, since before, the CAS number was not for the compound shown. Now people know it's for the chloride. Excellent. Adding the CAS numbers for the rest is good too. I have no way to source them from CAS. In terms of answering the questions, I think the community preference would be to have the questions answered in front of everyone, so probably setting up a page and listing the structure (linked to WP article), the question asked and the answer given would be ideal. Then people can discuss, get to consensus (maybe) and an action can be taken. At the end of the project my output will be an SDF file of structures and connected terms, and then the job of migrating that information will need to start. There are automated ways to take my SDF file and create PNG files for uploading. --ChemSpiderMan (talk) 14:43, 15 February 2008 (UTC)
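For the migration step, the "connected terms" in an SDF file can be read with very little code, since the format separates records with a line containing `$$$$` and stores each data item as a `> <FIELD>` header followed by its value lines and a blank line. The sketch below uses only the standard library and assumes hypothetical field names (`CAS`, `WP_ARTICLE`); the actual fields in the project's SDF file are not specified here. It does not interpret the structure itself or render images, which would need a cheminformatics toolkit.

```python
import re

# Data header line such as "> <CAS>" or ">  <CAS> (1)".
FIELD_HEADER = re.compile(r">.*?<([^>]+)>")


def _parse_record(lines):
    """Turn the lines of one SDF record into a dict (name, molblock, data)."""
    name = lines[0].strip() if lines else ""
    # The connection table ends at "M  END"; if it is missing, molblock stays empty.
    end_idx = next((i for i, l in enumerate(lines) if l.startswith("M  END")), None)
    split = end_idx + 1 if end_idx is not None else 0
    data = {}
    i = split
    while i < len(lines):
        header = FIELD_HEADER.match(lines[i]) if lines[i].startswith(">") else None
        i += 1
        if header is None:
            continue
        values = []
        # Value lines run until the next blank line (or the end of the record).
        while i < len(lines) and lines[i].strip() != "":
            values.append(lines[i].strip())
            i += 1
        data[header.group(1)] = "\n".join(values)
    return {"name": name, "molblock": "\n".join(lines[:split]), "data": data}


def parse_sdf(text):
    """Split SDF text on "$$$$" lines and parse every non-empty record."""
    records = []
    current = []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if any(l.strip() for l in current):
                records.append(_parse_record(current))
            current = []
        else:
            current.append(line)
    # Tolerate a final record that lacks the trailing "$$$$" delimiter.
    if any(l.strip() for l in current):
        records.append(_parse_record(current))
    return records
```

A list produced this way (article name, CAS number, structure block) is also a convenient starting point for the question-and-answer log proposed above, with one row per record.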
Stereo Issues on Structures
Rifleman...I have sent you and Walkerma the first 50 records in Excel format. I have now started re-checking those records based on your information. I can confirm there ARE issues with stereochemistry. The problem is where? Now that the structures and the IUPAC names on Wikipedia match, the disconnect is with the CAS number. So, is there a DIFFERENT CAS number for the structure drawn? ...my expectation is YES...so you would need to draw the structure to do the search to get the correct CAS number, OR the fact is the structure itself is wrong. How do we figure some of these out? This undertaking just got a whole lot bigger gentlemen...a lot bigger. We need to chat before it continues. --ChemSpiderMan (talk) 20:07, 26 January 2008 (UTC)
- Tuesday's not too far away. Can we do it then? --Rifleman 82 (talk) 02:50, 27 January 2008 (UTC)
- I will try and sit in on that discussion but have actually got someone visiting from Europe for the week and need to make the most of our time. I commit to attending the IRC chat provided I have internet access wherever I am at that time.--ChemSpiderMan (talk) 04:34, 27 January 2008 (UTC)
Inorganics
Myself and Martin have compiled a list of some 2400 articles describing inorganic substances, which can be found at Wikipedia:WikiProject Chemicals/Inorganics. Most of these substances will have to be verified on the basis of a name, and many articles describe more than one substance. There are also substances which are described in more than one article. Any comments on the best way to proceed would be gratefully received! Physchim62 (talk) 14:56, 25 March 2008 (UTC)
It appears that there are a large number of red links on the list, despite the fact that it was compiled from WP sources! I'll get on to that problem, and post a revised version in the next couple of days or so. Physchim62 (talk) 15:01, 25 March 2008 (UTC)