Wikipedia:WikiProject Chemistry/IRC discussions

WikiChem has been meeting in channel #wikichem on freenode every Tuesday at 1600h UTC. All are welcome, and logs are regularly published after each meeting. To participate, you will need an IRC client such as ChatZilla. For more information, please contact Walkerma (talk · contribs).

Apart from this, all are welcome to drop by #wikichem for a discussion. Beetstra (talk · contribs), Physchim62 (talk · contribs) and Rifleman_82 (talk · contribs) can often be found there. You can approach them for any information.

Quickstart guide

Go to http://java.freenode.net/, type in your desired username (preferably the same as your Wikipedia one) and the channel name wikichem (omit the #, which has already been typed in). A Java-enabled browser is a prerequisite.

Past meetings

/15 Jan 2008

Agenda

  1. What do we have? The dataset of chemical compounds, currently being cleaned up by ChemSpiderMan et al. - numbers, quality? Data in other chemistry articles, e.g. on chemists?
  2. How can we make the data more easily searchable/mineable, and more suitable for the semantic web?
  3. How can we foster mashups with other sites that might bring chemists to us, while providing useful chemical information for the other site?

Summary of main conclusions

  • We probably have around 6000 organics with chembox or drugbox, and the majority of the list has been checked by User:ChemSpiderMan for Structure/Name. The list will probably get finished during February. Inorganics/organometallics haven't been addressed yet.
  • User:ChemSpiderMan will also provide us with InChIs and InChIKeys for all compounds, in the SDF file he is providing.
  • User:Petermr is planning to use this collection of articles/chemboxes/drugboxes as the basis of an RDF-based database, like a chemical version of DBPedia.
  • For this, we need to standardise the chemboxes (partly being done now through Chembox new), and we need to standardise the data content. "Main problems with the data were (e.g.) character encodings (can be awful), lack of consistency in units, difficulty of parsing annotations in values (e.g. 200 (decomposes))."
  • We might reduce errors in properties such as density, melting point and boiling point by storing each one as a single entry (in °C or g/cm³), with other versions being calculated from it.
  • User:Petermr would like to standardise how we pass information to and from the chemboxes/drugboxes. Bot?
  • WP is becoming the #1 source of information on simple compounds. Can we get things like links from chemistry articles, using the approach of Project Prospect?
  • User:Petermr would like us to use only ASCII, with no character encodings.
  • Do we need a "WikichemID" for each compound? If so, how should it be done? There was extensive discussion, but no clear conclusion.
  • Should the database reside on the wiki, or off? How should we get "drive-by" users to add information, if we make it hard to enter data? We agreed to sleep on this!
  • Should we start handling spectra? (ChemSpider is already doing this.)
  • How should we handle salts and different "forms" of the same compound - to be discussed later (covered the following week).
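
The points above about annotated values such as "200 (decomposes)" and about storing one canonical unit can be illustrated with a minimal Python sketch. It is not how the Chembox works; the names Measurement, parse_value, celsius_to_kelvin and celsius_to_fahrenheit are hypothetical, and it assumes each value is a single number optionally followed by a parenthesised note.

```python
import re
from typing import NamedTuple, Optional


class Measurement(NamedTuple):
    value: float          # numeric part, in the canonical unit (assumed here: °C)
    note: Optional[str]   # trailing annotation such as "decomposes", if any


# An optional sign (ASCII or Unicode minus), a number, and an optional
# parenthesised note. This pattern is an assumption about the input format.
_VALUE_RE = re.compile(r"^\s*([-+\u2212]?\d+(?:\.\d+)?)\s*(?:\((.*?)\))?\s*$")


def parse_value(text: str) -> Measurement:
    """Split a raw field such as '200 (decomposes)' into number and note."""
    match = _VALUE_RE.match(text)
    if match is None:
        raise ValueError(f"cannot parse {text!r}")
    number = match.group(1).replace("\u2212", "-")  # normalise the Unicode minus sign
    return Measurement(float(number), match.group(2))


def celsius_to_kelvin(celsius: float) -> float:
    """Derive the kelvin figure from the stored °C value."""
    return celsius + 273.15


def celsius_to_fahrenheit(celsius: float) -> float:
    """Derive the °F figure from the stored °C value."""
    return celsius * 9 / 5 + 32
```

Under these assumptions only the °C figure is stored, and the K and °F figures are derived on demand, so the three can never disagree with each other.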

Action

  1. Continue to help ChemSpiderMan in completing the list, by fixing errors on the wiki. See the working list.
  2. Complete the migration of the Chembox so we have one standard chembox.
  3. Look into amending how we handle certain data, to standardise on one unit and calculate the rest (I'm not sure how much of this is done already - Walkerma).
  4. Some items warrant further discussion soon, notably the WikichemID idea and the problem with salts - to be done the following week.

/22 Jan 2008

Agenda

  1. What progress has been made with the dataset, and what issues have arisen?
  2. How do we deal with salts, where there is perhaps a counterion in the name but not in the structure?
  3. What should be used as the primary key for the dataset (this was an unresolved issue from the previous week). Should we classify by compound (and if so, by name, structure or CAS#) or by article (which may cover several compounds)?

Summary of main conclusions

  • We should put in the MOS that structure, name, CAS, InChI, etc should all be for the same form of the compound.
  • We may need to put tables in, as with cresol or tartaric acid, when multiple forms are possible, but more discussion on this aspect is needed.
  • We will continue to classify compounds by article name, at least for the time being. The reasons: CAS is problematical in cases like tartaric acid where one "compound" can have lots of CAS#s, InChIs don't really work for inorganics, and Wikipedia is organised by article, not by specific compound.
  • We still need to clarify what CAS# should be used as the "main" one in the chembox, for the MOS.
  • We will need to validate the CAS nos. for the 6000 structures checked by ChemSpiderMan.

Action

  1. Work on validation of CAS nos. at Wikipedia:WikiProject_Chemistry/CAS_validation.

/29 Jan 2008

Please review some responses to Walkerma's questions to get the views of some chemical information professionals on this topic. Please also take a look at the InChI and InChIKey on some test pages:

Agenda: InChIs and InChIKeys

  1. How can we handle structural identifiers such as InChIs and SMILES properly? These are designed for machine-reading, but people may often use our visible info to "copy and paste" into a search engine.
  2. Should we promote the use of {{InChI}}, or develop something different? Should such information be placed together in the ChemBox, or in a databox at the bottom of the page (as with the InChI template), or on a separate data page, or what? Should it be hidden, semi-hidden, or fully visible?
  3. ChemSpiderMan will be providing us with InChIs and InChIKeys (a concise, hashed version of InChI) for all the structures. Should we include InChIKey information as well? If so, where?
  4. (If time) How will we upload the information from ChemSpiderMan's SDF file, including the InChIs and InChIKeys?

Summary of main conclusions:

  • Consensus wasn't totally clear, but several options were discussed for the display of InChI strings:
      • A link farm
      • "Click to see or search on InChI"
      • Use of {{InChI}}
  • A lot depends on the technical feasibility. PC was not present, to explain how the {{InChI}} option might work. Some felt it would be better to display an InChI, perhaps with "soft" line breaks to break up the string only for displaying (if this can be done). Others liked the "Click to see or search" approach. There was an extensive discussion about how InChIs and InChIKeys work.
  • It should be possible to upload ChemSpiderMan's SDF file into Wikipedia using a bot, assuming the articles have Chemboxes. The same bot might be used to check ChemBoxes on an ongoing basis. The bot should flag any Chembox where the PubChem link doesn't match with the bot list, and any other quick check like that.
  • We should reach consensus on use of InChIKeys on Wikipedia.
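
The quick check described above, flagging any Chembox whose PubChem link disagrees with the reference list, might look like the following Python sketch. The function name and the two dictionaries (article title mapped to PubChem compound ID) are assumptions for illustration; fetching pages and extracting the ID from the Chembox wikitext are left out.

```python
from typing import Dict, List, Optional, Tuple


def find_pubchem_mismatches(
    chembox_cids: Dict[str, Optional[int]],
    reference_cids: Dict[str, int],
) -> List[Tuple[str, Optional[int], int]]:
    """Return (article, Chembox CID, reference CID) for every disagreement.

    chembox_cids holds what was found on the wiki (None if the field is
    empty); reference_cids holds the values from the checked list.
    """
    flagged = []
    for article, expected in sorted(reference_cids.items()):
        found = chembox_cids.get(article)  # None if the article is missing too
        if found != expected:
            flagged.append((article, found, expected))
    return flagged
```

A bot run on an ongoing basis could post the flagged list to a project page for human review rather than editing articles itself.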

Action:

  1. Look into the possibility of a soft line break to break up InChIs etc.
  2. Post a "request for comment" regarding InChIKeys.
  3. Consider who might write and operate a bot for uploading the SDF file.
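
For the soft line break idea, one possible approach (an assumption, not something agreed at the meeting) is to insert a zero-width space after each "/" layer separator, so that a browser may wrap the string there. The Python sketch below uses hypothetical function names and shows how the exact identifier can be recovered afterwards.

```python
ZERO_WIDTH_SPACE = "\u200b"  # invisible character at which a browser may wrap a line


def add_soft_breaks(inchi: str) -> str:
    """Insert a break opportunity after each '/' separator, for display only."""
    return inchi.replace("/", "/" + ZERO_WIDTH_SPACE)


def strip_soft_breaks(displayed: str) -> str:
    """Recover the exact identifier from the display form."""
    return displayed.replace(ZERO_WIDTH_SPACE, "")


if __name__ == "__main__":
    ethanol = "InChI=1/C2H6O/c1-2-3/h3H,1-2H3"
    shown = add_soft_breaks(ethanol)
    # The round trip must give back the original string unchanged.
    print(strip_soft_breaks(shown) == ethanol)
```

One caveat: anyone who copies the displayed string will also copy the invisible characters, which matters for the "copy and paste into a search engine" use raised in the agenda. That may favour a markup-level break that is not part of the copied text, if such a thing is feasible.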

/5 Feb 2008

Agenda: CAS numbers - how can we validate these quickly, easily and cheaply?

Summary of main conclusions:

  1. Dealing with CAS nos. is very challenging!
  2. The only reliable way to validate them is via ACS. Ideally this might be done with the help of people at ACS/CAS, but failing that we will have to plod through SciFinder.
  3. Should we also mention "wrong but popular" CAS nos., to aid searches?
  4. The ChemBox could have separate cas and cas_validated fields.
  5. We have an issue of clarity: If an article is on (say) glucose, should it show the CAS no. for the unspecified isomer (which matches better with the article title?) or the CAS no. for the structure shown directly with it. The consensus was to usually include both, with the "specific" form shown close to the drawn structure. One proposal was: If we use a non-specific chembox, we could add a separate chemsubbox (within the chembox?) for information on a specific form or isomer.
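
A cheap first pass, before any validation by CAS, is to test the check digit built into every CAS Registry Number: the final digit equals the sum of the other digits, weighted 1, 2, 3, ... counting from the right, modulo 10. The Python sketch below (function name hypothetical) only catches typing errors. It cannot tell whether a well-formed number belongs to the compound in the article, so it does not replace the validation discussed above.

```python
import re

# Two to seven digits, a hyphen, two digits, a hyphen, one check digit.
_CAS_RE = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def cas_checksum_ok(cas: str) -> bool:
    """Return True if the string is well formed and its check digit is correct."""
    match = _CAS_RE.match(cas.strip())
    if match is None:
        return False
    digits = match.group(1) + match.group(2)
    check = int(match.group(3))
    # The rightmost digit before the check digit has weight 1, the next 2, and so on.
    total = sum(weight * int(d) for weight, d in enumerate(reversed(digits), start=1))
    return total % 10 == check
```

For water, 7732-18-5, the weighted sum is 8×1 + 1×2 + 2×3 + 3×4 + 7×5 + 7×6 = 105, and 105 mod 10 = 5, which matches the final digit. A number that fails this test can be flagged at once; one that passes would still need to be confirmed before a cas_validated field is filled in.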

Action:

  • Contact ACS for help. If we don't hear back by February 29th we will continue to work manually on the CAS numbers.
  • Add cas and cas_validated into the ChemBox
  • PC, Rifleman and Beetstra to determine the details of how to handle specific vs. generic CAS nos. in the Chembox.

Followup

  • I contacted someone I know at ACS (call him A), and he says that he passed my request on to CAS. I still haven't found out who at CAS, despite a "reminder" email at the end of February. While waiting to hear back from person A, I had also contacted someone I know (less well) at CAS (call him B), and he responded, but by that time I had a reply. I didn't want B to be duplicating the effort of someone else; I said I would ask for help from him if my first "lead" (via A) failed. Walkerma (talk) 04:30, 4 March 2008 (UTC)
  • We have been asked by CAS not to use SciFinder for curation. I have been in contact with CAS, we should hear back by mid-March. Walkerma (talk) 04:08, 9 March 2008 (UTC)

/12 Feb 2008

Agenda: Choice and indexing of identifiers

  1. Which identifiers (InChI, CAS, etc) are the most important for us (already discussed to some extent)?
  2. Should we create indexes on these identifiers?
  3. Under what circumstances should we link out to external sites?

Summary of main conclusions:

Action:

/19 Feb 2008

Agenda: "The protonation problem" and related issues

  1. How do we deal with compounds such as Geranyl pyrophosphate which may exist in various conjugate acid/base forms under physiological conditions? See comment here. What about drugs such as Ranitidine, which may be produced in a salt form, yet which are often written as a neutral compound?
  2. Related to this, how should we handle zwitterions such as amino acids and betanin?
  3. Related to this, how do we handle tautomers in cases such as 1,3-cyclopentanedione, where the structure may vary depending on conditions?
  4. (If time) How do we deal with counterions - this often arises with pharmaceuticals which may even exist with a variety of counterions such as succinate, maleate, etc.
  5. (If time) How do we deal with sugars such as Fructose-1-phosphate or glucose? Cyclic or open-chain form? See these comments.
  6. (If time) How should we deal with hydrates of salts, and different Werner complexes, as seen at chromium(III) chloride?

Summary of main conclusions:

  • When we have the choice of charged or uncharged forms, we will (for consistency) use the uncharged form. We agreed that "compounds will be shown in the neutral form, no matter what is their 'standard form'." Thus, a pyrophosphate ester will have OH groups, not O⁻ groups. An amine will typically be shown as the amine rather than in its protonated form. Details of structure and counterions can be discussed in the article.
  • The same approach will be taken with a zwitterion such as an amino acid, with explanation of the zwitterionic structure included in the article.
  • In cases of tautomerism, where there is some ambiguity, the article name will be agreed on a case-by-case basis, and the structure etc. will match the article name. Details of the tautomerism can be handled in the article, as in 2-pyridone.

Action:


/26 Feb 2008

Agenda: Chembox issues

  1. A carry-over from last week - how should we organise chemboxes for pages where several related substances are being described (e.g., tartaric acid, cresol)?
  2. How can we cite our sources for ChemBox information without (a) breaking the Chembox in the printable version and (b) confusing the reader? See User:Walkerma/Sandbox2 and its printable version as a test place.
  3. (If time) Is "table creep" a problem? Is there anything we should be keeping off mainspace and either hiding or placing on the data page?

Summary of main conclusions:

Action:

/4 Mar 2008

Agenda: Organic reactions - now with a general review

Background: We have been approached by Mark Leach (who runs an online reaction database), regarding the upload of generic reaction information into Wikipedia. I (Walkerma) took the liberty of inviting him to talk to us on IRC about how reactions can be represented online. A more detailed agenda will be posted later.

  • Meeting with Mark Leach postponed: He has had to cancel the meeting with us for 4th March, but hopes to attend in a week or two. However, he will use the time to write a demo page for us to look at. For March 4th I am proposing we cover the following:
  1. Review of all our recent meetings - what are the main things we should be working on? Who is going to work on them? (I will try to expand the summaries and action items before the meeting)
  2. What are the main topics still outstanding?
  3. If there is time, perhaps we could begin to consider how we handle reactions. One proposal of mine (Walkerma) is the use of image maps: See Ryoji_Noyori#Chemistry and Lithium_aluminium_hydride#Use_in_organic_chemistry.

Summary of main conclusions:

  • Most discussion centred around updating and expanding the chemistry manual of style, so that it includes all of the standards and systems we agreed upon at the recent meetings.
  • Related to this, we agreed on a need to improve communication, and to work with neighbouring WikiProjects.
  • There was some informal discussion on image maps, which were seen as useful, and people were impressed by the image map editor.

Action

  • User:Rifleman_82 agreed to take on the central task, rewriting and organising the manual of style (also see draft version). This will include the WP:Chem style guide as well as other aspects of chemistry. It will be written in summary style with sub-pages as needed. He will also assist Walkerma by contacting fvas and assisting with a new navigation scheme.
  • User:DMacks agreed to help establish rules for images and contribute this to the style guide. He will be contacting User:Benjah-bmm27 to request his input.
  • User:Axiosaurus has agreed to look at how we can tighten up our policies on inorganic nomenclature.
  • User:Walkerma has agreed to (1) Design a better navigation system around the chemistry pages (projects, portal, MOS), so that newcomers can find stuff more easily and (2) Talk to the neighbouring projects, and get their opinion on our MOS additions (others may help with #2). He will also contact ~K about writing a policy for reaction pages.


/11 Mar 2008

Agenda: Dealing with inorganics & organometallics. Also, Mark Leach will join us to talk about chemical reactions.

We have looked in detail at ChemSpiderMan's collection of organics. How should we validate the remaining compounds?

Summary of main conclusions:

Action

Followup

/18 Mar 2008

Agenda: Tying up the loose ends for validation by CAS

  • We need to resolve a few outstanding issues such as "Which carbohydrate form should be used?"
  • Ensuring that we have neutral forms, not charged forms (as we agreed at the 19 Feb meeting).
  • What remains to be done to build a collection of inorganics & organometallics?

Summary of main conclusions:

  • For carbohydrates, we plan to define a "standard form" for all of the common carbohydrates. For others, the alpha-pyranose form will be the standard form by default. If there is good reason to choose a different form for a particular carbohydrate, this can be discussed until a consensus standard form for that compound is reached. We did not agree on which representation would be used; the Haworth form was not popular, but no clear decision was made between chair forms and stereodifferentiated hexagonal cyclohexanes.

Action

  • Write a page showing the standard form for the common carbohydrates.
  • See how well our current collection matches with the new rules.
  • Resolve how best to represent the pyranoses.

/25 Mar 2008

Many of the regulars can't make it this week, so there is no formal meeting. As usual, #wikichem is always open for informal discussion about...anything really.

/1 Apr 2008

No formal meeting.

/8 Apr 2008

I propose that we continue with informal meetings for now - mostly we just need to get on and do the work, instead of talking about doing the work! We can discuss progress on the CAS validation work, and also perhaps get a New Orleans report from anyone who is there. Walkerma (talk) 07:19, 6 April 2008 (UTC)

Time: Should we change the time? It seems that our original time has become difficult for several of our group, and things have changed anyway with the clocks going forward in many countries. Are there any other times on Tuesday that you would prefer?

My availability is significantly reduced now unfortunately. This week I am not available until 1pm and the following week I am at the ACS. Lunchtime (noon) on Tuesday is certainly better for me.--68.33.211.217 (talk) 16:54, 30 March 2008 (UTC)

Agenda: Getting the chemicals list ready for CAS

We have two main groups of articles that we are currently getting ready for CAS. Physchim62 also has a combined version.

I assume it is a list (XLS/TXT?) rather than an SDF file?--68.33.211.217 (talk) 16:54, 30 March 2008 (UTC)
Actually, Antony's collection is an SDF file. I'm not sure about Physchim62's file. Walkerma (talk) 03:44, 1 April 2008 (UTC)
My file is in a relational database, but I can provide other versions without too much problem. Wikipedia:WikiProject Chemicals/Inorganics was extracted from the database, for example. Physchim62 (talk) 17:49, 1 April 2008 (UTC)

I propose we find out what has been done and what final tweaks still need to be done. The two main lists are:

  • Antony's SDF collection
This is progressing but slower than I would hope because of many other distractions--68.33.211.217 (talk) 16:54, 30 March 2008 (UTC)
I know the feeling! Physchim62 (talk) 17:49, 1 April 2008 (UTC)
I extracted a list of linked Wikipedia pages from it. Was pretty easy to parse and munge the .sdf DMacks (talk) 14:35, 8 April 2008 (UTC)
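
For anyone wanting to repeat this kind of extraction, a minimal Python sketch of reading the data fields of an SDF file follows. It relies only on the general SDF layout: each record ends with a line of $$$$, and each data item is a "> <NAME>" header followed by value lines and a blank line. The function name is hypothetical, the field names in the actual file are not given here, and this is not the script DMacks used.

```python
import re
from typing import Dict, Iterator

# A data header looks like "> <FIELD_NAME>", possibly with extra text around it.
_HEADER_RE = re.compile(r"^>.*?<([^<>]+)>")


def iter_sdf_records(text: str) -> Iterator[Dict[str, str]]:
    """Yield one dict of data fields per record in SDF-formatted text.

    The molfile (structure) part of each record is skipped. A trailing
    record that lacks its "$$$$" terminator is not yielded.
    """
    fields: Dict[str, str] = {}
    lines = text.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        header = _HEADER_RE.match(line)
        if line.strip() == "$$$$":      # end of the current record
            yield fields
            fields = {}
            i += 1
        elif header:                    # collect value lines up to the blank line
            i += 1
            values = []
            while i < len(lines) and lines[i].strip() and lines[i].strip() != "$$$$":
                values.append(lines[i])
                i += 1
            fields[header.group(1)] = "\n".join(values)
        else:                           # structure block or blank line
            i += 1
```

If the file stored the article title in a field called, say, WIKIPEDIA_PAGE (a made-up name), the list of linked pages would be [record["WIKIPEDIA_PAGE"] for record in iter_sdf_records(text)].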

/15 Apr 2008

Several people seem to want to discuss CAS validation and "data-mining" from WP; I shall do my best to be available. Physchim62 (talk) 19:26, 14 April 2008 (UTC)

OK, let's do that. I asked if people were interested in an IRC meeting on this topic, but no one responded to my request, so I was expecting this to be another small, informal gathering. I think many people are catching up after New Orleans, and I think Rifleman may also be on the road, but perhaps a few can gather - there is certainly interest in the data-mining aspect. I will be quite busy myself, so I may not be able to be there for much of the time. PC, can you chair the meeting? I expect I will be joining on IRC around 1610h UTC. I should have an update on the CAS work, also. Walkerma (talk) 21:41, 14 April 2008 (UTC)
OK, will do. Can you remind me to log it, in case I forget! Agenda is (depending on who can be available):
  • update on CAS verification
  • questions/discussion concerning "data-mining" from Wikipedia
  • any other issues
Physchim62 (talk) 13:46, 15 April 2008 (UTC)

Summary of main conclusions:

Action

/22 Apr 2008

It looks as if ChemSpiderMan can make this meeting, and PC can now talk on IRC, so we will try to meet formally this week. Many of the topics I'm proposing are similar to what is listed above for April 15.

Time: 1700 h UTC (1pm US EDT). NOTE NEW TIME, one hour later!


Agenda:

  • How to merge in data from CAS (once this has been received). If we have received the file, we can perhaps discuss that too.
  • How to organise the data once it is validated. We need to find a way to ensure that validated content remains intact. PC has some ideas on how to put the data into a database form within WP.

Summary of main conclusions:

  • PC will continue working on CAVer, a relational database linking WP articles with specific compounds, while we are waiting for news from CAS.
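
As an illustration only, a relational layout linking articles with specific compounds could be sketched in Python with SQLite as below. This is not CAVer's actual schema, which is not described here; every table, column and function name is hypothetical. It assumes one article may cover several compound forms, as with cresol or tartaric acid, and keeps a validation flag next to each CAS number.

```python
import sqlite3
from typing import List


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical article/compound layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE article (
            article_id INTEGER PRIMARY KEY,
            title      TEXT NOT NULL UNIQUE      -- the Wikipedia article name
        );
        CREATE TABLE compound (
            compound_id   INTEGER PRIMARY KEY,
            article_id    INTEGER NOT NULL REFERENCES article(article_id),
            form_name     TEXT NOT NULL,         -- the specific form or isomer
            cas           TEXT,                  -- left NULL until known
            cas_validated INTEGER NOT NULL DEFAULT 0
        );
        """
    )
    return conn


def forms_for(conn: sqlite3.Connection, title: str) -> List[str]:
    """List the compound forms recorded for one article, in alphabetical order."""
    rows = conn.execute(
        "SELECT c.form_name FROM compound AS c "
        "JOIN article AS a ON a.article_id = c.article_id "
        "WHERE a.title = ? ORDER BY c.form_name",
        (title,),
    )
    return [row[0] for row in rows]
```

With an article row for Cresol and three compound rows for o-cresol, m-cresol and p-cresol, forms_for(conn, "Cresol") returns the three isomers, while the article name remains the key that ties the data back to Wikipedia.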

Action

  • circulation (by email) of test lists in the various formats used; for queries, contact PC.
  • meeting logs not to be published until situation with CAS is clarified

/29 Apr 2008

PC will not be able to make the formal meeting, but will try to be on IRC 1530–1630 UTC to answer any questions.

I probably can't make it (network flaky--if someone else can log, I can format & post it later), but did manage to get non-volatile data shifted out of the main article. Bonus: InChI keys google-indexable (actually visible on a page) but not visible in article Chembox. proof of concept DMacks (talk) 13:44, 29 April 2008 (UTC)

Time: 1700 h UTC (1pm US EDT).


Agenda:

  • feedback re CAS data and prospects
  • AOB

Summary of main conclusions:

Action

/6 May 2008

Agenda:

Summary of main conclusions:

  • There are some issues to be resolved on differences between CAS format and our format for some data (especially inorganics). We may need to contact CAS on this.
  • Despite this, it should be possible for us to release groups of 500 articles at a time (monthly?), starting quite soon.
  • Physchim62 will handle curation of inorganic data, while ChemSpiderMan, Walkerma and Rifleman_82 will be handling the organics.

Action

/13 May 2008

Meeting at 1600h UTC (noon US EDT). Agenda:

  • Wichempedia, chempedia, wikichem and related ideas.
  • If any time left, we can discuss CAS issues further.

Summary of main conclusions:

  • Nobody was talkative in-channel today...no actual meeting.

/20 May 2008

Meeting at 1600h UTC (noon US EDT). Agenda:

  • An informal meeting to discuss the CAS work, and the wikichem idea, as people see fit.

Summary of main conclusions:

Upcoming meetings

/27 May 2008?

I am on holiday/vacation until June 14, so I can't moderate until then. If there are going to be any meetings, someone else will have to volunteer and set an agenda.

Agenda:

Summary of main conclusions:

Action