Template talk:PDB

Currently this template points to:

http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/GetPage.pl?pdbcode=1xyz

I think this is the wrong link for PDB codes, as the above database is just one of many derived databases based on the PDB.

I am going to update the template to point at the appropriate page of the official PDB website:

http://www.rcsb.org/pdb/explore/explore.do?structureId=1XYZ

If this is not acceptable for some reason, we can instead add several off site links per PDB code, for example (taken from the list of links maintained at PDBWiki [1]);

I think such a suite of links in addition to the official link could be quite nice. --Dan|(talk) 09:59, 3 April 2008 (UTC)
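The two link targets discussed above differ only in the base URL and in the letter case of the PDB code. As a minimal sketch (the function and constant names are hypothetical; only the two URL patterns come from this discussion, and the case handling is an assumption based on the two example URLs):

```python
# Hypothetical helper names; only the two URL patterns are taken from the discussion.
PDBSUM_URL = (
    "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/"
    "GetPage.pl?pdbcode={code}"
)
RCSB_URL = "http://www.rcsb.org/pdb/explore/explore.do?structureId={code}"


def pdbsum_link(pdb_code: str) -> str:
    """Return the PDBsum page URL (lowercase code, as in the example URL)."""
    return PDBSUM_URL.format(code=pdb_code.strip().lower())


def rcsb_link(pdb_code: str) -> str:
    """Return the RCSB PDB page URL (uppercase code, as in the example URL)."""
    return RCSB_URL.format(code=pdb_code.strip().upper())


if __name__ == "__main__":
    print(pdbsum_link("1xyz"))
    print(rcsb_link("1xyz"))
```

Whether either site actually requires a particular letter case is not stated here, so the normalization is only a convenience.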

In fact we have four PDB templates right now: "PDB2" ("official site"), in ProteinBox (see Histone H2B type 1-C), "PDB3" (PDBsum site) - in PfamBox (see C2 domain), and "PDB link" (PDBsum site) in enzyme articles by User:WillowW (see Category:Enzymes of known structure).
What kind of link is better for "PDB" is an open question. From a WP perspective, it does not matter if a site is "official" or not. A lot of "official sites" are spam. We have no obligations to PDB and we are not affiliated with PDB in any way. The links must be "educational", convenient for a reader, and add value to WP articles. In that regard PDBsum is the way to go, because it provides a much better interface than the official PDB (I can easily explain why). We discussed this question with User:WillowW and decided to use PDBsum.
Please do explain why, I am curious.--Dan|(talk) 20:36, 3 April 2008 (UTC)

Even MSD is better than the "official site". PDB has enormous resources and does a very poor job, probably due to management problems. Biophys (talk) 15:53, 3 April 2008 (UTC)

That is easy to say, and I follow the argument above, but I would like to revisit this discussion. How / where are the above link options compared and contrasted? How do we keep that comparison up to date? Despite what you put above about the concept of an 'official' site, which I think is generally reasonable, somehow when it comes to data providers I think we have an obligation to link to the official source of the data (note the lack of inverted commas around that last use of the word official). Is it possible to make a page listing the major PDB derived databases with comparisons according to key features? It seems like that is what we need to begin to address this question, but I don't know where such a page should live... Comparison of online PDB derived databases??? --Dan|(talk) 20:36, 3 April 2008 (UTC)
I would recommend creating the articles Protein structure databases and Protein sequence databases. Structure databases use PDB and a lot of other databases/resources, just like PDB itself. Calling them "PDB derived" would be incorrect. Biophys (talk) 03:53, 4 April 2008 (UTC)
I don't think it is incorrect to do so. Where does structural data come from if not from the PDB? People add links to that data and pull in information from other resources to supplement that data, but ultimately it all comes from (stems from) the RCSB PDB. Doesn't it? --141.14.26.125 (talk) 15:31, 4 April 2008 (UTC)
No. Only 3D coordinates of proteins come from RCSB PDB. In addition to 3D coordinates, the derivative databases (and partly RCSB PDB) include a lot of other data: amino acid sequences, ligands, chemical reactions catalyzed by enzymes, secondary structure, automatically generated pictures of ligand binding pockets, modeled protein movements or arrangements of proteins in membranes, and so on. Many of these data can be found only in derivative databases, but not in RCSB PDB. Biophys (talk) 20:45, 12 April 2008 (UTC)

PDB versus PDBsum

Please compare "2axt" (as a random example) in PDB and PDBsum. The interface by PDBsum provides, among other things, the following:
  1. Abstract of key crystallography paper with key Figures;
  2. Link to each subunit (row at the left) - click there and you will see the picture of an individual subunit and nice diagram with topology, amino acid sequence and regular secondary structure; please click further to "sequence" motifs, and so on
  3. List of ligands - click at any ligand - and you will see something like that [2] - "standard PDB" also has links to ligand structure (please compare)
  4. Pictures with secondary structures, including beta-sheet topologies produced by PROMOTIF;
  5. Pictures with chemical reactions for enzymes;
  6. Links to UniProt records (this is important);
  7. Links to many other databases that are not linked to PDB

None of that can be found in the "standard" PDB. I could continue and make this list much longer. Just look yourself and trace the links provided in PDB and PDBsum. "Standard PDB" is a very poor job compared to PDBsum. Biophys (talk) 03:53, 4 April 2008 (UTC)

OK, I'll have a look and see what there is to see. I am slightly worried that you favour one site because you are more familiar with that site, but I'll have a look. I have also found many bugs on the PDB site, so I know where you are coming from. --141.14.26.125 (talk) 15:31, 4 April 2008 (UTC)
So, did you look at PDBsum? Biophys (talk) 16:08, 10 April 2008 (UTC)

Comparing different PDB templates

Here is an example call to each of the five different PDB templates that I have found in common usage, namely 1) {{PDB}}, 2) {{PDB link}}, 3) {{PDB enzyme}}, 4) {{PDB2}}, and 5) {{PDB3}};

  1. PDB 1xyz
  2. 1xyz
  3. PDB 1xyz
  4. 1xyz
  5. 1xyz

Note that {{PDB enzyme}} redirects to {{PDB}}.

The current usage is ... 212, 903, 4, 2986, and 166 times, respectively (as of a couple of days ago).

Clearly, {{PDB enzyme}} can be converted into {{PDB}}, and it certainly looks like {{PDB3}} can be converted into {{PDB link}}.

The question remains as to whether it does any harm to convert {{PDB link}} into {{PDB2}} (or vice versa). It looks like the difference is minimal; the question is related to the general style of external links.

The final question is whether {{PDB}} can be similarly merged into {{PDB link}} or {{PDB2}}, i.e., is that link to the PDB article really needed on the pages that use this template?

Just a dump of my current thinking. --Dan|(talk) 15:50, 4 April 2008 (UTC)
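To make the arithmetic of the proposed consolidation concrete, here is a small sketch. The counts are the ones quoted above; the dictionary layout and function name are only illustrative.

```python
# Usage counts quoted above (a snapshot from early April 2008).
usage = {"PDB": 212, "PDB link": 903, "PDB enzyme": 4, "PDB2": 2986, "PDB3": 166}

# Merges proposed above: source template -> template it would be folded into.
proposed_merges = {"PDB enzyme": "PDB", "PDB3": "PDB link"}


def apply_merges(counts, merges):
    """Return new counts with each merged template's uses added to its target."""
    merged = dict(counts)  # copy so the original snapshot is left untouched
    for source, target in merges.items():
        merged[target] += merged.pop(source)
    return merged


after = apply_merges(usage, proposed_merges)
print(after)  # {'PDB': 216, 'PDB link': 1069, 'PDB2': 2986}
```

After those two merges three templates would remain, with {{PDB2}} still accounting for most of the uses.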

From the PBB perspective, I very much support the use of {{PDB2}} and would object to its disappearance. This template is well suited to the PBB infobox, in which the external link icon would really take up a substantial amount of space. (For example, the list for Cyclin-dependent_kinase_2 is already quite long -- imagine that using {{PDB link}}.) However, I imagine for inline links, {{PDB link}} is preferable since it clearly indicates an external link. While I agree we should standardize and consolidate when appropriate, I don't think we need to be too aggressive about this if it means sacrificing these subtle (but important) differences. AndrewGNF (talk) 01:22, 5 April 2008 (UTC)
And I strongly support Andrew. There is a reason for doing this. The PDB2 template is currently used in the ProteinBox template (many thousands of articles). By creating the additional PDB2 template, Andrew makes sure that any changes to the "main" PDB template (like the one you made) will not affect his work with the ProteinBox template. The same thing was done by WillowW (enzyme articles) and by me (PfamBox). This also creates extra flexibility. If any one of us decides to switch to MSD instead of standard PDB in all higher-level templates, one can easily do so without affecting anything else. Biophys (talk) 16:58, 5 April 2008 (UTC)
I agree too. However, I don't think the story ends here. Let's be clear, there are two arguments to discuss here: one is link format, the other is link destination. It seems totally reasonable to format links differently for 'large scale' use in boxes (as argued above). And I can almost see the point of adding a link to the PDB also (although I still want to check through some articles to see if this is really necessary). However, with regard to link destination, I think it is confusing and wrong to have links pointing to different destinations. I think we should reach a consensus and harmonize the link destination (changing many thousands of articles if necessary - isn't that the whole point of templates?). --Dan|(talk) 20:54, 5 April 2008 (UTC)
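The distinction between link format and link destination can be sketched as follows. This is not the actual template code: every name below is hypothetical, and the rendered wikitext is only an assumption about how a compact and an inline variant might look.

```python
# Illustrative only: these names are hypothetical, not the real template source.
DESTINATIONS = {
    "rcsb": "http://www.rcsb.org/pdb/explore/explore.do?structureId={code}",
    "pdbsum": (
        "http://www.ebi.ac.uk/thornton-srv/databases/cgi-bin/pdbsum/"
        "GetPage.pl?pdbcode={code}"
    ),
}

# One shared setting decides where every PDB link points (link destination).
CURRENT_DESTINATION = "rcsb"


def link_target(code, destination=None):
    """Build the URL for a PDB code using the shared (or an overriding) destination."""
    return DESTINATIONS[destination or CURRENT_DESTINATION].format(code=code)


def compact_link(code):
    """Link format for long lists in an infobox: the bare code as the label."""
    return f"[{link_target(code)} {code}]"


def inline_link(code):
    """Link format for running text: a wikilink to the PDB article plus the code."""
    return f"[[Protein Data Bank|PDB]] [{link_target(code)} {code}]"


print(compact_link("1XYZ"))
print(inline_link("1XYZ"))
```

With this split, changing `CURRENT_DESTINATION` would retarget every link at once, while each formatter keeps its own appearance.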

In the interest of keeping everything straight, I created {{PDB_template_usage}} and transcluded it from the various PDB templates above. Thoughts? AndrewGNF (talk) 20:15, 5 April 2008 (UTC)

Looks good. Let's move some of the other information from {{PDB}} into {{PDB_template_usage}}. I am going to try to write Protein structure databases ... but ... well... let's try... --Dan|(talk) 20:54, 5 April 2008 (UTC)
Good job! I think the question by Dan about the best destination is reasonable. As I noted above, there are huge advantages from an educational/WP perspective to using PDBsum whenever possible, unless someone can explain why the "official" PDB link is better - I think only the crystallographic group of symmetry is better in PDB than in PDBsum. On the other hand, let's respect the opinions of people who are using PDB links in the more complicated templates. If AndrewGNF wants to use the "official" PDB in ProteinBox, so be it. Biophys (talk) 16:21, 10 April 2008 (UTC)
I'm agnostic, so if you guys want to change {{PDB2}} to PDBsum, I have no objection. AndrewGNF (talk) 17:31, 10 April 2008 (UTC)
Two points in favor of PDB. First is uptime. As I write this, the PDB site is up and PDBsum is down. The PDB is mirrored, and to the best of my knowledge, PDBsum is not. The second is timeliness. When I want to download coordinates of a structure, I go to the primary source, the PDB. This is the first site that will make available new structures and the first site that will publish corrections. The primary purpose of links in WP is verification and the secondary purpose is additional education. WP articles should to the greatest extent possible be self-explanatory and not rely on external links for understanding. Hence IMHO, reliability and timeliness should take precedence over adding additional understanding. Cheers. Boghog2 (talk) 21:08, 10 April 2008 (UTC)
The entire EBI site is down today, not only PDBsum. That can happen anywhere. Right, there are no mirrors except PDB itself, which can be easily accessed (if PDB is down, you do not get instant access either with links from WP). Please note that PDB links do not verify anything, because this database does not provide any information as text, except PDB headers (after a couple of extra clicks). This is contrary to PDBsum, which provides abstracts of key articles, or UniProt, which provides annotations. Therefore, PDBsum or UniProt can be used for WP:Verifiability purposes, whereas PDB cannot. The PDB/PDBsum links are mostly useful to provide some graphic information about proteins and to serve as "gates" to other internet resources, and that is where PDBsum is much superior. Biophys (talk) 22:14, 10 April 2008 (UTC)
The following points need to be clarified:
  • "PDB links do not verify anything, because this database does not provide any information as text." The PDB does in fact provide text information. On the left-hand side of any structure page in the PDB is an expandable list of "Download files" which includes the header text, full text (header + coordinates), and the coordinate data in a number of formats (XML, mmCIF, etc.). As far as I can see, PDBSum only provides the header information and not the coordinates. If you want the coordinates in PDBSum, you must follow the links back to the PDB.
  • "PDBsum, which provides abstracts of key articles, or UniProt, which provides annotations." The PDB also provides the abstract as well as a link to the corresponding PubMed citation. In addition, the PDB provides a direct link to the relevant UniProt page under the "Sequence Details" tab on every structure page.
  • "The PDB/PDBsum links are mostly useful to provide some graphic information about proteins and to serve as 'gates' to other internet resources, and that is where PDBsum is much superior." I have not done an exhaustive comparison, but it appears that the PDB provides many of the same links that are found in the PDBSum. The links in PDBSum may be organized better, but they are for the most part also included in the PDB.
  • WP:Verifiability. What is being verified here? It is the experimentally determined structure. Many funding agencies and scientific journals require that the coordinates of structures be deposited in the PDB. The PDB has the task of ensuring that these structures conform to certain quality standards, are properly documented, are assigned a unique accession code (which, by the way, is universally used by derivative databases including the PDBSum), and are made freely available to everyone. In addition, the PDB has the most complete on-line information about the experimental conditions under which the data was collected, and this information is important in verification and in the interpretation of the structures. Furthermore, the PDB is now requiring that structure factors be included in any structure that is deposited in the PDB. While it is highly unlikely that a Wikipedia reader would check a structure using the structure factors, these factors provide the most rigorous level of verifiability that the data for the structure was collected and interpreted in an optimal way, and the PDB is, to my knowledge, in general the only place where this data can be obtained. Boghog2 (talk) 15:13, 12 April 2008 (UTC)
Boghog2, did you look at and compare both sites? Have you ever used these databases yourself in your work? The difference between PDB and PDBsum is obvious and significant. Otherwise, I would not argue here. Biophys (talk) 18:29, 11 April 2008 (UTC)
I have used the PDB for close to 20 years as a source of coordinates for structure based drug design. The coordinates can be downloaded from the PDB and not from the PDBSum. In other words, the PDBSum lacks essential data that is required for my work. I think the confusion here is that different users have different requirements. From a bioinformatics perspective, the PDBSum has some advantages. For structural biology, PDBSum lacks the most basic data (the coordinates). In short, I think PDBSum is a nice complement to but was never intended to replace the PDB. Cheers. Boghog2 (talk) 15:13, 12 April 2008 (UTC)
Sorry, but I cannot agree. There are two main points. 1. Yes, the PDB database is more convenient for downloading the coordinates. But a Wikipedia reader does not need these coordinates. Specialists (like you or me) will never download protein structures by reaching them through WP articles, and non-specialists do not need to download anything. Remember that WP is designed for the general public, high school students, etc. A common Wikipedia reader only needs a good portal to other internet resources and good graphics that show various features of the proteins, such as subunit structure, secondary structure, ligands etc. That is where PDBsum is much superior. What user would prefer a link to PDB rather than to PDBsum from WP articles? Certainly not a student, and not a specialist like me. 2. PDBsum provides a quick link to the standard PDB (and many other resources), whereas the standard PDB does not provide a link to PDBsum. So, the PDB is only one click away from PDBsum, but not vice versa. Finally, let's not discuss "verifiability" issues to save some time. You would probably agree that PDB is no better for verification purposes than PDBsum, and that a link to PDB is not good for verification purposes (one should refer to publications in books and journals). Biophys (talk) 16:52, 12 April 2008 (UTC)
P.S. Of course I am not saying anything so stupid as that PDBsum was intended to replace the PDB! We are only discussing which link is better for WP purposes. Biophys (talk) 16:57, 12 April 2008 (UTC)
More comments:
  • "Specialists (like you or me) will never download protein structures by reaching them through WP articles." You might be surprised. Occasionally I might run into a WP protein article that interests me and I will want to download the structure, if for no other reason than to produce a PyMol figure to add back to the article. Of course, I have a half dozen other ways to download the coordinates, but if I have a direct link right there in front of me without having to click through the PDBSum site, it is more convenient.
  • "A common Wikipedia reader only needs a good portal to other internet resources and good graphics that show various features of the proteins, such as subunit structure, secondary structure, ligands etc." You are making assumptions about what a reader might be interested in which might not apply in all cases. Furthermore, the PDB provides much the same information as PDBSum.
  • "That is where PDBsum is much superior." I agree that PDBSum has a nice layout, but I think you are overstating the superiority.
  • "What user would prefer a link to PDB rather than to PDBsum from WP articles?" Oh, I don't know, me perhaps? ;-)
  • "PDBsum provides a quick link to the standard PDB (and many other resources), whereas the standard PDB does not provide a link to PDBsum [but the PDB does provide links to many of the same third party sources]. So, the PDB is only one click away from PDBsum, but not vice versa." As a derivative database, it is natural that PDBsum would link to PDB. I agree that it would be nice if the PDB provided a link back to the PDBsum.
  • "Finally, let's not discuss 'verifiability'." Verifiability is the single most important issue, and on this point, there is no contest. As the original definitive source of the coordinates, structure factors, and experimental details, the PDB is far superior to PDBsum. The only verifiability that the PDBsum has is its links to the original literature plus the PDB. Please note that structures often appear in the PDB BEFORE the corresponding paper is published in the scientific literature. In addition, the paper invariably references the coordinates deposited in the PDB. Hence for all practical purposes, the PDB becomes an inseparable part of the paper. Therefore to provide a complete citation, one not only needs to cite the scientific literature, but also the coordinates deposited in the PDB.
  • "You would probably agree that PDB is no better for verification purposes than PDBsum." On this point, I strongly disagree.
Cheers Boghog2 (talk) 19:03, 12 April 2008 (UTC)

Which PDB link is better for wikipedia? Take two.

I guess Boghog2 made the following arguments.

Argument 1. The original PDB link (let's call it RCSB-PDB) is better because this is a primary deposition site of data, a kind of "primary source".
Reply. No, we have to provide best "secondary" sources per WP:Verifiability.
Argument 2. RCSB-PDB is better for downloading the 3D coordinates of proteins.
Reply. Yes, this is true. However, RCSB-PDB is only one click away from PDBsum (the link is there). Moreover, one would normally use searches in RCSB-PDB or more reliable scientific databases (such as SMART) rather than wikipedia to identify and download the PDB files of interest.
Argument 3. PDBsum has no significant advantages over RCSB-PDB.
Reply. No, that is not the case. This is the main point and should be explained in detail (see 2axt as an example):
1. PDBsum has links to 22 (!) other biological databases, including RCSB-PDB. RCSB-PDB has links to only 3 other databases (PFAM, SCOP and CATH).
2. PDBsum usually includes an Abstract of key crystallography paper with key Figures. RCSB-PDB does not.
3. PDBsum has link to each subunit (row at the left) - with the picture of an individual subunit, its topology diagram, amino acid sequence and regular secondary structure (please click further to "sequence" motifs, and so on). RCSB-PDB has nothing of that.
4. PDBsum has diagrams of ligand binding pockets. RCSB-PDB has none of that.
5. PDBsum includes pictures with secondary structure and their topologies generated by PROMOTIF. RCSB-PDB has nothing.
6. PDBsum has pictures with chemical reactions for enzymes and links to enzyme databases. RCSB-PDB has nothing.
I could continue this list, but the overall picture is clear. Actually, this is a perfect example of why a secondary source (PDBsum) is better than a primary source for WP. It provides references/links to numerous different primary sources, whereas the primary source does not. Biophys (talk) 13:05, 13 April 2008 (UTC)

Which PDB link is better for wikipedia? Take three.

Argument 1. The original PDB link (let's call it RCSB-PDB) is better because this is a primary deposition site of data, a kind of "primary source".
Reply. No, we have to provide best "secondary" sources per WP:Verifiability.
Reply to Reply. The original scientific publication can be considered as both a primary source (since it was written by the researchers who collected and interpreted the data) and as a secondary source (since it was peer reviewed by two anonymous referees and the journal editor). The RCSB-PDB can also be considered as both a primary source (since the coordinates, structure factors, and experimental details are stored there) as well as a secondary source (since the deposited structures must undergo a quality check before they are released). PDBSum provides no additional quality checks above and beyond what the wwPDB has provided, therefore PDBSum could at most be considered a tertiary source. Adding additional links helps to interpret the experimental structure, but does nothing to verify it (e.g., does the structure adequately fit the electron density, are there any severe geometrical outliers, etc.).
Argument 2. RCSB-PDB is better for downloading the 3D coordinates of proteins.
Reply. Yes, this is true. However, RCSB-PDB is only one click away from PDBsum (the link is there). Moreover, one would normally use searches in RCSB-PDB or more reliable scientific databases (such as SMART) rather than wikipedia to identify and download the PDB files of interest.
Reply to Reply. I would agree that this is not a critical issue.
Argument 3. PDBsum has no significant advantages over RCSB-PDB.
Reply. No, that is not the case. This is the main point and should be explained in detail (see 2axt as an example):
Reply to Reply. That is a misinterpretation of what I wrote above. I think it is important to distinguish between verification of the experimental structure, which RCSB-PDB provides, and interpretation of the structure, which both the RCSB-PDB and PDBSum provide. I would agree that the PDBSum provides more links and therefore is better for interpreting the structure within a biological context, but the RCSB-PDB provides the essential minimum (PubMed citation + PFAM, SCOP and CATH). What the RCSB-PDB provides which the PDBSum does not is the hard data and experimental details, which are essential for judging the quality and relevance of the structure.

Cheers. Boghog2 (talk) 21:20, 13 April 2008 (UTC)

So, you do not dispute my last and only important argument 3: PDBsum provides seven times more links than RCSB-PDB and a lot of additional graphical and other capabilities that RCSB-PDB does not? Good. Then we can agree at least on something. But you say that I am missing the following point of yours: "RCSB-PDB provides which the PDBSum does not is the hard data and experimental details which is essential for judging the quality and relevance of the structure". What is that supposed to mean? Both databases give an easy link to the same PDB file header which describes experimental conditions (it looks a little bit nicer in the PDBsum); both databases give the same R-factor, resolution, R-free, and so on (it seems that PDBsum is missing only the crystallographic group of symmetry). So, all experimental details are practically the same (there is a database of protein crystallization conditions [3], but we are not talking about it). As for "hard data", they are available by clicking the "RCSB" or "MDB" links in the PDBsum menu. So, there are no important differences here. Biophys (talk) 01:28, 14 April 2008 (UTC)
Point #1 (verifiability) takes precedence over point #3 (adding additional understanding) and you have not responded to that issue. While the crystallographic group symmetry and crystallization conditions are included in the header information (and therefore accessible from both databases), other data like structure factors are not. Furthermore, these experimental details are more thoroughly reported in the "Materials and Methods Report" tab in the RCSB-PDB. The RCSB-PDB also provides much more extensive searching and reporting capabilities, including the ability to include experimental details as optional search parameters. This is important, for example, in locating the best structure(s) (highest resolution, crystallization pH close to physiological) to include in links from Wikipedia. One last point: there is a link from the RCSB-PDB structure entry back to the corresponding PDBSum entry under the external links section. So for interpretation of the structure, they are available from the RCSB-PDB by clicking back to the PDBSum. "So, there are no important differences here." Cheers. Boghog2 (talk) 06:45, 14 April 2008 (UTC)
Sorry, but your point #1 (verifiability) is groundless because we are talking about different links to the same coordinates (3D structure) and the same data. The same data have identical "verifiability". But you have made an additional argument that RCSB-PDB has a better search facility. However, this is irrelevant. The templates we are talking about make a reference to a single, defined PDB file. So, the only important question is what this specific link provides. And in that regard the PDBsum link provides much more information and options. One does not need any search facilities, since the proper link has already been provided in a WP article. Biophys (talk) 14:50, 14 April 2008 (UTC)
RCSB-PDB includes data (e.g., 3D coordinates and structure factors) that are not stored in PDBSum database. To the extent that the PDBSum tertiary source links back to the primary/secondary source of the data (i.e., RCSB-PDB), both databases contain the same data. But why not link directly to the primary/secondary source of the data, the RCSB-PDB rather than a tertiary source like PDBSum? Boghog2 (talk) 20:46, 14 April 2008 (UTC)
With regard to your last argument. Where did you see a link from PDB entries to PDBsum? I do not see any. If there was such a link, I would not argue that much. The problem is: a reader goes to the standard PDB link, and he does not know anything about the existence of PDBsum and other protein structure databases. However, if he follows the PDBsum link (as I suggest), he will see the links to RCSB-PDB and 21 other protein structure/sequence databases. As you can see from the discussion above, many people (like Dan) do not know much about the Protein structure databases. But if they follow the PDBsum link, they will see a lot of them. If they follow RCSB-PDB, they will see only three of them. Biophys (talk) 15:59, 14 April 2008 (UTC)
Now I see... I should go from an RCSB-PDB entry (say 1gzm) to "External links" (this link does not work with mozilla on my SGI computer!), then find PDBsum at the bottom of a very long page, and click PDBsum. No WP reader will ever do that. In PDBsum, however, all links to other databases (and a lot of other options) are immediately visible. Biophys (talk) 16:28, 14 April 2008 (UTC)
So the whole argument now boils down to whose cross link is most accessible? Boghog2 (talk) 20:46, 14 April 2008 (UTC) PS: I have fond memories of SGI. I hope I don't start any OS religious wars, but in some ways, I still prefer the IRIX GUI to Linux.

No, SGI has nothing to do with it. Everything boils down to a very simple thing (see Argument #3, "Take two"). A WP reader will see easy links to 22 protein structure/sequence databases and a number of additional graphic representations (!) from the PDBsum link, but he will see only 3 immediate links to other databases from a PDB entry and nothing else. To clarify, let's consider a simple practical example. I want to see a UniProt annotation of subunit C from PDB file 2axt. I can do this by a single click from the PDBsum link. However, from an RCSB-PDB link, I should go first to "External links" (I learned this from you only 30 minutes ago), then find the PDBsum link at the bottom of a page, and so on. Only a WP reader who is more advanced than me (perhaps you) will ever do that. So, I still believe that the PDBsum link is better. But let's simply keep all templates as they are right now, since we cannot agree on that. Dan changed it to RCSB-PDB, so consider that your position prevailed. I do not care about WP users so much as to continue this discussion. Thanks. No hard feelings about that. Biophys (talk) 21:37, 14 April 2008 (UTC)