Template talk:GNF Protein box

Hi Andrew, I made some changes, both stylistic tweaks and major changes with respect to PDB. I'm not actually sure what you had in mind for the PDB_supplemental field? David D. (Talk) 03:30, 31 March 2007 (UTC)

David, all the changes you made look fantastic. Thanks for all your hard work. As for the PDB_supplemental field, I'm not sure what that's for either. I think I took that from Template:Protein (which I used as a starting point for my modifications), and added it to the "usage" information when I noticed that it wasn't there. I guess it was used to identify additional PDB entries (beyond the one used to create the image). For example, see Clostridium_perfringens_alpha_toxin. I could see how this might be useful... Perhaps I'll add a couple of secondary PDB IDs to ITK (gene) so we can play around with formatting... AndrewGNF 17:59, 2 April 2007 (UTC)
Okay, I added two PDB_supplemental entries. How do you think they should be displayed? I like how you moved the main PDB caption up to right under the image. Do you think we should essentially duplicate that entry on a "PDB" line below, together with any PDB_supplemental entries (like in the Clostridium_perfringens_alpha_toxin example)? This would also address the case where a PDB entry was specified but no image was available. If you think this is a reasonable idea, then I was trying to find the syntax that would allow you to say essentially {#or: {#ifexists:{{{PDB|}}} } | {#ifexists:{{{PDB_supplemental}}} } | [http://www.pdb.org/pdb/cgi/explore.cgi?pdbId={{{PDB}}} {{{PDB}}}], {{{PDB_supplemental}}} }, or something like that. But I can't find an "or" operator... AndrewGNF 18:30, 2 April 2007 (UTC)
First, try deleting the image in your template and see what happens. I coded it to say "Image can be generated using PDB file PDB#" when there is no image but a PDB file is available. From what you wrote above I think you did not notice this. David D. (Talk) 18:42, 2 April 2007 (UTC)
It makes sense to use the supplemental field for alternative PDB files, but where to show them is another issue. The reason I moved the PDB file to the top was to try to decrease the number of data lines in the template. Another possible solution is to have them below the image too? In the meantime, we should probably change the name of the field PDB_supplemental to "PDB additional" or "PDB more" so it is less cryptic. David D. (Talk) 18:55, 2 April 2007 (UTC)
Right, sorry, I did miss that check for PDBs without images. Looks great. I've changed the template from the ambiguously named "PDB_supplemental" to "PDB_additional" as you suggested. As for decreasing the number of data lines in the template, I was planning on looking at consolidating data rows as a way of removing dead space from the template. One area I've been looking at is the area around the identifiers (MGI, OMIM). Those column widths are far too large because they have to be shared with the function fields. Anyway, I was thinking of something like this (apologies for the exceptionally crude mock up):
External IDs OMIM: 186973, MGI: 23525, Homologene: 4051
(Finally, in response to your previous edit summary comment, the cellpadding parameter was also inherited from Template:Protein, so I'm fine with that being modified.) AndrewGNF
That might work; we'll need to see it in the table to see how compact it looks. Note I just edited to a version of the template that has all the PDB data in the image box area. I did not use an "or" statement but nested "if" statements. It gives the same result. You'll have to play around to see the different permutations: image, no image, additional PDBs or not. Thinking about this more, we may want to make it clearer that if an image is presented then the PDB field should be the same PDB number as used to create the image. Feel free to revert any of those changes. By the way, the style and code can both be improved, but I want to get a feel for the best location for these numbers before worrying about style. David D. (Talk) 20:45, 2 April 2007 (UTC)
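
For readers following the "or" question above, here is a minimal sketch of the two approaches, assuming the ParserFunctions #if function and the PDB and PDB_additional parameter names used in this thread. There does not appear to be a string-level "or" function, but concatenating both parameters in the test of a single #if behaves like one, because the test succeeds when either value is non-empty. The label text is illustrative and is not the template's actual code.

```wikitext
<!-- Nested form: test PDB first, then fall back to PDB_additional -->
{{#if: {{{PDB|}}}
 | Structure file(s): {{{PDB}}} {{{PDB_additional|}}}
 | {{#if: {{{PDB_additional|}}}
    | Structure file(s): {{{PDB_additional}}}
   }}
}}

<!-- Concatenated form: the test string is non-empty if either parameter is set -->
{{#if: {{{PDB|}}}{{{PDB_additional|}}}
 | Structure file(s): {{{PDB|}}} {{{PDB_additional|}}}
}}
```

Both forms print nothing when neither parameter is supplied. The concatenated form is shorter, while the nested form leaves room for different wording in each case.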

Okay, looks great... I undid one of the nested "if"s so that it doesn't check the PDB_additional parameter -- it just prints it regardless (and it presumably is blank if the parameter isn't specified). The only compromise with that is that the tag reads "Structure file(s)" instead of knowing whether it's a singular or a plural. Anyway, I think it's a net win for code readability, but I'm not averse to the more precise version either. But overall, I'm a convert. I like having the PDB and image sections commingled with intelligent logic on how to display when the image is and isn't present. (Oh, I also changed your template to use Template:PDB2 so that the PDB URL is consistent.)

I added a note to the "usage" section of Template:GNF_Protein_box describing the preferred usage of the PDB parameter when the image is specified. I'll have a look later at making the condensed "External IDs" view above and we can see how that looks... AndrewGNF 21:17, 2 April 2007 (UTC)

I like those changes. I'll see if I can think of a simple fix for the file(s) issue, but I do agree that it is just as important to keep the code simple. I knew what I had written was a mess ;) David D. (Talk) 21:51, 2 April 2007 (UTC)
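One possible fix for the file(s) wording, sketched under the assumption that PDB is always supplied when this line is shown and that PDB_additional is optional. A small #if adds the plural "s", and a second one emits the separating comma only when there is something to separate. This is an illustration of the idea, not the code that was actually committed.

```wikitext
<!-- Assumes {{{PDB}}} is set; PDB_additional may be empty or missing -->
Structure file{{#if: {{{PDB_additional|}}} | s }}: {{PDB2|{{{PDB}}}}}{{#if: {{{PDB_additional|}}} | , {{{PDB_additional}}} }}
```

The cost is two extra conditionals on one line, which is the readability trade-off mentioned above.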
Ugh, just realized there are small annoying issues still remaining. I fixed one issue (when no image OR PDB is present) but the ambiguity of singular/plural also leads to a trailing comma when only the PDB (and not PDB_additional) is described.
But then it occurred to me that we're maybe being too respectful of the structure we inherited. What if we created a new field called "image_source"? Instead of "From PDB file {{PDB2|{{{PDB}}}}}", we could just do "Image source: {{{image_source}}}", and allow each protein page to decide how to display image source (probably typically using the PDB2 template). Then we could repurpose the "PDB" parameter to simply have all PDB IDs and list them as "Available structures: {{{PDB}}}" under the image (if available). And then we can delete the PDB_additional parameter. What do you think? The only downside I can see is that often the image_source structure will be included in the list of PDB entries, but I think this level of redundant content is okay. I can try to mock up these changes later... AndrewGNF 22:20, 2 April 2007 (UTC)
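
A rough sketch of how the proposal above could look, for illustration only. The image_source and PDB parameters and the two labels come from the comment above; the image parameter name, the fixed width, and the PDB IDs in the sample call (1ABC, 2XYZ) are hypothetical placeholders.

```wikitext
<!-- Template side: show the image and its source only when an image is given -->
{{#if: {{{image|}}}
 | [[Image:{{{image}}}|200px]]<br />
   {{#if: {{{image_source|}}} | Image source: {{{image_source}}}<br /> }}
}}
{{#if: {{{PDB|}}} | Available structures: {{{PDB}}} }}

<!-- Article side: each page decides how to format its own IDs (placeholder values) -->
{{GNF Protein box
 | image = Example.png
 | image_source = {{PDB2|1ABC}}
 | PDB = {{PDB2|1ABC}}, {{PDB2|2XYZ}}
}}
```

Because the page author writes the whole PDB list, the template no longer has to work out plurals or commas itself.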

Okay, I think I've made enough changes for today. I've taken the individual lines for OMIM, MGI, and Homologene and condensed them into a single "External IDs" line. I've moved Uniprot to the ortholog table. I've changed the usage of the PDB parameter, added the image_source parameter, and deleted the PDB_additional parameter. And I've also slightly changed how the Refseq data is rendered in the Ortholog table (again to consolidate a bit). (Oh, it's worth noting that I've used what I think is a particularly odd hack for the "External IDs" data line, which involves a string concatenation template I created in my user namespace. If there's a more standard way to do this, I'd be more than happy to change that...) Finally, I've changed the cell padding to 2. I think that parameter is quickly multiplied to make the table taller (requiring more scrolling), but if you think readability is better with the larger padding, I'm happy to go back to 5... AndrewGNF 00:58, 3 April 2007 (UTC)

I like the fixes, especially using the image_source field as a solution. I don't think the redundancy issue is a problem. I added some non-breaking spaces to the external links so the numbers do not get separated from the descriptor when they wrap around at the end of the cell. David D. (Talk) 20:29, 3 April 2007 (UTC)
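
To illustrate the non-breaking space point, here is a minimal sketch of a condensed "External IDs" line. It assumes parameters named OMIM, MGI, and Homologene (the labels come from the mock-up earlier in this thread; the parameter names are a guess) and leaves out the comma separators, which is where the concatenation hack came in.

```wikitext
<!-- &nbsp; keeps each label on the same line as its number when the cell wraps -->
External IDs: {{#if: {{{OMIM|}}} | OMIM:&nbsp;{{{OMIM}}} }} {{#if: {{{MGI|}}} | MGI:&nbsp;{{{MGI}}} }} {{#if: {{{Homologene|}}} | Homologene:&nbsp;{{{Homologene}}} }}
```

A line break can still fall between two entries, but not between a descriptor and its ID.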

I liked all the latest changes except for the change to "Mouse Orthologs" (rather than simply "Orthologs"), which I've changed back. My rationale is that "orthologs" is nondirectional, meaning that putting "Human Orthologs" would have been just as valid, depending on which species you do most of your research in. Also, we should leave open the possibility of adding additional species (though layout would be a definite issue...). Okay with you, David? AndrewGNF 17:22, 4 April 2007 (UTC)

Absolutely, I was expecting you to come back with a new version. I was thinking mouse orthologs since it seemed your project was coming from a more human perspective. I much prefer it to be more of a gene perspective. I was thrown off by the SymAtlas link I removed, which I had thought was linking to human only, but now I realise you had intended that as a link for a more global comparison of the genes between organisms. See the next section too for other comments on that link. David D. (Talk) 03:27, 5 April 2007 (UTC)

link to all symatlas

I added a species-independent link to SymAtlas in the ortholog table. In theory, clicking each of those probe sets spawns a different query, and a user may want to view expression pattern of all probe sets at once. After adding the link, I realized in practice that all the links result in the same output. But in the next couple months, this will not be the case... AndrewGNF 18:42, 4 April 2007 (UTC)

This makes sense. In that case it should probably be in the main table rather than a small sublink. I'll do a change to test this out. David D. (Talk) 03:23, 5 April 2007 (UTC)
As usual, these changes look great to me. AndrewGNF 18:46, 5 April 2007 (UTC)

And finally

I just made a few more tweaks that might make this table more user friendly. As usual, please feel free to revert any of these changes. Looks like we're done? :) David D. (Talk) 05:00, 6 April 2007 (UTC)

Looks great. No objections at all. Thanks for lending your expertise to this project and teaching me a few wikitext tricks. I'll post later today to the bot approval group so we can start developing the bot itself in ITK's model! Cheers, AndrewGNF 15:58, 6 April 2007 (UTC)
The only outstanding issue, and it's a real pain, is that the table will look different on various browsers. I have been working on Safari and a Mac-based OS. I know from experience things can look very different on other browsers and on other systems. You may want to check that there is no glaring incompatibility when viewed on other computers, and check the various commonly used browsers such as IE and Firefox. David D. (Talk) 17:19, 6 April 2007 (UTC)
Ehhh, those pesky Mac folks can figure out the formatting issues for themselves... ;) No, you raise a good point, I will have a look at that. (And if anyone else is following this thread, please post your experience for lesser-used OS/browser combinations.) I commonly use Firefox and IE, and both of those look great... AndrewGNF 20:00, 6 April 2007 (UTC)

summary and GeneAtlas image

Two changes I made this morning. First, the white space in the middle seemed so vast I wanted to try to fill it in with something (hopefully productive). So I added the "Summary" from Entrez Gene. (NCBI's usage info says that this, and most things at NCBI, is in the public domain.)

The second change I think might be open for a bit more debate, if only because of the possible appearance of it being somewhat self-serving. I added an image of the "Gene Atlas" expression pattern, which is a data set that I was involved in generating. Basically, it is the result of microarray analysis across a diverse sampling of anatomic tissues. The thought is that even if you have a completely uncharacterized gene on the sequence level, knowing where it is expressed in the human or mouse anatomy may give important clues about the gene's function. Anyway, we at GNF have found this to be an incredibly useful tool, and users of our website have also found it useful. I'm open to discussion as to whether this would be useful for the Wikipedia community... (Looking at the modifications a bit more, I can even see moving the "compare all orthologs" link up to this new section, and possibly even deleting the probe-set specific links in the ortholog table. I'll try that out now...) AndrewGNF 18:37, 9 April 2007 (UTC)

Both changes look great. As far as the picture is concerned, is it possible to upload a version without the tissue types at the bottom? That text is useless since it cannot be read in the smaller version. The histogram is useful, though, since it gives a general impression of tissue specificity at a glance. It might be useful to have legible (wrt the small version) values on the y axis so that an estimate of expression can also be garnered with a quick look. Obviously the picture suitable for this thumbnail will not be so useful in close-up, but I think a better idea might be for the picture to be clickable and go directly to your own database. In that way only the latest information will be available. Obviously you would have to program your database to graph that data differently, in a more user-friendly thumbnail format, as well as having the current detailed version.
Now the question: is this self-serving? The short answer is yes, and I'm not sure how the rest of the community will respond to it. Are there equivalent expression databases out there? You could have the picture link to your database and the lower link to an alternative source? Possibly HGP has something? David D. (Talk) 19:28, 9 April 2007 (UTC)
Regarding the picture formatting, yes, all the suggestions you made are possible. Although I think the way it is now is okay. Agreed, the axes are not legible (though there is a color code that people could eventually get familiar with), but my natural instinct would be to click on the thumbnail image, which brings it up full-sized (where the axis labels are legible). Another consideration is that I think the option you propose, linking the thumbnail directly to our database, is possibly slightly controversial? The ImageMap extension (discussion) seems to indicate that this is still buggy, and its use at Wikipedia is debated. Anyway, I'll certainly defer to your (and others') expertise here on WP best practices...
Regarding other databases, certainly there are other sites that display gene expression data. Some large data sets track gene expression across cancer samples or other disease states, and then there are thousands of small data sets that look at response to various perturbations. But if we had to choose one and only one gene expression data set to display, I think the most convincing argument could be made for the Gene Atlas. Knowledge of where a gene is expressed is very broadly useful, evidenced by the "multiple tissue northern blots" in just about every paper describing the initial characterization of a gene. And to back up the claim that the data are useful (and at the risk of further self-promoting), our website gets ~40,000 hits per week primarily looking at Gene Atlas data. Certainly not NCBI or Ensembl-like traffic, but not bad relative to the size of the community.
(And finally, I should clarify that neither GNF nor I personally make any money from additional eyes to our website. I used "self-serving" only in the sense of the increased fame and renown that this database brings, tongue firmly in cheek...) ;) AndrewGNF 20:20, 9 April 2007 (UTC)
Of course it's not me you have to convince, but it's always good to have some alternative looks for those that are the "knee jerk, anti exploitation of wikipedia by companies who are trying to get free advertising, and its just not right you know, get out of our house, you're banned" type of editor. OK, I exaggerate ;) David D. (Talk) 21:36, 9 April 2007 (UTC)
Got it, point well taken. I haven't had the pleasure yet, but we'll see how it goes...  ;) AndrewGNF 21:43, 9 April 2007 (UTC)

Andrew, check out what I did with the thumb version of the expression pattern. First, I removed all illegible text and made the expression levels more legible (increased font size). Second, note that this thumb version redirects to the good quality figure, so those who want more details have it one click away. I think this alteration is worth it to make the infobox more user friendly, especially if this modification for output is relatively easy from your end? David D. (Talk) 17:33, 30 April 2007 (UTC)

Cool, didn't know that was possible. But I definitely like the logic and think it improves the user interface. I don't think it will be too difficult to create two versions of the images. Thanks... As an aside, we've just identified a student to take on this project so we'll see some movement soon on creating the 10 example pages... AndrewGNF 17:04, 1 May 2007 (UTC)

Color scheme

Any specific reason we need to make the color scheme different from {{protein}}? If people like it, any reason we shouldn't use it too? AndrewGNF (talk) 01:29, 21 December 2007 (UTC)

Well, I think it will make it more apparent to users that what they are looking at is a human gene. If somebody decided to add Drosophila, C. elegans or Arabidopsis genes in the future, wouldn't it be nice if they were different colors too? Color-coding is simple and useful. I know that most proteins are beige, but really, the templates are too similar already. AnteaterZot (talk) 02:13, 21 December 2007 (UTC)
The latest version you proposed seems fine to me, though I'm not particularly picky... AndrewGNF (talk) 02:15, 21 December 2007 (UTC)
I'd recommend restoring the beige. --Arcadian (talk) 06:42, 21 December 2007 (UTC)
Why? And you didn't like any of the other choices? Could you propose another color change to help distinguish these infoboxes? AnteaterZot (talk) 09:22, 21 December 2007 (UTC)
I don't think they should be distinguished, because they are so similar in scope and goal. --Arcadian (talk) 20:30, 23 December 2007 (UTC)
Scope: proteins in all taxa vs proteins in H. sapiens. Goal: important proteins, protein domains and families vs complete list of proteins and even subunits. Harm from different color: none. AnteaterZot (talk) 02:48, 24 December 2007 (UTC)