<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>My Personal Science Nerd &#187; search</title>
	<atom:link href="http://mypersonalsciencenerd.com/tag/search/feed/" rel="self" type="application/rss+xml" />
	<link>http://mypersonalsciencenerd.com</link>
	<description></description>
	<lastBuildDate>Thu, 24 Jun 2010 04:13:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>&#8220;BLAST Off!!&#8221; : How to use BLAST for running a protein search</title>
		<link>http://mypersonalsciencenerd.com/overallblog/bio1/blast-off-how-to-use-blast-for-running-a-protein-search/</link>
		<comments>http://mypersonalsciencenerd.com/overallblog/bio1/blast-off-how-to-use-blast-for-running-a-protein-search/#comments</comments>
		<pubDate>Mon, 02 Nov 2009 15:05:09 +0000</pubDate>
		<dc:creator>ElersonGL</dc:creator>
				<category><![CDATA[Biology 101]]></category>
		<category><![CDATA[BLAST]]></category>
		<category><![CDATA[genetics]]></category>
		<category><![CDATA[nucleotide]]></category>
		<category><![CDATA[protein]]></category>
		<category><![CDATA[search]]></category>

		<guid isPermaLink="false">http://mypersonalsciencenerd.com/?p=488</guid>
		<description><![CDATA[Need to do a BLAST search FOR PROTEINS for your bio class or project but don't really know how? Check this out!]]></description>
			<content:encoded><![CDATA[<p>Ok. Since I&#8217;m big on crediting original authors for their work, I have to begin by saying that I took the &#8220;BLAST Off!!&#8221; title from a former lab partner named Chloe&#8217; Smith. If she reads this and doesn&#8217;t see her name, she&#8217;ll annihilate me.  So there it is!</p>
<p><img class="aligncenter size-large wp-image-496" title="Apollo 15 Launch" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/480px-Apollo_15_launch-365x457.jpg" alt="Apollo 15 Launch" width="428" height="536" /></p>
<p>But now, on to brass tacks. If you&#8217;re reading this, it&#8217;s because you either want or need to learn to use <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi" target="_blank">BLAST</a>, the genetic Google hosted by the National Institutes of Health. <strong>This article is going to focus on searching if you already have an amino acid sequence, NOT A NUCLEOTIDE SEQUENCE. </strong>So, without further ado, let&#8217;s go!</p>
<h2>Step 1: Verification</h2>
<p>So, the first thing you&#8217;ll need to do is to verify that you have an amino acid sequence rather than a nucleotide sequence. This is probably going to be the easiest part for those of you who are just getting started with BLAST. Your sequence is going to be a text file, just like any other research paper that you&#8217;ve ever written or read. But it&#8217;s going to be a bunch of unreadable letters. You can tell the difference between an amino acid sequence and a nucleotide sequence by just glancing at the letters you see.</p>
<h3>Nucleotide</h3>
<p><tt>ATTGCGTTCGAGTCACTATGTATGGCCTCCACGGTAGGTTGAGCAGTACC<br />
TGGCGGTATGACCACCTCCTCAGCGACGATGCTTATGGAGGCGCTGGACA<br />
AGCGTTGACCCAGAGCTTTGGTCCCCAGAGCAAGAAGACCACTGGCCCGA<br />
CACAAGAACACTTCCTCCTTTCCATTAGGGTTCGAGAATAAAGCTATCAG<br />
CTGAGTCAATGCATTGCCACTTTTGAGTCCTCAAGCTAGATAAGTCTCCC<br />
TTTTAAGAAACGCACGAGTACGCCTCTCTAGCGGTTTCTCATCGGACAGC<br />
TCCTACGAAAGCGATCTTTATCGGGATCCACCGACTGTCGGCCTACAAGG<br />
TGGGCCTTTTTGGACCACCCCGAGTAGATCGGCGACCTTTCTTTGTATGC<br />
CAATTCATGAGTAACCTGAGCAGATTGAATGTACACGCAAAATGTCGATC<br />
TAAGTGTCCCGTCCAAGAAGAATTTTTTCTTACTACCCCAGCCTGGTTTA</tt></p>
<p>Notice how this is all A, T, G, and C? Those four letters stand for the four bases of DNA, so this is a nucleotide sequence. But you already knew that. (&#8230;even if you didn&#8217;t realize that you knew it&#8230;)</p>
<p>This isn&#8217;t what we&#8217;re looking for this time. We want a protein (amino acid) sequence.</p>
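<p>If you&#8217;d rather let the computer do the glancing, here is a rough Python sketch of the same check. The function name <tt>guess_sequence_type</tt> is just something I made up for illustration, and the rule is deliberately simple: only A, T, G, and C means nucleotide, anything else means protein.</p>

```python
# Letters that can appear in a plain DNA sequence.
DNA_LETTERS = set("ACGT")


def guess_sequence_type(sequence):
    """Guess whether a sequence is 'nucleotide' or 'protein' from its letters."""
    # Keep only the letters, so spaces, line breaks, and symbols are ignored.
    letters = {ch for ch in sequence.upper() if ch.isalpha()}
    if not letters:
        raise ValueError("no sequence letters found")
    # Nothing but A, C, G, and T: almost certainly DNA.
    if letters.issubset(DNA_LETTERS):
        return "nucleotide"
    return "protein"


print(guess_sequence_type("ATTGCGTTCGAGTCACTATG"))  # nucleotide
print(guess_sequence_type("MDPHNPIVLDQGTGFVKIGR"))  # protein
```

<p>This is only a first guess. An RNA sequence (which uses U) or a DNA sequence with placeholder letters like N would be labeled &#8220;protein&#8221; by this sketch, so still look at the sequence yourself.</p>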
<h3>Amino Acid</h3>
<pre>MDPHNPIVLDQGTGFVKIGRAGENFPDYTFPSIVGRPILRAEERASVATPLKDIMIGDEA
SEVRSYLQISYPMENGIIKNWTDMELLWDYAFFEQMKLPSTSNGKILLTEPPMNPLKNRE
KMCEVMFEKYDFGGVYVAIQAVLALYAQGLSSGVVVDSGDGVTHIVPVYESVVLSHLTRR
LDVAGRDVTRHLIDLLSRRGYAFNRTADFETVRQIKEKLCYVSYDLDLDTKLARETTALV
ESYELPDGRTIKVGQERFEAPECLFQPGLVDVEQPGVGELLFNTVQSADVDIRSSLYKAI
VLSGGSSMYPGLPSRLEKELKQLWFSRVLHNDPSRLDKFKVRIEDPPRRKHMVFIGGAVL
ASIMADKDHMWLSKQEWQESGPSAMTKFGPR*</pre>
<p>BINGO! Since this one is full of random-looking letters (instead of just A, T, G, and C), we can pretty safely assume that this is an amino acid sequence.</p>
<p>Sidenote: I know I said it was all random letters, but all the letters actually represent certain amino acids. Just FYI.</p>
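<p>To show what the letters stand for, here is a small Python sketch that spells out the standard one-letter codes for the 20 common amino acids. The names <tt>ONE_LETTER_CODES</tt> and <tt>spell_out</tt> are my own, made up for this example.</p>

```python
# Standard one-letter codes for the 20 common amino acids.
ONE_LETTER_CODES = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}


def spell_out(sequence):
    """Return the full amino acid name for each letter in the sequence."""
    # Letters outside the 20 standard codes are reported as 'unknown'.
    return [ONE_LETTER_CODES.get(letter, "unknown") for letter in sequence.upper()]


print(spell_out("MDPHN"))
# ['methionine', 'aspartic acid', 'proline', 'histidine', 'asparagine']
```

<p>Those are the first five letters of the example sequence above.</p>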
<p>Just for the sake of science, I&#8217;m going to use this as my example for the rest of the article.</p>
<h2>Step 2: Choosing the Correct Search</h2>
<p>At the <a href="http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&amp;PAGE_TYPE=BlastHome" target="_blank">BLAST homepage</a>, you&#8217;ll notice that under &#8220;Basic Blast,&#8221; there are five different&#8230;. ummm&#8230; things.</p>
<p><img class="aligncenter size-large wp-image-497" title="home page" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/blasthomepageproteinquery-600x411.png" alt="home page" width="600" height="411" /></p>
<p>Each of these things is a type of BLAST search that you can run. I&#8217;ll try to write a post on the differences between them all, but for now, just know that we&#8217;re going to use &#8220;protein blast&#8221; because we have a protein query (a list of amino acids) and we want to find out what protein it probably is (by searching protein databases).</p>
<p>So, click on &#8220;protein blast&#8221; and we&#8217;ll move on.</p>
<h2>Step 3: The Search Interface</h2>
<p>If you&#8217;ve found this site (MPSN), then you&#8217;re probably VERY familiar with the one pictured below.</p>
<p><img class="aligncenter size-large wp-image-498" title="google" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/google-600x200.png" alt="google" width="600" height="200" /></p>
<p>The screen that you use for the BLAST search isn&#8217;t EXACTLY the same, but there are enough similarities that we can compare them. In fact, the basic principles are ALL the same.</p>
<p>Do you see where it says &#8220;Enter accession number, gi, or FASTA sequence&#8221; ? This is where you&#8217;re going to paste the amino acid sequence.</p>
<p><img class="aligncenter size-full wp-image-499" title="Search Query" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/Search-Query.png" alt="Search Query" width="574" height="306" /></p>
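<p>The search box takes FASTA text, which is just a header line starting with &#8220;&gt;&#8221; followed by the sequence. If your sequence is messy (stray spaces, odd line breaks), a sketch like this can tidy it up before you paste it. The function name <tt>to_fasta</tt> and the header <tt>my_mystery_protein</tt> are made up, and dropping everything that isn&#8217;t a letter (including a trailing <tt>*</tt>) is my own choice, not something BLAST requires.</p>

```python
import textwrap


def to_fasta(name, sequence, width=60):
    """Format a sequence as FASTA text: a '>' header line, then wrapped lines."""
    # Keep only the letters; this drops spaces, line breaks, and a trailing '*'.
    cleaned = "".join(ch for ch in sequence.upper() if ch.isalpha())
    if not cleaned:
        raise ValueError("no sequence letters found")
    # Break the sequence into lines of at most `width` letters.
    lines = textwrap.wrap(cleaned, width)
    return ">" + name + "\n" + "\n".join(lines)


messy = "MDPHNPIVLDQGTGFVKIGR AGENFPDYTFPSIVGRPILRAEE*"
print(to_fasta("my_mystery_protein", messy, width=20))
```

<p>The printed text can be pasted straight into the box.</p>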
<p>In the entry field that says &#8220;Job Title,&#8221; you can come up with something to title your search if you&#8217;d like. But it won&#8217;t ruin your search if you leave it blank. I usually leave it blank.</p>
<p>&#8220;Query Subrange&#8221; and &#8220;Upload File&#8221; are other tools that may someday help you in a large BLAST search. But as for now, they&#8217;re pretty much useless to you.</p>
<p>Moving on down the page, we find that the next significant plaything is the database chooser-thingy. If you click on it, you&#8217;ll see a dropdown menu with a cornucopia of different databases you can choose. <strong>Don&#8217;t let this discourage you.</strong> The only one that&#8217;s going to help you most of the time is &#8220;Non-redundant protein sequences.&#8221; Although there isn&#8217;t a single database that holds all of the info at NIH, the non-redundant option holds &#8220;All non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF.&#8221; If you don&#8217;t know exactly what that means, that&#8217;s ok. Just realize that it means that non-redundant is the choice you want to make.</p>
<p><img class="aligncenter size-full wp-image-500" title="Databases" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/databaseblast.png" alt="Databases" width="378" height="224" /></p>
<p>If you&#8217;re dealing with a teacher who has told you what organism you&#8217;re studying, you can put the <strong>scientific name</strong> of the organism in the input marked &#8220;Organism Optional.&#8221; Although that may seem like common sense to you, I can&#8217;t tell you how many times I&#8217;ve seen people ask for help concerning that input.</p>
<p>ALMOST DONE HERE&#8230;</p>
<p>Now, all you need to do is make sure that &#8220;blastp&#8221; is selected in the &#8220;Algorithm&#8221; box, and then you can click BLAST at the bottom of the screen. You should be redirected to a page that includes this:</p>
<p><img class="aligncenter size-large wp-image-501" title="waiting" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/blastwaiting-600x80.png" alt="waiting" width="600" height="80" /></p>
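<p>For the curious: the same choices you just made on the form (protein blast, non-redundant database, optional organism) can also be written down as a web request. The sketch below only BUILDS the request text and sends nothing. The parameter names (<tt>CMD</tt>, <tt>PROGRAM</tt>, <tt>DATABASE</tt>, <tt>QUERY</tt>, <tt>ENTREZ_QUERY</tt>) are how I understand NCBI&#8217;s URL interface to work, so treat them as assumptions and check NCBI&#8217;s own documentation before relying on them. The function name <tt>build_blast_request</tt> is made up.</p>

```python
from urllib.parse import urlencode

# Same BLAST address as the link at the top of this post.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blast_request(sequence, organism=None):
    """Build the form body for a blastp search against the nr database."""
    params = {
        "CMD": "Put",          # assumed: asks the server to start a new search
        "PROGRAM": "blastp",   # protein query against a protein database
        "DATABASE": "nr",      # non-redundant protein sequences
        "QUERY": sequence,
    }
    if organism:
        # Assumed: limits the search to one organism by scientific name.
        params["ENTREZ_QUERY"] = organism + "[Organism]"
    return urlencode(params)


body = build_blast_request("MDPHNPIVLDQGTGFVKIGR", "Saccharomyces cerevisiae")
print(BLAST_URL)
print(body)
```

<p>Notice that the optional organism really is optional here too: leave it out and that part of the request simply isn&#8217;t included.</p>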
<h2>Step 4: Interpreting Results</h2>
<p>You did it! You conducted a BLAST search like a pro, but now what? Well, let&#8217;s start with that funky-lookin&#8217; graph in the middle of the page.</p>
<p><img class="aligncenter size-full wp-image-504" title="alignment scores" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/alignment-scores.png" alt="alignment scores" width="622" height="259" /></p>
<p>This graph shows you different results that matched your search sequence. We gave the BLAST program a sequence and it compared our sequence to a few hundred thousand others (or more) and spit out the results that it found in the form of this graph.</p>
<p>You&#8217;ll notice that at the top, there are five colors: black, blue, green, pink, and red. There are also numbers that go with the colors. A common mistake is to think that these numbers represent the number of amino acids that were matched in the result. For instance, if we had searched &#8220;FJLDKJ&#8221; and the BLAST had returned with a result of &#8220;FJDDKJ,&#8221; then that would give a score of 5 rather than 6. That&#8217;s not quite right&#8230;</p>
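<p>Just to make that mistaken idea concrete, here is what plain letter-counting looks like in Python. This is NOT how BLAST scores anything. It only shows where the &#8220;5 rather than 6&#8221; in the example comes from. The function name <tt>count_matching_positions</tt> is made up.</p>

```python
def count_matching_positions(query, hit):
    """Count positions where two equal-length sequences have the same letter."""
    if len(query) != len(hit):
        raise ValueError("sequences must be the same length")
    # Walk both sequences side by side and count identical letters.
    return sum(1 for a, b in zip(query, hit) if a == b)


print(count_matching_positions("FJLDKJ", "FJDDKJ"))  # 5
```

<p>Only the third letter differs (L versus D), so five of the six positions match.</p>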
<p>The numbers you see are scores. They DO reflect how well the search and the result match, but they don&#8217;t do that by simply counting the matching letters. Think about it like a chapter exam in class: the number of questions isn&#8217;t always 100, but the score is always out of 100. This means that the teacher is &#8220;standardizing&#8221; the scores so that they all compare to each other on the same scale.</p>
<p>The BLAST system does basically the same thing, with one big difference: there is no fixed maximum. Longer, better alignments just keep scoring higher. The color key simply stops at 200, so anything that scores 200 or more is drawn in red, and a red result in BLAST is like an A on an exam. By that logic, I&#8217;m sure you can figure out that, in your BLAST searches, you want to use the results in the pink and the red.</p>
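<p>Here is the color key written out as a tiny Python function. The cutoffs (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 and up) are the ones I read off the key at the top of the graph, so treat them as an assumption and compare them with your own results page. The function name <tt>color_for_score</tt> is made up.</p>

```python
def color_for_score(score):
    """Return the color-key bin for an alignment score."""
    # Check the highest bin first and work down.
    if score >= 200:
        return "red"
    if score >= 80:
        return "pink"
    if score >= 50:
        return "green"
    if score >= 40:
        return "blue"
    return "black"


for score in (35, 45, 60, 150, 640):
    print(score, color_for_score(score))
```

<p>A score of 640 lands in the same red bin as a score of 200, which is why the colors are only a quick first look and the actual numbers further down the page matter more.</p>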
<p><img class="aligncenter size-large wp-image-505" title="description" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/description-600x370.png" alt="description" width="600" height="370" /></p>
<p>Moving down the page again, you&#8217;ll see a section called &#8220;Descriptions.&#8221; This is the first real encounter with your results. What you see here is really where you can get a lot of information. For instance, I&#8217;ll bet that you didn&#8217;t know that the organism that our protein was made in is <em>Saccharomyces cerevisiae</em> (baker&#8217;s yeast).</p>
<p>HOW DID I KNOW THAT? Just look at the results. The highest score AND the second highest score results both come from baker&#8217;s yeast. (You can click the little blue link on the left side to go to the NCBI record for that result. Clicking on that link sent me to a page that showed me not only the organism, but also that this protein is the product of the <em>ARP2</em> gene.)</p>
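<p>The &#8220;look at which organism keeps showing up&#8221; trick can be sketched in Python too. In the Descriptions list, the organism name sits in square brackets at the end of each line. The three description lines below are invented stand-ins (not copied from my actual results), and the function name <tt>most_common_organism</tt> is made up.</p>

```python
import re
from collections import Counter


def most_common_organism(descriptions):
    """Return the organism named most often in a list of hit descriptions."""
    organisms = []
    for line in descriptions:
        # Collect every [bracketed] piece and keep the last one on the line.
        found = re.findall(r"\[([^\[\]]+)\]", line)
        if found:
            organisms.append(found[-1])
    if not organisms:
        return None
    # most_common(1) gives a list holding one (name, count) pair.
    name, _count = Counter(organisms).most_common(1)[0]
    return name


# Hypothetical description lines, for illustration only.
top_hits = [
    "example protein one [Saccharomyces cerevisiae]",
    "example protein two [Saccharomyces cerevisiae]",
    "example protein three [Candida glabrata]",
]
print(most_common_organism(top_hits))  # Saccharomyces cerevisiae
```

<p>Feed it only your top few hits. Further down the list, the organisms get more and more distantly related.</p>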
<p>OH! And I almost forgot to explain the E-Value!</p>
<p><img class="aligncenter size-full wp-image-506" title="evalue" src="http://mypersonalsciencenerd.com/wp-content/uploads/2009/11/evalue.png" alt="evalue" width="195" height="282" /></p>
<p>The E-Value is a fairly simple concept, once you get the hang of it. For those of you who are familiar with statistics, it&#8217;s a close cousin of the p-value. For the rest of you, allow me to explain: the E-Value is the number of results with a score this good (or better) that you&#8217;d expect to turn up purely by chance in a database of this size. The smaller the E-Value, the less likely it is that your match is just a coincidence. Therefore, a GOOD BLAST RESULT is one that has a HIGH SCORE and a LOW E-VALUE.</p>
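<p>For anyone who wants to see the score and the E-Value tied together, the usual textbook formula is E = m &#215; n &#215; 2<sup>&#8722;S</sup>, where S is the bit score, m is the length of your query, and n is the total length of the database. It counts how many hits that good you&#8217;d expect from chance alone. Here is a sketch in Python. The function names and the lengths in the example are made up, and real BLAST adjusts the lengths a little, so don&#8217;t expect these numbers to match a results page exactly.</p>

```python
import math


def expected_chance_hits(bit_score, query_length, database_length):
    """E-value: expected number of chance hits scoring at least bit_score."""
    return query_length * database_length * 2.0 ** (-bit_score)


def chance_of_at_least_one(e_value):
    """Probability of seeing one or more chance hits, given the E-value."""
    return 1.0 - math.exp(-e_value)


# Hypothetical sizes: a 400-letter query against 3 billion database letters.
for score in (30, 50, 200):
    e_value = expected_chance_hits(score, 400, 3_000_000_000)
    print(score, e_value, chance_of_at_least_one(e_value))
```

<p>Every extra point of bit score cuts the E-Value in half, which is why a high score and a low E-Value always travel together.</p>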
<p>I really hope that this helps somebody out there.</p>
<p>Best of Luck,</p>
<p>Grey</p>
]]></content:encoded>
			<wfw:commentRss>http://mypersonalsciencenerd.com/overallblog/bio1/blast-off-how-to-use-blast-for-running-a-protein-search/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
