“BLAST Off!!” : How to use BLAST for running a protein search

Ok. Since I’m big on crediting original authors for their work, I have to begin by saying that I took the “BLAST Off!!” title from a former lab partner named Chloe’ Smith. If she reads this and doesn’t see her name, she’ll annihilate me. So there it is!

[Image: Apollo 15 launch]

But now, on to brass tacks. If you’re reading this, it’s because you either want or need to learn to use BLAST, the genetic Google hosted by the National Center for Biotechnology Information (NCBI) at the National Institutes of Health. This article focuses on searching when you already have an amino acid sequence, NOT A NUCLEOTIDE SEQUENCE. So, without further ado, let’s go!

Step 1: Verification

So, the first thing you’ll need to do is to verify that you have an amino acid sequence rather than a nucleotide sequence. This is probably going to be the easiest part for those of you who are just getting started with BLAST. Your sequence is going to be plain text, just like any research paper that you’ve ever written or read, except that it’s going to look like a bunch of unreadable letters. You can tell the difference between an amino acid sequence and a nucleotide sequence by just glancing at the letters you see.

Nucleotide

ATTGCGTTCGAGTCACTATGTATGGCCTCCACGGTAGGTTGAGCAGTACC
TGGCGGTATGACCACCTCCTCAGCGACGATGCTTATGGAGGCGCTGGACA
AGCGTTGACCCAGAGCTTTGGTCCCCAGAGCAAGAAGACCACTGGCCCGA
CACAAGAACACTTCCTCCTTTCCATTAGGGTTCGAGAATAAAGCTATCAG
CTGAGTCAATGCATTGCCACTTTTGAGTCCTCAAGCTAGATAAGTCTCCC
TTTTAAGAAACGCACGAGTACGCCTCTCTAGCGGTTTCTCATCGGACAGC
TCCTACGAAAGCGATCTTTATCGGGATCCACCGACTGTCGGCCTACAAGG
TGGGCCTTTTTGGACCACCCCGAGTAGATCGGCGACCTTTCTTTGTATGC
CAATTCATGAGTAACCTGAGCAGATTGAATGTACACGCAAAATGTCGATC
TAAGTGTCCCGTCCAAGAAGAATTTTTTCTTACTACCCCAGCCTGGTTTA

Notice how this is all A, T, G, and C? It’s the nucleotide sequence of a nucleic acid, and since it uses T rather than U, it’s DNA. (But you already knew that, even if you didn’t realize that you knew it.)

This isn’t what we’re looking for this time. We want a protein (amino acid) sequence.

Amino Acid

MDPHNPIVLDQGTGFVKIGRAGENFPDYTFPSIVGRPILRAEERASVATPLKDIMIGDEA
SEVRSYLQISYPMENGIIKNWTDMELLWDYAFFEQMKLPSTSNGKILLTEPPMNPLKNRE
KMCEVMFEKYDFGGVYVAIQAVLALYAQGLSSGVVVDSGDGVTHIVPVYESVVLSHLTRR
LDVAGRDVTRHLIDLLSRRGYAFNRTADFETVRQIKEKLCYVSYDLDLDTKLARETTALV
ESYELPDGRTIKVGQERFEAPECLFQPGLVDVEQPGVGELLFNTVQSADVDIRSSLYKAI
VLSGGSSMYPGLPSRLEKELKQLWFSRVLHNDPSRLDKFKVRIEDPPRRKHMVFIGGAVL
ASIMADKDHMWLSKQEWQESGPSAMTKFGPR*

BINGO! Since this one is all random letters (instead of A, T, G, and C), we can assume pretty safely that this is an amino acid sequence.

Sidenote: I know I said it was all random letters, but each letter actually represents a specific amino acid, and the * at the end marks where the protein stops. Just FYI.

Just for the sake of science, I’m going to use this as my example for the rest of the article.
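If you’d rather not trust your eyes, the glance test from this step can be sketched in a few lines of Python. This is only a rough heuristic of my own, not anything BLAST itself does: the function name, the 90% threshold, and the choice to count U and N as nucleotide letters are all assumptions. Keep in mind that A, C, G, and T are also valid amino acid codes, so a very short protein could fool it.

```python
# Letters treated as "nucleotide-like" (assumption: U for RNA, N for unknown base).
NUCLEOTIDE_LETTERS = set("ACGTUN")


def guess_sequence_type(sequence, threshold=0.9):
    """Guess 'nucleotide' or 'protein' from the letters in a sequence.

    Non-letters (spaces, line breaks, the trailing '*') are ignored.
    """
    letters = [ch for ch in sequence.upper() if ch.isalpha()]
    if not letters:
        raise ValueError("sequence contains no letters")
    nucleotide_like = sum(ch in NUCLEOTIDE_LETTERS for ch in letters)
    fraction = nucleotide_like / len(letters)
    return "nucleotide" if fraction >= threshold else "protein"


print(guess_sequence_type("ATTGCGTTCGAGTCACTATGTATGGCC"))
print(guess_sequence_type("MDPHNPIVLDQGTGFVKIGRAGENFPDYTF"))
```

The first call uses the start of the DNA example and the second uses the start of the protein example, so they should land on opposite sides of the threshold.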

Step 2: Choosing the Correct Search

At the BLAST homepage, you’ll notice that under “Basic BLAST,” there are five different… ummm… things.

[Image: the BLAST home page]

Each of these things is a type of BLAST search that you can run. I’ll try to write a post on the differences between them all, but for now, just know that we’re going to use “protein blast” because we have a protein query (a list of amino acids) and we want to find out what protein it probably is (by searching protein databases).

So, click on “protein blast” and we’ll move on.

Step 3: The Search Interface

If you’ve found this site (MPSN), then you’re probably VERY familiar with the one pictured below.

[Image: the Google search page]

The screen that you use for the BLAST search isn’t EXACTLY the same, but there are enough similarities that we can compare them. In fact, the basic principles are ALL the same.

Do you see where it says “Enter accession number, gi, or FASTA sequence” ? This is where you’re going to paste the amino acid sequence.

[Image: the BLAST search query box]
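Since that box mentions a “FASTA sequence,” here is a minimal sketch of turning a raw sequence into FASTA text: one header line starting with “>”, followed by the sequence wrapped into short lines. The function name, the 60-letter line width, and the header text are my own choices. Dropping the trailing * is just a precaution on my part, not something I know BLAST to require.

```python
def to_fasta(name, sequence, width=60):
    """Format a raw sequence as FASTA text.

    Whitespace is removed, letters are upper-cased, and a trailing '*'
    (stop marker) is dropped before wrapping.
    """
    cleaned = "".join(sequence.split()).upper().rstrip("*")
    if not cleaned:
        raise ValueError("empty sequence")
    # Cut the cleaned sequence into fixed-width lines.
    lines = [cleaned[i:i + width] for i in range(0, len(cleaned), width)]
    return ">" + name + "\n" + "\n".join(lines) + "\n"


# Hypothetical header text; only the last line of the example protein is used.
print(to_fasta("my_mystery_protein", "ASIMADKDHMWLSKQEWQESGPSAMTKFGPR*"))
```

Pasting the bare letters works too; the header line mostly helps you keep track of which sequence is which.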

In the entry field that says “Job Title,” you can come up with something to title your search if you’d like. But it won’t ruin your search if you leave it blank. I usually leave it blank.

“Query Subrange” (which searches with only part of your sequence) and “Upload File” (which takes a file instead of pasted text) are other tools that may someday help you in a large BLAST search. But for now, you can safely ignore them.

Moving on down the page, we find that the next significant plaything is the database chooser-thingy. If you click on it, you’ll see a dropdown menu with a cornucopia of different databases you can choose. Don’t let this discourage you. The only one that’s going to help you most of the time is “Non-redundant protein sequences.” Although there isn’t a single database that holds all of the info at NIH, the non-redundant option holds “All non-redundant GenBank CDS translations + RefSeq Proteins + PDB + SwissProt + PIR + PRF.” If you don’t know exactly what that means, that’s ok. Just realize that it means that non-redundant is the choice you want to make.

[Image: the database dropdown menu]

If you’re dealing with a teacher who has told you what organism you’re studying, you can put the scientific name of the organism in the input marked “Organism Optional.” Although that may seem like common sense to you, I can’t tell you how many times I’ve seen people ask for help concerning that input.

ALMOST DONE HERE…

Now, all you need to do is make sure that “blastp” is selected in the “Algorithm” box, and then you can click BLAST at the bottom of the screen. You should be redirected to a page that includes this:

[Image: the BLAST waiting page]
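For the curious, the same choices (blastp, the non-redundant database, an optional organism) can also be written down as a web request. The sketch below only builds the request body and does not send anything. The parameter names (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY) are the ones I remember from NCBI’s BLAST URL API, so treat them as assumptions and check the official documentation before relying on them. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Address of the BLAST web service (the same site as the search form).
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_request(sequence, database="nr", organism=None):
    """Return the URL-encoded form body for submitting a blastp search.

    Nothing is sent over the network; this only assembles the parameters.
    """
    params = {
        "CMD": "Put",           # assumed: "Put" submits a new search
        "PROGRAM": "blastp",    # protein query against a protein database
        "DATABASE": database,   # "nr" = non-redundant protein sequences
        "QUERY": sequence,
    }
    if organism:
        # Assumed Entrez syntax for limiting the search to one organism.
        params["ENTREZ_QUERY"] = organism + "[Organism]"
    return urlencode(params)


body = build_blastp_request("MDPHNPIVLDQGTGFVK", organism="Saccharomyces cerevisiae")
print(BLAST_URL)
print(body)
```

Each entry maps onto something you just did by hand: the program is the “protein blast” link, the database is the dropdown, and the organism is the optional input.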

Step 4: Interpreting Results

You did it! You conducted a BLAST search like a pro, but now what? Well, let’s start with that funky-lookin’ graph in the middle of the page.

[Image: the alignment scores graph]

This graph shows you different results that matched your search sequence. We gave the BLAST program a sequence and it compared our sequence to a few hundred thousand others (or more) and spit out the results that it found in the form of this graph.

You’ll notice that at the top, there are five colors: black, blue, green, pink, and red. There are also numbers that go with the colors. A common mistake is to think that these numbers represent the number of amino acids that were matched in the result. For instance, if we had searched “FJLDKJ” and BLAST had returned a result of “FJDDKJ,” then that would give a score of 5 rather than 6. That’s not quite right…

The numbers you see are alignment scores. They DO reflect how much matched between the search and the result, but they don’t do that by simply counting matching letters. Each aligned pair of amino acids earns points from a scoring matrix (a close substitution still earns something, while a mismatch or gap costs points), and the total is then normalized so that scores can be compared with each other. Think about it like a chapter exam in class: the teacher standardizes the scores so that they all compare to each other, no matter how many questions were on the test.

The analogy breaks down in one place, though: there’s no fixed maximum. Longer, better alignments just keep scoring higher. The 200 you see in the color key is only the cutoff for the top color (red means 200 or more), so a 200 in BLAST is like an A on an exam. By that logic, I’m sure you can figure out that, in your BLAST searches, you want to use the results in the pink and the red.
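To make the difference between counting matches and reading the color key concrete, here is a small sketch. The first function does the naive letter counting from the “FJLDKJ” example. The second maps a score to a color using the cutoffs I understand the color key to show (below 40, 40 to 50, 50 to 80, 80 to 200, and 200 or more). Both function names are made up for illustration, and the cutoffs are an assumption you should check against the key on your own results page.

```python
def count_identities(query, hit):
    """Count positions where two equal-length sequences have the same letter."""
    if len(query) != len(hit):
        raise ValueError("sequences must be the same length")
    return sum(a == b for a, b in zip(query, hit))


def color_for_score(score):
    """Map an alignment score to its color in the results graph."""
    if score < 40:
        return "black"
    if score < 50:
        return "blue"
    if score < 80:
        return "green"
    if score < 200:
        return "pink"
    return "red"


print(count_identities("FJLDKJ", "FJDDKJ"))  # naive count of matching letters
print(color_for_score(35))
print(color_for_score(150))
print(color_for_score(650))
```

Notice that the color function never asks how many letters matched. It only looks at the score, which is why a long, decent alignment can outrank a short, perfect one.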

[Image: the Descriptions section]

Moving down the page again, you’ll see a section called “Descriptions.” This is the first real encounter with your results, and it’s where you can get a lot of information. For instance, I’ll bet that you didn’t know that the organism that our protein was made in is Saccharomyces cerevisiae (baker’s yeast).

HOW DID I KNOW THAT? Just look at the results. The highest score AND the second highest score results both come from baker’s yeast. (You can click the little blue link on the left side in order to go to the NCBI record for that result. Clicking on that link showed me not only the organism, but also that this protein is the product of the ARP2 gene; the protein itself is written Arp2p.)

OH! And I almost forgot to explain the E-Value!

[Image: the E-Value column]

The E-Value is a fairly simple concept, once you get the hang of it. For those of you that are familiar with statistics, it’s a close cousin of the p-value. For the rest of you, allow me to explain: the E-Value (short for “expect value”) is the number of hits with a score at least this good that you’d expect to turn up purely by chance in a database of this size. The smaller the E-Value, the less likely it is that your match is just a coincidence. Therefore, a GOOD BLAST RESULT is one that has a HIGH SCORE and a LOW E-VALUE.
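If you want to see why a high score and a low E-Value travel together, here is a rough numerical sketch. It uses the textbook relationships: the bit score is S' = (λS − ln K) / ln 2, and the E-Value is about m × n × 2^(−S'), where m is the query length and n is the total length of the database. The λ and K values below are the ones commonly quoted for BLOSUM62 with default gap penalties, and I’m assuming them here. Real BLAST also applies length corrections that this sketch ignores, and the database size in the usage lines is a made-up round number.

```python
import math

# Assumed statistical parameters (commonly quoted for gapped BLOSUM62 searches).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Convert a raw alignment score into a normalized bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expected_chance_hits(bits, query_length, database_length):
    """Approximate E-Value: chance hits expected at this bit score or better."""
    return query_length * database_length * 2 ** (-bits)


# The example protein has 391 residues; the database size is hypothetical.
for bits in (40, 80, 200):
    print(bits, expected_chance_hits(bits, 391, 1_000_000_000))
```

Every extra bit of score cuts the expected number of chance hits in half, which is why the red results at the top of the graph come with such tiny E-Values.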

I really hope that this helps somebody out there.

Best of Luck,

Grey
