Calculations in Scaffold

The following article contains a list of frequently asked questions on a variety of topics relating to calculations used in Scaffold. For specific questions not covered in our documentation we are available by telephone Monday through Friday from 8 AM to 5 PM PST. Our toll free number is 1-800-944-6027. Additionally, feel free to send an email to support@proteomesoftware.com.

Label-Free Quantitation

What is label-free quantitation? Why would I use it?
In MS/MS proteomics, label-free quantitation is a method of looking for differentially expressed proteins without the use of a stable isotope label or tag. While not as robust as iTRAQ, TMT, or SILAC, label-free quantitation (particularly precursor ion intensity) provides quantitation without the need for additional reagents or experimental procedures.

What is precursor ion intensity?
Spectral counting counts the number of spectra identified for a given peptide, while precursor intensity refers to the area under the MS1 peak corresponding to a specific peptide. Please see the following document for more detailed information on precursor ion intensity quantitation: https://www.dropbox.com/s/e76i2cabzkznfb2/precursor_intensity_quantitation_scaffold_qplus.pdf?dl=1

What is the total ion current (TIC)? How does Scaffold calculate the TIC?
The TIC is the sum of the areas under all the peaks contained in an MS/MS spectrum. Scaffold assumes that the area under a peak is proportional to the peak's height, and approximates the TIC by summing the intensities of the peaks in the peak list associated with an MS/MS sample.
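The approximation above can be sketched in a few lines of Python (the peak-list format and function name are illustrative assumptions, not Scaffold's actual API):

```python
def approximate_tic(peak_list):
    """Approximate the total ion current (TIC) of an MS/MS spectrum.

    Following the assumption that the area under a peak is proportional
    to its height, the TIC is approximated by summing the peak
    intensities. peak_list is a sequence of (m/z, intensity) pairs.
    """
    return sum(intensity for _, intensity in peak_list)

# Hypothetical peak list for a single MS/MS spectrum
peaks = [(175.12, 1200.0), (276.20, 800.0), (389.25, 2500.0)]
tic = approximate_tic(peaks)  # 4500.0
```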

How do I select the quantitation method?
You can choose which quantitation method Scaffold will use in the Quantitative Analysis dialog box, which can be opened from the Experiment > Quantitative Analysis menu option or by clicking the bar graph icon next to the "Min Protein" drop-down on the toolbar.

There are eleven Quantitative Methods available for statistical analysis in Scaffold:

Total Spectra
Weighted Spectra
Average TIC
Total TIC
Top Three TIC
Average Precursor Intensity
Total Precursor Intensity
Top Three Precursor Intensities
emPAI
NSAF
iBAQ
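As an illustration of one of these methods, NSAF (Normalized Spectral Abundance Factor) is commonly defined in the proteomics literature as a protein's spectral count divided by its length, normalized so that the values sum to one across all proteins. The following is a sketch of that standard formula, not necessarily Scaffold's exact implementation:

```python
def nsaf(spectral_counts, lengths):
    """Normalized Spectral Abundance Factor for each protein.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j), where SpC is the
    spectral count and L the protein length in amino acids.
    """
    saf = [c / l for c, l in zip(spectral_counts, lengths)]
    total = sum(saf)
    return [s / total for s in saf]

# Hypothetical spectral counts and protein lengths
values = nsaf([10, 40, 50], [100, 200, 500])
# Each value is that protein's share of the length-normalized counts
```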

Other Calculations

What statistical tests does Scaffold use to validate data?
There are several statistical tests you can apply to your data in Scaffold: fold change, coefficient of variation (CV), T-Test, ANOVA, and Fisher's Exact Test. For two biosamples, you can apply the fold change, which simply calculates the fold-change difference in spectrum counts. If you have three biosamples, you can apply the coefficient of variation, which is expressed as a percentage. Neither of these tests involves Scaffold's categories.
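These two calculations can be sketched as follows (a minimal illustration with made-up counts; whether Scaffold uses the sample or population standard deviation for the CV is an assumption here):

```python
from statistics import mean, stdev

def fold_change(counts_a, counts_b):
    """Fold-change difference in spectrum counts between two biosamples."""
    return counts_b / counts_a

def cv_percent(counts):
    """Coefficient of variation across biosamples, as a percentage.

    Uses the sample standard deviation; this is an assumption of
    this sketch, not a documented Scaffold detail.
    """
    return 100.0 * stdev(counts) / mean(counts)

fc = fold_change(10, 40)       # 4.0
cv = cv_percent([10, 12, 14])  # roughly 16.7%
```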

When you have at least two categories, each containing at least one biosample, with at least one category containing two or more biosamples, you can apply the T-Test. If you have at least three categories, at least one of which contains two or more biosamples, you can apply the most robust test Scaffold offers, the Analysis of Variance (ANOVA) test. Finally, the newest test Scaffold offers, introduced in Scaffold 3 in 2010, is Fisher's Exact Test, which is similar to the T-Test but handles the data slightly differently.
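The tests themselves are standard. As a hedged illustration (using SciPy rather than Scaffold's internal implementation, with made-up spectral counts), the choice of test maps onto the data shape like this:

```python
from scipy import stats

# Hypothetical spectral counts per biosample, grouped by category
control   = [12, 15, 11]
treated_a = [30, 28, 35]
treated_b = [22, 25, 21]

# T-Test: two categories (here, control vs. treated_a)
t_stat, t_p = stats.ttest_ind(control, treated_a)

# ANOVA: three or more categories
f_stat, f_p = stats.f_oneway(control, treated_a, treated_b)

# Fisher's Exact Test on a 2x2 contingency table of summed counts,
# e.g. [spectra for this protein, all other spectra] per category
odds, fisher_p = stats.fisher_exact([[38, 962], [93, 907]])
```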

What is the false discovery rate (FDR) and how is it calculated?
Scaffold calculates the False Discovery Rate (FDR) in two different ways, depending on whether the data have been searched against a decoy database. The method Scaffold used for calculating the FDR can be determined from the color of the box in the lower left corner of the Scaffold window. If the search was not performed against a decoy database, Scaffold uses a probabilistic method and displays a green box. If a decoy database was used, Scaffold uses an empirical method and displays a red box.

Probabilistic Method: This is the approach used by the Trans-Proteomic Pipeline and the ProteinProphet algorithm. In this method, Scaffold calculates its protein FDR by using the assigned protein probabilities. Consider the following situation:
Four proteins are identified: A, B, C, and D.

Protein A has 100% probability in Scaffold
Protein B has 100% probability in Scaffold
Protein C has 80% probability in Scaffold
Protein D has 50% probability in Scaffold

To calculate the FDR, Scaffold sums the probabilities of A, B, C, and D and divides by the sum of the maximum possible probabilities (100% for each protein).

(1.00 + 1.00 + 0.80 + 0.50) / (1.00 + 1.00 + 1.00 + 1.00) = 0.825

FDR = (1.00 - 0.825) * 100 = 17.5%

In the above scenario Scaffold would calculate an FDR of 17.5%.
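The same calculation in code (a direct transcription of the example above; the function name is illustrative):

```python
def probabilistic_fdr(probabilities):
    """Protein FDR (percent) from assigned protein probabilities.

    Sums the assigned probabilities, divides by the maximum possible
    sum (1.0 per protein), and converts the remainder to a percent.
    """
    n = len(probabilities)
    return 100.0 * (1.0 - sum(probabilities) / n)

fdr = probabilistic_fdr([1.00, 1.00, 0.80, 0.50])  # 17.5
```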

Calculating a false discovery rate (FDR) empirically in Scaffold is also straightforward (method of Kall et al., 2008). Count the total number of reverse (decoy) hits, either directly in Scaffold or by exporting a Proteins report and summing the reverse or randomized protein hits, and divide by the total number of hits to get an FDR. Reverse or randomized database entries have a "-r" appended to their accession numbers. Some people double the number of decoy hits when calculating their FDR, reasoning that there are at least as many wrong hits to the target database as there are decoy hits.

For example:
256 accession numbers are in the proteins report. Of these, 14 have a "-r" appended to their accession number indicating a false hit.

14/256 = 5.5%, but if there were 14 hits with a "-r", then probably at least 14 of the hits to regular accession numbers are also wrong. So a more accurate FDR would be 14*2/256 = 11%.
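The count can be sketched like this (the accession list is made up; only the "-r" decoy suffix comes from the text above):

```python
def empirical_fdr(accessions, double_decoys=False):
    """Empirical FDR (percent) from a list of protein accession numbers.

    Decoy (reverse or randomized) entries carry a "-r" suffix.
    Setting double_decoys=True doubles the decoy count, on the
    reasoning that the target database yields at least as many
    wrong hits as the decoy database.
    """
    decoys = sum(1 for acc in accessions if acc.endswith("-r"))
    factor = 2 if double_decoys else 1
    return 100.0 * factor * decoys / len(accessions)

accessions = ["P12345", "Q67890-r", "P11111", "P22222"]
fdr = empirical_fdr(accessions)  # 1 decoy out of 4 -> 25.0
```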

However, the accepted FDR calculation is the one proposed by Kall et al. (2008), in which the form Decoy/Target is preferred: Assigning Significance to Peptides Identified by Tandem Mass Spectrometry Using Decoy Databases.

What does it mean to have a False Discovery Rate?
Scaffold requires a protein to contain at least one peptide with a probability of at least 50% before counting it as a valid identification. This seems to help substantially with both memory usage and the ProteinProphet fitting (not shown). In this case, a single 50% peptide ID could correspond to a low protein probability (as shown by the Protein Probability Calculation chart).

Incidentally, this fixed "internal" filter is the reason why we limit lowering the protein probability to the 20% level. Usually we don't see this much of an effect from the 50% threshold because most data sets contain some level of smear between the correct and incorrect distributions. A dramatic separation between good IDs and bad is generally representative of a high-quality data set.

What is a Discriminant Score?
A discriminant score is a single value representing each spectrum's score within a search engine. For example, Mascot scores each peptide with an ion score and an identity score; these scores are combined into a discriminant score. Scaffold uses PeptideProphet to assign all peptides a discriminant score so they can be compared to one another. These scores are then plotted as a distribution.

How is the Discriminant Score distribution interpreted?
The discriminant score distribution can be seen in Scaffold and is fitted by PeptideProphet with correct and incorrect curves. For each set of filter settings in Scaffold, you can easily see the probability that a peptide with a given discriminant score is correct or incorrect.

Where do Scaffold's peptide probabilities come from?
The peptide probabilities are generated using the PeptideProphet algorithm. PeptideProphet converts the output from a standard search engine (SEQUEST, Mascot, etc.) into a discriminant score. The data from the search are mapped onto a histogram of discriminant scores, and Bayesian statistics are used to determine the probability that a match at each discriminant score is correct.
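The Bayesian step can be sketched as a two-component mixture. This is a simplification: both distributions are modeled as Gaussians here, whereas PeptideProphet fits its own distribution forms, and all parameters below are made up:

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density used here to model a score distribution."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def peptide_probability(score, prior_correct, mu_c, sd_c, mu_i, sd_i):
    """Posterior probability that a match with this discriminant score
    is correct, given fitted correct/incorrect distributions and the
    overall proportion of correct matches (the prior)."""
    f_correct = normal_pdf(score, mu_c, sd_c)
    f_incorrect = normal_pdf(score, mu_i, sd_i)
    num = prior_correct * f_correct
    return num / (num + (1.0 - prior_correct) * f_incorrect)

# Hypothetical fit: correct ~ N(4, 1), incorrect ~ N(0, 1),
# with 30% of all matches correct overall
p = peptide_probability(3.0, 0.3, 4.0, 1.0, 0.0, 1.0)  # close to 0.96
```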

For more about Scaffold’s peptide prediction software, the PeptideProphet publication can be found here: PeptideProphet

Reference:
Keller A, Nesvizhskii AI, Kolker E, Aebersold R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal Chem. 2002;74(20):5383-92.

How does Scaffold calculate peptide mass?
Each peptide has four mass values associated with it in Scaffold's Spectrum Report. These are Observed m/z, Actual peptide mass (AMU), Calculated peptide mass (AMU), and Actual minus calculated peptide mass (AMU). Although it is not directly a mass, the Spectrum charge is a very important part of Scaffold's mass calculations.

Observed m/z = the mass-to-charge ratio reported by the mass spectrometer

Actual peptide mass (AMU) = (Observed m/z)*Spectrum charge – Spectrum charge*(1 proton). This is the actual mass of the peptide that produced the observed spectrum.

Calculated peptide mass (AMU) = Sum(amino acid weights) + 1 proton. This is the summation of amino acid weights for the peptide, plus one proton weight. Scaffold is really reporting the Calculated mass for the ionized peptide.

Actual minus calculated peptide mass (AMU) = (Actual peptide mass) – (Calculated peptide mass – 1 proton). It is important to remember that 1 proton mass (in AMU) is subtracted from what Scaffold calls the Calculated peptide mass before being subtracted from the Actual peptide mass. This has caused users some confusion in the past.
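The relationships above can be written directly as code (using 1.0078 AMU for the proton mass, the value implied by the worked example below):

```python
PROTON = 1.0078  # proton mass in AMU, as used in the worked example

def actual_peptide_mass(observed_mz, charge):
    """Actual peptide mass (AMU) = observed m/z * z - z * proton."""
    return observed_mz * charge - charge * PROTON

def calculated_peptide_mass(residue_masses):
    """Calculated peptide mass (AMU) = sum of amino acid masses + 1 proton."""
    return sum(residue_masses) + PROTON

def actual_minus_calculated(actual, calculated):
    """One proton is first subtracted from the calculated mass."""
    return actual - (calculated - PROTON)

delta = actual_minus_calculated(3730.58, 3729.94)  # roughly 1.65
```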

Example calculation: The following is an example from a real Scaffold Spectrum Report

[Image: How_does_Scaffold_calculate_peptide_mass.png – a Spectrum Report entry with Observed m/z 1244.54, Spectrum charge 3, and Calculated peptide mass 3729.94]

The calculations are as follows:

Actual peptide mass
1244.54 * 3 – 3 * 1.0078 = 3730.58

Actual minus calculated peptide mass
3730.58 – (3729.94 – 1.0078) = 1.65

Scaffold uses floating-point arithmetic in its spectrum analysis, so the numbers you get when repeating these calculations by hand may differ slightly from those Scaffold reports.

What is the Protein Grouping Algorithm used in Scaffold and where is it shown?
Scaffold uses a protein grouping algorithm to reduce the number of proteins under consideration. Generally, this algorithm works by creating a table like that shown in the Similarity View of Scaffold. This table shows each protein to which the peptides could potentially be assigned, and in that protein’s column shows the probabilities of all of the peptides that match it. The sum of the probabilities is calculated for each protein. Only the “valid” peptides are considered.

Each peptide is then assigned to the protein with the highest total probability in whose column it appears. If two or more proteins have equal total probabilities and that is the highest for that peptide, it is assigned to all of them. Now the grouping begins. Proteins with no peptides assigned are eliminated from consideration, as all of the evidence for those proteins has already been accounted for in proteins which are more likely. Proteins with the same peptides assigned to them are combined into a group.

There is one further complication, however. If the only evidence for a group is a single protein with probability less than 95%, Scaffold disregards this group. This is based on a heuristic rule built into the algorithm which cuts down on the number of false protein matches displayed. Generally, this works well to eliminate false assignments. However, in certain rare instances, it can result in a protein that may actually be found in the sample being eliminated from consideration, and thus not seen in Scaffold’s other views. Unfortunately, changing the filter settings has no effect upon the grouping algorithm.
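A simplified sketch of this assignment-and-grouping procedure follows (the data structures are illustrative, and the 95% single-protein heuristic is omitted):

```python
def group_proteins(peptides):
    """Group proteins by assigned peptides.

    peptides: list of (peptide_id, probability, [matching proteins]).
    1) Sum the probabilities of all matching peptides per protein.
    2) Assign each peptide to the protein(s) with the highest total.
    3) Drop proteins with no assigned peptides, and merge proteins
       with identical assigned peptide sets into one group.
    """
    totals = {}
    for _, prob, prots in peptides:
        for p in prots:
            totals[p] = totals.get(p, 0.0) + prob

    assigned = {}
    for pep_id, _, prots in peptides:
        best = max(totals[p] for p in prots)
        for p in prots:
            if totals[p] == best:
                assigned.setdefault(p, set()).add(pep_id)

    groups = {}
    for prot, peps in assigned.items():
        groups.setdefault(frozenset(peps), []).append(prot)
    return [sorted(g) for g in groups.values()]

# Hypothetical peptide evidence: pep1 is shared between A and B
evidence = [
    ("pep1", 0.95, ["A", "B"]),
    ("pep2", 0.99, ["A"]),
    ("pep3", 0.90, ["B", "C"]),
]
groups = group_proteins(evidence)  # A keeps pep1/pep2; B keeps pep3; C is dropped
```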
