Loading and Organizing Scaffold Data

The following article contains a list of frequently asked questions about loading data into Scaffold.

Loading Data into Scaffold

How do I load data into Scaffold?
There are two ways to load your data into Scaffold. The first is to load all the intended data using the Load Data wizard. The wizard starts automatically when you create a new experiment. It first asks you for the biosample name, sample category, and a sample description. The naming convention is deliberately flexible so that it can be adapted to the way in which you want to view your data. For example, if you're looking for biomarkers or differential expression between samples, your biosample names might be "Control", "Diseased", and "Treated". Follow the directions of the wizard to complete the data loading process.

The second way to load data is to run the data loading wizard from the Load Data view page. From this page, you can add multiple biosamples and load separate data files accordingly.

How many files or how much data can I load?
Because file types, data types, and detection techniques vary so widely, it is difficult to quantify the file size or number of spectra that Scaffold can load. As a general rule, however, the more RAM the computer running Scaffold has, the larger the files and the more spectra you can typically load into the program.

Can I merge two saved SF3 files together?
Yes, to do so, open one of the saved SF3 files and then use Scaffold's Merge feature (available under File>Merge) to merge that file with one of your other saved SF3 files.

Can I open SFD files with Scaffold 4 and vice versa?
Newer versions of Scaffold can open files saved by older versions, but not the other way around. In other words, you can open an SFD file with Scaffold 3 and Scaffold 4, but you cannot open an SF3 file with Scaffold 2 (Scaffold 2 is deprecated and no longer supported).

What kind of data files can I load into Scaffold?
Scaffold has the ability to load output files from various open source and commercial search engines. For complete details and the most up-to-date reference, please see our file compatibility matrix.

How do I get my Mascot DAT files from my Mascot Server?
The Scaffold User's Guide contains instructions for connecting Scaffold directly to your in-house Mascot server, along with details on how to find the FASTA database used in the Mascot search so that the same FASTA database can be used in Scaffold.

Additionally, Mascot files stored locally can be loaded into Scaffold using the Load Data wizard. 

Can I load search results from multiple search engines in Scaffold?
Yes, this is one of the main features of Scaffold. If you have search results from multiple search engines, simply load them into Scaffold in the Load Data wizard. All of these results must have been searched against the same database in order to be loaded into the same experiment.

How can I maximize the amount of data loaded into Scaffold?
It is possible to load very large data sets in portions. Load one portion, save the resulting Scaffold SF3 file, then save that file again without any spectra using Scaffold's Save Condensed feature. Repeat for as many SF3 files as it takes to load all of the data successfully. Then open one of the saved SF3 files and use Scaffold's Merge file option (available under File>Merge) to merge that file with one of your other files saved without spectra. Doing this can allow you to get huge amounts of data into Scaffold. We have seen up to roughly 24 GB of data loaded into a single Scaffold SF3 file using this technique. The reason you save each Scaffold file twice, once with spectra and once without, is that you may need to go back to the spectra to verify peptide assignments.

Why do I see different proteins in Scaffold and Mascot?
Scaffold and Mascot calculate protein probabilities in different ways, which can lead to differences in protein predictions between the two programs. As a general rule, you can have much higher confidence that a protein is present in your sample if it was identified by both programs than if it was identified by only one.

Using X! Tandem

What is X! Tandem?
X! Tandem is a fast, open source search engine available from The Global Proteome Machine Organization that automatically searches for missed cleavages, semi-tryptic peptides, post-translational modifications, and point mutations. X! Tandem comes packaged with Scaffold and improves the number and confidence of protein identifications by combining X! Tandem's results with the original search engine results. 

How do I obtain X! Tandem XML files as input files?
The input files Scaffold accepts from X! Tandem are XML files. If you ran X! Tandem on the internet, you will need to locate the X! Tandem XML output files, because Scaffold will not accept the HTML files displayed on the web. If you are viewing your X! Tandem results in your browser, the location of the XML file is usually displayed in the browser's address bar.

Why would I want to search with X! Tandem?
Different MS/MS analysis programs use different protein prediction algorithms. When data is run through several of these programs, each using a different algorithm to predict the presence of proteins, you can have greater confidence in the results.

[Figure: Using Multiple Search Engines Increases Confidence in Protein Identifications]


Searching your data with X! Tandem is useful for verifying the presence of the proteins your primary search engine identified. If a protein is identified by Mascot or SEQUEST but not by X! Tandem, you may want to look at the data for that protein to ensure it is truly present. Searching with X! Tandem is also useful for identifying additional modifications that the primary search engine may have missed. Keep in mind that running X! Tandem makes the analysis take appreciably longer. Two options affect the speed of the X! Tandem analysis: searching a subset database and searching for amino acid modifications. Choosing to search a subset database decreases X! Tandem's search time, while each additional modification you choose to search for increases it. Note: X! Tandem is automatically set to search for semi-tryptic peptides.
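The cross-check described above can be sketched in a few lines of Python. This is only an illustration: the protein lists are made up for the example, and Scaffold performs its own comparison internally rather than exposing anything like this.

```python
# Illustrative protein lists (not real search results).
primary = {"ALBU_HUMAN", "TRFE_HUMAN", "HBB_HUMAN"}   # from Mascot or SEQUEST
tandem = {"ALBU_HUMAN", "HBB_HUMAN", "APOA1_HUMAN"}   # from X! Tandem

# Proteins found by both engines carry the most confidence.
confirmed = primary & tandem

# Proteins found only by the primary engine deserve a closer look.
review = primary - tandem

print("Confirmed by both:", sorted(confirmed))
print("Review manually:", sorted(review))
```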

Can I run X! Tandem after the data has been loaded?
No, X! Tandem must be run as the data is being loaded into Scaffold.

What is a subset database? Why should I use it?
When using Scaffold, you can choose to run X! Tandem on a subset database. A subset database is a database consisting of only those proteins identified by the first search engine. If you choose to run an X! Tandem analysis on a subset database, your analysis will run faster than if you run it on the whole database, but you will not identify any new proteins. A subset database is useful for increasing your protein coverage by identifying new peptides. It does this in two ways. First, the subset database dramatically decreases the amount of time it takes to run your analysis. Second, because it takes less time to search a subset database, it is possible to add additional amino acid modifications to your search parameters, thereby increasing the number of peptide identifications your search will make.

Note: Scaffold requires a minimum of 100 proteins in a subset database. If there are fewer than 100, Scaffold randomly pulls proteins out of the full database until the subset contains at least 100. Because of these additional proteins, it is possible that X! Tandem could identify a protein not found in your original search. Treat any such new identifications as suspect and examine them carefully.
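The padding rule in the note above can be illustrated with a minimal Python sketch. The function name, the protein identifiers, and the sampling method are assumptions made for the example. The text only states that proteins are pulled randomly until the subset holds at least 100, so this is not Scaffold's actual implementation.

```python
import random

MIN_SUBSET_SIZE = 100  # minimum subset size stated in the note above


def build_subset(identified, full_database, min_size=MIN_SUBSET_SIZE, seed=None):
    """Hypothetical helper: return the identified proteins, padded with
    randomly chosen proteins from the full database up to min_size."""
    rng = random.Random(seed)
    subset = list(dict.fromkeys(identified))  # drop duplicates, keep order
    if len(subset) >= min_size:
        return subset
    chosen = set(subset)
    candidates = [p for p in full_database if p not in chosen]
    needed = min(min_size - len(subset), len(candidates))
    return subset + rng.sample(candidates, needed)


# Made-up database of 1000 proteins, 40 of them identified by the first engine.
full_db = [f"PROT{i:04d}" for i in range(1000)]
identified = full_db[:40]

subset = build_subset(identified, full_db, seed=1)
identified_set = set(identified)
padded = [p for p in subset if p not in identified_set]
print(len(subset), len(padded))  # 100 60
```

In this example, the 60 padded proteins are the only route by which X! Tandem could report a protein the original search did not find, which is why such identifications should be treated as suspect.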

How do I add modifications to the X! Tandem search?
When you reach Scaffold's Load and Analyze Data screen, you will be given the choice of whether or not to run X! Tandem. If you choose to run X! Tandem, you can edit the modifications by clicking on the modification you wish to edit and adding it to the selected variable modifications. Note: X! Tandem automatically searches for pyroglutamic acid as a variable modification.

What if the X! Tandem modification I need to add is not on the list?
If the modification you wish to consider is not on the drop-down list, you can create a new modification. To do so, click on the New button. In the pop-up box you can add the name of your custom modification, the change in mass, and the amino acid modified.

Using MudPIT

What does MudPIT mean?
MudPIT (Multidimensional Protein Identification Technology) is a way of analyzing spectra coming from two or more data sets, each derived from the same biological sample. A MudPIT analysis treats all of the data as having come from one biological sample. If you analyze your data as a MudPIT, you will not be able to view peptide or protein analysis from individual MS samples.

When would I check the MudPIT checkbox?
You would want to check the MudPIT checkbox when your experimental parameters separate peptides, not proteins. A MudPIT analysis (also known as "Shotgun Proteomics") lumps all MS samples into one biosample, and reports the results for the entire biosample.

A MudPIT analysis differs from a Biosample analysis in that a peptide from any MS sample in the MudPIT can be combined with a peptide from any other MS sample in the MudPIT to form a protein. Also, when viewing a MudPIT sample in Scaffold's Samples view you cannot view individual MS samples.

For example, if you collected two blood samples, digested the proteins in each sample with trypsin, and ran the digested proteins through LC-MS/MS, you would want to check the MudPIT analysis checkbox.
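The difference between a MudPIT analysis and a per-sample analysis can be sketched as a grouping choice. The following Python snippet is purely illustrative: the sample names, protein names, and peptide sequences are invented, and it does not reflect how Scaffold stores or scores identifications.

```python
from collections import defaultdict

# Hypothetical peptide identifications: (MS sample, protein, peptide).
identifications = [
    ("fraction_1", "P1", "PEPTIDEA"),
    ("fraction_2", "P1", "PEPTIDEB"),
    ("fraction_2", "P2", "PEPTIDEC"),
]


def peptides_per_protein(ids, pooled):
    """Group distinct peptides by protein, either per MS sample or pooled."""
    groups = defaultdict(set)
    for ms_sample, protein, peptide in ids:
        key = protein if pooled else (ms_sample, protein)
        groups[key].add(peptide)
    return dict(groups)


separate = peptides_per_protein(identifications, pooled=False)
mudpit = peptides_per_protein(identifications, pooled=True)

# Analyzed separately, P1 has one peptide in each fraction.
# Pooled as a MudPIT, both peptides support the same protein.
print(len(separate[("fraction_1", "P1")]), len(mudpit["P1"]))  # 1 2
```

The pooled grouping also shows why individual MS samples cannot be viewed afterwards: once peptides are combined under the protein alone, the sample they came from is no longer part of the key.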

How can I see the separate fractions in my MudPIT experiment?
You cannot view individual file samples in a MudPIT analysis. The only way to view these files is to load them into Scaffold and choose not to analyze them as a MudPIT experiment.

Organizing Data in Scaffold

Where is the list of all my loaded files?
You can view all of your previously loaded files in the Load Data view. If you have multiple biosamples in your analysis you can click on the biosample tabs to select a particular sample of interest and see what data has been loaded.

What are biosamples and categories?
A biosample is a name given to the overall grouping of spectra coming from the same biological sample. The definition of a biological sample depends on your individual experiment parameters.

A category is a name given to the group in which you place your biosample (for example, treated and untreated). If you use the same category name for two different biosamples, then the results for the biosamples will be displayed next to each other in Scaffold’s browser. Again, the definition of a category depends on your individual experiment parameters.

How do I compare multiple groups of samples side by side?
Scaffold can display samples within an organizational structure, and samples with the same category name will be displayed side by side. Biosamples can contain multiple technical replicates and can be grouped within categories. For example, you might have three categories as follows: drug 1, drug 2, and control. Each category can then have multiple biological samples (big mouse, little mouse, black mouse, gray mouse, etc.), all with technical replicates, or runs on an instrument. Statistical tests such as an ANOVA can be applied to give users information about their data.

How can I compare numbered spots from different gels?
You can place the MS/MS spectra files for each gel into a separate biosample. In this way you can compare spots with the same number from different gels without confusion.

Can I re-organize the data after I load it?
Yes, it is possible to remove files from a biosample, or queue files for loading in a biosample, from the Load Data view. To remove a sample, right-click on the file you wish to remove from the previously loaded data and choose Remove Selected Samples. To add files to a biosample, click Queue Files For Loading in the upper right of the window and add the files.

How do I change a sample's description?
Go to the Load Data view and right-click on the biological sample name in the upper portion of the window. Choose Edit Biological Sample from the drop-down options. In the window that pops up, you can change the sample description by typing a new description and clicking the Apply button.

Can I move the sample columns in the Samples view?
Currently it is not possible to move samples by dragging and dropping in the Samples view. Scaffold arranges samples alphabetically by their category name. So, if you want to see two samples side by side in the Samples view, you must give them category names that are adjacent in alphabetical order.

You will have to reload the data into separate biosamples using new category names so that Scaffold will properly display the samples in the Samples view. If you simply change the category name in already loaded data, Scaffold will not change the way its Samples view is displayed, because Scaffold arranges the order only once, during the initial analysis.
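As a rough illustration of planning category names before loading, the Python sketch below sorts a few hypothetical names alphabetically. The prefixing scheme is only one possible convention, and Python's default string ordering is an assumption here. Scaffold's exact sorting rules (for example, how it treats case or digits) are not described in this article.

```python
# Category names as a user might enter them (hypothetical).
categories = ["Treated", "Control", "Diseased"]
display_order = sorted(categories)
print(display_order)  # ['Control', 'Diseased', 'Treated']

# Hypothetical prefixes chosen so alphabetical order matches the desired order.
prefixed = ["B Treated", "A Control", "C Diseased"]
prefixed_order = sorted(prefixed)
print(prefixed_order)  # ['A Control', 'B Treated', 'C Diseased']
```

With the plain names, "Control" and "Treated" end up separated by "Diseased". With the prefixed names, they sit next to each other.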

