Department of Microbiology and Immunology, University of Oklahoma Health Sciences Center, PO Box 26901, BMSB 1053, Oklahoma City, OK 73190, USA, 1 Department of Chemistry and Biochemistry, University of Oklahoma, Norman, OK 73019, USA and 2 Department of Microbiology and Immunology, Tulane University Medical Center, New Orleans, LA 70112, USA
*To whom correspondence should be addressed. Tel: +1 405 271 6622; Fax: +1 405 271 3117; Email: juneann-murphy@ouhsc.edu
FELINES -- Finding and Examining Lots of Intron 'N' Exon Structures
This package provides a set of Perl scripts that extend the productivity of widely available programs, enabling more complex analysis of genomic data. It is especially useful for creating intron and exon databases from EST-to-genomic sequence alignments and for identifying conserved motifs in DNA/RNA sequences. The package is divided into three layers:
In the first layer, each EST sequence is paired with its homologous genomic sequence and aligned to it.
In the second layer, the intron and exon regions are extracted from these alignments, filtered, and added to their respective datasets.
In the analysis layer, the sequences are searched for consensus sequence elements.
The FELINES programs are located at ftp://ftp.genome.ou.edu/pub/programs/.
The authors cannot control changes that the authors of the programs below make to those programs, and some such changes may break FELINES. The FELINES programs are most easily used when placed in a directory that is part of the path (e.g., /usr/local/bin/).
The package has been developed using the programs below. If any of these programs are not currently available locally, they will need to be downloaded and installed in a directory on the path or in their proper places.
All testing was done on a Linux PC system; FELINES should work on any similar system.
1. Create a subdirectory.
2. Run all2many on the EST and genomic sequence FASTA files.
3. Make the genomic and EST list files.
4. Customize the options file.
5. Run wiscrs.pl.
6. Run gumbie.pl.
7. Run icat.pl, findnmers.pl, or cattracts.pl.
To use FELINES, first create a subdirectory below the directory in which FELINES is to be used. Place two FASTA files in this subdirectory -- one containing the genomic sequences and one containing the EST sequences. Check the sequence names in both of these files. If they exceed about 20 characters, the names will be truncated by Spidey and the FELINES utility will choke. This is also a good time to change any X's in either file into N's, because Spidey doesn't accept X's. You may want to remove N's as well if there are a substantial number of them. While you are at it, if the EST and genomic sequence names are similar, rename one group. The computer will have no problem keeping them separate (provided they have at least slightly different names), but it may test your sanity.
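The name and X checks described above can be sketched in Python as follows. This is a minimal illustration, not part of FELINES: the function name `clean_fasta` is hypothetical, the limit of 20 is the approximate figure given above rather than a documented Spidey constant, and the sequence name is assumed to be the first whitespace-delimited token after the `>`.

```python
MAX_NAME_LEN = 20  # approximate limit from the text above; an assumption


def clean_fasta(lines, max_name_len=MAX_NAME_LEN):
    """Return (cleaned_lines, long_names) for an iterable of FASTA lines.

    Header lines are kept unchanged; names longer than max_name_len are
    reported so they can be shortened by hand. In sequence lines every
    X (or x) is replaced with N (or n).
    """
    cleaned = []
    long_names = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # Assumed convention: the name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            if len(name) > max_name_len:
                long_names.append(name)
            cleaned.append(line)
        else:
            cleaned.append(line.replace("X", "N").replace("x", "n"))
    return cleaned, long_names
```

Calling `clean_fasta(open("est.fasta"))` (the file name is a placeholder) returns the cleaned lines and a list of names that are likely to be truncated.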
Next, the user will need to create a separate file for each genomic and EST sequence. The easiest way to do this is with all2many (James D. White & Bruce Roe, unpublished, http://genome.ou.edu/informatics.html). The subdirectory now contains all of the EST and genomic sequences in separate files.
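If all2many is not at hand, the splitting step can be approximated with a short Python sketch. This is not all2many and makes no claim to match its output: the function name `split_fasta` is hypothetical, and naming each output file after the first token of the header (with unusual characters replaced by underscores) is an assumption. Records with identical names would overwrite one another.

```python
import os
import re


def split_fasta(fasta_path, out_dir):
    """Write each record of a multi-FASTA file to its own file in out_dir.

    Returns the list of paths written, in input order.
    """
    os.makedirs(out_dir, exist_ok=True)
    written = []
    out = None
    try:
        with open(fasta_path) as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous output file.
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    name = fields[0] if fields else "unnamed"
                    # Assumed naming rule: keep only filename-safe characters.
                    safe = re.sub(r"[^A-Za-z0-9._-]", "_", name)
                    path = os.path.join(out_dir, safe)
                    out = open(path, "w")
                    written.append(path)
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```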
While still in the subdirectory, format the genomic sequence database for BLAST using formatdb. It is important that the file formatted for BLAST is identical to the file split apart using all2many. This is because FELINES depends on the data being consistent (Contig5 is always Contig5).
Back in the main directory, create two files. The first will contain the file names of all of the EST sequence files. The second will contain the file names of all of the genomic sequence files. Each name occupies one line of the file and gives the relative location of the sequence file. Also, place a copy of the options file at this directory level and edit it to reflect these file names and locations. While in the options file, adjust the other parameters to suit your needs.
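The two list files can be written by hand or generated. The Python sketch below is one possible way, under stated assumptions: the function name `write_list_file` is hypothetical, and it assumes the EST and genomic files can be told apart by a glob pattern, which depends entirely on how your sequences are named.

```python
import glob
import os


def write_list_file(pattern, list_path):
    """Write the files matching `pattern` to `list_path`, one per line.

    Paths are written relative to the directory that holds the list file.
    Returns the number of names written.
    """
    base = os.path.dirname(os.path.abspath(list_path))
    paths = sorted(p for p in glob.glob(pattern) if os.path.isfile(p))
    with open(list_path, "w") as out:
        for p in paths:
            out.write(os.path.relpath(p, base) + "\n")
    return len(paths)
```

For example, run from the main directory, `write_list_file("seqs/est_*", "est.list")` would list every file in a subdirectory named `seqs` whose name starts with `est_`; both names are placeholders.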
There are now three files in the main directory to be used and many files in the subdirectory. At this point you can run wiscrs.pl. All you have to do is enter the -O flag at the command line with the name of the options file you are using.
Now you can run gumbie.pl. At the command line give gumbie.pl the name of the alignment files (with the -S flag), the name of the options file (with the -O flag) and any other flags that you want to use. Be sure not to mix these up. Gumbie.pl may be a "smart cat" but it can't read your mind. This sometimes catches the authors.
Okay, now you have run wiscrs.pl and gumbie.pl. Now it's time to do the real analysis. Play around with the other programs. Help is available if you try to run the program without any flags. This is true for all of the FELINES components. If you get really stuck, contact the authors (Scott Drabenstott).
Bruce Roe, broe@ou.edu