prss

Name prss
Description

prss is part of the FASTA3 package. The package contains many programs for searching DNA and protein databases, plus one program (prss3) for evaluating the statistical significance of a similarity score using randomly shuffled sequences.

prss evaluates the significance of a protein:protein or DNA:DNA sequence similarity score. It first compares the two sequences and calculates optimal similarity scores, then repeatedly shuffles the second sequence and calculates an optimal similarity score for each shuffle. Scores are calculated with the Smith-Waterman algorithm. An extreme value distribution is fit to the shuffled-sequence scores, and its characteristic parameters are used to estimate the probability that each of the unshuffled sequence scores would be obtained by chance in one sequence, or in a number of sequences equal to the number of shuffles. prfx is a related program for evaluating translated-DNA:protein sequence similarity scores.
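
The shuffle-and-fit procedure described above can be sketched in a few lines of Python. This is an illustration only, not the prss implementation: the function names, the simple match/mismatch/linear-gap scoring, and the method-of-moments fit of the extreme value (Gumbel) distribution are all assumptions made for the example. The real program scores with substitution matrices and may fit the distribution differently.

```python
import math
import random
import statistics


def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score (hypothetical toy scoring, linear gaps)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            diag = prev[j - 1] + (match if ca == cb else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def shuffle_significance(seq1, seq2, shuffles=200, seed=0):
    """Estimate how often the unshuffled score would arise by chance.

    Returns (observed score, probability in one sequence,
    expected count in `shuffles` sequences).
    """
    rng = random.Random(seed)
    observed = smith_waterman_score(seq1, seq2)

    # Score seq1 against repeatedly shuffled copies of seq2.
    residues = list(seq2)
    shuffled_scores = []
    for _ in range(shuffles):
        rng.shuffle(residues)
        shuffled_scores.append(smith_waterman_score(seq1, "".join(residues)))

    mean = statistics.mean(shuffled_scores)
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores do not vary; cannot fit a distribution")

    # Assumption: method-of-moments Gumbel fit (location mu, scale beta).
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - 0.5772156649 * beta  # Euler-Mascheroni constant

    # P(score >= observed) = 1 - exp(-exp(-z)). Clamp z to avoid overflow;
    # below -30 the probability is already 1.0 to machine precision.
    z = max((observed - mu) / beta, -30.0)
    p_one = -math.expm1(-math.exp(-z))

    # Expected number of such scores among `shuffles` random sequences.
    expect = shuffles * p_one
    return observed, p_one, expect
```

Calling `shuffle_significance(seq1, seq2)` with two sequence strings returns the unshuffled score together with the two chance estimates. When the expected count is small, it approximates the probability of seeing such a score at least once in that many sequences.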

References:
Pearson WR. Flexible sequence similarity searching with the FASTA3 program package. Methods Mol Biol. 2000;132:185-219.

Pearson WR. Empirical statistical estimates for sequence similarity searches. J Mol Biol. 1998 Feb 13;276(1):71-84.

Pearson WR, Wood T, Zhang Z, Miller W. Comparison of DNA sequences with protein sequences. Genomics. 1997 Nov 15;46(1):24-36.


Homepage http://www.people.virginia.edu/~wrp/pearson.html  
Remote Documentation http://www.people.virginia.edu/~wrp/papers/ismb2000.pdf