seqmatchall
The larger the specified word size, the faster the comparison will proceed. Regions whose stretches of identity are shorter than the word size will be missed. You should therefore choose a word size that is small enough to find those regions of similarity you are interested in within a reasonable time-frame.
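The trade-off can be illustrated with a small Python sketch. It is not the algorithm seqmatchall itself uses; `exact_word_hits` and the two toy sequences are hypothetical, introduced only to show that a stretch of identity shorter than the word size produces no hits at all.

```python
from typing import Dict, List, Tuple


def exact_word_hits(seq_a: str, seq_b: str, wordsize: int) -> List[Tuple[int, int]]:
    """Return 0-based (pos_a, pos_b) pairs where words of `wordsize` are identical."""
    if wordsize < 2:
        raise ValueError("wordsize must be 2 or more")
    # Index every word of seq_b by the positions at which it starts.
    index: Dict[str, List[int]] = {}
    for j in range(len(seq_b) - wordsize + 1):
        index.setdefault(seq_b[j:j + wordsize], []).append(j)
    # Look up every word of seq_a in that index.
    hits: List[Tuple[int, int]] = []
    for i in range(len(seq_a) - wordsize + 1):
        for j in index.get(seq_a[i:i + wordsize], []):
            hits.append((i, j))
    return hits


# Toy sequences sharing a single 6-base identical stretch, "GATTAC".
seq_a = "CCCCGATTACCCCC"
seq_b = "TTTTGATTACTTTT"

print(exact_word_hits(seq_a, seq_b, 4))  # three overlapping 4-base words are found
print(exact_word_hits(seq_a, seq_b, 7))  # word longer than the identity: nothing found
```

A larger word size leaves fewer words to index and far fewer chance hits to examine, which is why the comparison speeds up, but the 6-base identity above becomes invisible as soon as the word size exceeds 6.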
Here is an example using an increased word size to avoid accidental matches:
% seqmatchall
Does an all-against-all comparison of a set of sequences
Input sequence set: tembl:eclac*
Word size [4]: 15
Output file [eclac.seqmatchall]:
Mandatory qualifiers:
[-sequence] seqset Sequence set USA
-wordsize integer Word size
[-outfile] outfile Output file name
Optional qualifiers: (none)
Advanced qualifiers: (none)
Associated qualifiers:
"-sequence" related qualifiers
-sbegin1 integer First base used
-send1 integer Last base used, def=seq length
-sreverse1 boolean Reverse (if DNA)
-sask1 boolean Ask for begin/end/reverse
-snucleotide1 boolean Sequence is nucleotide
-sprotein1 boolean Sequence is protein
-slower1 boolean Make lower case
-supper1 boolean Make upper case
-sformat1 string Input sequence format
-sopenfile1 string Input filename
-sdbname1 string Database name
-sid1 string Entryname
-ufo1 string UFO features
-fformat1 string Features format
-fopenfile1 string Features file name
"-outfile" related qualifiers
-odirectory2 string Output directory
General qualifiers:
-auto boolean Turn off prompts
-stdout boolean Write standard output
-filter boolean Read standard input, write standard output
-options boolean Prompt for required and optional values
-debug boolean Write debug output to program.dbg
-acdlog boolean Write ACD processing log to program.acdlog
-acdpretty boolean Rewrite ACD file as program.acdpretty
-acdtable boolean Write HTML table of options
-verbose boolean Report some/full command line options
-help boolean Report command line options. More information on associated and general qualifiers can be found with -help -verbose
-warning boolean Report warnings
-error boolean Report errors
-fatal boolean Report fatal errors
-die boolean Report deaths
| Qualifier | Description | Allowed values | Default |
|---|---|---|---|
| Mandatory qualifiers | | | |
| [-sequence] (Parameter 1) | Sequence set USA | Readable sequences | Required |
| -wordsize | Word size | Integer 2 or more | 4 |
| [-outfile] (Parameter 2) | Output file name | Output file | <sequence>.seqmatchall |
| Optional qualifiers | | | |
| (none) | | | |
| Advanced qualifiers | | | |
| (none) | | | |
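For scripted use, the same run can be assembled without prompts. The following Python sketch uses only qualifiers listed above (-sequence, -wordsize, -outfile and -auto). The helper names `build_seqmatchall_command` and `run_seqmatchall` are hypothetical, and the sketch assumes an EMBOSS installation with `seqmatchall` on the PATH and, for the example values, a configured `tembl` database.

```python
import shutil
import subprocess
from typing import List, Optional


def build_seqmatchall_command(seqset: str, wordsize: int = 4,
                              outfile: Optional[str] = None) -> List[str]:
    """Build a prompt-free seqmatchall command line as an argument list."""
    # The qualifier table gives "Integer 2 or more" as the allowed range.
    if wordsize < 2:
        raise ValueError("wordsize must be an integer of 2 or more")
    command = ["seqmatchall", "-sequence", seqset, "-wordsize", str(wordsize)]
    if outfile is not None:
        command += ["-outfile", outfile]
    # -auto turns off prompts, so unspecified qualifiers take their defaults.
    command.append("-auto")
    return command


def run_seqmatchall(command: List[str]) -> None:
    """Run the command, failing early if seqmatchall is not installed."""
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("seqmatchall was not found on PATH")
    subprocess.run(command, check=True)


# Same values as the interactive example: word size 15 on tembl:eclac*.
command = build_seqmatchall_command("tembl:eclac*", wordsize=15,
                                    outfile="eclac.seqmatchall")
print(" ".join(command))
```

Passing the arguments as a list avoids shell expansion of the `*` wildcard in the sequence specification; `run_seqmatchall(command)` would then perform the actual run.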
The sequences must be either all protein or all nucleic acid.
1832 5646 7477 ECLAC 1 1832 ECLACA
1113 49 1161 ECLAC 1 1113 ECLACI
1500 4305 5804 ECLAC 1 1500 ECLACY
3078 1287 4364 ECLAC 1 3078 ECLACZ
159 1 159 ECLACA 1342 1500 ECLACY
60 1 60 ECLACY 3019 3078 ECLACZ
ECLAC (the complete E. coli lac operon) matches ECLACI, ECLACZ, ECLACY and ECLACA (the individual genes), and there is a short overlap between ECLACY and each of its flanking genes, ECLACZ and ECLACA.
The output is a list of regions of identity in pairs of sequences, each consisting of one line with 7 columns of data separated by TABs or space characters.
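The columns of data consist of, in order: the length of the region of identity; the start and end positions of the region in the first sequence; the name of the first sequence; the start and end positions of the region in the second sequence; and the name of the second sequence.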
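A minimal Python sketch for reading this output follows. It assumes the seven whitespace-separated columns are, in order, the match length, the start and end in the first sequence, the first sequence name, the start and end in the second sequence, and the second sequence name. That order is inferred from the example output, where each length equals end minus start plus one. The `Match` type and `parse_seqmatchall` function are hypothetical names, not part of EMBOSS.

```python
from typing import List, NamedTuple


class Match(NamedTuple):
    length: int
    start1: int
    end1: int
    name1: str
    start2: int
    end2: int
    name2: str


def parse_seqmatchall(text: str) -> List[Match]:
    """Parse seqmatchall output lines into Match records."""
    matches = []
    for line in text.splitlines():
        fields = line.split()  # accepts TABs or spaces as separators
        if len(fields) != 7:
            continue  # skip blank or unexpected lines
        length, start1, end1, name1, start2, end2, name2 = fields
        matches.append(Match(int(length), int(start1), int(end1), name1,
                             int(start2), int(end2), name2))
    return matches


example_output = """\
1832 5646 7477 ECLAC 1 1832 ECLACA
1113 49 1161 ECLAC 1 1113 ECLACI
1500 4305 5804 ECLAC 1 1500 ECLACY
3078 1287 4364 ECLAC 1 3078 ECLACZ
159 1 159 ECLACA 1342 1500 ECLACY
60 1 60 ECLACY 3019 3078 ECLACZ
"""

matches = parse_seqmatchall(example_output)
for m in matches:
    # Positions are inclusive, so both spans should equal the reported length.
    consistent = (m.end1 - m.start1 + 1 == m.length
                  and m.end2 - m.start2 + 1 == m.length)
    print(m.name1, m.name2, m.length, consistent)
```

Checking both spans against the reported length is a cheap way to confirm that the column order assumed here matches the file being read.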
| Program name | Description |
|---|---|
| matcher | Finds the best local alignments between two sequences |
| supermatcher | Finds a match of a large sequence against one or more sequences |
| water | Smith-Waterman local alignment |
| wordmatch | Finds all exact matches of a given size between 2 sequences |
polydot will give a graphical view of the same matches.