Stepwise ANOSIM

 

Introduction

On this page you'll find a link to a stepwise version of ANOSIM (Clarke, 1993). Please have a look at the directions at the bottom of this page before running the program.

Basically, there is nothing new in this stepwise version of ANOSIM: it simply runs several ANOSIMs while changing the number of samples assigned to each of two groups (e.g. disturbed and undisturbed sites). The user is prompted for the minimum number of samples to be assigned to each group (these are the samples that represent certainly disturbed and certainly undisturbed sites, respectively). However, we're developing some improvements to this basic version, including a smarter search for neighbouring samples and the detection of the effects of intermediate disturbance. So, if you're giving SWANOSIM a try, check this page from time to time for updated versions.
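To make the idea concrete, here is a minimal Python sketch of such a stepwise scan. It is not SWANOSIM's code, and `average_ranks`, `anosim_r` and `stepwise_anosim` are hypothetical names. It assumes (our assumption, since the exact scheme is not spelled out here) that the matrix is already ordered and that, at each step, group 1 is the first n1 rows and group 2 is everything else, so every intermediate sample is assigned to one group or the other. R follows Clarke (1993): all n(n-1)/2 pairwise distances are ranked (tied values get their mean rank) and R = (mean between-group rank - mean within-group rank) / (M/2), where M = n(n-1)/2.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1 (smallest) upwards; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # sorted positions i..j hold ranks i+1..j+1
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def anosim_r(d: Matrix, labels: Sequence[int]) -> float:
    """ANOSIM R = (mean between-group rank - mean within-group rank) / (M / 2)."""
    pairs = list(combinations(range(len(d)), 2))  # M = n(n-1)/2 sample pairs
    ranks = average_ranks([d[i][j] for i, j in pairs])
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    between = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    return (sum(between) / len(between) - sum(within) / len(within)) / (len(pairs) / 2)


def stepwise_anosim(d: Matrix, min1: int, min2: int) -> List[Tuple[int, int, float]]:
    """Return (n1, n2, R) for every split point allowed by the two minimum group sizes."""
    n = len(d)
    # Sketch-only restriction: at least 2 samples per reference group.
    if min1 < 2 or min2 < 2 or min1 + min2 > n:
        raise ValueError("need min1, min2 >= 2 and min1 + min2 <= n")
    results = []
    for n1 in range(min1, n - min2 + 1):
        # ASSUMPTION: group 1 = first n1 rows, group 2 = all remaining rows.
        labels = [1] * n1 + [2] * (n - n1)
        results.append((n1, n - n1, anosim_r(d, labels)))
    return results
```

Run on the 8-sample example matrix below with min1 = min2 = 2, this sketch scans n1 = 2 to 6 and reaches R = 1 at n1 = n2 = 4, consistent with the SWANOSIM result discussed below. The ranks do not depend on the split, so a faster version would compute them once; recomputing keeps the sketch short.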

In order to work properly, this stepwise ANOSIM needs an ordered distance or dissimilarity matrix as input. The first block of rows and columns (i.e. the upper left one) must refer to distances or dissimilarities among disturbed samples, whereas the last block (i.e. the lower right one) must refer to distances or dissimilarities among undisturbed samples. Obviously, the opposite arrangement (undisturbed first, disturbed last) will also work, and it will produce exactly the same output, although mirrored. Independently of the arrangement of these reference blocks, the samples whose role is to be tested must occupy the rows and columns in the middle of the matrix, i.e. those that separate the two outer blocks, while distances or dissimilarities between disturbed and undisturbed samples lie in the off-diagonal corners of the matrix (upper right and lower left). For instance, let's consider a set of 8 samples containing only 2 species:

 

Sample   Species 1   Species 2
1        1           2
2        2           3
3        1           1
4        3           2
5        5           7
6        6           6
7        8           7
8        7           5

We'll assume that sites 1 and 2 are certainly disturbed, while sites 7 and 8 are certainly undisturbed. Sites 3 to 6 are arranged according to their distance to the closest site in the disturbed group, so we'll assume that disturbance decreases from site 3 to site 6. The resulting distance matrix, using the Euclidean distance, is:

 

0         1.414214  1         2         6.403124  6.403124  8.602325  6.708204
1.414214  0         2.236068  1.414214  5         5         7.211102  5.385165
1         2.236068  0         2.236068  7.211102  7.071068  9.219544  7.211102
2         1.414214  2.236068  0         5.385165  5         7.071068  5
6.403124  5         7.211102  5.385165  0         1.414214  3         2.828427
6.403124  5         7.071068  5         1.414214  0         2.236068  1.414214
8.602325  7.211102  9.219544  7.071068  3         2.236068  0         2.236068
6.708204  5.385165  7.211102  5         2.828427  1.414214  2.236068  0

The first two rows and columns refer to samples 1 and 2, while the last two refer to samples 7 and 8. Distances among disturbed sites therefore form the upper left 2×2 block, while those among undisturbed sites form the lower right 2×2 block.
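If you prefer to build such an ordered matrix yourself, the following Python sketch reproduces the example: it places the reference samples first and last, sorts the remaining samples by their distance to the closest disturbed sample (as done above for sites 3 to 6), and computes Euclidean distances in that order. `order_by_closest` and `ordered_matrix` are hypothetical helpers, not part of SWANOSIM or DISTSIM5, and the sorting rule is only the one used in this example.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]

# Species 1 and Species 2 abundances for samples 1-8 (table above).
DATA: Dict[int, Point] = {
    1: (1, 2), 2: (2, 3), 3: (1, 1), 4: (3, 2),
    5: (5, 7), 6: (6, 6), 7: (8, 7), 8: (7, 5),
}


def order_by_closest(data: Dict[int, Point], candidates: Sequence[int],
                     reference: Sequence[int]) -> List[int]:
    """Sort candidates by distance to their nearest reference sample (stable on ties)."""
    return sorted(candidates,
                  key=lambda s: min(math.dist(data[s], data[r]) for r in reference))


def ordered_matrix(data: Dict[int, Point], disturbed: Sequence[int],
                   undisturbed: Sequence[int]) -> Tuple[List[int], List[List[float]]]:
    """Euclidean matrix with rows/columns ordered disturbed | intermediate | undisturbed."""
    fixed = set(disturbed) | set(undisturbed)
    middle = order_by_closest(data, [s for s in sorted(data) if s not in fixed], disturbed)
    order = list(disturbed) + middle + list(undisturbed)
    return order, [[math.dist(data[a], data[b]) for b in order] for a in order]


if __name__ == "__main__":
    order, d = ordered_matrix(DATA, disturbed=[1, 2], undisturbed=[7, 8])
    print("sample order:", order)
    for row in d:
        print("  ".join(f"{x:.6f}" for x in row))
```

With samples 1 and 2 as the disturbed reference, the order stays 1 to 8 (samples 5 and 6 tie at distance 5 and keep their input order). Passing `disturbed=[7, 8], undisturbed=[1, 2]` gives the mirrored arrangement mentioned earlier, with the intermediate samples sorted as 6, 5, 4, 3.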

Passing this matrix to SWANOSIM, with at least 2 samples in group #1 (i.e. samples 1 and 2) and 2 samples in group #2 (i.e. samples 7 and 8), produces the following output if 1000 permutations are used for estimating the significance level of R (actual p(R) values may vary because of random initialization):

The outcome of the stepwise ANOSIM suggests that the first 4 samples are probably disturbed, as they are similar to each other and far from samples 5 to 8. In fact, R is maximum when n1=4 and n2=4: with this split, every within-group distance is smaller than every between-group distance. A quick look at the following picture, which contains a scatter plot of the raw data, confirms this diagnosis.
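The p(R) values come from a permutation test: group labels are shuffled many times and the observed R is compared with the R values obtained from the shuffled labels. The Python sketch below shows one common way to do this; it is not SWANOSIM's code. The names `pair_ranks`, `r_statistic` and `anosim_test` are hypothetical, and counting the observed labelling as one of the permutations, p = (1 + hits) / (1 + permutations), is a widespread convention that we assume here. Since only the labels change, the distances are ranked once.

```python
import random
from itertools import combinations
from typing import List, Optional, Sequence, Tuple


def pair_ranks(d) -> Tuple[List[Tuple[int, int]], List[float]]:
    """All sample pairs i < j with the tie-averaged rank of d[i][j]."""
    pairs = list(combinations(range(len(d)), 2))
    values = [d[i][j] for i, j in pairs]
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):  # tied values share the mean of ranks i+1..j+1
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return pairs, ranks


def r_statistic(pairs, ranks, labels) -> float:
    """ANOSIM R from precomputed ranks; keys: True = within group, False = between."""
    sums = {True: 0.0, False: 0.0}
    counts = {True: 0, False: 0}
    for (i, j), r in zip(pairs, ranks):
        same = labels[i] == labels[j]
        sums[same] += r
        counts[same] += 1
    return (sums[False] / counts[False] - sums[True] / counts[True]) / (len(pairs) / 2)


def anosim_test(d, labels: Sequence[int], permutations: int = 1000,
                seed: Optional[int] = None) -> Tuple[float, float]:
    """Return (R, p) with p = (1 + #{permuted R >= observed R}) / (1 + permutations)."""
    rng = random.Random(seed)  # a fixed seed makes p reproducible
    pairs, ranks = pair_ranks(d)
    r_obs = r_statistic(pairs, ranks, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if r_statistic(pairs, ranks, shuffled) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (permutations + 1)
```

This also shows why p(R) varies from run to run. For the 4 + 4 split of the example, only 2 of the 70 equally likely relabellings (the observed split and its mirror) reach R = 1. Estimates of p(R) therefore hover around 1/35 ≈ 0.029, and no number of permutations can push them much lower.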

 

Using SWANOSIM

In a world where every single piece of software is supposed to be user friendly (and to be needlessly bloated), SWANOSIM adopts a somewhat different approach. In fact, it is a console application, i.e. a tiny program that runs from the command prompt. This solution is very common in the UNIX/LINUX world and it used to be the only way we ran MS-DOS programs when PCs were ...young. However, younger and less experienced users might need some help.

In order to get SWANOSIM up and running, you have to save it somewhere on your PC yourself, as no automatic installation is available, and you must remember the name of the folder in which you saved it (that might be your default folder, or maybe the Windows desktop).

The best choice is to create a dedicated folder for it instead of accepting the default destination for downloaded files. You can also save all the data files you are going to process in this new folder. This way, when you are prompted for the data file name, you'll only need to type the file name (including its extension, if any) instead of the complete path. This will also help users who are not familiar with the Windows command prompt and with long path names.

Assuming that you saved SWANOSIM in a folder and opened that folder, you can launch SWANOSIM as usual, by double-clicking its icon (a small empty window). Experienced users will probably open a command prompt window and run SWANOSIM from there, but this is not strictly necessary.

In order to successfully execute SWANOSIM, you must provide a pure ASCII, space-separated distance or dissimilarity matrix as input. No column or row labels are allowed. The distance matrix we used in the above example can be used for testing SWANOSIM and as a template for your own distance matrices. You can download it from here.
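Before passing your own matrix to SWANOSIM, you may want to check that it really meets these requirements. The Python sketch below reads a label-free, whitespace-separated ASCII matrix and checks that it is square, symmetric and has a zero diagonal. `load_matrix` is a hypothetical helper; we don't know which of these checks, if any, SWANOSIM itself performs, and the tolerance `tol` is an assumption meant to absorb rounding in the printed values.

```python
from pathlib import Path
from typing import List


def load_matrix(path, tol: float = 1e-6) -> List[List[float]]:
    """Read a label-free, whitespace-separated ASCII matrix and sanity-check it."""
    # encoding="ascii" rejects non-ASCII bytes; float() rejects text labels.
    text = Path(path).read_text(encoding="ascii")
    rows = [[float(tok) for tok in line.split()]
            for line in text.splitlines() if line.strip()]
    n = len(rows)
    for i, row in enumerate(rows):
        if len(row) != n:
            raise ValueError(f"row {i + 1} has {len(row)} values, expected {n}")
        if abs(row[i]) > tol:
            raise ValueError(f"non-zero diagonal at row {i + 1}")
        for j in range(i):
            if abs(row[j] - rows[j][i]) > tol:
                raise ValueError(f"matrix not symmetric at ({i + 1}, {j + 1})")
    return rows
```

For example, `load_matrix("tab.euc")` should return the 8×8 example matrix as a list of rows, while a file with a header line such as `Sample Species...` raises `ValueError`.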

If you're not familiar with the command prompt, please pay attention to the file name extension (i.e. the rightmost part of the file name, the one that follows the dot, like "doc" for Word files and "xls" for Excel files), which in many cases is not displayed. SWANOSIM will prompt you for the distance matrix file name and it needs the complete name, e.g. "tab.euc" in the case of the above-mentioned sample distance matrix. If you can't see file name extensions in your Explorer window, change your "Folder options/View" properties.

If you need a program for computing distance or dissimilarity matrices, you might want to give DISTSIM5 a try. In that case, the best option is to save it in the same folder you created for SWANOSIM. DISTSIM5 accepts pure ASCII, space- or comma-separated data as input. No column or row labels are accepted. A sample file containing the same data we used in the above example can be downloaded from here. Please note that there are no sample or species labels (i.e. the file contains only the Species 1 and Species 2 columns, without the header row: just plain data).
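If DISTSIM5 is not an option, the Python sketch below shows the same kind of pipeline for the Euclidean case only. It reads label-free raw data with space- or comma-separated values (samples in rows, species in columns) and writes a space-separated distance matrix that SWANOSIM should accept. This is not DISTSIM5, which offers its own set of coefficients. `read_raw_data` and `write_euclidean_matrix` are hypothetical helpers, and the 6-decimal output simply mirrors the precision of the example matrix.

```python
import math
import re
from pathlib import Path
from typing import List, Sequence


def read_raw_data(path) -> List[List[float]]:
    """Samples in rows, species in columns; values separated by spaces and/or commas."""
    rows = []
    for line in Path(path).read_text(encoding="ascii").splitlines():
        tokens = [t for t in re.split(r"[,\s]+", line.strip()) if t]
        if tokens:  # skip blank lines
            rows.append([float(t) for t in tokens])
    return rows


def write_euclidean_matrix(rows: Sequence[Sequence[float]], out_path) -> None:
    """Write a label-free, space-separated Euclidean distance matrix."""
    with open(out_path, "w", encoding="ascii") as fh:
        for p in rows:
            # math.dist raises ValueError if two samples have different lengths.
            fh.write(" ".join(f"{math.dist(p, q):.6f}" for q in rows) + "\n")
```

A typical call would be `write_euclidean_matrix(read_raw_data("data.txt"), "data.euc")`, where both file names are placeholders.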

DISTSIM5 was compiled several years ago, so it has tight memory limits and it won't be able to handle very large data files and matrices. If you want to process very large data sets, drop me a line (mscardi@mclink.it). A slightly different version that can address 2 GB of memory is also available, although it is even less user-friendly than DISTSIM5.

SWANOSIM saves its output in a file (SWANOSIM.OUT) that can be used for importing the results into spreadsheets or other programs. You'll find it in the same folder where you saved SWANOSIM. The file is overwritten after each run, so you must rename it if you want to keep it. To import SWANOSIM.OUT into Excel, just drag its icon onto the Excel window, then select the "Text to columns" option in the "Data" menu, choose "delimited" when prompted and "space" as the separator.
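If you run SWANOSIM repeatedly, a small script can take care of the renaming step. The Python sketch below copies SWANOSIM.OUT to a timestamped name after a run, and it also splits the file on whitespace, roughly what Excel's "Text to columns" does with a space delimiter. `keep_output` and `read_output` are hypothetical helpers, and they make no assumption about the meaning of SWANOSIM.OUT's columns.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import List, Optional


def keep_output(folder: str = ".", name: str = "SWANOSIM.OUT") -> Optional[Path]:
    """Copy SWANOSIM.OUT to e.g. SWANOSIM-20240131-154500.OUT; None if it is missing."""
    src = Path(folder) / name
    if not src.exists():
        return None
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}-{stamp}{src.suffix}")
    shutil.copy2(src, dst)  # copy2 also keeps the original modification time
    return dst


def read_output(path) -> List[List[str]]:
    """Split each non-blank line on runs of whitespace."""
    return [line.split() for line in Path(path).read_text().splitlines() if line.strip()]
```

Run `keep_output()` from the SWANOSIM folder after each run. Note that two copies made within the same second would share a name, so the second copy would overwrite the first.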

SWANOSIM was compiled using FreeBasic, a very powerful (and fast!) QuickBasic-compatible compiler. It can compile Win32 executables that handle up to 2 GB of memory and run very fast. Recommended and Linux-compatible!

In case you have problems with SWANOSIM, please, just drop me a line.

 

References

Clarke, K.R. 1993. Non-parametric multivariate analysis of changes in community structure. Australian Journal of Ecology 18:117-143.