Multivariate statistics has had a significant impact on modern ecology (and, to some extent, vice versa), especially where ordination and classification techniques are concerned. When it comes to testing the significance of something (usually a difference between means, but not only that), a number of problems have to be considered and, if possible, solved. For instance, many processes involve dilution or concentration of something, while others involve growth. In all these cases data dispersion is not homogeneous, and therefore the variances of different samples are very different from each other. This problem can often be solved by an appropriate transformation, but even transformation will not cure all problems (e.g. those related to the interaction between the way data are acquired and the way data vary because of other causes). Other problems, however, just cannot be fixed, for instance those related to the assumption of a normal (or, more generally, known) distribution of data. If the data distribution is at least unimodal, most parametric statistical tests are robust enough to provide sound results even in marginal conditions, but in many cases even the assumption of unimodality is violated. A typical case is the effect of competition, which may cause a double response of the same species, whose abundances are then a combination of two (possibly unimodal) distributions: one involving responses in the absence of competition and the other involving responses to competition. No transformation can make such a data distribution manageable by conventional parametric techniques, especially when multivariate data are considered.
When only the assumption of normality is violated, however, non-parametric methods can still be applied, and, in the multivariate case, permutation tests can be used. The general rationale supporting these tests is that an empirical distribution of an appropriately defined test statistic can be obtained by shuffling the data set and resampling it according to the null hypothesis. For instance, if one wants to test the difference between the means of two groups, a null hypothesis of equal means can be assumed, and therefore data can be randomly re-assigned to the two test groups a number of times, computing the difference between means each time. If the original difference between means is large relative to the distribution of the re-sampled differences, then the null hypothesis can be rejected, concluding that the means are probably different.
Although permutation tests can be applied in most situations, they are particularly useful when data sets are small. In these cases we do not have enough information about the data distribution, and therefore selecting a permutation procedure is the best way to stay on the safe side. This is very often the case in community studies involving the comparison of lists of species, which are inherently multivariate and never multinormal.
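The two-group example above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the code used in the programs on this page; the function names, the default number of permutations and the two-sided absolute difference are assumptions made for the sketch.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_test_means(group_a, group_b, n_perm=9999, seed=0):
    """Two-sided permutation test for a difference between two group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        # Under the null hypothesis group membership is arbitrary,
        # so the pooled data are randomly re-assigned to the two groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for rounding
            extreme += 1
    # The observed arrangement is counted as one of the permutations.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    diff, p = permutation_test_means([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
    print("observed difference:", diff, "p-level:", p)
```

Counting the observed arrangement among the permutations keeps the estimated p-level from ever being exactly zero, which is a common (but not universal) convention.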
On this page you will find links to my programs for some of the most popular multivariate permutation tests cited in the ecological literature. All of them run in a command prompt (aka MS-DOS or console) window in any flavour of 32 or 64 bit Microsoft Windows (i.e. from Windows 95 to Windows Vista). Each program comes as a Zip file, with sample data file(s) and sample output. Just unzip, copy to a folder of your choice and run: there is no installation procedure, but obviously no fancy user interface either. The good news is that these programs are significantly faster than the nicer "relatives" you may find in some statistical analysis software packages.
To those who really cannot stand a command prompt, or who are just looking for a comprehensive and affordable (free!) software tool for multivariate statistics, my advice is to give PAST a try. It's a very neat piece of software and it is by far the easiest to use. The only con is speed, which can be limiting when large data sets are processed. Another excellent solution is to use R packages that support permutation methods, some of which are absolutely outstanding (e.g. Jari Oksanen's Vegan), but the con is that R is not for the casual user.
ANOSIM, which stands for ANalysis Of SIMilarities, is a technique proposed by Clarke (1993) that is aimed at comparing groups of samples on the basis of the difference between the average rank of among-groups similarities and the average rank of within-groups similarities. If the latter is small relative to the former, then the groups are probably different. You can also run ANOSIM on distance or dissimilarity data, and the software will take care of the appropriate conversions.
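As a rough illustration of the rationale, the sketch below computes the ANOSIM statistic in its usual form, R = (rB - rW) / (M / 2), where rB and rW are the average ranks of the among-groups and within-groups dissimilarities and M = n(n - 1) / 2 is the number of sample pairs. It is a minimal Python sketch and not the code of anosim.exe: the function names, the use of average ranks for ties and the one-sided p-level are assumptions.

```python
import random
from itertools import combinations


def average_ranks(values):
    """Rank values in ascending order, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def anosim_r(pairs, ranks, labels):
    """R = (mean among-groups rank - mean within-groups rank) / (M / 2)."""
    within = [r for (i, j), r in zip(pairs, ranks) if labels[i] == labels[j]]
    among = [r for (i, j), r in zip(pairs, ranks) if labels[i] != labels[j]]
    r_w = sum(within) / len(within)
    r_b = sum(among) / len(among)
    return (r_b - r_w) / (len(pairs) / 2)


def anosim(dis, labels, n_perm=999, seed=0):
    """Return the observed R and a permutation p-level (labels are shuffled)."""
    rng = random.Random(seed)
    pairs = list(combinations(range(len(dis)), 2))
    ranks = average_ranks([dis[i][j] for i, j in pairs])
    observed = anosim_r(pairs, ranks, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if anosim_r(pairs, ranks, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: six samples placed along a line, two groups of three.
    points = [0, 1, 2, 10, 11, 12]
    dis = [[abs(x - y) for y in points] for x in points]
    print(anosim(dis, [1, 1, 1, 2, 2, 2]))
```

Here dissimilarities are ranked in ascending order, which is equivalent to ranking similarities from the highest downwards. The ranks are computed only once, since shuffling the group labels does not change them.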
Download anosim.zip and unzip it to any folder you like. The test.dis file contains an 8 x 8 dissimilarity matrix. The file format is pure ASCII, and commas as well as spaces are accepted as separators. As seen in the sample file, no labels are allowed for lines and/or columns. The program anosim.exe can be used in two different ways: interactively or directly from the command line. If you prefer the former, launch the program and it will prompt you for all the info it needs. If you prefer running the program with a single command, you can see an example in the file test.out. This file also shows the kind of output you are going to get. If you have more than two groups, you have to specify the size of each of them. This program is able to handle really huge matrices (matrices larger than 2500 x 2500 have been successfully tested; the only limit is the ability of the 32 bit Windows environment to address up to 2 GB) and it also provides a rough estimate of the execution time.
A step-wise version of ANOSIM, useful for spotting a boundary between two areas (e.g. areas exposed to different environmental conditions), can be found on a dedicated web page.
MRPP, which stands for MultiResponse Permutation Procedure, is similar in scope to ANOSIM, although quite different in its rationale. It was first developed in a non-ecological context (Mielke et al., 1976; Berry, 1983; Mielke, 1984), but it has often been used by ecologists since then (e.g. Biondini et al., 1985; Zimmerman et al., 1985). MRPP is based on the assumption that, if two or more groups are different from each other, the average within-group distance is smaller than the overall average distance between samples (or objects, specimens, etc.).
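The idea can be sketched in Python as follows. This is an illustrative sketch, not the code of my MRPP program: it assumes Euclidean distance, group weights proportional to group size, samples listed group by group, and at least two samples per group. The chance-corrected agreement is computed here as 1 - delta / (overall average distance), which relies on the fact that, with these weights, the expected delta under random grouping equals the overall average distance; all names are hypothetical.

```python
import math
import random
from itertools import combinations


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_by_sizes(order, sizes):
    """Cut a list of sample indices into consecutive groups of given sizes."""
    groups, start = [], 0
    for size in sizes:
        groups.append(order[start:start + size])
        start += size
    return groups


def mrpp_delta(dist, groups):
    """Weighted mean of the within-group average distances (weights n_i / N)."""
    n_total = sum(len(g) for g in groups)
    delta = 0.0
    for g in groups:
        pairs = list(combinations(g, 2))
        avg_d = sum(dist[i][j] for i, j in pairs) / len(pairs)
        delta += len(g) / n_total * avg_d
    return delta


def mrpp(data, sizes, n_perm=999, seed=0):
    """Return observed delta, within-group agreement and permutation p-level."""
    rng = random.Random(seed)
    n = len(data)
    dist = [[euclidean(data[i], data[j]) for j in range(n)] for i in range(n)]
    order = list(range(n))
    observed = mrpp_delta(dist, split_by_sizes(order, sizes))
    all_pairs = list(combinations(range(n), 2))
    overall = sum(dist[i][j] for i, j in all_pairs) / len(all_pairs)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # random re-assignment of samples to groups
        if mrpp_delta(dist, split_by_sizes(order, sizes)) <= observed + 1e-12:
            hits += 1
    return observed, 1 - observed / overall, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: six samples, two variables, two groups of three.
    data = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
    print(mrpp(data, [3, 3]))
```

Note that the p-level counts permuted deltas that are as small as or smaller than the observed one, because tight groups produce a small delta.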
At the time of writing my MRPP program only works with Euclidean distance and a single weighting scheme (each group contributes proportionally to its size), but more options can be found in the software available from Mario E Biondini's web page. As for commercial packages, I am aware of an implementation of MRPP in PC-ORD.
Download mrpp.zip and unzip it to any folder you like. The file test.dat is a sample data file with six samples and ten species. Two groups of samples are defined, containing three samples each. This information is stored in the first two lines of the data file. The first line must contain the number of samples (or objects), the number of species (or variables) and the number of groups. The second line must contain the number of samples (or objects) in each group. The following lines must contain the data: lines are samples, columns are species. All data must be in pure ASCII, with no labels. Spaces or commas are accepted as separators. The program mrpp.exe assumes that this information is stored in the data file, so it will only prompt the user for the input (data) file name and for an optional output file name.
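For those who want to prepare or check data files with a script, the layout described above can be read with a few lines of Python. This is only a sketch of a reader for that layout: the function name is hypothetical, and skipping blank lines is an assumption of mine rather than a documented feature of mrpp.exe.

```python
import re


def parse_mrpp_text(text):
    """Parse the MRPP data layout: header line, group sizes, then data rows."""
    def fields(line):
        # Spaces or commas (or both) are accepted as separators.
        return [tok for tok in re.split(r"[,\s]+", line.strip()) if tok]

    lines = [ln for ln in text.splitlines() if ln.strip()]
    n_samples, n_species, n_groups = (int(tok) for tok in fields(lines[0]))
    sizes = [int(tok) for tok in fields(lines[1])]
    data = [[float(tok) for tok in fields(ln)] for ln in lines[2:]]

    # Consistency checks between the two header lines and the data block.
    if len(sizes) != n_groups or sum(sizes) != n_samples:
        raise ValueError("group sizes do not match the first line")
    if len(data) != n_samples:
        raise ValueError("number of data lines differs from number of samples")
    if any(len(row) != n_species for row in data):
        raise ValueError("a data line has the wrong number of columns")
    return sizes, data


if __name__ == "__main__":
    # Hypothetical miniature file: 4 samples, 2 species, 2 groups of 2.
    sample = "4 2 2\n2 2\n1,2\n3 4\n5, 6\n7 8\n"
    print(parse_mrpp_text(sample))
```

To read an actual file, pass the content obtained with open(...).read() to the function.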
The output obtained by analyzing test.dat is shown in test.out and is fairly self-explanatory. The most relevant information is in the observed delta (i.e. the average within-group distance) and in the p-level that is associated with it (last line). The within-group agreement R can be a useful index (it will tend to 1 as groups become more and more homogeneous), as can the within-group distance, labeled as avg(d), for each group. The latter is useful for pointing out differences in within-group homogeneity.
The Mantel test (Mantel, 1967) is aimed at comparing two matrices, usually expressing distances between samples. In the original application, geographical distances were compared to another kind of distance, but it is obviously possible to compare any other kinds of distances as well. The test statistic Z is the sum of the products of the corresponding elements of the two matrices, excluding those on the diagonal and taking into account only one of the triangular halves in the case of square symmetrical matrices. A standardized statistic R is also available, and it is to be interpreted as (and it actually is) a linear correlation coefficient between the corresponding elements of the two matrices.
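The two statistics can be sketched in Python as follows. This is a minimal illustration, not the code of mantel.exe: it assumes square symmetrical, non-constant matrices, uses the customary scheme of permuting the rows and columns of one matrix together, and returns a one-sided p-level for a positive association. All function names are hypothetical.

```python
import math
import random
from itertools import combinations


def paired_elements(a, b, order):
    """Pair the upper-triangle elements of a with those of b, after
    re-ordering the rows and columns of b according to order."""
    pairs = combinations(range(len(a)), 2)
    return [(a[i][j], b[order[i]][order[j]]) for i, j in pairs]


def mantel_z(a, b, order):
    """Non-standardized statistic: sum of products of corresponding elements."""
    return sum(x * y for x, y in paired_elements(a, b, order))


def mantel_r(a, b, order):
    """Standardized statistic: linear correlation of corresponding elements."""
    elems = paired_elements(a, b, order)
    m = len(elems)
    mean_x = sum(x for x, _ in elems) / m
    mean_y = sum(y for _, y in elems) / m
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in elems)
    sxx = sum((x - mean_x) ** 2 for x, _ in elems)
    syy = sum((y - mean_y) ** 2 for _, y in elems)
    return sxy / math.sqrt(sxx * syy)


def mantel_test(a, b, standardized=True, n_perm=999, seed=0):
    """Return the observed statistic (R or Z) and a permutation p-level."""
    rng = random.Random(seed)
    stat = mantel_r if standardized else mantel_z
    order = list(range(len(a)))
    observed = stat(a, b, order)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(order)  # the same permutation is applied to rows and columns
        if stat(a, b, order) >= observed - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical data: distances among five points and the same, doubled.
    points = [0, 1, 3, 7, 12]
    a = [[abs(x - y) for y in points] for x in points]
    b = [[2 * v for v in row] for row in a]
    print(mantel_test(a, b))
```

Permuting rows and columns together keeps each permuted matrix a valid distance matrix among the same objects, which would not be true if single elements were shuffled independently.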
In order to run the test, download mantel.zip and unzip it to any folder you like. The two files p1.dis and p2.dis are 5 x 5 distance matrices, whereas mantel.exe is the executable file and mantel.out shows a sample output and the way command line options can be used to execute the program directly from the command prompt. In fact, mantel.exe can be executed in interactive mode (and it will prompt the user for the needed info) or in batch mode. In the first case the program must just be launched, with nothing else added on the command line. For batch execution, file names and other information have to be passed on the command line after the program name (this option is useful when many data sets have to be processed automatically using a batch file). In the latter case, after the program name the user must append the names of the two files containing the distance/dissimilarity matrices, a "1" for the standardized version of the test (R) or a "0" for the non-standardized version (Z), and the number of permutations. The example shown to the right of the command prompt in the file mantel.out is a test performed on the p1.dis and p2.dis data files, in standardized form and with 10000 permutations.
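As an alternative to a batch file, many data sets can also be processed from a short Python script. The sketch below only assembles the command line in the order described above (two file names, the "1" or "0" flag, the number of permutations); the helper names and the list of file pairs are hypothetical, and the script assumes that mantel.exe is in the current folder.

```python
import subprocess


def mantel_command(file_a, file_b, standardized=True, n_perm=10000,
                   exe="mantel.exe"):
    """Build the argument list for a batch-mode run of the Mantel program."""
    flag = "1" if standardized else "0"  # 1 = standardized (R), 0 = Z
    return [exe, file_a, file_b, flag, str(n_perm)]


def run_all(file_pairs, **options):
    """Run the program once for each pair of matrix files."""
    for file_a, file_b in file_pairs:
        subprocess.run(mantel_command(file_a, file_b, **options), check=True)


if __name__ == "__main__":
    # Prints the command for the sample files; call run_all(...) to execute.
    print(" ".join(mantel_command("p1.dis", "p2.dis")))
```

Passing the arguments as a list, rather than as a single string, avoids quoting problems when file names contain spaces.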
Indicator Species Analysis (Dufrene & Legendre, 1997) is not related to other permutation or conventional techniques, but it may be very useful in practical applications. In fact, it allows one to determine whether a species is distributed among groups of samples in a way that is not compatible with the hypothesis of random generation (or definition) of the groups. The basis for this procedure is the computation of Indicator Values (IVs), which are a combination of the frequency of occurrence and of the abundance of a given species. Each species has an IV for each group, and the value of an IV is larger when a species is more frequent and/or more abundant in a given group than in the others. The IVs are computed on the original data and then the raw data matrix is shuffled a large number of times to obtain an empirical distribution of all the IVs based on resampling. IVs that are so large as to be unlikely under a random distribution of the corresponding species among groups can be regarded as evidence for a significant deviation from that condition.
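For a single species, the computation can be sketched in Python as follows. The IV of a species in a group is taken here as 100 x A x B, where A is the mean abundance of the species in the group divided by the sum of its mean abundances over all groups, and B is the fraction of samples in the group where the species occurs. This is a minimal sketch, not the code of my program: the function names are hypothetical, and using the largest IV of the species as the test statistic for its p-level is an assumption.

```python
import random


def indicator_values(abundances, labels):
    """Indicator Values (in percent) of one species for each group."""
    groups = sorted(set(labels))
    means, freqs = {}, {}
    for g in groups:
        vals = [x for x, lab in zip(abundances, labels) if lab == g]
        means[g] = sum(vals) / len(vals)                      # mean abundance
        freqs[g] = sum(1 for v in vals if v > 0) / len(vals)  # occurrence
    total = sum(means.values())
    if total == 0:
        return {g: 0.0 for g in groups}  # species absent from all samples
    return {g: 100 * means[g] / total * freqs[g] for g in groups}


def indicator_test(abundances, labels, n_perm=999, seed=0):
    """Return the observed IVs and a permutation p-level for the largest IV."""
    rng = random.Random(seed)
    observed = indicator_values(abundances, labels)
    best = max(observed.values())
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # random re-assignment of samples to groups
        if max(indicator_values(abundances, shuffled).values()) >= best - 1e-9:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical species found only in the three samples of group "a".
    print(indicator_test([5, 3, 4, 0, 0, 0], list("aaabbb")))
```

A species reaches the maximum IV of 100 for a group only when it occurs in all the samples of that group and in no sample of the other groups.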
Download isa.zip and unzip it to any folder you like. The file test.prn contains a small data set, i.e. 10 species (columns) in 6 samples (lines). The file format is pure ASCII, and commas or spaces are the allowed separators. Information about the execution can be provided by launching isa.exe and then following the directions of the program, or by passing the name of a configuration file on the command line after isa.exe and a space. The test.cfg file is a sample configuration file for isa.exe. The lines whose first character is a hash (#) are comments that explain what follows on the next line(s). This file can be used as a template for other data sets. The output from the program is shown in test.out, whose columns contain a serial number for each species, then the species IVs for all the groups and finally a p-level for those IVs. For each species a "<" is appended to the largest IV, while asterisks are appended to the p-level to highlight values that are lower than the usual alpha levels (0.05, 0.01 and 0.001).
This test can also be performed using software developed by one of its authors, or using PC-ORD.