
Michele Scardi
Stazione Zoologica A. Dohrn di Napoli, Villa Comunale,
80121 Napoli, Italy
Why neural networks?
Phytoplankton production models are very important tools in oceanographic
research, mainly because direct production measurements are expensive,
time consuming and difficult to carry out on a routine basis. They are
also necessary to exploit remote sensing and other instrumental estimates
of phytoplankton biomass and photosynthetic efficiency (e.g. by pump and
probe fluorometers).
Several phytoplankton production models have been developed during the
last three decades and some have provided useful results (see Behrenfeld
and Falkowski, 1996, for an up-to-date review). Empirical models are probably
the most important subset of these models, since they provide reasonably
accurate production estimates on the basis of widely available predictive
variables (e.g. irradiance, biomass, etc.) that are linked to primary
production by direct causal relationships.
However, oceanographic data sets are growing larger, so new and more
effective approaches are needed. Neural networks have recently been applied
to phytoplankton production modeling (Scardi, 1996) as well as to other
ecological problems (e.g. Lek et al., 1996). They are a powerful tool
for empirical modeling of complex processes, since they can accurately
reproduce any non-linear relationship, even those that are unknown or
not fully understood, provided that enough (representative) data are available.
What is a neural network?
Neural networks (NN), more properly referred to as "artificial"
neural networks (ANNs), are a very large and diverse set of processing
devices. Their design mimics, in a simplified way, the neuronal structure
of the mammalian cerebral cortex and most of them have some sort of "training"
rule that makes them "learn" from examples.
The most common NNs are those that use the Error Back-Propagation (EBP)
algorithm (Rumelhart et al., 1986) for their training. A typical EBP NN
is shown in Fig. 1. It consists of several layers of nodes somewhat analogous
to neurons: an input layer (i), one or more hidden layers (h) and an output
layer (o). The example in Fig. 1 is a 3-5-1 NN, since it has 3 input nodes,
5 hidden nodes and 1 output node.
Each node receives its input from the output of the previous layer nodes
or from the network input. The connections between nodes are associated
with weights (W, Z) that are adjusted by the EBP training procedure. The
input and hidden layers also include a bias node with a constant
output (usually 1), which plays the same role as the constant term in a
multiple regression. Each hidden and output node is associated with an activation
function, i.e. a differentiable function of the node's total input. Even
though several functions can be used, the most common is the sigmoid function:
f(a)=1/[1+exp(-a)]. If the activation function of the hidden layer nodes
is non-linear and an adequate number of nodes is available, an EBP NN
can approximate any continuous non-linear function.
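The forward pass described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the poster: it assumes that W holds the input-to-hidden weights and Z the hidden-to-output weights, that the bias weight is stored as the last entry of each weight vector, and that initial weights are drawn at random from [-0.5, 0.5]. The names `forward` and `random_weights` and the input values are hypothetical.

```python
import math
import random

def sigmoid(a):
    """Logistic activation: f(a) = 1 / (1 + exp(-a))."""
    return 1.0 / (1.0 + math.exp(-a))

def forward(x, W, Z):
    """Forward pass of a network with one hidden layer and one output node.

    x: list of input values (length n_in)
    W: assumed input-to-hidden weights, one row per hidden node,
       each row of length n_in + 1 (last entry is the bias weight)
    Z: assumed hidden-to-output weights, length n_hidden + 1
       (last entry is the bias weight)
    """
    xb = list(x) + [1.0]  # append the bias node, whose output is 1
    h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in W]
    hb = h + [1.0]        # bias node of the hidden layer
    return sigmoid(sum(z * v for z, v in zip(Z, hb)))

def random_weights(n_in, n_hidden, seed=0):
    """Small random starting weights (the range is an assumption)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
         for _ in range(n_hidden)]
    Z = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    return W, Z

# A 3-5-1 network like the one in Fig. 1, fed with arbitrary scaled inputs.
W, Z = random_weights(3, 5)
y = forward([0.2, 0.7, 0.1], W, Z)
```

Because the output node is also sigmoidal, the network output lies between 0 and 1, so the target variable would have to be rescaled to that range before training.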
A good introduction to neural networks can be found in Abdi (1994), but
check also the URL below the title of this poster.
A simple model
A simple phytoplankton production (PP) model was set up using three predictive
variables that are easily available from remote sensing: Sea Surface Temperature
(SST), surface irradiance (I0) and surface chlorophyll concentration
(CHL0). The model was based on a 3-4-1
EBP NN and was calibrated (i.e. trained) using a small data set that was
extracted from the OPPWG data base (see ftp://warrior.das.bnl.gov/pub/Database/Database2.html).
This data set includes 97 PP measurements that were carried out in the
western Mediterranean during 3 different spring cruises. The scope of
the resulting model is somewhat limited, but it provides a good example
of the way a NN works in comparison with other empirical models calibrated
on the same data set: a very simple linear model and the Vertically Generalized
Production Model (VGPM) developed by Behrenfeld & Falkowski (1997).
The linear model is based on a composite predictive variable (SST·I0·CHL0)
and its intercept is set to zero. The VGPM is much more complex and is
based on a set of empirical relationships that take into account several
predictive variables, some of which have to be indirectly assessed from
available data. Since VGPM was developed from MARMAP data (NW Atlantic),
a linear correction was applied to improve its fit to Mediterranean data.
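As an illustration of the linear model, a regression through the origin on the composite variable can be sketched as follows. The poster does not say how the slope was estimated or whether the data were transformed first; ordinary least squares on untransformed values is assumed here, and all names and numbers in the example are hypothetical.

```python
def composite(sst, i0, chl0):
    """Composite predictive variable SST * I0 * CHL0."""
    return sst * i0 * chl0

def fit_through_origin(x, y):
    """Least-squares slope b of y = b * x, with the intercept fixed at zero."""
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    return sxy / sxx

# Hypothetical observations, used only to show the calls.
sst = [15.0, 17.0, 19.0]
i0 = [30.0, 40.0, 50.0]
chl0 = [0.2, 0.5, 1.0]
pp = [120.0, 410.0, 1050.0]

x = [composite(s, i, c) for s, i, c in zip(sst, i0, chl0)]
slope = fit_through_origin(x, pp)
pp_estimates = [slope * v for v in x]
```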
The color-coded outputs of the models are compared with observed PP values
in Fig. 2. The 3-4-1 NN (red diamonds) provides the best results in terms
of overall goodness of fit and error distribution (the latter does not
differ from a normal one according to a Shapiro-Wilk test, W=.968, p<.11).
Both the VGPM (blue squares) and the linear model (green triangles) tend
to underestimate PP in the 100-1000 mg C m⁻² day⁻¹
range and their error distributions are less symmetrical and leptokurtic
than the NN one, even though in the VGPM case the hypothesis of normality
could not be rejected. The mean square error (MSE) of the log-transformed
3-4-1 NN outputs (0.059) is much lower than the MSEs of the VGPM (0.130)
and of the linear model (0.239).
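The goodness-of-fit measure quoted above can be computed along these lines. The base of the logarithm is not stated on the poster; base 10 is assumed in this sketch, and the function name and sample values are hypothetical.

```python
import math

def log_mse(observed, predicted):
    """Mean square error between log10-transformed observed and predicted PP."""
    errors = [(math.log10(p) - math.log10(o)) ** 2
              for o, p in zip(observed, predicted)]
    return sum(errors) / len(errors)

# Hypothetical PP values (mg C per square metre per day), for illustration only.
observed = [150.0, 420.0, 980.0]
predicted = [170.0, 390.0, 1100.0]
mse = log_mse(observed, predicted)
```

Working on log-transformed values means that a given relative error counts the same at low and high production levels.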
Even though a NN can fit observed data better than other models do, its
real power is best seen in its response to different combinations of
input variable values.
The surfaces in Fig. 3 represent the PP estimates provided by the different
models given varying I0 and CHL0 values and SST=18.94
°C (i.e. the mean June SST at 43°N, 8°E). It can be clearly seen that
the NN surface is more complex and "feature-rich" than the linear
model and the VGPM surfaces are. Of course, it does not meet some of the
theoretical constraints that the other models meet (e.g. PP is not null
when I0=0), but under real world conditions this is hardly
a problem.
NNs are cast in the mould of the data, so they tend to retain all the features
that are found in their training data sets. However, especially when small
training data sets are used, some generalization is needed. The surface
in the lower right corner of Fig. 3 shows the output of an overtrained
NN, i.e. of a NN that acts as a memory rather than as a model of a process.
In this case the NN response is much more complex and accurately "maps"
a subset of the training data, even though it does not make sense from
a more general point of view. Of course, overtraining can be easily avoided
by means of different techniques: limiting the number of training cycles
(early stopping), adding white noise to the training patterns, etc.
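Both techniques can be sketched generically. The poster does not describe the stopping criterion or the noise level that were actually used; the version below assumes that training stops once the error on a separate validation set has not improved for a given number of cycles (`patience`), and that Gaussian noise is added to each input. All names and the toy error curve are hypothetical.

```python
import random

def jitter(pattern, sd, rng):
    """Return a copy of a training pattern with Gaussian white noise added."""
    return [v + rng.gauss(0.0, sd) for v in pattern]

def train_with_early_stopping(train_epoch, validation_error,
                              max_epochs=1000, patience=20):
    """Run training cycles until the validation error stops improving.

    train_epoch: callable that performs one training cycle
    validation_error: callable that returns the current validation error
    Returns the cycle with the lowest validation error and that error.
    """
    best_error = float("inf")
    best_epoch = 0
    for epoch in range(1, max_epochs + 1):
        train_epoch()
        error = validation_error()
        if error < best_error:
            best_error, best_epoch = error, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` cycles
    return best_epoch, best_error

# Toy stand-in for a real network: the validation error falls until
# cycle 30 and then rises again, as it would with overtraining.
state = {"epoch": 0}

def toy_train_epoch():
    state["epoch"] += 1

def toy_validation_error():
    return (state["epoch"] - 30) ** 2 + 1.0

best_epoch, best_error = train_with_early_stopping(
    toy_train_epoch, toy_validation_error, patience=5)
```

In a real application the weights would be saved whenever the validation error improves, so that the network from the best cycle can be restored after training stops.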
Finally, an application of the 3-4-1 NN model is shown in Fig. 4, where
the mean June PP was assessed for the whole Mediterranean Sea. Since the
NN model was trained on Western Mediterranean data only and CZCS data
were used as model input, this map has to be regarded as an example rather
than as an accurate PP estimate.
Sensitivity analysis
A better understanding of the role of each input variable in a model
can be achieved by means of sensitivity analysis. This is much more interesting
when NNs are used, since, as already shown, they produce a more complex
output than other models do.
Since SST, I0 and CHL0 values are independent of
each other under real world conditions, 3 very simple sensitivity tests
were carried out. In each test one of the input variables was allowed
to randomly vary in a [-50%,+50%] range about the observed value and 1000
input patterns were extracted by resampling the training data set. The results of
the sensitivity analysis are shown in Fig. 5. The random variations in
I0 caused the largest impact on MSE (+30.17%). CHL0
closely follows with a +22.85% variation in MSE, whereas SST induced only
minor changes in MSE (+4.11%). This result provides a good approximation
of the relative importance of these variables in determining PP under
the observed conditions.
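A sensitivity test of this kind can be sketched as follows. Several details are assumptions rather than facts taken from the poster: the perturbation is drawn from a uniform distribution, the resampling is done with replacement, and the reference MSE is computed on the same resampled patterns before perturbation. The function name, the toy model and the toy data are hypothetical.

```python
import random

def sensitivity(model, patterns, targets, var_index,
                n=1000, spread=0.5, seed=0):
    """Percentage change in MSE when one input variable is perturbed.

    model: callable mapping a list of inputs to a prediction
    var_index: position of the input variable to perturb
    spread: half-width of the relative perturbation (0.5 means +/-50%)
    """
    rng = random.Random(seed)
    base = 0.0
    perturbed = 0.0
    for _ in range(n):
        k = rng.randrange(len(patterns))  # resample a training pattern
        x = list(patterns[k])
        base += (model(x) - targets[k]) ** 2
        x[var_index] *= 1.0 + rng.uniform(-spread, spread)
        perturbed += (model(x) - targets[k]) ** 2
    return 100.0 * (perturbed - base) / base

# Toy model that depends on the first input only.
def toy_model(x):
    return x[0]

patterns = [[10.0, 1.0], [12.0, 2.0], [8.0, 3.0]]
targets = [10.1, 12.1, 8.1]
change_first = sensitivity(toy_model, patterns, targets, 0)
change_second = sensitivity(toy_model, patterns, targets, 1)
```

In the toy case, perturbing the first input inflates the MSE, whereas perturbing the second input, which the model ignores, leaves it unchanged.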
A more complex model
A PP model that is based on SST, I0 and CHL0 can
hardly handle large-scale (let alone global) systems. As shown in the
left plot in Fig. 6, these input variables were not sufficient to adequately
train a NN when a large data set, including about 3000 observations in
many different geographical regions, was used. Therefore additional information
was needed to model the relationships between predictive variables and
PP. Photoperiod, latitude, longitude and Julian date (the latter two variables
mapped into a circular reference system by means of sine and cosine transformations)
provided such information. The much improved results of an upgraded
9-9-1 NN model are shown in the right plot in Fig. 6 (this model is currently
participating in the Primary Productivity Algorithm Round Robin, which
will select a consensus algorithm for SeaWiFS data processing).
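The circular mapping of longitude and Julian date can be sketched as below. The exact scaling used for the 9-9-1 model is not given on the poster; this version assumes periods of 360 degrees and 365 days, and the order of the inputs is arbitrary. One reading that is consistent with nine inputs is SST, I0, CHL0, photoperiod and latitude plus a sine and a cosine term for each circular variable. The function names and sample values are hypothetical.

```python
import math

def circular(value, period):
    """Map a periodic variable onto the unit circle as (sine, cosine)."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)

def build_inputs(sst, i0, chl0, photoperiod, lat, lon, day_of_year):
    """Assemble a nine-element input pattern (assumed layout)."""
    lon_sin, lon_cos = circular(lon, 360.0)          # longitude in degrees
    day_sin, day_cos = circular(day_of_year, 365.0)  # day of the year
    return [sst, i0, chl0, photoperiod, lat,
            lon_sin, lon_cos, day_sin, day_cos]

# Hypothetical station in mid-June, for illustration only.
inputs = build_inputs(18.94, 45.0, 0.3, 15.4, 43.0, 8.0, 166)
```

With this encoding 31 December and 1 January, or longitudes 179°E and 179°W, end up close to each other in input space, which a single linear coordinate cannot achieve.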
Discussion
NN models of phytoplankton production are intrinsically more effective
than other empirical models because NNs are powerful computational engines.
Like other empirical models, of course, NN models are only as good as
the data set on which they are based, but the capability of incorporating
information which is usually difficult to manage (e.g. binary or nominal
data, geographical coordinates, etc.) gives them a significant edge over
conventional models.
NNs will certainly be used more and more frequently in oceanographic research,
since they have provided useful results also in other oceanographic applications
(Scardi et al., in prep.) and their computational requirements are now
easily met by most PCs.
References
- Abdi H. (1994). A neural network primer. Journal of Biological Systems,
2 (3): 247-281
- Behrenfeld M.J. & Falkowski P.G. (1996). A consumer's guide to
phytoplankton primary productivity models [on line manuscript]. Available
from Internet: <ftp://warrior.das.bnl.gov/pub/Reports/ps_files/paper2.ps>
[14/11/97]
- Behrenfeld M.J. & Falkowski P.G. (1997). Photosynthetic rates
derived from satellite-based chlorophyll concentration. Limnology &
Oceanography, 42(1): 1-20
- Lek S., Delacoste M., Baran P., Dimopoulos I., Lauga J. & Aulagnier
S. (1996). Application of neural networks to modelling nonlinear relationships
in ecology. Ecological Modelling, 90(1): 39-52
- Rumelhart D.E., Hinton G.E. & Williams R.J. (1986). Learning representations
by back-propagating errors. Nature 323: 533-536
- Scardi M. (1996). Artificial neural networks as empirical models of
phytoplankton production. Marine Ecology Progress Series, 139: 289-299
- Scardi M., Conversano F. & Ribera d'Alcalà M., in prep. A new
method for calibration-validation of oxygen data collected with CTD
oxygen probes.