In a paper published a couple of years ago (Pilger, 2012), I described the application of a simple principle, transformed into a distinctive abstract object, to an optimization problem within the plate tectonics paradigm: simultaneous reconstruction of lithospheric plates over a range of ages from marine geophysical data. The relation of the principle, maximum entropy, to a particular transformation, power-series fractals, is rarely recognized, even though Pastor-Satorras and Wagensberg derived it. I'm unaware of any other application of fractal forms to optimization problems analogous to that paper. The following derivation is taken from the 2012 paper, with slight modification, in the hope that it might prove useful beyond the earth sciences. I'm investigating applications in a variety of other areas, from plate tectonics to petroleum geology and, oddly enough, the arts.
Pastor-Satorras and Wagensberg (1998) showed that fractal self-similarity could be derived using Shannon’s (1948) information theory and Jaynes’ (1957) maximum entropy formalism. For convenience, straightforward derivations of both maximum entropy and fractal distributions are provided here.
Mandelbrot (1982) provided a basic definition of the mathematical formulation of fractals:
p = k d^(-D)
For statistical fractals the formula is

p ≈ k d^(-D).
In either case, p is a measure of a self-similar structure, such as length, width, mass, or even probability, over a range of scales, d; k is a constant, in part dependent on the units of the measure, if any; and D is the fractal dimension.
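As an illustration of how the fractal dimension might be recovered from measurements obeying this relation, here is a minimal sketch in Python; the scales and measure values are synthetic, generated from an assumed D, and the log-log least-squares fit is simply one convenient way to estimate the exponent (it is not taken from the 2012 paper).

```python
import numpy as np

# Hypothetical illustration: estimate the fractal dimension D in p = k * d**(-D)
# by a least-squares fit in log-log space. The scales and measures below are
# synthetic, generated from an assumed D = 1.6 with a little multiplicative noise.
rng = np.random.default_rng(0)
d = 1.0 / 2.0 ** np.arange(1, 9)          # scales d, halving at each step
true_D, true_k = 1.6, 0.05
p = true_k * d ** (-true_D) * rng.lognormal(sigma=0.05, size=d.size)

# ln(p) = ln(k) - D ln(d): fitting a straight line recovers -D (slope) and ln(k)
slope, intercept = np.polyfit(np.log(d), np.log(p), 1)
print(f"estimated D = {-slope:.3f}, estimated k = {np.exp(intercept):.3f}")
```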
Given a measure space – for the purposes of this study, a two-dimensional map – the map area can be divided into successive subareas. If the dimensions of the map are 1 by 1 unit, divide the map into square cells of dimension 1/k_j by 1/k_j, in which, for convenience, k_j = 1, 2, 4, 8, … for divisions j = 1, …, N (k_1 = 1; k_j = 2 k_{j-1} for j > 1), so that the cell width is halved at each successive division.
1.
The amount of information, I, supplied by each successive division j is equal to ln(1/k_j²) = ln(d_j) (see Shannon, 1948; or Kanasewich, 1974). The average (expected) information is

⟨I⟩ = Σ_{j=1,N} p_j ln(1/k_j²) = Σ_{j=1,N} p_j ln(d_j)    (eq. 1)

in which p_j is the probability of each division. (Note that d_j could be taken as either the inverse cell width, k_j, or the cell area, k_j², without affecting the derivation.)
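A short numerical sketch of eq. 1 may make the bookkeeping concrete; the doubling k_j sequence follows the scheme above, d_j is taken here as the cell area, and the probabilities are arbitrary placeholders rather than values from any real map.

```python
import numpy as np

# Sketch of eq. 1 for the map-division scheme: k_j doubles with each division
# (cell width 1/k_j), d_j is taken as the cell area 1/k_j**2, and the p_j are
# arbitrary placeholder probabilities that sum to 1.
N = 6
k = 2.0 ** np.arange(N)                   # k_j = 1, 2, 4, 8, ...
d = 1.0 / k ** 2                          # d_j = cell area at division j
p = np.full(N, 1.0 / N)                   # placeholder probabilities

avg_information = np.sum(p * np.log(d))   # <I> = sum_j p_j ln(d_j)
print(f"<I> = {avg_information:.4f}")
```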
2.
In the presence of inadequate information, Jaynes (1957) proposed (using Grandy’s,
1992, paraphrase here): “The optimal probability assignment describing that
situation is the one which maximizes the information-theoretic entropy subject
to constraints imposed by the information that is available.” What is
the information-theoretic entropy? From Grandy’s derivation:
An experiment is repeated n times with potentially m results. Thus there are potentially m^n outcomes of the experiments. Each outcome produces a set of sample numbers n_j and frequencies of occurrence of those sample numbers, f_j = n_j/n, 1 ≤ j ≤ m.
As a result, the number of outcomes that yield a particular set of frequencies f_j is given by the multiplicity factor:

W = n!/[(n f_1)! … (n f_m)!]    (eq. 2)

The set of frequencies (f_j) that can be realized in the most ways is that which maximizes the multiplicity, W.
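To see the multiplicity argument in action, the toy enumeration below (an assumption chosen only for illustration, with n = 6 trials and m = 3 results) computes W for every possible set of sample numbers and confirms that the uniform frequencies can be realized in the most ways.

```python
from itertools import product
from math import factorial

# Toy check of eq. 2: enumerate all sample-number sets (n_1, ..., n_m) with
# n_1 + ... + n_m = n and compute W = n! / [(n f_1)! ... (n f_m)!].
n, m = 6, 3
best = None
for counts in product(range(n + 1), repeat=m):
    if sum(counts) != n:
        continue
    W = factorial(n)
    for c in counts:
        W //= factorial(c)
    freqs = tuple(c / n for c in counts)
    if best is None or W > best[0]:
        best = (W, freqs)

print(f"largest W = {best[0]} at frequencies {best[1]}")  # the uniform set wins
```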
It is convenient to note that maximizing W is equivalent to maximizing ln(W). Thus

ln(W) = ln{n!/[(n f_1)! … (n f_m)!]} = ln(n!) − Σ_{j=1,m} ln[(n f_j)!]    (eq. 3)
Stirling’s approximation, ln(q!) ≈ q ln(q) − q, is applicable for large n:

ln(W) ≈ n ln(n) − n − Σ_{j=1,m} [n f_j ln(f_j) + n f_j ln(n) − n f_j]    (eq. 4)
and then

ln(W)/n ≈ ln(n) − 1 − Σ_{j=1,m} f_j ln(f_j) − ln(n) Σ_{j=1,m} f_j + Σ_{j=1,m} f_j    (eq. 5)

Since Σ_{j=1,m} f_j = 1,

ln(W)/n ≈ − Σ_{j=1,m} f_j ln(f_j)    (eq. 6)
If the set of frequencies accurately represents the probabilities (p_j, j = 1, m) of the phenomenon being investigated, then

S = ln(W)/n ≈ − Σ_{j=1,m} p_j ln(p_j)    (eq. 7)

which is the Shannon information entropy.
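As a quick sanity check on eqs. 3 through 7, the snippet below (with arbitrarily chosen frequencies) evaluates ln(W)/n exactly via log-gamma functions and shows it approaching −Σ f_j ln(f_j) as n grows.

```python
from math import lgamma
import numpy as np

# Check that ln(W)/n approaches the Shannon entropy -sum f_j ln(f_j) as n grows.
# ln(q!) is evaluated exactly as lgamma(q + 1); the frequencies are arbitrary.
f = np.array([0.5, 0.3, 0.2])
shannon = -np.sum(f * np.log(f))
for n in (10, 100, 1000, 10000):
    counts = f * n                                     # the sample numbers n f_j
    lnW = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts)
    print(f"n = {n:>5}: ln(W)/n = {lnW / n:.4f}   -sum f ln f = {shannon:.4f}")
```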
Shannon information entropy has the following properties (Jaynes, 1957): (1) it is a continuous function of the probabilities; (2) if all probabilities are equal, it is a monotonically increasing function of n (the number of probabilities); (3) grouping of events produces the same value as separate events; and (4) S(m) + S(n) = S(mn), which can only be satisfied by a form such as S(n) = k ln(n).
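Property (4) is easy to verify numerically for equal probabilities, where the entropy reduces to the S(n) = k ln(n) form (with k = 1 in natural-log units); the values of m and n below are arbitrary.

```python
import numpy as np

# For a uniform distribution over n outcomes, -sum p ln p = ln(n), i.e. S(n) = k ln(n)
# with k = 1. Property (4): the entropy over m*n equally likely joint outcomes
# equals S(m) + S(n).
def uniform_entropy(n):
    p = np.full(n, 1.0 / n)
    return -np.sum(p * np.log(p))

m, n = 4, 5
print(uniform_entropy(m) + uniform_entropy(n))   # ln(4) + ln(5) = ln(20)
print(uniform_entropy(m * n))                    # ln(20), the same value
```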
3.
In the Pastor-Satorras and Wagensberg formalism, using the Lagrangian variational principle, the information entropy (eq. 7) is maximized subject to the constraining information (eq. 1) and normalization of the probabilities (Σ_{j=1,N} p_j = 1), with Lagrange multipliers α and β:
F = − Σ_{j=1,N} p_j ln(p_j) + α [− Σ_{j=1,N} p_j ln(d_j)] + β [1 − Σ_{j=1,N} p_j]    (eq. 8)
Maximizing F with respect to each p_i:

∂F/∂p_i = 0 = − ln(p_i) − 1 − α ln(d_i) − β    (eq. 9)
Reorganizing:

ln(p_i) = − 1 − α ln(d_i) − β    (eq. 10)

exp[ln(p_i)] = p_i = exp(−β − 1) d_i^(−α)    (eq. 11)

or

p_i = γ d_i^(−α)    (eq. 12)
in which α and γ = exp(−β − 1) are constants.
For such a power series, eq. 12, the probabilities exhibit self-similarity across the scales, d, and α, then, is equivalent to Mandelbrot’s (1975, 1982) fractal dimension, D, above.
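The same result can be checked numerically. The sketch below is only an illustration, not part of the original derivation: it maximizes the Shannon entropy with scipy under the normalization and average-information constraints for an arbitrary set of scales d_j and an assumed target ⟨I⟩, then fits ln(p_j) against ln(d_j) to confirm the power-law form of eq. 12.

```python
import numpy as np
from scipy.optimize import minimize

# Maximize S = -sum p ln p subject to sum p = 1 and sum p ln(d) = I_target,
# then verify that the optimal probabilities follow p_j = gamma * d_j**(-alpha).
# The scales d_j and the target average information are arbitrary choices.
d = 1.0 / 4.0 ** np.arange(1, 7)                 # d_j = cell areas (k_j doubling)
I_target = -4.0                                  # assumed value of <I>

def neg_entropy(p):
    return np.sum(p * np.log(p))                 # minimizing -S maximizes S

constraints = (
    {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},
    {"type": "eq", "fun": lambda p: np.sum(p * np.log(d)) - I_target},
)
p0 = np.full(d.size, 1.0 / d.size)
res = minimize(neg_entropy, p0, method="SLSQP",
               bounds=[(1e-9, 1.0)] * d.size, constraints=constraints)

# If p_j = gamma * d_j**(-alpha), ln(p_j) is linear in ln(d_j) with slope -alpha.
slope, ln_gamma = np.polyfit(np.log(d), np.log(res.x), 1)
print(f"alpha = {-slope:.3f}, gamma = {np.exp(ln_gamma):.3f}")
```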
The novelty of the Pastor-Satorras and Wagensberg (1998) derivation is the incorporation of the constraint of average information, ⟨I⟩ = Σ_{j=1,N} p_j ln(d_j), into the maximum entropy formalism (eq. 8). Conventional applications of the formalism utilize statistical measures, such as the mean or variance from experimental data, e.g., ⟨f(x)⟩ = Σ_{j=1,N} p_j f(x_j), rather than information.
Fractal structure can be interpreted as the manifestation of iterative processes that propagate information kernels across a range of scales in such a manner that they achieve the most probable outcome. The fractal structure, and thus the information, is thereby apparent across every scale over which the process operates.
References
(Links current as of January 13, 2014; if a link is broken, it may be necessary to search for the reference.)