Replying to LO26110 --
"Malcolm Burson" <mburson@mint.net> wrote:
> Back on February 5, Michael Bayers, replying to Peggy Stuart, wrote in part,
> > Here's a
> > paraphrase from a textbook I used twenty years ago: something is
> > information if it serves to reduce uncertainty in a decision-making
> > situation. That is, the value of information lies in its ability to
> > reduce uncertainty.
>
> Michael, I'm intrigued and a little disturbed by your assertion. Do you
> mean to suggest that whatever "information" may be, its _only_ value lies
> in its capacity to reduce uncertainty so we can make decisions?
>
> To me, this would suggest that information that charms, amuses, unsettles,
> creates awe...well, you can see where I'm going... Would we say, in other
> words, that information is valuable only for its immediate utility?
> What, then, of the "information" about the beauty of the universe that
> I obtain from a Bach unaccompanied sonata?
>
> Or have I misunderstood you?
>
> Malcolm Burson
> Director of Special Projects
> Maine Department of Environmental Protection
Let's turn all of the dialog on LO into 1's and 0's (oops--too late--we
already do that). We can define how much "information" is in yesterday's
digest by counting the number of 1's and 0's. More of them denotes more
information, right?
Not quite. Sue can say something in 2 sentences that Sarah takes 40
sentences to communicate, so there should be something in the measure of
information that takes redundancy into account.
Before we get that far, let's think simply about communicating a distinct
message from a set M of N different messages, where N=2^n. (I'm borrowing
this argument from Henze's _Einführung in die Informationstheorie_.) M
can include messages such as the names of who won the World Series (US
baseball championships) last year. There are approximately 32 teams (I'm
not an avid sports fan, so it may not be 32, but 32=2^5 is handy). I
could save space and transmit a string containing a single '1' and 31 '0's to tell which
team won instead of sending a sports report. We'd agree up front which
position in the row represented each team, and I'd ensure that position
contained the sole '1'. Assuming it were equally likely that any of the
32 teams would win, then the "information" contained in any one year's
message would be log(base2) N = log(base2) 32 = 5. Put another way, I
could encode that string of 32 1's and 0's in a 5-bit binary number.
(I'm beginning to sound a bit like At, I suspect, with the math.)
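Here's a rough Python sketch of that encoding, just to make the arithmetic
concrete (the winning position is made up, and the snippet isn't from
Henze; it only contrasts the one-hot string with the 5-bit number):

import math

N = 32                # number of possible messages (teams)
winner_index = 6      # hypothetical: the team we agreed to put in position 6 won

# The 32-character string: a single '1' in the winner's agreed-upon position.
one_hot = ''.join('1' if i == winner_index else '0' for i in range(N))

# The same choice encoded as a binary number needs only log2(32) = 5 bits.
n_bits = int(math.log2(N))                    # 5
binary = format(winner_index, f'0{n_bits}b')  # '00110'

print(one_hot)        # 32 characters, exactly one of them a '1'
print(binary)         # 5 characters carry the same message
print(math.log2(N))   # 5.0 bits, if every team is equally likely to win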
Now we know it's highly unlikely that all teams have an equal shot at
winning. Indeed, after the end of the regular season, a number of teams
have gone home, leaving a smaller number of teams eligible. By the end of
the playoffs, there are only 2 teams left. So when you get a string of 32
1's and 0's that says the Yankees won, you don't really get 5 bits of
information; you get much less, as it wasn't 5 bits' worth of surprise.
Claude Shannon developed a formula describing the average information
content of a message, and he called that entropy. For example, if you
simply get a message saying which of 2 teams won the World Series, you
might say that contains 1 bit (log(base2) 2 = 1). However, if you had
strong reason to believe the Yankees would win (say, you thought they had
a 75% chance), then hearing that they won would provide little surprise
and little "information." Hearing that the Mets won instead would indeed
surprise you. Somehow you get more information when you hear that the
Mets won than when you hear that the Yankees won. (Obviously, if I'm
speaking about last year's series and you know the result already, you get
_no_ information from that message; you've already got it.)
Without going into too much more detail, the entropy for this example
would be
H(Yankees, Mets) = -(0.75 log(base2) (0.75)) - (0.25 log(base2) (0.25))
= 0.81 bits
That is, before the message is sent, you can expect to learn, on average, 0.81 bits of
information from whatever message you receive. Had you believed the
Yankees had a 90% chance of winning, you would only have gotten 0.47 bits
of information from the message. Of course, if either were equally
likely, you'd have gotten a full 1 bit of information.
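If you want to check those numbers yourself, here's a short Python
calculation of Shannon's formula as I stated it above (the probabilities
are just the ones from my example):

import math

def entropy(probs):
    # Average information, in bits, of a message drawn with these probabilities.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.75, 0.25]))   # ~0.81 bits: Yankees favored 3 to 1
print(entropy([0.90, 0.10]))   # ~0.47 bits: Yankees heavily favored
print(entropy([0.50, 0.50]))   # 1.0 bit: either team equally likely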
Please understand that this isn't meant to diminish our pleasure at art or
music, for example; going that way might land us in one of Fred Kofman's
ontological inversions. Information theory was intended primarily to give
a tool for talking about information transmission capacities. If you send
that message with 2 bits, for example, rather than 0.81, you have
redundancy in your message. That can be good (it can help you reduce
errors induced by noise/corruption in the message channel), or it can be
bad (in a noiseless channel, you could have sent 2.47 times as much
information in the same time, had you eliminated the redundancy by
encoding the message more carefully).
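In case the 2.47 figure looks mysterious: it's just the ratio of the bits
you sent to the bits of information they carried, with the probabilities
from my example. A two-line Python check:

import math

H = -(0.75 * math.log2(0.75)) - (0.25 * math.log2(0.25))  # ~0.81 bits per message
print(2 / H)   # ~2.47: capacity used (2 bits) divided by information carried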
Bill
--
Bill Harris
Facilitated Systems
3217 102nd Place SE, Everett, WA 98208 USA
http://facilitatedsystems.com/
phone: +1 425 337-5541
"Learning-org" and the format of our message identifiers (LO1234, etc.) are trademarks of Richard Karash.