Writing final papers with information theory

If the past is any indication, it’s likely that at this time of the semester, many Case Western Reserve University students will find themselves on the third floor of Kevin Smith Library at 2 a.m., rushing to finish final papers they started only hours before. Suppose one such student, dejected after hours of fruitless work, gives up on actually writing his final economics paper and decides to cut his losses. Having done no research, he turns to Wikipedia and fills the previously blank screen with sentences thrown together at random, hoping to scrape together whatever grade he still can.

Luckily, mathematics comes to the rescue of this poor soul. In writing any paper, it is logical to assume that the quality of the paper is related to how much information it contains. Intriguingly, mathematics, more specifically a branch called information theory, provides us with a quantitative measure for just that. Formally known as Shannon entropy, this measure lets us assign a value to the information content of a message by quantifying how much it differs from a purely random string of characters. The scale runs from zero, meaning there is no randomness in the message and thus no new information is transmitted, up to a maximum value, which occurs when the paper is made of randomly drawn characters, each with an equal probability of being chosen.
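
To make that concrete, here is a minimal sketch of the calculation in Python. It assumes we estimate each character’s probability from its frequency in the text and use a base-2 logarithm, so the answer comes out in bits per character:

    import math
    from collections import Counter

    def shannon_entropy(text):
        """Estimate Shannon entropy in bits per character, treating the observed
        frequency of each character as its probability."""
        counts = Counter(text)
        probabilities = [count / len(text) for count in counts.values()]
        return sum(-p * math.log2(p) for p in probabilities)

    # A message with no randomness carries no new information per character...
    print(shannon_entropy("aaaaaaaaaa"))  # 0.0
    # ...while ordinary English lands somewhere between zero and the maximum.
    print(shannon_entropy("the quality of a paper is related to how much information it contains"))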

Thus, if our student were to randomly generate a sentence from, say, the Wikipedia page on economics in this manner, he would get something like “no   pm  .   ols5rgswininyDnf   ox  tde.” This string of characters has maximum entropy: every character is as unpredictable as possible, yet it clearly conveys nothing meaningful, and nobody wants his or her paper to be completely random. In fact, our student, in order to make a mathematically valid, albeit rather pedantic, argument to his professor, ought to be shooting for an entropy value around 2.8 bits per character, which is approximately the average Shannon entropy of English text.
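
Where that maximum sits depends on the alphabet: for N equally likely characters it works out to log2(N). A quick back-of-the-envelope check, assuming a hypothetical 27-symbol alphabet of lowercase letters plus a space, shows how far above the English target the ceiling lies:

    import math

    # For an alphabet of N equally likely symbols, the maximum Shannon entropy is log2(N).
    # Assuming 26 lowercase letters plus a space, that ceiling sits far above the
    # roughly 2.8 bits per character our student should be aiming for.
    print(math.log2(27))  # about 4.75 bits per character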

Suppose that our student, now versed in the basics of Shannon entropy, decides instead to sample random words from the Wikipedia page in the hopes of reaching a level of information similar to that of other economics writing. Indeed, such a sampling has an entropy value around our desired 2.8; however, it also produces sentences like, “Bagels looked along centers theories aligned.” While this may sound like a slam poem, an economics thesis it is not.
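
A sketch of that word-sampling approach might look like the following. The file name economics_wikipedia.txt is only a stand-in for a saved plain-text copy of the article, the 200-word sample length is arbitrary, and shannon_entropy is the same frequency-based helper sketched earlier:

    import math
    import random
    from collections import Counter

    def shannon_entropy(text):
        """Same frequency-based estimate, in bits per character, as in the sketch above."""
        counts = Counter(text)
        probabilities = [count / len(text) for count in counts.values()]
        return sum(-p * math.log2(p) for p in probabilities)

    # economics_wikipedia.txt stands in for a saved plain-text copy of the article.
    with open("economics_wikipedia.txt") as source:
        words = source.read().split()

    # Build a "sentence" by drawing 200 words at random, as our student does.
    random_sentence = " ".join(random.choice(words) for _ in range(200))

    # The characters keep ordinary English letter frequencies, so the entropy
    # comes out close to that of real prose, even though the sentence is nonsense.
    print(shannon_entropy(random_sentence))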

Therefore, we have found one of the main limitations of Shannon entropy: mathematically, we can only discuss information content in terms of how a message differs from a sample of equally likely random characters, not in terms of how much sense the resulting sentences make. This method of random generation can create sentences with entropy values on par with papers in economics journals, and can even produce lines like “prices and wages standard of considered key microeconomics.” But unless you want to try to convince your professors that you deserve a passing grade based on an obscure branch of advanced mathematics, my advice would be to avoid procrastinating in the next few weeks.