Monday, November 16, 2009

The Internet Repetition Code

In engineering, the signal-to-noise ratio, denoted S/N, compares the strength of a given signal to that of the noise distorting it. The measure has also been used more generally to describe the level of meaningful information as compared to spurious data. For example, the percentage of spam in email might be thought of as an S/N measure. About a year ago Dave Armano posted about the falling S/N ratio on the internet in general, and how Twitter would act as a social filter to improve signal strength. We can squeeze a little more out of the S/N metaphor, à la Moore's Lore and Attention Crash.

The S/N term comes from communication engineering, referring to errors induced in data during transmission - what is sent is often not what is received, if it is received at all. The traditional source of noise is mother nature, omnipotent and largely benevolent to our purposes, altering bits through lightning, static, surges and so on.

Coding theory is the study of how to add redundancy (additional bits) to data so as to detect and correct errors (noise) induced by the communication channel. One of the simplest codes is the aptly named repetition code. In such codes each bit of the message is sent an odd number of times so that a simple majority decoding rule can be used to recover the original message. For example, a repetition code may send each bit three times and then decode a given bit as a one if two or more – the majority – of its received copies are ones. In general, if each bit is repeated 2*K + 1 times then the code can tolerate up to K errors (bit flips from one to zero, or vice versa) per message bit.
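As a concrete illustration (a minimal sketch of my own, not part of the original post), here is the three-fold repetition code and its majority-vote decoder in a few lines of Python; the function names and example bits are illustrative.

    def encode(bits, repeat=3):
        """Repeat each message bit 'repeat' times (repeat should be odd)."""
        return [b for b in bits for _ in range(repeat)]

    def decode(received, repeat=3):
        """Majority-vote each block of 'repeat' received bits."""
        decoded = []
        for i in range(0, len(received), repeat):
            block = received[i:i + repeat]
            decoded.append(1 if sum(block) > repeat // 2 else 0)
        return decoded

    message = [1, 0, 1, 1]
    sent = encode(message)           # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
    noisy = list(sent)
    noisy[1] ^= 1                    # flip one bit in the first block
    assert decode(noisy) == message  # a single error per block is corrected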

Repetition codes are not very efficient as they increase the size of a given message by a factor of 3 or 5, say, depending on the repetition parameter. The redundancy added by a repetition code is just additional copies of the message itself, whereas more sophisticated codes add a smaller, more intelligent amount of redundancy that can specifically pinpoint errors and correct them. These more compact and intelligent forms of redundancy come at the cost of additional message processing for both the sender and the receiver. A definite advantage of repetition codes is simplicity, both in encoding and particularly in decoding – just keep a tally of the received zeroes and ones for each message bit and decode the sent bit as the larger of the two counts.

Even so, repetition codes are not particularly popular, though most courses on coding theory use them to establish some basic concepts. For example, if bits in the channel flip independently with probability p then it is a nice exercise in binomial summation to compute the probability that the correct message will be received after decoding. There was an unexpected application of repetition codes in security last year when the researchers behind the Cold Boot Attack used these codes to analyse the best way to extract cryptographic key bits from RAM.
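That exercise can be made concrete in a short Python sketch (mine, with assumed parameter values): with 2*K + 1 repetitions, a message bit is decoded correctly exactly when at most K of its copies are flipped, which gives a binomial sum.

    from math import comb

    def bit_decode_success(p, k):
        """Probability that a single message bit survives decoding when each bit
        is repeated 2k+1 times and channel bits flip independently with
        probability p: at most k of the 2k+1 copies may be flipped."""
        n = 2 * k + 1
        return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

    # With p = 0.1 and three repetitions (k = 1) a bit is decoded correctly
    # with probability 0.972, versus 0.9 for an uncoded bit.
    print(bit_decode_success(0.1, 1))

For a whole message the per-bit probabilities multiply (assuming independent errors), so the overall success probability is this sum raised to the power of the message length.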

More generally, repetition codes are suited to communication environments that are subject to fading, and we can certainly think of powerless RAM as a fading channel, with bits decaying towards their ground state. Fading is produced not just by loss or attenuation of signal, but also by a signal “colliding with itself” as it arrives at the receiver through multiple paths, in particular when the environment around the receiver contains objects that act as reflectors. The signal bounces off these reflectors and arrives multiple times at the receiver, superimposed upon itself. In this case repetition codes are effective in recovering the original message from amongst the many versions of itself.

Now let’s move up the communication stack, away from network media and their codings, towards the layer of applications and content, where Mr. Armano is looking for Twitter to filter content into a more favourable S/N ratio. Here signal and noise are no longer physical combatants – there is just what you like and what I like, by way of content.

For each of us the web is a noisy channel, which we express through the need to search, subscribe, aggregate, recommend, post, tweet – in short, a great cull of what finds its way onto our screens. Nicholas Carr has described the web, in particular the blogosphere, as a great echo chamber, with too many posts chasing too little content. In large part, then, a high-level repetition code is in operation.

The fading channel where this repetition code thrives is, of course, our attention span. Messages ricochet from one reflective source to another, with content vying to sink into the collective consciousness of the web readership. While there is plagiarism of content, by an internet repetition code we mean those relatively minor adjustments that increase the visibility of the same base content while giving downstream authors some sense of originality or contribution.

The miniskirt theory of web readership states that each message has just over 60 words to pass the “So what?” criterion of short-term attention. A repetition code seems the best strategy here - increase retention in long-term memory by increasing the number of entries into short-term memory. And all this works well when bandwidth is plentiful and cheap, and participants are youthful, motivated and awash with time.

Twitter itself is a wonderful example of a repetition code - the whole premise of the service is to provide massive asynchronous mirroring of short content. Will Twitter cut through the Internet repetition code and boost the signal for Mr. Armano? The answer is yes of course, but only for a while. Twitter as a pure repetition or subscription service is already on the decline, and a microcosm of clients, tools, tagging and groups has been overlaid to make it look and feel less like raw Twitter. So we are now culling already culled content.

Welcome to the repetition of Web 2.0.

2 comments:

Michael Janke said...

I'd presume, then, that the quality of the signal processing, and hence the S/N ratio, would depend somewhat on whom you choose to follow on something like Twitter.

Worst case, my signal processors would lead me to dancing bears and mindless cats. Best case, they lead me to Feynman's lectures.

It's not hard to find signal processors on social media that amplify background noise, like a big feedback loop. It's also not hard to find signal processors that have poor echo cancellation.

Unknown said...

Michael, on Web 2.0 the S/N ratio is certainly personal, and perhaps Twitter can increase someone's signal strength. But I would think not for anyone in the long run - even if you found a perfect signal it would just become too much. In the short term we live with repetition.

cheers Luke
