Understanding Cell Phone Technology: Lossy Data Compression

Last week we began looking at data compression codes. We talked about three lossless compression schemes: Morse, Huffman, and Lempel-Ziv (LZ). This week we're going to turn our attention to methods for lossy compression.

Lossy compression? Why would anyone want lossy compression, that is, a compression code that actually throws away some of the original information? Who in the world would use it?

Consider the state of the world today. We all wander around with our cell phones, expecting to have high data rate audio-video communications with our friends using apps like Zoom. We expect to be able to stream content so we can watch movies, TV shows, and YouTube videos in near real time (NRT), that is, with minimal transmission and presentation delay. We also shoot and share photos and videos with our friends.

This is all a lot of data (images, video, etc.) streaming to my phone, and my cellular channel has limited capacity. What’s a typical cellular data rate? When I checked my network today, my phone was downloading at 70.2 megabits per second (Mbps) and uploading at 7.5 Mbps. Clearly, the bottleneck is on the upload side.

Let’s consider cell phone photos. Cell phones today have 12 megapixel (MP) cameras. A 12 MP image has a resolution of about 4000 x 3000 pixels. The phone’s camera uses a 24 bit Red Green Blue (RGB) system to capture the brightness and color of each pixel in a scene. That’s nearly 16.8 million (2^24) different color values for each of the 4000 x 3000 = 12 million pixels! Therefore, a photo stored in an uncompressed or losslessly compressed format (like RAW) takes 12 million pixels x 24 bits = 288 megabits (million bits), or about 36 megabytes (MB, where 1 byte = 8 bits). However, if I use a lossy compression protocol like JPEG, the same photo could take up only 3 MB.
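As a sanity check, here’s that arithmetic in a few lines of Python (the 4000 x 3000 resolution and 24 bits per pixel come straight from the numbers above):

```python
# Raw (uncompressed) size of a 12 MP, 24-bit RGB photo.
pixels = 4000 * 3000               # ~12 megapixels
bits_per_pixel = 24                # 8 bits each for R, G, B
raw_bits = pixels * bits_per_pixel
raw_mb = raw_bits / 8 / 1_000_000  # 1 byte = 8 bits, 1 MB = 10^6 bytes

print(raw_bits)    # 288000000 bits before compression
print(raw_mb)      # 36.0 MB before compression
print(raw_mb / 3)  # 12.0 -- a 3 MB JPEG is a 12:1 reduction
```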

If you’re sharing video, the stream of image frames needs to travel at around 30 frames per second if you want to view it in NRT. So even at the compressed image rate, the data is going to have to cross the channel at 3 MB/frame * 30 frames/sec = 90 MB/sec. That’s 720 megabits per second, roughly ten times my measured download rate and nearly a hundred times my upload rate!
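A quick Python check of that channel requirement, using the 3 MB/frame and 30 frames/sec figures from above and the 7.5 Mbps upload rate I measured earlier:

```python
mb_per_frame = 3          # compressed JPEG frame size, from above
frames_per_sec = 30       # near-real-time video rate
mb_per_sec = mb_per_frame * frames_per_sec
mbps_needed = mb_per_sec * 8   # convert MB/sec to megabits/sec

print(mb_per_sec)             # 90 MB/sec
print(mbps_needed)            # 720 Mbps of channel capacity needed
print(mbps_needed / 7.5)      # 96.0 -- vs. my measured 7.5 Mbps uplink
```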

It’s all really a game of tradeoffs. You can stream data faster if you’re willing to accept a poorer quality signal. Alternatively, if you want a higher quality signal, you might have to buffer it first. Buffering means waiting for the data to transfer and become locally available before you try to access it on your phone.

Let’s keep life simple and talk about lossy data compression for still images. JPEG (its counterpart for lossy video compression is MPEG) accomplishes its compression magic in three stages. First it performs a simple conversion of the RGB values into numbers for luminance (Y), blue chrominance (Cb), and red chrominance (Cr). Why do we do this? It turns out that our eyes are much more sensitive to luminance (brightness) than to chrominance (color). JPEG preserves all the luminance information but downsamples the chrominance channels, typically keeping one Cb and one Cr sample for every 2x2 block of pixels. This first stage accomplishes a little compression by eliminating chrominance details we can’t notice.
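Here’s a minimal sketch of that first stage in Python, using the BT.601 weights that baseline JPEG files (JFIF) use; `rgb_to_ycbcr` is just my illustrative helper name:

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr (JFIF / BT.601 weights)."""
    y  =  0.299    * r + 0.587    * g + 0.114    * b          # brightness
    cb = -0.168736 * r - 0.331264 * g + 0.5      * b + 128    # blue vs. Y
    cr =  0.5      * r - 0.418688 * g - 0.081312 * b + 128    # red vs. Y
    return y, cb, cr

# Pure white carries no color information: Cb and Cr sit at their
# neutral value of 128, and all the signal is in the luminance Y.
print(tuple(round(v) for v in rgb_to_ycbcr(255, 255, 255)))  # (255, 128, 128)
```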

We now split this image data, with its downsampled color components, into 8×8 blocks of pixels. We’re going to perform a Discrete Cosine Transform (DCT) on each block of pixels. The DCT is a specific flavor of the Fourier transform. Several months ago, we talked about how the Fourier transform converts data from the time domain into the frequency domain; for an image, the analogous conversion is from the spatial domain (pixel positions) into the spatial frequency domain. Since images are two dimensional, we are going to apply the DCT in both the horizontal and vertical directions. When we are all done, each block will be an 8 x 8 table (matrix) of coefficients, each one the amplitude of a cosine pattern at a particular horizontal and vertical spatial frequency.
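To make this concrete, here’s a small NumPy sketch of the 8×8 two-dimensional DCT (the DCT-II variant JPEG uses). The helper name `dct2` is mine, and note that real JPEG also level-shifts pixel values by 128 before the transform, which I skip here:

```python
import numpy as np

def dct2(block):
    """2-D DCT-II of an 8x8 block: 1-D DCT applied to rows, then columns."""
    n = 8
    k = np.arange(n).reshape(-1, 1)   # frequency index
    x = np.arange(n).reshape(1, -1)   # spatial index
    c = np.sqrt(2 / n) * np.cos((2 * x + 1) * k * np.pi / (2 * n))
    c[0, :] = np.sqrt(1 / n)          # the DC row gets its own scale factor
    return c @ block @ c.T

# A perfectly flat block has no detail: all its energy lands in the
# top-left "DC" coefficient and every other (AC) coefficient is zero.
flat = np.full((8, 8), 100.0)
coeffs = dct2(flat)
print(round(coeffs[0, 0]))                       # 800
print(np.abs(coeffs).sum() - abs(coeffs[0, 0]))  # ~0: no AC energy at all
```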

The next step is where most of the compression magic takes place. We are going to quantize the values of the coefficients. Quantize is a fancy word meaning we represent each coefficient with less precision: we divide it by an entry from a quantization table and round to the nearest integer. The table entries are largest for the high frequency components, so most of those small high frequency coefficients round to zero and are effectively thrown away. This is actually logical, given that the human eye is far less sensitive to high frequency components than low ones.
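Here’s a sketch of the quantization step, using the example luminance table from Annex K of the JPEG standard. The input block is made up for illustration: a large DC value, one modest low-frequency coefficient, and one small high-frequency coefficient:

```python
import numpy as np

# Example luminance quantization table from Annex K of the JPEG standard.
# Entries grow toward the bottom right, i.e. toward higher frequencies.
Q = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def quantize(coeffs):
    """Divide each DCT coefficient by its table entry and round."""
    return np.rint(coeffs / Q).astype(int)

coeffs = np.zeros((8, 8))
coeffs[0, 0] = 800.0    # DC term (overall block brightness)
coeffs[0, 1] = -30.0    # modest low-frequency detail: survives
coeffs[7, 7] = 20.0     # small high-frequency detail: rounds away to 0

q = quantize(coeffs)
print(q[0, 0], q[0, 1], q[7, 7])   # 50 -3 0
```

Dividing by a larger table entry at high frequencies is exactly how the scheme biases its losses toward detail the eye won’t miss.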

In the final stage, an entropy coding scheme is applied to perform one last stage of (lossless) data compression. The quantized coefficients are read out of each block in a zigzag order, so that the long runs of zeros cluster together, and are run-length encoded; the resulting symbols are then compressed with Huffman coding, which we talked about last week.
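As a sketch of why this final stage pays off, here are the zigzag readout and run-length steps in Python (the helper names `zigzag` and `run_lengths` are mine; real JPEG then Huffman-codes the (run, value) pairs):

```python
def zigzag(block):
    """Read an 8x8 block in JPEG's zigzag order (low frequencies first)."""
    idx = sorted(((r, c) for r in range(8) for c in range(8)),
                 key=lambda rc: (rc[0] + rc[1],               # which diagonal
                                 rc[0] if (rc[0] + rc[1]) % 2 # alternate the
                                 else rc[1]))                 # scan direction
    return [block[r][c] for r, c in idx]

def run_lengths(seq):
    """Collapse the sequence into (zeros-skipped, value) pairs."""
    pairs, zeros = [], 0
    for v in seq:
        if v == 0:
            zeros += 1
        else:
            pairs.append((zeros, v))
            zeros = 0
    if zeros:
        pairs.append((0, 0))   # stands in for JPEG's end-of-block code
    return pairs

# A quantized block with only two nonzero (low-frequency) coefficients:
# 64 values collapse to just three symbols for the Huffman coder.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1] = 50, -3
print(run_lengths(zigzag(block)))   # [(0, 50), (0, -3), (0, 0)]
```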

Our modern connected world allows people to send megabytes of information to others using their cell phones. We see the logical need for compression schemes that squeeze out not only the redundant information but sometimes the less significant information as well, in order to transmit efficiently across the available channel capacity. By efficiently we may mean “by using the available bandwidth” and/or “in NRT or with an acceptable level of delay”.

Next week, we’ll start talking about how error correcting codes add some desirable redundancy back into the information in order to allow the receiver to detect and correct noise-induced errors which occur during transmission.