Compressed files

If you have ever downloaded anything from the Internet, you might have seen a file extension called "zip". You may already know that downloading a zip file is like downloading a box containing all the files you asked for.

But how does a zip file become one single file with many files inside it? Have you ever wondered? Well, the idea is easy to understand. A compressed file is usually made and stored in the following way:

+ First, the program will make a list of the files you want to compress.
+ Then, it will start reading them, looking for common patterns.
+ Next, it will replace the common patterns with symbols, and create a sort of directory recording which symbol replaces each pattern.
+ Lastly, it will write everything into a single file, adding the list of files and the directory.

Would you like an example? Sure.

Let's imagine we have two small files, each with a short text. To keep things short and simple, I'll use tongue twisters:

File #1 is dummyfile1.txt, which reads:

"I wish to wish the wish you wish to wish, but if you wish the wish the witch wishes, I won't wish the wish you wish to wish."

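...and file #2 is dummyfile2.txt, which reads (in Spanish):

"Del pelo al codo y del codo al pelo, del codo al pelo y del pelo al codo."

In English, that is roughly "From hair to elbow and from elbow to hair, from elbow to hair and from hair to elbow."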

Are you ready to compress? Here we go.

Let's start by making our list of files:

"dummycompressedfile" will include:

dummyfile1.txt
dummyfile2.txt

Now let's read the files. The program will look for common patterns. I'll do a quick check:

"wish ", "the ", "you ", "to ", "l pelo", "l codo", "de" and "wi"
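How might a program do this check automatically? One simple way, sketched here in Python, is to count every short piece of text and keep the pieces that repeat. The function name `common_patterns`, the 3-to-6-character window and the scoring rule are assumptions made up for this illustration; real formats such as zip use their own, much more refined methods.

```python
from collections import Counter


def common_patterns(text: str, min_len: int = 3, max_len: int = 6,
                    top: int = 5) -> list[tuple[str, int]]:
    """Return the repeated substrings that would save the most characters.

    Toy scoring rule (an assumption, not any real format's rule): replacing a
    pattern of length n that occurs c times with a one-character symbol saves
    about (n - 1) * c characters, ignoring the size of the directory entry.
    """
    # Count every substring whose length is between min_len and max_len,
    # including overlapping ones.
    counts = Counter(
        text[start:start + size]
        for size in range(min_len, max_len + 1)
        for start in range(len(text) - size + 1)
    )
    # Only substrings that appear at least twice are worth replacing.
    repeated = [(pattern, count) for pattern, count in counts.items() if count >= 2]
    # Rank by the estimated number of characters saved.
    repeated.sort(key=lambda item: (len(item[0]) - 1) * item[1], reverse=True)
    return repeated[:top]


FILE1 = ("I wish to wish the wish you wish to wish, but if you wish the wish "
         "the witch wishes, I won't wish the wish you wish to wish.")

patterns = common_patterns(FILE1)
for pattern, count in patterns:
    print(repr(pattern), count)
```

On the first tongue twister this ranks " wish" (with the space in front, 12 times) at the top, a slightly different choice from "wish " in the list above; either works, as long as the directory records it. Overlapping candidates such as " wish" and " wish " both show up, so a real program would still have to choose between them.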

OK, next I will take these common patterns and replace them with a few symbols:

"wish " will turn into [
"the " will turn into ]
"you " will turn into {
"to " will turn into }
"l pelo" will turn into +
"l codo" will turn into *
"de" will turn into _
"wi" will turn into -
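The order of this table matters. The sketch below (Python, with the hypothetical names `DIRECTORY` and `replace_patterns`) applies the replacements one at a time, from top to bottom, and then tries the same table again with "wi" moved to the front.

```python
# The replacement directory from the example, in the order it is applied.
DIRECTORY = [
    ("wish ", "["),
    ("the ", "]"),
    ("you ", "{"),
    ("to ", "}"),
    ("l pelo", "+"),
    ("l codo", "*"),
    ("de", "_"),
    ("wi", "-"),
]


def replace_patterns(text: str, directory: list[tuple[str, str]]) -> str:
    """Swap each pattern for its symbol, one table row at a time, top to bottom."""
    for pattern, symbol in directory:
        text = text.replace(pattern, symbol)  # case-sensitive, replaces every match
    return text


FILE1 = ("I wish to wish the wish you wish to wish, but if you wish the wish "
         "the witch wishes, I won't wish the wish you wish to wish.")

in_order = replace_patterns(FILE1, DIRECTORY)
# Same table with "wi" moved to the front: it eats the start of every "wish ",
# so the longer, more valuable pattern never gets a chance to match.
wi_first = replace_patterns(FILE1, DIRECTORY[-1:] + DIRECTORY[:-1])

print(in_order)   # I [}[][{[}-sh, but if {[][]-tch -shes, I won't [][{[}-sh.
print(wi_first)
```

In the listed order, the first tongue twister comes out exactly as in the compressed line below. With "wi" first, every "wish " loses its "wi" before the longer pattern can match, and the result is noticeably longer.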

Great, now we are ready to compress the information above. Let's replace:

I [}[][{[}-sh, but if {[][]-tch -shes, I won't [][{[}-sh.
/next file/
De+ a* y _* a+, _* a+ y _+ a*.

You see? Those two tongue twisters turned into a shorter chain of characters. (Notice that "Del" at the start of the second file keeps its capital "D": the directory only lists the lowercase "de", and to a computer "D" and "d" are different characters.) Now let's write our file:

dummycompressedfile includes:


Start of file

list of files:
dummyfile1.txt
dummyfile2.txt
directory of replacements:
"wish " = [
"the " = ]
"you " = {
"to " = }
"l pelo" = +
"l codo" = *
"de" = _
"wi" = -
I [}[][{[}-sh, but if {[][]-tch -shes, I won't [][{[}-sh.
/next file/
De+ a* y _* a+, _* a+ y _+ a*.

End of file.
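Here is one way the whole round trip could look in Python. It is a sketch, not a real archive format: the function names `pack` and `unpack` are hypothetical, and JSON stands in for the "Start of file ... End of file" layout above only because it makes the three parts (list of files, directory, data) easy to read back. Decompressing simply undoes the replacements in reverse order, using the directory stored inside the file itself.

```python
import json

# The directory from the example: (pattern, symbol), applied top to bottom.
DIRECTORY = [
    ("wish ", "["), ("the ", "]"), ("you ", "{"), ("to ", "}"),
    ("l pelo", "+"), ("l codo", "*"), ("de", "_"), ("wi", "-"),
]


def pack(files: dict[str, str], directory: list[tuple[str, str]]) -> str:
    """Build one text 'archive': list of files, directory, compressed data."""
    data = []
    for text in files.values():
        for pattern, symbol in directory:
            text = text.replace(pattern, symbol)
        data.append(text)
    return json.dumps(
        {"files": list(files), "directory": directory, "data": data},
        ensure_ascii=False,
    )


def unpack(archive_text: str) -> dict[str, str]:
    """Read the archive back, using the directory stored inside it."""
    archive = json.loads(archive_text)
    restored = {}
    for name, text in zip(archive["files"], archive["data"]):
        # Undo the replacements in reverse order of how they were applied.
        for pattern, symbol in reversed(archive["directory"]):
            text = text.replace(symbol, pattern)
        restored[name] = text
    return restored


FILES = {
    "dummyfile1.txt": ("I wish to wish the wish you wish to wish, but if you wish "
                       "the wish the witch wishes, I won't wish the wish you wish "
                       "to wish."),
    "dummyfile2.txt": ("Del pelo al codo y del codo al pelo, "
                       "del codo al pelo y del pelo al codo."),
}

archive_text = pack(FILES, DIRECTORY)
restored = unpack(archive_text)
print(archive_text)
```

This trick only works because none of the symbols ([, ], {, }, +, *, _ and -) appear in the original texts; a real format has to handle that case, too.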


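That was easy, huh? Once you count the list of files and the directory, this example does not actually end up shorter than the original, but I am sure you get the point. You will usually get much better results with larger files, because they contain more repeated patterns and the directory takes up a smaller share of the result.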
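You can see this size effect with Python's built-in `zlib` module, which implements DEFLATE, the method most zip and gzip files use. This is a real compressor rather than our symbol table, and exact byte counts can vary between zlib versions, so the sketch below (with a hypothetical helper called `ratio`) only compares ratios.

```python
import zlib

TWISTER = ("I wish to wish the wish you wish to wish, but if you wish the wish "
           "the witch wishes, I won't wish the wish you wish to wish.").encode("utf-8")


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (below 1.0 means it shrank)."""
    return len(zlib.compress(data)) / len(data)


tiny = b"hi"
# For a 2-byte input, the format's own header and checksum outweigh any savings.
tiny_grows = len(zlib.compress(tiny)) > len(tiny)
small_ratio = ratio(TWISTER)          # one tongue twister
large_ratio = ratio(TWISTER * 100)    # the same text repeated 100 times

print(f"tiny input grows: {tiny_grows}")
print(f"one copy:   {small_ratio:.2f}")
print(f"100 copies: {large_ratio:.2f}")
```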

Questions and answers about compressed files

What criteria does the computer use to compress the information?

It depends on the compression format used by the software. Different programmers have created different ways to choose patterns and to store them in a compressed file. That is one reason why there are so many compressed file formats these days.
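Python happens to ship with three different compressors, which makes this easy to try: each one finds and stores patterns its own way, so the same input comes out at different sizes. The `CODECS` table and the `compare` function below are only an illustration, not a benchmark.

```python
import bz2
import lzma
import zlib

# Three compressors from Python's standard library, each with its own method.
CODECS = {
    "zlib (DEFLATE)": (zlib.compress, zlib.decompress),
    "bz2 (Burrows-Wheeler)": (bz2.compress, bz2.decompress),
    "lzma (.xz container)": (lzma.compress, lzma.decompress),
}


def compare(data: bytes) -> dict[str, int]:
    """Compress `data` with every codec and return the compressed sizes."""
    sizes = {}
    for name, (compress, decompress) in CODECS.items():
        packed = compress(data)
        if decompress(packed) != data:  # all three are lossless
            raise ValueError(f"{name} did not round-trip")
        sizes[name] = len(packed)
    return sizes


SAMPLE = b"Del pelo al codo y del codo al pelo, del codo al pelo y del pelo al codo. " * 50
sizes = compare(SAMPLE)
for name, size in sizes.items():
    print(f"{name}: {len(SAMPLE)} -> {size} bytes")
```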

What are compressed files used for?

Compressed files have a variety of uses these days, but I will mention four examples. For a start, they allow faster transmission of information, especially over the Internet. Every web page you see on Heptagrama, for example, arrives at your computer compressed, saving you download time. Your browser decompresses it and shows it to you, and it does this without making mistakes because it knows (my web server tells it so, in a "Content-Encoding: gzip" header) that all the information being sent to your computer is compressed using the gzip format. This, among other things, allows this site to open fast on your computer.
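That exchange between server and browser can be sketched with Python's `gzip` module. The two functions are hypothetical stand-ins for a web server and a browser; the part that matches real HTTP is the `Content-Encoding: gzip` header, which is how the server tells the browser what it has to undo.

```python
import gzip


def serve_page(html: str) -> tuple[dict[str, str], bytes]:
    """Hypothetical server: gzip the page and announce it in the headers."""
    body = gzip.compress(html.encode("utf-8"))
    headers = {
        "Content-Type": "text/html; charset=utf-8",
        "Content-Encoding": "gzip",  # tells the browser what to undo
    }
    return headers, body


def read_page(headers: dict[str, str], body: bytes) -> str:
    """Hypothetical browser: decompress only if the server said it is gzip."""
    if headers.get("Content-Encoding") == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")


page = "<html><body>" + "<p>Compressed files</p>" * 200 + "</body></html>"
headers, body = serve_page(page)
print(len(page.encode("utf-8")), "bytes of HTML sent as", len(body), "bytes")
```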

In addition, you may have noticed that many files you download from the Internet come in a compressed format, usually Zip. Without the Zip format, we might need to download not one, but dozens or even hundreds of small files one by one before being able to use what we are downloading! Compressed files really save time.
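Bundling several files into one download can be sketched with Python's standard `zipfile` module. This version builds the archive in memory; `make_zip` and `list_zip` are hypothetical helper names, and the two dummy files come from the example above.

```python
import io
import zipfile


def make_zip(files: dict[str, str]) -> bytes:
    """Pack several text files into one zip archive held in memory."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, text in files.items():
            archive.writestr(name, text)  # str content is stored as UTF-8
    return buffer.getvalue()


def list_zip(blob: bytes) -> list[str]:
    """Return the names of the files stored inside a zip archive."""
    with zipfile.ZipFile(io.BytesIO(blob)) as archive:
        return archive.namelist()


FILES = {
    "dummyfile1.txt": "I wish to wish the wish you wish to wish, ...",
    "dummyfile2.txt": "Del pelo al codo y del codo al pelo, del codo al pelo y del pelo al codo.",
}

blob = make_zip(FILES)
print(list_zip(blob))  # ['dummyfile1.txt', 'dummyfile2.txt']

# One download, and every file inside it can be read back individually.
with zipfile.ZipFile(io.BytesIO(blob)) as archive:
    second = archive.read("dummyfile2.txt").decode("utf-8")
```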

Third, compressed files can give sensitive or important files some extra protection, because they can be password-protected. Put a password-protected file inside a password-protected zip file, and you will get double protection!

Lastly, when making backups, it is much easier to organize your personal information into ten or twelve large categories than to copy every single file one by one. Just create ten or twelve large compressed files with your information, and burn them onto a CD or DVD. That's it.
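If you would rather script those ten or twelve archives than make them by hand, Python's `shutil.make_archive` can create one zip file per folder. This sketch assumes each category is a top-level folder; the function name `backup_by_category` and the sample folders and files are made up for the demonstration.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def backup_by_category(source: Path, destination: Path) -> list[Path]:
    """Create one zip archive per top-level folder ("category") in `source`."""
    destination.mkdir(parents=True, exist_ok=True)
    archives = []
    for category in sorted(p for p in source.iterdir() if p.is_dir()):
        # make_archive adds ".zip" to the base name and returns the archive path.
        archive = shutil.make_archive(str(destination / category.name), "zip",
                                      root_dir=category)
        archives.append(Path(archive))
    return archives


# Demonstration with throwaway folders; the category and file names are made up.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "personal"
    for category, filename in [("photos", "beach.txt"), ("letters", "grandma.txt")]:
        (source / category).mkdir(parents=True)
        (source / category / filename).write_text(f"contents of {filename}")
    archives = backup_by_category(source, Path(tmp) / "backup")
    archive_names = [path.name for path in archives]
    with zipfile.ZipFile(archives[0]) as zf:
        first_contents = zf.namelist()

print(archive_names)  # ['letters.zip', 'photos.zip']
print(first_contents)
```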

Doesn't the computer lose information when compressing and decompressing files?

Yes and no. It actually depends on the compression format. Some compression formats never lose any information; experts call them "lossless" (e.g. zip, 7z, and flac file formats). Others discard some information considered "unnecessary" to reduce the final file size; these are called "lossy" compression formats (e.g. jpg and mp3 file formats).
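The difference is easy to show in a few lines of Python. The lossless half uses the real `zlib` module. The lossy half is only a toy, assuming a list of made-up sound samples: it keeps each value rounded to a coarse step, which is the basic idea of throwing away detail, although real formats such as jpg and mp3 are far more elaborate. The function names `lossy_compress` and `lossy_decompress` are hypothetical.

```python
import zlib

# Lossless: decompressing gives back exactly the same bytes.
original = "Del pelo al codo y del codo al pelo, del codo al pelo y del pelo al codo.".encode("utf-8")
restored = zlib.decompress(zlib.compress(original))

# Lossy (toy): keep each made-up sound sample only to the nearest STEP.
STEP = 0.25
samples = [0.12, 0.47, 0.51, 0.98, 0.33]


def lossy_compress(values: list[float], step: float) -> list[int]:
    """Replace each value with a small whole number: how many steps it is."""
    return [round(value / step) for value in values]


def lossy_decompress(codes: list[int], step: float) -> list[float]:
    """Turn the whole numbers back into values; the lost detail cannot return."""
    return [code * step for code in codes]


codes = lossy_compress(samples, STEP)
approx = lossy_decompress(codes, STEP)
print(restored == original)  # True
print(codes)                 # [0, 2, 2, 4, 1]
print(approx)                # [0.0, 0.5, 0.5, 1.0, 0.25]
```

The restored samples are close to the originals (never off by more than half a step), but not identical, and no amount of clever decompressing can bring the exact values back.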

Experiment with compressed files and see what you can learn from them.

