Tuesday, September 14, 2010

Why Are Hard Disk Sizes Always Less Than Advertised?

If you read reviews of hard disks and other data storage products, or participate in a lot of online discussions about computers, such as in help forums, you are bound to run into this sooner or later. The wording may be different, the tone may be different, but the meaning is always the same: My hard disk is supposed to be xx GB in size, but my operating system says it is only yy GB in size. Is there something hiding on my hard disk? Did the hard disk maker cheat me?

Many of these people asking the question may be new to computers. Others may have noticed the discrepancies only now. The fact is that there is nothing hiding on hard disks (usually), and the hard disk makers are not really cheating. The problem is that the same unit names are being used to mean two different quantities.

Since the time of the first computers, computer scientists have measured things in powers of 2 because computers are binary machines. Each bit (or binary digit) has a value of 0 or 1 (2 values, hence binary). A combination of 4 bits is called a nibble and a combination of 8 bits is called a byte (clever, huh?). Notice that 4 and 8 are powers of 2 too. Most CPUs today are 64-bit CPUs, meaning that they work on data and memory addresses that are 64 bits wide. Before that there were 32-bit, 16-bit and 8-bit processors. 64, 32, 16 and 8 are also powers of 2.

When memory became abundant enough to have reasonably large amounts of it on computers, higher units of memory than bits and bytes became necessary in order to avoid confusing everyone with long, unreadable numbers. Imagine there were no units of distance other than inches. Can you imagine how difficult it would be to express the distance between Los Angeles and Sydney, or the distance from the earth to the nearest star? That is why higher units of distance like miles, kilometers and light years were devised.

Similarly, computer scientists came up with kilobytes, and after that megabytes, gigabytes, terabytes, petabytes, and so on to express higher and higher amounts of bytes. Sticking to their original scheme of making all multipliers powers of 2 when it comes to dealing with bits and bytes, computer scientists also agreed to a convention in which a kilobyte is 1024 bytes, a megabyte is 1024 kilobytes, a gigabyte is 1024 megabytes, and so on.

Unfortunately, this is not strictly in compliance with the SI unit system, which defines the prefixes kilo, mega, giga and tera in terms of powers of 10. Thus in the SI system of units, kilo means 1000 (thus a kilometer is 1000 meters), mega means 1,000,000 (thus a megaton is 1,000,000 tons), giga means 1,000,000,000 (1 billion), and tera means 1,000,000,000,000 (1 trillion). Note that 1000, 1,000,000, 1,000,000,000, etc. are not powers of 2.

This is the origin of the problem. At some point in the past, hard disk manufacturers decided to split with the computer scientists (either because of honest error or because it suited their purposes better). They started sticking to the SI system strictly, and made their definitions of kilobyte (KB), megabyte (MB), gigabyte (GB), etc. different from those of computer scientists. In the hard drive makers' world, a kilobyte is 1000 bytes, a megabyte is 1000 KB, a gigabyte is 1000 MB and so on.
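To make the two definitions concrete, here is a minimal Python sketch that prints how many bytes each unit holds under each convention. The function name `bytes_per_unit` and the `UNITS` list are made up for this illustration.

```python
# Unit names in ascending order, as used in this post.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte"]


def bytes_per_unit(step: int, base: int) -> int:
    """Bytes in the unit `step` rungs above the byte, given a multiplier `base`."""
    return base ** step


for step, name in enumerate(UNITS, start=1):
    decimal = bytes_per_unit(step, 1000)  # hard drive makers' definition
    binary = bytes_per_unit(step, 1024)   # computer scientists' definition
    print(f"{name:<9} {decimal:>17,} vs {binary:>17,} bytes")
```

Each rung multiplies the previous one by 1000 in one column and by 1024 in the other, so the two columns drift further apart as you go down the list.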

Operating systems are written by computer scientists, and they still measure memory using the computer scientists' units. Thus, when the operating system encounters a disk with 1,000,000 bytes (1 million bytes), it does not know that this was a disk designated by its manufacturer as having 1 MB in storage. Instead it sees it as having 1,000,000/(1024*1024) megabytes in storage. Obviously, this is lower than 1 MB, so the operating system reports the amount of space on the disk to be about 0.95 MB.
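The same arithmetic can be sketched in a few lines of Python. The function `reported_size` is a hypothetical helper, not something an operating system actually exposes, and the "500 GB" drive is just an example figure.

```python
def reported_size(advertised: float, step: int) -> float:
    """Convert a size advertised in decimal units to the same-named binary unit.

    step: 1 = kilobytes, 2 = megabytes, 3 = gigabytes, 4 = terabytes.
    """
    total_bytes = advertised * 1000 ** step  # what the manufacturer means
    return total_bytes / 1024 ** step        # what the operating system shows


print(round(reported_size(1, 2), 2))    # the 1 MB disk from the paragraph above
print(round(reported_size(500, 3), 1))  # a hypothetical "500 GB" drive
```

The first line reproduces the 0.95 MB figure, and the second shows that a drive sold as 500 GB would show up as roughly 465.7 GB.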

The problem is that as storage becomes more abundant, and disks become bigger and bigger, the discrepancy between advertised size and the size the operating system sees is going to get bigger and bigger. At every step of the way, hard drive manufacturers are multiplying the previous unit by 1000 to get to the next unit while computer scientists are multiplying the previous unit by 1024 to get to the next unit.

[Chart: Hard drive size discrepancy]

Thus a hard drive maker's kilobyte is 1000/1024 of a computer scientist's kilobyte, a shortfall of only about 2.34%. A hard drive maker's megabyte is 1000*1000/(1024*1024) of a computer scientist's megabyte, a shortfall of about 4.63%. Similarly, the discrepancy in gigabytes is about 6.87%, and in terabytes, it is about 9.05%. The accompanying chart illustrates the growing size of this discrepancy as the units become bigger.
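These percentages all come from one formula: after n steps up the unit ladder, the shortfall is 1 - (1000/1024)^n. A short Python sketch (the name `shortfall_percent` is invented here) reproduces the figures and extends them one step further to petabytes:

```python
def shortfall_percent(step: int) -> float:
    """Percentage by which a decimal unit falls short of the binary unit of the same name."""
    return (1 - (1000 / 1024) ** step) * 100


names = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]
for step, name in enumerate(names, start=1):
    print(f"{name:<9} {shortfall_percent(step):5.2f}%")
```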

So, that is where the extra space is hiding. It is not holding some secret piece of software! And it is not sitting at the hard drive manufacturer's factory!! It is hidden in the obfuscation caused by the use of two different definitions for the same unit.

Some solutions have been proposed to get rid of this dual definition of units, but none has caught on so far. One of the simplest is to rename all binary prefixes with a "bi" at the end of the prefix to denote clearly that it is not an SI prefix, but a binary prefix. This scheme would make a computer scientist's kilobyte a kibibyte. Similarly, the other computer scientists' units would become mebibyte, gibibyte, tebibyte and so on.

That still leaves open the question of what the short forms of those prefixes should be, and how they should be distinguished from the standard SI short forms for the prefixes, which are k for kilo, M for mega, G for giga, etc. One suggestion, the one adopted in the IEC standard, is to make the short forms for the binary prefixes two letters each, with the letter "i" appended to the capitalized SI letter. Thus hard drive manufacturers would have MB, while computer scientists would have MiB. Hard drive makers would have GB and computer scientists would have GiB, and so on.
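As an illustration of how unambiguous labels would look in practice, here is a small Python sketch that formats the same byte count both ways. The symbols KiB, MiB, GiB and TiB are the IEC short forms for the binary units. The function `format_size` is a hypothetical helper written for this post, and the decimal side uses "KB" as written earlier in the post.

```python
def format_size(num_bytes: int, binary: bool = False) -> str:
    """Format a byte count using decimal (1000) or binary (1024) multipliers."""
    base = 1024 if binary else 1000
    symbols = ["KiB", "MiB", "GiB", "TiB"] if binary else ["KB", "MB", "GB", "TB"]
    value = float(num_bytes)
    unit = "B"
    for symbol in symbols:
        if value < base:
            break  # the current unit already fits; stop climbing
        value /= base
        unit = symbol
    return f"{value:.2f} {unit}"


one_terabyte_drive = 10**12  # bytes on a drive sold as "1 TB"
print(format_size(one_terabyte_drive))               # decimal view
print(format_size(one_terabyte_drive, binary=True))  # binary view
```

With distinct symbols, the two outputs no longer appear to contradict each other: the drive is 1.00 TB and also 931.32 GiB.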

I don't think these ideas are going to catch on quickly, if ever. In the meantime, we have to live with the discrepancy whenever we look too closely at hard disk sizes. Just remember that hard disk sizes as reported by the operating system will always be a few percent lower than the hard disk sizes claimed by the hard drive manufacturers. The discrepancy will be larger as the unit in which the hard disk size is expressed gets bigger. There is nothing nefarious about this discrepancy. There is nothing hidden on the hard drives. The discrepancy simply occurs because hard drive makers use a different definition of data storage units than software makers. Now you know!


cbdatabases said...

Thanks this is a very useful and well explained post.

Carlos said...

I concur!

Amar pawar said...

was very confused about got answer thanks for sharing.....
