Many of the people asking this question may be new to computers. Others may have noticed the discrepancy only now. The fact is that there is nothing hiding on hard disks (usually), and the hard disk makers are not really cheating. The problem is that the same unit names are being used to mean two different quantities.
Since the time of the first computers, computer scientists have measured things in powers of 2 because computers are binary machines. Each bit (or binary digit) has a value of 0 or 1 (2 values, hence binary). A group of 4 bits is called a nibble and a group of 8 bits is called a byte (clever, huh?). Notice that 4 and 8 are powers of 2 too. Most CPUs today are 64-bit CPUs, meaning that their registers, and therefore the numbers and memory addresses they can handle in a single operation, are 64 bits wide. Before that there were 32-bit, 16-bit and 8-bit processors. 64, 32, 16 and 8 are also powers of 2.
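To make the powers-of-2 pattern concrete: n bits can hold 2 to the power n different values. Here is a tiny Python sketch (the function name distinct_values is just something made up for illustration) that prints the count for each of the sizes mentioned above.

```python
def distinct_values(bits):
    """Number of different values that `bits` binary digits can hold: 2 ** bits."""
    return 2 ** bits

# Sizes mentioned in the text: a bit, a nibble, a byte, and common CPU register widths.
for name, bits in [("bit", 1), ("nibble", 4), ("byte", 8),
                   ("16-bit", 16), ("32-bit", 32), ("64-bit", 64)]:
    print(f"{name:>6}: {bits:2d} bits -> {distinct_values(bits):,} values")
```

A byte, for example, can hold 256 different values (0 through 255).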
When memory became abundant enough to have reasonably large amounts of it on computers, units larger than bits and bytes became necessary to avoid confusing everyone with long, unreadable numbers. Imagine there were no units of distance other than inches. How difficult would it be to express the distance between Los Angeles and Sydney, or the distance from the earth to the nearest star beyond the Sun? That is why larger units of distance like miles, kilometers and light years were devised.
Similarly, computer scientists came up with kilobytes, and after that megabytes, gigabytes, terabytes, petabytes and so on, to express larger and larger numbers of bytes. Sticking to their original scheme of making all multipliers powers of 2 when dealing with bits and bytes, computer scientists also agreed on a convention in which a kilobyte is 1024 bytes (2 to the 10th power), a megabyte is 1024 kilobytes, a gigabyte is 1024 megabytes, and so on.
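The binary convention fits in a couple of lines: each unit is 1024 (that is, 2 to the 10th) times the one before it. A minimal Python sketch, where the names UNITS and BINARY_BYTES are only illustrative:

```python
# Binary ("computer scientists'") units: each step up multiplies by 1024 = 2**10.
UNITS = ["KB", "MB", "GB", "TB", "PB"]
BINARY_BYTES = {unit: 1024 ** step for step, unit in enumerate(UNITS, start=1)}

for step, unit in enumerate(UNITS, start=1):
    # 1024 ** step is the same number as 2 ** (10 * step)
    print(f"1 {unit} = {BINARY_BYTES[unit]:,} bytes = 2**{10 * step} bytes")
```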
Unfortunately, this does not strictly comply with the SI system of units, which defines the prefixes kilo, mega, giga and tera in terms of powers of 10. In the SI system, kilo means 1000 (thus a kilometer is 1000 meters), mega means 1,000,000 (thus a megaton is 1,000,000 tons), giga means 1,000,000,000 (1 billion), and tera means 1,000,000,000,000 (1 trillion). Note that 1000, 1,000,000, 1,000,000,000, etc. are not powers of 2.
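You can check the "not powers of 2" claim mechanically. A positive integer is a power of 2 exactly when its binary form has a single 1 bit, and the expression n & (n - 1) == 0 tests for that. A short Python sketch (is_power_of_two is a hypothetical helper name, not a built-in):

```python
def is_power_of_two(n):
    """True if n is a positive power of 2, i.e. exactly one bit is set."""
    return n > 0 and n & (n - 1) == 0

for step, prefix in enumerate(["kilo", "mega", "giga", "tera"], start=1):
    si = 1000 ** step       # SI meaning of the prefix
    binary = 1024 ** step   # computer scientists' meaning of the prefix
    print(f"{prefix}: SI {si:,} (power of 2? {is_power_of_two(si)}), "
          f"binary {binary:,} (power of 2? {is_power_of_two(binary)})")
```

Every SI multiplier comes out False and every binary one True, because 1000 = 8 x 125 always carries a factor of 5.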
This is the origin of the problem. At some point in the past, hard disk manufacturers parted ways with the computer scientists (either out of honest error or because it suited their purposes better). They stuck strictly to the SI system and defined the kilobyte (KB), megabyte (MB), gigabyte (GB), etc. differently from the computer scientists. In the hard drive makers' world, a kilobyte is 1000 bytes, a megabyte is 1000 KB, a gigabyte is 1000 MB and so on.
Operating systems are written by computer scientists, and most of them still measure storage using the computer scientists' units. Thus, when the operating system encounters a disk with 1,000,000 bytes (1 million bytes), it does not know that the manufacturer labeled this disk as having 1 MB of storage. Instead it sees it as having 1,000,000/(1024*1024) megabytes, or about 0.954 MB. Obviously, this is less than 1 MB, so the operating system reports the space on the disk as about 0.95 MB.
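Here is that same arithmetic as a minimal Python sketch, assuming the operating system divides by 1024*1024 as described (to_binary_megabytes is a made-up name for illustration):

```python
def to_binary_megabytes(num_bytes):
    """Convert a raw byte count to computer scientists' megabytes (1024 * 1024 bytes each)."""
    return num_bytes / (1024 * 1024)

advertised_bytes = 1_000_000  # what the manufacturer calls "1 MB"
print(f"{to_binary_megabytes(advertised_bytes):.2f} MB")  # prints 0.95 MB
```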
The problem is that as storage becomes more abundant and disks get bigger and bigger, the discrepancy between the advertised size and the size the operating system sees keeps growing. At every step of the way, hard drive manufacturers multiply the previous unit by 1000 to get the next unit, while computer scientists multiply it by 1024, so the gap compounds: the operating system's number is about 2.3% lower than the advertised one for kilobytes, 4.6% for megabytes, 6.9% for gigabytes and 9.1% for terabytes.
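A small Python sketch of how the gap grows with each unit, assuming the manufacturer uses 1000 per step and the operating system 1024 per step (shortfall_percent is a hypothetical name):

```python
def shortfall_percent(step):
    """How much lower (in %) the reported number is when a unit `step` places up the
    ladder (1 = KB, 2 = MB, ...) is read as 1024**step bytes instead of 1000**step bytes."""
    return (1 - 1000 ** step / 1024 ** step) * 100

for step, unit in enumerate(["KB", "MB", "GB", "TB", "PB"], start=1):
    print(f"{unit}: reported size is {shortfall_percent(step):.1f}% below advertised")
```

Each step multiplies the ratio by another 1000/1024, so the shortfall never stops growing as the units get bigger.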
So, that is where the extra space is hiding. It is not holding some secret piece of software! And it is not sitting at the hard drive manufacturer's factory!! It is hidden in the obfuscation caused by the use of two different definitions for the same unit.
Some solutions have been proposed to get rid of this dual definition of units, but none has caught on widely so far. The best known, standardized by the International Electrotechnical Commission (IEC), renames the binary prefixes by keeping the first two letters of each SI prefix and adding "bi" (for "binary"), to make clear that it is a binary prefix and not an SI prefix. Under this scheme a computer scientist's kilobyte becomes a kibibyte. Similarly, the other computer scientists' units become the mebibyte, gibibyte, tebibyte and so on.
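The IEC names follow a simple mechanical rule: keep the first two letters of the SI prefix and add "bi". A small Python sketch of that rule (binary_prefix is an illustrative name, not part of any library):

```python
SI_PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa"]

def binary_prefix(si_prefix):
    """IEC-style binary prefix: first two letters of the SI prefix plus 'bi' (for 'binary')."""
    return si_prefix[:2] + "bi"

print([binary_prefix(p) + "byte" for p in SI_PREFIXES])
# ['kibibyte', 'mebibyte', 'gibibyte', 'tebibyte', 'pebibyte', 'exbibyte']
```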
That still leaves open the question of what the abbreviations for those prefixes should be, and how to distinguish them from the standard SI symbols, which are k for kilo, M for mega, G for giga, etc. The IEC standard makes each binary symbol two letters long by appending an "i": Ki (with a capital K), Mi, Gi, Ti and so on. Thus hard drive manufacturers would have an MB, while computer scientists would have an MiB. Hard drive makers would have a GB and computer scientists would have a GiB, and so on.
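To see how the two-letter symbols remove the ambiguity, here is a sketch of a size formatter that can label the same byte count either way. The function name format_size and the "500 GB" example drive are assumptions for illustration only.

```python
def format_size(num_bytes, binary=False):
    """Format a byte count using SI units (kB, MB, ...) or IEC binary units (KiB, MiB, ...)."""
    base = 1024 if binary else 1000
    units = ["KiB", "MiB", "GiB", "TiB", "PiB"] if binary else ["kB", "MB", "GB", "TB", "PB"]
    if num_bytes < base:
        return f"{num_bytes} B"
    size = float(num_bytes)
    for unit in units:
        size /= base
        # Stop at the first unit that keeps the number below `base`, or at the largest unit.
        if size < base or unit == units[-1]:
            return f"{size:.2f} {unit}"

drive_bytes = 500 * 1000 ** 3  # an example drive sold as "500 GB"
print(format_size(drive_bytes))               # 500.00 GB
print(format_size(drive_bytes, binary=True))  # 465.66 GiB
```

Same bytes, two different numbers; only the unit symbol tells you which convention is in use.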
I don't think these ideas are going to catch on quickly, if ever. In the meantime, we have to live with the discrepancy whenever we look too closely at hard disk sizes. Just remember that hard disk sizes as reported by the operating system will usually be a few percent lower than the sizes claimed by the hard drive manufacturers, and the discrepancy grows as the unit in which the size is expressed gets bigger. There is nothing nefarious about this discrepancy. There is nothing hidden on the hard drives. The discrepancy simply occurs because hard drive makers use a different definition of data storage units than software makers. Now you know!















