The compiler (more properly, the "implementation") is free to choose the sizes, subject to the limits in the C standard (for instance, int must be at least 16 bits). The implementation can optionally subject itself to other standards, such as POSIX, which can add more constraints. For example, I think POSIX says all data pointers are the same size, whereas the C standard is perfectly happy for sizeof(int*) != sizeof(char*).
In practice, the compiler-writer's decisions are strongly influenced by the architecture, because unless there's a strong reason otherwise they want the implementation to be efficient and interoperable. Processor manufacturers or OS vendors often publish a thing called a "C ABI", which tells you, among other things, how big the types are and how they're stored in memory. Compilers are never obliged to follow the standard ABI for their architecture, and CPUs often have more than one common ABI anyway, but for code built by one compiler to call directly into code built by another, both compilers have to be using the same ABI. So if your C compiler doesn't use the Windows ABI on Windows, then you'd need extra wrappers to call into Windows DLLs. If your compiler supports multiple platforms, then it quite likely uses different ABIs on different platforms.
You often see abbreviations used to indicate which of several ABIs is in use. For instance, when a compiler on a 64-bit platform says it's LP64, that means long and pointers are 64 bits, and by omission int is 32 bits. If it says ILP64, that means int is 64 bits too.
In the end, it's more a case of the compiler-writer choosing from a menu of sensible options, than picking numbers out of the air arbitrarily. But the implementation is always free to do whatever it likes. If you want to write a compiler for x86 which emulates a machine with 9-bit bytes and 3-byte words, then the C standard allows it. But as far as the OS is concerned you're on your own.