To me, the lesson is: do not duplicate code, and keep it simple, stupid.
When implementing Unicode, MS said "hey let's copy all of our API functions to new names". Now they have two problems.
When implementing 64-bit, MS said "hey let's have a completely different OS for 64-bit." Now they have four problems.
Compare this to Unix/Linux/MacOS where Unicode is just implemented as a standard way (UTF-8) of encoding characters into the existing API. And there need only be one shipping OS which supports both 32-bit and 64-bit process types equally since there are only a handful of kernel APIs compared to thousands for Windows.
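The claim that UTF-8 slots into the existing byte-oriented API rests on two of its design properties: ASCII encodes to identical bytes, and no multi-byte sequence ever contains a 0x00 or any other ASCII byte. A small sketch (the strings here are just illustrative):

```python
# ASCII text is byte-for-byte identical in UTF-8, so old byte-oriented
# code keeps working unchanged.
ascii_text = "hello"
assert ascii_text.encode("utf-8") == b"hello"

# A pre-Unicode, byte-wise path splitter still splits correctly, because
# the byte b'/' (0x2F) can never appear inside a multi-byte sequence.
path = "/home/café/docs".encode("utf-8")
parts = path.split(b"/")
assert parts[2].decode("utf-8") == "café"
```

This is why C `strlen`, `strcpy`, and path handling survive the transition: they never needed to know the encoding changed.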
There is a great deal of both misunderstanding and ignorant Microsoft bashing in this comment.
First of all, you are mixing up two completely different concepts.
For character encoding on Windows:
For many functions in the Windows API there are two versions: one with an A (for ANSI) at the end and one with a W (for wide). This was added to make it easier to support Win32 on both Windows 95, which used 8-bit characters and codepages, and Windows NT, which was natively UTF-16 Unicode. At the time UTF-16 was considered the best and most standard choice for supporting Unicode. In most cases the W function is the real implementation, and the A function is little more than a wrapper around it.
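The wrapper relationship can be sketched in a few lines. This is a toy model, not real Win32: the function names here are made up (real pairs look like `CreateFileA`/`CreateFileW`), and the codepage conversion stands in for what `MultiByteToWideChar` does on Windows:

```python
def set_title_w(wide_bytes: bytes) -> str:
    # "Native" W implementation: expects UTF-16-LE, as NT uses internally.
    return wide_bytes.decode("utf-16-le")

def set_title_a(ansi_bytes: bytes, codepage: str = "cp1252") -> str:
    # Thin A wrapper: decode the caller's ANSI codepage, re-encode as
    # UTF-16, and delegate to the wide version.
    wide = ansi_bytes.decode(codepage).encode("utf-16-le")
    return set_title_w(wide)

# 0xE9 is 'é' in codepage 1252; both paths end up at the same string.
assert set_title_a(b"caf\xe9") == set_title_w("café".encode("utf-16-le"))
```

The point is that the A surface is cheap: the duplicated names are mostly mechanical shims over one real implementation.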
This has nothing to do with what Raymond is describing.
For the 64/32-bit stuff, they ensured that all code would compile and work correctly as both 32-bit and 64-bit, and built two versions: one for ia32 and one for amd64. The kernel had to be modified to support the amd64 architecture. This is exactly what Linux, OS X, and other operating systems that support multiple architectures do. On top of this, because amd64 is backwards compatible, they also included an ia32 environment, but it is optional, so nothing that ships with the OS can depend on it. I assume this is what OS X does too; the only difference is that the two Windows versions ship as different SKUs, while Mac OS X ships with both versions and installs the one the computer originally shipped with.
Second, the number of system calls has nothing to do with any of this at all.
Windows unicode support predates the existence of UTF-8 -- so that's great API design for Windows if you possess a time machine.
The ANSI functions merely map to the Unicode functions.
In addition, the Windows Kernel is probably similar in API size to the Linux kernel. Of course, that's not nearly enough API for a complete Windowing operating system in either case.
The point is both Unix and Windows faced the exact same problem: needing to support larger characters. MS thought little and unleashed their army of coders to do something foolish. The Unix guys thought hard and did something elegant and sensible.
NT was started around 1988 and released in mid-1993. UTF-8 wasn't ready until early 1993. It's too bad they couldn't go back in time to retrofit everything.
Programming is a series of tradeoffs. Unless you are in the middle of doing it, you don't understand the pressure the programmers were facing and the tradeoffs that needed to be made. The Windows and Unix guys didn't face the same problem: they were different problems, with different tools available, in different time periods. Hindsight is 20/20, and it's easy to be dickish and laugh at their mistakes afterward.
A huge amount of Unix software uses UTF-16 just like Windows (including Java). You're just being deliberately ignorant of the history. UTF-8 didn't exist and UCS-2/UTF-16 was the standard. One could argue that the Unicode Consortium screwed up assuming that Basic Multilingual Plane would be enough characters for everyone.
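The "BMP would be enough" mistake is concrete: UCS-2 assumed every character fits in one 16-bit unit, but characters outside the Basic Multilingual Plane need a surrogate pair in UTF-16, i.e. two 16-bit units for one character. A quick illustration:

```python
def utf16_units(s: str) -> int:
    # Count 16-bit code units in the UTF-16 encoding of s.
    return len(s.encode("utf-16-le")) // 2

bmp_char = "é"       # U+00E9, inside the BMP: one code unit
astral_char = "😀"   # U+1F600, outside the BMP: a surrogate pair

assert utf16_units(bmp_char) == 1
assert utf16_units(astral_char) == 2
```

Every UCS-2 system (Windows, Java, and others) had to absorb this retrofit when the code space outgrew 16 bits.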
I am not ignorant of the history. Please try to follow this reasoning:
1. Faced with a new character set which was larger than eight bits (Unicode, 16-bit), Microsoft said "hey let's make an all-new API" and set to work rewriting everything.
2. Faced with a new character set which was larger than eight bits (Unicode, 32-bit), the Unix guys said "hey let's create a standard way to encode these characters and rewrite nothing."
You seem to be fixated on the difference between the new character sizes. Ignore the precise number of bits! The point is when making a change to adapt to a new system, do you rewrite everything and risk causing bugs everywhere, or do you do something clever which has far less risk and uses the same API?
#2 massively ignores the fact that they didn't bother to solve the problem until much, much later. In fact, everyone else solved it the same way as Microsoft before then, even on Unix. Actually, a bunch of Unix guys were involved in the design of UCS-2 and UTF-16, so I'm not sure why it's Microsoft's fault.
But yes, some Unix guys eventually faced with a bigger problem, significantly more time, and a design already started by the Unicode Consortium eventually solved it better. But that's not really much of an argument.
Also arguing that there is no risk going to UTF-8 is ridiculous. Anything that treats UTF-8 as ASCII, as you suggest, is going to do it wrong in some way. At least making a new API forced developers to think about the issue.
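The risk is easy to demonstrate: code that treats UTF-8 bytes as "just ASCII" miscounts lengths and can cut a character in half. A minimal example:

```python
s = "café"
b = s.encode("utf-8")

assert len(s) == 4   # four characters...
assert len(b) == 5   # ...but five bytes: byte-wise "strlen" is already wrong

# Truncating at a byte boundary can land mid-character: b[:4] cuts the
# two-byte 'é' in half, leaving invalid UTF-8.
truncation_is_invalid = False
try:
    b[:4].decode("utf-8")
except UnicodeDecodeError:
    truncation_is_invalid = True
assert truncation_is_invalid
```

Old byte-oriented code keeps running on UTF-8, but "keeps running" and "is correct" are not the same thing, which is the point being made here.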
They didn't exactly face this problem. The Linux kernel actually has mostly no notion of Unicode or encodings at all, except in two places: the character console code and Windows-originated, Unicode-based filesystems. It's interesting to note that the NTFS driver in the Windows kernel implements its own case-folding mechanism for Unicode, and that this is probably the only significant place where the Windows kernel has to care about Unicode.
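Why case folding is a real job for a filesystem: Unicode folding is not the per-byte `tolower()` of the ASCII days. A sketch using Python's `casefold`, which approximates the idea (NTFS itself actually carries a per-volume uppercase table rather than using this algorithm):

```python
# German sharp s folds to TWO letters, so folding can change string length.
assert "ß".casefold() == "ss"

# Turkish dotted capital I (U+0130) does not fold to a plain ASCII 'i'.
assert "İ".casefold() != "i"

# Case-insensitive matching has to go through folding on both sides:
assert "STRASSE".casefold() == "straße".casefold()
```

A kernel that promises case-insensitive names has to ship a table answering "do these match?", and that table is frozen per volume, which is exactly the kind of thing a byte-opaque kernel like Linux never has to carry.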
What are the consequences of your preferred approach for legacy code? While there are certainly disadvantages of duplicating APIs in terms of cruft, I think that many customers appreciate that old code continues to run (and compile) against newer versions of the OS.
I believe you'll find it to be Windows <insert version here> 64-bit Edition.
A quick googling suggests that Windows 7 retail copies have both 32-bit and 64-bit editions, but I'm guessing you still have to pick the right edition when you go to install it. And if you don't have a retail disc, then you probably only got one of the two architectures.
Compare this with Mac OS X, where there is no "32-bit edition" or "64-bit edition", there is just "Mac OS X". Everybody installs the same OS, and it transparently works for both 32-bit and 64-bit machines. The only time when you really have to care is if you're installing a 3rd-party kext (kernel extension), as 32-bit kexts won't work on a 64-bit kernel, but that's pretty rare and any kext that's still being maintained today will support both architectures.
That's a nice description of what's going on, but it really boils down to:
* Running 32-bit kernel on 32-bit hardware
* Running 32-bit kernel (mostly; enough that kexts are 32-bit) on 64-bit hardware
* Running 64-bit kernel on 64-bit hardware
Interestingly, for a while only the Xserve defaulted the kernel to 64-bit mode, whereas all consumer machines used the 32-bit kernel mode, though this could be toggled using a certain keyboard chord at startup (I think it was holding down 6 and 4). Eventually this shifted so the 64-bit kernel became the default, but only after giving kext authors enough warning so they could update their kexts.
In any case, as diverting as this is, it doesn't really matter to the consumer. There was only one version of OS X to install, and it ran in whatever mode was appropriate for the machine in question. The only reason consumers ever had to care what mode their kernel was running in was if they wanted to use a kext that did not have both 32-bit and 64-bit support, and by the time the kernel switched to 64-bit mode by default, this was pretty rare.
> but I'm guessing you still have to pick the right edition when you go to install it. And if you don't have a retail disc, then you probably only got one of the two architectures.
Good guess. They are separate versions, and only retail includes both.
And, what's more, you can install Leopard on a 64-bit x86 machine, then dd the HD over to a PPC machine, and it will boot, as long as you use the right partition format.
I do believe that Mountain Lion dropped 32-bit kernel support. But Apple hasn't produced a 32-bit machine in years, and every new major version of the OS does tend to drop support for old hardware. The fact that the motivating factor here was dropping the 32-bit kernel is fairly irrelevant.
Even when you look at syscalls themselves there is a difference: the Linux kernel deals mostly with opaque strings of bytes (which today are, by user-space convention, mostly UTF-8), while the NT kernel deals mostly with UCS-2 code-unit strings (which are sometimes UTF-16, and this is where the madness ensues).
Again, you are confusing the user-mode API (Win32/64) with the kernel-mode service API. Windows does have a kernel-mode service API; it's just not well known, since most people never need to deal with it.