Consider a simple EXE COM server, FooServer. Let's assume FooServer exposes an object Foo that has a method Bar:

Code:
  IFoo = interface
    procedure Bar;
  end;

Let's also assume that FooServer has no clue what multithreading is and is therefore single-threaded just like any normal application. Consider 3 clients (C1, C2, and C3), each of which creates an instance of Foo from FooServer. All 3 clients then call IFoo.Bar simultaneously (assume C1 got through first, followed by C2, then C3). Since FooServer is single-threaded, it will process C1's call to completion, then C2's call to completion, and then C3's call. Why? Single-threading enforces a single path of execution, meaning that FooServer can only process calls sequentially, one after the other.

Assuming that IFoo.Bar takes a minute to execute, C3 will wait at least 3 minutes for its call to IFoo.Bar to return. This is because it has to wait until C1 and C2 are completed, each of which takes a minute. Wouldn't it be better if the server could be a bit fair and somehow divide its time equally among the 3 clients, since all 3 called into the server simultaneously anyway? Or better yet, if FooServer was run on a kick-ass 10 processor machine, wouldn't it be cool if all 3 clients were processed simultaneously on 3 separate processors?

A simple way to improve FooServer is to create a thread for each instance of Foo. Using our example, C1 would get a Foo on server thread 1 (T1), C2 would get a Foo on server thread 2 (T2), and C3 would get a Foo on server thread 3 (T3). This way, when all 3 clients simultaneously call IFoo.Bar, T1, T2, and T3 would all kick in to do the work simultaneously. And if you had a multiprocessor machine running FooServer, T1, T2, and T3 could each execute on its own processor!

Now that we have multithreaded FooServer, let's try to simplify and formalize a few things. Consider the Foo (Foo1) instance on T1.
Since Foo1 "belongs" to T1, it would make sense to simplify our lives a bit by making T1 be "in charge" of Foo1, i.e. an outside call into any of Foo1's methods must execute on T1, and only on T1.

Also, having FooServer create a separate thread per instance of Foo might not always be a good idea. Why? If a hundred clients each created a Foo, that would instantly create a hundred threads in FooServer. For a thousand clients, that's a thousand threads; for a hundred thousand clients, that's a hundred thousand threads, and so forth. You get the idea! It would be better to allow each thread to "hold" more than 1 instance of Foo, resulting in a form of thread reuse. What I mean by this is let's say we have 9 clients connecting to FooServer, each of which creates an instance of Foo. We can still fit the 9 objects into threads T1, T2, and T3 in a more ingenious way:

Figure 1: Distributing 9 Foo objects into 3 threads

As we can see, each thread holds 3 Foo instances, and 3 threads can hold 9 Foos and thus satisfy all 9 clients. This way, we don't waste resources the way our one-thread-per-Foo-instance rule would. However, because we simplified things a little bit, instances that "belong" to a single thread are all serviced on that thread, and only on that thread. Foo1, Foo2, and Foo3 are all serviced on T1, meaning clients that simultaneously call into Foo1, Foo2, and Foo3 are serviced sequentially (one after the other) on T1. The same behavior applies for T2 with Foo4, Foo5, and Foo6, and also for T3 with Foo7, Foo8, and Foo9.

Let's take a closer look again at what we've just seen:

Figure 2: 3 single threaded apartments in a server

Each box in the above diagram is what COM calls the Single Threaded Apartment. Surprisingly simple, isn't it?

Single Threaded Apartment (STA)

Let's break the term down to 2 parts: "apartment" and "single threaded".
"Apartment" is used to describe a place for a group of objects "living" together and sharing the same threading behavior and requirements. An apartment is an abstraction of "a container of objects and threads" similar to how a thread is an abstraction of "a path of execution". "Single threaded" is used to denote that the threading behavior for this particular apartment is such that there is 1 thread, and only 1 thread, that executes in the apartment, ever. Got it? If not, read the entire paragraph three more times. Here's a few important characteristics of the STA: 1. There is 1 and only 1 thread in an STA. All objects that reside in the STA are serviced on this one thread. 2. A server can implement multithreading by creating multiple STAs. Since an STA contains 1 thread, multiple STAs result in multiple threads. Furthermore, each STA can "house" more than 1 object. This prevents the server from excessively creating a lot of STAs! Because an STA contains only 1 thread, it is said to be serialized by default. What this means is that if 2 clients simultaneously call into each of its objects, residing in the same STA, the STA architecture will ensure that the 2 calls are serialized (processed sequentially one after the other). Figure3: Default STA serialization behavior This serialization behavior in the STA is implemented by COM using the standard Win32 window messaging architecture. What happens is COM interjects a hidden window between the client and the object in the STA. Whenever the client calls into an object in the STA, COM intercepts the call, bundles it into a windows message, and then posts the message into the message queue of the hidden window. Once the hidden window gets to the message in its queue, it unbundles the message and then makes the actual call into the STA object. This is all made possible because the hidden window runs on the one thread that lives in the STA. 
In order for the hidden window to receive and handle calls/messages, it must continuously check and process its message queue. This means that the STA thread is required to run a message pump (a loop which checks and handles messages from a message queue) in order to live.

Figure 4: COM STA architecture

If the STA thread does not pump messages properly, the hidden window won't see the messages and thus won't process them. In this case, a client making a call into an STA object will appear to be hung/unresponsive simply because the call is not coming through into the server!

Now that you've seen what an STA is, let's look at how FooServer would implement STAs for its Foo object. Recall our previous discussion on the class factory. To refresh your memory, the class factory is the standard "gatekeeper" used to create instances of objects from your server, i.e. when a client wants to create an object, it first goes to the class factory, and the class factory creates the actual object. Because of this, it makes sense to somehow associate the creation of threads with the class factory so that we can control how to create threads when the class factory creates the objects. For instance, the following pseudocode provides a simple implementation of a thread-per-object allocation for Foo (also recall that the IClassFactory.CreateInstance method is where object instances are created):

Code:
  // CreateInstance is called every time a client asks to create a new Foo instance
  procedure FooClassFactory.CreateInstance;
  begin
    // create new thread
    NewThread = CreateANewThread;
    // create new Foo on new thread
    NewFooInstance = CreateAnInstanceOfFoo on NewThread;
    // return newly created Foo to client
    Result = NewFooInstance;
  end;

What's happening here is that every time a client requests to create a new Foo instance, Foo's class factory will create a new thread and then create an instance of Foo on that new thread. But wait, where's the STA here?
All we've created is a new thread and a new Foo instance, but not the apartment. The answer's actually simple. It's the thread that decides what type of apartment it wants! It does this by calling the CoInitializeEx (or CoInitialize in the old days) COM API.

Each thread in your application that wants to work with COM must request to initialize an apartment so that COM knows how to work with that thread! This is a very important rule because an apartment is characterized by how tough it is when it comes to handling multiple threads. In other words, an apartment defines what can and cannot be done with the objects and threads contained in it. The process of a thread initializing an apartment means that that thread wants to work only within the boundaries of its apartment's rules.

An apartment is a real, living entity in COM! For each apartment, COM needs to allocate resources in order for the apartment to function properly. CoInitializeEx tells COM to do whatever needs to be done to set up an apartment for the calling thread. CoInitializeEx must be called prior to any code in a thread that interacts with COM. This usually means that a thread must call CoInitializeEx as its first statement. For our FooClassFactory, the new thread would initialize as follows if it wants the STA:

Code:
  // Thread routine created from the line NewThread = CreateANewThread
  // in FooClassFactory.CreateInstance
  procedure NewThreadFunc;
  begin
    // initialize apartment
    CoInitializeEx (SingleThreaded);
    ... do work here ...
    // leave apartment
    CoUninitialize;
  end;

The SingleThreaded parameter (COINIT_APARTMENTTHREADED in the real API) indicates that the new thread is interested in initializing itself into an STA. The CoUninitialize call towards the end is important: it indicates that the thread is terminating and tells COM to clean up any resources that were allocated when the apartment was initialized.
In order for our new thread to be a bona fide STA service thread, we also need a message pump, so a more accurate pseudocode would look like this:

Code:
  procedure NewThreadFunc;
  begin
    CoInitializeEx (SingleThreaded);
    // message pump
    while MoreMessagesInQueue do
    begin
      GetMessageFromQueue (Message);
      ProcessMessage (Message);
    end;
    CoUninitialize;
  end;

Important note: The message pump is only necessary for an STA that contains objects to be used from other apartments, i.e. if the STA serves objects to clients. For STAs that simply create an object, call a method, and release the object, there is no need for a message pump.

Another important thing to remember when working with STAs is proper synchronization. A single STA is itself synchronized on its sole thread. An object within the STA does not need to do any extra synchronization on its instance data because the STA architecture does the synchronization for you for free. However, multiple STAs (meaning multiple threads) can easily step on each other when accessing global data. Because of this, objects from different STAs must properly synchronize among themselves when accessing global data:

Figure 5: Multiple STAs must properly synchronize access to shared data

Thus, the correct way to handle global data is to use a locking mechanism when accessing the data. This pseudocode shows how a simple locking mechanism is performed using a Win32 critical section:

Code:
  procedure TFoo.Bar;
  begin
    EnterCriticalSection;
    AccessGlobalData;
    LeaveCriticalSection;
  end;

This ensures that if 2 Foo objects, each in a different STA, simultaneously execute Bar, the AccessGlobalData routine will only execute one at a time, preventing any possibility of data corruption.

The automatic STA serialization has its advantages and disadvantages. You can develop an object that doesn't need to worry about instance data synchronization by simply putting it in the STA. However, the STA serialization imposes a few limitations:

1.
If you have objects that have no danger (or no need) of instance data corruption across threads, you don't really need the automatic serialization feature of the STA. It'd just be unnecessary overhead. In this case, throughput can be much better without the synchronization.

2. A client (on one thread) talking to an STA object (on another thread) incurs some overhead. This is because whenever a client makes a call into the object, COM has to do a thread-switch from the client thread to the STA thread. What this means is that COM will temporarily "suspend" the client thread, switch to the STA thread and make the call, and then switch back to the client thread. Thread switching is a rather expensive operation and should be avoided if possible.

Multithreaded Apartment (MTA)

For developers that are not impressed with limitations imposed by the STA, a "new and improved" apartment is in order. This apartment must have the following amenities:

1. No more automatic synchronization. This apartment will have the ability to accommodate more than 1 thread. If multiple clients call into objects in this apartment, all calls must proceed immediately and fearlessly!

2. No more thread switching. All threads in this apartment can freely call into any object within this apartment at any time they well damn wish! Objects in this apartment will have no concept of being "owned" by 1 thread as in the STA.

Not surprisingly, this apartment is what COM calls the Multithreaded Apartment. "Multithreaded" is used to indicate that this apartment can have multiple threads in it. Since any number of threads can live in the MTA, there's need for only 1 MTA per process. Contrast this with an STA, which can only contain 1 thread and therefore requires multiple STAs for multiple threads.

Unlike the STA, the MTA doesn't designate a particular thread that handles calls into its objects. How, then, does the MTA know which thread to use to make calls into its objects?
MTAs are different from STAs in that there is no hidden window, no messages, and therefore no message pump required. The MTA's architecture is such that COM will manage an internal pool of threads for each MTA. When a client makes a call into an object in the MTA, COM will look into its thread pool, find an available thread, and make the call directly from that thread. This means that an MTA object can receive method calls from arbitrary threads at any time. COM manages this thread pool by growing or shrinking the number of threads in the pool as necessary.

Figure 6: COM MTA architecture

Since objects in the MTA receive calls from any thread at any time, synchronization is harder than in the STA case. Instance or per-object data is no longer safe, and any access to it that can cause corruption will need to be properly synchronized. The same goes for global data, as in the STA case. The more important thing is that MTA objects must not be dependent on anything that is thread-relative (or as the gurus like to say, MTA objects must not be dependent on anything that has thread affinity). An example of a thread-relative entity is a Win32 window handle; other examples are objects that have some dependency on thread local storage (TLS).

A new thread that wishes to initialize the MTA can do so by calling CoInitializeEx as follows:

Code:
  procedure NewThreadFunc;
  begin
    // initialize apartment
    CoInitializeEx (MultiThreaded);
    ... do work here ...
    // leave apartment
    CoUninitialize;
  end;

Note that we use the MultiThreaded parameter (COINIT_MULTITHREADED in the real API) as opposed to SingleThreaded (STA) to specify that we are interested in the MTA. Since there's only 1 MTA per process, the first thread that calls CoInitializeEx (MultiThreaded) creates the MTA, whereas any similar succeeding calls from other threads will enter the existing MTA. Note that you still have to match up calls with CoUninitialize even though other threads simply enter the MTA, because COM keeps track of the thread count in the MTA.
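
To make the "calls arrive on arbitrary pool threads" point concrete, here is a rough simulation in plain Python. It is not COM: a ThreadPoolExecutor stands in for COM's internal thread pool, and the Foo class and run_mta_demo function are hypothetical names for illustration. The point is that the object, not the apartment, now has to protect its own instance data.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Foo:
    """Toy MTA-style object: any pool thread may call bar() at any time."""

    def __init__(self):
        self._lock = threading.Lock()   # guards the per-object (instance) data below
        self.calls = 0
        self.threads = set()

    def bar(self):
        # Without the lock, concurrent updates to instance data could be lost.
        with self._lock:
            self.calls += 1
            self.threads.add(threading.get_ident())


def run_mta_demo(num_calls=200, pool_size=4):
    """Dispatch num_calls calls to one Foo from a pool of worker threads."""
    foo = Foo()
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for _ in range(num_calls):
            pool.submit(foo.bar)
    # Leaving the with-block waits for every submitted call to finish.
    return foo
```

After run_mta_demo() returns, foo.calls equals the number of calls made, and foo.threads shows which of the pool's threads ended up servicing them; which threads those are is not predictable, which is exactly why thread-relative state is off limits for such an object.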
Interface Marshaling

The automatic synchronization in the STA and the fearless threading in the MTA are COM's guarantee to the developer. What this means is that if you have a weak object that cannot simultaneously handle multiple threads or is dependent on thread-relative information, that object has to go into the STA. By putting it in the STA, COM guarantees automatic synchronization, no more, no less. On the other hand, if you have a tough object that's not afraid of a thread beating, you can put it in the MTA. This way, COM will guarantee that it will get a damn good beating if it needs to.

Aside from the COM guarantees, you must also guarantee to abide by COM's rules. For instance, consider 2 STA threads, T1 and T2. Assume that T1 creates an instance of Foo and stores it in a global variable T1Foo:

Code:
  var
    T1Foo : IFoo;

  // an STA service thread for T1Foo
  procedure T1;
  begin
    CoInitializeEx (SingleThreaded);
    T1Foo = CreateAFoo;
    RunMessagePump;
    CoUninitialize;
  end;

Consider the adventurous T2 wanting to play with T1Foo:

Code:
  // an STA thread
  procedure T2;
  begin
    CoInitializeEx (SingleThreaded);
    T1Foo.Bar;
    CoUninitialize;
  end;

From what we've learned, it is obvious that T1 and T1Foo live in an STA. More specifically, T1 is the STA's lone service thread. T2 lives in another STA. Now look closely at T2. The raw call to T1Foo.Bar will be made from the context of thread T2, i.e. 1) Bar won't be executed from within the context of thread T1 and 2) it will bypass T1's message pump. Both of these are clear violations of the STA model! This is what I mean when I say you also have to abide by COM's threading rules.

How exactly do you abide by COM's threading rules in this case? Good question! The answer lies in a process called interface marshaling. In COM, an interface pointer is valid only in the apartment that acquired it, and nowhere else. In our example, the T1Foo interface pointer is only valid in T1's apartment because that is where it was assigned.
If you want to use an interface pointer that's valid in an apartment different from yours (as is the case with T2's STA using T1Foo), you have to ask COM to "massage" the interface so that it will be valid in your apartment. This "massage" process is called interface marshaling.

The marshaling process involves taking an interface pointer from the source apartment (the apartment where the pointer is valid), converting it into a stream of bytes, shipping the stream to the target apartment (the apartment that wants to use it), and finally decoding the stream of bytes back into a live interface pointer. After the marshaling is performed, COM will set up (behind the scenes) something called a proxy in the target apartment and a stub in the source apartment. A proxy is nothing but a small object that exposes the exact same interface as the original interface pointer does. The proxy and stub work in tandem as apartment-to-apartment translators in such a way that the client in the target apartment talks to the proxy, which in turn talks to the stub in the source apartment, which finally talks to the object.

Figure 7: COM Marshaling architecture

If you also work with CORBA (or Java RMI), the marshaling terms might be confusing at first. CORBA's stub is the equivalent of COM's proxy, and CORBA's skeleton is the equivalent of COM's stub.

COM takes care of how the proxy talks to the stub. In fact, if T2 had a proxy to T1Foo, then the T1Foo.Bar call would go through the proxy, then to the stub, then to the actual Foo instance in T1. Since the stub "lives" in T1's apartment, the call is queued to T1; T1's message pump will therefore pick the call up from the queue and then execute Bar in T1's context. Pretty cool huh?! In other words, using the proxy and stub mechanism is how we can work with COM when attempting to use interface pointers across apartments.

Interface marshaling can be accomplished using the CoMarshalInterface and CoUnmarshalInterface APIs.
CoMarshalInterface exports the interface pointer to a byte stream and CoUnmarshalInterface imports the pointer from the byte stream. CoMarshalInterface is normally called from the source apartment and then CoUnmarshalInterface is called from the target apartment. Because the CoMarshalInterface and CoUnmarshalInterface APIs require that you manually allocate/deallocate the byte stream, developers normally prefer a more convenient pair of APIs that automatically take care of the stream allocation/deallocation: CoMarshalInterThreadInterfaceInStream and CoGetInterfaceAndReleaseStream. Here's pseudocode showing how these APIs can be used for our example above:

Code:
  var
    MarshalStream : IStream;

  // an STA service thread for T1Foo
  procedure T1;
  var
    T1Foo : IFoo;
  begin
    CoInitializeEx (SingleThreaded);
    T1Foo = CreateFoo;
    // export/marshal T1Foo into byte stream
    CoMarshalInterThreadInterfaceInStream (T1Foo, MarshalStream);
    RunMessagePump;
    CoUninitialize;
  end;

  // an STA thread
  procedure T2;
  var
    T1Foo : IFoo;
  begin
    CoInitializeEx (SingleThreaded);
    // import/unmarshal T1Foo from byte stream
    T1Foo = CoGetInterfaceAndReleaseStream (MarshalStream);
    T1Foo.Bar;
    CoUninitialize;
  end;

Note that the marshaling process has to occur in sequence: T1 must execute CoMarshalInterThreadInterfaceInStream before T2 can meaningfully execute CoGetInterfaceAndReleaseStream!

Threading For In-Process Servers

In-process/DLL servers introduce some additional factors into the entire COM threading business. This is because DLLs do not normally proactively create threads on their own. Being passively mapped into the address space of the client application, DLLs simply "blend" in with the apartments and threads the client creates. In other words, it is the client that actually creates the threads and it is the client that makes the CoInitializeEx calls - objects in the DLL simply go with the flow of what the client wants to do.

Why is this? Simple. DLLs simply blend into the client.
Once a DLL is mapped into the client's address space, it is no different than any other code that is part of the client. An object in the DLL is no different than a raw object in the client. In fact, making calls into an object in the DLL is exactly the same as making calls into any other object in the client. In a sense, the client is itself both a client and a server even though strictly speaking, the DLL is the server.

Since DLLs do not proactively take part in creating threads, objects in a DLL require a different mechanism of initializing themselves into the apartment of their choice. For instance, a weak object has to somehow tell the client that it can only live in an STA whereas a kick-ass object can tell the client that it prefers to live in the MTA. Of course DLL objects don't tell clients, per se, of their choice. Instead, when a DLL object gets registered, it records its apartment of choice in a ThreadingModel entry in the registry. For example, if Foo prefers to live in the STA, here's what Foo theoretically looks like in the registry (in the real registry, ThreadingModel is a named value under the class's InprocServer32 key, whose default value holds the DLL path):

Code:
  CLSID of Foo
  CLSID of Foo\ServerLocation = "FooServer.dll"
  CLSID of Foo\ThreadingModel = "Apartment"   // <== Foo prefers to live in the STA

What exactly does this mean? Whenever a client thread wants to create Foo, COM first looks at the ThreadingModel entry. If it finds that the client thread belongs to an apartment that Foo prefers (in this case STA), then COM will create Foo directly in the client thread's apartment.

Figure 8: Direct apartment activation on compatible threading models for inproc objects. Foo1 is activated directly into STA 1 and Foo2 is activated directly in STA 2.
If, on the other hand, the client thread belongs to a different apartment than the object's preference (for instance, if the client thread called CoInitializeEx (MultiThreaded)), COM will silently create an apartment that matches the object and then marshal an interface pointer from that apartment into the client apartment that requested to create the object. Why does COM do this? Imagine a client thread T1 that has entered the MTA as follows:

Code:
  var
    FooVar : IFoo;

  procedure T1;
  begin
    CoInitializeEx (MultiThreaded);
    FooVar = CreateObject ("FooServer.Foo");   // assume Foo's ThreadingModel="Apartment"
    ... do some things ...
    CoUninitialize;
  end;

Imagine a second client thread T2 that enters the MTA at a later point:

Code:
  procedure T2;
  begin
    CoInitializeEx (MultiThreaded);
    // call Bar on FooVar variable initialized from T1
    FooVar.Bar;
    CoUninitialize;
  end;

Unlike our previous example, T2's direct access to FooVar is perfectly legal. Why? Because an interface pointer is valid from within the apartment where it was acquired. In this case, both T1 and T2 are in the same apartment - the one and only MTA.

What you might not have noticed is that Foo declared a ThreadingModel="Apartment". So if COM created Foo directly in the MTA that contains both T1 and T2, T1 and T2 (and any other thread in the MTA) would be free to crush (make calls into) Foo by virtue of the MTA. But we cannot allow that since Foo says it prefers the STA, i.e. "Do not allow multiple threads to simultaneously call me because I cannot handle it!" This is exactly why COM will do some extra work to create Foo in an STA and then hand back a marshaled pointer into T1 and T2's MTA. In simple terms, if the client and the server have different threading requirements, COM will do some extra work to ensure that both the client's and the server's wishes are granted unconditionally.

Note though that COM will not create a separate STA for each Foo that is incompatible with the calling client thread.
Instead COM will use a single STA to house all of the incompatible Foo instances. For objects marked "Apartment", this is a "host" STA that COM itself creates in the client process the first time one is needed. (Legacy objects that declare no ThreadingModel at all are instead placed in the first or "main" STA of the process.)

If Foo prefers to live in the MTA, it would register its ThreadingModel value as "Free". What this means is that if a client thread is from the MTA, COM will happily create Foo directly in the client's MTA. However, if the client is STA, COM will also be a smart-ass and silently create an MTA on behalf of Foo and hand back a marshaled interface pointer from the MTA into the client thread's STA. Again, COM will use the one client MTA to house all incompatible Foo instances. If the client hasn't created the MTA yet, COM will automatically create one when needed.

Foo can also be adventurous and be indifferent to the client's apartment type. COM also supports the "Both" ThreadingModel value, meaning that Foo doesn't care if the client's thread is in the STA or the MTA. If the client is in the STA, COM will create Foo in the STA; if the client is in the MTA, COM will create Foo in the MTA. In a sense, "Both" really means "Either" STA or MTA.

As we've discussed before, synchronization for "Apartment" involves proper protection of global data, and synchronization for "Free" or "Both" involves proper protection of both instance-specific and global data. The synchronization toughness level of your object is one factor in determining which ThreadingModel value to use.

Costs of Interface Marshaling

Another important deciding factor for the ThreadingModel value is the overhead involved in marshaling. Remember, marshaling involves a proxy-stub connection and, more importantly, an expensive thread-switch per method call. How is this important? Consider an object marked as ThreadingModel="Free". That's like saying "I'm tough, I can handle anything!" Oh yeah?!
If your clients are mostly STA-based, a client that creates your object will always receive a proxy because of the magic that COM does behind the scenes. A proxy kind of defeats your tough object, doesn't it? It would probably be better if your object declared itself as ThreadingModel="Apartment". That way, you get no magic from COM, and hence no stinking proxy!

Consider another scenario. You have an object marked as ThreadingModel="Apartment". Let's say you have clients that are STA-based as well as clients that are MTA-based - say evenly distributed (50/50). Your STA clients would be very happy, but your MTA clients would be very sad due to the proxy syndrome. If you had declared your object as ThreadingModel="Both", it would probably be the best of both worlds for both STA clients and MTA clients.

But remember, marshaling overhead is only part of the picture. For instance, it makes no sense to declare an object as ThreadingModel="Free" if it depends on anything that is thread-relative. In this case, ThreadingModel="Apartment" would be more appropriate!

You can also mix objects of different threading models within a single server. The usefulness of this depends on how your objects are being used. However, it's good to point out that mixing objects with different threading models is a possibility and can sometimes be beneficial in terms of performance.

Where Are We?

Whew! Multithreading is both a hard and an easy topic in COM. Hard because it takes time to see the big picture, and easy because once you see the big picture, everything just falls into place. My only comment is that multithreading is not for the faint of heart and should only be interesting to hard-core geeks. However, knowledge gained in this chapter can always be useful in understanding future concurrency aspects of COM. As we shall see later, COM+ and Windows 2000 will build on top of these basic concepts to create a user-friendly approach to COM multithreading.
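
As a compact recap of the in-process activation rules from this chapter, the combinations of ThreadingModel value and client apartment can be written down as a small lookup. This is just a sketch in plain Python: the function name activate and its return format are made up for illustration, and it covers only the three ThreadingModel values discussed here ("Apartment", "Free", "Both").

```python
def activate(threading_model, client_apartment):
    """Return (object_apartment, access) for the cases covered in this chapter.

    threading_model  -- "Apartment", "Free" or "Both" (the registered value)
    client_apartment -- "STA" or "MTA" (apartment of the creating thread)
    access           -- "direct" if the client gets a raw interface pointer,
                        "proxy" if COM hands back a marshaled pointer
    """
    preferred = {
        "Apartment": {"STA"},
        "Free": {"MTA"},
        "Both": {"STA", "MTA"},
    }
    if threading_model not in preferred:
        raise ValueError("unsupported ThreadingModel: %r" % (threading_model,))
    if client_apartment not in ("STA", "MTA"):
        raise ValueError("unknown apartment type: %r" % (client_apartment,))

    if client_apartment in preferred[threading_model]:
        # Compatible: the object is created in the client thread's own apartment.
        return client_apartment, "direct"

    # Incompatible: the object lives in the apartment type it asked for,
    # and the client talks to it through a proxy.
    (object_apartment,) = preferred[threading_model]
    return object_apartment, "proxy"
```

For example, activate("Free", "STA") gives ("MTA", "proxy"), the case where a tough object still pays the marshaling cost, while activate("Both", "STA") gives ("STA", "direct").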