Of course, this one is the fastest (sizeChar and BYTE_DEFINE are as defined above):

Code:
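template<typename T>
int my_sizeof_4()
{
	int c = sizeChar();	// helper defined above

	T* t = NULL;	// never dereferenced, only used for pointer arithmetic

	// t + 1 advances by one whole T, so the distance measured in chars is the size of T
	return static_cast<int>((char*)(t + 1) - (char*)t) * c;
}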
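Observe that this would not be entirely safe if we used t++ anywhere, since that would leave t itself pointing at an undefined memory location; t + 1 is more acceptable because t never changes and nothing is dereferenced. (Strictly speaking, the C++ standard still leaves arithmetic on a null pointer undefined, even though common compilers handle it as expected.) The only problem is that t + 1 implicitly uses something similar to sizeof, so it might look like cheating.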
Only my_sizeof_3 applies its own logic, because it creates objects first. But even there, somebody has to know the size of the object, so it is cheating anyhow.
What I want to say is that even if you don't use sizeof to find the size, the system must, one way or another.
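To illustrate the point that t + 1 relies on the same information as sizeof, here is a minimal sketch. It is not one of the my_sizeof variants above: size_by_pointer_step, Sample and check_matches_sizeof are hypothetical names made up for this example. It steps a pointer over a real object rather than a null pointer, and it assumes T is default-constructible.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the post above): measures T by stepping a
// pointer over a real object. Assumes T is default-constructible.
template <typename T>
std::size_t size_by_pointer_step()
{
    T obj{};        // a real object, so &obj + 1 is a valid one-past-the-end pointer
    T* p = &obj;

    // The compiler scales "+ 1" by the size of T, which is the very
    // quantity that sizeof reports.
    return static_cast<std::size_t>(
        reinterpret_cast<char*>(p + 1) - reinterpret_cast<char*>(p));
}

// Hypothetical sample type, used only to have something with padding in it.
struct Sample { char tag; double value; };

// Checks that the pointer step and sizeof agree for a few types.
inline void check_matches_sizeof()
{
    assert(size_by_pointer_step<char>() == sizeof(char));
    assert(size_by_pointer_step<int>() == sizeof(int));
    assert(size_by_pointer_step<double>() == sizeof(double));
    assert(size_by_pointer_step<Sample>() == sizeof(Sample));
}
```

Calling check_matches_sizeof() from main should pass silently: whatever the pointer step reports, sizeof reports the same, because the compiler uses the same number for both.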