As software developers we have enjoyed a long trend of consistent performance improvement from processor technology. For roughly the last 20 years, processor performance has doubled about every two years. But the truth emerging day by day is that single-threaded performance improvement is likely to slow significantly within the next one to three years; in some cases, single-thread performance may even drop. The long, sustained climb will slow dramatically. We refer to the cause of this trend as the CLIP level.
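To get a feel for what that trend has meant, here is a minimal sketch (not a benchmark, just compound arithmetic) of how doubling every two years accumulates over two decades:

```python
# Rough illustration: if single-thread performance doubles every
# two years, 20 years of that trend compounds multiplicatively.
years = 20
doubling_period = 2  # years per doubling
speedup = 2 ** (years / doubling_period)
print(speedup)  # 1024.0 -- roughly a thousandfold improvement
```

Ten doublings give a factor of about a thousand, which is why even a few years of stalled growth is such a dramatic change for software developers.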
- C - Clock frequency increases have hit a thermal wall
- L - Latency of processor to memory requests continues as the key performance bottleneck
- IP - Instruction-level Parallelism is already fully exploited by current processor and compiler technologies
This article takes a deeper look at the issues currently challenging processor performance improvement, and sketches a generic structural definition of multicore systems. It is the first part of a series.
To begin defining computing hardware and information-processing logic, we can assume that the speed of light in vacuum (c) is the ultimate limit on the speed of information processing. According to Einstein's theory of relativity, we cannot exceed it, so we can treat it as a constant of our universe.
So, think of it this way:
Let "V" be a volume of space-time, whether it contains mass, energy, or nothing at all. That volume sets an upper bound on the amount of information that can be processed in a unit of time.
The constants are "c" (the speed of light) and "h" (the Planck constant).
To approach this limit of high-speed computing, we have already tried many different techniques (pipelining, instruction-level parallelism, RISC designs, and many more). The main point of view has always been how to compute more operations in a unit time cycle.
A point comes, then, when we have reached the volume V of information that can be computed in one unit time cycle; to process more information per cycle we need another volume V, so that we can compute twice the amount of information in a unit time cycle. This is the basic viewpoint behind "atomic processors": a processor that does its entire job in one "basic" cycle, without communicating between two time states.
In a multithreading architecture, by contrast, we generally work across different time states of the computation, which is why we need communication channels between those time states.
This is only a rough sketch of why CPUs are moving toward multicore systems, explained from this atomic viewpoint. We hope to present a more general and formal mathematical definition of the "multicore mania" in future versions of this article.
We have an interesting viewpoint on multicore architecture, one that people began discussing long ago: the scalability of computation is constrained not only by thermal dissipation and processor size, but by the speed of light across the total computation process.
Consider a microprocessor running at a 1 GHz frequency, so its basic computation cycle is 1 ns (10^-9 s). The speed of light is 3*10^8 m/s in empty space, and experiments have shown that light propagates at roughly 1/3 of that speed in silicon, i.e. about 10^8 m/s.
Hence, in one clock cycle of a 1 GHz processor, light can travel a distance of 10^8/10^9 metres in silicon, i.e. 0.1 metres or 10 cm.
For a 3 GHz microprocessor, this distance drops to 10/3 cm, or roughly 3.3 cm.
Take a ruler, open your P4 box, and measure that P4 microprocessor. The P4 package is roughly a 2 cm square, so its diagonal is about 2*sqrt(2), i.e. 2.83 cm, or very roughly 3 cm. The most interesting point now arrives: in a P4 machine running at 3 GHz, a light wave can just reach any corner of the microprocessor within one clock pulse.
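The arithmetic above can be checked with a few lines of code. This is a back-of-the-envelope sketch that assumes, as stated in the text, that light in silicon travels at about 1/3 of its vacuum speed:

```python
# Distance a signal can cover in silicon during one clock cycle,
# assuming light in silicon travels at ~1/3 of its vacuum speed.
c_vacuum = 3e8            # m/s, speed of light in vacuum
c_silicon = c_vacuum / 3  # m/s, roughly 1e8 m/s in silicon

def distance_per_cycle_cm(freq_hz):
    """Distance (in cm) light travels in silicon in one clock cycle."""
    return c_silicon / freq_hz * 100  # metres -> centimetres

print(distance_per_cycle_cm(1e9))  # 10.0 cm at 1 GHz
print(distance_per_cycle_cm(3e9))  # ~3.33 cm at 3 GHz
```

At 3 GHz the per-cycle distance is already comparable to a P4 die's ~2.83 cm diagonal, which is the whole point of the measurement exercise above.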
That is the picture for 2D structures. So, to enhance the performance of a microprocessor any further, the designers have only a few options in hand:
- Go even smaller.
- Move to a 3D structure.
- Fix the frequency of the microprocessor.
- Put multiple processors inside the same package, in an SMP fashion.
These multicore directions have been visible for some time. IBM is now trying to achieve 3D structures, while Intel, AMD, and others are moving toward integrating multiple processors on a single chip. Both Intel and IBM have stopped increasing frequency; the Wikipedia page http://en.wikipedia.org/wiki/Intel_Core gives an overview.
And all of this, from arithmetic simple enough for a child, shows how non-quantum CPUs are approaching the speed-of-light limit on information processing in the coming years. The multicore idea is already spreading, with 4- and 8-core parts available on the market and larger core counts in pre-production and testing stages in the lab. Alongside this, IBM's 3D technology is also moving toward its target.
This is the first part of a series of articles aiming to explore future multicore technology. In this part, we have tried to give a simple explanatory arithmetical view of multicore processors. We hope it will help students who intend to learn about multicore technology: what it is, and why we want multicore in the coming generations.
In coming installments we will try to explain more mathematically how this will grow and what its generic mathematical explanation will be. We hope to get support from more mathematical talents to explain it simply.
Special thanks to: Giuseppe Vitillaro (http://vitillaro.org/)