Analysis of performance bottlenecks in multithreaded multiprocessor systems
Zuberek, W.M.
Fundamenta Informaticae, vol.50, no.2, pp.223-241, 2002.
Abstract:
The performance of modern multiprocessor systems is often limited
by interconnection delays or by the long latencies of memory subsystems.
Instruction-level multithreading is a technique to tolerate such long
latencies by switching from one instruction thread to another and continuing
instruction execution concurrently with the long-latency operations. Using
timed Petri net models, the paper analyzes performance limitations introduced
by different components of distributed-memory multithreaded multiprocessor
systems. Simulation results are used to compare performance improvements
obtained by replicating critical components of the system to those obtained
using components with better performance characteristics.
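The abstract weighs two remedies for a bottleneck component: replicating it, or replacing it with a faster one. The paper answers this with timed Petri net models and event-driven simulation. The sketch below swaps in a different and much simpler technique, exact Mean Value Analysis (MVA) of a closed product-form queueing network, only to illustrate the shape of the question. Each of the n threads of one processor is treated as a customer cycling through processor, memory and network stations. The `Station`/`mva` names, the demand values, and the exponential single-server assumptions are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Station:
    """Hypothetical single-server queueing station standing in for one component."""
    name: str
    demand: float  # mean service demand per thread cycle (visit ratio x service time)


def mva(stations: list[Station], n_threads: int) -> tuple[float, dict[str, float]]:
    """Exact Mean Value Analysis of a closed network with no think time.

    Returns the cycle throughput X(n) and each station's utilization X(n) * D_k.
    """
    queue = [0.0] * len(stations)  # mean queue lengths Q_k(n - 1)
    throughput = 0.0
    for n in range(1, n_threads + 1):
        # Arrival theorem: an arriving thread sees the network holding n - 1 threads.
        residence = [s.demand * (1.0 + q) for s, q in zip(stations, queue)]
        throughput = n / sum(residence)
        # Little's law applied to each station.
        queue = [throughput * r for r in residence]
    return throughput, {s.name: throughput * s.demand for s in stations}


# Hypothetical demands (time units per thread cycle); memory is the bottleneck.
BASELINE = [Station("processor", 10.0), Station("memory", 20.0), Station("network", 5.0)]
# Remedy A (assumed): one memory module twice as fast.
FASTER_MEMORY = [Station("processor", 10.0), Station("memory", 10.0), Station("network", 5.0)]
# Remedy B (assumed): two memory modules, each receiving half of the accesses.
REPLICATED_MEMORY = [
    Station("processor", 10.0),
    Station("memory0", 10.0),
    Station("memory1", 10.0),
    Station("network", 5.0),
]

if __name__ == "__main__":
    configs = [("baseline", BASELINE), ("faster", FASTER_MEMORY),
               ("replicated", REPLICATED_MEMORY)]
    for label, net in configs:
        for n in (1, 2, 4, 8):
            _, util = mva(net, n)
            print(f"{label:10s} threads={n}  processor utilization={util['processor']:.3f}")
```

In this model, adding threads raises processor utilization (the latency-tolerance effect of multithreading) until the station with the largest demand saturates. Throughput can never exceed 1/max D_k, so both remedies lift the bound from 1/20 to 1/10 here. The replicated configuration adds a station and therefore more total demand per cycle, which costs throughput at low thread counts. These numbers describe only this toy model, not the paper's results.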
Keywords:
Instruction-level multithreading, distributed-memory multiprocessor systems,
timed Petri nets, performance analysis, performance bottlenecks,
event-driven simulation.
References:
-
Agarwal, A.: "Performance tradeoffs in multithreaded processors", IEEE
Trans. on Parallel and Distributed Systems, vol.3, no.5, 1992, pp.525-539.
-
Ajmone Marsan, M., Conte, G., Balbo, G.: "A class of generalized stochastic
Petri nets for the performance evaluation of multiprocessor systems",
ACM Trans. on Computer Systems, vol.2, no.2, 1984, pp.93-122.
-
Bause, F., Kritzinger, P.S.: Stochastic Petri nets - an introduction
to the theory (Academic Studies in Computer Science), Vieweg Publ.,
Wiesbaden, 1996.
-
Boothe, B., Ranade, A.: "Improved multithreading techniques for hiding
communication latency in multiprocessors",
Proc. 19-th Annual Int. Symp. on Computer Architecture,
Gold Coast, Australia, 1992, pp.214-223.
-
Burger, D., Goodman, J.R., Kaegi, A.: "Memory bandwidth limitations of future
microprocessors", Proc. 23-rd Annual Int. Symp. on Computer Architecture,
Philadelphia, PA, 1996, pp.78-89.
-
Byrd, G.T., Holliday, M.A.: "Multithreaded processor architectures", IEEE
Spectrum, vol.32, no.8, 1995, pp.38-46.
-
Chen, T-F., Baer, J-L.: "A performance study of software and hardware data
prefetching schemes", Proc. 21-st Annual Int. Symp. on Computer
Architecture, Chicago, IL, 1994, pp.223-232.
-
Ding, C., Kennedy, K.: "The memory bandwidth bottleneck and its amelioration
by a compiler", Proc. 14-th Int. Parallel and Distributed Processing
Symp., Cancun, Mexico, 2000, pp.181-189.
-
Eggers, S.J., Emer, J.S., Levy, H.M., Lo, J.L., Stamm, R.L., Tullsen, D.M.:
"Simultaneous multithreading - a platform for next generation processors",
IEEE Micro, vol.17, no.5, 1997, pp.12-19.
-
Govindarajan, R., Suciu, F., Zuberek, W.M.: "Timed Petri net models of
multithreaded multiprocessor architectures", Proc. 7-th Int. Workshop
on Petri Nets and Performance Models, St. Malo, France, 1997, pp.153-162.
-
Hamilton, S.: "Taking Moore's law into the next century", IEEE Computer
Magazine, vol.32, no.1, 1999, pp.43-48.
-
Jain, R.: The art of computer systems performance analysis,
J. Wiley & Sons, New York, 1991.
-
Jensen, K.: "Coloured Petri nets", in
Advanced Course on Petri Nets 1986 (Rozenberg, G., Ed.), LNCS 254,
Springer-Verlag, 1987, pp.248-299.
-
Klaiber, A.C., Levy, H.M.: "An architecture for software-controlled data
prefetching", Proc. 18-th Annual Int. Symp. on Computer Architecture,
Toronto, Canada, 1991, pp.43-53.
-
Loh, K.S., Wong, W.F.: "Multiple context multithreaded superscalar processor
architecture", Journal of Systems Architecture, vol.46, no.3, 2000,
pp.243-258.
-
Merlin, P.M., Farber, D.J.: "Recoverability of communication protocols -
implications of a theoretical study", IEEE Trans. on Communications,
vol.24, no.9, 1976, pp.1036-1049.
-
Murata, T.: "Petri nets: properties, analysis and applications",
Proceedings of the IEEE, vol.77, no.4, 1989, pp.541-580.
-
Reisig, W.: Petri nets - an introduction (EATCS Monographs on
Theoretical Computer Science 4), Springer-Verlag, Berlin, 1985.
-
Rixner, S., Dally, W.J., Kapasi, U.J., Mattson, P., Owens, J.D.: "Memory
access scheduling", Proc. 27-th Annual Int. Symp. on Computer Architecture,
Vancouver, BC, 2000, pp.128-138.
-
Rogers, A., Li, K.: "Software support for speculative loads", Proc. 5-th
Symp. on Architectural Support for Programming Languages and Operating
Systems, 1992, pp.38-50.
-
Tullsen, D.M., Eggers, S.J., Levy, H.M.:
"Simultaneous multithreading: maximizing on-chip parallelism", Proc.
22-nd Annual Int. Symp. on Computer Architecture, Santa Margherita Ligure,
Italy, 1995, pp.392-403.
-
Wilkinson, B.: Computer architectures - design and performance,
Prentice Hall Europe, London, 1996.
-
Zuberek, W.M.:
"Timed Petri nets - definitions, properties and applications",
Microelectronics and Reliability (Special Issue on Petri Nets and
Related Graph Models), vol.31, no.4, 1991, pp.627-644.
-
Zuberek, W.M.:
"Performance modeling of multithreaded distributed memory architectures",
Proc. 2-nd Workshop on Hardware Design and Petri Nets, Williamsburg, VA,
1999, pp.63-82.
-
Zuberek, W.M.:
"Approximate performance evaluation of multithreaded distributed-memory
architectures", Proc. 15-th Performance Engineering Workshop, Bristol, UK,
1999, pp.81-92.
-
Zuberek, W.M.:
"Analysis of pipeline stall effects in block multithreaded multiprocessors",
Proc. 16-th Performance Engineering Workshop, Durham, UK, 2000, pp.187-198.