Research Area: Smart memories
Over the last several years, uniprocessor performance scaling has slowed significantly because of power dissipation limits and the exhausted benefits of deeper pipelining and instruction-level parallelism. To continue scaling performance, microprocessor designers switched to Chip Multi-Processors (CMPs). The key issue for continued performance scaling is now the development of parallel software applications that can exploit the performance potential of CMPs. Because developing such applications with traditional shared memory programming models is difficult, researchers have proposed new parallel programming models such as streaming and transactions. While these models are attractive for certain types of applications, they are likely to co-exist with existing shared memory applications.
We designed a polymorphic Chip Multi-Processor architecture, called Smart Memories, which can be configured to work in any of these three programming models. The design of the Smart Memories architecture is based on the observation that these programming models differ in the semantics of their memory operations. Thus, the focus of the Smart Memories project was on the design of a reconfigurable memory system. All memory systems have the same fundamental hardware resources, such as data storage and interconnect. They differ in the control logic and in how the control state associated with the data is manipulated. The Smart Memories architecture combines reconfigurable memory blocks, which have data storage and metadata bits used for control state, with programmable protocol controllers to map the shared memory, streaming, and transactional models with little overhead.
Our results show that the Smart Memories architecture achieves good performance scalability. We also designed a test chip that implements the Smart Memories architecture. It contains eight Tensilica processors and the reconfigurable memory system. The dominant hardware overhead in the test chip came from the use of flip-flops to build some of the specialized memory structures that we required. Since previous work has shown that this overhead can be made small, our test chip confirmed that the hardware overhead for reconfigurability would be modest.
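The idea of a memory block whose per-word metadata bits are reinterpreted by a programmable controller can be illustrated with a small software model. This is only a sketch under stated assumptions, not the hardware design: the names `Mode`, `MemoryBlock`, `META_BIT_NAMES`, and `write`, the choice of two metadata bits per word, and the meaning given to each bit in each mode are all hypothetical and chosen for illustration.

```python
from enum import Enum


class Mode(Enum):
    SHARED_MEMORY = "shared_memory"
    STREAMING = "streaming"
    TRANSACTIONS = "transactions"


# Hypothetical meaning of the two metadata bits per word in each mode.
META_BIT_NAMES = {
    Mode.SHARED_MEMORY: ("valid", "dirty"),
    Mode.STREAMING: ("unused0", "unused1"),
    Mode.TRANSACTIONS: ("spec_read", "spec_written"),
}


class MemoryBlock:
    """Data words plus per-word metadata bits; the mode decides their meaning."""

    def __init__(self, size, mode):
        self.size = size
        self.mode = mode
        self.data = [0] * size
        self.meta = [0] * size

    def reconfigure(self, mode):
        # Same storage, new interpretation: clear the control state.
        self.mode = mode
        self.meta = [0] * self.size

    def set_bit(self, index, name):
        bit = META_BIT_NAMES[self.mode].index(name)
        self.meta[index] |= 1 << bit

    def test_bit(self, index, name):
        bit = META_BIT_NAMES[self.mode].index(name)
        return bool(self.meta[index] & (1 << bit))


def write(block, index, value):
    """Toy stand-in for how a protocol controller might handle a write."""
    block.data[index] = value
    if block.mode is Mode.SHARED_MEMORY:
        # Treat the word as a cached copy that now differs from memory.
        block.set_bit(index, "valid")
        block.set_bit(index, "dirty")
    elif block.mode is Mode.TRANSACTIONS:
        # Record the word as part of the speculative write set.
        block.set_bit(index, "spec_written")
    # Streaming mode: plain scratchpad write, no metadata update.
```

In this model the data array never changes shape; only the control logic in `write` and the bit names differ between modes, which mirrors the observation that the programming models differ in how control state is manipulated rather than in the storage itself.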
This thesis describes the polymorphic Smart Memories architecture, shows how three different models (shared memory, streaming, and transactions) can be mapped onto it, and presents performance evaluation results for applications written for these three models. We found that the flexibility of the Smart Memories architecture has other benefits in addition to better performance. It helped to simplify and optimize complex software runtime systems such as the Stream Virtual Machine or a transactional runtime, and it can be used for various semantic extensions of a particular programming model. For example, we implemented fast synchronization operations in shared memory mode that use the metadata bits associated with each data word as fine-grain locks.
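A per-word lock bit of this kind can be sketched in software as follows. This is a minimal single-threaded model that assumes one lock bit per word and two hypothetical operations, `load_and_lock` and `store_and_unlock`; the actual operations and their semantics are not specified in this abstract. In hardware, the test and set of the bit would be performed atomically by the memory system, which this model does not attempt to reproduce.

```python
class TaggedMemory:
    """Words with one hypothetical lock bit each (illustrative model only)."""

    def __init__(self, size):
        self.data = [0] * size
        self.lock_bit = [False] * size

    def load_and_lock(self, addr):
        # Succeeds only if the word is unlocked; sets the lock bit on success.
        if self.lock_bit[addr]:
            return False, None
        self.lock_bit[addr] = True
        return True, self.data[addr]

    def store_and_unlock(self, addr, value):
        # Writes the word and clears its lock bit in one operation.
        self.data[addr] = value
        self.lock_bit[addr] = False


def atomic_increment(mem, addr):
    """Read-modify-write of one word guarded by that word's own lock bit."""
    while True:
        ok, value = mem.load_and_lock(addr)
        if ok:
            mem.store_and_unlock(addr, value + 1)
            return value + 1
        # Otherwise another holder has the word; retry.
```

Because the lock lives in the metadata of the word it protects, locking one word does not block access to its neighbors, and no separate lock variable has to be allocated or fetched.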