Stanford VLSI Group Homepage

Latency Sensitive FMA Design
Research Area: Low-power
Year: 2011
Type of Publication: In Proceedings
Keywords: Fused Multiply Add
Book title: Proceedings of 20th IEEE Symposium on Computer Arithmetic
Pages: 129-138
The implementation of merged floating-point multiply-add operations can be optimized in many ways. For latency-sensitive applications, our cascade design reduces the accumulation-dependent latency by 2x over a fused design, at the cost of a 13% increase in non-accumulation-dependent latency. A simple in-order execution model shows that this design is superior in most applications, providing a 12% average reduction in FP stalls and improving performance by up to 6%. Simulations of superscalar out-of-order machines show a 4% average improvement in CPI in 2-way machines and 4.6% in 4-way machines. The cascade design has the same area and energy budget as a traditional fused multiply-add (FMA) unit.
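
The latency trade-off stated in the abstract can be illustrated with a back-of-the-envelope calculation. The sketch below is not the in-order execution model or the out-of-order simulations from the paper; it is a simplified weighted average that assumes every dependent operation waits for its full producer latency, reads "2x reduction" as a ratio of 0.5 relative to the fused design, and reads "13% increase" as a ratio of 1.13. The function names and the `acc_fraction` parameter are hypothetical and introduced only for illustration.

```python
def relative_dependent_latency(acc_fraction, acc_ratio=0.5, non_acc_ratio=1.13):
    """Average dependent latency of the cascade design relative to a fused
    design (1.0 means equal), under a simple weighted-average assumption.

    acc_fraction  -- assumed share of dependencies that go through the
                     accumulation operand (hypothetical workload parameter)
    acc_ratio     -- cascade/fused latency ratio on the accumulation path
                     (0.5, from the 2x reduction in the abstract)
    non_acc_ratio -- cascade/fused latency ratio on the other paths
                     (1.13, from the 13% increase in the abstract)
    """
    if not 0.0 <= acc_fraction <= 1.0:
        raise ValueError("acc_fraction must be between 0 and 1")
    return acc_fraction * acc_ratio + (1.0 - acc_fraction) * non_acc_ratio


def break_even_fraction(acc_ratio=0.5, non_acc_ratio=1.13):
    """Accumulation-dependent share at which both designs have the same
    average dependent latency in this simplified model."""
    return (non_acc_ratio - 1.0) / (non_acc_ratio - acc_ratio)


if __name__ == "__main__":
    print(f"break-even share: {break_even_fraction():.3f}")
    for share in (0.0, 0.25, 0.5, 1.0):
        ratio = relative_dependent_latency(share)
        print(f"accumulation share {share:.2f}: relative latency {ratio:.3f}")
```

Under these assumptions the break-even share is 0.13 / 0.63, roughly one dependency in five; workloads with a larger share of accumulation-dependent operations would favor the cascade design in this model. Real speedups depend on issue width, scheduling, and how often the FP latency is actually exposed, which is what the paper's simulations measure.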