Area-Efficient Evaluation of Arithmetic Expressions
Using Deeply Pipelined Floating-Point Cores
It has become possible to implement floating-point cores on FPGAs in an effort to
provide hardware acceleration for applications that require high-performance floating-point arithmetic. To reach high clock rates, these cores must be deeply pipelined. Because of this deep pipelining requirement and the
complexity of floating-point arithmetic, floating-point cores use
a great deal of the FPGA's area. An
area-efficient architecture and algorithm for the evaluation of
arithmetic expressions are described here. The design problem
is to develop algorithms and architectures that can hide the
pipeline latency so that as few floating-point cores as possible
are used in a design.
PROBLEM DEFINITION
The arithmetic expression evaluation problem is to compute
the value of an expression that is represented by a tree in
which the leaves are numeric values and the internal nodes
are operators. Here the tree is a complete binary tree with k levels and n = 2^k inputs. The problem can be solved directly with n-1 floating-point
cores, one for each operation as dictated by the expression. In that architecture, however, the floating-point cores do not
take new inputs on every clock cycle; rather, they are only
active periodically. The goal is therefore to use as few floating-point cores as possible.
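To make the problem statement concrete, the following Python sketch evaluates a complete binary expression tree level by level and counts the operations performed. It is only an illustration of the definition, not the architecture from the report; the function name `evaluate_tree` and the `ops_by_level` layout are assumptions introduced here.

```python
from operator import add, mul

def evaluate_tree(leaves, ops_by_level):
    """Evaluate a complete binary expression tree bottom-up.

    leaves:       list of n = 2**k numeric values.
    ops_by_level: list of k lists of binary operators; ops_by_level[0]
                  is the level nearest the leaves (hypothetical layout).
    Returns (value, number_of_operations).
    """
    values = list(leaves)
    op_count = 0
    for ops in ops_by_level:
        # Each level pairs up adjacent values, halving their number.
        assert len(ops) == len(values) // 2
        values = [op(values[2 * i], values[2 * i + 1])
                  for i, op in enumerate(ops)]
        op_count += len(ops)
    return values[0], op_count

# (1 + 2) + (3 * 4) with n = 4 inputs and k = 2 levels
value, n_ops = evaluate_tree([1.0, 2.0, 3.0, 4.0], [[add, mul], [add]])
print(value, n_ops)  # 15.0 3
```

The operation count is always n-1, which is why a direct hardware mapping needs n-1 cores.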
SOLUTION TO THE PROBLEM
Basic Case:
In the reduced form of the problem, there is only a
single type of operator in the expression, and this operator is
commutative and associative. The architecture for the basic case has one alpha-stage pipelined floating-point core, a counter, some control circuitry, and some buffer levels.
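The idea of reusing one pipelined core can be sketched with a small cycle-level model in Python. This is a simplified illustration under stated assumptions, not the report's algorithm: the buffer levels and counter are replaced by a single operand queue, and the issue policy (issue whenever two operands are waiting) is hypothetical. The name `reduce_with_one_core` is introduced here for illustration.

```python
from collections import deque
from operator import add

def reduce_with_one_core(inputs, alpha, op):
    """Reduce a non-empty list of inputs with a single alpha-stage core.

    Returns (result, operations_issued, cycles_elapsed).
    Relies on op being commutative and associative, since operands
    are paired in whatever order they become available.
    """
    buffer = deque(inputs)            # operands waiting to be issued
    pipeline = deque([None] * alpha)  # one slot per pipeline stage
    in_flight = 0
    issued = 0
    cycles = 0
    while in_flight > 0 or len(buffer) > 1:
        # The oldest slot leaves the pipeline; a result goes back to the buffer.
        done = pipeline.popleft()
        if done is not None:
            buffer.append(done)
            in_flight -= 1
        # Issue a new operation if two operands are waiting, else a bubble.
        if len(buffer) >= 2:
            a, b = buffer.popleft(), buffer.popleft()
            pipeline.append(op(a, b))
            in_flight += 1
            issued += 1
        else:
            pipeline.append(None)
        cycles += 1
    return buffer[0], issued, cycles

result, issued, cycles = reduce_with_one_core(
    [float(i) for i in range(1, 9)], alpha=4, op=add)
print(result, issued)  # 36.0 7
```

The single core still performs all n-1 operations; the pipeline is kept busy with independent pairs while earlier results are in flight, which is how the latency is hidden in this model.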
Developing a Solution to the Advanced Case
Assume that there are two circuits implementing the architecture
and algorithm for the basic case, one that performs
addition and the other that performs multiplication. We make
the assumption that the floating-point cores used in each circuit
have the same number of pipeline stages, so they will
write to their respective buffers at the same time. By selecting the appropriate result
for each buffer write, any complete-binary-tree expression that uses only addition and multiplication can be evaluated.
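The selection step described above can be sketched functionally in Python. This is a minimal model that ignores timing: both an "adder" and a "multiplier" result are computed for every node, and a per-node selector picks which one is written to the buffer. The function name `evaluate_add_mul_tree` and the `select_by_level` encoding ('+' or '*') are assumptions made for this example.

```python
def evaluate_add_mul_tree(leaves, select_by_level):
    """Evaluate a complete binary tree of additions and multiplications.

    leaves:          list of n = 2**k numeric values.
    select_by_level: list of k lists of '+' or '*'; select_by_level[0]
                     is the level nearest the leaves (hypothetical encoding).
    """
    values = list(leaves)
    for selects in select_by_level:
        next_values = []
        for j, sel in enumerate(selects):
            a, b = values[2 * j], values[2 * j + 1]
            sum_result = a + b    # what the addition circuit would produce
            prod_result = a * b   # what the multiplication circuit would produce
            # Both results are available together; keep the one the
            # expression calls for at this node.
            next_values.append(sum_result if sel == '+' else prod_result)
        values = next_values
    return values[0]

# (1 + 2) * (3 * 4)
total = evaluate_add_mul_tree([1.0, 2.0, 3.0, 4.0], [['+', '*'], ['*']])
print(total)  # 36.0
```

Because both circuits have the same pipeline depth, the two candidate results for a node arrive together, so the selection reduces to a simple choice at each buffer write.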
Full report:
http://halcyon.usc.edu/~pk/prasannawebsi...sa2005.pdf