
Multi-input Serial Adders for FPGA-like Computational Fabric

Herman Schmit
Matthew Denton


In this paper, we present a new functional unit to replace the K-LUT in an FPGA-like computational fabric designed specifically to accelerate instance-specific sparse integer matrix multiplication. We evaluate this architectural idea using a suite of matrices, the VPR place-and-route tool \cite{vpr}, and modern architecture representations of the interconnect \cite{global-new-local}. The new cell, called the K-ADD, increases density by 2.5x to 4x and increases net performance by 8\% to 30\% in a 16nm implementation. This benefit magnifies the two-orders-of-magnitude advantage of instance-specific matrix multipliers demonstrated in \cite{denton2021direct}. We also investigate the cluster size, N, across multiple technology nodes in an experiment similar to \cite{global-new-local}. In that investigation, netlists that use the K-ADD show a relationship to cluster size similar to that of conventional netlists. When matrices are mapped to LUTs, however, clustering provides very little delay benefit.
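
The abstract does not describe the internals of the K-ADD, so the following is only a generic illustration of what multi-input bit-serial addition means, not the authors' design. It assumes unsigned operands streamed least-significant bit first, and the names `serial_add`, `to_bits`, and `from_bits` are hypothetical helpers introduced here for the sketch.

```python
def to_bits(value, width):
    """Return `value` as a list of `width` bits, least-significant bit first."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits):
    """Rebuild an unsigned integer from an LSB-first list of bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def serial_add(streams, n_cycles):
    """Add K LSB-first bit streams, consuming one bit per stream per cycle.

    Assumption: the carry is kept as a small integer between cycles,
    since adding K bits at once can carry more than a single bit.
    """
    carry = 0
    out = []
    for t in range(n_cycles):
        total = carry + sum(stream[t] for stream in streams)
        out.append(total & 1)   # emit this cycle's sum bit
        carry = total >> 1      # keep the remainder for the next cycle
    return out


if __name__ == "__main__":
    width = 8
    operands = [3, 5, 7, 9]  # K = 4 inputs, chosen arbitrarily
    result = serial_add([to_bits(v, width) for v in operands], width)
    print(from_bits(result))
```

In this sketch the output width must be large enough to hold the full sum; otherwise the final carry is simply dropped.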