FPGA Implementation of a Decimal Floating-Point Coprocessor with Accurate Scalar Product Unit

Area and Power Performance Analysis of a Floating-Point Based Application on FPGAs

Almost all signal processing algorithms are initially specified in double-precision floating point in languages such as Matlab. For hardware implementation, these algorithms have to be converted to large-bitwidth fixed point to retain a sufficiently large dynamic range. However, the inevitable quantization effects and the complexity of converting a floating-point algorithm into a fixed-point one limit the use of fixed-point arithmetic for high-precision embedded computing. FPGAs have become an attractive option for implementing computationally intensive applications. However, the common perception has been that efficient FPGA implementations of floating-point arithmetic carry large performance, area, and power overheads compared to fixed-point arithmetic. With recent technology advances, FPGA densities are increasing at a rate at which area considerations are becoming less significant. These advances have also reduced the performance and power overhead of floating-point arithmetic. With appropriate designs, floating-point applications can even be more efficient than fixed-point ones for large bitwidths, and the overheads in the context of the overall application can be quite low. In this paper, we present a preliminary area and power performance analysis of double-precision matrix multiplication, an extensively used kernel in embedded computing, and show that FPGAs are good candidates for implementing high-precision floating-point applications when compared to a general-purpose processor. Many FPGA-based floating-point units, both open source [2] and commercial [1], are currently available. However, most of them support only single-precision floating-point operations and do not exploit the recent advances in FPGAs. Moreover, an area and power performance analysis of floating-point units in the context of a common application is lacking.
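
To illustrate the quantization and dynamic-range trade-off this abstract refers to, the following minimal C sketch round-trips double-precision values through a hypothetical Q4.27 fixed-point format. The format choice and helper names are illustrative assumptions, not taken from the paper:

```c
#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Hypothetical Q4.27 fixed-point format: 1 sign bit, 4 integer bits,
 * 27 fractional bits in a 32-bit word. Representable range is roughly
 * [-16, 16) with a resolution of 2^-27. */
#define FRAC_BITS 27
#define SCALE     ((double)(1 << FRAC_BITS))

static int32_t to_fixed(double x)    { return (int32_t)lround(x * SCALE); }
static double  from_fixed(int32_t q) { return (double)q / SCALE; }

int main(void) {
    /* pi survives with error below 2^-27; 1e-12 lies far under the
     * quantization step and collapses to exactly zero. */
    const double samples[] = { 3.14159265358979, 1e-12 };
    for (int i = 0; i < 2; i++) {
        double back = from_fixed(to_fixed(samples[i]));
        printf("x = %.15g  round-trip = %.15g  abs error = %g\n",
               samples[i], back, fabs(samples[i] - back));
    }
    return 0;
}
```

Widening the fixed-point word extends the usable dynamic range, but at a hardware cost that grows with the bitwidth, which is exactly the regime where the abstract argues floating point becomes competitive.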
Designing an IEEE Floating-Point Unit with Configurable Compliance Support and Precision for FPGA-based Soft-processors

Field Programmable Gate Arrays (FPGAs) are commonly used to accelerate floating-point applications. Advances in FPGA technology and the introduction of the RISC-V Instruction Set Architecture (ISA) have together enabled a number of soft-processor designs. Although researchers have extensively studied FPGA-based floating-point implementations, existing work has largely focused on standalone, frequency-optimized datapath designs. These are not suitable for soft-processors targeting FPGAs because of the units' long latency and the soft-processors' innate frequency ceiling. Furthermore, the few existing integrated Floating Point Unit (FPU) implementations targeting FPGA-based soft-processors are not IEEE 754 compliant. We present a floating-point unit for FPGA-based RISC-V soft-processors that is fully IEEE compliant and configurable. Our design focuses on maximizing runtime performance with efficient resource utilization. Users can configure the FPU to four levels of compliance or select reduced-precision configurations. Benchmarking against a set of real-world floating-point applications, we evaluate the FPU variants in terms of resource usage, operating frequency, runtime performance, and performance efficiency. We also present trade-off analyses of two microarchitecture design choices. Our fully compliant FPU uses 5423 Look-Up Tables (LUTs) and achieves an operating frequency of 105 MHz. The key results demonstrate the effect of running floating-point workloads on reduced-compliance FPUs. Our experiments show that narrowing the Fused Multiply-Add (FMA) unit's intermediate representation leads to a 25% reduction in LUT usage, which translates to an average 46% increase in performance efficiency. Additionally, disabling denormal support reduces resource utilization by 10% and improves the clock frequency by 6%, yielding a 14% higher performance efficiency while having no impact on result accuracy for our benchmark applications. Furthermore, running applications in reduced precision can improve runtime performance by up to 75%, although applications may suffer a significant loss of precision.
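
As a minimal illustration of the two compliance knobs the abstract measures (FMA intermediate rounding and denormal support), the C sketch below contrasts a fused multiply-add with a separately rounded multiply-add and prints a subnormal value that flush-to-zero hardware would lose. It demonstrates standard IEEE 754 behavior on a host CPU and is not the paper's hardware design:

```c
#include <stdio.h>
#include <math.h>
#include <float.h>

int main(void) {
    /* fma(a, b, c) computes a*b + c with a single rounding step.
     * A separate multiply and add round twice, losing the tiny
     * residual -- analogous to what a narrower FMA intermediate
     * format inside an FPU datapath would discard. */
    double a = 1.0 + DBL_EPSILON;
    double b = 1.0 - DBL_EPSILON;
    double c = -1.0;
    printf("fma(a,b,c) = %.17g\n", fma(a, b, c)); /* -DBL_EPSILON^2, exact */
    printf("a*b + c    = %.17g\n", a * b + c);    /* rounds to 0 */

    /* Halving the smallest normal double yields a subnormal value.
     * Hardware without denormal support flushes it to zero instead. */
    double sub = DBL_MIN / 2.0;
    printf("DBL_MIN/2  = %g (nonzero only with denormal support)\n", sub);
    return 0;
}
```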