Research

I work at the intersection of VLSI design, computer architecture, and machine learning to codesign solutions across the hardware-software computing stack with the goal of overcoming fundamental limitations we now face due to the end of Dennard Scaling — and bringing these proof-of-concepts into physical realizations via modern chip tape-outs. Here are a few highlights of my research work so far:

 

1) A novel and hardware-friendly floating-point based encoding datatype, AdaptivFloat, which enables highly resilient and energy-efficient quantized computations. AdaptivFloat dynamically maximizes and optimally clips its available dynamic range, at a tensor granularity, in order to create faithful encodings of neural network parameters.  In doing so, AdaptivFloat produces higher inference accuracies at low bit precision compared to many other prominent datatypes used in deep learning computations. 

AdaptivFloat Quantization

AdaptivFloat quantization scheme based on exponential bias shift from maximum absolute tensor value
Source:

2) A 16nm system-on-chip for noise-robust speech recognition and NLP featuring hardware acceleration of sequence-to-sequence attention-based DNNs with AdaptivFloat-based processing elements, and unsupervised Bayesian-based speech denoising. The test chip executes a full speech-enhancing ASR pipeline while consuming 2.24mJ of energy per frame and 18ms end-to-end latency -- enabling real-time throughput.

sm6_chip

Source:

3) As newer Transformer-based pre-trained models continue to generate impressive breakthroughs in language modeling, they characteristically exhibit complexities that levy hefty latency, memory, and energy taxes on resource-constrained, battery-operated devices. EdgeBERT provides a principled latency-driven methodology to alleviate these computational challenges in both the algorithm and hardware architecture layers. EdgeBERT adopts first-layer early exit prediction in order to perform sentence-level dynamic voltage-frequency scaling (DVFS), for minimal energy consumption while adhering to a prescribed target latency.

 

EdgeBERT HW/SW co-design

Memory and latency optimizations for minimizing multi-task NLP energy consumption
Source: