My work aims to co-optimize natural language processing (NLP) across the algorithm, hardware architecture, and solid-state layers. This endeavor has so far led to the following efforts:


1) A novel and hardware-friendly floating-point-based encoding datatype, AdaptivFloat, which enables highly resilient and energy-efficient quantized computations. AdaptivFloat dynamically maximizes and optimally clips its available dynamic range at a tensor granularity in order to create faithful encodings of neural network parameters. In doing so, AdaptivFloat produces higher inference accuracy at low bit precision than many other prominent datatypes used in deep learning computations.

AdaptivFloat Quantization

AdaptivFloat quantization scheme based on an exponent bias shift derived from the maximum absolute tensor value
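The per-tensor exponent bias shift described above can be sketched as follows. This is a minimal illustration, not the actual AdaptivFloat implementation: the function name, bit-field split, and subnormal handling are assumptions chosen to show the core idea of anchoring the datatype's dynamic range at the tensor's maximum absolute value.

```python
import numpy as np

def adaptivfloat_quantize(tensor, n_bits=8, n_exp=3):
    # Hypothetical sketch: shift the exponent bias per tensor so the
    # largest representable magnitude covers max|tensor|, then apply
    # round-to-nearest on the mantissa.
    n_mant = n_bits - 1 - n_exp            # 1 bit reserved for sign
    max_abs = float(np.max(np.abs(tensor)))
    if max_abs == 0.0:
        return np.zeros_like(tensor, dtype=float)
    # Anchor the top of the exponent range at max_abs.
    exp_max = int(np.floor(np.log2(max_abs)))
    exp_min = exp_max - (2 ** n_exp - 2)
    largest = 2.0 ** exp_max * (2.0 - 2.0 ** -n_mant)
    smallest = 2.0 ** exp_min
    sign = np.sign(tensor)
    mag = np.abs(tensor).astype(float)
    underflow = mag < smallest / 2.0       # too small: flush to zero
    mag = np.clip(mag, smallest, largest)  # clip into the dynamic range
    # Quantize the mantissa at each value's own exponent.
    e = np.floor(np.log2(mag))
    scale = 2.0 ** (e - n_mant)
    q = np.clip(np.round(mag / scale) * scale, smallest, largest)
    return sign * np.where(underflow, 0.0, q)
```

Because the exponent range follows each tensor's own magnitude distribution, no representable codes are wasted on values the tensor never takes, which is what preserves accuracy at low bit widths.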

2) A 16nm system-on-chip for noise-robust speech recognition featuring hardware acceleration of attention-based DNNs with AdaptivFloat-based processing elements, and Bayesian speech denoising.



3) As newer Transformer-based pre-trained models continue to generate impressive breakthroughs in language modeling, they characteristically exhibit complexities that levy hefty latency, memory, and energy taxes on resource-constrained embedded platforms. EdgeBERT provides an in-depth and principled methodology to alleviate these computational challenges at both the algorithm and hardware architecture layers. Furthermore, EdgeBERT investigates to what extent the very dense, albeit stochastic, storage capabilities of emerging non-volatile memories can be exploited to satisfy the always-on and intermediate computing requirements of fully on-chip multi-task NLP.

BERT Optimizations

Memory and latency optimizations incorporated under the EdgeBERT methodology
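One representative latency optimization in this space is entropy-based early exit: after each Transformer layer, a lightweight classifier head scores the current hidden state, and inference stops as soon as the prediction is confident enough. The sketch below is illustrative only; the function names, dummy layer structure, and threshold value are assumptions, not EdgeBERT's actual implementation.

```python
import numpy as np

def entropy(probs):
    # Shannon entropy of a probability vector (in nats).
    p = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(p * np.log(p)))

def early_exit_inference(layers, hidden, threshold=0.2):
    # `layers` is a list of (layer_fn, head_fn) pairs: layer_fn advances
    # the hidden state, head_fn maps it to class probabilities. Exit as
    # soon as the head's prediction entropy drops below the threshold.
    probs = None
    for i, (layer_fn, head_fn) in enumerate(layers):
        hidden = layer_fn(hidden)
        probs = head_fn(hidden)
        if entropy(probs) < threshold:
            return int(np.argmax(probs)), i + 1   # (label, layers used)
    return int(np.argmax(probs)), len(layers)

# Toy usage: identity layers whose heads grow more confident with depth.
def make_layer(conf):
    return (lambda h: h, lambda h: np.array([conf, 1.0 - conf]))

layers = [make_layer(0.5), make_layer(0.99), make_layer(0.999)]
label, used = early_exit_inference(layers, np.zeros(4), threshold=0.2)
```

Skipping the remaining layers for easy inputs cuts both latency and energy, since the saved computation never has to be fetched from memory or executed at all.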