I am currently enrolled in a Master’s program for Electrical Science and Technology at the School of Electronic Information and Electrical Engineering (电子信息与电气工程学院), Shanghai Jiao Tong University (SJTU), Shanghai, China, and under the supervision of Professor Guanghui He.

My current research interests focus on Efficient ASIC Architecture Design for Emerging Applications.

  • Efficient Hardware Accelerators for Autonomous Driving Systems
    • Preprocessing: feature learning network (DAC 2023, TVLSI 2024)
    • Backbone: 3D sparse convolution (ICCAD 2023)
    • Multi-modality Fusion: vision-centric 3D perception (DAC 2024, TCAS-I 2025)
    • Data Compression Techniques for High-bandwidth Requirement from Future Neural Networks (TCAS-II 2024)
  • Efficient Hardware Accelerators for Neural Radiance Field (NeRF)
    • Hardware/Software Co-Design for NERF based 3D reconstruction (TCAD 2025)
  • Efficient Hardware Accelerators for Transformers
    • KV Cache Compression (2 x DAC 2025)
    • Bit-Serial Acceleration (DAC 2025)
    • Processing-in-Memory (DAC 2025)

🔥 News

  • 2025.04:  🎉🎉 One co-authored paper about Acceleration for Vision-Centric 3D Perception is accepted by TCAS-I.
  • 2025.02:  🎉🎉 Our paper about DSA’s Efficiency Optimization on KV Cache in LLMs is accepted by DAC. One co-authored paper about Near Memory Processing for Generative LLM inference is accepted by DAC. One co-authored paper about Bit-serial Neural Network Acceleration is accepted by DAC.
  • 2024.11:  🎉🎉 Our paper about Neural Rendering Acceleration is accepted by TCAD.
  • 2024.03:  🎉🎉 The preprint version of our DAC’24 paper DEFA: Deformable Attention Accelerator is available on ArXiv.
  • 2024.02:  🎉🎉 One co-authored paper about Deformable Attention Acceleration is accepted by DAC’24.
  • 2024.02:  🎉🎉 One co-authored paper about NN data compression engine is accepted by IEEE TCAS-II.
  • 2024.01:  🎉🎉 The extended paper of our DAC’23 is accepted by IEEE TVLSI.
  • 2023.07:  🎉🎉 One co-authored paper about 3D Sparse Convolution Accelerator is accepted by ICCAD’23.
  • 2023.02:  🎉🎉 One co-authored paper about Feature Learning Network Acceleration of Point Clouds is accepted by DAC’23.

📝 Publications

$^{\dagger}$: equal contribution.

Selected Publications

TVLSI 2024
sym

FLNA: Flexibly Accelerating Feature Learning Networks for Large-Scale Point Clouds with Efficient Dataflow Decoupling

Dongxu Lyu$^{\dagger}$, Zhenyu Li$^{\dagger}$, Yuzhou Chen, Gang Wang, Weifeng He, Ningyi Xu and Guanghui He

  • The first grid-based feature learning network accelerator with algorithm-architecture co-optimization for large-scale point clouds.
  • It demonstrates substantial performance boost over the state-of-the-art point cloud accelerators while providing superior support of large-scale point clouds ($>10^6$ points in $\sim2$ ms).
  • An extension of our DAC’24 paper.
  • IEEE Transactions on Very Large Scale Integration (VLSI) Systems
TCAD 2025
sym

Neural Rendering Acceleration with Deferred Neural Decoding and Voxel-Centric Data Flow

Yuzhou Chen$^{\dagger}$, Zhenyu Li$^{\dagger}$, Dongxu Lyu, Yansong Xu and Guanghui He

  • Proposed a deferred neural decoding method to aggregate the network queries, reducing the computation workload by $85.6\%$ compared to the original algorithm and only incurs $<0.5\%$ loss in rendering quality.
  • Implemented a highly-pipelined accelerator utilizing voxel-centric dataflow, which achieves up to $2.9\times$ throughput, $36.5\times$ energy efficiency over the state-of-the-art related accelerators.
  • IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
DAC 2025
sym

KVO-LLM: Boosting Long-Context Generation Throughput for Batched LLM Inference

Zhenyu Li$^{\dagger}$, Dongxu Lyu$^{\dagger}$, Gang Wang, Yuzhou Chen, Liyan Chen, Wenjie Li, Jianfei Jiang, Yanan Sun and Guanghui He

  • Proposed a framework unifying quantization and pruning, which compresses KV cache by up to $92\%$.
  • Implemented a multi-core architecture utilizing intra- and inter-core latency overlapping.
  • 2025 62th ACM/IEEE Design Automation Conference (DAC)

Full Pub List

💻 Projects

A High Performance Neural Processing Unit for Efficient CNN Inference (Oct. 2023 – Present)

NPU Chip Design
sym

Hybrid Precision Neural Network Processor for Edge Computing (Tape-out)

  • High programmability and flexibility, designed for AI algorithms with multi-operator and multi-precision requirements.
  • Equipped with an operator-assembly translation framework to support fast deployment of the entire network.
  • Deployed in company’s intelligent perception terminals for real-time device status detection.

🎖 Honors and Awards

  • 2025.09 National Scholarship for Graduate Student from Shanghai Jiao Tong University.
  • 2025.09 1st Prize on China Postgraduate IC Innovation Competition (中国研究生创芯大赛) from Association of Chinese Graduate Education.
  • 2024.11 Tang Youshuqi Scholarship from Shanghai Jiao Tong University.
  • 2024.09 3rd Prize on China Postgraduate IC Innovation Competition (中国研究生创芯大赛) from Association of Chinese Graduate Education.
  • 2023.06 Outstanding Graduate from Shanghai Jiao Tong University.
  • 2023.06 Departmental Excellent Undergraduate Thesis (Department of Micro/Nano Electronics) from Shanghai Jiao Tong Unversity.
  • 2021.11 The Outstanding Alumni Fund of the School of SEIEE from Shanghai Jiao Tong University.
  • 2021.09 2nd Prize on Naitional Undergraduate Embedded Chip Design Competition (嵌入式芯片与系统设计竞赛) from Chinese Institute of Electronics.

📖 Educations

sjtu

SHANGHAI JIAO TONG UNIVERSITY

Degree: Bachelor
Period: 2019.09 - 2023.06
Major: Microelectronics Science and Engineering
GPA: 4.03/4.3 (ranked 2nd out of 67)
sjtu

SHANGHAI JIAO TONG UNIVERSITY

Degree: Master
Period: 2023.09 - 2026.03 (expected)
Major: Integrated Circuit Science and Engineering
GPA: 3.98/4.00 (ranked 1st out of 115)

💻 Internships

hx

Huixi Technology (辉羲智能)

2022.11 - 2024.03
Hardware & Toolchain Intern
Shanghai, China.