Publication

  1. System for Machine Learning
  2. [1] P. Behnam*, Y. Fu*, R. Zhao, P. Tsai, Z. Yu, A. Tumanov, "RocketKV: Accelerating Long-context LLM Inference via Two-stage KV-cache Compression," GTC'25, 42nd International Conference on Machine Learning (ICML), 2025. [*co-first authors] (Highlighted at Nvidia GTC'25 Conference in Insight from Nvidia Research Talk)

    [2] P. Behnam*, U. Kamal*, S. Vijay Ganesh, Zh. Li, M. Jurado, A. Khare, I. Fedorov, G. Liu, A. Tumanov, "∇QDARTS: Quantization as an Elastic Dimension to Differentiable NAS," Transactions on Machine Learning Research (TMLR), 2025. [*co-first authors]

    [3] A. Khare, A. Agrawal, A. Annavajjala, P. Behnam, H. Latapie, M. Lee, A. Tumanov, "SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference," 18th European Conference on Computer Vision (ECCV), Italy, 2024.

    [4] P. Behnam*, J. Tong*, A. Khare, Y. Chen, P. Gadikar, A. Bambhaniya, T. Krishna, A. Tumanov, "Hardware-Software Co-design for Real-time Latency-accuracy Navigation in TinyML Applications," In IEEE Micro Special Issue on Tiny ML, 2023. [*co-first authors]

    [5] P. Behnam*, J. Tong*, A. Khare, Y. Chen, P. Gadikar, A. Bambhaniya, T. Krishna, A. Tumanov, "Subgraph Stationary Hardware-software Inference Co-design," Sixth Conference on Machine Learning and Systems (MLSys), USA, 2023. [*co-first authors]

    [6] P. Behnam*, G. Yuan*, Y. Cai, A. Shafiee, J. Fu, Z. Liao, Z. Li, X. Ma, J. Deng, J. Wang, M. Bojnordi, Y. Wang, and C. Ding, "TinyADC: Peripheral Circuit-aware Weight Pruning Framework for Mixed-signal DNN Accelerators," 27th Annual Design, Automation, and Test in Europe Conference (DATE), 2021. [*co-first authors] (Best Paper Candidate)

    [7] P. Behnam and M. Bojnordi, "A potential replacement for IEEE-754," ACM SIGARCH Blog Article, 2020.

  3. Accelerators for Machine Learning 
  4. [1] P. Behnam, U. Kamal, A. Shafiee, A. Tumanov, S. Mukhopadhyay, "Harmonica: Hybrid Accelerator to Overcome Imperfections of Mixed-signal DNN Accelerators," 38th IEEE International Parallel and Distributed Processing Symposium (IPDPS), USA, 2024.

    [2] P. Behnam*, Y. Luo*, K. Thorat, Z. Liu, H. Peng, Sh. Huang, Sh. Zhou, O. Khan, A. Tumanov, C. Ding, and T. Geng, "CoDG-ReRAM: An Algorithm-Hardware Co-design to Accelerate Semi-Structured GNNs on ReRAM," 40th IEEE International Conference on Computer Design (ICCD), USA, 2022. [*co-first authors]

    [3] P. Behnam*, G. Yuan*, Z. Li, A. Shafiee, S. Lin, X. Ma, H. Liu, X. Qian, M. Bojnordi, Y. Wang, C. Ding, "FORMS: Fine-grained Polarized ReRAM-based In-situ Computation for Mixed-signal DNN Accelerator," 48th International Symposium on Computer Architecture (ISCA), 2021. [*co-first authors]

    [4] S. Taheri, P. Behnam, E. Bozorgzadeh, A. Veidenbaum, and A. Nicolau, “AFFIX: Automatic Acceleration Framework for FPGA Implementation of OpenVX Vision Algorithm,” 27th ACM International Symposium on Field-Programmable Gate Arrays (FPGA), USA, 2019.

    [5] S. Taheri, J. Heo, P. Behnam, J. Chen, A. Veidenbaum, and A. Nicolau, “Acceleration Framework for FPGA Implementation of OpenVX Graph Pipelines,” 26th IEEE International Symposium on Field-Programmable Custom Computing Machines (FCCM), USA, 2018.

    [6] Y. Rupesh, P. Behnam, G. Pandla, M. Miryala, and M. Bojnordi, “Accelerating k-Medians Clustering Using a Novel 4T-4R RRAM Cell,” in IEEE Transactions on Very Large Scale Integration (VLSI) Systems (TVLSI), 2018.

    [7] P. Behnam*, E. Nikahd*, and R. Sameni, “High-Speed Hardware Implementation of Fixed and Run-time Variable Window Length 1-D Median Filters,” in IEEE Trans. on Circuits and Systems II (TCAS-II), 2016. [*co-first authors]

  5. Accelerator for Signal Processing (Algorithm, Simulation, FPGA Prototyping, ASIC Fabrication)
  6. [1] X. Mao*, M. Mukherjee*, N. M. Rahman*, C. DeLude*, J. Driscoll*, S. Sharma, P. Behnam, U. Kamal, D. Kim, S. Khan, J. Tong, J. Woo, J. Seo, P. Sinha, S. Pande, T. Krishna, M. Swaminathan, J. Romberg, and S. Mukhopadhyay, “Real-time Digital RF Emulation – II: A Near Memory Custom Accelerator,” IEEE Transactions on Radar Systems, 2024.

    [2] M. Mukherjee, N. M. Rahman, C. DeLude, J. Driscoll, U. Kamal, J. Woo, J. Seo, S. Sharma, X. Mao, P. Behnam, S. Khan, D. Kim, J. Tong, P. Sinha, S. Pande, T. Krishna, J. Romberg, M. Swaminathan, and S. Mukhopadhyay, “A High-Performance Computing Architecture for Real-Time Digital Emulation of RF Interactions,” IEEE Radar Conference (RadarConf), USA, 2023.

    [3] X. Mao, M. Mukherjee, N. M. Rahman, U. Kamal, S. Sharma, P. Behnam, J. Tong, J. Driscoll, T. Krishna, J. Romberg, and S. Mukhopadhyay, “FPGA-Based High-Performance Real-Time Emulation of Radar System using Direct Path Compute Model,” International Microwave Symposium (IMS), USA, 2023.

    [4] M. Mukherjee, N. M. Rahman, S. Sharma, U. Kamal, X. Mao, P. Behnam, D. Kim, J. Tong, J. Woo, P. Sinha, C. DeLude, J. Driscoll, J. Seo, S. Pande, T. Krishna, J. Romberg, M. Swaminathan, and S. Mukhopadhyay, “A Near-Memory Accelerator for Real-Time Emulation of RF Interactions,” GomacTech, USA, 2023. (Les Palukti Best Student Poster Paper Award)

  7. Energy-Efficient System Design
  8. [1] P. Behnam and M. Bojnordi, “Adaptively Reduced DRAM Caching for Energy-Efficient High Bandwidth Memory,” IEEE Transactions on Computers (TC), 2022.

    [2] B. Khodabandeloo, A. Khonsari, P. Behnam, A. Majidi, and M. H. Hajiesmaili, “Stereo: Assignment and Scheduling in MPSoC under Process Variation by Combining Stochastic and Decomposition Approaches,” IEEE Transactions on Computers (TC), 2022.

    [3] P. Behnam and M. Bojnordi, “RedCache: Reduced DRAM Caching,” 57th ACM/IEEE Design Automation Conference (DAC), USA, 2020.

    [4] P. Behnam and M. Bojnordi, “STFL-DDR: Improving the Energy-Efficiency of Memory Interface,” IEEE Transactions on Computers (TC), 2020. (Featured Paper of the Month)

    [5] P. Behnam and M. Bojnordi, “STFL: Energy-Efficient Cache Interface using Slow Transition Fast Level Signaling,” 56th ACM/IEEE Design Automation Conference (DAC), USA, 2019.

    [6] P. Behnam and M. Bojnordi, "Optical Memory Systems," ACM SIGARCH Blog Article, 2019.

    [7] P. Behnam, A. Chowdhury, N. Rauniyar, N. Sedaghati, and M. Bojnordi, “XCache: Energy-Efficient In-Package Cache Architecture using 3D-stacked Memristive Crosspoint,” 55th ACM/IEEE Design Automation Conference, Work-in-Progress Session (DAC-WIP), USA, 2018.

    [8] P. Behnam, A. Chowdhury, and M. Bojnordi, “R-Cache: A Highly Set-Associative In-Package Cache using Memristive Arrays,” 36th IEEE International Conference on Computer Design (ICCD), USA, 2018.

    [9] P. Behnam, N. Sedaghati, and M. Bojnordi, “Adaptive Time-based Encoding for Energy-Efficient Large Cache Architectures,” 5th ACM Workshop on Energy Efficient Supercomputing held in conjunction with SC (E2SC), USA, 2017.

    [10] P. Behnam, S. Taheri, and B. Alizadeh, “Improving Thermal-aware Placement and Routing in 3D FPGAs,” 10th International Conference on Reconfigurable Computing and FPGAs (ReConFig), Mexico, 2015 (Accepted).

  9. Security Verification & Pre-Silicon Debug
  10. [1] P. Behnam, “Validation of Hardware Security and Trust: A Survey,” arXiv: 1801.00649, 2018.

    [2] P. Behnam, B. Alizadeh, S. Taheri, “Automated Formal Equivalence Verification of Pipelined Nested Loops in Datapath Designs,” arXiv: 1712.09818, 2017.

    [3] P. Behnam, B. Alizadeh, S. Taheri, and M. Fujita, “Formally Analyzing Fault Tolerance in Datapath Designs Using Equivalence Checking,” 21st IEEE Asia and South Pacific Design Automation Conference (ASP-DAC), Macao, China, 2016.

    [4] B. Alizadeh, P. Behnam, and S. Sadeghi Kohan, “A Scalable Formal Debugging Approach with Auto-Correction Capability based on Static Slicing and Dynamic Ranking for RTL Datapath Designs,” in IEEE Trans. on Computers (TC), 2015.

    [5] P. Behnam and B. Alizadeh, “In-circuit Mutation-based Automatic Correction of Certain Design Errors Using SAT Mechanisms,” 24th IEEE Asian Test Symposium (ATS), India, 2015.

    [6] P. Behnam, B. Alizadeh, and Z. Navabi, “Automatic Correction of Certain Design Errors Using Mutation Technique,” 19th IEEE European Test Symposium (ETS), Germany, 2014.

    [7] S. Sadeghi, P. Behnam, B. Alizadeh, M. Fujita, and Z. Navabi, “Improving Polynomial Datapath Debugging with HEDs,” 19th IEEE European Test Symposium (ETS), Germany, 2014.

    [8] H. Haghbayan, B. Alizadeh, P. Behnam, and S. Safari, “Formal Verification and Debugging of Array Dividers With Auto-correction Mechanism,” 27th IEEE VLSI Design Conference (VLSID), India, 2014.

    [9] B. Alizadeh and P. Behnam, “Formal Equivalence Verification and Debugging Techniques with Auto-Correction Mechanism for RTL Designs,” in Elsevier Microprocessors and Microsystems - Embedded Hardware Design (MICPRO), 2013.

  11. VLSI Testing & Post-Silicon Debug
  12. [1] F. Zokaee, H. Sabaghian-Bidgoli, V. Janfaza, P. Behnam, and Z. Navabi, “A novel SAT-based ATPG approach for transition delay faults,” 19th IEEE International High-Level Design Validation and Test Workshop (HLDVT), USA, 2017.

    [2] H. Sabaghian-Bidgoli, P. Behnam, B. Alizadeh, and Z. Navabi, “Reducing Search Space for Fault Diagnosis: A Probability-Based Scoring Approach,” 15th IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Germany, 2017.

    [3] V. Janfaza, P. Behnam, B. Forouzandeh, B. Alizadeh, “A Low-power Enhanced Bitmask-dictionary Scheme for Test Data Compression,” 12th IEEE Symposium on Very Large Scale Integration (ISVLSI), USA, 2014.

    [4] P. Behnam, S. Sabaghian-Bidgoli, B. Alizadeh, K. Mohajerani, and Z. Navabi, “A Probabilistic Approach for Counterexample Generation to Aid Design Debugging,” 11th IEEE East-West Design and Test Symposium (EWDTS), Russia, 2013.

    [5] V. Janfaza, B. Forouzandeh, P. Behnam, and M. Najafi, “Hybrid History-based Test Overlapping to Reduce Test Application Time,” 11th IEEE East-West Design and Test Symposium (EWDTS), Russia, 2013.