Basic Information
Chengshi Zheng (郑成诗)      Institute of Acoustics, Chinese Academy of Sciences
Email: cszheng@mail.ioa.ac.cn
Mailing Address: No. 21 North Fourth Ring West Road, Beijing
Postal Code: 100190

Research Areas

Communication Acoustics:

  1. Single-Channel Speech Processing;

  2. Microphone Array Signal Processing;

  3. Machine Learning for Speech and Audio Processing;

  4. Hearing-Aid Signal Processing.

Admissions Information

The academic-track master's quota for 2026 enrollment (Institute of Acoustics slots) is full; a few professional-track master's slots remain. Outstanding students from universities and other institutes are accepted for joint training on an ongoing basis.

Prospective 2026 master's students of the School of Future Technology (未来学院), University of Chinese Academy of Sciences, are welcome to get in touch; recruitment is ongoing.

Doctoral students for 2027 enrollment are being recruited (national entrance examination and interview required).

Exam-exempt master's students and direct-entry doctoral students for 2027 enrollment are being recruited...


Contact: cszheng@mail.ioa.ac.cn


Admission Majors
081002 - Signal and Information Processing
081001 - Communication and Information Systems
Admission Research Directions
Deep Learning-Based Speech Signal Processing
Microphone Array Acoustic Signal Processing
Audio Signal Processing

Education

2004-09--2009-06   Institute of Acoustics, Chinese Academy of Sciences   Ph.D.
2001-09--2003-06   University of Science and Technology of China   Second Bachelor's Degree (Management)
1999-09--2004-06   University of Science and Technology of China   Bachelor's Degree

Work Experience

Employment History
2020-06~present, Institute of Acoustics, Chinese Academy of Sciences, Doctoral Supervisor
2018-08~present, Institute of Acoustics, Chinese Academy of Sciences, Professor
2014-11~2015-10, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany, Visiting Scholar
2014-06~present, Institute of Acoustics, Chinese Academy of Sciences, Master's Supervisor
2012-01~2018-07, Institute of Acoustics, Chinese Academy of Sciences, Associate Professor
2009-07~2011-12, Institute of Acoustics, Chinese Academy of Sciences, Assistant Professor
Professional Service
2024-08-15 to 2028-08-15, China Electronic Audio Industry Association, Deputy Director of the Standardization Technical Committee
2023-12-01 to present, Beijing Hearing Association, Council Member
2023-10-20 to present, 《声学学报》 (Acta Acustica), Member of the First Youth Editorial Board
2023-02-28 to present, Shandong Provincial Key Laboratory of Low-Altitude Surveillance Network Technology, Member of the First Academic Committee
2022-12-31 to present, Acoustical Society of China, Council Member
2022-08-01 to present, Sound & Vibration, Editorial Board
2021-12-27 to present, Chinese Association for Artificial Intelligence, Member of the Second Intelligent Media Technical Committee
2021-12-01 to present, Shenzhen Audio Industry Association, Expert Committee Member
2021-09-01 to present, Journal of Communication University of China (Natural Science Edition), Youth Editorial Board
2021-08-01 to present, Frontiers in Signal Processing, Review Editor
2019-11-01 to present, AI Technology and Application Committee, China Institute of Communications, Member
2019-01-01 to present, Intelligent Information Processing Industrialization Branch, China High-Tech Industrialization Research Association, Council Member
2015-11-20 to present, IEEE, Senior Member
2010-01-01 to present, EURASIP, Member

Courses Taught

Signals and Systems
Principles and Applications of Electroacoustics
Fundamentals of Acoustics
Fundamentals of Acoustics (Problem Sessions)

Patents and Awards

Awards
(1) Excellent Undergraduate Course, University of Chinese Academy of Sciences, institute/university level, 2025
(2) Excellent Undergraduate Course, University of Chinese Academy of Sciences, other, 2024
(3) Excellent Supervisor of the Chinese Academy of Sciences, CAS level, 2023
(4) Huawei Spark Award, other, 2023
(5) Outstanding Reviewer Award, INTERSPEECH 2023, other, 2023
(6) Excellent Undergraduate Advisor, University of Chinese Academy of Sciences, institute/university level, 2022
(7) Environmental Protection Science and Technology Award, Second Prize, ministerial level, 2022
(8) Wang Dezhao Young Scientist Award, First Prize, institute level, 2021
(9) ICASSP 2021 DNS Challenge, First Prize, other, 2021
(10) INTERSPEECH 2021 DNS Challenge, First Prize, other, 2021
(11) Environmental Technology Progress Award, Second Prize, ministerial level, 2020

Publications



Book Chapter

 

[1].    C. Zheng, Y. Ke, X. Luo, and X. Li. Convolutional neural network-based models for speech denoising and dereverberation: algorithms and applications. in M. Naved, V. A. Devi, L. Gaur, and A. A. Elngar. (eds). IoT-enabled convolutional neural networks: techniques and applications. River Publisher, Denmark, 2023.

 

International Journal Papers (SCI Index or SSCI Index)

[1]. Y. Liang, F. Liu, A. Li, X. Li, C. Lei, and C. Zheng*. NaturalL2S: End-to-End High-quality Multispeaker Lip-to-Speech Synthesis with Differential Digital Signal Processing. Neural Networks, 194(2026) 108163.

[2]. L. Wu*, X. Zhang*, Y. Zhang*, C. Zheng*, T. Liu*, L. Xie*, C. Zheng*, and E. Yin*. Bridging semantics across modalities: Decoupled representation learning for audio-visual speech recognition. Knowledge-Based Systems, 331(2026)114722.

[3]. L. Dai, Y. Ke, A. Li, X. Li, and C. Zheng*. SFNet: A Two-Stage Source-Filter-Based Neural Network for Real-Time Speech Bandwidth Extension. IEEE Transactions on Audio, Speech and Language Processing, 34 (2025): 169-183.

[4]. X. Luo, Y. Ke, X. Li, and C. Zheng*. Deep Informed Spatio-Spectral Filtering for Multi-channel Speech Extraction against Steering Vector Uncertainties. Applied Acoustics, 228(2025)110259.

[5]. H. Zhang, B. C. J. Moore, F. Jiang, M. Diao, X. Li, and C. Zheng*. Neural-WDRC: A deep-learning wide dynamic range compression method combined with controllable noise reduction for hearing aids. Trends in Hearing, 29 (2025): 23312165241309301.

[6]. J. Wang, Q. Yang, S. Stenfelt, X. Wang, X. Lu, J. Sang*, and C. Zheng*. Sound localization with bilateral bone conduction stimulation: Influence of stimuli and simulated flat sensorineural hearing loss. Measurement, 259(2025)119752.

[7]. J. Wang, H. Zheng, S. Stenfelt, Q. Qu, J. Sang, and C. Zheng. Externalization of Virtual Sound Sources with Bone and Air Conduction Stimulation. Trends in Hearing, 29 (2025): 23312165251378355.

[8]. F. Hao, X. Li, and C. Zheng*. X-TF-GridNet: A Time-Frequency Domain Target Speaker Extraction Network with Adaptive Speaker Embedding Fusion. Information Fusion, 112(2024)102550.

[9]. A. Li, G. Yu, Z. Xu, C. Fan, X. Li, and C. Zheng*. TaBE: decoupling spatial and spectral processing with Taylor’s unfolding method for multi-channel speech enhancement. Information Fusion, 101 (2024)101976.

[10]. W. Meng, X. Li, X. Luo, X. Li, and C. Zheng*. Deep Kronecker Product Beamforming for Large-scale Microphone Arrays. IEEE-ACM Transactions on Audio, Speech, and Language Processing, Vol. 32, pp. 4537-4553, 2024.

[11]. C. Xu, B. C. J. Moore, X. Li, and C. Zheng*. Predicting the intelligibility of Mandarin Chinese with manipulated tonal information. J. Acoust. Soc. Am., Vol. 156, pp. 3088–3101, 2024.

[12]. X. Luo, Y. Ke*, X. Li, and C. Zheng. On phase recovery and preserving early reflections for deep-learning speech dereverberation. J. Acoust. Soc. Am., vol. 155, pp. 436-451, 2024.

[13]. W. Meng, J. Li, Y. Ge, X. Li, and C. Zheng*. Frame-wise speech extraction with recursive expectation maximization for partially deformable microphone arrays. Digital Signal Processing, 151(2024) 104530.

[14]. Y. Ge, W. Meng, X. Li, and C. Zheng*. Geometry Calibration for Deformable Linear Microphone Arrays with Bézier Curve Fitting. IEEE Signal Processing Letters, vol. 31, pp. 1620-1624, June 2024.

[15]. Y. Zhang, J. Sang, C. Zheng*, and X. Li*. A denoising-aided multi-task learning method for blind estimation of reverberation time. Measurement, 231(2024)114568.

[16]. C. Fan*, J. Xue, J. Tao, J. Yi, C. Wang, C. Zheng, and Z. Lv. Spatial reconstructed local attention Res2Net with F0 subband for fake speech detection. Neural Networks, 175 (2024)106320.

[17]. J. Xu, J. Li*, W. Meng, X. Li, and C. Zheng. Low-complexity frequency-invariant beampattern synthesis using accurate response control for speech extraction. Applied Acoustics, 224(2024)110129.

[18]. C. Zheng*, H. Zhang, W. Liu, X. Luo, A. Li, X. Li, and B. C. J. Moore. Sixty years of frequency-domain monaural speech enhancement: from traditional to deep learning algorithms. Trends in Hearing, 2023;27. doi:10.1177/23312165231209913.

Paper: https://journals.sagepub.com/doi/full/10.1177/23312165231209913

Source Codes: https://github.com/cszheng-ioa/Sixty-years-of-frequency-domain-monaural-speech-enhancement

[19]. C. Zheng*, C. Xu, M. Wang, X. Li, and B. C. J. Moore. Evaluation of deep marginal feedback cancellation for hearing aids using speech and music. Trends in Hearing, 2023;27. doi:10.1177/23312165231192290.

[20]. A. Li, G. Yu, C. Zheng*, W. Liu, X. Li. A General Unfolding Speech Enhancement Method Motivated by Taylor's Theorem. IEEE-ACM Transactions on Audio, Speech, and Language Processing, vol. 31, pp. 3629-3646, 2023.

[21]. F. Hao, X. Li, and C. Zheng*. End-to-end neural speaker diarization with an iterative attractor estimation. Neural Networks, 166 (2023): 566-578.

[22]. G. Yu, A. Li, H. Wang, W. Liu, Y. Zhang, Y. Wang, and C. Zheng*. FSI-Net: a dual-stage Full- and Sub-band Integration Network for full-band speech enhancement. Applied Acoustics, 211(2023)109539.

[23]. W. Meng, M. Yuan, C. Zheng*, and X. Li. A Comparison of Robust Capon Beamformers using a Large-scale Microphone Array for Speech Extraction. Applied Acoustics, 202(2023)109123.

[24]. R. Zhang, R. Meng, J. Sang, Y. Hu, X. Li, and C. Zheng*. Modeling individual HRTF based on anthropometric parameters and generic HRTF amplitudes. CAAI Transactions on Intelligence Technology, vol. 8, no. 2, pp. 364-378, 2023.

[25]. J. Wang, Y. Chen, S. Stenfelt, J. Sang*, X. Li, and C. Zheng*. Analysis of cross-talk cancellation of bilateral bone conduction stimulation. Hearing Research 434 (2023): 108781.

[26]. Z. Han, Y. Ke, X. Li, and C. Zheng*. Parallel processing of distributed beamforming and multichannel linear prediction for speech denoising and dereverberation in wireless acoustic sensor networks. EURASIP Journal on Audio, Speech, and Music Processing, 25(2023).

[27]. Z. Jiang, J. Sang*, C. Zheng, A. Li, and X. Li. Modeling individual HRTFs from Sparse Measurements based on U-net. J. Acoust. Soc. Am., vol. 153, pp. 248-259, 2023.

[28]. Y. Nie, J. Sang*, C. Zheng, et al. A calibration method for bone conduction transducers using electrical input impedance. Applied Acoustics, 213(2023)109631.

[29]. C. Fan*, H. Zhang, A. Li, X. Wang, C. Zheng, L. Zhao, and X. Wu. CompNet: Complementary network for single-channel speech enhancement. Neural Networks, vol. 168, pp. 508-517, 2023.

[30]. G. Li, C. Zheng, Y. Ke*, and X. Li. Deep learning-based acoustic echo cancellation for surround sound systems. Applied Sciences, 2023, 13, 1266.

[31]. C. Zheng*, M. Wang, X. Li, and B. C. J. Moore. A deep learning solution to the marginal stability problems of acoustic feedback systems for hearing aids. J. Acoust. Soc. Am., vol. 152, no. 6, pp. 3616-3634, 2022.

[32]. C. Zheng*, W. Liu, A. Li, Y. Ke, and X. Li. Low-latency monaural speech enhancement with deep filter-bank equalizer. J. Acoust. Soc. Am., vol. 151, no. 5, pp. 3291-3304, 2022.

[33]. A. Li, C. Zheng*, G. Yu, J. Cai, and X. Li. Filtering and Refining: A Collaborative-Style Framework for Single-Channel Speech Enhancement. IEEE-ACM Transactions on Audio, Speech, and Language Processing, vol.30, pp. 2156-2172, 2022.

[34]. G. Yu, A. Li, H. Wang, Y. Wang, Y. Ke, and C. Zheng*. DBT-Net: Dual-branch federative magnitude and phase estimation with attention-in-attention transformer for monaural speech enhancement. IEEE-ACM Transactions on Audio, Speech, and Language Processing, vol. 30, pp. 2629-2644, 2022.

[35]. W. Liu, A. Li, C. Zheng*, and X. Li. A Separation and Interaction Framework for Causal Multi-channel Speech Enhancement. Digital Signal Processing, 126(2022)103519.

[36]. A. Li, C. Zheng*, L. Zhang, and X. Li. Glance and Gaze: A Collaborative Learning Framework for Single-channel Speech Enhancement. Applied Acoustics, 187(2022)108499.

[37]. F. Liu, H. Wang, Y. Ke, and C. Zheng*. One-shot voice conversion using a combination of U2-Net and vector quantization. Applied Acoustics, 199(2022)109014.

[38]. F. Zhang, J. Li, W. Meng, X. Li, and C. Zheng*. A Vehicle Whistle Database for Evaluation of Outdoor Acoustic Source Localization and Tracking using an Intermediate-Sized Microphone Array. Applied Acoustics, 201(2022)109113.

[39]. X. Luo, C. Zheng, A. Li, Y. Ke*, and X. Li. Analysis of trade-offs between magnitude and phase estimation in loss functions for speech denoising and dereverberation. Speech Communication, 145(2022)71-87.

[40]. K. Zheng, C. Zheng, J. Sang*, Y. Zhang, and X. Li. Noise-robust blind reverberation time estimation using noise-aware time-frequency masking. Measurement, 192(2022)110901.

[41]. Y. Nie, J. Wang*, C. Zheng, J. Xu, X. Li, Y. Wang, B. Zhong, J. Cai, and J. Sang*. Measurement and modeling of the mechanical impedance of human mastoid and condyle. J. Acoust. Soc. Am., vol. 151, pp. 1434-1448, 2022.

[42]. J. Wang, X. Lu, J. Sang*, J. Cai, and C. Zheng. Effects of stimulation position and frequency band on auditory spatial perception with bilateral bone conduction. Trends in Hearing, vol. 26, pp. 1-17, 2022.

[43]. J. Wang, S. Stenfelt, S. Wu, Z. Yan, J. Sang*, C. Zheng, and X. Li. The Effect of Stimulation Position and Ear Canal Occlusion on Perception of Bone Conducted Sound. Trends in Hearing, vol. 26, pp. 1-15, 2022.

[44]. W. Liu, A. Li, X. Wang*, M. Yuan, Y. Chen, C. Zheng, and X. Li. A Neural Beamspace-Domain Filter for Real-Time Multi-Channel Speech Enhancement. Symmetry, 2022, 14(6), 1081.

[45]. K. Zheng, R. Meng, C. Zheng, X. Li, J. Sang*, J. Cai, J. Wang, and X. Wang. EmotionBox: A music-element-driven emotional music generation system based on music psychology. Frontiers in Psychology, 13(2022)841926.

[46]. Y. Nie, J. Sang*, C. Zheng, J. Xu, F. Zhang, and X. Li. An objective bone conduction verification tool using a piezoelectric thin-film force transducer. Front. Neurosci., 16(2022)1068682.

[47]. A. Li, W. Liu, C. Zheng*, C. Fan, and X. Li. Two Heads Are Better Than One: A Two-Stage Complex Spectral Mapping Approach for Monaural Speech Enhancement. IEEE-ACM Transactions on Audio, Speech, and Language Processing, vol. 29, pp. 1829-1843, 2021.

[48]. W. Meng, Y. Ke, J. Li, C. Zheng*, and X. Li. Finite Data Performance Analysis of One-Bit MVDR and Phase-Only MVDR. Signal Processing, 183(2021)108018.

[49]. G. Yu, Y. Wang, H. Wang, Q. Zhang, and C. Zheng*. A two-stage complex network using cycle-consistent generative adversarial networks for speech enhancement. Speech Communication, vol. 134, pp. 42-54, Nov. 2021.

[50]. L. Cheng, R. Peng, A. Li, C. Zheng*, and X. Li. Deep Learning-based Stereophonic Acoustic Echo Suppression without Decorrelation. J. Acoust. Soc. Am., vol. 150, pp. 816-829, 2021.

[51]. X. Guo, M. Yuan, Y. Ke, C. Zheng*, and X. Li. Distributed Node-Specific Block-Diagonal LCMV Beamforming in Wireless Acoustic Sensor Networks. Signal Processing, 185(2021)108085.

[52]. J. Wang, J. Zhang, J. Xu, C. Zheng*, and X. Li. An optimization framework for designing robust cascade biquad feedback controllers on active noise cancellation headphones. Applied Acoustics, 179(2021)108081.

[53]. J. Zhang, C. Zheng*, F. Zhang, and X. Li. A Low-complexity Volterra Filtered-Error LMS Algorithm with a Kronecker Product Decomposition. Applied Sciences, 2021, 11, 9637.

[54]. J. Wang, Y. Guan, C. Zheng, R. Peng*, and X. Li. A temporal-spectral generative adversarial network based end-to-end packet loss concealment for wideband speech transmission. J. Acoust. Soc. Am., vol. 150, pp. 2577-2588, 2021.

[55]. Y. Ke, A. Li, C. Zheng, R. Peng*, and X. Li. Low-complexity artificial noise suppression methods for deep learning-based speech enhancement algorithms. EURASIP Journal on Audio, Speech, and Music Processing, (2021)2021:17.

[56]. F. Liu, H. Wang, R. Peng*, C. Zheng, and X. Li. U2-VC: one-shot voice conversion using two-level nested U-structure. EURASIP Journal on Audio, Speech, and Music Processing, 2021, 40 (2021).

[57]. A. Li, C. Zheng, R. Peng*, and X. Li. On the importance of power compression and phase estimation in monaural speech dereverberation. JASA Express Letters, 1, 014802(2021).

[58]. R. Meng, J. Xiang, J. Sang*, C. Zheng, X. Li, S. Bleeck, J. Cai, and J. Wang. Investigation of an MAA Test with Virtual Sound Synthesis. Frontiers in Psychology, 12(2021)656052.

[59]. J. Ding, Y. Ke, L. Cheng, C. Zheng*, and X. Li. Joint estimation of binaural distance and azimuth by exploiting deep neural networks. J. Acoust. Soc. Am., vol. 147, pp. 2625-2635, 2020.

[60]. J. Ding, J. Li, C. Zheng*, and X. Li. Wideband sparse Bayesian learning for off-grid binaural sound source localization. Signal Processing, 166(2020)107250.

[61]. A. Li, M. Yuan, C. Zheng*, and X. Li. Speech enhancement using progressive learning-based convolutional recurrent neural network. Applied Acoustics, 166(2020)107347.

[62]. A. Li, R. Peng, C. Zheng*, and X. Li. A Supervised Speech Enhancement Approach with Residual Noise Control for Voice Communication. Applied Sciences, 2020, 10, 2894.

[63]. Z. Jiang, J. Sang*, C. Zheng, and X. Li. The effect of pinna filtering in binaural transfer functions on externalization in a reverberant environment. Applied Acoustics, 164(2020) 107257.

[64]. G. Li, C. Zheng, X. Li, T. Yu, S. Bleeck, and J. Sang*. Evaluation of headphone phase equalization on sound reproduction. Applied Acoustics, vol. 156, pp. 208-216, 2019.

[65]. C. Zheng, A. Deleforge, X. Li*, and W. Kellermann. Statistical analysis of the multichannel Wiener filter using a bivariate normal distribution for sample covariance matrices. IEEE/ACM Transactions on Audio, Speech and Language Processing, vol. 26, no. 5, pp. 951-966, 2018.

[66]. C. Zheng*, Z. Tan, R. Peng, and X. Li. Guided spectrogram filtering for speech dereverberation. Applied Acoustics, vol. 134, pp. 154-159, 2018.

[67]. R. Peng, Z. Tan, X. Li, and C. Zheng*. A perceptually motivated LP residual estimator in noisy and reverberant environments. Speech Communication, vol. 96, pp. 129-141, 2018.

[68]. Y. Ke, C. Zheng*, R. Peng, and X. Li. Robust Adaptive Beamforming using Noise Reduction Preprocessing-based Fully Automatic Diagonal and Steering Vector Estimation. IEEE Access, vol.5, pp. 12974-12987, 2017.

[69]. H. Yang, J. Wang, C. Zheng*, and X. Li. Stereophonic Channel Decorrelation Using a Binaural Masking Model. Applied Acoustics, vol.110, no. 9, pp. 128-136, Sept. 2016.

[70]. C. Zheng*, C. Hofmann, X. Li, and W. Kellermann. Analysis of additional stable gain by frequency shifting for acoustic feedback suppression using statistical room acoustics. IEEE Signal Processing Letters, vol.23, no. 1, pp. 159-163, Jan. 2016.

[71]. C. Lei, J. Xu, C. Zheng*, and X. Li. Active headrest with robust performance against head movement. Journal of Low Frequency Noise; Vibration and Active Control, vol. 34, no. 3, pp. 233-250, 2015.

[72]. X. Li, Z. Cai, C. Zheng*, and X. Li. Equalization of loudspeaker response using balanced model truncation. J. Acoust. Soc. Am., vol. 137, no. 4, pp. EL241-EL247, 2015.

[73]. J. Sang, H. Hu, C. Zheng, G. Li, M. E. Lutman, and S. Bleeck. Speech quality evaluation of a sparse coding shrinkage noise reduction algorithm with normal hearing and hearing impaired listeners. Hearing Research, vol. 327, pp. 175-185, 2015.

[74]. C. Zheng*, R. Peng, J. Li, and X. Li. A constrained MMSE LP residual estimator for speech dereverberation in noisy environments. IEEE Signal Processing Letters, vol. 21, no. 12, pp. 1462-1466, Dec. 2014.

[75]. C. Zheng*, H. Yang, and X. Li. On generalized auto-spectral coherence function and its applications to signal detection. IEEE Signal Processing Letters, vol. 21, no. 5, pp. 559-563, May 2014.

[76]. S. Wang, C. Zheng*, R. Peng, and X. Li. A statistical analysis of power-level-difference-based dual-channel post-filter estimator. Applied Acoustics, vol. 83, pp. 40-46, 2014.

[77]. J. Sang, H. Hu, C. Zheng, G. Li, M. E. Lutman, and S. Bleeck. Evaluation of the sparse coding shrinkage noise reduction algorithm in normal hearing and hearing impaired listeners. Hearing Research, vol. 310, no. 4, pp. 36-47, 2014.

[78]. R. Peng, C. Zheng*, and X. Li. Two-stage optimization algorithm for adaptive IIR notch filter. Electronics Letters, vol. 50, no. 14, pp. 985-987, 2014.

[79]. C. Zheng*, H. Liu, R. Peng, and X. Li. A Statistical Analysis of Two-Channel Post-Filter Estimators in Isotropic Noise Fields. IEEE Trans. on Audio, Speech, and Lang. Process., vol. 21, no. 2, pp. 336-342, 2013.

[80]. H. Hu, S. Wang, C. Zheng*, and X. Li. A cepstrum-based preprocessing and postprocessing for speech enhancement in adverse environments. Applied Acoustics, vol. 74, no. 12, pp. 1458-1462, 2013.

[81]. J. Wang, H. Liu, C. Zheng*, and X. Li. Spectral subtraction based on two-stage spectral estimation and modified cepstrum thresholding. Applied Acoustics, vol. 74, no. 3, pp. 450-458, 2013.

[82]. C. Zheng*. On second-order statistics of log-periodogram and cepstral coefficients for processes with mixed spectra. Signal Processing, vol. 92, pp. 2560-2565, 2012.

[83]. C. Zheng*, and X. Li. Detection of multiple sinusoids in unknown colored noise using truncated cepstrum thresholding and local signal-to-noise-ratio. Applied Acoustics, vol.73, pp. 809-816, 2012.

[84]. C. Zheng*, Y. Zhou, and X. Li. Generalized framework for the nonparametric coherence function estimation. Electronics Letters, vol. 46, no. 6, pp.450-452, 2010.

[85]. M. Bao, C. Zheng, X. Li, J. Yang, and J. Tian. Acoustical vehicle detection based on bispectral entropy. IEEE Signal Processing Letters, vol. 16, no. 5, pp. 378-381, May 2009.

[86]. C. Zheng*, M. Zhou, and X. Li. On the relationship of non-parametric methods for coherence function estimation. Signal Processing, vol. 88, pp.2863-2867, 2008.


Peer-Reviewed International Conference Proceedings


[1]. A. Li, T. Lei, L. Dai, K. Li, R. Chen, M. Yu, X. Li, D. Yu, and C. Zheng*. DegVoC: Revisiting Neural Vocoder from a Degradation Perspective. In Proc. AAAI, 2026.

[2]. L. Dai, A. Li*, C. Chi, Y. Liang, X. Li, and C. Zheng*. GOMPSNR: Reflourish the Signal-to-Noise Ratio Metric for Audio Generation Tasks. In Proc. AAAI, 2026.

[3]. Y. Liang, A. Li, K. Yang, G. Yu, F. Liu, L. Dai, X. Li, and C. Zheng*. SLD-L2S: Hierarchical Subspace Latent Diffusion for High-Fidelity Lip to Speech Synthesis. In Proc. AAAI, 2026.

[4]. H. Zhang, B. C. J. Moore, L. Dai, F. Hao, X. Li, and C. Zheng*. Audiogram-informed end-to-end noise reduction and wide dynamic range compression for hearing aids. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 06-11, 2025.

[5]. X. Zhang, F. Hao, X. Li, and C. Zheng*. DeepPEM-AFC: an improved prediction-error-method-based adaptive feedback cancellation with deep learning for hearing aids. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 06-11, 2025.

[6]. F. Hao, A. Li, X. Li, and C. Zheng*. DSINet: towards real-time target speaker extraction with dynamic speaker information fusion. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 06-11, 2025.

[7]. X. Lu, Y. Wang, J. Sang, and C. Zheng. BiCG: binaural cue generation for unified HRTF datasets. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, April 06-11, 2025.

[8]. F. Hao, B. C. J. Moore, H. Zhang, X. Li, and C. Zheng*. L3C-DeepMFC: Low-Latency Low-Complexity Deep Marginal Feedback Cancellation with Closed-Loop Fine Tuning for Hearing Aids. INTERSPEECH 2025, Rotterdam, The Netherlands, Aug. 17-21, 2025.

[9]. Y. Liang, K. Yang, F. Liu, A. Li, X. Li, and C. Zheng*. LightL2S: Ultra-Low Complexity Lip-to-Speech Synthesis for Multi-Speaker Scenarios. Proc. INTERSPEECH 2025, Rotterdam, The Netherlands, Aug. 17-21, 2025.

[10]. L. Dai, A. Li, Z. Han, C. Zheng*, X. Li. BAPEN: Towards Versatile Audio Phase Retrieval. In Proceedings of the 33rd ACM International Conference on Multimedia, 2025: pp. 8293-8302.

[11]. A. Li, T. Lei, Z. Sun, R. Chen, E. Yin, X. Li, and C. Zheng*. Learning Neural Vocoder from Range-Null Space Decomposition. In Proc. IJCAI, 2025.

[12]. T. Lei, Z. Zhang, R. Chen, M. Yu, J. Lu, C. Zheng, D. Yu, and A. Li. BridgeVoC: Neural Vocoder with Schrodinger Bridge. In Proc. IJCAI, 2025.

[13]. Z. Sun, A. Li, T. Lei, R. Chen, M. Yu, C. Zheng, Y. Zhou, and D. Yu. Scaling beyond Denoising: Submitted System and Findings in URGENT Challenge 2025. INTERSPEECH 2025, Rotterdam, The Netherlands, Aug. 17-21, 2025.

[14]. C. Chi, X. Li, Y. Ke, Q. Ni, G. Yao, X. Li, and C. Zheng*. End-to-end multi-channel speaker extraction and binaural speech synthesis. 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2025.

[15]. Y. Ni, A. Li, L. Dai, E. Yin, Q. Ni, and C. Zheng*. SinDiffPhase: High-Quality Phase Estimation with Ultra-Fast Single-Step Diffusion. 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2025.

[16]. R. Zhang, Y. Ke, Q. Ni, G. Yao, X. Li, and C. Zheng. Directional Hybrid Optimization of HRTFs for Low-Order Spherical Harmonics Binaural Rendering. 2025 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2025.

[17]. W. Meng, X. Li, A. Li, J. Li, X. Li, and C. Zheng*. All neural Kronecker product beamforming for speech extraction with large-scale microphone arrays. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.

[18]. G. Yu, X. Zheng, N. Li, R. Han, C. Zheng, C. Zhang, C. Zhou, Q. Huang, and B. Yu. BAE-Net: A low complexity and high fidelity bandwidth-adaptive neural network for speech super-resolution. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.

[19]. F. Hao, H. Zhang, L. Dai, X. Luo, X. Li, and C. Zheng*. RENET: A time-frequency domain general speech restoration network for ICASSP 2024 speech improvement challenge. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.

[20]. L. Dai, Y. Ke, H. Zhang, F. Hao, X. Luo, X. Li, and C. Zheng*. A time-frequency band-split neural network for real-time full-band packet loss concealment. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Seoul, Korea, April 14-19, 2024.

[21]. A. Li, W. Meng, G. Yu, W. Liu, X. Li, and C. Zheng. TaylorBeamixer: Learning Taylor-Inspired All-Neural Multi-Channel Speech Enhancement from Beam-Space Dictionary Perspective. INTERSPEECH 2023, Dublin, Ireland, August 20-24, 2023.

[22]. J. Xu, J. Li, W. Meng, X. Li, and C. Zheng. Low-complexity Broadband Beampattern Synthesis using Array Response Control. INTERSPEECH 2023, Dublin, Ireland, August 20-24, 2023.

[23]. J. Chen, Y. Shi, W. Liu, W. Rao, S. He, A. Li, Y. Wang, Z. Wu, S. Shang, and C. Zheng. Gesper: A Unified Framework for General Speech Restoration. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Greece, June 4-10, 2023.

[24]. A. Li, S. You, G. Yu, C. Zheng*, and X. Li. Taylor, can you hear me now? A Taylor-unfolding framework for monaural speech enhancement. IJCAI-ECAI 2022.

[25]. G. Yu, A. Li, C. Zheng, Y. Guo, Y. Wang, and H. Wang. Dual-branch Attention-In-Attention Transformer for single-channel speech enhancement. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May 22-27, 2022.

[26]. G. Yu, A. Li, Y. Wang, Y. Guo, H. Wang, and C. Zheng. Joint magnitude estimation and phase recovery using Cycle-in-Cycle GAN for non-parallel speech enhancement. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May 22-27, 2022.

[27]. A. Li, W. Liu, C. Zheng, and X. Li. Embedding and Beamforming: All-neural Causal Beamformer for Multichannel Speech Enhancement. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Singapore, May 22-27, 2022.

[28]. A. Li, G. Yu, C. Zheng*, and X. Li. TaylorBeamformer: Learning All-Neural Beamformer for Multi-Channel Speech Enhancement from Taylor’s Approximation Theory, in INTERSPEECH 2022, Incheon, Korea, Sept. 18-22, 2022.

[29]. W. Meng, C. Zheng*, and X. Li. Fully Automatic Balance between Directivity Factor and White Noise Gain for Large-scale Microphone Arrays in Diffuse Noise Fields, in INTERSPEECH 2022, Incheon, Korea, Sept. 18-22, 2022.

[30]. Y. Guan, G. Yu, A. Li, C. Zheng*, and J. Wang. TMGAN-PLC: Audio Packet Loss Concealment using Temporal Memory Generative Adversarial Network, in INTERSPEECH 2022, Incheon, Korea, Sept. 18-22, 2022.

[31]. L. Cheng, C. Zheng*, A. Li, Y. Wu, R. Peng, and X. Li. A deep complex multi-frame filtering network for stereophonic acoustic echo cancellation, in INTERSPEECH 2022, Incheon, Korea, Sept. 18-22, 2022.

[32]. X. Luo, C. Zheng, A. Li, Y. Ke, and X. Li. Bifurcation and Reunion: A Loss-Guided Two-Stage Approach for Monaural Speech Dereverberation, in INTERSPEECH 2022, Incheon, Korea, Sept. 18-22, 2022.

[33]. A. Li, W. Liu, X. Luo, C. Zheng, and X. Li. ICASSP 2021 Deep Noise Suppression Challenge: Decoupling Magnitude and Phase Optimization with a Two-Stage Deep Network. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Ontario, Canada, June 6-11, 2021.

[34]. R. Peng, L. Cheng, C. Zheng, and X. Li. ICASSP 2021 Acoustic Echo Cancellation Challenge: Integrated Adaptive Echo Cancellation with Time Alignment and Deep Learning-based Residual Echo plus Noise Suppression. in Inter. Conf. Acoustics, Speech, and Signal Processing (ICASSP), Toronto, Ontario, Canada, June 6-11, 2021.

[35]. A. Li, W. Liu, X. Luo, G. Yu, C. Zheng, and X. Li. A simultaneous denoising and dereverberation framework with target decoupling. in INTERSPEECH 2021, Brno, Czech Republic, Aug. 30-Sept. 3, 2021.

[36]. W. Liu, A. Li, Y. Ke, C. Zheng, and X. Li. Know Your Enemy, Know Yourself: A Unified Two-Stage Framework for Speech Enhancement. in INTERSPEECH 2021, Brno, Czech Republic, Aug. 30-Sept. 3, 2021.

[37]. R. Peng, L. Cheng, C. Zheng, and X. Li. Acoustic Echo Cancellation using Deep Complex Neural Network with Nonlinear Magnitude Compression and Phase Information. in INTERSPEECH 2021, Brno, Czech Republic, Aug. 30-Sept. 3, 2021.

[38]. A. Li, C. Zheng, L. Zhang, and X. Li. Learning to inference with early exit in the progressive speech enhancement. in the 2021 European Signal Processing Conference (EUSIPCO-2021), Virtual Conference, Aug. 23-27, 2021.

[39]. A. Li, C. Zheng, C. Fang, R. Peng, and X. Li. A Recursive Network with Dynamic Attention for Monaural Speech Enhancement. in INTERSPEECH 2020, Shanghai, China, Oct. 25-29, 2020.

[40]. A. Li, C. Zheng, L. Cheng, R. Peng, and X. Li. A time-domain monaural speech enhancement with recursive learning. in 2020 Asia-Pacific Signal and Information Processing Association (APSIPA), Virtual Conference, Dec. 7-10, 2020.

[41]. L. Cheng, C. Zheng, R. Peng, and X. Li. Improvement of DNN-based speech enhancement with non-normalized features by using an automatic gain control. in the 147th AES Convention, New York, Oct. 16- 19, 2019.

[42]. G. Li, R. Peng, C. Zheng, and X. Li. A non-intrusive speech quality assessment model based on DNN. in Proc. of the 26th International Congress on Sound and Vibration, Prague, July 7-11, 2019.

[43]. Y. Leng, C. Zheng, F. Zhang, and X. Li. Fast independent vector analysis using non-overlapping frequency subbands partition and power ratio correlation. in Proc. of the 26th International Congress on Sound and Vibration, Prague, July 7-11, 2019.

[44]. Y. Nie, J. Sang, C. Zheng, and X. Li. Modelling of a chip scale package on the acoustic behavior of a MEMS microphone. in the 147th AES Convention, New York, Oct. 16- Oct. 19, 2019.

[45]. J. Wang, D. Wang, Y. Chen, X. Lu, and C. Zheng. Noise robustness automatic speech recognition with convolutional neural network and time delay neural network. in the 147th AES Convention, New York, Oct. 16- Oct. 19, 2019.

[46]. T. Wei, J. Sang, C. Zheng, and X. Li. Near-Field Compensated Higher-Order Ambisonics Using a Virtual Source Panning Method. in the 145th AES Convention, New York, Oct. 16- Oct. 19, 2018.

[47]. Z. Li, P. Luo, C. Zheng, and X. Li. Vibrational contrast control for local sound source rendering on flat panel loudspeakers. in the 145th AES Convention, New York, Oct. 16- Oct. 19, 2018.

[48]. P. Luo, Z. Li, C. Zheng, and X. Li. Theoretical analysis of the far-field directional active noise control. in the 145th AES Convention, New York, Oct. 16- Oct. 19, 2018.

[49]. Y. Ke, Y. Hu, J. Li, C. Zheng, and X. Li. A Generalized Subspace Approach for Multichannel Speech Enhancement Using Machine Learning-Based Speech Presence Probability Estimation. in the 146th AES Convention, Dublin, Mar. 20-23, 2019.

[50]. R. Peng, B. Xu, G. Li, C. Zheng, and X. Li. Long-range Speech Acquirement and Enhancement with Dual-point Laser Doppler Vibrometers. in 23rd Inter. Conf. on Digital Signal Process., Shanghai, Nov. 19-21, 2018.

[51]. J. Li, J. Ding, C. Zheng, and X. Li. An efficient and robust speech dereverberation method using spherical microphone array. in 23rd Inter. Conf. on Digital Signal Process., Shanghai, Nov. 19-21, 2018.

[52]. Z. Jiang, J. Sang, J. Wang, C. Zheng, F. Zhang, and X. Li. An audio loudness compression and compensation method for miniature loudspeaker playback. in the 143rd AES Convention, New York, Oct. 18- Oct. 20, 2017.

[53]. G. Li, Z. Jiang, J. Sang, C. Zheng, R. Peng, and X. Li. Auditory-based smoothing for equalization of headphone-to-eardrum transfer function. in the 143rd AES Convention, New York, Oct. 18- Oct. 20, 2017.

[54]. J. Ding, J. Wang, C. Zheng, R. Peng, and X. Li. Analysis of Binaural Features for Supervised Localization in Reverberant Environments. in the 141st AES Convention, Los Angeles, Sept. 29-Oct. 2, 2016.

[55]. Y. Cui, J. Wang, C. Zheng, and X. Li. Acoustic echo cancellation for asynchronous systems based on resampling adaptive filter coefficients. in the 141st AES Convention, Los Angeles, Sept. 29-Oct. 2, 2016.

[56]. C. Zheng, X. Li, A. Schwarz, and W. Kellermann. Statistical analysis and improvement of coherent-to-diffuse power ratio estimators for dereverberation. in the 15th International Workshop on Acoustic Signal Enhancement (IWAENC), Xi'an, China, Sept. 13-16, 2016.

[57]. C. Zheng, A. Schwarz, W. Kellermann, and X. Li. Binaural coherent-to-diffuse-ratio estimation for dereverberation using an ITD model. in the 2015 European Signal Processing Conference (EUSIPCO-2015), Nice, France, Aug. 31-Sept. 4, 2015.

[58]. R. Peng, C. Zheng, and X. Li. Bandwidth extension for speech acquired by laser Doppler vibrometer with an auxiliary microphone. in the 10th Inter. Conf. on Information, Communications and Signal Processing (ICICS), Singapore, Dec. 2-4, 2015.

[59]. C. Zheng, Y. Ke, R. Peng, X. Li, and Y. Zhou. Statistical analysis of temporal coherence function and its application in howling detection. in the 19th Inter. Conf. on Digital Signal Processing, Hong Kong, China, Aug. 20-23, 2014.

[60]. C. Zheng, S. Wang, R. Peng, and X. Li. Delayless method to suppress transient noise using speech properties and spectral coherence. in the 135th AES Convention, New York, Oct. 17-20, 2013.

[61]. R. Peng, J. Li, X. Chen, X. Li, and C. Zheng. Cepstrum-based preprocessing for howling detection in speech applications. in the 135th AES Convention, New York, Oct. 17-20, 2013.

[62]. J. Wang, C. Zheng, C. Zhang, and Y. Sun. The structure of noise power spectral density-driven adaptive post-filtering algorithm. in the 135th AES Convention, New York, Oct. 17-20, 2013.

[63]. C. Zheng, H. Liu, R. Peng, and X. Li. Temporal Coherence-Based Howling Detection for Speech Applications. in the 133rd AES Convention, San Francisco, 2012.

[64]. C. Zheng, H. Liu, and X. Li. Combining Capon and Bartlett Spectral Estimators for Detection of Multiple Sinusoids in Colored Noise Environments. J. Acoust. Soc. Am., vol. 131, p. 3444, 2012.

[65]. J. Sang, H. Hu, C. Zheng, G. Li, M. E. Lutman, and S. Bleeck. Evaluation of a sparse coding shrinkage algorithm in normal hearing and hearing impaired listeners. in the 20th European Signal Processing Conference (EUSIPCO 2012), Bucharest, Romania, Aug. 27-31, 2012.

[66]. X. Hu, S. Wang, Y. Zhou, X. Li, and C. Zheng. Robustness analysis of time-domain and frequency-domain adaptive null-forming schemes. in the 8th IEEE International Conference on Information, Communications, and Signal Processing (ICICS), Singapore, pp. 1-4, 2011.

[67]. C. Zheng, Y. Zhou, X. Hu, and X. Li. Two-channel post-filtering based on adaptive smoothing and noise properties. in Inter. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), Prague, Czech Republic, pp. 1745-1748, May 22-27, 2011.

[68]. C. Zheng, Y. Zhou, X. Hu, and X. Li. Speech enhancement based on the structure of noise power spectral density. in the 2010 European Signal Processing Conference (EUSIPCO-2010), Aalborg, Denmark, Aug. 23-27, 2010.

[69]. C. Zheng, Y. Zhou, X. Hu, J. Tian, and X. Li. Speech enhancement based on estimating expected values of speech cepstra. in Proc. of the 20th Inter. Congress on Acoustics (ICA 2010), Sydney, Australia, Aug. 23-27, 2010.


 

Papers in Chinese Academic Journals

[1]. 王梅煌, 章辉勇, 徐晨阳, 李晓东, 郑成诗*. End-to-end joint acoustic feedback suppression, denoising, and dereverberation for hearing aids. 声学学报, vol. 49, no. 6, pp. 1215-1225, 2024.

[2]. 徐嘉懿, 厉剑*, 李晓东, 郑成诗. Time-domain broadband beampattern synthesis with accurate array response control. 声学学报, vol. 49, no. 2, pp. 344-360, 2024.

[3]. 柯雨璇, 厉剑, 彭任华, 郑成诗*, 李晓东. Masking function estimation in the spherical harmonic domain for adaptive beamforming-based speech enhancement. 声学学报, vol. 46, no. 1, pp. 67-80, 2021.

[4]. 王杰, 陈运达, 陆锡坤, 杨乔赫, 严志豪, 桑晋秋*, 郑成诗. Experiments and analysis of the bone-conduction effect of real human heads. 声学学报, vol. 46, no. 4, pp. 687-698, 2021.

[5]. 程琳娟, 彭任华, 郑成诗*, 李晓东. Two-stage complex-spectrum convolutional recurrent network for stereophonic acoustic echo cancellation. 声学学报, vol. 48, no. 1, pp. 199-214, 2023.

[6]. 厉剑, 柯雨璇, 郑成诗*, 李晓东. A regularization-like broadband superdirective beamforming algorithm in the spherical harmonic domain. 声学学报, vol. 45, no. 2, pp. 145-160, 2020.

[7]. 厉剑, 彭任华, 郑成诗*, 李晓东. Adaptive reverberation cancellation and sound source localization algorithms in the spherical harmonic domain. 声学学报, vol. 44, no. 5, pp. 874-886, 2019.

[8]. 丁建策, 厉剑, 彭任华, 郑成诗*, 李晓东. Two-step supervised learning for indoor binaural sound source distance estimation. 声学学报, vol. 44, no. 4, pp. 405-416, 2019.

[9]. 杨鹤飞, 郑成诗, 李晓东. Stereophonic acoustic echo cancellation based on a hybrid of spectral dominance and nonlinear transformation. 电子与信息学报, vol. 37, no. 2, pp. 373-379, 2015.

[10]. 郑成诗, 胡笑浒, 周翊, 李晓东. Spectral subtraction based on the structural characteristics of the noise spectrum. 声学学报, vol. 35, no. 2, pp. 215-222, 2010.

[11]. 周翊, 郑成诗, 李晓东. A novel robust gradient-based lattice-ladder adaptive filtering algorithm for stereophonic acoustic echo cancellation. 声学学报, vol. 35, no. 2, pp. 223-229, 2010.

[12]. 郑成诗, 周名远, 李晓东. A priori SNR estimation based on joint speech presence probability. 电子与信息学报, vol. 30, no. 7, pp. 1680-1683, 2008.

[13]. 郑成诗, 李晓东, 陈佳路, 田静. Speech enhancement using adaptively smoothed periodograms. 声学学报, vol. 32, no. 5, pp. 461-467, 2007.
