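Basic Information

Jiangyan Yi (易江燕), Female, Master's Supervisor, Institute of Automation, Chinese Academy of Sciences
Email: jiangyan.yi@nlpr.ia.ac.cn
Address: Room 707, Intelligence Building, 95 Zhongguancun East Road, Haidian District, Beijing
Postal code: 100190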
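Research Areas
Speech information processing, speech recognition and synthesis, personalized speech generation and detection, small-data modeling, transfer learning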
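Admissions Information
Admission Major
081104 - Pattern Recognition and Intelligent Systems
Admission Directions
Speech recognition and synthesis
Speech information security
Human-computer interaction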
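Education
2015-09 to 2018-06, Institute of Automation, Chinese Academy of Sciences, Ph.D.
2007-09 to 2010-07, Graduate School of Chinese Academy of Social Sciences, Master's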
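Work Experience
Employment History
2020-10 to present, Institute of Automation, Chinese Academy of Sciences, Associate Researcher
2018-07 to 2020-10, Institute of Automation, Chinese Academy of Sciences, Assistant Researcher
2011-09 to 2014-11, Alibaba Group, Senior Algorithm Engineer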
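Professional Service
2022-05-07 to 2022-05-27, ICASSP 2022, Session Chair
2021-12-01 to 2022-09-24, Interspeech 2022, Area Chair
2021-10-09 to present, IEEE Signal Processing Society Speech and Language Processing Technical Committee (IEEE SLTC), Associate Member
2021-04-30 to 2021-06-15, ICASSP 2021, Session Chair
2020-11-23 to present, APSIPA SLA Technical Committee, TC Member
2020-10-25 to 2020-10-29, Interspeech 2020, Session Co-Chair
2019-09-15 to 2020-10-30, Interspeech 2020, Area Co-Chair
2019-09-15 to 2019-09-19, Interspeech 2019, Session Chair
2019-08-22 to present, Standing Committee of the National Conference on Man-Machine Speech Communication (NCMMSC), Member
2019-01-01 to 2020-04-01, APSIPA 2019, Publication Co-Chair
2018-12-01 to 2020-02-01, NCMMSC 2019 Publication Committee, Publication Co-Chair
2018-10-11 to present, China Computer Federation (CCF) Technical Committee on Speech Dialogue and Auditory Processing, Member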
Courses Taught
Speech Information Processing
Speech Interaction
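Patents and Awards
Awards
(1) 12th Wu Wenjun Artificial Intelligence Science and Technology Award (2022), Special Prize, ministerial level, 2022
(2) ICASSP 2021 (a top international speech conference) Multi-Speaker Multi-Style Voice Cloning Challenge, few-shot track, First Prize, other, 2021
(3) Best Paper, 19th National Conference on Signal Processing, other, 2019
(4) Best Paper, 13th National Conference on Man-Machine Speech Communication, other, 2019
(5) Intel AIDC Beijing Best Poster Award, other, 2018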
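Patents
[1] Jianhua Tao, Haoxin Ma, Jiangyan Yi. A method for generating speech features via continual learning without storing the original data. CN113299315A, 2022-08-24.
[2] Jianhua Tao, Tao Wang, Jiangyan Yi, Ruibo Fu. A unified training method and system for speech synthesis and voice conversion. CN114495898A, 2022-05-13.
[3] Ruibo Fu, Jianhua Tao, Jiangyan Yi. Method and apparatus for generating speech adversarial examples, electronic device, and storage medium. CN114267363A, 2022-04-01.
[4] Jianhua Tao, Shiming Wang, Ruibo Fu, Jiangyan Yi. A speech generation model with fine-grained prosody modeling, device, and storage medium. CN114093342A, 2022-02-25.
[5] Jianhua Tao, Zhengkun Tian, Jiangyan Yi. Training method, system, and apparatus for an end-to-end speech transcription model. CN110689879B, 2022-02-25.
[6] Jianhua Tao, Ruibo Fu, Jiangyan Yi. Method for detecting voice splicing points and storage medium. US17/668,074, 2022-02-09.
[7] Jianhua Tao, Shuai Zhang, Jiangyan Yi. A customizable end-to-end system for mixed Chinese-English speech recognition. CN113936641A, 2022-01-14.
[8] Jianhua Tao, Zhengkun Tian, Jiangyan Yi. Training method for a speech recognition model, speech recognition method, and system. CN113936647A, 2022-01-14.
[9] Jianhua Tao, Shuai Zhang, Jiangyan Yi. An end-to-end system and device for speech recognition and speech translation. CN113920989A, 2022-01-11.
[10] Jiangyan Yi, Jianhua Tao, Ruibo Fu, Zhengkun Tian. Method and apparatus for detecting generated speech, electronic device, and storage medium. CN113808579A, 2021-12-17.
[11] Jianhua Tao, Tao Wang, Jiangyan Yi. Method and apparatus for editing audio, electronic device, and storage medium. CN113724686A, 2021-11-13.
[12] Jianhua Tao, Shuai Zhang, Jiangyan Yi. An end-to-end system unifying mixed Chinese-English text generation and speech recognition. CN113284485B, 2021-11-09.
[13] Ruibo Fu, Jianhua Tao, Jiangyan Yi. Method for detecting speech splicing points and storage medium. CN113555007A, 2021-10-26.
[14] Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ruibo Fu. Method and apparatus for detecting tampered regions in tampered audio, and storage medium. CN113555037A, 2021-10-26.
[15] Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ruibo Fu. Knowledge-transfer-based method for detecting fake speech over telephone channels, and storage medium. CN113380235A, 2021-09-10.
[16] Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ruibo Fu. A compression method for speech discrimination models that fuses information from combined models. CN113362814A, 2021-09-07.
[17] Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Ruibo Fu. An environment-adversarial robust speech discrimination method. CN113284486A, 2021-08-20.
[18] Jianhua Tao, Zhengkun Tian, Jiangyan Yi. A generated-audio detection system based on hierarchical discrimination. CN113284508A, 2021-08-20.
[19] Jianhua Tao, Zhengkun Tian, Jiangyan Yi. A text enhancement system for speech recognition that incorporates multimodal semantic invariance. CN113270086A, 2021-08-17.
[20] Jianhua Tao, Zhengkun Tian, Jiangyan Yi. A hybrid streaming and non-streaming speech recognition system and a streaming speech recognition method. CN113257248A, 2021-08-13.
[21] Shan Liang, Shuai Nie, Jianhua Tao, Jiangyan Yi. Digital audio tampering forensics method based on phase offset detection. CN113178199A, 2021-07-27.
[22] Jiangyan Yi. Method and apparatus for detecting fake speech based on phoneme duration features. ZL 202110841276.2, 2021-07-26.
[23] Jianhua Tao, Jiangyan Yi, Zhengqi Wen. Small-data acoustic modeling method for speech recognition. CN108682417A, 2018-10-19.
[24] Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Hao Ni. Acoustic model adaptation method based on accent bottleneck features. CN106875942A, 2017-06-20.
[25] Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Bin Liu. Regularized accent adaptation method for speech recognition. CN106531157A, 2017-03-22.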
Publications
Published Papers
[1] Wang, Tao, Fu, Ruibo, Yi, Jiangyan, Tao, Jianhua, Wen, Zhengqi. NeuralDPS: Neural Deterministic Plus Stochastic Model With Multiband Excitation for Noise-Controllable Waveform Generation. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2022, 30: 865-878, http://dx.doi.org/10.1109/TASLP.2022.3140480.
[2] Yi, Jiangyan, Fu, Ruibo, Tao, Jianhua, Nie, Shuai, Ma, Haoxin, Wang, Chenglong, Wang, Tao, Tian, Zhengkun, Bai, Ye, Fan, Cunhang, Liang, Shan, Wang, Shiming, Zhang, Shuai, Yan, Xinrui, Xu, Le, Wen, Zhengqi, Li, Haizhou. ADD 2022: the First Audio Deep Synthesis Detection Challenge. 2022, http://arxiv.org/abs/2202.08433.
[3] Tian, Zhengkun, Yi, Jiangyan, Tao, Jianhua, Zhang, Shuai, Wen, Zhengqi. Hybrid Autoregressive and Non-Autoregressive Transformer Models for Speech Recognition. IEEE SIGNAL PROCESSING LETTERS[J]. 2022, 29: 762-766, http://dx.doi.org/10.1109/LSP.2022.3152128.
[4] Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao. Context-Aware Mask Prediction Network for End-to-End Text-Based Speech Editing. ICASSP[J]. 2022.
[5] Tao Wang, Jiangyan Yi, Ruibo Fu, Jianhua Tao, Zhengqi Wen. CampNet: Context-Aware Mask Prediction for End-to-End Text-Based Speech Editing. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2022.
[6] Fan, Cunhang, Yi, Jiangyan, Tao, Jianhua, Tian, Zhengkun, Liu, Bin, Wen, Zhengqi. Gated Recurrent Fusion With Joint Training Framework for Robust End-to-End Speech Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2021, 29(29): 198-209, http://dx.doi.org/10.1109/TASLP.2020.3039600.
[7] Wang, Tao, Fu, Ruibo, Yi, Jiangyan, Tao, Jianhua, Wen, Zhengqi, Qiang, Chunyu, Wang, Shiming, IEEE. PROSODY AND VOICE FACTORIZATION FOR FEW-SHOT SPEAKER ADAPTATION IN THE CHALLENGE M2VOC 2021. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021). 2021, 8603-8607.
[8] Chenglong Wang, Jiangyan Yi. Fake speech detection based on a global time-frequency attention network. Journal of Computer Research and Development (计算机研究与发展)[J]. 2021.
[9] Bai, Ye, Yi, Jiangyan, Tao, Jianhua, Tian, Zhengkun, Wen, Zhengqi, Zhang, Shuai. Fast End-to-End Speech Recognition Via Non-Autoregressive Models and Cross-Modal Knowledge Transferring From BERT. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2021, 29(29): 1897-1911.
[10] Fu, Ruibo, Tao, Jianhua, Wen, Zhengqi, Yi, Jiangyan, Wang, Tao, Qiang, Chunyu, IEEE. BI-LEVEL STYLE AND PROSODY DECOUPLING MODELING FOR PERSONALIZED END-TO-END SPEECH SYNTHESIS. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021). 2021, 6568-6572.
[11] Yi, Jiangyan, Bai, Ye, Tao, Jianhua, Tian, Zhengkun, Wang, Chenglong, Wang, Tao, Fu, Ruibo. Half-Truth: A Partially Fake Audio Detection Dataset. 2021, http://arxiv.org/abs/2104.03617.
[12] Zhang, Shuai, Yi, Jiangyan, Tian, Zhengkun, Bai, Ye, Tao, Jianhua, Wen, Zhengqi, IEEE. Decoupling Pronunciation and Language for End-to-End Code-Switching Automatic Speech Recognition. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021). 2021, 6249-6253.
[13] Wang, Shiming, Ling, Zhenhua, Fu, Ruibo, Yi, Jiangyan, Tao, Jianhua, IEEE. PATNET: A PHONEME-LEVEL AUTOREGRESSIVE TRANSFORMER NETWORK FOR SPEECH SYNTHESIS. 2021 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP 2021). 2021, 5684-5688.
[14] Bai, Ye, Yi, Jiangyan, Tao, Jianhua, Wen, Zhengqi, Tian, Zhengkun, Zhang, Shuai. Integrating Knowledge Into End-to-End Speech Recognition From External Text-Only Data. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2021, 29: 1340-1351, http://dx.doi.org/10.1109/TASLP.2021.3066274.
[15] Ma Haoxin, Yi Jiangyan, Tao Jianhua, Bai Ye, Tian Zhengkun, Wang Chenglong. Continual Learning for Fake Audio Detection. 2021, http://arxiv.org/abs/2104.07286.
[16] Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao. Non-autoregressive End-to-End TTS with Coarse-to-Fine Decoding. INTERSPEECH 2020[J]. 2020.
[17] Tian Zhengkun, Yi Jiangyan, Bai Ye, Tao Jianhua, Zhang Shuai, Wen Zhengqi. Synchronous Transformers for End-to-End Speech Recognition. 2020, http://arxiv.org/abs/1912.02958.
[18] Fan, Cunhang, Tao, Jianhua, Liu, Bin, Yi, Jiangyan, Wen, Zhengqi, Liu, Xuefei. End-to-End Post-Filter for Speech Separation With Deep Attention Fusion Features. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2020, 28: 1303-1314, http://dx.doi.org/10.1109/TASLP.2020.2982029.
[19] Fu, Ruibo, Tao, Jianhua, Wen, Zhengqi, Yi, Jiangyan, Wang, Tao, IEEE. FOCUSING ON ATTENTION: PROSODY TRANSFER AND ADAPTATIVE OPTIMIZATION STRATEGY FOR MULTI-SPEAKER END-TO-END SPEECH SYNTHESIS. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. 2020, 6709-6713.
[20] Tian, Zhengkun, Yi, Jiangyan, Bai, Ye, Tao, Jianhua, Zhang, Shuai, Wen, Zhengqi, IEEE. SYNCHRONOUS TRANSFORMERS FOR END-TO-END SPEECH RECOGNITION. 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING. 2020, 7884-7888.
[21] Tao Wang, Ruibo Fu, Jiangyan Yi, Jianhua Tao. Spoken Content and Voice Factorization for Few-shot Speaker Adaptation. INTERSPEECH 2020[J]. 2020.
[22] Yi, Jiangyan, Tao, Jianhua. Focal Loss for Punctuation Prediction. INTERSPEECH[J]. 2020.
[23] Jianhua Tao, Tao Wang, Ruibo Fu, Jiangyan Yi. Dynamic Speaker Representations Adjustment and Decoder Factorization for Speaker Adaptation in End-to-End Speech Synthesis. INTERSPEECH[J]. 2020.
[24] Jianhua Tao, Ruibo Fu, Jiangyan Yi, Chenglong Wang, Tao Wang. Development and challenges of speech forgery and forgery detection. Journal of Cyber Security (信息安全学报)[J]. 2020, 5(2): 28-38, http://lib.cqvip.com/Qikan/Article/Detail?id=7101732838.
[25] Bai, Ye, Yi, Jiangyan, Tao, Jianhua, Wen, Zhengqi, Fan, Cunhang. A Public Chinese Dataset for Language Model Adaptation. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY[J]. 2020, 92(8): 839-851, https://www.webofscience.com/wos/woscc/full-record/WOS:000490530600001.
[26] Fan Cunhang, Tao Jianhua, Liu Bin, Yi Jiangyan, Wen Zhengqi. Joint Training for Simultaneous Speech Denoising and Dereverberation with Deep Embedding Representations. INTERSPEECH[J]. 2020.
[27] Tao Wang, Jianhua Tao, Ruibo Fu, Jiangyan Yi. Bi-level Speaker Supervision for One-shot Speech Synthesis. INTERSPEECH 2020[J]. 2020.
[28] Ruibo Fu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi. Dynamic Soft Windowing and Language Dependent Style Token for Code-Switching End-to-End Speech Synthesis. INTERSPEECH[J]. 2020.
[29] Fan, Cunhang, Yi, Jiangyan, Tao, Jianhua, Tian, Zhengkun, Liu, Bin, Wen, Zhengqi. Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition. 2020, http://arxiv.org/abs/2011.04249.
[30] Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang. Listen Attentively, and Spell Once: Whole Sentence Generation via a Non-Autoregressive Architecture for Low-Latency Speech Recognition. 2020, http://arxiv.org/abs/2005.04862.
[31] Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen. Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition. 2020, http://arxiv.org/abs/2005.07903.
[32] Fan Cunhang, Tao Jianhua, Liu Bin, Yi Jiangyan, Wen Zhengqi. Gated Recurrent Fusion of Spatial and Spectral Features for Multi-channel Speech Separation with Deep Embedding Representations. Interspeech[J]. 2020.
[33] Yi, Jiangyan, Tao, Jianhua, Bai, Ye. LANGUAGE-INVARIANT BOTTLENECK FEATURES FROM ADVERSARIAL END-TO-END ACOUSTIC MODELS FOR LOW RESOURCE SPEECH RECOGNITION. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)[J]. 2019, 6071-6075.
[34] Fan Cunhang, Liu Bin, Tao Jianhua, Yi Jiangyan, Wen Zhengqi. Discriminative Learning for Monaural Speech Separation Using Deep Embedding Features. Interspeech 2019. 2019, http://arxiv.org/abs/1907.09884.
[35] Zheng, Yibin, Tao, Jianhua, Wen, Zhengqi, Yi, Jiangyan. Forward-Backward Decoding Sequence for Regularizing End-to-End TTS. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2019, 27(12): 2067-2079, http://dx.doi.org/10.1109/TASLP.2019.2935807.
[36] Ye Bai, Jiangyan Yi, Jianhua Tao. A Time Delay Neural Network with Shared Weight Self-Attention for Small-Footprint Keyword Spotting. INTERSPEECH[J]. 2019.
[37] Ye Bai, Jiangyan Yi, Jianhua Tao, Zhengkun Tian, Zhengqi Wen. Learn Spelling from Teachers: Transferring Knowledge from Language Models to Sequence-to-Sequence Speech Recognition. 2019, http://arxiv.org/abs/1907.06017.
[38] Zhengkun Tian, Jiangyan Yi, Jianhua Tao, Ye Bai, Zhengqi Wen. Self-Attention Transducers for End-to-End Speech Recognition. 2019, http://arxiv.org/abs/1909.13037.
[39] Cunhang Fan, Bin Liu, Jianhua Tao, Zhengqi Wen, Jiangyan Yi. An end-to-end speech separation method based on convolutional neural networks. Journal of Signal Processing (信号处理)[J]. 2019, 35(4): 542-548, http://lib.cqvip.com/Qikan/Article/Detail?id=7002150057.
[40] Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Ye Bai. Language-Adversarial Transfer Learning for Low-Resource Speech Recognition. IEEE / ACM Transactions on Audio, Speech and Language Processing (TASLP). 2019, 27(3), http://kns.cnki.net/KCMS/detail/detail.aspx?QueryID=0&CurRec=1&recid=&FileName=SJCM0D850DF295CFFB91AA79EEA520C4B2A5&DbName=WWMERGEJ01&DbCode=WWME&yx=&pr=&URLID=&bsm=.
[41] Yi, Jiangyan, Tao, Jianhua. SELF-ATTENTION BASED MODEL FOR PUNCTUATION PREDICTION USING WORD AND SPEECH EMBEDDINGS. 2019 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)[J]. 2019, 7270-7274.
[42] Yi, Jiangyan, Tao, Jianhua, Wen, Zhengqi, Bai, Ye. Language-Adversarial Transfer Learning for Low-Resource Speech Recognition. IEEE-ACM TRANSACTIONS ON AUDIO SPEECH AND LANGUAGE PROCESSING[J]. 2019, 27(3): 621-630, http://ir.ia.ac.cn/handle/173211/25289.
[43] Jiangyan Yi, Jianhua Tao, Bin Liu, Zhengqi Wen. Acoustic modeling for noise-robust speech recognition based on transfer learning. Journal of Tsinghua University (Science and Technology) (清华大学学报:自然科学版)[J]. 2018, 58(1): 55-60, http://lib.cqvip.com/Qikan/Article/Detail?id=674347161.
[44] Yi, Jiangyan, Wen, Zhengqi, Tao, Jianhua, Ni, Hao, Liu, Bin. CTC Regularized Model Adaptation for Improving LSTM RNN Based Multi-Accent Mandarin Speech Recognition. JOURNAL OF SIGNAL PROCESSING SYSTEMS FOR SIGNAL IMAGE AND VIDEO TECHNOLOGY[J]. 2018, 90(7): 985-997, http://dx.doi.org/10.1007/s11265-017-1291-1.
[45] Huang, Jian, Tao, Jianhua, Li, Ya, Lian, Zheng, Yi, Jiangyan. End-to-End Continuous Emotion Recognition from Video Using 3D Convlstm Networks. ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing. 2018, http://ir.ia.ac.cn/handle/173211/22375.
[46] Yi Jiangyan, Tao Jianhua, Wen Zhengqi, Bai Ye, IEEE. ADVERSARIAL MULTILINGUAL TRAINING FOR LOW-RESOURCE SPEECH RECOGNITION. 2018 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP). 2018, 4899-4903, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000446384605014.
[47] Huang, Jian, Tao, Jianhua, Lian, Zheng, Li, Ya, Yi, Jiangyan, Niu, Mingyue. Speech Emotion Recognition Using Semi-supervised Learning with Ladder Networks. 2018 1st Asian Conference on Affective Computing and Intelligent Interaction, ACII Asia 2018. 2018, http://ir.ia.ac.cn/handle/173211/22377.
[48] Jianhua Tao, Jiangyan Yi, Zhengqi Wen, Bin Liu. Acoustic modeling method for robust speech recognition based on transfer learning. Journal of Tsinghua University (清华大学学报)[J]. 2018, 58(1): 55-60, http://ir.ia.ac.cn/handle/173211/19898.
[49] Yi Jiangyan, Ni Hao, Tao Jianhua, Wen Zhengqi. Acoustic Model Compression with Knowledge Transfer. 2017, http://ir.ia.ac.cn/handle/173211/19888.
[50] Jiangyan Yi, Jianhua Tao, Zhengqi Wen, Hao Ni. Distilling Knowledge from an Ensemble of Models for Punctuation Prediction. INTERSPEECH 2017[J]. 2017.
[51] 乌日其其格, Jianhua Tao, 白音门德, Jiangyan Yi, Zhengqi Wen, Ye Bai. Construction of a standard-pronunciation Mongolian speech corpus for speech recognition. 2017, http://ir.ia.ac.cn/handle/173211/19916.
[52] Bai Ye, Yi Jiangyan, Ni Hao, Wen Zhengqi, Liu Bin, Li Ya, Tao Jianhua. End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin. 2016, http://ir.ia.ac.cn/handle/173211/19742.
[53] Yi Jiangyan, Tao Jianhua, Ni Hao, Wen Zhengqi. Improving BLSTM RNN Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features. 2016, http://ir.ia.ac.cn/handle/173211/19881.
[54] Yi, Jiangyan, Ni, Hao, Wen, Zhengqi, Tao, Jianhua, IEEE. Improving BLSTM RNN Based Mandarin Speech Recognition Using Accent Dependent Bottleneck Features. 2016 ASIA-PACIFIC SIGNAL AND INFORMATION PROCESSING ASSOCIATION ANNUAL SUMMIT AND CONFERENCE (APSIPA). 2016.
[55] Bai, Ye, Yi, Jiangyan, Ni, Hao, Wen, Zhengqi, Liu, Bin, Li, Ya, Tao, Jianhua, Lee, T, Xie, L, Dang, J, Wang, HM, Wei, J, Feng, H, Hou, Q, Wei, Y. End-to-end Keywords Spotting Based on Connectionist Temporal Classification for Mandarin. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP). 2016, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000405610900098.
[56] Yi, Jiangyan, Ni, Hao, Wen, Zhengqi, Liu, Bin, Tao, Jianhua, Lee, T, Xie, L, Dang, J, Wang, HM, Wei, J, Feng, H, Hou, Q, Wei, Y. CTC Regularized Model Adaptation for Improving LSTM RNN Based MultiAccent Mandarin Speech Recognition. 2016 10TH INTERNATIONAL SYMPOSIUM ON CHINESE SPOKEN LANGUAGE PROCESSING (ISCSLP)[J]. 2016, http://apps.webofknowledge.com/CitedFullRecord.do?product=UA&colName=WOS&SID=5CCFccWmJJRAuMzNPjj&search_mode=CitedFullRecord&isickref=WOS:000405610900058.
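Research Activities
Research Projects
(1) Research on cross-lingual transfer learning for small-data speech modeling, Principal Investigator, national project, 2020-01 to 2022-12
(2) Research on acoustic models for speech recognition in strong noise based on small data, Principal Investigator, Chinese Academy of Sciences program, 2019-01 to 2021-12
(3) Research on key technologies for online multimedia forgery detection, Principal Investigator, national project, 2020-07 to 2023-06
(4) Cooperative project on speech recognition and speech synthesis technology research, Principal Investigator, enterprise-commissioned, 2021-03 to 2022-03
(5) Key technologies for big-data multimodal interaction and collaboration, Participant, national project, 2017-10 to 2021-09
(6) Key technologies and applications of big data analysis, Participant, Chinese Academy of Sciences program, 2020-01 to 2020-12
(7) XXX based on multilingual multi-channel fusion, Participant, national project, 2018-02 to 2021-12
(8) Key speech technologies, Principal Investigator, enterprise-commissioned, 2021-06 to 2022-09
(9) Key technologies for audio-visual analysis based on continual learning, Principal Investigator, national project, 2021-10 to 2024-09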
Students Supervised
Current Students
Gu Hao (顾浩), Master's student, 081104 - Pattern Recognition and Intelligent Systems