Audiovisual Voice Activity Detector Based on Deep Convolutional Neural Network and Generalized Cross-Correlation
Abstract
About the authors
D. A. Suvorov
Russia
R. A. Zhukov
Russia
D. O. Tsetserukov
Russia
S. L. Zenkevich
Russia
For citation:
Suvorov D.A., Zhukov R.A., Tsetserukov D.O., Zenkevich S.L. Audiovisual Voice Activity Detector Based on Deep Convolutional Neural Network and Generalized Cross-Correlation. Mekhatronika, Avtomatizatsiya, Upravlenie. 2018;19(1):53-57. (In Russ.)