Aljohani, N. F., & Jaha, E. S. (2023). Visual Lip-Reading for Quranic Arabic Alphabets and Words Using Deep Learning. Computer Systems Science and Engineering, 46(3), 3037–3058. https://doi.org/10.32604/csse.2023.037113
Bai, Z., Li, Z., Li, Z., Song, Y., Gao, Q., & Mao, Z. (2023). Domain-Adaptive Emotion Recognition Based on Horizontal Vertical Flow Representation of EEG Signals. IEEE Access, 11, 55023–55034. https://doi.org/10.1109/ACCESS.2023.3270977
Benhafid, Z., Selouani, S. A., Amrouche, A., & Sidi Yakoub, M. (2025). Attentive Context-Aware Deep Speaker Representations for Voice Biometrics in Adverse Conditions. Circuits, Systems, and Signal Processing, 44(1), 534–555. https://doi.org/10.1007/s00034-024-02854-4
Chakhtouna, A., Sekkate, S., & Abdellah, A. (2024). Modeling Speech Emotion Recognition via ImageBind representations. Procedia Computer Science, 236, 428–435. https://doi.org/10.1016/j.procs.2024.05.050
Cheng, T., Curley, M., & Barmettler, A. (2022). Skin Color Representation in Ophthalmology Textbooks. Medical Science Educator, 32(5), 1143–1147. https://doi.org/10.1007/s40670-022-01636-4
Chernyak, B. R., Bradlow, A. R., Keshet, J., & Goldrick, M. A. (2024). A perceptual similarity space for speech based on self-supervised speech representations. Journal of the Acoustical Society of America, 155(6), 3915–3929. https://doi.org/10.1121/10.0026358
Cho, S., & Wee, K. (2025). Multi-Noise Representation Learning for Robust Speaker Recognition. IEEE Signal Processing Letters, 32, 681–685. https://doi.org/10.1109/LSP.2025.3530879
Farhana, I., Shin, J., Mahmood, S., Rabiul Islam, M. D., & Molla, M. K. I. (2023). Emotion Recognition Using Narrowband Spatial Features of Electroencephalography. IEEE Access, 11, 44019–44033. https://doi.org/10.1109/ACCESS.2023.3270177
Fu, K., Du, C., Wang, S., & He, H. (2024). Improved Video Emotion Recognition With Alignment of CNN and Human Brain Representations. IEEE Transactions on Affective Computing, 15(3), 1026–1040. https://doi.org/10.1109/TAFFC.2023.3316173
Gharib, S., Tran, M., Luong, D., Drossos, K., & Virtanen, T. (2024). Adversarial Representation Learning for Robust Privacy Preservation in Audio. IEEE Open Journal of Signal Processing, 5, 294–302. https://doi.org/10.1109/OJSP.2023.3349113
He, S., Xue, W., Yang, Y., Zhang, H., Pan, J., & Zhang, X. (2025). Enhancing target speaker extraction with Hierarchical Speaker Representation Learning. Neural Networks, 188. https://doi.org/10.1016/j.neunet.2025.107388
Hu, Y., Chen, C., Zhu, Q., & Chng, E. (2024). Wav2code: Restore Clean Speech Representations via Codebook Lookup for Noise-Robust ASR. IEEE/ACM Transactions on Audio Speech and Language Processing, 32, 1145–1156. https://doi.org/10.1109/TASLP.2023.3332545
Ichikawa, R., Zhang, B., & Lim, H. (2023). Research on a Guide Dog Robot for Expressing Visual Environment by Voice; 視覚環境を音声で表現可能な盲導犬ロボットに関する研究. IEEJ Transactions on Electronics, Information and Systems, 143(5), 562–568. https://doi.org/10.1541/ieejeiss.143.562
Ishaq, M. H., Khan, M., & Kwon, S. (2023). TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network. Computer Systems Science and Engineering, 46(3), 3355–3369. https://doi.org/10.32604/csse.2023.037373
Karjigi, V., Roopa, S., & H.M., C. M. (2024). Investigation of different time–frequency representations for detection of fricatives. International Journal of Speech Technology, 27(3), 599–611. https://doi.org/10.1007/s10772-024-10129-1
Li, T., Fu, B., Wu, Z., & Liu, Y. (2023). EEG-Based Emotion Recognition Using Spatial-Temporal-Connective Features via Multi-Scale CNN. IEEE Access, 11, 41859–41867. https://doi.org/10.1109/ACCESS.2023.3270317
Mai, S., Sun, Y., Zeng, Y., & Hu, H. (2023). Excavating multimodal correlation for representation learning. Information Fusion, 91, 542–555. https://doi.org/10.1016/j.inffus.2022.11.003
Mao, K., Wang, Y., Ren, L., Zhang, J., Qiu, J., & Dai, G. (2023). Multi-branch feature learning based speech emotion recognition using SCAR-NET. Connection Science, 35(1). https://doi.org/10.1080/09540091.2023.2189217
Padman, S. N., & Magare, D. B. (2023). Multi-modal speech emotion detection using optimised deep neural network classifier. Computer Methods in Biomechanics and Biomedical Engineering: Imaging and Visualization, 11(5), 2020–2038. https://doi.org/10.1080/21681163.2023.2212082
Prain, V., Xu, L., & Speldewinde, C. A. (2023). Guiding Science and Mathematics Learning when Students Construct Representations. Research in Science Education, 53(2), 445–461. https://doi.org/10.1007/s11165-022-10063-9
Puffay, C., Vanthornhout, J., Gillis, M., Accou, B., Van hamme, H., & Francart, T. (2023). Robust neural tracking of linguistic speech representations using a convolutional neural network. Journal of Neural Engineering, 20(4). https://doi.org/10.1088/1741-2552/acf1ce
Qiu, Y., Tian, H., Li, H., Chang, C. C., & Vasilakos, A. V. (2023). Separable Convolution Network With Dual-Stream Pyramid Enhanced Strategy for Speech Steganalysis. IEEE Transactions on Information Forensics and Security, 18, 2737–2750. https://doi.org/10.1109/TIFS.2023.3269640
Qu, L., Li, T., Weber, C., Pekarek-Rosin, T., Ren, F., & Wermter, S. (2024). Disentangling Prosody Representations With Unsupervised Speech Reconstruction. IEEE/ACM Transactions on Audio Speech and Language Processing, 32, 39–54. https://doi.org/10.1109/TASLP.2023.3320864
Salimpour, S., Tytler, R. W., Eriksson, U., & Fitzgerald, M. T. (2021). Cosmos visualized: Development of a qualitative framework for analyzing representations in cosmology education. Physical Review Physics Education Research, 17(1). https://doi.org/10.1103/PhysRevPhysEducRes.17.013104
Teixeira, F. S., Abad, A., Raj, B., & Trancoso, I. M. (2024). Privacy-Oriented Manipulation of Speaker Representations. IEEE Access, 12, 82949–82971. https://doi.org/10.1109/ACCESS.2024.3409067
Wong, K., & Meng, H. M. L. (2023). Automatic Analyses of Dysarthric Speech based on Distinctive Features. APSIPA Transactions on Signal and Information Processing, 12(3). https://doi.org/10.1561/116.00000077
Xie, Y., Liang, R., Liang, Z., Zhao, X., & Zeng, W. (2023). Speech Emotion Recognition Using Multihead Attention in Both Time and Feature Dimensions. IEICE Transactions on Information and Systems, E106.D(5), 1098–1101. https://doi.org/10.1587/transinf.2022EDL8084
Xin, R., Miao, F., Cong, P., Zhang, F., Xin, Y., & Feng, X. (2023). Multiview Feature Fusion Attention Convolutional Recurrent Neural Networks for EEG-Based Emotion Recognition. Journal of Sensors, 2023. https://doi.org/10.1155/2023/9281230
Xuejun, W. (2023). Application of sensor-based sound control principle in speech recognition technology. International Journal of System Assurance Engineering and Management. https://doi.org/10.1007/s13198-023-01939-8
Yamamoto, K., Zhu, Y., Aoyama, T., Takeuchi, M., & Hasegawa, Y. (2023). Improvement in the Manipulability of Remote Touch Screens Based on Peri-Personal Space Transfer. IEEE Access, 11, 43962–43974. https://doi.org/10.1109/ACCESS.2023.3271003
Yuan, K., Srivastav, V., Yu, T., Lavanchy, J. L., Marescaux, J. F., Mascagni, P., Navab, N., & Padoy, N. (2025). Learning multi-modal representations by watching hundreds of surgical video lectures. Medical Image Analysis, 105. https://doi.org/10.1016/j.media.2025.103644
Yusuf, B., Cernocký, J. “Honza,” & Saraçlar, M. (2023). End-to-End Open Vocabulary Keyword Search with Multilingual Neural Representations. IEEE/ACM Transactions on Audio Speech and Language Processing, 31, 3070–3080. https://doi.org/10.1109/TASLP.2023.3301239
Zhang, H., Li, H., Peng, G., Liu, Y., & Xu, D. (2023). Image Emotion Recognition via Fusion Multi-Level Representations; 多层次特征融合表征的图像情感识别. Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao/Journal of Computer-Aided Design and Computer Graphics, 35(10), 1566–1576. https://doi.org/10.3724/SP.J.1089.2023.19742
Zhou, Y., Wu, Z., Zhang, M., Tian, X., & Li, H. (2023). TTS-Guided Training for Accent Conversion Without Parallel Data. IEEE Signal Processing Letters, 30, 533–537. https://doi.org/10.1109/LSP.2023.3270079
Zhuang, H., Xiao, Y., Liu, Q., Yu, B., Xiong, J., & Bao, L. (2021). Comparison of nature of science representations in five Chinese high school physics textbooks. International Journal of Science Education, 43(11), 1779–1798. https://doi.org/10.1080/09500693.2021.1933647