RECOGNITION OF TEXTUAL AND VISUAL INFORMATION USING A HYBRID SVD AND CNN MODEL

Ivan Peleshchak

doi:10.30890/2567-5273.2025-39-02-049

Author(s)

Ivan Peleshchak, Lviv Polytechnic National University, https://orcid.org/0000-0002-7481-8628


Keywords:

information recognition, convolutional neural network, singular value decomposition, hybrid model, multi-format data.

Abstract

This article examines the recognition of textual and visual information using a hybrid model that combines singular value decomposition (SVD) with a convolutional neural network (CNN). The first experiment was conducted on the MNIST dataset, where the baseline CNN achieved an accuracy of about 82,

References

Tan, M., & Le, Q. V. (2021). EfficientNetV2: Smaller models and faster training. arXiv. https://doi.org/10.48550/arxiv.2104.00298

Thi, T. C. (2023). Singular value decomposition and applications in data processing and artificial intelligence. HPU2 Journal of Science: Natural Sciences and Technology, 2(3), 34–41. https://doi.org/10.56764/hpu2.jos.2023.2.3.34-41

Chen, W., Yang, Y., Tian, Z., Chen, Q., & Liu, J. (2024). A review of multimodal learning for text to images. Multimedia Tools and Applications. https://doi.org/10.1007/s11042-024-19117-8

Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). Learning transferable visual models from natural language supervision. arXiv. https://doi.org/10.48550/arxiv.2103.00020

Li, Q., Peng, H., Li, J., Xia, C., Yang, R., Sun, L., Yu, P. S., & He, L. (2020). A survey on text classification: From shallow to deep learning. arXiv. https://doi.org/10.48550/arxiv.2008.00364

Sidheekh, S. (2021). Learning neural networks on SVD boosted latent spaces for semantic classification. arXiv. https://doi.org/10.48550/arxiv.2101.00563

Zhao, X., Wang, L., Zhang, Y., Han, X., Deveci, M., & Parmar, M. (2024). A review of convolutional neural networks in computer vision. Artificial Intelligence Review, 57(4). https://doi.org/10.1007/s10462-024-10721-6

Santos, C. F. G. D., & Papa, J. P. (2022). Avoiding overfitting: A survey on regularization methods for convolutional neural networks. ACM Computing Surveys, 54(10s), 1–25. https://doi.org/10.1145/3510413

Gao, J., Li, P., Chen, Z., & Zhang, J. (2020). A survey on deep learning for multimodal data fusion. Neural Computation, 32(5), 829–864. https://doi.org/10.1162/neco_a_01273

Jiao, T., Guo, C., Feng, X., Chen, Y., & Song, J. (2024). A comprehensive survey on deep learning multi-modal fusion: Methods, technologies and applications. Computers, Materials & Continua, 80(1), 1–35. https://doi.org/10.32604/cmc.2024.053204

Hossain, M. S., Basak, N., Mollah, M. A., Nahiduzzaman, M., Ahsan, M., & Haider, J. (2025). Ensemble-based multiclass lung cancer classification using hybrid CNN-SVD feature extraction and selection method. PLoS ONE, 20(3), e0318219. https://doi.org/10.1371/journal.pone.0318219

MNIST Dataset. (2019). Kaggle. https://www.kaggle.com/datasets/hojjatk/mnist-dataset

PetFinder.my adoption prediction. (2019). Kaggle. https://www.kaggle.com/c/petfinder-adoption-prediction/data