Machine Learning Algorithms in the Detection of Pattern System using Algorithm of Textual Feature Analysis and Classification

Authors

  • Elangovan Guruva Reddy
  • Sujitha. V
  • V. Sasirekha
  • Prisca Mary J
  • V.R.R
  • T. Vengatesh

DOI:

https://doi.org/10.63682/jns.v14i14S.3430

Keywords:

Pattern Detection, Textual Data, Machine Learning, Text Classification, Feature Extraction

Abstract

Pattern recognition in textual data is crucial for many applications, such as sentiment analysis, topic modelling, and information retrieval. This work explores machine learning techniques for in-depth textual feature analysis, with the goal of finding and classifying patterns in textual data. Data is first acquired from a variety of sources, such as reviews, articles, and social media. Text preprocessing methods, including cleaning, tokenization, and lemmatization, then prepare the data for analysis. Feature extraction methods such as Bag of Words (BoW), Term Frequency-Inverse Document Frequency (TF-IDF), Word2Vec word embeddings, and contextual BERT embeddings convert the text into numerical representations that capture its semantic content. Feature selection techniques such as Chi-Square and Mutual Information are then used to identify the most significant features, reducing dimensionality and improving model performance. Several machine learning techniques are evaluated for classification, including Support Vector Machines (SVM), Random Forests, Naive Bayes, Recurrent Neural Networks (RNNs), and Transformers. These algorithms are trained and tested on split datasets to assess their robustness and reliability. The models' efficacy is measured with accuracy, precision, recall, and F1-score. The proposed system is suitable for real-world use owing to its high accuracy and scalability. This study shows how textual features can be evaluated and categorized using machine learning, and how these techniques can improve pattern identification and interpretation in large volumes of textual data, producing valuable insights and supporting informed decision-making across a range of industries.
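
The abstract does not give implementation details. As a minimal sketch, assuming plain Python with no third-party libraries, the cleaning, tokenization, and TF-IDF steps it names could look like the following. The function names preprocess and tf_idf are hypothetical. Lemmatization is left out because it needs a lexical resource (for example NLTK or spaCy), and the weighting uses the unsmoothed idf = log(N / df), which differs from library defaults such as scikit-learn's smoothed variant.

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.
import math
import re
from collections import Counter


def preprocess(text: str) -> list[str]:
    """Clean and tokenize: lowercase, replace non-alphanumerics with spaces, split.

    Lemmatization is omitted; it would need a resource such as NLTK or spaCy.
    """
    cleaned = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return cleaned.split()


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse {term: weight} vector per tokenized document.

    tf  = count of term in the doc / number of tokens in the doc
    idf = log(N / df), where df = number of docs containing the term
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


if __name__ == "__main__":
    corpus = [
        "Great product, works great!",
        "Terrible product. Broke after a day.",
        "Works fine; decent value.",
    ]
    tokens = [preprocess(t) for t in corpus]
    for vec in tf_idf(tokens):
        print({t: round(w, 3) for t, w in sorted(vec.items())})
```

A term that occurs in every document (here "product" in the first two of a two-document corpus) gets idf = log(1) = 0, which is how TF-IDF down-weights uninformative words.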
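
Chi-Square selection, one of the two feature selection criteria the abstract names, can be sketched in the same style. This is an assumed formulation, not the authors' code: each term is scored by the chi-square statistic of a 2x2 table of term presence against class membership, the score is the maximum over classes, and the k best terms are kept. Mutual Information would follow the same pattern with a different scoring function. The names chi_square_scores and select_top_k are hypothetical.

```python
# Illustrative sketch only; function names are hypothetical, not from the paper.
from collections import Counter


def chi_square_scores(docs: list[list[str]], labels: list[str]) -> dict[str, float]:
    """Score each term by chi-square dependence between term presence and class.

    For term t and class c, the 2x2 table counts documents:
      a: in c, contain t       b: not in c, contain t
      cc: in c, lack t         d: not in c, lack t
    chi2 = N * (a*d - b*cc)^2 / ((a+b) * (cc+d) * (a+cc) * (b+d))
    A term's score is its maximum chi2 over all classes.
    """
    n = len(docs)
    class_sizes = Counter(labels)
    # df_by_class[c][t] = number of documents of class c that contain t
    df_by_class = {c: Counter() for c in class_sizes}
    for doc, label in zip(docs, labels):
        df_by_class[label].update(set(doc))
    df_total = Counter()
    for counts in df_by_class.values():
        df_total.update(counts)

    scores = {}
    for term, df in df_total.items():
        best = 0.0
        for c, size in class_sizes.items():
            a = df_by_class[c][term]
            b = df - a
            cc = size - a
            d = (n - size) - b
            denom = (a + b) * (cc + d) * (a + cc) * (b + d)
            if denom:  # skip degenerate tables (e.g. term present in every doc)
                best = max(best, n * (a * d - b * cc) ** 2 / denom)
        scores[term] = best
    return scores


def select_top_k(scores: dict[str, float], k: int) -> list[str]:
    """Keep the k highest-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


if __name__ == "__main__":
    docs = [["good", "fast"], ["good", "cheap"], ["bad", "slow"], ["bad", "cheap"]]
    labels = ["pos", "pos", "neg", "neg"]
    print(select_top_k(chi_square_scores(docs, labels), k=2))
```

In this toy corpus "cheap" appears equally often in both classes and scores 0, while "good" and "bad" perfectly separate the classes and are kept.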
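
For the classification and evaluation steps, the sketch below uses Multinomial Naive Bayes, the simplest of the listed classifiers, together with the four metrics the abstract reports. It assumes already-tokenized documents, Laplace smoothing with alpha = 1, and one designated positive class for precision, recall, and F1. The class MultinomialNB and the function evaluate are hypothetical stdlib reimplementations (not scikit-learn's class of the same name), and nothing here reproduces the paper's models or results.

```python
# Illustrative sketch only; names are hypothetical, not from the paper.
import math
from collections import Counter, defaultdict


class MultinomialNB:
    """Multinomial Naive Bayes over token counts with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MultinomialNB":
        self.classes = sorted(set(labels))
        label_counts = Counter(labels)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.term_counts = defaultdict(Counter)  # class -> term -> count
        for doc, label in zip(docs, labels):
            self.term_counts[label].update(doc)
        self.vocab = {t for counts in self.term_counts.values() for t in counts}
        self.totals = {c: sum(self.term_counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: list[str]) -> str:
        v = len(self.vocab)

        def log_posterior(c: str) -> float:
            score = self.log_prior[c]
            for t in doc:
                if t in self.vocab:  # terms unseen in training are ignored
                    score += math.log((self.term_counts[c][t] + self.alpha)
                                      / (self.totals[c] + self.alpha * v))
            return score

        return max(self.classes, key=log_posterior)


def evaluate(y_true: list[str], y_pred: list[str], positive: str) -> dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    train = [(["great", "fast"], "pos"), (["great", "good"], "pos"),
             (["awful", "slow"], "neg"), (["bad", "awful"], "neg")]
    test = [(["great", "slow"], "pos"), (["awful"], "neg")]
    model = MultinomialNB().fit([d for d, _ in train], [y for _, y in train])
    preds = [model.predict(d) for d, _ in test]
    print(evaluate([y for _, y in test], preds, positive="pos"))
```

The model is fit only on the training portion and scored on the held-out portion, mirroring the split-dataset evaluation the abstract describes. With more than two classes, per-class precision, recall, and F1 would typically be macro-averaged.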

Published

2025-04-10

How to Cite

1.
Reddy EG, V S, Sasirekha V, Mary J P, V.R.R V, Vengatesh T. Machine Learning Algorithms in the Detection of Pattern System using Algorithm of Textual Feature Analysis and Classification. J Neonatal Surg [Internet]. 2025 Apr 10 [cited 2025 May 21];14(14S):66-74. Available from: https://jneonatalsurg.com/index.php/jns/article/view/3430
