Last updated: August 28, 2019

The explosion of data being generated in bioinformatics and allied fields poses real challenges for researchers. Mining useful biological information from this data and analysing it has become an important task, since it gives insight into the underlying functions and cellular processes.

The highly complex nature of biological data demands good-quality pattern recognition algorithms to ensure reliable output. Pattern recognition is applied in almost all areas of bioinformatics, such as genomics, proteomics and transcriptomics. Classification and clustering are applied to large datasets in areas like gene expression analysis, pathway analysis, gene regulatory networks, disease modelling and biomarker identification. Advances in machine learning techniques, algorithms and tools help researchers solve various pattern recognition problems effectively. In this review, we discuss a number of such problems in bioinformatics and computational biology and the techniques used to address them.

Key words: Pattern recognition, computational biology, classification, Artificial Neural Network, clustering, deep learning

Introduction

Pattern recognition is one of the key strategies by which the brain reasons by analogy about many real-life problems, based on the information gathered through the sense organs.

  In general, pattern recognition involves receiving input data and analysing it for similar, specific or regular patterns, on the basis of which a meaningful interpretation is made. Pattern recognition can be employed to make computers carry out tasks as humans do, often faster and more accurately, by formulating the actual problem and using a collection of mathematical, statistical, heuristic and inductive techniques to find solutions. When a computer program is trained to learn patterns and categorize data, the process is called machine learning or machine pattern recognition. Solutions based on pattern recognition can be employed in almost any field: medicine, the health and pharmaceutical industries, agriculture, financial markets and forensic investigation. During the last few decades, an enormous amount of biological data in different formats has been generated using advanced technologies. Moreover, researchers have developed a number of databases that accumulate large volumes of molecular data. Consequently, the demand for new computational techniques to process this data has also increased.

  Mining useful information and interpreting it biologically has become one of the most important problems in bioinformatics. Various processes in nature have been analysed, and many nature-inspired algorithms have been derived from them for computational pattern recognition. Studies reveal that biomolecules (DNA, RNA, proteins), in both sequence and structural form, contain different patterns that are functionally relevant. These patterns, also known as motifs, play a major role in the characterization of these biomolecules. In proteins, patterns may also occur among the elements of secondary structure.

Helix-turn-helix is a widely studied motif in the category of DNA-binding motifs. Techniques for detecting such patterns make use of structural features as well. Recent studies in drug discovery show that proline-rich linear motifs are excellent mediators of the intermolecular interactions seen in many facets of the immune response, and hence these motifs are considered drug targets in immune-mediated diseases. Alignment methods, local search and heuristic approaches are a few of the techniques applied to this pattern identification task.
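As a rough illustration of sequence-level motif detection, the sketch below scans a protein sequence for the PxxP core (proline, any two residues, proline) that is commonly associated with proline-rich linear motifs. It uses plain regular-expression matching rather than the alignment, local search or heuristic methods mentioned above, and the example sequence and the function name find_motifs are invented for illustration only.

```python
import re

# Hypothetical protein fragment, made up for this example (not a real entry).
SEQUENCE = "MAPPLPPRPSAPKPT"

# PxxP core: proline, any two residues, proline.
# Wrapping the pattern in a lookahead lets overlapping hits be reported.
PXXP = re.compile(r"(?=(P..P))")


def find_motifs(sequence, pattern=PXXP):
    """Return (1-based start position, matched residues) for every hit."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence)]


if __name__ == "__main__":
    for position, residues in find_motifs(SEQUENCE):
        print(position, residues)
```

A fixed pattern like this only captures exact, gap-free motifs. Degenerate or structure-dependent motifs such as helix-turn-helix generally need profile-based or structural methods instead.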

Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD) systems. CAD describes a procedure that supports the doctor's interpretations and findings. Detecting patterns demands computational techniques that produce optimal results.

