Type: Process Essays
Sample donated: Shannon Moss
Last updated: March 22, 2019
The main approaches followed for mining graph data are:
• Mining frequent subgraphs
• Classification
• Clustering
2.1.1 Mining Frequent Subgraphs: Frequent subgraphs are subgraphs that occur frequently in data represented as graphs. Frequent Subgraph Mining (FSM) is the essence of graph mining. The objective of FSM is to extract all the frequent subgraphs in a given data set, that is, all subgraphs whose occurrence counts are above a specified threshold. The development of FSM is reflected not only in the research activity associated with it but also in its many areas of application. Frequent subgraphs are useful for characterizing graph sets, discriminating between different groups of graphs, classifying and clustering graph sets, building graph indices and facilitating similarity search in graph databases [2].
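The thresholding idea behind FSM can be illustrated with a minimal sketch. It is restricted to the smallest possible patterns, single labelled edges, and assumes each graph is given as a list of vertex-label pairs. The function name `frequent_edges`, the `min_support` parameter and the toy data are hypothetical and chosen only for illustration; a full FSM algorithm would go on to grow these frequent edges into larger candidate subgraphs.

```python
from collections import Counter


def frequent_edges(graphs, min_support):
    """Return edge patterns that occur in at least `min_support` graphs.

    Each graph is an iterable of (label_u, label_v) pairs describing
    undirected edges between labelled vertices.
    """
    support = Counter()
    for edges in graphs:
        # Ignore edge direction and count each pattern once per graph.
        patterns = {tuple(sorted(edge)) for edge in edges}
        support.update(patterns)
    # Keep only the patterns whose support reaches the threshold.
    return {p: c for p, c in support.items() if c >= min_support}


# Hypothetical toy data set: three small graphs with labelled vertices.
graphs = [
    [("A", "B"), ("B", "C")],
    [("B", "A"), ("C", "D")],
    [("A", "B"), ("C", "B"), ("C", "D")],
]

print(frequent_edges(graphs, min_support=2))
print(frequent_edges(graphs, min_support=3))
```

Raising the threshold from 2 to 3 leaves only the A-B edge, which is the only pattern present in every graph of this toy data set.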
A substructure may take different structural forms, such as trees, graphs or lattices, which may be combined with itemsets or subsequences. If a substructure occurs frequently, it is called a (frequent) structured pattern. Graph mining may include mining frequent subgraph patterns, clustering, graph classification and other analysis tasks.
2.1.2 Classification: Classification is a general process related to categorization, the process in which ideas and objects are differentiated, recognized and understood.
A classification system is an approach to managing classification. In data mining, classification is the process of finding a model that describes and distinguishes data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. The model is derived from the analysis of a set of training data. There are various methods for constructing classification models, such as naive Bayesian classification, k-nearest neighbor classification and support vector machines.
2.1.3 Clustering: Clustering, or cluster analysis, is the task of grouping a set of objects in such a way that objects in the same group are more similar (in some sense or another) to each other than to those in other groups. A cluster is a group of objects that belong to the same class. In other words, similar objects are grouped in one cluster and dissimilar objects are grouped in another cluster. For graph data this problem is challenging because of the need to match the structures of the underlying graphs and to use these structures for clustering purposes.
2.2 Tools: There are several tools available for graph mining. Some of them are given here.
2.2.1 Cytoscape: Cytoscape is an open source software platform for visualizing molecular interaction networks and integrating them with gene expression profiles and other state data. Additional features are available as plug-ins. Plug-ins are available for molecular profiling analyses, new layouts, additional file format support, networks, connection with databases and searching in large networks. Plug-ins may be developed by anyone using the Cytoscape open Java software architecture, and community plug-in development is encouraged. Although Cytoscape is most commonly used for biological applications, it is agnostic in terms of usage. Cytoscape can be used to visualize and analyze network graphs of any kind involving nodes and edges (e.g., social networks). A key aspect of the software architecture of Cytoscape is the use of plug-ins for specialized features.
Plug-ins are developed by the core developers and by the greater user community.
2.2.2 Gephi: Gephi is open-source software for network analysis and visualization. It helps data analysts to intuitively reveal patterns and trends, highlight outliers and tell stories with their data. It uses a 3D render engine to display large graphs in real time and to speed up exploration. Classic metrics of social network analysis, such as node degree or betweenness centrality, can be computed and used in the visualization as well. The network can also be altered based on attributes.
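The two metrics named above can be made concrete with a small dependency-free sketch. It is not Gephi's implementation or API. It assumes an undirected, unweighted graph stored as an adjacency dictionary and computes unnormalised betweenness with Brandes' algorithm. The function names and the toy network are hypothetical.

```python
from collections import deque


def degree(adj):
    """Number of neighbours of every node."""
    return {v: len(nbrs) for v, nbrs in adj.items()}


def betweenness(adj):
    """Unnormalised betweenness centrality (Brandes' algorithm)
    for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths (sigma)
        # and remember each node's predecessors on those paths.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, starting with the farthest nodes.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was visited from both of its endpoints.
    return {v: c / 2 for v, c in score.items()}


# Hypothetical toy network: path a-b-c-d with an extra leaf e on c.
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}

print(degree(adj))
print(betweenness(adj))
```

In this toy network node c has both the highest degree and the highest betweenness, since every shortest path between the two sides of the graph runs through it. A visualization tool could map such values to node size or colour.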
2.2.3 Graph Insight: Graph Insight is visualization software that lets you explore graph data through high-quality interactive representations. Knowledge extraction and data exploration from graphs are of great interest nowadays: knowledge is disseminated in social networks, and services are powered by cloud computing platforms. Humans are extremely good at identifying outliers and patterns, so interacting visually with the data, as Graph Insight allows, can give better intuition and higher confidence in the field.
2.2.4 NetworkX: NetworkX is a Python package for the creation, manipulation and study of the structure, dynamics and functions of complex networks. Its flexibility makes it well suited to representing networks found in many different fields.
2.2.5 Social Network Visualizer: Social Network Visualizer (SocNetV) is a cross-platform, user-friendly free software application for social network analysis and visualization. It lets users edit actors and ties through point-and-click, analyze graph and social network properties, produce HTML reports and embed visualization layouts in the network.
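As a rough illustration of the kind of graph properties such tools report, the sketch below computes density and connected components for a small undirected network. It deliberately uses only the Python standard library and is not the API of NetworkX or SocNetV. The function names and the toy friendship network are hypothetical.

```python
def density(adj):
    """Fraction of possible undirected edges that are actually present."""
    n = len(adj)
    # Each undirected edge appears in two adjacency sets.
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    return 0.0 if n < 2 else 2 * m / (n * (n - 1))


def connected_components(adj):
    """List of node sets, one per connected component."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack, component = [start], set()
        seen.add(start)
        while stack:
            v = stack.pop()
            component.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        components.append(component)
    return components


# Hypothetical friendship network: one triangle and one separate pair.
network = {
    "ann": {"bob", "cat"},
    "bob": {"ann", "cat"},
    "cat": {"ann", "bob"},
    "dan": {"eve"},
    "eve": {"dan"},
}

print(density(network))
print(connected_components(network))
```

The adjacency-set representation keeps the sketch short. Dedicated packages add richer graph types, attribute handling and many more measures on top of the same idea.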
2.2.6 KNIME: KNIME, the Konstanz Information Miner, is an open source data analytics, reporting and integration platform. KNIME integrates various components for machine learning and data mining through its modular data pipelining concept.
3. Issues:
• Scalability
• Data Ownership and Distribution
• Dimensionality
• Privacy Preservation
• Streaming Data
• Complex and Heterogeneous Data
4. Conclusion: In this paper we briefly discuss graph mining techniques, tools and issues, from the field's beginnings to upcoming research. The paper offers researchers a new perspective for overcoming the challenges in methods, data and other issues of graph mining in social networks.
5. References:
1. Nettleton DF. Data mining of social networks represented as graphs. Elsevier. 2013; 7:1–34.
2. Du H. Data Mining Techniques and Applications: An Introduction, 1st Edition. Cengage Learning Edition; 2010.
3. Han J, Kamber M. Data Mining: Concepts and Techniques, 2nd Edition. Morgan Kaufmann; 2006.
4. Chen MS, Han J, Yu PS. Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering. 1996 Dec; 8(6):866–83.