Textual Data Mining For Knowledge Discovery and Data Classification: A Comparative Study

Nadeem Ur-Rahman

Abstract


Business Intelligence solutions are key to enable industrial organisations (either manufacturing or construction) to remain competitive in the market. These solutions are achieved through analysis of data which is collected, retrieved and re-used for prediction and classification purposes. However many sources of industrial data are not being fully utilised to improve the business processes of the associated industry. It is generally left to the decision makers or managers within a company to take effective decisions based on the information available throughout product design and manufacture or from the operation of business or production processes. Substantial efforts and energy are required in terms of time and money to identify and exploit the appropriate information that is available from the data. Data Mining techniques have long been applied mainly to numerical forms of data available from various data sources but their applications to analyse semi-structured or unstructured databases are still limited to a few specific domains. The applications of these techniques in combination with Text Mining methods based on statistical, natural language processing and visualisation techniques could give beneficial results. Text Mining methods mainly deal with document clustering, text summarisation and classification and mainly rely on methods and techniques available in the area of Information Retrieval (IR). These help to uncover the hidden information in text documents at an initial level. This paper investigates applications of Text Mining in terms of Textual Data Mining (TDM) methods which share techniques from IR and data mining. These techniques may be implemented to analyse textual databases in general but they are demonstrated here using examples of Post Project Reviews (PPR) from the construction industry as a case study. The research is focused on finding key single or multiple term phrases for classifying the documents into two classes i.e. good information and bad information documents to help decision makers or project managers to identify key issues discussed in PPRs which can be used as a guide for future project management process.

Full Text:

PDF



European Scientific Journal (ESJ)

 

ISSN: 1857 - 7881 (Print)
ISSN: 1857 - 7431 (Online)

 

Contact: contact@eujournal.org

To make sure that you can receive messages from us, please add the 'eujournal.org' domain to your e-mail 'safe list'. If you do not receive e-mail in your 'inbox', check your 'bulk mail' or 'junk mail' folders.




Publisher: European Scientific Institute, ESI.
ESI cooperates with Universities and Academic Centres on 5 continents.