Data Mining: Text Mining

on Thursday 8 August 2013

Overview

Text mining refers to the process of deriving high-quality information from text. High-quality information is usually obtained by discovering patterns and trends through means such as statistical pattern learning. Text mining usually involves structuring the input text (usually parsing, along with the addition of some derived linguistic features, the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluating and interpreting the output. "High quality" in text mining usually refers to some combination of relevance, novelty, and interestingness.
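The three stages described above (structuring the text, finding a pattern in the structured data, and inspecting the output) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the stop-word list, the sample documents, and the function names are hypothetical, and document frequency stands in for the far richer patterns that real systems look for.

```python
import re
from collections import Counter

# Hypothetical stop-word list; real systems use much larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "on"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def structure(text):
    """Structure raw text as a bag of words with stop words removed."""
    return Counter(t for t in tokenize(text) if t not in STOP_WORDS)

def document_frequency(records):
    """Count in how many documents each term occurs (a very simple pattern)."""
    df = Counter()
    for record in records:
        df.update(record.keys())
    return df

# Made-up sample documents, for illustration only.
documents = [
    "Text mining is the process of deriving information from text.",
    "Data mining finds patterns in structured data.",
    "Text clustering groups similar documents.",
]
records = [structure(d) for d in documents]
df = document_frequency(records)

# Evaluate and interpret: list the terms shared by the most documents.
print(df.most_common(3))
```

Terms that occur in several documents, such as "text" and "mining" here, are candidates for further analysis; whether they are actually relevant, novel, or interesting still has to be judged in the evaluation step.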


Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity-relationship modeling (i.e., learning relations between named entities) (Wikipedia, 2011).

Classification (categorization) of documents is a problem in information science. The task is to assign incoming electronic documents to one or more categories based on their contents. Document classification tasks can be divided into two kinds: supervised document classification, where some external mechanism (such as human feedback) provides information on the correct classification of documents, and unsupervised document classification, where the classification must be done entirely without reference to external information. There is also semi-supervised document classification, in which part of the documents are labeled by an external mechanism (Wikipedia, 2011).

Labor-intensive manual text mining approaches first appeared in the mid-1980s; however, technological advances have allowed the field to flourish over the last decade. Text mining is an interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. Because most information (common estimates say more than 80%) is currently stored as text, text mining is believed to have high commercial potential (Clarabridge, 2011).
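As a rough illustration of supervised document classification, the sketch below trains a small multinomial naive Bayes classifier on documents that an external mechanism (here, a hand-written list) has already labeled. Naive Bayes is just one common choice and is not prescribed by the sources cited above; the training texts, the category names "spam" and "ham", and all function names are assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def train(labeled_docs):
    """Collect per-category document and word counts from (text, category) pairs."""
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for token in tokenize(text):
            word_counts[category][token] += 1
            vocabulary.add(token)
    return doc_counts, word_counts, vocabulary

def classify(text, model):
    """Return the category with the highest log-probability for the text."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in sorted(doc_counts):
        # Log prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[category][token] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category

# Hypothetical labeled training data (the "external mechanism").
training = [
    ("cheap pills buy now", "spam"),
    ("win money now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]
model = train(training)
print(classify("buy cheap money", model))
print(classify("monday project meeting", model))
```

In the unsupervised setting no such labels exist, so a clustering method would be used instead; in the semi-supervised setting only part of the training list would carry a category.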

Currently, text mining has gained attention in many areas:
1. Security applications.
Many text mining software packages are marketed for security applications, particularly the analysis of plain-text sources such as Internet news. The field also includes the study of text encryption.

2. Biomedical applications.
A range of text mining applications in the biomedical literature has been described. One example is PubGene, which combines biomedical text mining with network visualization as an Internet service. Another example of biomedical text mining is GoPubMed.org. Semantic similarity has also been used by text mining systems, e.g., GOAnnotator.

3. Software and Applications
Research and development departments of large companies, including IBM and Microsoft, are researching text mining techniques and developing programs to further automate the mining and analysis process. Text mining software is also being researched by various companies working in the field of search and indexing in general, as a way to improve their results.

4. Online Media Applications
Text mining is being used by large media companies, such as the Tribune Company, to disambiguate information and to provide readers with a better search experience, which in turn increases site loyalty and revenue. In addition, editors benefit from being able to share, associate, and package news across properties, which significantly increases opportunities to monetize content.

5. Marketing Applications
Text mining is also beginning to be used in marketing, more specifically in analytical customer relationship management. Coussement and Van den Poel (2008) apply it to improve predictive models for customer churn (customer attrition).

6. Sentiment Analysis
Sentiment analysis may involve analyzing movie reviews to estimate how favorable a review is toward a movie. This kind of analysis may require a labeled data set or labels for the affectivity of words. A resource for the affectivity of words has been created for WordNet.

7. Academic Applications
Text mining is important for publishers who hold large databases of information that must be indexed for retrieval. This is especially true in the sciences, where highly specific information is often contained within written text. Therefore, initiatives have been taken, such as Nature's proposal for an Open Text Mining Interface (OTMI) and the National Institutes of Health's common Journal Publishing Document Type Definition (DTD), which would provide semantic cues to machines to answer specific queries contained within the text without removing publisher barriers to public access.
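The sentiment analysis application in item 6 relies on labels for the affectivity of words. A minimal Python sketch of that idea follows; the tiny lexicon and its scores are invented for illustration and are not taken from WordNet or any real affect resource, and averaging word scores is only one simple way to combine them.

```python
import re

# Hypothetical affect lexicon: word -> polarity score in [-1, 1].
AFFECT_LEXICON = {
    "great": 1.0,
    "good": 0.5,
    "enjoyable": 0.5,
    "boring": -0.5,
    "bad": -0.5,
    "terrible": -1.0,
}

def review_score(review):
    """Average the polarity of all lexicon words found in the review.

    Returns 0.0 (neutral) when the review contains no lexicon words.
    """
    tokens = re.findall(r"[a-z]+", review.lower())
    scores = [AFFECT_LEXICON[t] for t in tokens if t in AFFECT_LEXICON]
    return sum(scores) / len(scores) if scores else 0.0

# Made-up reviews, for illustration only.
for review in ("A great and enjoyable movie", "Terrible plot, boring acting"):
    print(review, "->", review_score(review))
```

A lexicon approach like this ignores negation and context ("not good" still scores as positive), which is one reason a labeled data set is often used instead.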

Previously, most websites used text-based search, which only finds documents that contain the specific words or phrases given by the user. Now, through the use of the semantic web, text mining can find content based on meaning and context (rather than just on a particular word). Text mining is also used in some email spam filters as a way to determine the characteristics of messages that are likely to be advertisements or other undesirable material.
