Data Mining In The Applied World
Digital information is now relatively easy to capture and fairly inexpensive to store. The digital revolution has seen collections of data grow in size and the data within them grow in complexity. A question that commonly arises as a result is: having gathered such quantities of data, what do we actually do with it? Large collections of data, however well structured, often conceal implicit patterns of information that conventional analysis techniques cannot readily detect. Such information can often be usefully analyzed with a set of techniques referred to as knowledge discovery or data mining. These techniques seek to build a better understanding of the data and, by building characterizations of the data that can serve as a basis for further analysis, to extract value from volume.
In this paper we present data warehousing and data mining concepts, the goals behind data mining, and its applications in the real world.
1. Introduction
We live in the Age of Information. The importance of collecting data that reflect business or scientific activities in order to achieve competitive advantage is now widely recognized. Powerful systems for collecting data and managing it in large databases are in place in all large and mid-range organizations. The value of raw data collected over a long time lies in the ability to extract high-level information from it: information useful for decision support, for exploration, and for a better understanding of the phenomena generating the data. Traditionally, this extraction was done manually: one or more analysts used statistical techniques to provide summaries and generate reports. Such an approach fails as the volume and dimensionality of the data increase. Who could expect to understand millions of cases, each having hundreds of fields? To complicate matters, the data expand and change at rates that can easily defy human analysis. Tools that automate analysis tasks are therefore becoming a necessity. Data mining, the automatic extraction of patterns of information from data, evolved to meet this need.
An additional benefit of automated data mining systems is that the process costs much less than hiring an army of highly trained professional statisticians (analysts). While data mining does not eliminate human participation entirely, it significantly simplifies the job and allows an analyst to manage the process of extracting knowledge from data. Many organizations now view information as one of their most valuable assets, and data mining allows a company to make full use of these information assets. Two critical factors for success with data mining are a large, well-integrated data warehouse and a well-defined understanding of the business process within which data mining is to be applied (such as customer prospecting, retention, campaign management, and so on).
2. Data Warehousing
Before discussing the applications of data mining, let us first look at data warehousing.
Data warehousing deals with the problem of gaining unified access to data from multiple and potentially incompatible information systems. A data warehouse is a central repository for all or significant parts of the data that an enterprise's various business systems collect. It is defined as a collection of data in support of management decision-making that is:
(i) Subject-oriented and integrated.
(ii) Time-variant and nonvolatile.
Data from various online transaction processing applications and other sources is selectively extracted and organized in the data warehouse database for use by analytical applications and user queries. Data warehousing emphasizes the capture of data from diverse sources for useful analysis and access.
2.1 Data Mining
As the term connotes, data mining refers to the mining or discovery of new information in terms of patterns or rules from vast amounts of data. Data mining helps in achieving the following goals or tasks:
1. Prediction: Data mining can show how certain attributes within the data will behave in the future. Examples of predictive data mining in a business context include analyzing buying transactions to predict what consumers will buy under certain discounts and how much sales volume a store will generate in a given period. In a scientific context, certain seismic wave patterns may predict an earthquake with high probability.
2. Identification: Data patterns can be used to identify the existence of an item, an event, or an activity. For example, in biological applications, the existence of a gene may be identified by certain sequences of nucleotide symbols in the DNA sequence. Identification also covers authentication, where it is ascertained whether a user is indeed a specific user or one from an authorized class; this involves comparing parameters, images, or signals against a database.
3. Classification: Data mining can partition the data so that different classes or categories can be identified based on combinations of parameters. For example, customers in a supermarket can be categorized into discount-seeking shoppers, shoppers in a rush, loyal regular shoppers, and infrequent shoppers. This classification may be used in different analyses of customer buying transactions as a post-mining activity.
4. Optimization: One eventual goal of data mining activity is to optimize the use of limited resources such as time, space, money, or materials and to maximize output variables such as sales or profits under a given set of constraints.
These goals are realized with the help of different approaches, such as discovery of sequential patterns, discovery of patterns in time series, discovery of classification rules, regression, neural networks, genetic algorithms, clustering, and segmentation.
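To make the prediction goal concrete, the following is a minimal sketch of regression, one of the approaches listed above, applied to the discount example. It fits a straight line by ordinary least squares and uses it to estimate sales at an unseen discount level. The function names `fit_line` and `predict`, and all of the numbers, are hypothetical and chosen only for illustration; a real study would use the store's own transaction history and would likely need more than one explanatory variable.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Estimate y for a new x using the fitted line."""
    return slope * x + intercept


# Hypothetical history: discount offered (%) and units sold in that period.
discounts = [0, 5, 10, 15, 20]
units_sold = [100, 120, 140, 160, 180]

slope, intercept = fit_line(discounts, units_sold)
forecast = predict(slope, intercept, 25)  # estimated units at a 25% discount
print(slope, intercept, forecast)
```

The same pattern (fit on historical data, then apply the fitted model to new cases) underlies the more elaborate approaches such as neural networks; only the form of the model changes.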
3. Data Mining in the Real World
Although data mining is still in its infancy, organizations working in a wide range of environments - including retail, finance, health care, manufacturing, transportation, education, natural resource planning and aerospace - are already using data mining tools and techniques to take advantage of historical data. By using pattern recognition technologies and statistical and mathematical techniques to sift through warehoused information, data mining helps analysts recognize significant facts, relationships, trends, patterns, exceptions and anomalies that might otherwise go unnoticed. We now cite some data mining applications in operation in various fields:
3.1 Business Management
For businesses, data mining is used to discover patterns and relationships in the data in order to help make better business decisions. Data mining can help spot sales trends, develop smarter marketing campaigns, and accurately predict customer loyalty. Specific uses of data mining include:
• Market segmentation - Identify the common characteristics of customers who buy the same products from your company.
• Customer churn - Predict which customers are likely to leave your company and go to a competitor.
• Fraud detection - Identify which transactions are most likely to be fraudulent.
• Direct marketing - Identify which prospects should be included in a mailing list to obtain the highest response rate.
• Interactive marketing - Predict what each individual accessing a Web site is most likely interested in seeing.
• Market basket analysis - Understand what products or services are commonly purchased together.
• Trend analysis - Reveal the difference between a typical customer this month and last.
The above uses are elaborated further in the following cases:
3.1.1 Telecommunication Company
Details about who calls whom, how long they are on the phone, and whether a line is used for fax as well as voice can be invaluable in targeting sales of services and equipment to specific customers. But these tidbits are buried in masses of numbers in the database. By delving into the extensive customer-call database it maintains to manage its communications network, a regional telephone company identified new types of unmet customer needs. Using its data mining system, it discovered how to pinpoint prospects for additional services by measuring daily household usage for selected periods. For example, households that make many lengthy calls between 3 p.m. and 6 p.m. are likely to include teenagers who are prime candidates for their own phones and lines. When the company used target marketing that emphasized convenience and value for adults - "Is the phone always tied up?" - hidden demand surfaced. Extensive telephone use between 9 a.m. and 5 p.m., characterized by patterns related to voice, fax, and modem usage, suggests that a customer has business activity. Target marketing offering those customers "business communications capabilities for small budgets" resulted in sales of additional lines, functions, and equipment.
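Once such usage patterns have been discovered, applying them to the call database is straightforward. The sketch below tags households using the two patterns described above. It is only an illustration of the idea, not the telephone company's actual system: the record layout, the function name `segment_households`, the tag names, and every threshold are assumptions, since the case description gives no cutoffs.

```python
from collections import defaultdict

# Hypothetical cutoffs chosen for illustration; the case gives no thresholds.
LONG_CALL_MINUTES = 20          # a call this long counts as "lengthy"
AFTERNOON_CALLS_PER_DAY = 2     # lengthy 3-6 p.m. calls per day
DAYTIME_MINUTES_PER_DAY = 120   # 9 a.m.-5 p.m. usage per day


def segment_households(calls, days):
    """Tag households from (household, start_hour, minutes, kind) records.

    kind is assumed to be one of "voice", "fax" or "modem".
    """
    afternoon_long = defaultdict(int)
    daytime_minutes = defaultdict(int)
    daytime_kinds = defaultdict(set)
    households = set()
    for household, hour, minutes, kind in calls:
        households.add(household)
        if 15 <= hour < 18 and minutes >= LONG_CALL_MINUTES:
            afternoon_long[household] += 1
        if 9 <= hour < 17:
            daytime_minutes[household] += minutes
            daytime_kinds[household].add(kind)

    segments = {}
    for household in sorted(households):
        tags = []
        # Many lengthy afternoon calls: candidate for an additional line.
        if afternoon_long[household] / days >= AFTERNOON_CALLS_PER_DAY:
            tags.append("extra-line prospect")
        # Heavy working-hours usage that includes fax or modem traffic.
        uses_data = bool(daytime_kinds[household] & {"fax", "modem"})
        if daytime_minutes[household] / days >= DAYTIME_MINUTES_PER_DAY and uses_data:
            tags.append("small-business prospect")
        segments[household] = tags
    return segments


# Invented call records covering two days.
calls = [
    ("H1", 15, 45, "voice"), ("H1", 16, 30, "voice"),
    ("H1", 17, 25, "voice"), ("H1", 15, 40, "voice"),
    ("H2", 9, 60, "voice"), ("H2", 10, 90, "fax"), ("H2", 13, 120, "modem"),
    ("H3", 20, 10, "voice"),
]
segments = segment_households(calls, days=2)
print(segments)
```

In practice the thresholds themselves would be an output of the mining step (for example, learned from households that later bought an extra line) rather than fixed by hand as they are here.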
3.1.2 Credit Card Sale
A bank searching for new ways to increase revenues from its credit card operations tested a non-intuitive possibility: Would credit card usage and interest earned increase significantly if the bank halved its minimum required payment? With hundreds of gigabytes of data representing two years of average credit card balances, payment amounts, payment timeliness, credit limit usage, and other key parameters, the bank used a powerful data mining system to model the impact of the proposed policy change on specific customer categories, such as customers consistently near or at their credit limits who make timely minimum or small payments. The bank discovered that cutting minimum payment requirements for small, targeted customer categories could increase average balances and extend indebtedness periods, generating more than $25 million in additional interest earned.
3.1.3 Pharmaceutical Company
A pharmaceutical company analyzed its recent sales force activity and the results of that activity to improve targeting of high-value physicians and to determine which marketing activities would have the greatest impact in the next few months. The data included competitor market activity as well as information about the local health care systems. The results were distributed to the sales force via a wide-area network that enabled the representatives to review the recommendations from the perspective of the key attributes in the decision process. The reviews of the sales force, along with the results, were sent back to top management for final decisions. The ongoing, dynamic analysis of the data warehouse allows best practices from throughout the organization to be applied in specific sales situations.
3.1.4 Shelf spacing in supermarkets
A supermarket decided to allot shelf space to products and place them according to the requirements of the customers. For this, they performed a market basket analysis (a data mining technique) and found there was a correlation between baby diapers and beer sold at that establishment. The company used this completely non-intuitive information to rearrange its shelves and place the beer and diapers within close proximity of each other and wound up with a healthy increase in sales. The point is that these kinds of relationships are often obscure and not intuitively obvious for a human to even think of exploring.
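The core of a market basket analysis can be sketched in a few lines. The example below counts how often pairs of items occur together and reports three standard measures for each candidate rule: support (the fraction of baskets containing both items), confidence (how often the second item appears given the first), and lift (confidence divided by the second item's overall frequency; values above 1 suggest a positive association). The function name `pair_rules`, the support threshold, and the baskets are all invented for illustration and are not the supermarket's data.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.4):
    """Return (antecedent, consequent, support, confidence, lift) for item pairs."""
    n = len(transactions)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in transactions:
        items = sorted(set(basket))          # ignore duplicates within a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue                          # pair is too rare to act on
        for x, y in ((a, b), (b, a)):         # evaluate the rule in both directions
            confidence = count / item_counts[x]
            lift = confidence / (item_counts[y] / n)
            rules.append((x, y, support, confidence, lift))
    # Strongest associations first.
    return sorted(rules, key=lambda rule: rule[4], reverse=True)


# Hypothetical baskets, invented for illustration only.
baskets = [
    ["diapers", "beer", "chips"],
    ["diapers", "beer"],
    ["diapers", "milk"],
    ["beer", "chips"],
    ["milk", "bread"],
    ["diapers", "beer", "milk"],
]
rules = pair_rules(baskets)
for antecedent, consequent, support, confidence, lift in rules:
    print(f"{antecedent} -> {consequent}: support={support:.2f}, "
          f"confidence={confidence:.2f}, lift={lift:.2f}")
```

This sketch only examines pairs. Production systems typically use algorithms such as Apriori or FP-growth to extend the search to larger item sets without enumerating every combination.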
3.2 Other Areas of Application
Though data mining is most visible in the business world, it also finds application in other areas, where it facilitates decision-making, resource optimization, cost-effectiveness and classification. We discuss some cases to support this:
3.2.1 Expert GIS for water resource planning:
The Texas Water Development Board is a state agency responsible for long-term water supply planning. One of its major tasks is to assure water resources for a wide region through good planning and sound water management. The manual planning process was tedious and difficult, and suffered from a number of limitations. The planning system was therefore automated, and it comprises:
1. An expert rule system.
2. A geographic information system (GIS).
3. A network flow solver.
The rule-based system contains expertise acquired from water resources planning experts. The GIS stores and analyzes spatially distributed water supply and demand data. The network flow solver balances the flows in networks developed by the expert GIS with input from various water analysts; its objective is to find the least costly allocation. In case of a deficit, it can also suggest alternative supplies that are efficient and cost-effective.
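The least-cost allocation task of the network flow solver can be illustrated with a small minimum-cost flow computation. The sketch below uses the textbook successive-shortest-paths method with Bellman-Ford; it is not the Board's solver, whose algorithm is not described here. The network (two reservoirs, two cities), the capacities, and the unit costs are all hypothetical.

```python
def min_cost_flow(n, edges, source, sink):
    """Successive shortest paths; edges are (u, v, capacity, unit_cost).

    Returns (total_flow, total_cost, flow_on_each_input_edge).
    """
    graph = [[] for _ in range(n)]            # node -> indices of outgoing arcs
    to, cap, cost = [], [], []
    for u, v, c, w in edges:
        graph[u].append(len(to)); to.append(v); cap.append(c); cost.append(w)
        graph[v].append(len(to)); to.append(u); cap.append(0); cost.append(-w)

    total_flow = total_cost = 0
    while True:
        # Bellman-Ford copes with the negative costs on residual arcs.
        dist = [float("inf")] * n
        prev_arc = [-1] * n
        dist[source] = 0
        for _ in range(n - 1):
            updated = False
            for u in range(n):
                if dist[u] == float("inf"):
                    continue
                for e in graph[u]:
                    if cap[e] > 0 and dist[u] + cost[e] < dist[to[e]]:
                        dist[to[e]] = dist[u] + cost[e]
                        prev_arc[to[e]] = e
                        updated = True
            if not updated:
                break
        if dist[sink] == float("inf"):
            break                             # no augmenting path remains

        # Find the bottleneck on the cheapest path, then push flow along it.
        push, v = float("inf"), sink
        while v != source:
            e = prev_arc[v]
            push = min(push, cap[e])
            v = to[e ^ 1]                     # arc e^1 is the reverse of arc e
        v = sink
        while v != source:
            e = prev_arc[v]
            cap[e] -= push
            cap[e ^ 1] += push
            v = to[e ^ 1]
        total_flow += push
        total_cost += push * dist[sink]

    flows = [cap[2 * i + 1] for i in range(len(edges))]
    return total_flow, total_cost, flows


# Hypothetical network: 0 = source, 1-2 = reservoirs, 3-4 = cities, 5 = sink.
edges = [
    (0, 1, 50, 0), (0, 2, 40, 0),             # available supply per reservoir
    (1, 3, 50, 2), (1, 4, 50, 5),             # conveyance cost per unit of water
    (2, 3, 40, 4), (2, 4, 40, 1),
    (3, 5, 60, 0), (4, 5, 30, 0),             # demand per city
]
total_flow, total_cost, flows = min_cost_flow(6, edges, source=0, sink=5)
total_demand = 60 + 30
deficit = total_demand - total_flow           # a positive value signals a shortfall
print(total_flow, total_cost, flows, deficit)
```

If the delivered flow falls short of total demand, the difference is the deficit that the planning system would then try to cover from alternative supplies, for example by adding candidate source arcs and solving again.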
3.2.2 Intelligent search agents on the Internet
The Internet mainly uses data mining in the form of intelligent search agents. One such search agent, the 'Purple Yogi', empowers networks to understand both content and user needs, enabling the next generation of content management and enterprise knowledge management solutions. A Yogi Discovery System understands the content in the network, discovers the users' interests and empowers the network to connect the right content to the right users. By driving this awareness into the network, a Yogi Discovery System greatly reduces the time and effort users and content providers expend searching for each other. Users benefit from having relevant information made effortlessly available to them, information they might not even know existed. Content providers benefit from reaching exactly the right set of users interested in their content.
3.2.3 Health Care
Merck-Medco Managed Care is a mail-order business which sells drugs to the country's largest health care providers: Blue Cross and Blue Shield state organizations, large HMOs, U.S. corporations, state governments, etc. Merck-Medco is mining its one-terabyte data warehouse to uncover hidden links between illnesses and known drug treatments, and to spot trends that help pinpoint which drugs are the most effective for which types of patients. The results are more effective treatments that are also less costly. Merck-Medco's data mining project has helped customers save an average of 10-15% on prescription costs.
3.2.4 Education
The education domain offers many interesting and challenging applications for data mining. First, an educational institution often has many diverse and varied sources of information. There are the traditional databases (e.g. students' information, teachers' information, class and schedule information, alumni information), online information (online web pages and course content pages) and, more recently, multimedia databases. Second, there are many diverse interest groups in the educational domain that give rise to many interesting mining requirements. For example, the administrators may wish to find out information such as admission requirements and to predict the class enrollment size for timetabling. The students may wish to know how best to select courses based on a prediction of how well they will perform in the courses selected. The alumni office may need to know how best to perform target mailing so as to reach most effectively those alumni who are likely to respond. All these applications not only contribute towards the educational institution delivering a better quality education experience, but also aid the institution in running its administrative tasks. With so much information and so many diverse needs, it is foreseeable that an integrated data mining system that can cater for the special needs of an educational institution will be in great demand, particularly in the 21st century.
4. Conclusion
Data mining challenges the long-standing view that computers and the Internet bring information but not knowledge. In the new millennium, competitive enterprises will be mining their data with sophisticated data mining tools to find and attract the best customers, to improve and enhance their product offerings, to maximize operating efficiency, to cut costs, and to improve customer satisfaction. With time and resources in short supply, data mining software will help enterprises make the most of their resources to remain competitive.
In the short term, the results of data mining will be in profitable, if mundane, business-related areas. Micro-marketing campaigns will explore new niches. Advertising will target potential customers with new precision.
In the medium term, data mining may be as common and easy to use as e-mail. We may use these tools to find the best airfare to New York, root out a phone number of a long-lost classmate, or find the best prices on lawn mowers.
The long-term prospects are truly exciting. Imagine intelligent agents turned loose on medical research data or on sub-atomic particle data. Computers may reveal new treatments for diseases or new insights into the nature of the universe.
Thus we see that, with the advancement and deployment of sophisticated data mining tools, computers can think, bringing knowledge to our desktops.