
Seminar Report On

ASSOCIATION MINING

Submitted by: BENCY CLEETUS

DEPARTMENT OF COMPUTER SCIENCE

COCHIN UNIVERSITY OF SCIENCE AND TECHNOLOGY

KOCHI-682022

ABSTRACT

Association rules are one of the most researched areas of data mining and have recently received much attention from the database community. They have proven to be quite useful in the marketing and retail communities as well as in other, more diverse fields. The task of association mining is to discover a set of attributes shared among a large number of objects in a given database. Potential application areas for association rule technology include catalog design, store layout, customer segmentation, and telecommunication alarm diagnosis. One of the important problems in data mining is discovering association rules from databases of transactions, where each transaction consists of a set of items. The most time-consuming operation in this discovery process is computing how frequently interesting subsets of items occur in the database of transactions.

1. INTRODUCTION

Data mining is the discovery of hidden information in databases and can be viewed as a step in the knowledge discovery process. Data mining functions include clustering, classification, prediction, and link analysis (associations). One of the most important data mining applications is mining association rules. Association rules are used to identify relationships among a set of items in a database. These relationships are not based on inherent properties of the data themselves (as with functional dependencies), but rather on the co-occurrence of the data items. Association rule mining, the task of finding correlations between items in a dataset, is one of the most important and well-researched techniques of data mining; it was first introduced by Agrawal, Imielinski, and Swami in "Mining association rules between sets of items in large databases" (1993). Initial research was largely motivated by the analysis of market basket data, the results of which allowed companies to understand purchasing behavior more fully and, as a result, to target market audiences better. For example, consider the sales database of a bookstore, where the objects represent customers and the attributes represent books. The discovered patterns are the sets of books most frequently bought together by customers, and the store can use this knowledge for promotions, shelf placement, and so on. More generally, association mining aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as telecommunication networks, market and risk management, and inventory control.
One of the reasons for maintaining any database is to enable the user to find interesting patterns and trends in the data. For example, an insurance company that finds a strong correlation between two policies A and B, of the form A => B (customers who held policy A were also likely to hold policy B), could target the marketing of policy B more efficiently by marketing it to clients who held policy A but not B.

In effect, such a rule represents knowledge about purchasing behavior. Association mining has since been applied to many different domains, including market basket and risk analysis in commercial environments, epidemiology, clinical medicine, fluid dynamics, astrophysics, crime prevention, and counter-terrorism, all areas in which the relationship between objects can provide useful knowledge. Association mining is user-centric, as the objective is the elicitation of useful rules from which new knowledge can be derived. Association mining analysis is a two-part process: first, the identification of sets of items, or itemsets, within the dataset; second, the subsequent derivation of inferences from these itemsets.

WHAT IS DATA MINING?

Data mining is often defined as finding hidden information in a database. It has also been called exploratory data analysis, data-driven discovery, and deductive learning. Data mining access to a database differs from traditional access in several ways: the query might not be well formed, the data accessed is usually a different version from that of the original database, and the output of a data mining query is probably not a subset of the database. Data mining algorithms can be characterized according to model, preference, and search. The model is the structure that the algorithm fits to the data; the preference is the criterion used to favor one model over another; and all algorithms require some search technique to examine the data.

The model can be either predictive or descriptive in nature. A predictive model makes a prediction about values of data using known results found from different data, and predictive modeling may be based on the use of other historical data. For example, a credit card use might be refused not because of the user's own credit history, but because of historical data about other, similar purchases. Predictive data mining tasks include classification, regression, time series analysis, and prediction. A descriptive model identifies patterns or relationships in data; it serves as a way to explore the properties of the data examined. Clustering, summarization, association rules, and sequence discovery are usually descriptive in nature.

2. OVERVIEW OF ASSOCIATION RULES

The goal of association rule mining is to find the association rules in a given database that satisfy a predefined minimum support and minimum confidence. The problem is usually decomposed into two subproblems. The first is to find the itemsets whose occurrences exceed a predefined threshold in the database; these itemsets are called frequent or large itemsets. The second is to generate association rules from those large itemsets subject to the minimum confidence constraint.

Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of $m$ distinct attributes, let $T$ be a transaction that contains a set of items such that $T \subseteq I$, and let $D$ be a database of transaction records $T$. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ are sets of items called itemsets and $X \cap Y = \emptyset$. $X$ is called the antecedent and $Y$ the consequent; the rule means that $X$ implies $Y$. There are two important basic measures for association rules: support and confidence. Since the database is large and users are concerned only with frequently purchased items, thresholds for support and confidence are usually predefined by users to drop rules that are not interesting or useful. The two thresholds are called minimal support and minimal confidence, respectively. The support of an association rule is the percentage (or fraction) of records that contain $X \cup Y$ relative to the total number of records in the database. For example, if the support of an item is 0.1%, only 0.1% of the transactions contain that item.

Figure: Data mining models and tasks (Predictive: classification, regression, time series analysis, prediction; Descriptive: clustering, association rules, summarization, sequence discovery).

The confidence of an association rule is the percentage (or fraction) of the transactions that contain $X \cup Y$ relative to the number of transactions that contain $X$. Confidence is a measure of the strength of an association rule: if the confidence of the rule $X \Rightarrow Y$ is 80%, then 80% of the transactions that contain $X$ also contain $Y$.
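The two measures can be computed directly from a list of transactions. The following is a minimal sketch, assuming each transaction is a Python frozenset of item names; the bread/butter/milk/jam transactions are hypothetical. Confidence is computed from raw counts so that it equals count(X ∪ Y) / count(X).

```python
def count(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)

def support(itemset, db):
    """Fraction of all transactions that contain the itemset (X ∪ Y for a rule)."""
    return count(itemset, db) / len(db)

def confidence(x, y, db):
    """Fraction of the transactions containing X that also contain Y."""
    return count(x | y, db) / count(x, db)

# Hypothetical transaction database.
db = [frozenset(t) for t in (
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
)]

print(support({"bread", "butter"}, db))       # 0.6
print(confidence({"bread"}, {"butter"}, db))  # 0.75
```

Here bread => butter has support 3/5 and confidence 3/4: three of the five transactions contain both items, and three of the four transactions with bread also contain butter.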

Constraint-based association mining is a process of weeding out uninteresting rules using constraints provided by the user. Typical constraints include:

• Knowledge type constraints: specify what is to be mined, e.g., association rules, classification, etc.

• Data constraints: specify the set of task-relevant data, e.g., "find product pairs sold together in the North region in Q3 '02".

• Dimension/level constraints: specify the dimensions of the data or the levels of concept hierarchies to be used.

• Rule constraints: specify the form of rules, e.g., metarules, or the maximum/minimum number of predicates.

2.1 Association Rule Problem

A formal statement of the association rule problem is:

Definition 1: Let $I = \{I_1, I_2, \ldots, I_m\}$ be a set of $m$ distinct attributes, also called literals. Let $D$ be a database, where each record (tuple) $T$ has a unique identifier and contains a set of items such that $T \subseteq I$. An association rule is an implication of the form $X \Rightarrow Y$, where $X, Y \subset I$ are sets of items called itemsets, and $X \cap Y = \emptyset$. Here, $X$ is called the antecedent and $Y$ the consequent.

Two important measures for association rules, support ($s$) and confidence ($\alpha$), can be defined as follows.

Definition 2: The support ($s$) of an association rule is the ratio (in percent) of the records that contain $X \cup Y$ to the total number of records in the database. Therefore, if the support of a rule is 5%, then 5% of the total records contain $X \cup Y$. Support measures the statistical significance of an association rule. Grocery store managers probably would not be concerned about how peanut butter and bread are related if less than 5% of store transactions contain this combination of purchases. While high support is often desirable for association rules, this is not always the case. For example, if association rules were used to predict the failure of telecommunications switching nodes based on the set of events that occur prior to failure, rules showing this relationship would still be important even if these events do not occur very frequently.

Definition 3: For a given number of records, the confidence ($\alpha$) is the ratio (in percent) of the number of records that contain $X \cup Y$ to the number of records that contain $X$. Thus, if a rule has a confidence of 85%, then 85% of the records containing $X$ also contain $Y$. The confidence of a rule indicates the degree of correlation in the dataset between $X$ and $Y$; it is a measure of a rule's strength. Often a high confidence is required for association rules: if a set of events occurs only a small percentage of the time before a switch failure, or if a product is purchased only very rarely with peanut butter, these relationships may not be of much use for management.

Mining association rules from a database consists of finding all rules that meet the user-specified threshold support and confidence. The problem can be decomposed into two subproblems:

• Generate all itemsets whose support exceeds the threshold. These sets of items are called large (frequent) itemsets; note that "large" here means large support.

• For each large itemset, generate all the rules that have at least the minimum confidence, as follows: for every nonempty proper subset $a$ of a large itemset $l$, output the rule $a \Rightarrow (l - a)$ if $\mathrm{support}(l) / \mathrm{support}(a)$ is at least the minimum confidence.
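The rule-generation step can be sketched as follows. This is an illustrative implementation, not a specific published one; it assumes support counts are already available in a dictionary keyed by frozenset. The counts used here (out of five transactions, for items A, C and D) match the supports in the Apriori example of Section 3.

```python
from itertools import combinations

def rules_from_itemset(l, supp, min_conf):
    """Yield (antecedent, consequent, confidence) for rules a => l - a.

    `supp` maps frozensets to support counts. By downward closure every
    subset of a frequent itemset is frequent, so every lookup succeeds.
    """
    l = frozenset(l)
    for r in range(1, len(l)):                 # non-empty proper subsets only
        for combo in combinations(sorted(l), r):
            a = frozenset(combo)
            conf = supp[l] / supp[a]           # support(l) / support(a)
            if conf >= min_conf:
                yield a, l - a, conf

supp = {
    frozenset("A"): 5, frozenset("C"): 5, frozenset("D"): 4,
    frozenset("AC"): 5, frozenset("AD"): 4, frozenset("CD"): 4,
    frozenset("ACD"): 4,
}

for a, c, conf in rules_from_itemset("ACD", supp, min_conf=0.9):
    print(sorted(a), "=>", sorted(c), conf)
# ['D'] => ['A', 'C'] 1.0
# ['A', 'D'] => ['C'] 1.0
# ['C', 'D'] => ['A'] 1.0
```

Lowering min_conf to 0.8 would also admit the three rules with confidence 4/5, such as A => CD.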

2.2 Association Rule Generation

The number of association rules that can be generated from $d$ items is $3^d - 2^{d+1} + 1$. For example, 6 items yield $3^6 - 2^7 + 1 = 729 - 128 + 1 = 602$ rules, so generating all association rules for large $d$ is intractable. Frequent itemset mining came from efforts to discover useful patterns in customers' transaction databases. A customer transaction database is a sequence of transactions $T = \{t_1, t_2, \ldots, t_n\}$, where each transaction is an itemset ($t_i \subseteq I$).
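The rule-count formula can be checked by brute force. In this sketch, each of the $d$ items is assigned to the antecedent X, to the consequent Y, or to neither, and only assignments with both X and Y non-empty are counted. Inclusion-exclusion gives the same result: $3^d$ assignments, minus $2^d$ with X empty, minus $2^d$ with Y empty, plus the one assignment where both are empty.

```python
from itertools import product

def count_rules_brute_force(d: int) -> int:
    """Count rules X => Y with X, Y non-empty and disjoint over d items."""
    total = 0
    # 0: item unused, 1: item in X (antecedent), 2: item in Y (consequent)
    for assignment in product((0, 1, 2), repeat=d):
        if 1 in assignment and 2 in assignment:
            total += 1
    return total

def count_rules_formula(d: int) -> int:
    return 3**d - 2**(d + 1) + 1

for d in range(1, 8):
    assert count_rules_brute_force(d) == count_rules_formula(d)
print(count_rules_formula(6))  # 602
```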

An itemset with $k$ elements is called a $k$-itemset. The support of an itemset $X$ in $T$, denoted support($X$), is here the number (rather than the percentage) of transactions that contain $X$:

$$\mathrm{support}(X) = |\{t_i \in T : X \subseteq t_i\}|.$$

An itemset is frequent if its support is greater than a support threshold, originally denoted by min_supp. The frequent itemset mining problem is to find all frequent itemsets in a given transaction database. Frequent itemset mining algorithms have been judged on three main tasks: mining all frequent itemsets, closed frequent itemsets, and maximal frequent itemsets. A frequent itemset is called closed if no superset has the same support (i.e., is contained in the same number of transactions). Closed itemsets capture all information about the frequent itemsets, because the support of any frequent itemset can be determined from them. A frequent itemset is called maximal if no superset of it is frequent. Maximal itemsets define the boundary between frequent and infrequent sets in the subset lattice. Any frequent itemset is also often called a free itemset, to distinguish it from closed and maximal ones.
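The three kinds of itemsets can be illustrated on a tiny database by exhaustive enumeration. This brute-force sketch is only practical for a handful of items and is not one of the evaluated mining algorithms; it treats an itemset as frequent when its count is at least min_count, and the four transactions are hypothetical.

```python
from itertools import combinations

def all_frequent(db, min_count):
    """Brute force: support count of every itemset contained in >= min_count transactions."""
    items = sorted(set().union(*db))
    freq = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = frozenset(combo)
            c = sum(1 for t in db if s <= t)
            if c >= min_count:
                freq[s] = c
    return freq

def closed_and_maximal(freq):
    """Closed: no superset with equal support. Maximal: no frequent superset."""
    closed, maximal = set(), set()
    for s, c in freq.items():
        supersets = [u for u in freq if s < u]
        # A superset with the same count as a frequent set is itself frequent,
        # so checking frequent supersets is enough for closedness.
        if all(freq[u] != c for u in supersets):
            closed.add(s)
        if not supersets:
            maximal.add(s)
    return closed, maximal

db = [frozenset(t) for t in ("ABC", "AB", "AC", "B")]
closed, maximal = closed_and_maximal(all_frequent(db, min_count=2))
print(sorted("".join(sorted(s)) for s in closed))   # ['A', 'AB', 'AC', 'B']
print(sorted("".join(sorted(s)) for s in maximal))  # ['AB', 'AC']
```

{C} is frequent but not closed, because its superset {A, C} has the same count (2). The maximal sets {A, B} and {A, C} mark the frequent/infrequent boundary.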

2.3 Basic Association Rules

A minimal set of association rules is analogous to a minimal set of functional dependencies, and a set of inference rules based on restricted conditional probability distributions, playing a role similar to Armstrong's axioms, has been proposed. The GenBR algorithm generates Basic Association Rules, which are nonredundant single-consequent, or canonical, rules. Theoretical analysis shows that the search space of the algorithm can be translated to an n-cube graph. The set of classes of basic association rules generated by GenBR is easy for users to understand and manage.

3. APRIORI ITEMSET GENERATION

Apriori is a classic algorithm for learning association rules. It is designed to operate on databases containing transactions and uses a "bottom-up" approach, in which frequent subsets are extended one item at a time and groups of candidates are tested against the data. Apriori uses breadth-first search and a hash tree structure to count candidate itemsets efficiently. It generates candidate itemsets of length k from itemsets of length k - 1 and then prunes the candidates that have an infrequent subpattern. According to the downward closure lemma, the candidate set contains all frequent k-length itemsets. After that, it scans the transaction database to determine the frequent itemsets among the candidates. To determine frequent itemsets quickly, the algorithm stores the candidate itemsets in a hash tree, which has itemsets at the leaves and hash tables at the internal nodes. Note that this is not the same kind of hash tree used in, for instance, peer-to-peer systems.
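A hash tree of this kind can be sketched in Python. This is a simplified, hypothetical implementation, not the structure from any particular Apriori codebase: the branching factor (3), the leaf capacity (2), and the use of Python's built-in hash modulo the branching factor are all assumptions. An internal node at depth d hashes on the d-th item of a candidate, and a leaf is split once it overflows, unless all k items have already been hashed. When counting a transaction, each remaining item is hashed at each level, and because the same leaf can be reached along several paths, each leaf is checked at most once per transaction.

```python
from itertools import combinations

class HashTree:
    """Simplified hash tree for counting candidate k-itemsets (illustrative sketch)."""

    def __init__(self, k, branching=3, max_leaf=2):
        self.k = k                  # every stored candidate has exactly k items
        self.branching = branching  # buckets per internal hash table (assumed)
        self.max_leaf = max_leaf    # leaf capacity before a split (assumed)
        self.root = self._leaf(0)
        self.counts = {}

    @staticmethod
    def _leaf(depth):
        return {"leaf": True, "items": [], "children": {}, "depth": depth}

    def _child(self, node, item):
        """Return (creating if needed) the child for `item`'s hash bucket."""
        key = hash(item) % self.branching
        return node["children"].setdefault(key, self._leaf(node["depth"] + 1))

    def insert(self, candidate):
        cand = tuple(sorted(candidate))
        self.counts[cand] = 0
        node = self.root
        while not node["leaf"]:     # descend, hashing on the item at this depth
            node = self._child(node, cand[node["depth"]])
        node["items"].append(cand)
        self._maybe_split(node)

    def _maybe_split(self, node):
        # A leaf at depth k cannot be split: all k items are already hashed.
        if len(node["items"]) <= self.max_leaf or node["depth"] >= self.k:
            return
        moved, node["items"], node["leaf"] = node["items"], [], False
        for cand in moved:
            self._child(node, cand[node["depth"]])["items"].append(cand)
        for child in list(node["children"].values()):
            self._maybe_split(child)

    def add_transaction(self, transaction):
        t = tuple(sorted(transaction))
        self._visit(self.root, t, 0, set(t), set())

    def _visit(self, node, t, start, t_set, seen):
        if node["leaf"]:
            if id(node) in seen:    # already checked for this transaction
                return
            seen.add(id(node))
            for cand in node["items"]:
                if t_set.issuperset(cand):
                    self.counts[cand] += 1
            return
        for i in range(start, len(t)):  # hash each remaining item in sorted order
            child = node["children"].get(hash(t[i]) % self.branching)
            if child is not None:
                self._visit(child, t, i + 1, t_set, seen)

# Count all candidate 2-itemsets over five transactions.
db = ["ABC", "ABCDE", "ACD", "ACDE", "ABCD"]
tree = HashTree(k=2)
for cand in combinations("ABCDE", 2):
    tree.insert(cand)
for t in db:
    tree.add_transaction(t)
print(tree.counts[("A", "C")], tree.counts[("B", "E")])  # 5 1
```

The shape of the tree depends on the hash values, but the counts do not, because every leaf that is reached still tests full containment of each candidate.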

Apriori Algorithm:

Pass 1

1. Generate the candidate itemsets in C_1.

2. Save the frequent itemsets in L_1.

Pass k

1. Generate the candidate itemsets in C_k from the frequent itemsets in L_{k-1}:

   a. Join L_{k-1} p with L_{k-1} q, as follows:

        insert into C_k
        select p.item_1, p.item_2, ..., p.item_{k-1}, q.item_{k-1}
        from L_{k-1} p, L_{k-1} q
        where p.item_1 = q.item_1 and ... and p.item_{k-2} = q.item_{k-2}
              and p.item_{k-1} < q.item_{k-1}

   b. Generate all (k-1)-subsets of the candidate itemsets in C_k.

   c. Prune all candidate itemsets from C_k where some (k-1)-subset of the candidate itemset is not in the frequent itemsets L_{k-1}.

2. Scan the transaction database to determine the support of each candidate itemset in C_k.

3. Save the frequent itemsets in L_k.
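The passes above can be turned into a short, runnable sketch. This is one straightforward reading of the pseudocode, not a reference implementation: itemsets are sorted tuples, the join and prune steps live in a hypothetical helper apriori_gen, and support counting uses a plain loop over candidates instead of a hash tree. An itemset is kept when its support is at least min_support.

```python
from itertools import combinations

def apriori_gen(prev_frequent, k):
    """Candidate generation: join L_{k-1} with itself, then prune."""
    prev = sorted(prev_frequent)          # itemsets are sorted tuples
    prev_set = set(prev)
    candidates = []
    for i, p in enumerate(prev):
        for q in prev[i + 1:]:
            if p[:k - 2] != q[:k - 2]:
                break                     # sorted order: no later q shares p's prefix
            if p[k - 2] < q[k - 2]:       # join step: p.item_{k-1} < q.item_{k-1}
                cand = p + (q[k - 2],)
                # Prune step: every (k-1)-subset must be in L_{k-1}.
                if all(s in prev_set for s in combinations(cand, k - 1)):
                    candidates.append(cand)
    return candidates

def apriori(db, min_support):
    """Level-wise Apriori; returns {sorted itemset tuple: support fraction}."""
    transactions = [frozenset(t) for t in db]
    n = len(transactions)
    # Pass 1: the candidate 1-itemsets are all items occurring in the database.
    candidates = sorted({(item,) for t in transactions for item in t})
    result, k = {}, 1
    while candidates:
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:            # one database scan per pass
            for c in candidates:
                if t.issuperset(c):
                    counts[c] += 1
        frequent = [c for c in candidates if counts[c] / n >= min_support]
        result.update((c, counts[c] / n) for c in frequent)
        k += 1
        candidates = apriori_gen(frequent, k)
    return result

db = ["ABC", "ABCDE", "ACD", "ACDE", "ABCD"]  # one string of items per transaction
freq = apriori(db, min_support=0.4)
print([("".join(c), s) for c, s in freq.items() if len(c) == 4])
# [('ABCD', 0.4), ('ACDE', 0.4)]
```

The break relies on the lexicographic sort: all itemsets sharing p's first k - 2 items form a contiguous run right after p, so the join only has to scan that run.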

The candidate itemsets in C_2 are shown below:

Itemset X    supp(X)
{A,B}        25%
{A,C}        50%
{A,D}        25%
{B,C}        25%
{B,D}        50%
{C,D}        25%

The frequent itemsets in L_2 are shown below:

Itemset X    supp(X)
{A,C}        50%
{B,D}        50%

Assume the user-specified minimum support is 40%, and generate all frequent itemsets for the transaction database shown below:

TID   A  B  C  D  E
T1    1  1  1  0  0
T2    1  1  1  1  1
T3    1  0  1  1  0
T4    1  0  1  1  1
T5    1  1  1  1  0

Pass 1

C_1 contains every single item. Scanning the database gives:

Itemset X    supp(X)
A            100%
B            60%
C            100%
D            80%
E            40%

All five items meet the 40% minimum support, so L_1 = C_1.

Pass 2

C_2 contains all pairs of frequent items: {A,B}, {A,C}, {A,D}, {A,E}, {B,C}, {B,D}, {B,E}, {C,D}, {C,E}, {D,E}. Nothing is pruned, since all subsets of these itemsets are frequent. Scanning the database gives:

Itemset X    supp(X)
A,B          60%
A,C          100%
A,D          80%
A,E          40%
B,C          60%
B,D          40%
B,E          20%
C,D          80%
C,E          40%
D,E          40%

L_2, after saving only the frequent itemsets, contains every pair except {B,E}.

Pass 3

To create C_3, only join itemsets that have the same first item (in pass k, the first k - 2 items must match):

join AB with AC: A,B,C
join AB with AD: A,B,D
join AB with AE: A,B,E
join AC with AD: A,C,D
join AC with AE: A,C,E
join AD with AE: A,D,E
join BC with BD: B,C,D
join CD with CE: C,D,E

• Pruning eliminates ABE, since BE is not frequent.

• Scanning the transactions in the database gives L_3:

Itemset X    supp(X)
A,B,C        60%
A,B,D        40%
A,C,D        80%
A,C,E        40%
A,D,E        40%
B,C,D        40%
C,D,E        40%

Pass 4

The first k - 2 = 2 items must match in pass k = 4, so C_4 is:

combine ABC with ABD: A,B,C,D
combine ACD with ACE: A,C,D,E

• Pruning: for ABCD we check whether ABC, ABD, ACD, and BCD are frequent. They all are, so ABCD is not pruned.

• For ACDE we check whether ACD, ACE, ADE, and CDE are frequent. They all are, so ACDE is not pruned.

Scanning the database gives L_4:

Itemset X    supp(X)
A,B,C,D      40%
A,C,D,E      40%

Both are frequent.

Pass 5

No candidates can be formed in pass 5, because there are no two frequent 4-itemsets beginning with the same 3 items.

The Apriori algorithm assumes that the database is memory-resident. The maximum number of database scans is one more than the cardinality of the largest large itemset. Consider the itemset I = {a, b, c, d, e}. If an itemset is frequent, then all of its subsets must also be frequent: if {c, d, e} is frequent, then all of its subsets must also be frequent. Equivalently, if an itemset is infrequent, then all of its supersets are infrequent: if {a, b} is infrequent, then all of its supersets are infrequent. The converse does not hold, however: an itemset whose subsets are all frequent may itself be infrequent.
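As a cross-check on this example, a brute-force enumeration (not Apriori itself) can score all 31 non-empty subsets of {A, B, C, D, E} against the five transactions. The sketch below assumes each table row is encoded as the set of item letters marked 1. It also confirms the downward closure property and shows why the converse fails: B and E are each frequent, but {B, E} is not.

```python
from itertools import combinations

# Transactions T1..T5 of the example, as the set of items marked 1 in each row.
db = [set("ABC"), set("ABCDE"), set("ACD"), set("ACDE"), set("ABCD")]
items = "ABCDE"
min_support = 0.4

def supp(itemset):
    return sum(1 for t in db if set(itemset) <= t) / len(db)

frequent = {
    c: supp(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if supp(c) >= min_support
}

# Downward closure: every non-empty proper subset of a frequent itemset is frequent.
closure_holds = all(
    sub in frequent
    for c in frequent
    for k in range(1, len(c))
    for sub in combinations(c, k)
)

print(len(frequent), closure_holds, ("B",) in frequent, ("E",) in frequent,
      ("B", "E") in frequent)  # 23 True True True False
```

The 23 frequent itemsets are exactly L_1 through L_4 (5 + 9 + 7 + 2).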

4. ADVANCED ASSOCIATION RULE TECHNIQUES

4.1 Generalized Association Rules

This section introduces the problem of mining generalized association rules. Given a large database of transactions, where each transaction consists of a set of items, and a taxonomy (is-a hierarchy) on the items, the goal is to find associations between items at any level of the taxonomy. For example, given a taxonomy stating that a jacket is-a outerwear is-a clothes, we may infer a rule that people who buy outerwear tend to buy shoes. This rule may hold even if the rules that people who buy jackets tend to buy shoes, and that people who buy clothes tend to buy shoes, do not hold. An obvious solution to the problem is to add all ancestors of each item in a transaction to the transaction and then run any association rule mining algorithm on these extended transactions. However, this Basic algorithm is not very fast; two algorithms, Cumulate and EstMerge, have been presented that run 2 to 5 times faster than Basic (and more than 100 times faster on one real-life dataset). A new interest measure for rules that uses the information in the taxonomy has also been presented. Given a user-specified minimum interest level, this measure prunes a large number of redundant rules; 40% to 60% of all the rules were pruned on two real-life datasets.

Figure: Example of a taxonomy (Clothes: Outerwear, Shirts; Outerwear: Jackets, Ski Pants; Footwear: Shoes, Hiking Boots).

Earlier work on association rules did not consider the presence of taxonomies and restricted the items in association rules to the leaf-level items in the taxonomy. However, finding rules across different levels of the taxonomy is valuable because:

• Rules at lower levels may not have minimum support. Few people may buy Jackets with Hiking Boots, but many people may buy Outerwear with Hiking Boots. Thus many significant associations may not be discovered if rules are restricted to items at the leaves of the taxonomy. Since department stores or supermarkets typically have hundreds of thousands of items, the support for rules involving only leaf items (typically UPC or SKU codes) tends to be extremely small.

• Taxonomies can be used to prune uninteresting or redundant rules.

Multiple taxonomies may be present. For example, there could be one taxonomy for the price of items (cheap, expensive, etc.) and another for the category. Multiple taxonomies may be modeled as a single taxonomy that is a DAG (directed acyclic graph). A common application that uses multiple taxonomies is loss-leader analysis. In addition to the usual taxonomy, which classifies items into brands, categories, product groups, etc., there is a second taxonomy in which items that are on sale are considered children of an items-on-sale category, and users look for rules containing the items-on-sale item.
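The Basic approach of extending each transaction with all ancestors of its items can be sketched as follows. The child-to-parent dictionary encodes the example taxonomy (Jackets and Ski Pants under Outerwear, Outerwear and Shirts under Clothes, Shoes and Hiking Boots under Footwear); the four transactions are hypothetical. Mining then proceeds on the extended transactions with any ordinary algorithm. Only a single-parent tree is handled here, not a DAG of multiple taxonomies.

```python
# Example taxonomy encoded as child -> parent links (single-parent tree assumed).
PARENT = {
    "Jackets": "Outerwear",
    "Ski Pants": "Outerwear",
    "Outerwear": "Clothes",
    "Shirts": "Clothes",
    "Shoes": "Footwear",
    "Hiking Boots": "Footwear",
}

def ancestors(item):
    """All ancestors of `item` in the taxonomy, nearest first."""
    result = []
    while item in PARENT:
        item = PARENT[item]
        result.append(item)
    return result

def extend(transaction):
    """Basic algorithm: add every ancestor of every item to the transaction."""
    extended = set(transaction)
    for item in transaction:
        extended.update(ancestors(item))
    return extended

def support(itemset, transactions):
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

# Hypothetical transactions.
db = [{"Jackets", "Hiking Boots"}, {"Ski Pants", "Hiking Boots"},
      {"Shirts"}, {"Shoes", "Jackets"}]
ext = [extend(t) for t in db]

print(support({"Jackets", "Hiking Boots"}, ext))    # leaf-level itemset: 0.25
print(support({"Outerwear", "Hiking Boots"}, ext))  # generalized itemset: 0.5
```

As in the text, the generalized itemset {Outerwear, Hiking Boots} has twice the support of the leaf-level itemset {Jackets, Hiking Boots}.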

4.2 Multiple-Level Association Rules

Previous studies on mining association rules find rules at a single concept level; however, mining association rules at multiple concept levels may lead to the discovery of more specific and concrete knowledge from data. A top-down progressive deepening method has been developed for mining multiple-level association rules from large transaction databases by extending existing association rule mining techniques. Multiple-level association rule mining uses a transaction table in which hierarchy information is encoded, instead of the original transaction table, in iterative data mining. This is based on the following considerations. First, a data mining query is usually relevant to only a portion of the transaction database, such as food instead of all the items, so it is beneficial to first collect the relevant set of data and then work repeatedly on the task-relevant set. Second, encoding can be performed during the collection of task-relevant data, so no extra encoding pass is required. Third, an encoded string, which represents a position in a hierarchy, requires fewer bits than the corresponding object identifier or bar code. Moreover, encoding allows more items to be merged because of their identical encoding, which further reduces the size of the encoded transaction table. For example, with items encoded as sequences of digits in the transaction table, the item '2% Foremost milk' is encoded as '112', in which the first digit, '1', represents 'milk' at level 1, the second, '1', represents '2% (milk)' at level 2, and the third, '2', represents the brand 'Foremost' at level 3.
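A rough sketch of top-down progressive deepening over such digit codes is shown below. The '*' masking notation, the per-level minimum counts, and all codes other than '112' are hypothetical, and only the level-wise filtering of single items is shown, not the full multiple-level rule miner. At each level, an item is counted only if its parent at the previous level was frequent.

```python
from collections import Counter

# Hypothetical encoded transactions: one digit per hierarchy level, as in '112'.
encoded_db = [
    {"112", "221"},
    {"111", "211"},
    {"121", "222"},
    {"112", "311"},
]

def generalize(code, level):
    """Keep the first `level` digits and mask the rest, e.g. '112' -> '1**'."""
    return code[:level] + "*" * (len(code) - level)

def progressive(db, min_counts):
    """Top-down: at each level, only count items whose parent was frequent."""
    frequent_prev, result = None, {}
    for level, min_count in enumerate(min_counts, start=1):
        counts = Counter()
        for t in db:
            # Each generalized item is counted at most once per transaction.
            counts.update({generalize(c, level) for c in t
                           if frequent_prev is None
                           or generalize(c, level - 1) in frequent_prev})
        frequent_prev = {g for g, n in counts.items() if n >= min_count}
        result[level] = frequent_prev
    return result

print(progressive(encoded_db, min_counts=(3, 2)))
```

With these thresholds, '3**' is infrequent at level 1, so code '311' is never examined at level 2; the frequent items are {'1**', '2**'} at level 1 and {'11*', '22*'} at level 2.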

4.3 Quantitative Association Rules

This section considers the problem of mining association rules in large relational tables containing both quantitative and categorical attributes. An example of such an association might be "10% of married people between age 50 and 60 have at least 2 cars." Quantitative attributes are handled by finely partitioning the values of the attribute and then combining adjacent partitions as necessary. Measures of partial completeness quantify the information lost due to partitioning. A direct application of this technique can generate too many similar rules; this problem is tackled by using a greater-than-expected-value interest measure to identify the interesting rules in the output. An algorithm for mining such quantitative association rules has been given, and the approach has been applied to a real-life dataset.
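The idea of fine partitioning followed by combining adjacent partitions can be illustrated for the rule "age in range AND married => cars >= 2". In this sketch the bucket width (5 years, starting at age 20) and the records are hypothetical, and the range to combine is chosen by hand. The original method instead chooses the number of partitions using partial completeness.

```python
def bucket(age, lo=20, width=5):
    """Fine equi-width partition: index i of the interval [lo + i*width, lo + (i+1)*width)."""
    return (age - lo) // width

def rule_stats(records, first, last):
    """Support and confidence of: age in buckets first..last AND married => cars >= 2.

    Combining the adjacent fine buckets first..last gives one wider age range.
    """
    antecedent = [r for r in records if first <= bucket(r[0]) <= last and r[1]]
    both = [r for r in antecedent if r[2] >= 2]
    support = len(both) / len(records)
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    return support, confidence

# Hypothetical (age, married, cars) records.
records = [(52, True, 2), (57, True, 3), (58, False, 1), (61, True, 1),
           (35, True, 0), (44, False, 2), (53, True, 1), (59, True, 2)]

print(rule_stats(records, 6, 6))  # age in [50, 55): (0.125, 0.5)
print(rule_stats(records, 6, 7))  # age in [50, 60): (0.375, 0.75)
```

On these records, merging the buckets [50, 55) and [55, 60) raises both the support and the confidence of the rule, which is the motivation for combining adjacent partitions.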

4.4 Using Multiple Minimum Supports

Since a single threshold support is used for the whole database, it is assumed that all items in the data are of the same nature and/or have similar frequencies. In reality, some items may be very frequent while others rarely appear, and the latter may be more informative and more interesting than the former. For example, besides finding a rule bread => cheese with a support of 8%, it might be more informative to show that wheat bread => Swiss cheese with a support of 3%. Another simple example is supermarket items that are sold less frequently but are more profitable, such as a food processor and a cooking pan; it might be very interesting to discover the rule food processor => cooking pan with a support of 2%. If the threshold support is set too high, rules involving rare items will not be found. To obtain rules involving both frequent and rare items, the threshold support has to be set very low. Unfortunately, this may cause a combinatorial explosion, producing too many rules, because the frequent items will be associated with one another in all possible ways and many of the resulting rules are meaningless. This dilemma is called the rare item problem. To overcome it, one of the following strategies may be followed: (a) split the data into a few blocks according to the supports of the items and then discover association rules in each block with a different threshold support, or (b) group a number of related rare items together into an abstract item so that this abstract item is more frequent, and then apply the algorithm for finding association rules in numerical interval data. Both approaches are ad hoc and approximate. Rules associating items across different blocks are difficult to find using the first approach, and the second approach cannot discover rules involving individual rare items together with more frequent items.
Therefore, a single threshold support for the entire database is inadequate to discover important association rules, because it cannot capture the inherent natures and/or frequency differences in the database. The existing association rule model has been extended to allow the user to specify multiple threshold supports; the extended algorithm is named MISapriori. In this method, the threshold support is expressed in terms of the minimum item supports (MIS) of the items that appear in the rule, and the user can specify a different minimum item support for each item. As a result, the technique can discover rare-item rules without causing frequent items to generate too many unnecessary rules. Like conventional algorithms, MISapriori generates all large itemsets by making multiple passes over the data. In the first pass, it counts the supports of individual items and determines whether they are large. In each subsequent pass, it uses the large itemsets of the previous pass to generate candidate itemsets; by computing the actual supports of these candidates, MISapriori determines which of them are actually large at the end of the pass. However, the generation of large itemsets in the second pass differs from other algorithms. A key operation in MISapriori is sorting the items in I in ascending order of their MIS values; this ordering is used in the subsequent operations of the algorithm. The extended model was tested and evaluated using synthetic data as well as real-life datasets. In the experiments with synthetic data, three very low LS values (the lowest minimum item support allowed), 0.1%, 0.2%, and 0.3%, were used, with each item's MIS derived from its actual frequency $f(i)$ as $\mathrm{MIS}(i) = \max(f(i)/\alpha, LS)$. It has been reported that the number of large itemsets is significantly reduced by MISapriori when $\alpha$ is not too large, and that the number found is close to that of the single-minsup method when $\alpha$ becomes larger.
This is because, as $\alpha$ grows, more and more items' MIS values reach LS. It has also been reported that the execution time is reduced significantly.
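The core idea of multiple minimum supports, that an itemset only has to reach the lowest MIS among its own items, can be sketched in a few lines. This is not the full MISapriori candidate generation. It assumes each item's MIS is derived from its actual frequency f(i) as max(f(i)/α, LS); that parameterization, α = 4, LS = 1%, and the item frequencies are all assumptions for illustration.

```python
def assign_mis(freq, alpha, ls):
    """Assumed parameterization: MIS(i) = max(f(i) / alpha, LS)."""
    return {item: max(f / alpha, ls) for item, f in freq.items()}

def is_large(itemset, support, mis):
    """An itemset is large if its support reaches the lowest MIS among its items."""
    return support >= min(mis[item] for item in itemset)

# Hypothetical actual item frequencies (fractions of transactions).
freq = {"bread": 0.40, "cheese": 0.20, "food processor": 0.02, "cooking pan": 0.03}
mis = assign_mis(freq, alpha=4, ls=0.01)

# MISapriori sorts items in ascending order of MIS before generating candidates.
order = sorted(mis, key=mis.get)
print(order)  # ['food processor', 'cooking pan', 'cheese', 'bread']

print(is_large({"food processor", "cooking pan"}, 0.02, mis))  # True: needs only 1%
print(is_large({"bread", "cheese"}, 0.04, mis))               # False: needs 5%
```

A single minimum support of 5% would reject the food processor and cooking pan itemset outright, while a single threshold of 1% would accept many more combinations of frequent items; per-item thresholds avoid both problems.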

4.5 Correlation Rules

A correlation rule is defined as a set of itemsets that are correlated. The motivation for developing correlation rules is that negative correlations may be useful. Correlation satisfies upward closure in the itemset lattice: if a set is correlated, so is every superset of it. The correlation of A and B is

$$\mathrm{correlation}(A \Rightarrow B) = \frac{P(A, B)}{P(A)\,P(B)}.$$

A value of 1 indicates that A and B are independent; a value lower than 1 indicates a negative correlation between A and B, and a value greater than 1 indicates a positive correlation.
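The correlation value can be estimated from transaction counts. This minimal sketch uses Python's fractions module so that the ratio is exact; the tea and coffee transactions are hypothetical, and the function raises ZeroDivisionError if either item never occurs.

```python
from fractions import Fraction

def correlation(a, b, db):
    """P(A,B) / (P(A) P(B)), estimated from relative frequencies in db."""
    n = len(db)
    p_a = Fraction(sum(a in t for t in db), n)
    p_b = Fraction(sum(b in t for t in db), n)
    p_ab = Fraction(sum(a in t and b in t for t in db), n)
    return p_ab / (p_a * p_b)

# Hypothetical transactions.
db = [{"tea", "coffee"}, {"coffee"}, {"coffee"}, {"tea"}, {"coffee"}]
print(correlation("tea", "coffee", db))  # 5/8
```

Here P(tea, coffee) = 1/5 while P(tea) P(coffee) = 8/25, so the value 5/8 is below 1 and signals a negative correlation, even though the rule tea => coffee still has 50% confidence.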

5. MEASURING THE QUALITY OF RULES

Support and confidence are the usual measures of the quality of association rules:

$$s(A \Rightarrow B) = P(A, B), \qquad \alpha(A \Rightarrow B) = P(B \mid A).$$

Another technique for measuring the significance of rules, the chi-squared test for independence, has been proposed for use with correlation rules. Unlike support or confidence, the chi-squared significance test takes into account both the presence and the absence of items in sets. Here it is used to measure how much an itemset count differs from its expected value. The chi-squared statistic can be calculated in the following manner. Suppose the set of items is $I = \{I_1, I_2, \ldots, I_m\}$. A transaction $t_j$ can be viewed as

$$t_j \in \{I_1, \bar{I}_1\} \times \{I_2, \bar{I}_2\} \times \cdots \times \{I_m, \bar{I}_m\},$$

where $\bar{I}_i$ denotes the absence of item $I_i$. Any possible itemset $X$ can also be viewed as a subset of this Cartesian product. The chi-squared statistic is then calculated for $X$ as

$$\chi^2 = \sum_{x \in X} \frac{(O(x) - E(x))^2}{E(x)}.$$

Here $O(x)$ is the number of transactions that contain the items in $x$. For a single item $I_i$, the expected value is $E(I_i) = O(I_i)$, the number of transactions that contain $I_i$, and $E(\bar{I}_i) = n - O(I_i)$. The expected value $E(x)$ is calculated assuming independence and is thus defined as

$$E(x) = n \times \prod_{i=1}^{m} \frac{E(x_i)}{n},$$

where $x_i$ is $I_i$ or $\bar{I}_i$ according to whether $x$ contains $I_i$, and $n$ is the number of transactions.
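For a small set of items, the statistic can be computed by enumerating every presence/absence cell of the Cartesian product. This sketch follows the formulas above, with E(x) taken as n times the product of E(x_i)/n, and it skips cells whose expected count is 0, which can only happen when an item occurs in all or none of the transactions. The tea and coffee transactions are hypothetical.

```python
from itertools import product

def chi_squared(items, db):
    """Chi-squared statistic over all 2^k presence/absence cells for `items`."""
    n = len(db)
    single = {i: sum(1 for t in db if i in t) for i in items}  # O(I_i)
    chi2 = 0.0
    for cell in product((True, False), repeat=len(items)):
        # O(x): transactions matching this presence/absence pattern exactly.
        observed = sum(
            1 for t in db
            if all((i in t) == present for i, present in zip(items, cell))
        )
        # E(x) = n * prod(E(x_i) / n), using E(I_i) = O(I_i), E(not I_i) = n - O(I_i).
        expected = n
        for i, present in zip(items, cell):
            expected *= (single[i] if present else n - single[i]) / n
        if expected > 0:
            chi2 += (observed - expected) ** 2 / expected
    return chi2

db = [{"tea", "coffee"}, {"coffee"}, {"coffee"}, {"tea"}, {"coffee"}]
print(chi_squared(("tea", "coffee"), db))
```

For these transactions the statistic is 1.875 (up to floating-point rounding). That is below 3.84, the 5% critical value for one degree of freedom, so on this little data independence would not be rejected, even though the correlation ratio P(A,B) / (P(A) P(B)) is 5/8.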

6. APPLICATIONS OF ASSOCIATION MINING

One application area is the development of a new generation of databases, called inductive databases (IDBs), suggested by Imielinski and Mannila. Such databases integrate raw data with knowledge extracted from raw data, materialized in the form of patterns, into a common framework that supports the knowledge discovery process within a database framework. In this way, the process of KDD consists essentially of a querying process, enabled by an ad hoc, powerful, and universal query language that can deal with either raw data or patterns and that can be used throughout the whole KDD process across many different applications. We are still far from understanding the fundamental primitives for such query languages when considering various kinds of knowledge discovery processes. Association rule mining provides an interesting context for studying the inductive database framework.

Association rule mining (ARM) has been used to discover heuristic rules for power system restoration (PSR) to guide a fast restoration process. To employ popular ARM algorithms, the PSR process is represented as a series of actions drawn from a finite action set: the interesting attributes of each action are mapped to items, and the actions are mapped to transactions. Fuzzy sets and clustering are used to evaluate the performance of individual actions.

Association rule mining has also been applied to Named Entity Recognition (NER) and co-reference resolution. The method combines several morphological and lexical features, such as Pronoun Class (PC), Name Class (NC), String Similarity (SP), and Position (P) in the text, into a vector of attributes. Applied to a corpus of Indonesian-language newspapers, the method outperforms the state-of-the-art maximum entropy method in named entity recognition and is comparable with a state-of-the-art machine learning method, decision trees, for co-reference resolution.

Another association rule mining method considers users' differing emphasis on each item through fuzzy regions. This is more realistic and practical than prior association rule methods, and the discovered rules are expressed in natural language that is more understandable to humans.

Association mining has also been extended to the discovery of spatial association rules, that is, association rules involving spatial relations among (spatial) objects; spatial association rule mining is an extension of transaction association rule mining. The method is based on a multi-relational data mining approach and takes advantage of the representation and reasoning techniques developed in the field of inductive logic programming (ILP). In particular, the expressive power of predicate logic is used to represent spatial relations and background knowledge (such as spatial hierarchies and rules for spatial qualitative reasoning) in an elegant, natural way. The integration of computational logic with efficient spatial database indexing and querying procedures permits applications that cannot be tackled by traditional statistical techniques in spatial data analysis. The method has been implemented in the ILP system SPADA (Spatial Pattern Discovery Algorithm), and preliminary results have been obtained by applying SPADA to Stockport census data.

7. CONCLUSION

Association mining, the task of finding correlations between items in a dataset, has received considerable attention and has become a mature field of research with diverse branches of specialization. The fundamentals of association mining are now well established and, with some important exceptions, there appears to be little current research on improving general itemset identification or rule generation. A remaining direction is the modeling of specific association patterns that are both statistically grounded in support and confidence and semantically related to a given objective that the user wants to achieve or is interested in.
