Data mining is the process of discovering relationships in large data sets. It is an area of computer science that has attracted considerable commercial interest. In this article, I'll describe a few of the most common data mining techniques.

Association rule discovery: Association rule discovery techniques are used to extract associations from data sets. The technique was originally developed on supermarket purchase data. An association rule is a rule of the form X -> Y. An example is "If a buyer purchases milk, this implies ( -> ) that the buyer will also purchase bread". Every association rule has a support value and a confidence value. The support is the proportion of all entries (transactions, in this example) that contain all of the items in the rule, for example the proportion of all purchases in which both milk and bread were bought. The confidence is the proportion of the transactions satisfying the left side of the rule that also satisfy the right side. In this example, the confidence is the proportion of purchases containing milk that also contain bread. Association discovery techniques extract every association rule in a data set that meets a minimum support and a minimum confidence specified by the user.

Cluster analysis: Cluster analysis is the process of taking several numeric fields and assigning clusters to their values. These clusters represent groups of points that are close to one another. For instance, if you watch a documentary on space, you will see that galaxies contain a large number of stars and planets. There are many galaxies in space, but the stars and planets are not at random locations: they are clumped together in groups, and those groups are the galaxies. A cluster analysis technique is used to find these kinds of groups. If a cluster analysis technique were applied to the stars in space, it could find that each galaxy is a cluster and assign the same cluster identifier to every star in a given galaxy, with a different identifier for each galaxy. This cluster identifier then becomes another field in the data set and can be used in further data mining analysis. For instance, you could use the cluster id field to form association rules with other fields in the data set.

Decision trees: Decision trees are used to form a tree of decisions from a data set to help predict a value associated with that data. For example, if you were looking at a data set used to predict whether a potential loan applicant would be a credit risk, a tree of decisions would be formed from the factors in the data set. The tree might contain decisions such as whether the applicant has defaulted on a loan before, the age of the applicant, whether the applicant is employed, the applicant's earnings and the total repayments on the loan. You could then follow this tree of decisions to say, for example, that if an applicant has never defaulted on a loan before, is employed, has earnings in the top 15 percent for the country and is asking for a relatively low loan amount, then there is an extremely low risk of default.

These are some of the more common techniques among the large group of data mining methods that are frequently applied to large data sets. They have proved valuable for extracting useful information and relationships from data sets that would otherwise be too large to analyze well.

The author owns a number of websites that provide financial loan calculators, including a refinancing calculator, an amortization calculator and a boat loan calculator.
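To make the support and confidence definitions from the association rule discovery section concrete, here is a minimal Python sketch. The transactions, the function names and the thresholds are all made up for illustration, and the brute-force search only considers rules with a single item on each side; practical association rule miners are considerably more efficient than this.

```python
from itertools import permutations

# Hypothetical toy data: each set is one supermarket purchase.
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk"},
    {"bread", "eggs"},
    {"eggs"},
]

def support(items, transactions):
    """Proportion of all transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Proportion of transactions containing `lhs` that also contain `rhs`."""
    lhs_support = support(lhs, transactions)
    if lhs_support == 0:
        return 0.0
    return support(set(lhs) | set(rhs), transactions) / lhs_support

def single_item_rules(transactions, min_support, min_confidence):
    """Brute-force search for rules x -> y with one item on each side."""
    items = sorted(set().union(*transactions))
    rules = []
    for x, y in permutations(items, 2):
        s = support({x, y}, transactions)
        c = confidence({x}, {y}, transactions)
        if s >= min_support and c >= min_confidence:
            rules.append((x, y, s, c))
    return rules

for x, y, s, c in single_item_rules(transactions, 0.3, 0.6):
    print(f"{x} -> {y}: support={s:.2f}, confidence={c:.2f}")
```

With this toy data, milk and bread appear together in 2 of the 5 purchases, and milk appears in 3 of them, so the rule milk -> bread has a support of 2/5 and a confidence of 2/3.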
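The cluster analysis section does not name a specific algorithm, so the sketch below uses k-means, one widely used clustering method, purely as an illustration. The star coordinates are invented, the function name `kmeans` is my own, and the way the starting centroids are picked (evenly spaced points from the input) is a naive assumption that works for this toy data but not in general.

```python
import math

def kmeans(points, k, max_iter=100):
    """Very small k-means: returns (labels, centroids) for a list of tuples."""
    # Naive initialisation (an assumption of this sketch): take k evenly
    # spaced points from the input as the starting centroids.
    step = max(1, len(points) // k)
    centroids = [points[i * step] for i in range(k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels, centroids

# Hypothetical "stars" with two numeric fields, forming two clumps.
stars = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(stars, k=2)

# The cluster id becomes another field on each record.
records = [{"x": x, "y": y, "cluster_id": cid} for (x, y), cid in zip(stars, labels)]
for record in records:
    print(record)
```

Each record now carries a `cluster_id` field alongside its original values, which is the form in which the cluster assignment can be fed into further analysis such as association rule discovery.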
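The loan example from the decision trees section can be written down as a small tree and followed in code. This sketch only shows how a finished tree is traversed; a decision tree algorithm would learn the questions and thresholds from the data, whereas here they are hand-written. The field names, the loan amount threshold and every outcome label other than the very low risk path described in the article are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    prediction: str

@dataclass
class Node:
    question: str
    test: Callable[[dict], bool]
    if_true: Union["Node", Leaf]
    if_false: Union["Node", Leaf]

def predict(tree, applicant):
    """Follow the decisions from the root until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.if_true if tree.test(applicant) else tree.if_false
    return tree.prediction

# Hand-written tree; thresholds and most labels are hypothetical.
loan_size = Node(
    "Is the loan amount relatively low?",
    lambda a: a["loan_amount"] <= 10_000,  # assumed cut-off
    Leaf("very low risk"),
    Leaf("low risk"),
)
earnings = Node(
    "Are earnings in the top 15 percent?",
    lambda a: a["earnings_percentile"] >= 85,
    loan_size,
    Leaf("medium risk"),
)
employed = Node(
    "Is the applicant employed?",
    lambda a: a["employed"],
    earnings,
    Leaf("medium risk"),
)
tree = Node(
    "Has the applicant defaulted before?",
    lambda a: a["previously_defaulted"],
    Leaf("high risk"),
    employed,
)

applicant = {
    "previously_defaulted": False,
    "employed": True,
    "earnings_percentile": 90,
    "loan_amount": 5_000,
}
print(predict(tree, applicant))
```

The sample applicant has never defaulted, is employed, earns in the top 15 percent and asks for a small loan, so following the tree ends at the very low risk leaf, matching the path described in the article.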