MINING:

Consider a 2-dimensional data set with points at:
(1,1), (1,2), (1,100), (2,3), (2,4), (3,1), (3,3)
Can you divide this data set into two clusters?
State what (objective) criteria you would use for the clustering.
Is it reasonable to require that clusters be equal in size?
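
One possible objective criterion, offered only as an illustration, is the
within-cluster sum of squared distances to each cluster's centroid (the
k-means objective). The sketch below assumes that criterion and exhaustively
tries every split of the seven points into two non-empty clusters; the
function names are hypothetical and other criteria are equally defensible.

```python
from itertools import combinations

points = [(1, 1), (1, 2), (1, 100), (2, 3), (2, 4), (3, 1), (3, 3)]

def sse(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    n = len(cluster)
    cx = sum(p[0] for p in cluster) / n
    cy = sum(p[1] for p in cluster) / n
    return sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in cluster)

def best_two_clusters(pts):
    """Try every 2-way split (smaller side first) and keep the lowest total SSE."""
    best = None
    for size in range(1, len(pts) // 2 + 1):
        for subset in combinations(pts, size):
            rest = [p for p in pts if p not in subset]
            cost = sse(subset) + sse(rest)
            if best is None or cost < best[0]:
                best = (cost, list(subset), rest)
    return best

cost, small, large = best_two_clusters(points)
print(cost, small, large)
```

Under this criterion the cluster sizes fall out of the data rather than being
fixed in advance, which bears on the equal-size question.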


Consider a mortgage company trying to decide whether to grant someone a
loan.  It has only 3 pieces of information available for this purpose:
(1) whether the applicant has a job, (2) whether the applicant is married,
and (3) whether the applicant has defaulted on a loan before.
A decision tree is being built to classify applicants into two categories --
loanworthy and not loanworthy.
What is the most complex decision tree that could be constructed under the
circumstances -- how many nodes, leaves, edges, levels?
Draw an example.
What is the simplest decision tree possible?
Draw an example.
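
As a rough aid for the counting part, the sketch below tallies the size of a
full binary tree under two assumptions that the question does not state
outright: every attribute is a yes/no test, and each attribute is tested at
most once along any root-to-leaf path. The function name is hypothetical.

```python
def full_tree_stats(num_binary_attributes):
    """Size of a complete binary decision tree that tests every attribute on every path."""
    depth = num_binary_attributes      # one test per attribute along each path
    leaves = 2 ** depth                # one leaf per combination of answers
    internal = 2 ** depth - 1          # decision (test) nodes
    nodes = leaves + internal
    return {
        "nodes": nodes,
        "leaves": leaves,
        "edges": nodes - 1,            # every node except the root has one incoming edge
        "levels": depth + 1,           # counting the root level and the leaf level
    }

print(full_tree_stats(3))
```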


Simulate the running of the Apriori algorithm over the following data set:
ABC, ABD, ABCE, ABDFG, ABCG, BCEFG
Use a confidence threshold of 0.9 and a support threshold of 0.25.
(The above data means that items A, B, and C were purchased together
in the first transaction; A, B, and D in the second; and so on.)
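
A hand simulation can be checked against a small program. The following is a
minimal sketch of level-wise frequent-itemset mining followed by rule
generation, assuming support is the fraction of transactions containing an
itemset and confidence is support(LHS and RHS) / support(LHS). The function
names are hypothetical, and the candidate join is a simplified version of
the usual one.

```python
from itertools import combinations

transactions = [set(t) for t in ["ABC", "ABD", "ABCE", "ABDFG", "ABCG", "BCEFG"]]
MIN_SUPPORT = 0.25
MIN_CONFIDENCE = 0.9

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def frequent_itemsets():
    """Level-wise search: size-k candidates are built only from frequent (k-1)-itemsets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= MIN_SUPPORT]
    frequent = {}
    k = 1
    while level:
        for s in level:
            frequent[s] = support(s)
        k += 1
        prev = set(level)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
                 and support(c) >= MIN_SUPPORT]
    return frequent

def association_rules(frequent):
    """All rules LHS -> RHS from frequent itemsets that meet MIN_CONFIDENCE."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                conf = sup / frequent[lhs]
                if conf >= MIN_CONFIDENCE:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

freq = frequent_itemsets()
for lhs, rhs, conf in association_rules(freq):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)), round(conf, 3))
```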

Instead of 6 transactions, suppose I had 6 million transactions, but still
only 7 different items.  How should I change my confidence and support
thresholds?  Argue qualitatively.

If I have 6 million transactions and 100,000 different items (instead of 7),
then how should I change my confidence and support thresholds?
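
Two quantities may help ground the qualitative argument: the number of
transactions a fractional support threshold translates into, and the number
of candidate pairs that a given item count allows. The sketch below only
computes those two figures; the helper names are hypothetical and it does not
by itself say which thresholds are right.

```python
from math import ceil, comb

def min_support_count(support_threshold, num_transactions):
    """Smallest number of transactions an itemset needs to meet the threshold."""
    return ceil(support_threshold * num_transactions)

def candidate_pairs(num_items):
    """Number of distinct 2-itemsets that can be formed from num_items items."""
    return comb(num_items, 2)

for n_trans, n_items in [(6, 7), (6_000_000, 7), (6_000_000, 100_000)]:
    print(n_trans, n_items,
          min_support_count(0.25, n_trans),
          candidate_pairs(n_items))
```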