Homework 4: Decision Tree Inductive Learning

The file “dt_train.csv” contains 601 lines with 10 variables. The first line contains column headers that may be interpreted as follows:

id: observation identifier.

t1: measurement on test 1; t2: measurement on test 2.

t3: measurement on test 3; t4: measurement on test 4.

t5: measurement on test 5; t6: measurement on test 6.

t7: measurement on test 7; t8: measurement on test 8.

d: binary output variable set to 1 if product is defective and 0 otherwise.

The remaining 600 lines each contain one example, with a value specified for every variable above.

The table below reproduces the first 2 observations.

id	t1	t2	t3	t4	t5	t6	t7	t8	d
1	17	3	31	54	66	54	45	84	1
2	2	15	6	5	82	54	59	87	1
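
As a quick check of the layout described above, the following minimal Python sketch parses the header and the two sample observations and separates the eight test measurements from the output variable. It assumes the fields are tab-delimited as displayed here (the actual delimiter in "dt_train.csv" may well be a comma); the names SAMPLE, FEATURES, and load_rows are illustrative and not part of the assignment.

```python
import csv
import io

# Header and first two observations, as reproduced in the table above.
# Assumption: fields are tab-separated; pass delimiter="," if the real file uses commas.
SAMPLE = (
    "id\tt1\tt2\tt3\tt4\tt5\tt6\tt7\tt8\td\n"
    "1\t17\t3\t31\t54\t66\t54\t45\t84\t1\n"
    "2\t2\t15\t6\t5\t82\t54\t59\t87\t1\n"
)

# Input variables; "id" is only an identifier and "d" is the output.
FEATURES = ["t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8"]


def load_rows(handle, delimiter="\t"):
    """Return a list of (features, label) pairs, dropping the id column."""
    rows = []
    for record in csv.DictReader(handle, delimiter=delimiter):
        x = [int(record[name]) for name in FEATURES]
        y = int(record["d"])
        rows.append((x, y))
    return rows


rows = load_rows(io.StringIO(SAMPLE))
print(len(rows), rows[0])
```

To read the real file instead, the same function could be called on an open file handle, for example `load_rows(open("dt_train.csv", newline=""), delimiter=",")`, once the delimiter has been confirmed.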

· Use rpart with the training examples to come up with a small set of rules that correctly classify the output variable “d” based on input variable values (t1, t2, t3, t4, t5, t6, t7, and t8).

· Specify the rules.

· The file “dt_test.csv” contains 200 test examples with the same 10 variables. Test your trained classifier on these test examples and present your confusion matrix. Comment on your classification accuracy.

· Then use the rules to predict the output class d for the following test cases (presented in the file “dt_new.csv”):

new_case	t1	t2	t3	t4	t5	t6	t7	t8	d
1	8	86	55	53	36	12	82	19
2	22	36	80	69	90	33	22	6
3	74	26	32	26	38	52	63	12
4	66	71	71	52	42	88	89	70
5	55	72	61	41	91	39	50	96
6	34	58	22	84	84	61	95	57
7	23	70	39	65	16	71	96	78
8	9	19	67	43	2	20	92	3
9	6	71	20	6	27	58	6	22
10	68	40	86	82	82	44	61	48
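
For the test-set evaluation step above, a confusion matrix and the accuracy derived from it can be sketched in plain Python as follows. This is only an illustration of the bookkeeping, not the rpart workflow itself: the label lists are hypothetical toy values, not results from "dt_test.csv", and the function names confusion_matrix and accuracy are assumptions made for this example.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count (actual, predicted) pairs for the binary labels 0 and 1."""
    counts = Counter(zip(actual, predicted))
    # Rows are actual classes, columns are predicted classes, both ordered 0 then 1.
    return [[counts[(a, p)] for p in (0, 1)] for a in (0, 1)]


def accuracy(matrix):
    """Fraction of examples on the diagonal, i.e. correctly classified."""
    total = sum(sum(row) for row in matrix)
    correct = matrix[0][0] + matrix[1][1]
    return correct / total if total else 0.0


# Hypothetical labels for illustration only; NOT taken from dt_test.csv.
actual = [1, 1, 0, 0, 1, 0, 1, 0]
predicted = [1, 0, 0, 0, 1, 1, 1, 0]

matrix = confusion_matrix(actual, predicted)
print(matrix)            # [[3, 1], [1, 3]]
print(accuracy(matrix))  # 0.75
```

In the matrix, the off-diagonal cells separate the two kinds of error: row 0, column 1 counts good products flagged as defective, and row 1, column 0 counts defective products that were missed. Reporting both alongside the overall accuracy gives a fuller picture than accuracy alone.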

Date Due:

Sunday, November 11, 2018
