
Research Methods for the Learning Sciences


Core Methods in Educational Data Mining
HUDK4050 Fall 2014
Any administrative questions?
The Homework
Let’s go over the homework
Q1
Build a decision tree (using operator W-J48 from the Weka Extension Pack) on the entire data set. What is the non-cross-validated kappa? What was the process? What was the result?
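As a reference point when comparing answers, the sketch below shows how Cohen's kappa can be computed by hand from actual and predicted labels. The function name and the ON/OFF labels are illustrative assumptions, not taken from the homework data set, and this is not RapidMiner's own implementation.

```python
from collections import Counter

def cohens_kappa(actual, predicted):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(actual)
    # Proportion of rows where the model and the ground truth agree.
    observed = sum(a == p for a, p in zip(actual, predicted)) / n
    # Agreement expected by chance, from each label's base rate in both sequences.
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[label] * predicted_counts[label]
                   for label in actual_counts) / (n * n)
    if expected == 1.0:
        return 0.0  # kappa is undefined here; 0.0 is a convention chosen for this sketch
    return (observed - expected) / (1 - expected)

# Hypothetical labels, for illustration only.
actual    = ["ON", "ON", "ON",  "OFF", "OFF", "ON", "ON", "OFF"]
predicted = ["ON", "ON", "OFF", "OFF", "ON",  "ON", "ON", "OFF"]
kappa = cohens_kappa(actual, predicted)
print(round(kappa, 3))
```

A kappa of 0 means the model does no better than chance given the base rates, and 1 means perfect agreement.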
Q2
That’s the correct answer, but let’s think about it. The kappa value you just obtained is artificially high – the model is over-fitting to which student it is. What is the non-cross-validated kappa, if you build the model (using the same operator), excluding student? How did you modify the model to remove the student term? Were there multiple ways to accomplish this?
Q2
That’s the correct answer, but let’s think about it. Why was the kappa value artificially high? How do we know that we were over-fitting to the student?
Q2
How did you remove student from the model? There were multiple ways to accomplish this.
Q2
What is the non-cross-validated kappa, if you build the model (using the same operator), excluding student? Did the number go up? Go down? Stay the same? What does this mean?
Q3
Some other features in the data set may make your model overly specific to the current data set. Which data features would not apply outside of the population sampled in the current data set? Answers?
Q4
What is the non-cross-validated kappa, if you build the W-J48 decision tree model (using the same operator), excluding student and the variables from Question 3? Recall that you decided to eliminate School, Class, and Coder, as well as STUDENTID. Answers? Was this algorithm successful?
Q5
What is the non-cross-validated kappa, for the same set of variables you used for question 4, if you use Naïve Bayes? Answers?
Q6
What is the non-cross-validated kappa, for the same set of variables you used for question 4, if you use W-JRip? Answers?
Q7
What is the non-cross-validated kappa, for the same set of variables you used for question 4, if you use Logistic Regression? (Hint: You will need to transform some variables to make this work; RapidMiner will tell you what to do.) How did you do the variable transform? Why did you need to do the variable transform? Answers?
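One common kind of variable transform for logistic regression is dummy coding: the model needs numeric inputs, so each nominal variable is replaced by one 0/1 column per value. The sketch below illustrates that idea only. The column names and values are hypothetical, and it is an assumption that this matches the transform RapidMiner asked for.

```python
def dummy_code(rows, column):
    """Replace one nominal column with a 0/1 indicator column per distinct value."""
    values = sorted({row[column] for row in rows})
    coded = []
    for row in rows:
        # Copy every other column unchanged.
        new_row = {key: val for key, val in row.items() if key != column}
        # Add one indicator column per value of the nominal column.
        for value in values:
            new_row[f"{column}={value}"] = 1 if row[column] == value else 0
        coded.append(new_row)
    return coded

# Hypothetical rows, for illustration only.
rows = [
    {"time": 12.0, "activity": "individual"},
    {"time": 3.5,  "activity": "group"},
    {"time": 7.0,  "activity": "whole_class"},
]
coded = dummy_code(rows, "activity")
for row in coded:
    print(row)
```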
Q8
Wow, that was a lot of waiting for nothing. What is the non-cross-validated kappa, for the same set of variables you used for question 4, if you use Step Regression (called Linear Regression)? Answers?
Q9
What is the non-cross-validated kappa, for the same set of variables you used for question 4, if you use k-NN instead of W-J48? (We’ll discuss the results of this test later.) Answers? Why did you get that result?
Q10
What is the kappa, if you delete School, Class, Coder, and STUDENTID, use W-J48, and conduct 10-fold stratified-sample cross-validation? How did you set this up? Answers?
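To make "10-fold stratified-sample cross-validation" concrete, the sketch below assigns each row to one of k folds so that every fold keeps roughly the same class proportions as the full data set. It is a minimal illustration with hypothetical labels and an assumed function name, not a description of how RapidMiner builds its folds internally.

```python
import random
from collections import defaultdict

def stratified_folds(labels, k=10, seed=0):
    """Return one fold index (0..k-1) per row, spreading each class evenly over folds."""
    rng = random.Random(seed)
    rows_by_class = defaultdict(list)
    for i, label in enumerate(labels):
        rows_by_class[label].append(i)
    folds = [None] * len(labels)
    offset = 0  # continue the round-robin across classes so fold sizes stay balanced
    for indices in rows_by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[i] = (offset + j) % k
        offset += len(indices)
    return folds

# Hypothetical data: 70 rows labeled ON and 30 labeled OFF.
labels = ["ON"] * 70 + ["OFF"] * 30
folds = stratified_folds(labels, k=10)

for fold in range(10):
    test_rows = [i for i, f in enumerate(folds) if f == fold]
    train_rows = [i for i, f in enumerate(folds) if f != fold]
    # Train on train_rows, predict test_rows, then pool the predictions to compute kappa.
    on_count = sum(labels[i] == "ON" for i in test_rows)
    print(fold, len(train_rows), len(test_rows), on_count)
```

Each row is used for testing exactly once, and the model that predicts it never saw it during training.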
Q11
Why is the kappa lower for question 10 (cross-validation) than question 4 (no cross-validation)?
Q12
What is the kappa, for the same set of variables you used for question 4, if you use k-NN, and conduct 10-fold stratified-sample cross-validation?
Q13
k-NN and W-J48 got almost the same Kappa when compared using cross-validation. But the kappa for k-NN was much higher (1.000) when cross-validation wasn't used. Why is that?
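One way to explore this question is with a toy nearest-neighbor classifier. The sketch below assumes k = 1 (the slides do not state the k used) and uses made-up points and labels. When the model is tested on its own training data, each point's nearest neighbor is itself, so every prediction matches; when the point being predicted is held out, that shortcut disappears.

```python
def nearest_neighbor_predict(train_x, train_y, query):
    """1-NN: return the label of the closest training point (squared Euclidean distance)."""
    best = min(range(len(train_x)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)))
    return train_y[best]

def accuracy(actual, predicted):
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical, deliberately noisy data for illustration only.
train_x = [(0.0, 0.0), (1.0, 0.2), (0.9, 1.1), (0.1, 0.9), (0.5, 0.5)]
train_y = ["OFF", "ON", "OFF", "ON", "OFF"]

# No cross-validation: every query point is also in the training set.
train_preds = [nearest_neighbor_predict(train_x, train_y, x) for x in train_x]

# Held-out version: remove the query point before predicting it.
held_out_preds = [
    nearest_neighbor_predict(train_x[:i] + train_x[i + 1:],
                             train_y[:i] + train_y[i + 1:],
                             train_x[i])
    for i in range(len(train_x))
]

print("training-set accuracy:", accuracy(train_y, train_preds))
print("held-out accuracy:", accuracy(train_y, held_out_preds))
```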
Questions? Comments? Concerns?
How did you like RapidMiner?
Other RapidMiner questions?
What is the difference between a classifier and a regressor?
What are some things you might use a classifier for?
Bonus points for examples other than those in the BDE videos
Any questions about any algorithms?
Do folks feel like they understood logistic regression?
Any questions?
Anyone willing to come up and do a couple examples?
Logistic Regression
m = 0.5A - B + C
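For working through examples, the sketch below evaluates m = 0.5A - B + C and passes it through the standard logistic function p = 1 / (1 + e^(-m)). The values of A, B, and C are hypothetical, and the function names are assumptions made for this illustration.

```python
import math

def linear_term(a, b, c):
    """m = 0.5A - B + C, the linear part of the model on the slide."""
    return 0.5 * a - b + c

def logistic(m):
    """Standard logistic function, mapping any real m to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-m))

# Hypothetical (A, B, C) values, for illustration only.
examples = [(0, 0, 0), (2, 1, 0), (4, 0, 1), (0, 3, 0)]
for a, b, c in examples:
    m = linear_term(a, b, c)
    print(f"A={a} B={b} C={c}  m={m:+.1f}  p={logistic(m):.3f}")
```

When m = 0 the predicted probability is exactly 0.5; positive m pushes p toward 1 and negative m pushes it toward 0.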
Why would someone
Use a decision tree rather than, say, logistic regression?
Has anyone
Used any classification algorithms outside the set discussed/recommended in the videos? Say more?
Other questions, comments, concerns about lectures?
Did anyone read the Hand article?
Thoughts?
Did anyone read the Pardos article?
Thoughts?
Creative HW 1
Questions about Creative HW 1?
Questions? Concerns?
Other questions or comments?
Next Class
Tuesday, September 15
Behavior Detection
Baker, R.S. (2014) Big Data and Education. Ch. 1, V5. Ch. 3, V1, V2.
Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. (2008) Developing a Generalizable Detector of When Students Game the System. User Modeling and User-Adapted Interaction, 18(3), 287-314.
Sao Pedro, M.A., Baker, R.S.J.d., Gobert, J., Montalvo, O., Nakama, A. (2013) Leveraging Machine-Learned Detectors of Systematic Inquiry Behavior to Estimate and Predict Transfer of Inquiry Skill. User Modeling and User-Adapted Interaction, 23(1), 1-39.
Creative HW 1 due
The End