Assignment 6 Chi-Squared Details

by Edgardo Torres -

So I'm done with the assignment, but something doesn't feel right about the chi-squared section... Once "n" reaches a certain size (e.g. 400), all p-values are astronomically small, which would render "ProbTest" from part two of the assignment (which uses an n of 10000) useless. I may have misunderstood something.

Could someone clarify whether we are entering a single vector of 'realistic' frequencies generated by "sample.n(TRUE)" into the chi-squared test? Now that I'm re-reading the section, it seems to imply we are using a vector of 'realistic' frequencies as well as a vector of 'non-realistic' frequencies (e.g. sample.n(FALSE)), and comparing the two with the chi-squared test. Is this correct?

In reply to Edgardo Torres

Re: Assignment 6 Chi-Squared Details

by Sarah Shawky -
Hi Edgardo,

I understood the question the way you describe it in your second paragraph: using the chisq.test function to compare real data frequencies (TRUE) with "non-real" frequencies (FALSE). Can anyone else confirm?

Also, as a follow-up question: my chi-squared test outputs the test results and a p-value, but no matter which n I use, it always gives a warning that says "Chi-squared approximation may be incorrect". Is this because of the zero counts that we keep in the table? Or does it mean that I am making a mistake somewhere else?

Thank you
In reply to Sarah Shawky

Re: Assignment 6 Chi-Squared Details

by Erik Spence -
As I said in the last lecture, this is a single-sample test, not a two-sample test. So, no, you should only be using data generated using realistic frequencies. The default null hypothesis for this test is that all cases have equal probability, so there's no need to compare the real probabilities to artificially generated equal-probability data.
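To illustrate the single-sample form, here is a minimal sketch in R. The counts are made-up numbers, not output from the assignment's sampling function. It shows that passing one vector of counts to chisq.test tests it against equal probabilities, and that for fixed proportions the test statistic grows in proportion to n, which is why p-values shrink quickly as n increases.

```r
# Hypothetical observed counts for four categories (made-up numbers, n = 400).
observed <- c(A = 120, B = 95, C = 105, D = 80)

# With a single vector and no 'p' argument, chisq.test() tests the counts
# against equal probabilities for every category.
result <- chisq.test(observed)

# The same test with the null hypothesis spelled out explicitly.
k <- length(observed)
explicit <- chisq.test(observed, p = rep(1 / k, k))

# Same proportions, ten times as much data: the statistic is ten times larger,
# so the p-value is far smaller.
bigger <- chisq.test(observed * 10)

print(c(n400 = result$p.value, n4000 = bigger$p.value))
```

If the underlying frequencies really are unequal, a large enough n will almost always produce a tiny p-value; that is the test doing its job rather than a sign of a mistake.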

Yes, the chisq warning is due to the small amount of data being used.
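As a rough illustration, chisq.test issues this warning when any expected count falls below 5, so it depends on n divided by the number of categories rather than on zero counts as such. The sketch below uses made-up counts; the variable names are placeholders.

```r
# Hypothetical small sample: 6 observations across 4 categories, one zero count.
small <- c(3, 0, 2, 1)

# Capture the warning message instead of letting it print.
msg <- tryCatch(
  {
    chisq.test(small)
    NA_character_
  },
  warning = function(w) conditionMessage(w)
)
print(msg)

# Expected counts under equal probabilities are n / k for each category.
expected_small <- suppressWarnings(chisq.test(small))$expected
print(expected_small)  # 6 / 4 = 1.5 each, below 5, hence the warning

# A larger hypothetical sample over the same four categories.
large <- c(30, 18, 27, 25)
expected_large <- chisq.test(large)$expected  # 100 / 4 = 25 each, no warning
print(expected_large)
```

With many categories, n has to be correspondingly larger before every expected count reaches 5.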
In reply to Erik Spence

Re: Assignment 6 Chi-Squared Details

by Edgardo Torres -
So should the p-value at n=10000 always be < 0.05? I’m confused because then what’s the point of adding a statement for when p>0.05?
In reply to Edgardo Torres

Re: Assignment 6 Chi-Squared Details

by Erik Spence -
It might be. It might not be. The point is to automate the checking of the significance.
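One way to automate the check might look like the sketch below. The function name, the message wording, and the default threshold of 0.05 are assumptions for illustration, not the assignment's required interface, and the counts are made up.

```r
# Hypothetical helper: runs the single-sample test and reports which side of
# the threshold the p-value falls on. Returns TRUE (invisibly) if significant.
check_significance <- function(counts, alpha = 0.05) {
  p <- chisq.test(counts)$p.value
  if (p < alpha) {
    message("p = ", signif(p, 3), " < ", alpha,
            ": frequencies differ significantly from equal probabilities")
  } else {
    message("p = ", signif(p, 3), " >= ", alpha,
            ": no significant difference from equal probabilities")
  }
  invisible(p < alpha)
}

# Made-up counts that are clearly uneven, then counts that are nearly equal.
uneven <- check_significance(c(120, 95, 105, 80))
even <- check_significance(c(100, 102, 98, 100))
```

Both branches can be reached, which is why the code needs a statement for each case even if one of them turns out to be rare for a particular data set.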