Assignment 6 Chi-Squared Details

by Edgardo Torres,
Number of replies: 4

So I'm done with the assignment, but something doesn't feel right about the chi-squared section. Once "n" reaches a certain size (e.g. 400), all p-values are astronomically small, which would render "ProbTest" from part two of the assignment (which uses an n of 10000) useless. I may have misunderstood something.

Could someone clarify whether we are entering a single vector of 'realistic' frequencies generated by "sample.n (TRUE)" into the chi-squared test? Now that I'm re-reading the section, it seems to imply that we use a vector of 'realistic' frequencies as well as a vector of 'non-realistic' frequencies (e.g. sample.n (FALSE)), and compare the two with the chi-squared test. Is this correct?

In reply to Edgardo Torres

Re: Assignment 6 Chi-Squared Details

by Sarah Shawky,
Hi Edgardo,

I understood this question the way you describe it in your second paragraph: using the chisq.test function to compare real data frequencies (TRUE) with "non-real" frequencies (FALSE). Can anyone else confirm?

Also, as a follow-up question: my chi-squared test outputs the test results and a p-value, but no matter which n I use, it always gives a warning that says "Chi-squared approximation may be incorrect". Is this because of the zero counts that we keep in the table, or does it mean that I am making a mistake somewhere else?

Thank you
In reply to Sarah Shawky

Re: Assignment 6 Chi-Squared Details

by Erik Spence,
As I said in the last lecture, this is a single-sample test, not a two-sample test. So, no, you should only be using data generated using realistic frequencies. The default null hypothesis for this test is that all cases have equal probability, so there's no need to compare the real probabilities to artificially generated equal-probability data.
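A minimal sketch of the single-sample form described above. The categories, probabilities, and the use of sample() are hypothetical stand-ins for the assignment's realistic-frequency generator, which is not shown in this thread. With a single vector of counts and no other arguments, chisq.test tests against equal probabilities for all categories.

```r
set.seed(1)

# Hypothetical stand-in for the assignment's realistic frequencies.
categories <- c("a", "b", "c", "d")
probs <- c(0.4, 0.3, 0.2, 0.1)

# Draw n = 400 observations and count them per category.
draws <- sample(categories, size = 400, replace = TRUE, prob = probs)
counts <- as.vector(table(factor(draws, levels = categories)))

# One vector in, no second sample: goodness-of-fit test against
# the default null hypothesis of equal probabilities.
result <- chisq.test(counts)
print(result$p.value)
```

Because these hypothetical probabilities are far from uniform, the p-value is already very small at n = 400, which matches the behaviour described in the first post.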

Yes, the chisq warning is due to the low number of data being used.
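As a rough illustration, assuming made-up counts: chisq.test emits this warning when any expected count is below 5, so under the equal-probability null it depends on n divided by the number of categories, not on whether an observed count is zero.

```r
# Helper (hypothetical name) that reports whether chisq.test warns.
warns <- function(counts) {
  tryCatch({
    chisq.test(counts)
    FALSE
  }, warning = function(w) TRUE)
}

# n = 6 over 4 categories: expected count is 1.5 per category.
small_counts <- c(3, 1, 0, 2)
expected_small <- sum(small_counts) / length(small_counts)
warned_small <- warns(small_counts)

# n = 60 over 4 categories: expected count is 15 per category,
# even though one observed count is still zero.
large_counts <- c(30, 10, 0, 20)
warned_large <- warns(large_counts)

print(c(small = warned_small, large = warned_large))
```

If the warning persists at large n, it may be worth checking how many categories the table has and what the expected counts are (available as the expected element of the chisq.test result).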
In reply to Erik Spence

Re: Assignment 6 Chi-Squared Details

by Edgardo Torres,
So should the p-value at n=10000 always be < 0.05? I’m confused because then what’s the point of adding a statement for when p>0.05?
In reply to Edgardo Torres

Re: Assignment 6 Chi-Squared Details

by Erik Spence,
It might be. It might not be. The point is to automate the checking of the significance.
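One way such an automated check could look, as a sketch only. The function name check_significance and the 0.05 default are assumptions for illustration, not the assignment's "ProbTest". Both branches are needed because the outcome depends on the data passed in.

```r
# Hypothetical helper: runs the single-sample test and reports
# whether the result is significant at the given level.
check_significance <- function(counts, alpha = 0.05) {
  p <- chisq.test(counts)$p.value
  significant <- p < alpha
  if (significant) {
    cat("p =", format(p), "< alpha: reject equal probabilities\n")
  } else {
    cat("p =", format(p), ">= alpha: cannot reject equal probabilities\n")
  }
  invisible(list(p.value = p, significant = significant))
}

# Made-up counts at n = 10000: one skewed, one perfectly even.
skewed <- check_significance(c(4000, 3000, 2000, 1000))
even <- check_significance(c(2500, 2500, 2500, 2500))
```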