havelaar {languageR} | R Documentation
The frequency of the determiner 'het' in the Dutch novel 'Max Havelaar' by Multatuli (Eduard Douwes Dekker), in 99 consecutive text fragments of 1000 tokens each.
data(havelaar)
A data frame with 99 observations on the following 2 variables.
Chunk: the index of the text fragment.
Frequency: the frequency of the determiner 'het' in that text fragment.
The text of Max Havelaar was obtained from Project Gutenberg at http://www.gutenberg.org/wiki/Main_Page.
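As a rough illustration of how counts of this kind could be derived from a tokenized text, the sketch below splits a token vector into consecutive fragments of equal size and counts one target word per fragment. The helper `chunk_frequency` and its arguments are hypothetical and not part of languageR; how the original text was tokenized is not stated here, so the toy token vector is only an assumption for demonstration.

```r
# Hypothetical helper (not part of languageR): count a target word in
# consecutive, non-overlapping fragments of `size` tokens each.
chunk_frequency <- function(tokens, target = "het", size = 1000) {
  n_chunks <- length(tokens) %/% size          # drop any incomplete final fragment
  tokens <- tokens[seq_len(n_chunks * size)]
  chunk <- rep(seq_len(n_chunks), each = size) # fragment index for every token
  data.frame(
    Chunk = seq_len(n_chunks),
    Frequency = as.vector(tapply(tokens == target, chunk, sum))
  )
}

# Toy input (assumed for illustration): 20 tokens, fragments of 4 tokens
toy <- rep(c("het", "de", "een", "boek"), 5)
res <- chunk_frequency(toy, target = "het", size = 4)
print(res)
```

With `size = 1000` and the full token sequence of the novel, a function along these lines would yield a data frame with the same two columns as havelaar, although the exact counts depend on tokenization choices that are not documented here.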
## Not run: 
data(havelaar)

n = 1000                            # token size of text fragments
p = mean(havelaar$Frequency / n)    # mean relative frequency of 'het'

# quantile-quantile plot of observed counts against binomial quantiles
plot(qbinom(ppoints(99), n, p), sort(havelaar$Frequency),
  xlab = paste("quantiles of (", n, ",", round(p, 4), ")-binomial", sep=""),
  ylab = "frequencies")

# Kolmogorov-Smirnov tests against a Poisson distribution;
# jitter() breaks the ties among the integer counts
lambda = mean(havelaar$Frequency)
ks.test(havelaar$Frequency, "ppois", lambda)
ks.test(jitter(havelaar$Frequency), "ppois", lambda)
## End(Not run)