etymology {languageR} (R Documentation)
Estimated etymological age for regular and irregular monomorphemic Dutch verbs, together with other distributional predictors of regularity.
data(etymology)
A data frame with 285 observations on the following 14 variables.
Verb
WrittenFrequency
NcountStem
MeanBigramFrequency
InflectionalEntropy
Auxiliary: levels hebben, zijn and zijnheb, for the verb's auxiliary in the perfect tenses.
Regularity: levels irregular and regular.
LengthInLetters
Denominative: levels Den and N, specifying whether a verb is derived from a noun according to the CELEX lexical database.
FamilySize
EtymAge: levels Dutch, DutchGerman, WestGermanic, Germanic and IndoEuropean.
Valency
NVratio
WrittenSpokenRatio
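As a quick sanity check, the layout described above (285 observations, 14 variables) can be compared against the loaded data. The following is a minimal sketch: the helper name has_expected_layout is hypothetical and not part of languageR, and the data are only loaded if the languageR package happens to be installed.

```r
# Variable names as listed in this help page
expected_vars <- c("Verb", "WrittenFrequency", "NcountStem",
                   "MeanBigramFrequency", "InflectionalEntropy", "Auxiliary",
                   "Regularity", "LengthInLetters", "Denominative",
                   "FamilySize", "EtymAge", "Valency", "NVratio",
                   "WrittenSpokenRatio")

# Hypothetical helper: TRUE if a data frame has 285 rows and exactly
# the 14 variables named above (in any order)
has_expected_layout <- function(df) {
  is.data.frame(df) &&
    nrow(df) == 285 &&
    setequal(names(df), expected_vars)
}

# Only touch the real data if languageR is available
if (requireNamespace("languageR", quietly = TRUE)) {
  data(etymology, package = "languageR")
  print(has_expected_layout(etymology))
  str(etymology)
}
```

If the check prints FALSE, the installed version of the package may differ from the one documented here.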
Baayen, R. H. and Moscoso del Prado Martin, F. (2005) Semantic density and past-tense formation in three Germanic languages, Language, 81, 666-698.
Tabak, W., Schreuder, R. and Baayen, R. H. (2005) Lexical statistics and lexical processing: semantic density, information complexity, sex, and irregularity in Dutch, in Kepser, S. and Reis, M., Linguistic Evidence - Empirical, Theoretical, and Computational Perspectives, Berlin: Mouton de Gruyter, pp. 529-555.
## Not run:
data(etymology)

# ---- EtymAge should be an ordered factor, set contrasts accordingly

etymology$EtymAge = ordered(etymology$EtymAge, levels = c("Dutch",
  "DutchGerman", "WestGermanic", "Germanic", "IndoEuropean"))
options(contrasts = c("contr.treatment", "contr.treatment"))

# Note: the Design package has since been superseded by rms, which
# provides the same modelling functions; with rms, partial effects
# are plotted with plot(Predict(etymology.lrm)) instead of plot(etymology.lrm).
library(Design)
etymology.dd = datadist(etymology)
options(datadist = 'etymology.dd')

# ---- EtymAge as additional predictor for regularity

etymology.lrm = lrm(Regularity ~ WrittenFrequency +
  rcs(FamilySize, 3) + NcountStem + InflectionalEntropy +
  Auxiliary + Valency + NVratio + WrittenSpokenRatio + EtymAge,
  data = etymology, x = TRUE, y = TRUE)
anova(etymology.lrm)

# ---- EtymAge as dependent variable

etymology.lrm = lrm(EtymAge ~ WrittenFrequency + NcountStem +
  MeanBigramFrequency + InflectionalEntropy + Auxiliary +
  Regularity + LengthInLetters + Denominative + FamilySize + Valency +
  NVratio + WrittenSpokenRatio, data = etymology, x = TRUE, y = TRUE)

# ---- model simplification

etymology.lrm = lrm(EtymAge ~ NcountStem + Regularity + Denominative,
  data = etymology, x = TRUE, y = TRUE)
validate(etymology.lrm, bw = TRUE, B = 200)

# ---- plot partial effects and check assumptions of ordinal regression

par(mfrow = c(3, 3))
plot(etymology.lrm)
resid(etymology.lrm, 'score.binary', pl = TRUE)
plot.xmean.ordinaly(EtymAge ~ NcountStem, data = etymology)
par(mfrow = c(1, 1))

## End(Not run)
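The first step of the example, recoding EtymAge as an ordered factor and switching to treatment contrasts, can be tried out in base R without the data set. The sketch below uses a small made-up vector (age_raw is a hypothetical stand-in, not the real data) and shows how the contrasts option changes the coding that an ordered factor receives.

```r
# Toy stand-in for etymology$EtymAge (hypothetical values, not the real data)
age_raw <- c("Germanic", "Dutch", "IndoEuropean",
             "WestGermanic", "DutchGerman", "Dutch")

# Levels ordered from youngest to oldest etymological stratum,
# in the same order as in the example above
age_levels <- c("Dutch", "DutchGerman", "WestGermanic",
                "Germanic", "IndoEuropean")
age <- ordered(age_raw, levels = age_levels)

# R's default setting: the second entry applies to ordered factors,
# which therefore get polynomial contrasts (.L, .Q, ...)
old <- options(contrasts = c("contr.treatment", "contr.poly"))
poly_cols <- colnames(contrasts(age))

# Setting used in the example: treatment contrasts for ordered factors too,
# so each column compares one level with the reference level "Dutch"
options(contrasts = c("contr.treatment", "contr.treatment"))
treat_cols <- colnames(contrasts(age))

# Restore whatever was set before
options(old)

print(poly_cols)
print(treat_cols)

# Ordered factors support comparisons along the level order
print(age < "Germanic")
```

With treatment contrasts, model coefficients for EtymAge are differences from the reference level rather than polynomial trend components, which is usually easier to read in a table of coefficients.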