1.3 The data base and its usage here.

A casual thumbing-through of chapters in this grammar (except the early phonetics/phonology ones) will immediately reveal a large number of lists of numbers like "15.4.3, 17.5.2/3, TNT 4, MT 6." These numbers indicate textual references, and play a major part in the organisation of the present volume (as they did in the dictionary). The three-part sequences like "15.4.3" refer to NMET (i.e., my own texts volume) and indicate text number (15), segment number within that text (4), and line of Nunggubuyu text (disregarding English interlinear glosses) within that segment (3). "17.5.2/3" with slash indicates occurrence of the feature in question on both line 2 and line 3 of text segment 17.5, while "17.5.2-3" indicates that the feature spills over from line 2 into line 3. "TNT" (Tales of the Nunggubuyu Tribe) and "MT" (More Tales of the Nunggubuyu Tribe) refer to two volumes of mimeoed texts prepared by Rev. Earl Hughes, which are available at the library of the Australian Institute of Aboriginal Studies in Canberra (as well as at Numbulwar itself); citations with TNT or MT are followed by page number only.
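For readers who wish to process these citation lists mechanically, the conventions just described can be summarised in a short sketch. It is an illustration only and forms no part of the apparatus of this volume; the names `Citation`, `parse_citation`, and `parse_citation_list` are hypothetical, and the assumption that a hyphenated range covers every line between its endpoints is mine.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Citation:
    """One textual reference (hypothetical structure, for illustration)."""
    volume: str                     # "NMET", "TNT", or "MT"
    text: Optional[int] = None      # NMET text number
    segment: Optional[int] = None   # segment number within that text
    lines: Tuple[int, ...] = ()     # Nunggubuyu line(s) within the segment
    spills_over: bool = False       # True for "2-3": one occurrence across lines
    page: Optional[int] = None      # page number (TNT and MT only)


# "15.4.3", "17.5.2/3", "17.5.2-3"
_NMET = re.compile(r"(\d+)\.(\d+)\.(\d+)(?:([/-])(\d+))?")
# "TNT 4", "MT 6"
_PAGE = re.compile(r"(TNT|MT)\s+(\d+)")


def parse_citation(s: str) -> Citation:
    """Parse a single citation string into a Citation."""
    s = s.strip().rstrip(".,;")     # drop trailing sentence punctuation
    m = _NMET.fullmatch(s)
    if m:
        text, segment, first = int(m[1]), int(m[2]), int(m[3])
        sep, second = m[4], m[5]
        if sep is None:
            lines = (first,)
        elif sep == "/":
            # slash: separate occurrences on each of the two lines
            lines = (first, int(second))
        else:
            # hyphen: one occurrence running from the first line to the last
            # (assumption: all intervening lines are included)
            lines = tuple(range(first, int(second) + 1))
        return Citation("NMET", text, segment, lines, spills_over=(sep == "-"))
    m = _PAGE.fullmatch(s)
    if m:
        return Citation(m[1], page=int(m[2]))
    raise ValueError(f"unrecognised citation: {s!r}")


def parse_citation_list(s: str) -> List[Citation]:
    """Parse a comma-separated list such as '15.4.3, 17.5.2/3, TNT 4, MT 6.'"""
    return [parse_citation(part) for part in s.split(",") if part.strip()]


if __name__ == "__main__":
    for c in parse_citation_list("15.4.3, 17.5.2/3, TNT 4, MT 6."):
        print(c)
```

Keeping the slash and hyphen cases distinct (via `spills_over`) preserves the difference between two occurrences on adjacent lines and a single occurrence extending over both.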

These textual citations serve several purposes. When attached to a fully cited Nunggubuyu ex., they have basically documentary value--the reader is assured that the ex. is from a real text, and a reader wanting to know more or having doubts about the analysis can find it and analyse it. Often, however, I cite just one or two exx. of some pattern in this way, then append a list of other textual exx.--readers get an immediate idea of how common the pattern is (if the list is stated to be exhaustive at least for NMET), and a reader with a specialist's interest in a particular grammatical problem will be able to analyse a much larger number of real exx. than could possibly be presented in full in a one-volume grammar. I also occasionally use an intermediate format, with perhaps one ex. presented in full, and with an accompanying list of other textual exx. with a schematic "English" translation or summary added to the numerical citation in parentheses or brackets. Such formats give readers at least an idea of the kinds of contexts involved, again without taking up too much space. In this way, we take maximal advantage of the published texts (especially NMET), achieving a far higher level of documentation than is observable in other reference grammars, while still being able to devote most of the pp. in this volume to commentary and analysis.

The standards of accuracy and documentation which I have set for myself in preparing this volume have been high, though I may not have lived up to them uniformly. In essence, this is a corpus-based grammar, and my ideal has been to account for all or nearly all instances in the texts of each morpheme or other feature under consideration. Accordingly, I have totally revised earlier unpublished versions of this grammar, which were based to a large extent on directly elicited sentences and on my own "knowledge" of the language. In combing through the texts while preparing various sections of the final version, I have discovered that some of my rules were wrong, but above all that my rules were oversimplified, missing semantic and syntactic patternings which emerged from collating and organising large numbers of textual occurrences.

Particularly in the case of demonstratives (Chapter 7), I found it necessary in addition to generate statistics about correlations among roots and affixes. I therefore installed a concordance program in our computing centre at Harvard (the Oxford Concordance Program), typed in the more than 3000 tokens of demonstratives in NMET with a schematisation of contexts, and produced a working concordance and statistical word-list. This material was highly valuable in identifying categorical and statistical covariation patterns among the morphemes in the demonstrative system, and led to previously unsuspected conclusions about the usage of each morpheme. Some statistical material is also given in connection with nouns (Chapter 4), though in that case the problems were more manageable without computational resources.
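The kind of tabulation involved can be suggested by a minimal sketch. It does not reproduce the concordance program actually used, nor the coding scheme applied to the NMET tokens; the function names and the placeholder labels (`ROOT_A`, `AFFIX_1`, etc.) are hypothetical and stand for no actual Nunggubuyu morphemes. Each token is assumed to be coded as a root plus a list of affixes.

```python
from collections import Counter, defaultdict


def cooccurrence(tokens):
    """Count (root, affix) pairs over tokens coded as (root, [affixes])."""
    pairs = Counter()
    for root, affixes in tokens:
        for affix in affixes:
            pairs[(root, affix)] += 1
    return pairs


def categorical_affixes(pairs):
    """Return affixes attested with exactly one root, mapped to that root."""
    roots_by_affix = defaultdict(set)
    for root, affix in pairs:
        roots_by_affix[affix].add(root)
    return {
        affix: next(iter(roots))
        for affix, roots in roots_by_affix.items()
        if len(roots) == 1
    }


if __name__ == "__main__":
    # Placeholder tokens only; not drawn from any text.
    tokens = [
        ("ROOT_A", ["AFFIX_1"]),
        ("ROOT_A", ["AFFIX_1", "AFFIX_2"]),
        ("ROOT_B", ["AFFIX_2"]),
        ("ROOT_B", []),
    ]
    pairs = cooccurrence(tokens)
    for (root, affix), n in sorted(pairs.items()):
        print(root, affix, n)
    print(categorical_affixes(pairs))
```

An affix returned by `categorical_affixes` is a candidate for a categorical restriction, while the raw counts in `pairs` are the basis for the statistical (non-categorical) tendencies; with a small corpus, an apparent categorical pattern may of course be an accident of attestation.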

The extensive exposition of textual citations and statistics in many chapters of this volume may strike some readers as reflecting a personal fetish of mine. While this may be true, it is a fetish which I would defend. The format used permits those readers who just want the bottom line to get it (by skipping over lists of exx., statistics presented in tables, etc.), and indeed I assist them further by double-underlining important terms and conclusions. On the other hand, it gives a more patient (or more skeptical) reader a feeling for the raw data which underlie the analysis and an opportunity to "cross-examine" the author by going directly to the data. It also encourages readers with highly specialised interests, or with a different theory of language, to discover new patterns which I overlooked or did not have space to discuss.

My concern with documentation reflects my own sad experiences as a reader of other linguists' grammars, which have almost never provided me with the information I wanted to undertake my own (re-)analysis of the language in question. It also reflects my experience that most published grammars are based on material obtained in unreliable direct-elicitation (sentence-translation) sessions, and/or utterances which were produced by the linguist with or without "confirmation" from a native informant. I have no confidence whatever in such data, since my own early "data" of this type often turned out to be seriously wrong. Accordingly, to other linguists who express disapproval of my emphasis on documentation, I suggest that they try doing an analysis based on a comparable textual corpus and see if it doesn't add to their understanding of their favourite language.

In the present volume, I have been least concerned with textual documentation in the phonetics/phonology chapters, and in Chapter 9 (pronominal prefixes), since even a large corpus would not contain all of the pronominal prefixes, and certainly not all of the forms needed to specify details of phonological rules and their ordering relationships. I have also kept textual documentation low in some syntax sections such as §15.2 (status of "NP") and §15.4 (word order) because of the open-endedness of the problems raised (and the finiteness of this volume). Of course, elicited utterances have been used as exx. in various places where no suitable textual ex. was available or for other reasons, but this should be clear from the absence of numerical citations if not from the commentary.