Extract from some correspondence with H.T.
From harold at mdx ac uk Wed Sep 20 23:19:34 1995
To: lloyd at bruce cs monash edu au (Lloyd Allison)
From: harold at mdx ac uk (Prof. Harold Thimbleby x6061)
Subject: Re: Australian Senate Enquiry
>The "47% of the 11000 most repeated searches":
> 1. this could be just 1% of the total searches (I doubt it is),
> can you clarify?
It was 1%ish of the total. I had about 10^6 searches, so I didn't classify
it all. Somebody with a thesaurus [ ... ] could
help here; or maybe we could develop a methodology to automatically follow
the references and see what proportion refer transitively to stuff that can
be classified.... sounds dubious.
[ ... ]
> What was the total # of searches during the collection?
> Is it 11000 words or 11000 phrases/searches?
There are very few phrases, although the data has them (see below). That is,
few people look for ands/ors etc.
> 2. what is the list of pornographic words ?
>
The list of searches starts off:
1317 sex
446 erotic
424 nude
369 erotica
272 penthouse
244 playboy
237 porn
224 pornography
223 porno
205 isindex=
187 adult
150 mpeg
118 ebola [ <-- There was an Ebola virus outbreak in Africa mid 1995]
86 girls
85 hustler
84 news
83 games
81 music
77 bondage
76 robots
76 netscape
71 supermodels
71 gif
71 gay
64 pictures
63 weather
62 doom
54 x-rated
54 alt.sex
52 xxx
52 supermodel
51 SEX
50 nudity
50 nudes
49 genealogy
49 Sex
The numbers indicate the exact ASCII-match frequencies (i.e., sex and Sex
are different) of expression searches (some are in the form
'pamela+anderson' for example). I am going to organise the data more
helpfully and put it on the Web, probably next week.
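[ Editorial sketch, not part of the original message: the exact-match counting described above could be reproduced roughly as follows. The sample log and the function names exact_counts and folded_counts are hypothetical; the original data and tooling are not shown in the correspondence. The second function shows what changes if case variants are merged instead of being counted separately. ]

```python
from collections import Counter


def exact_counts(searches):
    """Count search expressions by exact string match ('sex' != 'Sex')."""
    return Counter(searches)


def folded_counts(searches):
    """Count search expressions after lower-casing, merging case variants."""
    return Counter(s.lower() for s in searches)


# Hypothetical sample log: one search expression per entry.
sample = ["sex", "Sex", "SEX", "sex", "pamela+anderson", "weather"]

exact = exact_counts(sample)
folded = folded_counts(sample)

# Print in the same "count term" layout as the list in the message.
for term, n in exact.most_common():
    print(f"{n:5d} {term}")
```

[ Applied to the entries listed above, folding case would combine sex (1317), SEX (51) and Sex (49) into a single count of 1417 for those three entries alone. ]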
[ ... ]
-------------------------------------------------------------------------
Prof Harold Thimbleby Computing Science
+44 (0)181 362 6061 direct Middlesex University
FAX/ansaphone 0181 362 6411 Bounds Green Road
University 0181 362 5000 LONDON, N11 2NQ
harold at mdx ac uk
WWW URL: http://www.cs.mdx.ac.uk/harold