qminer
Version:
A C++ based data analytics platform for processing large-scale real-time streams containing structured and unstructured data
1,041 lines (990 loc) • 8.49 kB
Plain Text
| From svn.tartarus.org/snowball/trunk/website/algorithms/english/stop.txt
| This file is distributed under the BSD License.
| See http://snowball.tartarus.org/license.php
| Also see http://www.opensource.org/licenses/bsd-license.html
| - Encoding was converted to UTF-8.
| - This notice was added.
| - Added 1 letter tokens
| - Uncommented common words at the end of the original snowball stopwords
| - Added En523 stopwords not present in the original snowball stopwords
| - Added URL parts
| An English stop word list. Comments begin with vertical bar. Each stop
| word is at the start of a line.
| All tokens consisting of 1 letter from the alphabet
a
b
c
d
e
f
g
h
i
j
k
l
m
n
o
p
q
r
s
t
u
v
w
x
y
z
| Many of the forms below are quite rare (e.g. "yourselves") but included for
| completeness.
| PRONOUNS FORMS
| 1st person sing
i | subject, always in upper case of course
me | object
my | possessive adjective
| the possessive pronoun `mine' is best suppressed, because of the
| sense of coal-mine etc.
myself | reflexive
| 1st person plural
we | subject
| us | object
| care is required here because US = United States. It is usually
| safe to remove it if it is in lower case.
our | possessive adjective
ours | possessive pronoun
ourselves | reflexive
| second person (archaic `thou' forms not included)
you | subject and object
your | possessive adjective
yours | possessive pronoun
yourself | reflexive (singular)
yourselves | reflexive (plural)
| third person singular
he | subject
him | object
his | possessive adjective and pronoun
himself | reflexive
she | subject
her | object and possessive adjective
hers | possessive pronoun
herself | reflexive
it | subject and object
its | possessive adjective
itself | reflexive
| third person plural
they | subject
them | object
their | possessive adjective
theirs | possessive pronoun
themselves | reflexive
| other forms (demonstratives, interrogatives)
what
which
who
whom
this
that
these
those
| VERB FORMS (using F.R. Palmer's nomenclature)
| BE
am | 1st person, present
is | -s form (3rd person, present)
are | present
was | 1st person, past
were | past
be | infinitive
been | past participle
being | -ing form
| HAVE
have | simple
has | -s form
had | past
having | -ing form
| DO
do | simple
does | -s form
did | past
doing | -ing form
| The forms below are, I believe, best omitted, because of the significant
| homonym forms:
| He made a WILL
| old tin CAN
| merry month of MAY
| a smell of MUST
| fight the good fight with all thy MIGHT
| would, could, should, ought might however be included
| | AUXILIARIES
| | WILL
|will
would
| | SHALL
|shall
should
| | CAN
|can
could
| | MAY
|may
|might
| | MUST
|must
| | OUGHT
ought
| COMPOUND FORMS, increasingly encountered nowadays in 'formal' writing
| pronoun + verb
i'm
you're
he's
she's
it's
we're
they're
i've
you've
we've
they've
i'd
you'd
he'd
she'd
we'd
they'd
i'll
you'll
he'll
she'll
we'll
they'll
| verb + negation
isn't
aren't
wasn't
weren't
hasn't
haven't
hadn't
doesn't
don't
didn't
| auxiliary + negation
won't
wouldn't
shan't
shouldn't
can't
cannot
couldn't
mustn't
| miscellaneous forms
let's
that's
who's
what's
here's
there's
when's
where's
why's
how's
| rarer forms
| daren't needn't
| doubtful forms
| oughtn't mightn't
| ARTICLES
a
an
the
| THE REST (Overlap among prepositions, conjunctions, adverbs etc is so
| high, that classification is pointless.)
and
but
if
or
because
as
until
while
of
at
by
for
with
about
against
between
into
through
during
before
after
above
below
to
from
up
down
in
out
on
off
over
under
again
further
then
once
here
there
when
where
why
how
all
any
both
each
few
more
most
other
some
such
as
no
nor
not
only
own
same
so
than
too
very
| the following words are among the commonest in English
one
every
least
less
many
now
ever
never
say
says
said
also
get
go
goes
just
made
make
put
see
seen
whether
like
well
back
even
still
way
take
since
nother
however
two
three
four
five
first
second
new
old
high
long
| extra words from En523
secondly
consider
whoever
edu
causes
seemed
whose
certainly
th
sorry
sent
far
cause
hereafter
try
likely
appear
brief
sup
respectively
let
others
alone
along
allows
howbeit
usually
que
changes
thats
hither
via
useful
merely
viz
everybody
use
contains
next
therefore
taken
thru
tell
knows
becomes
hereby
everywhere
particular
known
must
none
oh
anywhere
nine
can
following
example
indicated
indicates
something
want
needs
rather
six
instead
okay
tried
may
different
tries
third
whenever
maybe
appreciate
specifying
allow
keeps
looking
help
indeed
mainly
soon
course
looks
thank
thence
selves
inward
actually
better
willing
thanx
might
non
someone
somebody
thereby
several
name
always
reasonably
whither
went
mean
everyone
eg
ex
et
beyond
really
furthermore
rd
re
seriously
got
forth
thereupon
given
quite
whereupon
besides
ask
anyhow
hereupon
keep
ltd
hence
onto
think
already
seeming
thereafter
awfully
done
another
little
accordingly
anyone
indicate
gives
mostly
exactly
took
immediate
regards
somewhat
believe
specify
unfortunately
gotten
zero
toward
beforehand
unlikely
need
seem
saw
clearly
relatively
thoroughly
self
able
aside
thorough
towards
unless
though
eight
nothing
sub
don
especially
noone
sometimes
definitely
normally
came
saying
particularly
anyway
fifth
outside
going
meanwhile
overall
truly
ones
nearly
despite
regarding
qv
twice
contain
thanks
ignored
namely
anyways
best
wonder
away
currently
please
behind
various
hopefully
probably
neither
across
available
come
last
whereafter
according
somewhere
became
whole
comes
otherwise
among
presumably
co
afterwards
whatever
moreover
throughout
considering
sensible
described
much
hardly
wants
corresponding
latterly
concerning
else
former
novel
look
value
will
near
theres
seven
ve
almost
wherever
thus
herein
cant
vs
ie
containing
etc
perhaps
insofar
nobody
wherein
beside
gets
used
upon
uses
kept
whereby
nevertheless
com
anybody
obviously
without
latter
lest
downwards
liked
greetings
followed
yes
yet
unto
seems
except
around
possible
know
using
apart
necessary
follows
either
become
therein
right
often
somehow
sure
specified
happens
shall
per
everything
asking
provides
tends
nowhere
although
entirely
ok
anything
getting
whence
plus
consequently
seeing
formerly
within
appropriate
inasmuch
inner
elsewhere
enough
becoming
amongst
hi
trying
wish
us
placed
un
gone
later
associated
certain
doesn
sometime
inc
uucp
whereas
nd
lately
regardless
welcome
together
serious
hello
| URL parts
http
https
www
t.co | twitter url shortener
| ccTLDs
af
al
dz
ad
ao
ag
ar
am
au
at
az
bs
bh
bd
bb
by
be
bz
bj
bt
bo
ba
bw
br
bn
bg
bf
bi
kh
cm
ca
cv
cf
td
cl
cn
co
km
cd
cg
cr
ci
hr
cu
cy
cz
dk
dj
dm
do
ec
eg
sv
gq
er
ee
et
fj
fi
fr
ga
gm
ge
de
gh
gr
gd
gt
gn
gw
gy
ht
hn
hu
is
in
id
ir
iq
ie
il
it
jm
jp
jo
kz
ke
ki
kp
kr
kw
kg
la
lv
lb
ls
lr
ly
li
lt
lu
mk
mg
mw
my
mv
ml
mt
mh
mr
mu
mx
fm
md
mc
mn
me
yu
ma
mz
mm
na
nr
np
nl
nz
ni
ne
ng
no
om
pk
pw
pa
pg
py
pe
ph
pl
pt
qa
ro
ru
su
rw
kn
lc
vc
ws
sm
st
sa
sn
rs
yu
sc
sl
sg
sk
si
sb
so
za
es
lk
sd
sr
sz
se
ch
sy
tj
tz
th
tl
tg
to
tt
tn
tr
tm
tv
ug
ua
ae
uk
us
uy
uz
vu
va
ve
vn
ye
zm
zw
ge
tw
az
nc.tr
md
so
ge
au
cx
cc
au
hm
nf
nc
pf
yt
gp
pm
wf
tf
pf
bv
ck
nu
tk
gg
im
je
ai
bm
io
| BCP47 languages (two-letter)
aa
ab
ae
af
ak
am
an
ar
as
av
ay
az
ba
be
bg
bh
bi
bm
bn
bo
br
bs
ca
ce
ch
co
cr
cs
cu
cv
cy
da
de
dv
dz
ee
el
en
eo
es
et
eu
fa
ff
fi
fj
fo
fr
fy
ga
gd
gl
gn
gu
gv
ha
he
hi
ho
hr
ht
hu
hy
hz
ia
id
ie
ig
ii
ik
in
io
is
it
iu
iw
ja
ji
jv
jw
ka
kg
ki
kj
kk
kl
km
kn
ko
kr
ks
ku
kv
kw
ky
la
lb
lg
li
ln
lo
lt
lu
lv