Dubsar

The Dubsar Dictionary Project is a suite of English dictionary applications using WordNet® 3.1. The website is built using Ruby on Rails®. The original website was launched as a personal project in late 2010. Since then, it has sprouted a mobile website and native applications for Android and iOS. The current website is the sixth interface built as part of the Dubsar Dictionary Project. The new website is much more mobile-friendly and supersedes the star-crossed m.dubsar-dictionary.com, RIP.

WordNet®

The WordNet® 3.1 data set includes only adjectives, adverbs, nouns and verbs. It does not include conjunctions, interjections, prepositions or pronouns. It includes many proper nouns and technical terms.

The WordNet® content governs the Dubsar feature set by limiting what can be shown. A user might expect a dictionary to include etymology and pronunciation information, for example, and perhaps illustrations for some entries. And of course, she'll expect to find common auxiliary verbs like should, as well as every part of speech.

By contrast, WordNet® is aimed at automated text processing and identification of lexical and semantic textual information via salient terms (represented by the four major parts of speech WordNet® comprises). As a result, the content makes WordNet® (and Dubsar) at times more closely resemble a thesaurus than a dictionary.

Inflections

Dubsar provides one important class of information not present in WordNet®. While WordNet® does provide exception lists of inflected forms for irregular words, it offers little help with regular inflections.

The original inflections table in the Dubsar database was constructed using a lengthy, yet incomplete, set of regular expressions and exceptions. Noun plurals were originally generated using ActiveSupport::Inflector from Ruby on Rails®. This produced plurals like blice for blouse. Many erroneous verb forms persisted for some time. A manual editorial process eventually weeded out the remaining problems. There are certainly cases where the plural of, say, a pharmaceutical trade name is dubious. But the current inflections table is free from egregious errors.
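Errors like blice typically arise when a suffix rule matches the end of a word without checking the whole word. The Ruby sketch below is a hypothetical, simplified pluralizer: the rule lists and the `pluralize` helper are illustrative assumptions, not ActiveSupport's actual rule set and not Dubsar's code. An unanchored mouse/louse rule turns blouse into blice, while anchoring that rule to the whole word lets blouse fall through to the default -s rule.

```ruby
# Hypothetical, simplified rule-based pluralizer (NOT ActiveSupport's rules).
# Rules are tried in order; the first pattern that matches wins.
NAIVE_RULES = [
  [/(m|l)ouse\z/i, '\1ice'],   # mouse -> mice, louse -> lice (unanchored!)
  [/(x|ch|ss|sh)\z/i, '\1es'], # box -> boxes
  [/\z/, 's']                  # default: append "s"
].freeze

# Same rules, but the mouse/louse rule must match the entire word.
ANCHORED_RULES = [
  [/\A(m|l)ouse\z/i, '\1ice'],
  [/(x|ch|ss|sh)\z/i, '\1es'],
  [/\z/, 's']
].freeze

def pluralize(word, rules)
  rules.each do |pattern, replacement|
    return word.sub(pattern, replacement) if word.match?(pattern)
  end
  word
end

puts pluralize('mouse', NAIVE_RULES)     # prints: mice
puts pluralize('blouse', NAIVE_RULES)    # prints: blice ("louse" matched inside "blouse")
puts pluralize('blouse', ANCHORED_RULES) # prints: blouses
```

Anchoring fixes this one case, but each rule still needs its own exceptions, which is why a manual editorial pass remains necessary.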

Apart from the irregular inflections provided by WordNet®, inflected forms are listed only for nouns and verbs, and only for entries containing no spaces, capital letters, digits or punctuation. Inflected forms include any form of a word that would not usually be listed under its own headword. For nouns, plurals are listed. For verbs, the third-person singular present is listed, along with the present and past participles.
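As a rough illustration of these criteria, the following Ruby sketch checks whether an entry qualifies for generated inflections and builds naive regular verb forms. The helper names `inflectable?` and `regular_verb_forms`, and the spelling rules in them, are hypothetical; they are not Dubsar's actual rules, which needed a far larger rule set plus manual editing.

```ruby
# Hypothetical eligibility check: only nouns and verbs, and only lemmas made
# entirely of lowercase letters (no spaces, capitals, digits or punctuation).
def inflectable?(lemma, part_of_speech)
  %w[noun verb].include?(part_of_speech) && lemma.match?(/\A[a-z]+\z/)
end

# Naive regular verb forms. Real rules need many more cases
# (consonant doubling, -y -> -ies, -ie -> -ying, and so on).
def regular_verb_forms(verb)
  third = verb.match?(/(s|x|z|ch|sh)\z/) ? "#{verb}es" : "#{verb}s"
  # Drop a silent final "e" before -ing (bake -> baking), but keep "ee" (agree -> agreeing).
  stem = verb.end_with?('e') && !verb.end_with?('ee') ? verb.chomp('e') : verb
  {
    third_person_singular: third,
    present_participle: "#{stem}ing",
    past_participle: verb.end_with?('e') ? "#{verb}d" : "#{verb}ed"
  }
end

puts inflectable?('walk', 'verb')                 # prints: true
puts inflectable?('New York', 'noun')             # prints: false
puts regular_verb_forms('bake').values.join(', ') # prints: bakes, baking, baked
```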

The database currently contains 222,704 inflection entries for 156,762 words. Note that each word is also listed, uninflected, in the inflections table. For example, the verb be has eight entries, including be itself:

am
are
be
been
being
is
was
were

When you search for a word, Dubsar actually consults the inflections table rather than the words table. So you can find the verb be by searching for was, and if you search for thought, the first thing you'll see is the verb think.
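The lookup can be sketched in Ruby with a hypothetical in-memory model of the inflections table. The `Row` struct, the sample rows and the `search` helper are illustrative assumptions, not Dubsar's schema or query; results here simply follow row order, whereas the site's actual ranking is not described.

```ruby
# Hypothetical model of the inflections table: each row maps a form
# (inflected or not) to a word entry (lemma + part of speech).
Row = Struct.new(:form, :lemma, :part_of_speech)

INFLECTIONS = (
  # The uninflected form is itself a row, so "be" and "was" resolve the same way.
  %w[am are be been being is was were].map { |f| Row.new(f, 'be', 'verb') } +
  %w[think thinks thinking thought].map { |f| Row.new(f, 'think', 'verb') } +
  [Row.new('thought', 'thought', 'noun'), Row.new('thoughts', 'thought', 'noun')]
).freeze

# Search matches the term against the inflections rows, not the word list,
# and returns every distinct word entry that has that form.
def search(term)
  INFLECTIONS.select { |row| row.form == term.downcase }
             .map { |row| [row.lemma, row.part_of_speech] }
             .uniq
end

p search('was')     # prints: [["be", "verb"]]
p search('thought') # prints: [["think", "verb"], ["thought", "noun"]]
```

Because each uninflected form is stored as a row too, a single lookup path covers both be and was, with no separate query against the words table.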