About StringNet 3.0 and StringNet Navigator

What is StringNet?

StringNet is an English lexico-grammatical knowledgebase consisting of multiword patterns of word behavior. These patterns are represented by what we call hybrid n-grams and their relations to each other. Currently, StringNet contains about two billion hybrid n-grams extracted from the British National Corpus (BNC), each linked to all of its tokens attested in the BNC. The design and motivation of an earlier version of StringNet are described in Wible and Tsao (2010).

What are Hybrid n-grams?

The multiword patterns that we call hybrid n-grams are sequences of grams, each of which may be (1) a specific word form (e.g., ‘trying’ but not ‘tried’, ‘tries’, or ‘try’); (2) a lexeme (e.g., try, including its various forms: trying, tried, etc.); or (3) a part of speech (POS), marked off in brackets. A POS gram may be specific, such as [v-ing], or more general, such as [verb], which covers several more specific POSs, including [v-ing]. An example hybrid n-gram is:


there be no point in [v-ing]

Important: Click anywhere on the hybrid n-gram above and wait a moment. A pop-up shows all the forms attested in that slot.
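The three kinds of gram described above can be sketched as a small data structure. This is only an illustration, not StringNet's actual implementation: the `Gram` class, the `matches` function, the POS tag names, and the `GENERAL_POS` table are all assumptions made for this example.

```python
from dataclasses import dataclass

# Hypothetical mapping from specific POS tags to the general POS covering them.
GENERAL_POS = {"v-ing": "verb", "v-ed": "verb", "v-base": "verb"}


@dataclass(frozen=True)
class Gram:
    kind: str   # "form" (word form), "lexeme", or "pos"
    value: str

    def matches(self, form, lexeme, pos):
        """Check one annotated token (word form, lexeme, specific POS)."""
        if self.kind == "form":
            return form == self.value
        if self.kind == "lexeme":
            return lexeme == self.value
        # A POS gram matches the exact tag or the general tag that covers it.
        return self.value == pos or self.value == GENERAL_POS.get(pos)


def matches(pattern, tokens):
    """True if every gram in the pattern matches the token in the same position."""
    return len(pattern) == len(tokens) and all(
        gram.matches(*token) for gram, token in zip(pattern, tokens)
    )


# "there be no point in [v-ing]": 'be' is a lexeme, '[v-ing]' is a POS slot.
pattern = [
    Gram("form", "there"), Gram("lexeme", "be"), Gram("form", "no"),
    Gram("form", "point"), Gram("form", "in"), Gram("pos", "v-ing"),
]

# Toy annotated tokens for "there is no point in arguing" (tags are made up).
tokens_ok = [
    ("there", "there", "ex"), ("is", "be", "v-pres"), ("no", "no", "det"),
    ("point", "point", "noun"), ("in", "in", "prep"), ("arguing", "argue", "v-ing"),
]
# Same sentence with a base-form verb in the last slot, which should not match.
tokens_bad = tokens_ok[:-1] + [("argue", "argue", "v-base")]

print(matches(pattern, tokens_ok))
print(matches(pattern, tokens_bad))
```

In this sketch the lexeme gram `be` accepts the form ‘is’, while the `[v-ing]` slot rejects the base form ‘argue’. A general gram such as `Gram("pos", "verb")` would accept both.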

What is StringNet Navigator?

It is the user interface for querying and navigating StringNet (http://nav.stringnet.org). It takes a query of one or more words submitted to its query box and returns a list of patterns in which the query word is conventionally used (or, for multi-word queries, patterns in which the query words conventionally co-occur). For example, a query of ‘take’ yields ‘take place [prep]’, ‘take part in’, ‘take advantage of’, and many others. Each hybrid n-gram listed in the search results is accompanied by a variety of related links and information, and that is what makes StringNet a net.
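The query behavior described above can be approximated with a toy inverted index from words to patterns. This is a sketch under stated assumptions, not the Navigator's actual code: the `PATTERNS` list, `build_index`, and `query` are hypothetical names, and the sketch ignores both ranking by conventionality and matching of inflected forms.

```python
from collections import defaultdict

# A handful of patterns taken from the examples on this page.
PATTERNS = [
    "take place [prep]",
    "take part in",
    "take advantage of",
    "step by step",
    "take the unprecedented step of [v-ing]",
]


def build_index(patterns):
    """Map each lexical gram to the set of patterns that contain it."""
    index = defaultdict(set)
    for pattern in patterns:
        for gram in pattern.split():
            if not gram.startswith("["):  # skip POS slots such as [prep]
                index[gram].add(pattern)
    return index


def query(index, words):
    """Return the patterns containing every query word, sorted alphabetically."""
    sets = [index.get(word, set()) for word in words.split()]
    return sorted(set.intersection(*sets)) if sets else []


index = build_index(PATTERNS)
print(query(index, "take"))
print(query(index, "take step"))
```

A single-word query returns every pattern containing that word, and a multi-word query returns only the patterns in which all the query words co-occur.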

The Navigable Links among Patterns that Make StringNet a Net (New)

The two figures below illustrate the links available between and among patterns that show up in the search results.






Each of the two billion patterns in StringNet is indexed (linked) to other related patterns by four basic types of relations.

1. Parents: more abstract versions of itself
2. Children: more specific versions of itself
3. Expand: to longer versions of itself
4. Contract: to shorter versions of itself

For example, for a query of the word ‘step’, the first pattern listed in the results is “step by step” and the second is “take the unprecedented step of [v-ing]”.

Here are examples of patterns that are related in the four ways to the hybrid n-gram “take the unprecedented step of [v-ing]”:
    Some parents of it: “[verb] the unprecedented step of [v-ing]” and “take the [adj] step of [v-ing]”
    A child of it: “took the unprecedented step of [v-ing]”
    Contracted (shorter version): “take the step of [v-ing]”
    Expanded (longer version): “[noun] take the unprecedented step of [v-ing]”
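The four relations can be checked mechanically on the examples above. The following is a minimal sketch rather than StringNet's actual indexing: the `MORE_GENERAL` table is a hypothetical fragment of an abstraction hierarchy, the function names are made up, and the sketch assumes that related patterns differ by exactly one gram, which is a simplification.

```python
# Hypothetical one-step abstractions for a few grams (not StringNet's real hierarchy).
MORE_GENERAL = {
    "took": "take",            # word form -> lexeme
    "take": "[verb]",          # lexeme -> general POS
    "unprecedented": "[adj]",  # word -> POS
}


def is_parent(candidate, pattern):
    """True if candidate is pattern with exactly one gram made more general."""
    a, b = candidate.split(), pattern.split()
    if len(a) != len(b):
        return False
    diffs = [(x, y) for x, y in zip(a, b) if x != y]
    return len(diffs) == 1 and MORE_GENERAL.get(diffs[0][1]) == diffs[0][0]


def is_contraction(candidate, pattern):
    """True if candidate is pattern with exactly one gram deleted."""
    a, b = candidate.split(), pattern.split()
    return any(b[:i] + b[i + 1:] == a for i in range(len(b)))


def relation(candidate, pattern):
    """Name the relation that candidate bears to pattern, or None."""
    if is_parent(candidate, pattern):
        return "parent"
    if is_parent(pattern, candidate):
        return "child"
    if is_contraction(candidate, pattern):
        return "contract"
    if is_contraction(pattern, candidate):
        return "expand"
    return None


base = "take the unprecedented step of [v-ing]"
for other in [
    "[verb] the unprecedented step of [v-ing]",
    "take the [adj] step of [v-ing]",
    "took the unprecedented step of [v-ing]",
    "take the step of [v-ing]",
    "[noun] take the unprecedented step of [v-ing]",
]:
    print(relation(other, base), "|", other)
```

Parent and child are inverses of each other, as are expand and contract, so the sketch needs only two checks and applies each in both directions.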


Who are we?

David Wible is Distinguished Professor of Learning and Instruction at National Central University (NCU) in Taiwan and Dean of the College of Liberal Arts.
Email: wible@stringnet.org

Nai-Lung Tsao is a research assistant at the Graduate Institute of Learning and Instruction at National Central University in Taiwan.
Email: beaktsao@stringnet.org

We have developed StringNet as one of a suite of forthcoming tools for various aspects of second language vocabulary learning, teaching, and materials development. Our ongoing research and development has been supported by grants from Taiwan's National Science Council.


References

  1. David Wible and Nai-Lung Tsao. "StringNet as a Computational Resource for Discovering and Investigating Linguistic Constructions." The NAACL HLT Workshop on Extracting and Using Constructions in Computational Linguistics, Los Angeles, June 1-June 6, 2010.
  2. Nai-Lung Tsao and David Wible. "A Method for Unsupervised Broad-Coverage Lexical Error Detection and Correction." The NAACL HLT Workshop on Innovative Use of NLP for Building Educational Applications, Boulder, Colorado, May 31-June 5, 2009.


A Bit of History: All Versions of StringNet
  1. LexChecker (StringNet 1.0): November 1, 2009
  2. StringNet 2.0: February 1, 2011
  3. StringNet 3.0: May 16, 2012
