  <body>
    Minorthird is a collection of methods for learning to extract
    entities and categorize text.

    <p>Some basic concepts: in Minorthird, a collection of documents
    is stored in a {@link edu.cmu.minorthird.text.TextBase}.
    Annotations about these documents are stored in a corresponding
    {@link edu.cmu.minorthird.text.TextLabels} object.  Each
    annotation asserts a category or property for a word, a document,
    or a subsequence of words (aka a {@link
    edu.cmu.minorthird.text.Span}).  TextLabels store information
    from many sources: they might hold annotations produced by human
    labelers (perhaps using a GUI tool like the {@link
    edu.cmu.minorthird.text.gui.TextBaseEditor}), annotations
    produced by a hand-written program, or annotations produced by a
    learned program.  Multiple TextLabels can annotate a single
    TextBase, if necessary.
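
    <p>The relationship between documents, spans, and label sets can be
    sketched as follows.  This is a minimal illustration only: the
    classes <code>SketchTextBase</code>, <code>SketchTextLabels</code>,
    and <code>SketchSpan</code>, and all of their methods, are
    hypothetical stand-ins invented for this example and are not the
    Minorthird API.  The sketch assumes whitespace tokenization and
    shows two independent label sets annotating one text base.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the Minorthird classes.
public class AnnotationModelSketch {

    /** A contiguous run of tokens [start, start + length) in one document. */
    static final class SketchSpan {
        final String docId;
        final int start;
        final int length;

        SketchSpan(String docId, int start, int length) {
            this.docId = docId;
            this.start = start;
            this.length = length;
        }
    }

    /** Holds tokenized documents, keyed by document id. */
    static final class SketchTextBase {
        private final Map<String, String[]> docs = new LinkedHashMap<>();

        void loadDocument(String docId, String text) {
            // Assumption: plain whitespace tokenization.
            docs.put(docId, text.trim().split("\\s+"));
        }

        /** Returns the tokens covered by a span, joined by single spaces. */
        String text(SketchSpan span) {
            String[] tokens = docs.get(span.docId);
            String[] covered =
                Arrays.copyOfRange(tokens, span.start, span.start + span.length);
            return String.join(" ", covered);
        }
    }

    /** Assertions about spans of a text base: type name mapped to spans. */
    static final class SketchTextLabels {
        private final Map<String, List<SketchSpan>> spansByType = new LinkedHashMap<>();

        void addToType(SketchSpan span, String type) {
            spansByType.computeIfAbsent(type, k -> new ArrayList<>()).add(span);
        }

        List<SketchSpan> spansOfType(String type) {
            return spansByType.getOrDefault(type, Collections.<SketchSpan>emptyList());
        }
    }

    public static void main(String[] args) {
        SketchTextBase base = new SketchTextBase();
        base.loadDocument("doc1", "William Cohen works at CMU");

        // Two label sets over the same text base, e.g. one from a human
        // labeler and one from a program.
        SketchTextLabels humanLabels = new SketchTextLabels();
        humanLabels.addToType(new SketchSpan("doc1", 0, 2), "person");

        SketchTextLabels programLabels = new SketchTextLabels();
        programLabels.addToType(new SketchSpan("doc1", 4, 1), "organization");

        for (SketchSpan s : humanLabels.spansOfType("person")) {
            System.out.println("person: " + base.text(s));
        }
        for (SketchSpan s : programLabels.spansOfType("organization")) {
            System.out.println("organization: " + base.text(s));
        }
    }
}
```

    <p>Keeping the labels outside the text base is what lets several
    label sets coexist over the same documents without modifying them.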

    <p>More about text manipulation and processing can
    be found in the Javadocs for the minorthird.text and
    minorthird.text.mixup packages.

    <p>Annotated TextBases can be stored in many ways, so a
    "repository" can be configured to hold a set of TextLabels and
    their associated TextBases.  TextLabels in the repository are
    loaded with the {@link edu.cmu.minorthird.text.FancyLoader}.
    TextLabels and TextBases can also be loaded directly with 
    the {@link edu.cmu.minorthird.text.TextBaseLoader} and the 
    {@link edu.cmu.minorthird.text.gui.TextBaseEditor}.
          
    <p>Moderately complex annotation programs can be implemented with
    {@link edu.cmu.minorthird.text.mixup.Mixup}, a special-purpose
    annotation language which is part of Minorthird.  Mixup can also
    be used to generate features for learning algorithms.  A sequence
    of Mixup commands can be combined in a {@link
    edu.cmu.minorthird.text.mixup.MixupProgram}. The {@link
    edu.cmu.minorthird.text.gui.MixupDebugger} is a GUI tool for
    testing a MixupProgram.

    <p>Minorthird contains a number of methods for learning to extract
    Spans from a document, or learning to classify Spans.  Top-level
    programs for conducting learning experiments and training, testing
    and applying {@link edu.cmu.minorthird.text.Annotator}s can be found in
    the {@link edu.cmu.minorthird.ui} package.  (The {@link
    edu.cmu.minorthird.ui.Help} class is a main program that, when
    invoked, lists the relevant main methods.)

    <p>Under the hood, learning is performed using classes from inside
    the {@link edu.cmu.minorthird.classify} package.  A {@link
    edu.cmu.minorthird.classify.ClassifierLearner} learns a {@link
    edu.cmu.minorthird.classify.Classifier} from a set of labeled
    {@link edu.cmu.minorthird.classify.Example}s, usually stored in a
    {@link edu.cmu.minorthird.classify.Dataset}.  Several sequential
    classification algorithms are also implemented in the package
    {@link edu.cmu.minorthird.classify.sequential}.  The classify
    package is independent of the {@link edu.cmu.minorthird.text}
    package, but linked to it by the routines in {@link
    edu.cmu.minorthird.text.learn}.  Most importantly, the {@link
    edu.cmu.minorthird.text.learn.SpanFE} class implements what is
    essentially a small feature extraction sub-language, embedded in
    Java, which makes it possible to easily generate a wide variety of
    features of a document, token, or Span.  This language is even
    more powerful because it can base features on annotations stored
    in {@link edu.cmu.minorthird.text.TextLabels} that are associated with
    the Span.
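
    <p>The division of labor described above (a feature extractor turns
    a span into features, and a learner turns labeled examples into a
    classifier) can be sketched as follows.  Every name here
    (<code>SketchExample</code>, <code>SketchClassifier</code>,
    <code>SketchLearner</code>, <code>PerceptronLearner</code>,
    <code>extractFeatures</code>) is hypothetical and is not part of
    the Minorthird classify or text.learn packages.  The perceptron is
    used only because it is short; it is an assumption of this sketch,
    not a statement about which algorithms Minorthird provides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for illustration only; NOT the Minorthird classify API.
public class LearnerSketch {

    /** A labeled instance: sparse feature weights plus a +1 / -1 label. */
    static final class SketchExample {
        final Map<String, Double> features;
        final int label;

        SketchExample(Map<String, Double> features, int label) {
            this.features = features;
            this.label = label;
        }
    }

    /** Scores an instance; a positive score means the positive class. */
    interface SketchClassifier {
        double score(Map<String, Double> features);
    }

    /** Learns a classifier from a list of labeled examples. */
    interface SketchLearner {
        SketchClassifier learn(List<SketchExample> dataset);
    }

    /** Text-side bridge: turns the tokens of a span into sparse features. */
    static Map<String, Double> extractFeatures(String[] tokens) {
        Map<String, Double> features = new HashMap<>();
        for (String token : tokens) {
            features.merge("token=" + token.toLowerCase(), 1.0, Double::sum);
            if (!token.isEmpty() && Character.isUpperCase(token.charAt(0))) {
                features.merge("capitalized", 1.0, Double::sum);
            }
        }
        return features;
    }

    static double dot(Map<String, Double> weights, Map<String, Double> features) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : features.entrySet()) {
            sum += weights.getOrDefault(e.getKey(), 0.0) * e.getValue();
        }
        return sum;
    }

    /** Plain perceptron: update the weights on every misclassified example. */
    static final class PerceptronLearner implements SketchLearner {
        private final int epochs;

        PerceptronLearner(int epochs) {
            this.epochs = epochs;
        }

        @Override
        public SketchClassifier learn(List<SketchExample> dataset) {
            Map<String, Double> weights = new HashMap<>();
            for (int epoch = 0; epoch < epochs; epoch++) {
                for (SketchExample ex : dataset) {
                    if (ex.label * dot(weights, ex.features) <= 0) {
                        for (Map.Entry<String, Double> e : ex.features.entrySet()) {
                            weights.merge(e.getKey(), ex.label * e.getValue(), Double::sum);
                        }
                    }
                }
            }
            return features -> dot(weights, features);
        }
    }

    public static void main(String[] args) {
        List<SketchExample> dataset = new ArrayList<>();
        dataset.add(new SketchExample(extractFeatures(new String[] {"William", "Cohen"}), +1));
        dataset.add(new SketchExample(extractFeatures(new String[] {"the", "report"}), -1));
        dataset.add(new SketchExample(extractFeatures(new String[] {"Carnegie", "Mellon"}), +1));
        dataset.add(new SketchExample(extractFeatures(new String[] {"a", "paper"}), -1));

        SketchClassifier classifier = new PerceptronLearner(5).learn(dataset);
        double score = classifier.score(extractFeatures(new String[] {"Jane", "Doe"}));
        System.out.println("score for unseen span: " + score);
    }
}
```

    <p>Note that the learner and classifier see only feature maps and
    labels; only <code>extractFeatures</code> knows anything about
    text, which mirrors the independence of the classify and text
    packages described above.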

</body>