Software Tools for NLP


Software Archive

  • CMU Artificial Intelligence Repository
  • Resources Available Through CRL
  • SIL Computing Resources
  • Linguistics Tools at the University of Vaasa in Finland
  • Leeds University, Natural Language Processing Research Group: RESOURCES
  • ICOT Free Software
  • Netlib Repository (mirror in Japan)

 

General Information

  • Sourcebank - a search engine for programming resources.
  • Resources related to content analysis and text analysis - Software
  • Some publicly available NLP packages
  • SAL (Scientific Applications on Linux): Artificial Intelligence
  • Public Domain Generic Tools: An Overview - a paper written by Tomaz Erjavec
  • A collection of online interactive CL tools (Computational Linguistics Group, University of Zurich)
  • The LINGUIST List: Software
  • The Natural Language Software Registry
  • Language Software Helpdesk
    • Frequently Asked Questions
  • PennTools - Computational Linguistics Resources At Penn.
  • Parsing Resources
  • Taggers online (an email message listing their addresses)
  • Parsers and Taggers Information (by Steven Paul Abney)
  • Relator Language Processing Resources
  • Corpus Search Tools
  • Neural Networks & Statistics: Software

 

Tagger, Morphological Analyzer

  • A Perl/Tk text tagger
  • Conexor
  • Cogilex R&D inc - Makers of expert tools for natural language processing
  • CLAWS part-of-speech tagger
  • TnT - Statistical Part-of-Speech Tagging
  • POS tagger for Spanish
  • Tagging and Parsing tools
  • AUTASYS - A Fully Automatic English Wordclass Analysis System
  • TOSCA/LOB tagger
  • Relaxation Labelling Based Multi-Tagger
  • The QTAG Part of Speech Tagger
  • QTAG: A portable Parts of Speech Tagger
  • The Alvey Natural Language Tools
  • The XTAG Project
  • TreeTagger - a language independent part-of-speech tagger
  • Xerox Part-of-Speech Tagger
  • The Edinburgh/Cambridge Morphological Analyser System
  • Winbrill - An adaptation of Brill’s tagger to Windows 95/98.
  • Eric Brill’s Part of Speech Tagger
  • Software Plaza: Brill’s Tagger
  • Morphy - An integrated tool for German morphology and statistical part-of-speech tagging.
  • Korean Morphological Analyzer
  • Natural Language Tools - Japanese morphological analyzer (JUMAN) and parser (KNP) developed by Nagao Lab. at Kyoto University, Japan.
  • WordSmith Tools - The Swiss Army knife of lexical analysis: an integrated suite of programs for looking at how words behave in texts, intended for linguists, language teachers, and anyone else who needs to examine language.
    • Mike Scott’s Home Page
    • Oxford University Press
  • A Lexical Analyzer for HTML and Basic SGML
  • ARIES Natural Language Tools - Lexical platform for the Spanish language.

 

Stemmer

  • Porter stemmer
  • Dutch Porter stemmer
  • IRIS stemmer
  • Iterated Lovins stemmer
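Most of the stemmers listed above strip suffixes with ordered rewrite rules. As a minimal sketch (not a complete stemmer), the Python below implements only Step 1a of the Porter algorithm, the plural-stripping rules; the full algorithm continues with further steps whose rules depend on the "measure" of the remaining stem.

```python
def porter_step1a(word: str) -> str:
    """Apply only Step 1a of the Porter stemmer (plural suffix stripping)."""
    # Rules are tried in order, so the longest matching suffix wins.
    if word.endswith("sses"):
        return word[:-2]  # SSES -> SS: caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # IES -> I: ponies -> poni
    if word.endswith("ss"):
        return word       # SS -> SS: caress -> caress
    if word.endswith("s"):
        return word[:-1]  # S -> (nothing): cats -> cat
    return word


for w in ["caresses", "ponies", "caress", "cats", "stem"]:
    print(w, "->", porter_step1a(w))
```

Stems such as "poni" are not dictionary words; Porter's stemmer only aims to map related forms to the same string.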

 

Collocation

  • Xtract - Frank Smadja’s Collocation Extractor.
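Xtract uses its own frequency and positional statistics. As a simpler stand-in for illustrating collocation scoring (this is not Xtract's method), the sketch below ranks adjacent word pairs by pointwise mutual information. The `min_count` cutoff and the toy token list are assumptions chosen for the example.

```python
import math
from collections import Counter


def pmi_bigrams(tokens, min_count=2):
    """Score adjacent word pairs by pointwise mutual information (PMI)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (w1, w2), c in bigrams.items():
        if c < min_count:
            continue  # PMI is unreliable for rare pairs, so skip them
        # PMI = log2( P(w1, w2) / (P(w1) * P(w2)) ), using maximum-likelihood estimates
        p_xy = c / (n - 1)
        p_x = unigrams[w1] / n
        p_y = unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


tokens = "new york is big . new york is old . the city is big".split()
for pair, score in pmi_bigrams(tokens):
    print(pair, round(score, 2))
```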

 

Parser

  • Malaga - a system for automatic language analysis
  • Attribute-Logic Engine (ALE) System and Grammars - A freeware logic programming and grammar parsing system.
  • CG Parser - Natural deduction categorial grammar and lambda-calculus parser.
  • Head-Corner Parser (by Gertjan van Noord)
  • A basic parser written to illustrate the bottom-up parsing algorithms in Natural Language Understanding, Second Edition
  • Cass Partial Parser
  • CHILL: An empirical parser acquisition system using inductive logic programming
  • ISSCO Tools - Left-head-corner Island Parser Compiler, etc.
  • Georgetown University Natural Language Processing: Parser Modularity Demo page
  • PC-PATR: A syntactic parser
  • IMS Stuttgart: The CUF Web Page - Comprehensive Unification Formalism
  • Apple Pie Parser - A bottom-up probabilistic chart parser that uses best-first search to find the highest-scoring parse tree.
  • Link Grammar Parser

 

Corpus Tools

  • WebCorp
  • Concordances: Producing and Using them
  • XCES: Corpus Encoding Standard for XML
  • RST Tool - An RST (Rhetorical Structure Theory) Markup Tool.
  • RST Annotation Tool
  • Qwick - corpus browser
  • Linguistic Annotation - This page describes tools and formats for creating and managing linguistic annotations.
  • Alembic Workbench - a suite of tools for the analysis of a corpus, along with the Alembic system to enable the automatic acquisition of domain-specific tagging heuristics.
  • The System Quirk - Workbench for Terminology, Lexicography and Text Analysis.
  • Multext: Multilingual Text Tools and Corpora
  • XCorpus - An Environment for Managing Corpus and Multilingual Web Server
  • The IMS Corpus Toolbox Webpage
  • Kobe Phoenix Laboratory - Corpus Wizard program.
  • Concordance - A program for Windows NT 4.0 and Windows 95/98 which makes wordlists, concordances, and Web Concordances from your electronic texts.
  • MonoConc (concordance program)
  • MonoConc for Windows (concordance program)
  • Text Analysis Computing Tools (TACT)
  • The Lingua Project: The World of MultiLingual Parallel Concordancing
    (http://prune.loria.fr/~bonhomme/lingua/)
    - Sentence alignment tool for multilingual corpora.
  • The Lingua Project: The World of MultiLingual Parallel Concordancing
    (http://www.loria.fr/exterieur/equipe/dialogue/lingua/)
  • Textual Corpora and Tools for their Exploration

 

Language Modeling

  • Maximum Entropy Modeling
  • Maximum Entropy Modeling Toolkit
  • CMU-Cambridge Statistical Language Modeling Toolkit
  • CMU Statistical Language Modeling Toolkit by Roni Rosenfeld
    • Program
    • Document
  • Trigger Toolkit
  • Simple Good-Turing Smoothing
  • Smoothing tools software by Joshua Goodman and Stanley Chen
  • Language modeling tools
  • Statistical Decision Trees
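The Good-Turing items above re-estimate counts from the "frequency of frequencies." The Python below is a minimal sketch of the basic estimate r* = (r + 1) N_{r+1} / N_r, where N_r is the number of distinct items seen exactly r times. The unseen-event mass is N_1 / N. This is not the Simple Good-Turing variant, which also smooths the N_r values so that gaps (N_{r+1} = 0) do not break the formula. Falling back to the raw count r when N_{r+1} is zero is this sketch's own assumption.

```python
from collections import Counter


def good_turing(counts):
    """Basic (unsmoothed) Good-Turing adjusted counts.

    counts: mapping item -> observed frequency r.
    Returns (adjusted count r* for each observed r, probability mass for unseen items).
    """
    n = sum(counts.values())
    freq_of_freq = Counter(counts.values())  # N_r: number of items occurring exactly r times
    adjusted = {}
    for r, n_r in freq_of_freq.items():
        n_r1 = freq_of_freq.get(r + 1, 0)
        # r* = (r + 1) * N_{r+1} / N_r; keep r when N_{r+1} is zero (assumed fallback,
        # the gap that Simple Good-Turing fills by smoothing N_r)
        adjusted[r] = (r + 1) * n_r1 / n_r if n_r1 else float(r)
    p_unseen = freq_of_freq.get(1, 0) / n
    return adjusted, p_unseen


counts = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 3}
adjusted, p_unseen = good_turing(counts)
print(adjusted, p_unseen)
```

In this toy example, the adjusted count for r = 2 exceeds 2 because N_3 happens to equal N_2. This kind of noise in sparse N_r values is why smoothed variants exist.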

 

HMM

  • An HMM mini-toolkit (by Anand Venkataraman)
  • HMM Software
    see also: Exercise: Using a Hidden Markov Model
  • Discrete HMM Toolkit
  • Hidden Markov Model (HMM) Toolbox
  • Meta-MEME: Motif-based Hidden Markov Models of Biological Sequences
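Decoding the most likely hidden-state sequence with the Viterbi algorithm is central to HMM toolkits like those listed above. The Python below is a minimal sketch for a discrete HMM, with probabilities stored in plain dictionaries. The two-state model at the bottom is a made-up toy. A practical version would work with log probabilities to avoid underflow on long sequences.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely hidden state sequence for a discrete HMM (raw probabilities)."""
    # best[t][s]: probability of the best path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]  # back[t][s]: predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, p = max(
                ((r, best[t - 1][r] * trans_p[r][s]) for r in states),
                key=lambda x: x[1],
            )
            best[t][s] = p * emit_p[s][obs[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1], best[-1][last]


states = ("Healthy", "Fever")
start = {"Healthy": 0.6, "Fever": 0.4}
trans = {"Healthy": {"Healthy": 0.7, "Fever": 0.3},
         "Fever": {"Healthy": 0.4, "Fever": 0.6}}
emit = {"Healthy": {"normal": 0.5, "cold": 0.4, "dizzy": 0.1},
        "Fever": {"normal": 0.1, "cold": 0.3, "dizzy": 0.6}}
path, prob = viterbi(["normal", "cold", "dizzy"], states, start, trans, emit)
print(path, prob)
```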

 

Language Identification

  • Ted E. Dunning’s program
  • Gertjan van Noord’s program
  • Doug Beeferman’s program
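The programs listed above use statistical models of character sequences. The Python below is a much simpler stand-in and does not reproduce any of those programs: it compares character-trigram frequency profiles using cosine similarity. The one-sentence training texts are placeholders. Real profiles need far more text per language.

```python
import math
from collections import Counter


def trigram_profile(text):
    """Character-trigram counts; padding spaces mark the text's edges."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a if g in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def identify(text, profiles):
    """Return the language whose profile is most similar to the text."""
    target = trigram_profile(text)
    return max(profiles, key=lambda lang: cosine(target, profiles[lang]))


profiles = {
    "en": trigram_profile("the quick brown fox jumps over the lazy dog and the cat"),
    "de": trigram_profile("der schnelle braune fuchs springt ueber den faulen hund und die katze"),
}
print(identify("the dog and the cat", profiles))
```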

 

FSA Tools

  • Finite State Utilities
  • Automata Learning from Theory to Practice
    • Downloadable Software
  • Index to finite-state machine software, products, and projects
  • FSA utilities
    • FSA Utilities: A Toolbox to Manipulate Finite-state Automata
  • Grail - a symbolic computation environment for finite-state machines, regular expressions, and other formal language theory objects.
  • AMoRE - A program for the computation of Automata, Monoids, and Regular Expressions.
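The toolkits above build and manipulate finite-state automata. The basic operation they all rest on is running a deterministic automaton over an input string. The Python below is a minimal sketch of that operation, with the transition function stored as a dictionary. The example automaton, which accepts binary strings containing an even number of 1s, is hypothetical.

```python
def run_dfa(transitions, start, accepting, text):
    """Simulate a DFA; a missing transition means the input is rejected."""
    state = start
    for symbol in text:
        key = (state, symbol)
        if key not in transitions:
            return False  # no transition defined: implicit dead state
        state = transitions[key]
    return state in accepting


# Hypothetical example: binary strings containing an even number of 1s.
EVEN_ONES = {
    ("even", "0"): "even", ("even", "1"): "odd",
    ("odd", "0"): "odd", ("odd", "1"): "even",
}

for s in ["1010", "111", ""]:
    print(repr(s), run_dfa(EVEN_ONES, "even", {"even"}, s))
```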

 

Speech

  • HTK: Hidden Markov Model Toolkit
  • CSLU Toolkit
  • The Epos Speech Synthesis System
  • ISIP public domain speech-to-text system
    • The ISIP Automatic Speech Recognition Toolkit
  • CSLU Toolkit (Center for Spoken Language Understanding, Oregon Graduate Institute of Science and Technology)
  • Computer generation of accent marks
  • Spoken Natural Language Processing Group Software
  • CMU Error Analysis Toolkit
  • Audio Tools
  • VOICEBOX: Speech Processing Toolbox for MATLAB

 

Mathematical Software

  • NIST Guide to Available Mathematical Software

 

Statistics

  • Bayesian inference Using Gibbs Sampling
  • CoCo - A statistics package for analysis of associations between discrete variables.

 

Machine Learning

  • Machine Learning Toolbox (MLT)
  • The Machine Learning Programs Repository
  • The RIPPER rule learner
  • mFOIL - An ILP (inductive logic programming) system designed to handle noisy examples.

 

Support Vector Machine

  • SVMLight
  • SVM package by William Noble Grundy
  • Kernel Machines Web Site

 

Information Retrieval & Filtering

  • seft - a Search Engine For Text
  • MG - Managing Gigabytes
  • Isearch - software for indexing and searching text documents.
  • SMART Software and test collections (Cornell University)
    • see also SMART links
  • Doug Oard’s Research Software Page - SMART Modifications
  • Bow: A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering
  • ifile - A general mail filtering system.
  • IR-STAT-PAK - A program to compute descriptive and analytic statistics for the TREC IR trials.
  • Yavi - A visual interface to textual information.
  • Labeled data sets for information extraction
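Text search engines such as those listed above are built on an inverted index, which maps each word to the documents that contain it. The Python below is a toy sketch with Boolean AND queries only. Tokenizing with `\w+` and lower-casing are simplifying assumptions. Real systems (MG, for example) also compress the index and rank the results.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased word to the sorted ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(doc_id)
    return {w: sorted(ids) for w, ids in index.items()}


def search_all(index, query):
    """Boolean AND query: ids of documents that contain every query word."""
    postings = [set(index.get(w, [])) for w in re.findall(r"\w+", query.lower())]
    return sorted(set.intersection(*postings)) if postings else []


docs = ["Managing gigabytes of text", "Searching text documents", "Indexing and searching text"]
idx = build_index(docs)
print(search_all(idx, "searching text"))
```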

 

String/Pattern Matching

  • Online Approximate String Matching
  • Strmat package (exact string matching and suffix trees)
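Online approximate string matching is classically solved with a variant of edit-distance dynamic programming, often credited to Sellers. The Python below is a minimal sketch of that approach. The first DP row is all zeros, so a match may start at any text position. The text is scanned once, one column per character, which is what makes the method online. The function name and the "end positions" output format are this sketch's own choices.

```python
def approx_find(pattern, text, k):
    """Return the text indices where a match of pattern with at most k edits ends."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix: distance i
    ends = []
    for j, c in enumerate(text):
        prev_diag = col[0]  # previous column's value at row i-1
        col[0] = 0          # free start: a match may begin anywhere
        for i in range(1, m + 1):
            cur = min(
                col[i] + 1,                         # text char left unmatched
                col[i - 1] + 1,                     # pattern char left unmatched
                prev_diag + (pattern[i - 1] != c),  # match or substitution
            )
            prev_diag, col[i] = col[i], cur
        if col[m] <= k:
            ends.append(j)
    return ends


print(approx_find("abc", "xxabcxxabdxx", 0))
print(approx_find("abc", "xxabcxxabdxx", 1))
```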

 

Sentence Boundary Detector

  • SATZ: An Adaptive Sentence Boundary Detector
  • Adwait Ratnaparkhi’s MXTERMINATOR
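The central difficulty for these tools is that a period may end an abbreviation rather than a sentence. SATZ and MXTERMINATOR are both trainable systems. The Python below is not how either of them works. It is a naive rule-based baseline that shows the problem: it splits after `.`, `!`, or `?` when the next word is capitalized, unless the token is on a hypothetical hand-written abbreviation list.

```python
import re

# Hypothetical abbreviation list; real systems learn or curate far larger ones.
ABBREVIATIONS = {"dr.", "mr.", "mrs.", "e.g.", "i.e.", "etc.", "u.s."}


def split_sentences(text):
    """Split after . ! ? followed by whitespace and a capital letter,
    unless the token ending at that point is a known abbreviation."""
    sentences, start = [], 0
    for m in re.finditer(r"[.!?](?=\s+[A-Z])", text):
        end = m.end()
        last_token = text[start:end].split()[-1].lower()
        if last_token in ABBREVIATIONS:
            continue  # e.g. "Dr." followed by a name is not a boundary
        sentences.append(text[start:end].strip())
        start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


print(split_sentences("Dr. Smith arrived. He sat down! Was it late? No."))
```

Rules like these break on abbreviations that are missing from the list, and on abbreviations that really do end a sentence. These cases are what motivate learned detectors.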

 

Clustering/Classification

  • FCLUSTER - A tool for fuzzy cluster analysis
  • LNKnet Pattern Classification Software
  • Principal Direction Divisive Partitioning
  • k-means clustering
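The k-means entry above refers to the standard iterative method, usually implemented as Lloyd's algorithm: assign each point to its nearest centroid, then move each centroid to the mean of its points. The Python below is a minimal sketch for points given as tuples. Initializing with k randomly chosen input points and the fixed `seed` are assumptions made to keep it short and reproducible.

```python
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm for k-means; points are tuples of numbers."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise with k randomly chosen input points
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid (squared distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        # Update step: move each centroid to the mean of its cluster (keep it if empty).
        new = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
        if new == centroids:
            break  # converged: centroids no longer move
        centroids = new
    return centroids, clusters


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
cents, groups = kmeans(pts, 2)
print(sorted(cents))
```

Lloyd's algorithm only finds a local optimum, so in practice it is usually restarted from several initializations.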

 

WWW

  • w3mir - HTTP copying and mirroring tool.
  • HTTrack - The Web mirror utility.
  • HTML Conversion, Shareware and Freeware

 

Other Tools

  • German Morphology Browser (online service)
  • ‘mat2D’ Matrix/Vector Library in C
  • Content Analysis Resources - for quantitative analyses of texts, transcripts, and images.
  • SNoW learning program
  • The μ-TBL Homepage - Logic Programming Tools for Transformation-Based Learning
  • ROOT: An Object-Oriented Data Analysis Framework
  • CAQDAS Networking Project - Computer Assisted Qualitative Data Analysis Software
  • Suffix sort
  • Nb - a graphical user interface for annotating the discourse structure of spoken dialogue, monologue, and text.
  • GATE - General Architecture for Text Engineering.
  • TiMBL: Tilburg Memory Based Learner
  • MtRecode - The Multext character translation program
  • Evalb - A bracket scoring program. It reports precision, recall, crossing-bracket statistics, and tagging accuracy for the given data.
  • The OC1 decision tree software system
  • IND Version 2.0 - creation and manipulation of decision trees from data
  • Paai’s text utilities
  • Shoebox 3.0 for Windows and Macintosh - A database program oriented to the needs of a field linguist’s dictionary.
  • Teaching materials for statistical NLP by Chris Brew, Language Technology Group, Human Communication Research Centre, University of Edinburgh
  • Introducing environmentalism and post-fordism into NLP (NeuroTran)
  • Tools for Estonian Language
  • Dan Melamed’s Page - Simulated Annealing Program, XTAG morpholyzer post-processors for English Stemming, Good-Turing Smoothing Software, 150 miscellaneous text processing tools, 75 text statistics and bitext geometry tools.
  • TOOLDIAG: Pattern recognition toolbox
  • The DN2 Home Page - DN2 is an intelligent, self-relating, free-format database system that accepts data in human text format and retrieves it in response to human requests such as "Where is London?"
  • Software Announcements
  • Tools for drawing and graphically editing trees
  • Paul Nation’s vocabulary programs
  • Syllable prediction code (a simple Lisp function)
  • Pratt - a pattern discovery tool
  • XGobi - A system for multivariate data visualization.
  • NODElib - Neural Optimization Development Engine library
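Evalb, listed above, scores a parser by comparing its constituent brackets with gold-standard brackets. The Python below is a simplified sketch of labeled bracket precision, recall, and F1. Constituents are given as hypothetical (label, start, end) spans and matched as multisets. It ignores Evalb's configurable details, such as punctuation handling, and its crossing-bracket counts.

```python
from collections import Counter


def bracket_scores(gold, test):
    """Labeled bracket precision, recall and F1 (simplified, Evalb-style).

    gold, test: lists of (label, start, end) constituent spans.
    """
    gold_c, test_c = Counter(gold), Counter(test)
    matched = sum((gold_c & test_c).values())  # multiset intersection
    precision = matched / len(test) if test else 0.0
    recall = matched / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
test = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]
print(bracket_scores(gold, test))
```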

 


