Here are a few basic scripts and tools for Natural Language Processing.
Toy Examples and Scripts
- tf-idf: Example of converting documents into document vectors, using tf-idf (term frequency times inverse document frequency) as the term weights
- Document Similarity: Example of computing the pairwise similarity between all documents in a corpus
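The two toy examples fit together: tf-idf turns each document into a sparse vector, and pairwise similarity is then commonly measured as the cosine of the angle between two such vectors. The sketch below is a minimal Python version, assuming whitespace-tokenized input, raw counts for tf, and idf = log(N / df), where N is the number of documents and df is the number of documents containing the term. These weighting choices are assumptions and are not necessarily the ones used by the original scripts.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map each document (a list of tokens) to a sparse {term: weight} dict.

    Assumed weighting: raw term count * log(N / df).
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each term once per document
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def pairwise_similarity(docs):
    """Return an N x N matrix of cosine similarities between tf-idf vectors."""
    vecs = tfidf_vectors(docs)
    return [[cosine(a, b) for b in vecs] for a in vecs]


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "stocks fell sharply today".split(),
]
for row in pairwise_similarity(docs):
    print(["%.2f" % x for x in row])
```

A term that occurs in every document gets idf = log(1) = 0 and so contributes nothing, which is why the guard in `cosine` returns 0.0 for an all-zero vector instead of dividing by zero.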
Knowledge Representation Software
- NL pipeline: A complete system that takes a text as input, parses it, converts it to logical form, and then derives inferences from it. The user must supply the axioms used for inference in axioms.txt.
- Lisp to XML: A simple Perl script that takes a Lisp file as input and converts it to XML. I have used it specifically to convert Charniak parser output into XML. It requires the Perl module XML::Twig.
- Parse tree binarizer: A Perl script that takes an XML version of a parse tree as input and converts it into a binary tree. It requires the Perl module XML::Twig. The script assumes the Penn Treebank tag set; trees that use other tags must first be rewritten so that their tags map to Penn Treebank equivalents.
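The Lisp-to-XML conversion and the binarizer can be illustrated together, since the binarizer consumes XML trees like those produced from Charniak's bracketed output. The sketch below is in Python with the standard `xml.etree.ElementTree` module rather than Perl with XML::Twig. It assumes every bracket carries a label (as in `(S1 (S ...))`), and the XML schema (a `node` element with a `label` attribute, words as element text) is hypothetical, not the one the original scripts use. Binarization here is plain right-factoring with hypothetical `@LABEL` intermediate nodes; unlike the original, it does not look at Penn Treebank tags at all.

```python
import re
import xml.etree.ElementTree as ET


def parse_sexpr(text):
    """Parse a bracketed tree like '(S (NP (NNP John)) (VP (VBD slept)))' into XML.

    Hypothetical schema: <node label="..."> elements; a word is the text of its preterminal.
    """
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        label = tokens[pos + 1]
        pos += 2
        elem = ET.Element("node", label=label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                elem.append(node())  # nested constituent
            else:
                elem.text = tokens[pos]  # bare token under a preterminal is the word
                pos += 1
        pos += 1  # consume the closing ')'
        return elem

    return node()


def binarize(elem):
    """Right-factor, in place, every node that has more than two children."""
    for child in list(elem):
        binarize(child)
    kids = list(elem)
    if len(kids) > 2:
        for k in kids[1:]:
            elem.remove(k)
        # Hypothetical intermediate label: '@' plus the original category.
        rest = ET.SubElement(elem, "node", label="@" + elem.get("label").lstrip("@"))
        rest.extend(kids[1:])
        binarize(rest)  # rest may still have more than two children
    return elem


tree = parse_sexpr("(S1 (S (NP (DT the) (JJ big) (JJ red) (NN dog)) (VP (VBD barked))))")
binarize(tree)
print(ET.tostring(tree, encoding="unicode"))
```

After binarization the four-child NP becomes `NP -> DT @NP` and `@NP -> JJ @NP`, `@NP -> JJ NN`, while the word order is unchanged.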
Knowledge Interpretation Software
- Mini-TACITUS: An implementation of an abductive inference engine based on the paper by Jerry Hobbs. It has been used successfully in previous projects (MOVER, LbR, Mobius-I, Mobius-II). I am currently extending it to handle large sets of axioms.
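To give a feel for what an abductive engine computes, here is a toy Python sketch of weighted abduction in the style associated with Hobbs' interpretation-as-abduction work. It is a propositional simplification under stated assumptions: no variables or unification, no coherence heuristics, and exhaustive search that only suits tiny examples. Axioms are assumed to have the form p1^w1 ∧ ... ∧ pn^wn → q. Each observation has an assumption cost; backchaining on a literal with cost c gives each antecedent pi the cost wi·c; and factoring lets a literal reached along several paths be paid for once, at its lowest cost. The axiom format and all names are hypothetical; this is not Mini-TACITUS's code or input format.

```python
from itertools import product

# Hypothetical axiom format: (head, [(antecedent, weight), ...])
AXIOMS = [
    ("wet_grass", [("rain", 1.5)]),
    ("wet_street", [("rain", 1.5)]),
]


def merge(a, b):
    """Union two assumption sets; factoring keeps the cheaper cost of a shared literal."""
    out = dict(a)
    for lit, cost in b.items():
        out[lit] = min(cost, out.get(lit, cost))
    return out


def explanations(lit, cost, depth):
    """Yield alternative assumption sets {literal: cost} that account for lit."""
    yield {lit: cost}  # option 1: simply assume the literal at its current cost
    if depth == 0:
        return
    for head, body in AXIOMS:
        if head != lit:
            continue
        # option 2: backchain; each antecedent inherits weight * cost
        choices = [list(explanations(p, w * cost, depth - 1)) for p, w in body]
        for combo in product(*choices):
            merged = {}
            for part in combo:
                merged = merge(merged, part)
            yield merged


def interpret(observations, depth=3):
    """Return the cheapest assumption set explaining all observations (brute force)."""
    per_obs = [list(explanations(o, c, depth)) for o, c in observations.items()]
    best = None
    for combo in product(*per_obs):
        merged = {}
        for part in combo:
            merged = merge(merged, part)
        if best is None or sum(merged.values()) < sum(best.values()):
            best = merged
    return best


print(interpret({"wet_grass": 10.0}))
print(interpret({"wet_grass": 10.0, "wet_street": 10.0}))
```

With one observation, assuming `wet_grass` directly (cost 10) beats backchaining to `rain` (cost 15). With two observations, explaining both by a single factored `rain` assumption (cost 15) beats assuming both directly (cost 20). This preference for a unifying explanation is the behavior weighted abduction is designed to produce.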
Visualization Software
- Logical Form Visualization Tool: A visualization tool for LFToolkit, which was written by Nishit Rathod. It requires Graphviz and Perl to be installed on your system.
- Interpretation Visualization: A tool for visualizing the best interpretation generated by TACITUS.
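One common way to draw a logical form with Graphviz is to treat each predicate as a node and draw an edge to each of its argument variables, so that predicates sharing a variable end up connected. The Python sketch below emits such a graph in the DOT language. The input format (a list of `(predicate, [arguments])` pairs) and the predicate names are hypothetical, and this is not the actual output of LFToolkit or of the tool described above.

```python
def lf_to_dot(literals):
    """Render a conjunction of literals as a Graphviz DOT digraph.

    Hypothetical input: [(predicate, [arg1, arg2, ...]), ...]
    Predicates become boxes, variables become ellipses, and each edge is
    labeled with the argument position.
    """
    lines = ["digraph LF {"]
    variables = set()
    for i, (pred, args) in enumerate(literals):
        pid = "p%d" % i
        lines.append('  %s [label="%s", shape=box];' % (pid, pred))
        for position, arg in enumerate(args, start=1):
            variables.add(arg)
            lines.append('  %s -> "%s" [label="%d"];' % (pid, arg, position))
    for v in sorted(variables):
        lines.append('  "%s" [shape=ellipse];' % v)
    lines.append("}")
    return "\n".join(lines)


lf = [("John-nn", ["x1"]), ("sleep-vb", ["e1", "x1"])]
print(lf_to_dot(lf))
```

Saving the output to a file such as `lf.dot` and running `dot -Tpng lf.dot -o lf.png` renders it as an image, assuming Graphviz is installed.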
Miscellaneous Software
- Charniak Parser for Windows: Several DARPA-sponsored projects require executables that run on Windows machines. This is a standalone Windows build of the Charniak parser, compiled using the MinGW32 libraries and Cygwin.