High-Accuracy, Ultra-High-Speed Text Extraction Software
DocCat is a filter program for Solaris/Linux/FreeBSD that extracts text from Windows document files, such as MS-Word documents, with high accuracy and at ultra-high speed.
The program is text extraction software provided in executable form. It can be combined with NAMAZU (a full-text search engine) to build a full-text search system on an intranet, or with a mail server to extract the text of documents attached to mail messages received by cellular phones, among other uses.