Description
#Calchas needs a corpus of 100 documents before its analysis procedures become
effective. Its storage mechanism captures enough at entry that the original
document files are no longer required once they are in the system, even if the
corpus size has not yet been reached (and the documents have therefore not been
analysed yet).
By storing the document metadata,
#Calchas aims to cover the needs of data mining and
knowledge extension procedures. In effect,
#Calchas prides itself on providing the user with a
fast and effective method of storing only the most essential information about a
document. This information is then
used to draw useful conclusions about the document content and its relevance to
other documents and hence to facilitate retrieval and knowledge extension
features.
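
As a rough illustration of the entry step described above, the sketch below keeps only metadata for each document and defers corpus-level analysis until the corpus size is reached. The class name `MetadataStore`, the use of term counts as the stored metadata, and the placeholder analysis step are all assumptions made for this example; the description does not say what #Calchas actually stores or how it analyses it.

```python
import re
from collections import Counter

CORPUS_THRESHOLD = 100  # corpus size stated in the description


class MetadataStore:
    """Hypothetical store that keeps document metadata, not document files."""

    def __init__(self, threshold=CORPUS_THRESHOLD):
        self.threshold = threshold
        self.records = {}      # doc_id -> term counts (assumed metadata format)
        self.analysed = set()  # doc_ids that have been through analysis

    def add_document(self, doc_id, text):
        # Extract the metadata at entry; the text itself is not retained.
        terms = re.findall(r"[a-z]+", text.lower())
        self.records[doc_id] = Counter(terms)
        # Analysis only starts once the corpus is large enough.
        if len(self.records) >= self.threshold:
            self._analyse_pending()

    def _analyse_pending(self):
        # Placeholder for the real analysis, which works from metadata alone.
        for doc_id in self.records:
            self.analysed.add(doc_id)

    def is_analysed(self, doc_id):
        return doc_id in self.analysed
```

With a small threshold for demonstration, documents entered before the threshold stay pending and are picked up as soon as the corpus size is reached, without the original text being needed again.
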
Another important reason for using document metadata for
further analysis is the large improvement in result retrieval speed. The method
used for document indexing means that the time-consuming task of document
analysis and summarisation takes place only once, when the document enters the
system. Any query on the document’s contents then only needs to search through
the metadata that was produced by the analysis mechanism.
Consequently, the quality of the query results relies heavily on the quality of
the analysis mechanism, and this is where most of the development effort was
focused. To this end,
#Calchas makes full use of the query speed
of SQL and combines it with the versatility of a
programming language to create an efficient and usable environment for document
storage and retrieval.
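
The "analyse once, query the metadata" idea can be sketched with Python's standard `sqlite3` module. This is a minimal sketch under stated assumptions, not the #Calchas implementation: the `keyword` table, the stopword list, and the choice of the most frequent terms as metadata are all hypothetical.

```python
import re
import sqlite3
from collections import Counter

# Tiny illustrative stopword list (an assumption for this sketch).
STOPWORDS = {"the", "a", "of", "and", "to", "in", "is"}


def open_store():
    """Create an in-memory database with a hypothetical keyword table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE keyword ("
        "doc_id TEXT, term TEXT, weight INTEGER, "
        "PRIMARY KEY (doc_id, term))"
    )
    conn.execute("CREATE INDEX idx_keyword_term ON keyword (term)")
    return conn


def add_document(conn, doc_id, text, top_n=5):
    """Analyse the text once at entry and store only its top terms."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS]
    rows = [(doc_id, term, count)
            for term, count in Counter(words).most_common(top_n)]
    conn.executemany("INSERT INTO keyword VALUES (?, ?, ?)", rows)
    conn.commit()


def search(conn, term):
    """Answer a query from the metadata table alone, heaviest match first."""
    cur = conn.execute(
        "SELECT doc_id FROM keyword WHERE term = ? "
        "ORDER BY weight DESC, doc_id",
        (term.lower(),),
    )
    return [row[0] for row in cur]
```

The expensive step (tokenising and counting) runs only inside `add_document`; `search` is a single indexed SQL lookup, which is the property the paragraph above relies on for retrieval speed.
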
#Calchas also includes a further feature, the “Relevance Network”. The Relevance Network is a
knowledge extension model that begins from a given document and branches out to
possibly relevant documents using a relevance algorithm that makes use of the
document metadata. The Relevance Network and the search module demonstrate
how the query result speed allows the implementation of
mechanisms that make better use of the corpus than existing full-content search
engines. They also indicate the
possibilities that #Calchas opens up through the
algorithms it implements in its features.
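
One way to picture a network that "begins from a given document and branches out" is a breadth-first expansion over metadata similarity. The sketch below assumes each document's metadata is a set of keywords and uses Jaccard similarity with a fixed threshold as a stand-in relevance measure; the actual #Calchas relevance algorithm is not specified in this description, and the names `jaccard` and `relevance_network` are hypothetical.

```python
from collections import deque


def jaccard(a, b):
    """Jaccard similarity of two keyword sets (stand-in relevance measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def relevance_network(metadata, start, threshold=0.25, max_depth=2):
    """Branch out from `start` to documents whose metadata is similar enough.

    metadata: dict mapping doc_id -> set of keywords (assumed format).
    Returns a list of (source, target, score) edges in discovery order.
    """
    edges = []
    visited = {start}
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for other in sorted(metadata):
            if other in visited:
                continue
            score = jaccard(metadata[doc], metadata[other])
            if score >= threshold:
                visited.add(other)
                edges.append((doc, other, score))
                queue.append((other, depth + 1))
    return edges
```

Because each step compares only small keyword sets, a document that is not directly similar to the starting point can still be reached through an intermediate document, which is the kind of knowledge extension the Relevance Network is described as providing.
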