Calchas
#Calchas is a database-driven software package
which prides itself on being a completely automated digital library, able to cover
the document organisation needs of a single user or of an entire library.
The desktop version of #Calchas is a .NET application that runs on Windows platforms.
It is a database application and thus requires a computer running SQL Server (either
the local computer or a remote server). It can also run with SQL Server Express,
although this is not advisable given the expected data size.
The task that #Calchas aims to undertake is the analysis and storage of document
metadata, mainly keywords and summarisation. It extracts the text from the given
file (currently PDF, TXT and DOC files are supported) and passes it on to its analyser.
The analyser calculates term frequencies and uses these frequencies to retrieve
a set of sentences that summarise the text adequately. Even though no particular
mechanism is used to link these sentences grammatically, the selection algorithm
makes sure that the sentences selected generally form a coherent sequence. Both
keywords and summary sentences are stored in the database for later retrieval.
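
The frequency-based analysis described above can be illustrated with a minimal
Python sketch. This is not the #Calchas analyser itself: the stop-word list, the
scoring rule (average term frequency per sentence) and all function names are
assumptions made for illustration only. Selected sentences are returned in their
original document order, which is one simple way to keep the summary readable
without linking the sentences grammatically.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the list used by the real analyser is not documented here.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "for", "on", "are"}


def tokenize(text):
    """Lower-case the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def extract_keywords(text, top_n=5):
    """Return the top_n most frequent terms as keywords."""
    return [term for term, _ in Counter(tokenize(text)).most_common(top_n)]


def summarise(text, max_sentences=2):
    """Pick the highest-scoring sentences and return them in document order."""
    freqs = Counter(tokenize(text))
    sentences = split_sentences(text)

    def score(sentence):
        # Assumed scoring rule: average document-wide frequency of the sentence's terms.
        terms = tokenize(sentence)
        return sum(freqs[t] for t in terms) / len(terms) if terms else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return [sentences[i] for i in chosen]
```

In a pipeline like the one described, the lists returned by extract_keywords and
summarise would be the values written to the database for each document.
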
The strength of #Calchas compared with other electronic libraries is its thoroughness.
#Calchas takes a document from a simple text stream and transforms it into a
machine-comprehensible collection of data, from which the system can infer useful
information. #Calchas aims to be the first system that performs keyword extraction,
automatic summarisation, document storage and knowledge extension, combining all
these capabilities in one package with a friendly graphical user interface.
A very powerful tool in #Calchas is its parameterisation feature. The user is able
to perform the analysis and summarisation tasks using a set of parameters that are
used for some of the subroutines. We have included a default set of parameters which
we have tested extensively and found to offer the best performance.
However, the user can create different parameter sets and use them at will for different
documents. This parameterisation feature is an extremely useful tool for the savvy
user since it practically means that the results of the keyword extraction and automatic
summarisation can be tweaked to fit the particular needs of each document or document
set.
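
As a rough illustration of how named parameter sets might be organised, here is
a small Python sketch. The field names (max_keywords, summary_sentences,
min_term_length), their default values and the ParameterRegistry class are all
hypothetical; the actual #Calchas parameters are not listed in this description.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ParameterSet:
    """One named set of analysis parameters (all fields are illustrative)."""
    name: str
    max_keywords: int = 10
    summary_sentences: int = 5
    min_term_length: int = 3


# Stand-in for the shipped default set.
DEFAULT = ParameterSet(name="default")


class ParameterRegistry:
    """Keeps user-defined parameter sets alongside the default one."""

    def __init__(self):
        self._sets = {DEFAULT.name: DEFAULT}

    def create(self, name, **overrides):
        """Create a new set by overriding selected fields of the default."""
        if name in self._sets:
            raise ValueError(f"parameter set {name!r} already exists")
        new_set = replace(DEFAULT, name=name, **overrides)
        self._sets[name] = new_set
        return new_set

    def get(self, name=None):
        """Return the named set, or the default when no name is given."""
        if name is None:
            return DEFAULT
        return self._sets[name]
```

A user could then keep, for example, a set with summary_sentences=2 for short
abstracts and pick it per document, while every other document falls back to the
default set.
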