Data compression approach to monolingual GIRT task: An agnostic point of view

Research output: Contribution to conference (Paper)

Abstract

In this paper we present a data-compression-oriented approach to the information retrieval task on the GIRT scientific collection. For this purpose we use a recently proposed general scheme for context recognition and context classification of strings of characters (in particular texts) or other coded information. The method is based on data-compression techniques, and its key point is the computation of a suitable measure of remoteness between two strings of characters. This measure reflects only the distance in information between the two strings, i.e., the difference between the syntactic and structural elements of the sequences. The question we address is whether this information-theoretic measure of remoteness between two sequences can account for their semantic distance. We have focused in particular on the monolingual GIRT tasks for German and English, and we present the results here. It is worth stressing the generality and versatility of our information-theoretic method: it applies to any corpus of character strings, regardless of the coding behind them. For texts, it is therefore language independent, since it requires no linguistic knowledge.
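As a rough illustration of what a compression-based measure of remoteness between two strings can look like, the sketch below computes the normalized compression distance (NCD) with zlib and ranks documents by their distance to a query. NCD is a stand-in chosen for illustration and is not necessarily the measure used in the paper; the names `compressed_size`, `ncd`, and `rank_documents` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (level 9)."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two strings.

    Assumption: this is an illustrative stand-in, not the paper's measure.
    Small values mean the strings share much structure; values near 1
    mean the compressor finds little in common.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


def rank_documents(query: str, documents: list[str]) -> list[str]:
    """Return the documents sorted from least to most remote from the query."""
    return sorted(documents, key=lambda doc: ncd(query, doc))
```

Because the distance depends only on compressed lengths, nothing in the sketch refers to a particular language or tokenization, which mirrors the language independence stressed in the abstract.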
Original language: English
Publication status: Published - 2003
Event: 2003 Cross Language Evaluation Forum Workshop, CLEF 2003, co-located with the 7th European Conference on Digital Libraries, ECDL 2003 - Trondheim, Norway
Duration: 1 Jan 2003 → …

All Science Journal Classification (ASJC) codes

  • Computer Science (all)

Cite this

Alderuccio, D., Bordoni, L., & Loreto, V. (2003). Data compression approach to monolingual GIRT task: An agnostic point of view. Paper presented at 2003 Cross Language Evaluation Forum Workshop, CLEF 2003, co-located with the 7th European Conference on Digital Libraries, ECDL 2003, Trondheim, Norway.