SemEval

Academics
Disciplines: Natural Language Processing, Computational Linguistics, Semantics
Umbrella Organization: ACL-SIGLEX
Workshop Overview
Founded: 1998 (Senseval)
Latest: SemEval-2010 (ACL @ Uppsala, Sweden)
Upcoming: SemEval-2012 (*SEM @ Montreal, Canada)
History
Senseval-1 1998 @ Sussex
Senseval-2 2001 @ Toulouse
Senseval-3 2004 @ Barcelona
SemEval-2007 2007 @ Prague
SemEval-2010 2010 @ Uppsala
SemEval-2012 2012 @ Montreal

SemEval (Semantic Evaluation) is an ongoing series of evaluations of computational semantic analysis systems; it evolved from the Senseval word sense evaluation series. The evaluations are intended to explore the nature of meaning in language. While meaning is intuitive to humans, transferring those intuitions to computational analysis has proved elusive.

This series of evaluations provides a mechanism for characterizing more precisely what is necessary to compute meaning. As such, the evaluations provide an emergent mechanism for identifying the problems and solutions involved in computing with meaning. The exercises have evolved to articulate more of the dimensions involved in our use of language. They began with apparently simple attempts to identify word senses computationally, and have since grown to investigate the interrelationships among the elements in a sentence (e.g., semantic role labeling), relations between sentences (e.g., coreference), and the nature of what we are saying (semantic relations and sentiment analysis).

The purpose of the SemEval and Senseval exercises is to evaluate semantic analysis systems. "Semantic analysis" refers to a formal analysis of meaning, and "computational" refers to approaches that in principle support effective implementation[1].

The first three evaluations, Senseval-1 through Senseval-3, were focused on word sense disambiguation, each time growing in the number of languages offered in the tasks and in the number of participating teams. Beginning with the fourth workshop, SemEval-2007 (SemEval-1), the nature of the tasks evolved to include semantic analysis tasks outside of word sense disambiguation[2].

History

Early evaluation of algorithms for word sense disambiguation

From the earliest days, assessing the quality of word sense disambiguation (WSD) algorithms had been primarily a matter of intrinsic evaluation, and “almost no attempts had been made to evaluate embedded WSD components”[3]. Only around 2006 did extrinsic evaluations begin to provide some evidence for the value of WSD in end-user applications[4]. Until 1990 or so, discussions of the sense disambiguation task focused mainly on illustrative examples rather than comprehensive evaluation. The early 1990s saw the beginnings of more systematic and rigorous intrinsic evaluations, including more formal experimentation on small sets of ambiguous words[5].

Senseval to SemEval

In April 1997, a workshop entitled Tagging with Lexical Semantics: Why, What, and How? was held in conjunction with the Conference on Applied Natural Language Processing[6]. At the time, there was a clear recognition that manually annotated corpora had revolutionized other areas of NLP, such as part-of-speech tagging and parsing, and that corpus-driven approaches had the potential to revolutionize automatic semantic analysis as well[7]. Kilgarriff recalled that there was “a high degree of consensus that the field needed evaluation,” and several practical proposals by Resnik and Yarowsky kicked off a discussion that led to the creation of the Senseval evaluation exercises.[8]

List of Senseval and SemEval Workshops

SemEval Workshop framework

The framework of the SemEval/Senseval evaluation workshops emulates the Message Understanding Conferences (MUCs) and other evaluation workshops run by ARPA (the Advanced Research Projects Agency, later renamed the Defense Advanced Research Projects Agency, DARPA).

Stages of SemEval/Senseval evaluation workshops[10]

  1. First, all likely participants were invited to express their interest and to participate in the exercise design.
  2. A timetable leading to a final workshop was worked out.
  3. A plan for selecting evaluation materials was agreed upon.
  4. 'Gold standards' for the individual tasks were acquired. Human annotations were often taken as the gold standard against which the precision and recall of computer systems were measured; these gold standards are what the computational systems strive towards. (In WSD tasks, human annotators were given the task of producing a set of correct WSD answers, i.e. the correct sense for a given word in a given context.)
  5. The gold standard materials, without answers, were released to participants, who then had a short time to run their programs over them and return their sets of answers to the organizers.
  6. The organizers then scored the answers, and the scores were announced and discussed at a workshop.
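
The scoring step can be illustrated with a minimal sketch. It assumes that each test instance has a set of acceptable gold senses and that a system may decline to answer some instances; the instance identifiers, sense labels, and the `score` function are hypothetical and are not taken from the workshops' actual scoring software, whose schemes may differ.

```python
from typing import Dict, Set, Tuple


def score(gold: Dict[str, Set[str]], answers: Dict[str, str]) -> Tuple[float, float]:
    """Return (precision, recall) of system answers against a gold standard.

    precision = correct answers / instances the system attempted
    recall    = correct answers / all instances in the gold standard
    """
    # Only answers for instances that exist in the gold standard are counted.
    attempted = [inst for inst in answers if inst in gold]
    correct = sum(1 for inst in attempted if answers[inst] in gold[inst])
    precision = correct / len(attempted) if attempted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical gold standard: four occurrences of the word "bank".
gold = {
    "bank.001": {"bank%finance"},
    "bank.002": {"bank%river"},
    "bank.003": {"bank%finance"},
    "bank.004": {"bank%river"},
}

# Hypothetical system output: one wrong answer, one instance left unanswered.
answers = {
    "bank.001": "bank%finance",
    "bank.002": "bank%finance",
    "bank.003": "bank%finance",
}

precision, recall = score(gold, answers)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

Separating the two measures matters because a system that answers only the instances it is confident about can reach high precision while its recall stays low.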

Semantic evaluation tasks

Senseval-1 and Senseval-2 focused on evaluating WSD systems for major languages for which corpora and computerized dictionaries were available. Senseval-3 looked beyond lexemes and began to evaluate systems addressing wider areas of semantics, such as semantic roles (technically known as theta roles in formal semantics) and logic form transformation (in which the semantics of phrases, clauses or sentences is commonly represented in first-order logic forms); it also explored the performance of semantic analysis in machine translation.

As the range of computational semantic systems grew beyond the coverage of WSD, Senseval evolved into SemEval, in which more aspects of computational semantic systems were evaluated. The tables below (1) reflect the growth of the workshops from Senseval to SemEval and (2) give an overview of which areas of computational semantics were evaluated across the Senseval/SemEval workshops.

Overview of Issues in Semantic Analysis

The SemEval exercises provide a mechanism for examining issues in the semantic analysis of texts. The topics of interest fall short of the logical rigor found in formal computational semantics; instead, they attempt to identify and characterize the kinds of issues relevant to human understanding of language. The primary goal is to replicate human processing by means of computer systems. The tasks (shown below) are developed by individuals and groups to deal with identifiable issues as they take on some concrete form.

The first major area in semantic analysis is the identification of the intended meaning at the word level (taken to include idiomatic expressions). This is word-sense disambiguation (a concept that is evolving away from the notion that words have discrete senses and towards the view that words are characterized by the ways in which they are used, i.e., their contexts). The tasks in this area include lexical sample and all-words disambiguation, multi- and cross-lingual disambiguation, and lexical substitution. Given the difficulties of identifying word senses, other tasks relevant to this topic include word-sense induction, subcategorization acquisition, and evaluation of lexical resources.
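
To make the lexical sample setting concrete, the following minimal sketch picks, for one target word, the sense whose dictionary gloss shares the most words with the surrounding context. This is a simplified gloss-overlap heuristic in the style of Lesk, chosen here only for illustration; it is not a method prescribed by the tasks. The sense inventory, glosses, stopword list, and function names are all hypothetical.

```python
from typing import Dict, Set

# Hypothetical two-sense inventory for the target word "bank".
# The glosses are invented for illustration, not taken from a real dictionary.
SENSES: Dict[str, str] = {
    "bank%finance": "an institution that accepts deposits and lends money",
    "bank%river": "sloping land beside a river or lake",
}

# A small, assumed stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "and", "the", "that", "of", "or", "to", "on", "in", "at"}


def content_words(text: str) -> Set[str]:
    """Lowercase the text, strip punctuation, and drop stopwords."""
    words = {w.strip(".,;:!?").lower() for w in text.split()}
    return words - STOPWORDS - {""}


def disambiguate(context: str, senses: Dict[str, str]) -> str:
    """Return the sense whose gloss overlaps most with the context.

    Ties are resolved in favour of the sense listed first in the inventory.
    """
    ctx = content_words(context)
    return max(senses, key=lambda s: len(ctx & content_words(senses[s])))


chosen_river = disambiguate("They walked along the bank of the river.", SENSES)
chosen_money = disambiguate("She went to the bank to deposit money.", SENSES)
print(chosen_river, chosen_money)
```

The sketch treats senses as a fixed, discrete list, which is exactly the assumption that the evolving view of word senses calls into question.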

The second major area in semantic analysis is the understanding of how different sentence and textual elements fit together. Tasks in this area include semantic role labeling, semantic relation analysis, and coreference resolution. Other tasks in this area look at more specialized issues of semantic analysis, such as temporal information processing, metonymy resolution, and sentiment analysis. The tasks in this area have many potential applications, such as information extraction, question answering, document summarization, machine translation, construction of thesauri and semantic networks, language modeling, paraphrasing, and recognizing textual entailment. In each of these potential applications, the contribution that these types of semantic analysis can make remains the most outstanding research issue.

Senseval and SemEval tasks overview

| Workshop | No. of Tasks | Areas of Study | Languages of Data Evaluated |
|---|---|---|---|
| Senseval-1 | 3 | Word Sense Disambiguation (WSD) - Lexical Sample WSD tasks | English, French, Italian |
| Senseval-2 | 12 | Word Sense Disambiguation (WSD) - Lexical Sample, All Words, Translation WSD tasks | Czech, Dutch, English, Estonian, Basque, Chinese, Danish, Italian, Japanese, Korean, Spanish, Swedish |
| Senseval-3 | 16 (including 2 cancelled tasks) | Logic Form Transformation, Machine Translation (MT) Evaluation, Semantic Role Labelling, WSD | Basque, Catalan, Chinese, English, Italian, Romanian, Spanish |
| SemEval-2007 | 19 (including 1 cancelled task) | Cross-lingual, Frame Extraction, Information Extraction, Lexical Substitution, Lexical Sample, Metonymy, Semantic Annotation, Semantic Relations, Semantic Role Labelling, Sentiment Analysis, Time Expression, WSD | Arabic, Catalan, Chinese, English, Spanish, Turkish |
| SemEval-2010 | 18 (including 1 cancelled task) | Coreference, Cross-lingual, Ellipsis, Information Extraction, Lexical Substitution, Metonymy, Noun Compounds, Parsing, Semantic Relations, Semantic Role Labeling, Sentiment Analysis, Textual Entailment, Time Expressions, WSD | Catalan, Chinese, Dutch, English, French, German, Italian, Japanese, Spanish |

Areas of evaluation

The major tasks in semantic evaluation include the following areas of natural language processing. This list is expected to grow as the field progresses[11]. The following table shows the areas of study that were involved in Senseval-1 through SemEval-2010:

Areas of Study Senseval-1 Senseval-2 Senseval-3 SemEval-2007 SemEval-2010
Coreference Resolution
Multi-lingual or Cross-lingual Lexical Substitution
Ellipsis
Keyphrase Extraction (Information Extraction)
Metonymy (Information Extraction)
Noun Compounds (Information Extraction)
Semantic Relation Identification
Semantic Role Labeling
Sentiment Analysis
Time Expression
Textual Entailment
Word sense disambiguation (Lexical Sample)
Word sense disambiguation (All-Words)
Word sense induction

References

  1. ^ Blackburn, P., and Bos, J. (2005), Representation and Inference for Natural Language: A First Course in Computational Semantics, CSLI Publications. ISBN 1-57586-496-7.
  2. ^ Navigli, R. Word Sense Disambiguation: a Survey. ACM Computing Surveys, 41(2), ACM Press, 2009, pp. 1-69.
  3. ^ Palmer, M., Ng, H.T., & Dang, H.T. (2006), Evaluation of WSD systems, in Eneko Agirre & Phil Edmonds (eds.), Word Sense Disambiguation: Algorithms and Applications, Text, Speech and Language Technology, vol. 33. Dordrecht: Springer, 75–106.
  4. ^ Resnik, P. (2006), WSD in NLP applications, in Eneko Agirre & Phil Edmonds (eds.), Word Sense Disambiguation: Algorithms and Applications. Dordrecht: Springer, 299–338.
  5. ^ Yarowsky, D. (1992), Word-sense disambiguation using statistical models of Roget’s categories trained on large corpora. Proceedings of the 14th Conference on Computational Linguistics, 454–60. http://dx.doi.org/10.3115/992133.992140
  6. ^ Palmer, M., & Light, M. (1999), ACL SIGLEX workshop on tagging text with lexical semantics: what, why, and how? Natural Language Engineering 5(2): i–iv.
  7. ^ Ng, H.T. (1997), Getting serious about word sense disambiguation. Proceedings of the ACL SIGLEX Workshop on Tagging Text with Lexical Semantics: Why, What, and How? 1–7.
  8. ^ Resnik, P., & Lin, J. (2010), Evaluation of NLP Systems, in Alexander Clark, Chris Fox, & Shalom Lappin (eds.), The Handbook of Computational Linguistics and Natural Language Processing. Wiley-Blackwell. 11:271
  9. ^ Language Resources and Evaluation Volume 43, Number 2
  10. ^ Kilgarriff, A. (1998), SENSEVAL: An Exercise in Evaluating Word Sense Disambiguation Programs. Proceedings of LREC, Granada, May 1998, 581–588.
  11. ^ SemEval Portal (n.d.). In ACLwiki. Retrieved August 12, 2010 from http://aclweb.org/aclwiki/index.php?title=SemEval_Portal