SENSEVAL 3 tasks

Senseval 3

March 2004

Evaluation exercises for Word Sense Disambiguation

Organized by ACL-SIGLEX

Tasks

01. English all words
02. Italian all words
03. Basque lexical sample
04. Catalan lexical sample
05. Chinese lexical sample
06. English lexical sample
07. Italian lexical sample
08. Romanian lexical sample
09. Spanish lexical sample
10. Automatic subcategorization acquisition
11. Multilingual lexical sample
12. WSD of WordNet glosses
13. Semantic Roles
14. Logic Forms
15. Swedish lexical sample
16. Semantic roles for Swedish

The figures next to each task refer to the number of teams who responded to the call for interest in participation. Senseval-3 is still open to all. The call for participation will come out in February 2004.

English all words [64 teams]

As in Senseval-2, we will tag approximately 5000 words of coherent Penn Treebank text with WordNet 1.7.1 tags. We will tag all of the predicating words and the head words of their arguments, and as many adjectives and adverbs as we can. Tagging will be double-blind, with adjudication.

Coordinator: Martha Palmer mpalmer@cis.upenn.edu

Italian all words [7 teams]

In addition to the lexical sample task, we propose an "all words" task for Italian. Each participant will be provided with a relatively small set of about 5000 words extracted from the Italian Treebank. The content words (nouns, verbs, adjectives, and a small set of proper nouns) will be semantically tagged according to the sense repository of ItalWordNet. Participants in the Italian all words task can obtain "ItalWordNet for Senseval-3" from ELDA (Evaluations and Language resources Distribution Agency) by contacting Ms Valérie Mapelli at mapelli@elda.fr, who will provide information on the licensing and delivery procedure.

Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)

Basque lexical sample [8 teams]

We propose a "Lexical-Sample" task for Basque in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75+15*senses+7*multiwords) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 40 words. The test set will consist of one third of 75+15*senses+7*multiwords. We target two types of participants: supervised systems (not using unlabelled data) and semi-supervised systems (those exploiting the unlabelled data), but unsupervised systems can of course also participate. The sense inventory will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with other lexical-sample tasks (Catalan, English, Italian, Romanian, Spanish) in order to share around 10 of the target words.
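
As a rough illustration of the sizing rule above, the following Python sketch computes the number of examples for one target word and splits it into two thirds for training and one third for testing. The function name, the reading of the formula as applying per target word, and the integer rounding are assumptions made for this example; the task description does not say how fractions are rounded.

```python
def example_counts(senses, multiwords=0, multiword_weight=7):
    """Return (total, labelled, test) example counts for one target word.

    Implements total = 75 + 15*senses + multiword_weight*multiwords.
    The default weight of 7 is the one quoted for this task; the
    rounding (floor for the labelled part) is an assumption.
    """
    if senses < 1 or multiwords < 0:
        raise ValueError("senses must be >= 1 and multiwords >= 0")
    total = 75 + 15 * senses + multiword_weight * multiwords
    labelled = (2 * total) // 3   # two thirds go to the labelled set
    test = total - labelled       # the rest forms the test set
    return total, labelled, test


# A word with 3 senses and no multiword expressions:
print(example_counts(3))  # (120, 80, 40)
```

The `multiword_weight` parameter is exposed only because the lexical-sample tasks on this page quote slightly different variants of the formula.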

Coordinator: Eneko Agirre eneko@si.ehu.es

Catalan lexical sample [8 teams]

We propose a "Lexical-Sample" task for Catalan in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75+15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will consist of one third of 75+15*#senses. We target two types of participants: supervised systems (not using unlabelled data) and semi-supervised systems (those exploiting the unlabelled data), but unsupervised systems can of course also participate. The sense inventory, which is specially developed for the task, will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with other lexical-sample tasks (Basque, English, Italian, Romanian, Spanish) in order to share around 10 of the target words.

Coordinators:
Lluís Màrquez (lluism@lsi.upc.es),
M. Antonia Marti (amarti@ub.edu),
Mariona Taule (mtaule@uoc.edu)

Chinese lexical sample [16 teams]

The mainland Chinese lexical sample task will consist of three sets of data: dictionary, training data, and test data. The dictionary will contain entries for 20 different Chinese words. For each word, several senses will be defined based on the HowNet knowledge base. For each sense, the dictionary entry will list an id for the sense, a part of speech tag, a definition, and an English translation, as well as some additional information regarding the sense distinctions. Training data will consist of 20-100 examples per word, with more examples for words with a larger number of senses. Two sets of training data will be provided: one with part of speech tagging information included, and one without. A part of speech tagging system will also be provided. Evaluation data will consist of about half the number of examples in the training data.
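
To make the shape of the dictionary concrete, here is a minimal Python sketch of one entry as described above. The class names, field names, and placeholder values are hypothetical; the actual distribution format is not specified on this page.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SenseEntry:
    sense_id: str      # an id for the sense
    pos: str           # part of speech tag
    definition: str    # definition (based on HowNet, per the task text)
    english: str       # English translation
    notes: str = ""    # additional information on sense distinctions


@dataclass
class DictionaryEntry:
    word: str
    senses: List[SenseEntry] = field(default_factory=list)


def expected_test_size(training_examples):
    """Evaluation data is about half the size of the training data."""
    return training_examples // 2


# Placeholder values only; these are not taken from the real dictionary.
entry = DictionaryEntry(
    word="<target word>",
    senses=[
        SenseEntry("s1", "N", "<definition 1>", "<translation 1>"),
        SenseEntry("s2", "V", "<definition 2>", "<translation 2>"),
    ],
)
print(len(entry.senses), expected_test_size(60))
```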

Coordinator:
PengYuan Liu, pyliu@mtlab.hit.edu.cn

English lexical sample [65 teams]

The goal of this task is to create a framework for the evaluation of systems that perform Word Sense Disambiguation. The data will be collected via the Open Mind Word Expert (OMWE) interface. To ensure reliability, we collect at least two tags per item, and conduct inter-tagger agreement and replicability tests. Previous evaluations have shown the high quality and usefulness of the OMWE data. By the time Senseval-3 takes place, we expect to have enough data for about 60 ambiguous nouns, adjectives, and verbs. Part of the test data will be created by lexicographers from the Department of Linguistics at UNT. Another part of the test data will be extracted from the sense-tagged corpus collected over the Web. We will use WordNet 1.7.1 as the sense inventory for nouns and adjectives, and Wordsmyth for verbs. We will provide sense maps to enable both fine-grained and coarse-grained evaluations.
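
With two tags per item, inter-tagger agreement can be estimated in a few lines. The sketch below computes raw agreement and Cohen's kappa for two taggers; the choice of these two measures is an assumption, since this page does not say which agreement statistic is used, and the sense labels are made up for the example.

```python
from collections import Counter


def observed_agreement(tags_a, tags_b):
    """Fraction of items on which the two taggers chose the same sense."""
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("need two non-empty tag lists of equal length")
    return sum(a == b for a, b in zip(tags_a, tags_b)) / len(tags_a)


def cohen_kappa(tags_a, tags_b):
    """Agreement corrected for chance, from each tagger's label distribution."""
    n = len(tags_a)
    p_o = observed_agreement(tags_a, tags_b)
    count_a, count_b = Counter(tags_a), Counter(tags_b)
    # Chance agreement: probability both taggers pick the same label at random.
    p_e = sum(count_a[t] * count_b[t] for t in count_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both taggers always used the same single label
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical sense labels for four items.
tagger_1 = ["s1", "s1", "s2", "s2"]
tagger_2 = ["s1", "s2", "s2", "s2"]
print(observed_agreement(tagger_1, tagger_2))  # 0.75
print(cohen_kappa(tagger_1, tagger_2))         # 0.5
```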

A mapping between Wordsmyth and WordNet verb entries is now available, and it is included in the English lexical sample training/test data distribution.

Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Adam Kilgarriff, Adam.Kilgarriff@itri.brighton.ac.uk
Tim Chklovski, timc@mit.edu

Italian lexical sample [11 teams]

We propose a "Lexical-Sample" task for Italian in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75+15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will consist of one third of 75+15*#senses. We target two types of participants: supervised systems (not using unlabelled data) and semi-supervised systems (those exploiting the unlabelled data), but unsupervised systems can of course also participate. The sense inventory, called "Italian MultiWordNet for Senseval-3", has been specially developed for the task. This task will be coordinated with other lexical-sample tasks (Basque, English, Catalan, Romanian, Spanish) in order to share around 10 of the target words. Participants in the Italian Lexical Sample task can get "Italian MultiWordNet for Senseval-3" for free by contacting Alessandro Vallin (vallin@itc.it), who will send the license agreement form and the information needed to download the resource.

Coordinators:
Nicoletta Calzolari (ILC-CNR, Pisa, Italy - glottolo@ilc.cnr.it)
Bernardo Magnini (ITC-irst, Trento, Italy - magnini@itc.it)

Romanian lexical sample [8 teams]

We propose a lexical sample task for Senseval-3 that addresses the Romanian language. We will select about 50 words, covering all open class parts of speech, with various degrees of ambiguity, and for each such word collect a set of examples from a large Romanian corpus. The number of examples per word will be determined using the 15n+10m+75 formula used during Senseval-1 and Senseval-2 (n = number of senses, m = number of multi-word expressions). The senses and multi-word expressions for each ambiguous word will be taken from the new Romanian WordNet, or DEX (a widely recognized dictionary of the Romanian language). The data will be collected via the Open Mind Word Expert (Romanian edition). A comparatively very large set of unlabelled examples (ten times more, when possible) will also be provided. This task will be coordinated with other lexical-sample tasks (Basque, Catalan, English, Italian, Spanish) in order to share around 10 of the target words.

Coordinators:
Rada Mihalcea, rada@cs.unt.edu
Vivi Nastase, vnastase@site.uottawa.ca
Dan Tufis, tufis@racai.ro
Tim Chklovski, timc@mit.edu

Spanish lexical sample [18 teams]

[webpage]

We propose a "Lexical-Sample" task for Spanish in order to evaluate supervised and semi-supervised learning systems for WSD. Each participant will be provided with a relatively small set of labelled examples (two thirds of 75+15*#senses) and a comparatively very large set of unlabelled examples (ten times more, when possible) for around 45 words. The test set will consist of one third of 75+15*#senses. We target two types of participants: supervised systems (not using unlabelled data) and semi-supervised systems (those exploiting the unlabelled data), but unsupervised systems can of course also participate. The sense inventory, which is specially developed for the task, will be manually linked to WordNet 1.6 (automatic links to WordNet 1.7 will also be provided). This task will be coordinated with other lexical-sample tasks (Basque, Catalan, English, Italian, Romanian) in order to share around 10 of the target words.

Coordinators:
Lluís Màrquez (lluism@lsi.upc.es),
M. Antonia Marti (amarti@ub.edu),
Mariona Taule (mtaule@uoc.edu)

Automatic subcategorization acquisition [35 teams]

[webpage]

This task involves evaluating word sense disambiguation (WSD) systems in the context of automatic subcategorization acquisition. The task will be restricted to a set of 30 verbs. These are "hard" verbs: high in frequency and with multiple senses. Participants will be given the list of verbs in advance to allow a training phase (no training data will be made available). We will provide the test corpus, which will contain around 1000 instances of each verb; participants will be expected to annotate these with WordNet 1.7.1 senses. After receiving the sense-annotated data, we will map the detected WordNet senses to our senses, which are based on broad Levin-style verb classes. We will feed the sense-annotated data from each system to Anna Korhonen's subcategorization acquisition software. The acquired frames will be evaluated against manually obtained gold standard frames, which will yield a ranking of the WSD systems.
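
The last step, comparing acquired frames with gold standard frames and ranking the systems, might look roughly like the Python sketch below. It assumes a set-based precision/recall/F-measure comparison, which this page does not confirm as the actual measure, and the frame labels and system names are invented for illustration.

```python
def frame_scores(acquired, gold):
    """Precision, recall and F-measure of acquired frames against gold frames."""
    acquired, gold = set(acquired), set(gold)
    correct = len(acquired & gold)
    precision = correct / len(acquired) if acquired else 0.0
    recall = correct / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


def rank_systems(frames_by_system, gold):
    """Order system names by F-measure, best first (assumed ranking criterion)."""
    return sorted(
        frames_by_system,
        key=lambda name: frame_scores(frames_by_system[name], gold)[2],
        reverse=True,
    )


# Hypothetical frame labels for a single verb.
gold_frames = {"NP", "NP_PP", "SCOMP"}
frames_by_system = {
    "system_B": {"NP", "ADJP", "PP", "PART"},
    "system_A": {"NP", "NP_PP"},
}
print(rank_systems(frames_by_system, gold_frames))  # ['system_A', 'system_B']
```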

Coordinators:
Judita Preiss (Judita.Preiss@cl.cam.ac.uk)
Anna Korhonen (Anna.Korhonen@cl.cam.ac.uk)

Multilingual lexical sample [23 teams]

The goal of this task is to create a framework for the evaluation of systems that perform Machine Translation, with a focus on the translation of ambiguous words. The task will be very similar to the lexical sample task, except that rather than using the sense inventory from a dictionary we will follow the suggestion of Resnik and Yarowsky and use the translations of the target words into a second language as the "inventory". The contexts will be in English, and the tags for the target words will be their translations in a second language. We plan to select words with various degrees of "interlingual-ambiguity", to create a complete picture of the various problems that may appear in this task. At the moment, we plan on two pairs of languages, English-French, and English-Hindi, with an estimated number of about 50 ambiguous words per language pair. The data will be collected via the Open Mind Word Expert (bilingual edition).
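
The idea of using translations as the sense "inventory" can be pictured with a small Python sketch: each instance is an English context whose tag is the translation of the target word. The class, field names, example sentences, and the exact-match scoring are assumptions made for illustration and are not taken from the task data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationInstance:
    context: str      # English context containing the target word
    target: str       # the ambiguous English word
    translation: str  # gold tag: its translation in the second language


def accuracy(gold, predicted):
    """Share of instances whose predicted translation equals the gold tag."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("need one prediction per gold instance")
    hits = sum(g.translation == p for g, p in zip(gold, predicted))
    return hits / len(gold)


# Illustrative English-French instances for the word "bank".
gold = [
    TranslationInstance("She deposited the cheque at the bank.", "bank", "banque"),
    TranslationInstance("They had a picnic on the bank of the river.", "bank", "rive"),
]
print(accuracy(gold, ["banque", "banque"]))  # 0.5
```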

Coordinators:
Tim Chklovski, timc@mit.edu
Rada Mihalcea, rada@cs.unt.edu
Ted Pedersen, tpederse@d.umn.edu
Amruta Purandare, pura0010@d.umn.edu

Word-Sense Disambiguation of WordNet Glosses [36 teams]

[webpage]

Trial data: available from the task webpage

In connection with WordNet 2.0 (George Miller et al.) and eXtended WordNet (XWN, Dan Moldovan et al.), a large number of the WordNet glosses are being hand-tagged. Each content word (noun, verb, adjective, and adverb) is being labelled with its WordNet sense. This manual effort is time-consuming and labour-intensive. The Senseval-3 task is to perform this tagging automatically, using all hand-tagged glosses from XWN as the test set, with the hand-tagging also serving as the gold standard for evaluation. The task will be performed as an "all-words" task, except that no context will be provided. However, it is expected that participants will make use of additional WordNet information (synset, the WordNet hierarchy, and other WordNet relations) in their disambiguation. This task is intended to promote the exploitation of ordinary dictionary definitions in machine-readable dictionaries.

Coordinator: Ken Litkowski (ken@clres.com)

Automatic Labeling of Semantic Roles [36 teams]

[webpage]

Trial data: available from the task webpage

Word-sense disambiguation has frequently been criticized as a task in search of a reason. Heretofore, the focus of disambiguation has been on the sense inventory, and it has not examined the major reason why we would have lexical knowledge bases: how the meanings would be represented and thus made available for use in natural language processing applications. An important baseline study for automatic labeling of semantic roles (following the FrameNet paradigm) has recently appeared in the literature ("Automatic Labeling of Semantic Roles" by Daniel Gildea and Daniel Jurafsky). The FrameNet project has put together a body of hand-labeled data and this study has put together a set of suitable metrics for evaluating the performance of an automatic system. The proposed Senseval-3 task would call for the development of systems to meet the same objectives as the Gildea and Jurafsky study. The data for this task would be a sample of the FrameNet hand-annotated data. Evaluation of systems would follow the metrics of the Gildea and Jurafsky study.

Coordinator: Ken Litkowski (ken@clres.com)

Identification of Logic Forms in English [26 teams]

[webpage] [mailing list]

Trial data: available from the task webpage

Automated reasoning is a major goal of humankind, but little attention has lately been paid to the task of automatically creating reliable logic forms. Natural language based representations are more powerful when predicates are disambiguated. This task is complementary to the mainstream task in Senseval. The goal is to transform English sentences into a first order logic notation. A predicate corresponds to each content word, conjunction, and preposition, and arguments have syntactic values. Guidelines and examples of logic forms will be provided to participants. The performance of the systems will be evaluated at sentence and predicate level, using precision and recall measures determined against the gold standard, which will consist of logic forms created by human annotators.
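
The two evaluation levels mentioned above can be sketched in Python as follows. The predicate strings are a made-up notation (the official guidelines define the real one), and both the multiset matching of predicates and the exact-match definition of sentence-level correctness are assumptions for this example.

```python
from collections import Counter


def predicate_precision_recall(system, gold):
    """Predicate-level precision and recall over parallel lists of sentences.

    Each sentence is a list of predicate strings; a system predicate counts
    as correct if the same string occurs in the gold logic form.
    """
    matched = produced = expected = 0
    for sys_preds, gold_preds in zip(system, gold):
        sys_counts, gold_counts = Counter(sys_preds), Counter(gold_preds)
        matched += sum((sys_counts & gold_counts).values())
        produced += sum(sys_counts.values())
        expected += sum(gold_counts.values())
    precision = matched / produced if produced else 0.0
    recall = matched / expected if expected else 0.0
    return precision, recall


def sentence_accuracy(system, gold):
    """Fraction of sentences whose logic form matches the gold one exactly."""
    if not gold:
        return 0.0
    exact = sum(Counter(s) == Counter(g) for s, g in zip(system, gold))
    return exact / len(gold)


# Hypothetical logic forms for two sentences.
gold_forms = [["dog(x1)", "bark(e1, x1)"], ["cat(x1)", "sleep(e1, x1)"]]
system_forms = [["dog(x1)", "bark(e1, x1)"], ["cat(x1)", "sleep(e1, x2)", "extra(x3)"]]
print(predicate_precision_recall(system_forms, gold_forms))  # (0.6, 0.75)
print(sentence_accuracy(system_forms, gold_forms))           # 0.5
```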

Coordinator: Vasile Rus (vasile@cs.iusb.edu)

Swedish lexical sample [4 teams] canceled?

A lexical sample task for Swedish, similar in spirit to the Swedish task organized for Senseval-2.

Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se

Identification of Semantic Roles in Swedish [2 teams] canceled?

We propose to organize a task based on "semantic roles", using labels such as "Agent", "Recipient", "Material", "Phenomenon", "Location", etc. This type of semantic role annotation requires syntactically tagged texts, which we are willing to provide from our treebank for the task (thus potential participants will use a uniform syntactic annotation).

Coordinator: Dimitrios Kokkinakis, Dimitrios.Kokkinakis@svenska.gu.se

Site maintained by Rada Mihalcea, hosted by University of North Texas