Resources and Tools – LaRI Group

Resources

Lexica

PAROLE-SIMPLE-CLIPS
It is a four-level general-purpose lexicon developed over three different projects. The kernel of the morphological and syntactic lexicons was built within the European project “Preparatory Action for Linguistic Resources Organisation for Language Engineering” (LE-PAROLE). The linguistic model and the core of the semantic lexicon were developed within the European project “Semantic Information for Multifunctional Plurilingual Lexica” (LE-SIMPLE). The phonological level of description and the extension of the lexical coverage were produced within the Italian project “Corpora e Lessici dell’Italiano Parlato e Scritto” (CLIPS). The lexicon comprises a total of 387,267 phonetic units, 53,044 morphological units (53,044 lemmas), 37,406 syntactic units (28,111 lemmas) and 28,346 semantic units (19,216 lemmas). It was encoded at the semantic level in full accordance with the international standards set out in the PAROLE-SIMPLE model and based on EAGLES. Syntactic and semantic encoding was carried out jointly with Thamus (Consortium for Multilingual Documentary Engineering), which is responsible for 25,000 additional entries.
SIMPLE LOD
It is the RDF serialization of all nouns extracted from the PAROLE-SIMPLE-CLIPS lexicon. Lexical entries are serialized in Lemon, while semantic relations are modeled according to the SIMPLE OWL.
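To give a rough idea of what a Lemon-serialized noun entry can look like, here is a minimal Python sketch that emits Turtle for one lexical entry. The Lemon namespace and the properties used (`lemon:canonicalForm`, `lemon:writtenRep`, `lemon:sense`) belong to the Lemon model; the base URI, the entry and sense identifiers, and the function name are hypothetical and do not reflect the actual SIMPLE LOD naming scheme.

```python
# Namespace of the Lemon model.
LEMON = "http://lemon-model.net/lemon#"
# Placeholder base URI (assumption): NOT the real SIMPLE LOD namespace.
BASE = "http://example.org/simple/"


def noun_entry_to_turtle(lemma: str, sense_ids: list[str]) -> str:
    """Serialize one noun as a Lemon lexical entry in Turtle syntax.

    Assumes a plain lemma (no quotes or backslashes that would need
    escaping) and at least one sense identifier.
    """
    if not sense_ids:
        raise ValueError("at least one sense identifier is required")
    entry = f"<{BASE}entry/{lemma}>"
    # One sense URI per identifier, comma-separated as Turtle object lists.
    senses = " ,\n        ".join(f"<{BASE}sense/{s}>" for s in sense_ids)
    lines = [
        f"@prefix lemon: <{LEMON}> .",
        "",
        f"{entry} a lemon:LexicalEntry ;",
        f'    lemon:canonicalForm [ lemon:writtenRep "{lemma}"@it ] ;',
        f"    lemon:sense {senses} .",
    ]
    return "\n".join(lines)


# Hypothetical example: the Italian noun "cane" with two invented sense IDs.
turtle = noun_entry_to_turtle("cane", ["cane_1", "cane_2"])
print(turtle)
```

In the published resource the sense nodes would additionally point to SIMPLE OWL concepts and relations; that part is omitted here because its exact shape is not described above.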
ItalWordNet LOD
– datahub: http://datahub.io/dataset/iwn
– ilc: http://www.languagelibrary.eu/owl/italWordNet15/schema/synset
GeodomainWordNet
– datahub: http://datahub.io/dataset/geodomainwn
– ilc for English: http://www.languagelibrary.eu/owl/geodomainWN/eng/geonames-synset
– ilc for Italian: http://www.languagelibrary.eu/owl/geodomainWN/ita/geonames-synset
GeoNames ontology concepts, with their labels and glosses in English and Italian, have been transformed into a WordNet-like resource and linked to the generic WordNets of both languages. This resource is published in RDF according to the W3C and Lemon schemas.
Sentiment Lexicon LOD
https://github.com/opener-project/public-sentiment-lexicons/tree/master/propagation_lexicons/it (in LMF format)
The Italian Sentiment Lexicon was developed in a semi-automated way from ItalWordNet starting from a list of 1,000 manually checked seeds. It contains 24,293 lexical entries annotated with positive/negative/neutral polarity.
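As an illustration of how polarity annotations in an LMF-style XML lexicon might be read, the following Python sketch parses a small inline sample with the standard library. The XML fragment is invented for the example: the element and attribute names (`LexicalEntry`, `Lemma`, `writtenForm`, `Sense`, `Sentiment`, `polarity`) are assumptions and may differ from the published files, so check them against the actual lexicon before reuse.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical LMF-style fragment. Element and attribute names are
# assumptions for illustration, not copied from the published lexicon.
SAMPLE_LMF = """
<LexicalResource>
  <Lexicon>
    <LexicalEntry id="e1">
      <Lemma writtenForm="bello"/>
      <Sense><Sentiment polarity="positive"/></Sense>
    </LexicalEntry>
    <LexicalEntry id="e2">
      <Lemma writtenForm="brutto"/>
      <Sense><Sentiment polarity="negative"/></Sense>
    </LexicalEntry>
    <LexicalEntry id="e3">
      <Lemma writtenForm="tavolo"/>
      <Sense><Sentiment polarity="neutral"/></Sense>
    </LexicalEntry>
  </Lexicon>
</LexicalResource>
"""


def polarity_by_lemma(xml_text: str) -> dict[str, str]:
    """Map each lemma to the polarity of its first annotated sense."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for entry in root.iter("LexicalEntry"):
        lemma = entry.find("Lemma")
        sentiment = entry.find("Sense/Sentiment")
        # Skip entries that lack a lemma or a sentiment annotation.
        if lemma is None or sentiment is None:
            continue
        result[lemma.get("writtenForm")] = sentiment.get("polarity")
    return result


polarities = polarity_by_lemma(SAMPLE_LMF)
counts = Counter(polarities.values())
print(polarities)
print(counts)
```

For a real file, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the lookup paths adjusted to the element names actually used.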

Domain Terminologies

FiscalDB
SindacDB
MARITERM

Ontologies

IMAG-Act
It is a cross-linguistic ontology of action. Using spoken corpora, 1,010 high-frequency action concepts have been identified and visually represented with prototypical scenes. The ontology allows cross-linguistic correspondences to be defined between verbs and actions in English, Italian, Chinese and Spanish. Because the identified action concepts are represented visually, IMAG-Act can potentially be extended to any language.

Tools

Lexical Databases

ItalWordNet (under maintenance)
It is an online query system for ItalWordNet (the Italian WordNet), an updated version of the Italian database of EuroWordNet. The ItalWordNet database was produced within the Italian national programme “SI-TAL”. It contains a total of 49,360 synsets.

Knowledge Extraction Tools

PANACEA WebServices
They are services developed within the European project “PANACEA” and hosted at ILC-CNR. They support the automatic construction of language resources and provide format converters, POS taggers, dependency parsers and lexicon acquisition tools (multiword and subcategorization extractors, lexicon mergers). Tutorials on using these services and composing workflows are available here.

(work in progress)