Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike
Research output: Chapter in Book/Report/Conference proceeding › Article in proceedings › Research › peer-review
Standard
Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike. / Jongejan, Bart; Dalianis, Hercules.
Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP. Vol. 1. Association for Computational Linguistics, 2009. p. 145-153.
RIS
TY - GEN
T1 - Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike
AU - Jongejan, Bart
AU - Dalianis, Hercules
N1 - Conference code: 47
PY - 2009
Y1 - 2009
N2 - We propose a method to automatically train lemmatization rules that handle prefix, infix and suffix changes to generate the lemma from the full form of a word. We explain how the lemmatization rules are created and how the lemmatizer works. We trained this lemmatizer on Danish, Dutch, English, German, Greek, Icelandic, Norwegian, Polish, Slovene and Swedish full form-lemma pairs respectively. We obtained significant improvements of 24 percent for Polish, 2.3 percent for Dutch, 1.5 percent for English, 1.2 percent for German and 1.0 percent for Swedish compared to plain suffix lemmatization using a suffix-only lemmatizer. Icelandic deteriorated with 1.9 percent. We also made an observation regarding the number of produced lemmatization rules as a function of the number of training pairs.
AB - We propose a method to automatically train lemmatization rules that handle prefix, infix and suffix changes to generate the lemma from the full form of a word. We explain how the lemmatization rules are created and how the lemmatizer works. We trained this lemmatizer on Danish, Dutch, English, German, Greek, Icelandic, Norwegian, Polish, Slovene and Swedish full form-lemma pairs respectively. We obtained significant improvements of 24 percent for Polish, 2.3 percent for Dutch, 1.5 percent for English, 1.2 percent for German and 1.0 percent for Swedish compared to plain suffix lemmatization using a suffix-only lemmatizer. Icelandic deteriorated with 1.9 percent. We also made an observation regarding the number of produced lemmatization rules as a function of the number of training pairs.
KW - Faculty of Humanities
KW - lemmatisering morfologi affiks
KW - lemmatization morphology affix
M3 - Article in proceedings
SN - 978-1-932432-61-9
VL - 1
SP - 145
EP - 153
BT - Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP
PB - Association for Computational Linguistics
Y2 - 2 August 2009 through 7 August 2009
ER -
ID: 14093025