TreeTagger - a language independent part-of-speech tagger
The TreeTagger is a tool for annotating text with part-of-speech and lemma 
information. It was developed within the TC project at the 
Institute for Computational Linguistics of the University of Stuttgart. The 
TreeTagger has been successfully used to tag German, English, French, Italian, 
Spanish, Bulgarian, Greek and Old French texts and is easily adaptable to other 
languages if a lexicon and a manually tagged training corpus are available. 
Sample output: 

    | word       | pos  | lemma      |
    | The        | DT   | the        |
    | TreeTagger | NP   | TreeTagger |
    | is         | VBZ  | be         |
    | easy       | JJ   | easy       |
    | to         | TO   | to         |
    | use        | VB   | use        |
    | .          | SENT | .          |

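
Output of this kind can be post-processed with standard command-line tools. The following is a minimal sketch, assuming the tagger prints one token per line as three tab-separated columns (word, tag, lemma). The helper name `word_tag_pairs` is hypothetical, and the input is written by hand so that the example runs without the tagger installed.

```shell
# Hypothetical helper: join tagger output into "word/TAG" pairs on one line.
# Assumes three tab-separated columns per token: word, tag, lemma.
word_tag_pairs() {
  awk -F '\t' 'NF == 3 { printf "%s%s/%s", sep, $1, $2; sep = " " }
               END     { print "" }'
}

# Hand-written input mirroring the first rows of the sample output.
printf 'The\tDT\tthe\nTreeTagger\tNP\tTreeTagger\nis\tVBZ\tbe\n' | word_tag_pairs
```

Lines that do not have exactly three fields are skipped, so stray empty lines in the input do not end up in the result.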
The tagger is described in the following two papers: 
Download
Executable code for Sparc workstations, Linux and Windows PCs 
and Macs as well as parameter files for English, German, Italian, Spanish, 
Bulgarian, French and Old French can be downloaded via the links below. 
The French and the Italian parameter files are provided by Achim 
Stein. 
The English parameter file was trained on the Penn Treebank and uses the English morphological database created 
by Karp, Schabes, Zaidel and Egedi. 
The Spanish parameter file was trained on the Spanish CRATER 
corpus and uses the Spanish lexicon of the CALLHOME corpus of the LDC. 
The Bulgarian parameter file was trained by Julien Nioche on the Bulgarian Treebank. It uses a UTF-8 
encoding. 
This software is freely available for research, education and evaluation. For 
commercial licenses and for licenses for the C programming interface, please contact 
Helmut Schmid (at FirstName.LastName@ims.uni-stuttgart.de). 
Please read the license terms before you download the software! By downloading 
the software, you agree to the terms stated there. 
The following steps are necessary to install the TreeTagger (see below for 
the Windows version): 
- Download the tagger package for your system (Sparc-Solaris, PC-Linux, Mac OS-X). 
- Download the tagging scripts into the same directory. 
- Download the parameter files for your system (Sparc-Solaris, PC, Mac). 
- Download the installation script install-tagger.sh. 
- Open a terminal window and run the installation script in the directory 
  where you have downloaded the files: 
   sh install-tagger.sh
- Make a test, e.g. 
   echo 'Hello world!' | cmd/tree-tagger-english
  or
   echo 'Das ist ein Test.' | cmd/tagger-chunker-german
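
The English test from the last step can also be checked mechanically. The sketch below assumes that the English wrapper script prints three tab-separated columns (word, tag, lemma) per token. `check_three_columns` is a hypothetical helper and not part of the TreeTagger distribution. The tagger is only invoked if cmd/tree-tagger-english exists in the current directory.

```shell
# Hypothetical helper: succeed only if the input has at least one line and
# every line has exactly three tab-separated fields (word, tag, lemma).
check_three_columns() {
  awk -F '\t' 'NF != 3 { bad = 1 } END { exit (bad || NR == 0) }'
}

# Run the smoke test only when the wrapper script from the installation
# steps is present; otherwise do nothing.
if [ -x cmd/tree-tagger-english ]; then
  if echo 'Hello world!' | cmd/tree-tagger-english | check_three_columns; then
    echo 'tagger output looks fine'
  else
    echo 'unexpected tagger output' >&2
  fi
fi
```

The check is deliberately limited to the plain tagger script. Output of the chunker scripts may contain additional lines and would need a different test.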
If you have difficulties with the installation, have a look at the installation 
hints (kindly provided by Joachim Wagner).

Parameter files for Sparc-Solaris and Mac OS-X (Latin1 character set) 
- English parameter file (3045 kByte, gzip compressed) 
- German parameter file (7012 kByte, gzip compressed) 
- small German parameter file (2415 kByte, gzip compressed) 
- French parameter file (2375 kByte, gzip compressed) 
- Italian parameter file (5484 kByte, gzip compressed) 
- Spanish parameter file (918 kByte, gzip compressed) 
- Bulgarian parameter file (603 kByte, gzip compressed) 
- German chunker parameter file (52 kByte, gzip compressed) 
  Note: The German tagger parameter file is needed as well.
- English chunker parameter file (82 kByte, gzip compressed) 
  Note: The English tagger parameter file is needed as well.

Parameter files for PC (Linux and Windows, Latin1 character set)
- English parameter file (2945 kByte, gzip compressed) 
- German parameter file (6642 kByte, gzip compressed) 
- small German parameter file (2340 kByte, gzip compressed) 
- French parameter file (2336 kByte, gzip compressed, information about this file) 
- Italian parameter file (3238 kByte, gzip compressed, information about this file) 
- Spanish parameter file (899 kByte, gzip compressed) 
- Bulgarian parameter file (579 kByte, gzip compressed) 
- German chunker parameter file (52 kByte, gzip compressed) 
  Note: The German tagger parameter file is needed as well.
- English chunker parameter file (82 kByte, gzip compressed) 
  Note: The English tagger parameter file is needed as well.

A Windows version of the TreeTagger is also available. The parameter files have 
to be downloaded separately.

Tagsets 
Here is some information about the tagsets used in the parameter files: 
- English (Penn-Treebank tagset) 
  The tagset used by the TreeTagger is a refinement of this tagset where the 
  second letter of the verb part-of-speech tags distinguishes between "be" 
  verbs (B), "have" verbs (H) and other verbs (V).
- German (in German) 
  
- French (in French) 
  
- Italian 
  
- Spanish 
  
- Bulgarian 
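
The verb-tag refinement described for the English tagset can be undone by rewriting the second letter of the verb tags to B. The following is a minimal sketch, assuming tab-separated word, tag and lemma columns and verb tags of the form VB, VH or VV followed by an optional D, G, N, P or Z. The name `to_penn_verb_tags` is hypothetical, and only verb tags are touched. Any other differences between the two tagsets are left as they are.

```shell
# Hypothetical helper: rewrite the second letter of refined verb tags
# (VB*, VH*, VV*) to B, leaving all other tags and columns unchanged.
to_penn_verb_tags() {
  awk -F '\t' -v OFS='\t' '
    $2 ~ /^V[BHV][DGNPZ]?$/ { $2 = "VB" substr($2, 3) }
    { print }'
}

# Hand-written input so the example runs without the tagger installed.
printf 'has\tVHZ\thave\nused\tVVN\tuse\neasy\tJJ\teasy\n' | to_penn_verb_tags
```

The mapping loses information, since "be", "have" and other verbs can no longer be told apart by tag afterwards. The lemma column still carries that distinction.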
  
Links