Punctuation #
Universal #
Definition from the UD website
Punctuation marks are non-alphabetical characters and character groups used in many languages to delimit linguistic units in printed text.
Punctuation is not taken to include logograms such as $, %, and §, which are instead tagged as SYM. (Hint: if it corresponds to a word that you pronounce, such as dollar or percent, it is SYM and not PUNCT.)
Spoken corpora contain symbols representing pauses, laughter and other sounds; we treat them as punctuation, too. In these cases it is not even required that all characters of the token be non-alphabetical. One can represent a pause using a special character such as #, or using a more descriptive code such as [:pause].
Examples
- Period: .
- Comma: ,
- Parentheses: ()
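The PUNCT/SYM distinction above can be sketched as a small rule of thumb. This is only an illustration, not part of the UD guidelines: the function name `guess_upos`, the `LOGOGRAMS` set and the `SPOKEN_MARKERS` set are assumptions made for the example, and a real corpus would define its own pause codes. The logograms are listed explicitly because Unicode itself classes some of them (for instance %) as punctuation, so character categories alone are not enough.

```python
import unicodedata

# Logograms named in the definition above; they are pronounced as words, so SYM.
LOGOGRAMS = {"$", "%", "§"}

# Hypothetical pause codes for a spoken corpus; a real corpus defines its own.
SPOKEN_MARKERS = {"#", "[:pause]"}


def guess_upos(token, spoken_markers=frozenset()):
    """Return "SYM", "PUNCT", or None when this rule of thumb does not apply."""
    if token in LOGOGRAMS:
        return "SYM"
    # Spoken-corpus markers count as punctuation even if they contain letters.
    if token in spoken_markers:
        return "PUNCT"
    # Unicode general categories starting with "P" cover punctuation characters.
    if token and all(unicodedata.category(ch).startswith("P") for ch in token):
        return "PUNCT"
    return None


for tok in [".", ",", "()", "$", "%", "[:pause]", "dollar"]:
    print(tok, guess_upos(tok, SPOKEN_MARKERS))
```

Tokens that return None (ordinary words, for example) fall outside this sketch and need the other UPOS definitions.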
French #
TODO
Overview #
Specific Pattern #
Haitian #
TODO