Welcome to Syntactic Universal Dependencies Guidelines (SUD) #
About Syntactic Universal Dependencies (SUD) #
SUD is an annotation scheme for syntactic dependency treebanks that has a nearly perfect degree of two-way convertibility with the Universal Dependencies scheme (UD). Unlike UD, it is based on syntactic criteria (favoring functional heads), and its relations are defined on distributional and functional bases. This web site centralizes the information needed to understand SUD annotation and to annotate from scratch. You can find the guidelines here.
An Example of SUD annotation #
To see more examples, head over to the universal SUD guidelines.
Data #
Descriptions #
In version 2.11 of SUD data, released in November 2022:
- 7 corpora are built in the SUD format (called Native SUD)
- 234 corpora are automatically converted to SUD from the corresponding UD data (version 2.11)
The full release SUD 2.11 contains 241 corpora. Note that UD 2.11 has 243 corpora, but two of them cannot be released in a SUD version because their CC licenses carry the ND (NoDerivatives) flag:
- UD_Japanese-Modern → License: CC BY-NC-ND 3.0
- UD_Portuguese-CINTIL → License: CC BY-NC-ND 4.0
Download all corpora #
Download the full set of 241 SUD corpora: sud-treebanks-v2.11.tgz.
Native SUD corpora #
The table below lists the 7 native SUD corpora. Note that each corresponding UD version is obtained by automatic conversion.
Corpus | Files | Grew-match |
---|---|---|
SUD_Beja-NSC | 2.11 – latest | 2.11 – latest |
🆕 SUD_Chinese-PatentChar | 2.11 – latest | 2.11 – latest |
SUD_French-GSD | 2.11 – latest | 2.11 – latest |
SUD_French-ParisStories | 2.11 – latest | 2.11 – latest |
SUD_French-Rhapsodie | 2.11 – latest | 2.11 – latest |
SUD_Naija-NSC | 2.11 – latest | 2.11 – latest |
🆕 SUD_Zaar-Autogramm | 2.11 – latest | 2.11 – latest |
Conversion from UD #
- 234 corpora of SUD 2.11 are converted from UD. The versions of the data and tools used are:
  - input data: version 2.11 of UD corpora
  - Grew conversion rules system: tag `v2.11` of the conversion system
  - tools: Grew version 1.10.0, libcaml-grew version 1.10.0 and libcaml-conll version 1.13.2
Access to each corpus #
In the table below, you can access the Grew-match query system for each corpus.
Corpus | Grew-match |
---|---|
🆕 Abaza-ATB | [ Query] [ Relations] |
Afrikaans-AfriBooms | [ Query] [ Relations] |
Akkadian-PISANDUB | [ Query] [ Relations] |
Akkadian-RIAO | [ Query] [ Relations] |
Akuntsu-TuDeT | [ Query] [ Relations] |
Albanian-TSA | [ Query] [ Relations] |
Amharic-ATT | [ Query] [ Relations] |
Ancient_Greek-Perseus | [ Query] [ Relations] |
Ancient_Greek-PROIEL | [ Query] [ Relations] |
Ancient_Hebrew-PTNK | [ Query] [ Relations] |
Apurina-UFPA | [ Query] [ Relations] |
Arabic-NYUAD | [ Query] [ Relations] |
Arabic-PADT | [ Query] [ Relations] |
Arabic-PUD | [ Query] [ Relations] |
Armenian-ArmTDP | [ Query] [ Relations] |
Armenian-BSUT | [ Query] [ Relations] |
Assyrian-AS | [ Query] [ Relations] |
Bambara-CRB | [ Query] [ Relations] |
Basque-BDT | [ Query] [ Relations] |
Beja-NSC (Native) | [ Query] [ Relations] |
Belarusian-HSE | [ Query] [ Relations] |
Bengali-BRU | [ Query] [ Relations] |
Bhojpuri-BHTB | [ Query] [ Relations] |
Breton-KEB | [ Query] [ Relations] |
Bulgarian-BTB | [ Query] [ Relations] |
Buryat-BDT | [ Query] [ Relations] |
Cantonese-HK | [ Query] [ Relations] |
Catalan-AnCora | [ Query] [ Relations] |
Cebuano-GJA | [ Query] [ Relations] |
Chinese-CFL | [ Query] [ Relations] |
Chinese-GSD | [ Query] [ Relations] |
Chinese-GSDSimp | [ Query] [ Relations] |
Chinese-HK | [ Query] [ Relations] |
🆕 Chinese-PatentChar | [ Query] [ Relations] |
Chinese-PUD | [ Query] [ Relations] |
Chukchi-HSE | [ Query] [ Relations] |
Classical_Chinese-Kyoto | [ Query] [ Relations] |
Coptic-Scriptorium | [ Query] [ Relations] |
Croatian-SET | [ Query] [ Relations] |
Czech-CAC | [ Query] [ Relations] |
Czech-CLTT | [ Query] [ Relations] |
Czech-FicTree | [ Query] [ Relations] |
Czech-PDT | [ Query] [ Relations] |
Czech-PUD | [ Query] [ Relations] |
Danish-DDT | [ Query] [ Relations] |
Dutch-Alpino | [ Query] [ Relations] |
Dutch-LassySmall | [ Query] [ Relations] |
English-Atis | [ Query] [ Relations] |
English-ESL | [ Query] [ Relations] |
English-EWT | [ Query] [ Relations] |
English-GUM | [ Query] [ Relations] |
English-GUMReddit | [ Query] [ Relations] |
English-LinES | [ Query] [ Relations] |
English-PUD | [ Query] [ Relations] |
English-Pronouns | [ Query] [ Relations] |
Erzya-JR | [ Query] [ Relations] |
Estonian-EDT | [ Query] [ Relations] |
Estonian-EWT | [ Query] [ Relations] |
Faroese-OFT | [ Query] [ Relations] |
Faroese-FarPaHC | [ Query] [ Relations] |
Finnish-FTB | [ Query] [ Relations] |
Finnish-PUD | [ Query] [ Relations] |
Finnish-TDT | [ Query] [ Relations] |
Finnish-OOD | [ Query] [ Relations] |
French-FQB | [ Query] [ Relations] |
French-FTB | [ Query] [ Relations] |
French-GSD (Native) | [ Query] [ Relations] |
French-ParTUT | [ Query] [ Relations] |
French-PUD | [ Query] [ Relations] |
French-Sequoia | [ Query] [ Relations] |
French-ParisStories (Native) | [ Query] [ Relations] |
French-Rhapsodie (Native) | [ Query] [ Relations] |
Frisian_Dutch-Fame | [ Query] [ Relations] |
Galician-CTG | [ Query] [ Relations] |
Galician-TreeGal | [ Query] [ Relations] |
German-GSD | [ Query] [ Relations] |
German-HDT | [ Query] [ Relations] |
German-LIT | [ Query] [ Relations] |
German-PUD | [ Query] [ Relations] |
Gothic-PROIEL | [ Query] [ Relations] |
Greek-GDT | [ Query] [ Relations] |
Guajajara-TuDeT | [ Query] [ Relations] |
Guarani-OldTuDeT | [ Query] [ Relations] |
Hebrew-HTB | [ Query] [ Relations] |
Hebrew-IAHLTwiki | [ Query] [ Relations] |
Hindi_English-HIENCS | [ Query] [ Relations] |
Hindi-HDTB | [ Query] [ Relations] |
Hindi-PUD | [ Query] [ Relations] |
Hittite-HitTB | [ Query] [ Relations] |
Hungarian-Szeged | [ Query] [ Relations] |
Icelandic-PUD | [ Query] [ Relations] |
Icelandic-Modern | [ Query] [ Relations] |
Icelandic-IcePaHC | [ Query] [ Relations] |
Indonesian-GSD | [ Query] [ Relations] |
Indonesian-PUD | [ Query] [ Relations] |
Indonesian-CSUI | [ Query] [ Relations] |
🆕 Irish-Cadhan | [ Query] [ Relations] |
Irish-IDT | [ Query] [ Relations] |
Irish-TwittIrish | [ Query] [ Relations] |
Italian-ISDT | [ Query] [ Relations] |
Italian-MarkIT | [ Query] [ Relations] |
Italian-ParTUT | [ Query] [ Relations] |
Italian-PoSTWITA | [ Query] [ Relations] |
Italian-TWITTIRO | [ Query] [ Relations] |
🆕 Italian-ParlaMint | [ Query] [ Relations] |
Italian-PUD | [ Query] [ Relations] |
Italian-Valico | [ Query] [ Relations] |
Italian-VIT | [ Query] [ Relations] |
Japanese-BCCWJ | [ Query] [ Relations] |
Japanese-BCCWJLUW | [ Query] [ Relations] |
Japanese-GSD | [ Query] [ Relations] |
Japanese-GSDLUW | [ Query] [ Relations] |
Japanese-Modern | [ Query] [ Relations] |
Japanese-PUD | [ Query] [ Relations] |
Japanese-PUDLUW | [ Query] [ Relations] |
Javanese-CSUI | [ Query] [ Relations] |
Kaapor-TuDeT | [ Query] [ Relations] |
Kangri-KDTB | [ Query] [ Relations] |
Karelian-KKPP | [ Query] [ Relations] |
Karo-TuDeT | [ Query] [ Relations] |
Kazakh-KTB | [ Query] [ Relations] |
Khunsari-AHA | [ Query] [ Relations] |
Kiche-IU | [ Query] [ Relations] |
Komi_Permyak-UH | [ Query] [ Relations] |
Komi_Zyrian-IKDP | [ Query] [ Relations] |
Komi_Zyrian-Lattice | [ Query] [ Relations] |
Korean-GSD | [ Query] [ Relations] |
Korean-Kaist | [ Query] [ Relations] |
Korean-PUD | [ Query] [ Relations] |
Kurmanji-MG | [ Query] [ Relations] |
Latin-ITTB | [ Query] [ Relations] |
Latin-LLCT | [ Query] [ Relations] |
Latin-Perseus | [ Query] [ Relations] |
Latin-PROIEL | [ Query] [ Relations] |
Latin-UDante | [ Query] [ Relations] |
Latvian-LVTB | [ Query] [ Relations] |
Ligurian-GLT | [ Query] [ Relations] |
Lithuanian-ALKSNIS | [ Query] [ Relations] |
Lithuanian-HSE | [ Query] [ Relations] |
Livvi-KKPP | [ Query] [ Relations] |
Low_Saxon-LSDC | [ Query] [ Relations] |
Madi-Jarawara | [ Query] [ Relations] |
Makurap-TuDeT | [ Query] [ Relations] |
🆕 Malayalam-UFA | [ Query] [ Relations] |
Maltese-MUDT | [ Query] [ Relations] |
Manx-Cadhan | [ Query] [ Relations] |
Marathi-UFAL | [ Query] [ Relations] |
Mbya_Guarani-Dooley | [ Query] [ Relations] |
Mbya_Guarani-Thomas | [ Query] [ Relations] |
Moksha-JR | [ Query] [ Relations] |
Munduruku-TuDeT | [ Query] [ Relations] |
Naija-NSC (Native) | [ Query] [ Relations] |
Nayini-AHA | [ Query] [ Relations] |
Neapolitan-RB | [ Query] [ Relations] |
🆕 Nheengatu-CompLin | [ Query] [ Relations] |
North_Sami-Giella | [ Query] [ Relations] |
Norwegian-Bokmaal | [ Query] [ Relations] |
Norwegian-Nynorsk | [ Query] [ Relations] |
Norwegian-NynorskLIA | [ Query] [ Relations] |
Old_Church_Slavonic-PROIEL | [ Query] [ Relations] |
Old_East_Slavic-Birchbark | [ Query] [ Relations] |
Old_East_Slavic-RNC | [ Query] [ Relations] |
🆕 Old_East_Slavic-Ruthenian | [ Query] [ Relations] |
Old_East_Slavic-TOROT | [ Query] [ Relations] |
Old_French-SRCMF | [ Query] [ Relations] |
Old_Russian-RNC | [ Query] [ Relations] |
Old_Russian-TOROT | [ Query] [ Relations] |
Old_Turkish-Tonqq | [ Query] [ Relations] |
Persian-Seraji | [ Query] [ Relations] |
Persian-PerDT | [ Query] [ Relations] |
Pomak-Philotis | [ Query] [ Relations] |
Polish-LFG | [ Query] [ Relations] |
Polish-PDB | [ Query] [ Relations] |
Polish-PUD | [ Query] [ Relations] |
Portuguese-Bosque | [ Query] [ Relations] |
Portuguese-GSD | [ Query] [ Relations] |
🆕 Portuguese-PetroGold | [ Query] [ Relations] |
Portuguese-PUD | [ Query] [ Relations] |
Romanian-ArT | [ Query] [ Relations] |
Romanian-Nonstandard | [ Query] [ Relations] |
Romanian-RRT | [ Query] [ Relations] |
Romanian-SiMoNERo | [ Query] [ Relations] |
Russian-GSD | [ Query] [ Relations] |
Russian-PUD | [ Query] [ Relations] |
Russian-SynTagRus | [ Query] [ Relations] |
Russian-Taiga | [ Query] [ Relations] |
Sanskrit-UFAL | [ Query] [ Relations] |
Sanskrit-Vedic | [ Query] [ Relations] |
Scottish_Gaelic-ARCOSG | [ Query] [ Relations] |
Serbian-SET | [ Query] [ Relations] |
🆕 Sinhala-STB | [ Query] [ Relations] |
Skolt_Sami-Giellagas | [ Query] [ Relations] |
Slovak-SNK | [ Query] [ Relations] |
Slovenian-SSJ | [ Query] [ Relations] |
Slovenian-SST | [ Query] [ Relations] |
Soi-AHA | [ Query] [ Relations] |
South_Levantine_Arabic-MADAR | [ Query] [ Relations] |
Spanish-AnCora | [ Query] [ Relations] |
Spanish-GSD | [ Query] [ Relations] |
Spanish-PUD | [ Query] [ Relations] |
Swedish-LinES | [ Query] [ Relations] |
Swedish-PUD | [ Query] [ Relations] |
Swedish_Sign_Language-SSLC | [ Query] [ Relations] |
Swedish-Talbanken | [ Query] [ Relations] |
Swiss_German-UZH | [ Query] [ Relations] |
Tagalog-TRG | [ Query] [ Relations] |
Tagalog-Ugnayan | [ Query] [ Relations] |
Tamil-TTB | [ Query] [ Relations] |
Tamil-MWTT | [ Query] [ Relations] |
Tatar-NMCTT | [ Query] [ Relations] |
Teko-TuDeT | [ Query] [ Relations] |
Telugu-MTG | [ Query] [ Relations] |
Thai-PUD | [ Query] [ Relations] |
Tupinamba-TuDeT | [ Query] [ Relations] |
Turkish-Atis | [ Query] [ Relations] |
Turkish-BOUN | [ Query] [ Relations] |
Turkish-FrameNet | [ Query] [ Relations] |
Turkish-GB | [ Query] [ Relations] |
Turkish-IMST | [ Query] [ Relations] |
Turkish-Kenet | [ Query] [ Relations] |
Turkish-PUD | [ Query] [ Relations] |
Turkish-Penn | [ Query] [ Relations] |
Turkish-Tourism | [ Query] [ Relations] |
Turkish_German-SAGT | [ Query] [ Relations] |
Ukrainian-IU | [ Query] [ Relations] |
Umbrian-IKUVINA | [ Query] [ Relations] |
Upper_Sorbian-UFAL | [ Query] [ Relations] |
Urdu-UDTB | [ Query] [ Relations] |
Uyghur-UDT | [ Query] [ Relations] |
Vietnamese-VTB | [ Query] [ Relations] |
Warlpiri-UFAL | [ Query] [ Relations] |
Welsh-CCG | [ Query] [ Relations] |
Western_Armenian-ArmTDP | [ Query] [ Relations] |
🆕 Western_Sierra_Puebla_Nahuatl-ITML | [ Query] [ Relations] |
Wolof-WTB | [ Query] [ Relations] |
🆕 Xavante-XDT | [ Query] [ Relations] |
Xibe-XDT | [ Query] [ Relations] |
Yakut-YKTDT | [ Query] [ Relations] |
Yoruba-YTB | [ Query] [ Relations] |
Yupik-SLI | [ Query] [ Relations] |
🆕 Zaar-Autogramm | [ Query] [ Relations] |
Conversion from UD to SUD #
This page describes the process used in the conversion from UD to SUD. It also explains how the process can be adapted to language-specific features.
The main sequence #
1. `Onf (eud_to_ud)`: Remove all enhanced annotation; the conversion supposes that the input is in basic UD format. Note that it can be safely applied to basic UD: the annotations are left unchanged.
2. `Onf (idioms)`: Add the features encoding idioms in SUD, namely the features `ExtPos`, `PhraseType`, `InTitle` and `InIdiom` (see Idioms and titles). Note that relations are not changed here.
3. `specific_expr_init`: Add an explicit node for each `ExtPos`. TODO: give detail and an example.
4. `Onf (sub_relations)`: Transform UD relations with subtypes into the SUD equivalent.
5. `Onf (rel_extensions)`: Transform the remaining UD subtypes (not handled in `sub_relations`) into `deep` SUD features. For instance, the Polish `cop:locat` is transformed into `cop@locat`.
6. `Onf (relations)`: Transform each main UD relation into the SUD equivalent (except `case`, `aux`, `mark` and `cop`, see next step).
7. `reverse_relations.main`: Reverse the relations `case`, `aux`, `mark` and `cop`. See below for details about reversing relations.
8. Move the dependents of a conjunction from the left conjunct to the right conjunct. Dependencies `conj`, `discourse`, `parataxis` and `punct` are not moved.
   - `Onf (shared_left_conj-dep)`
   - `Onf (unshared_left_conj-dep)`
   - `Onf (minimize_right_conj-dep)`
9. `Onf (add_conj_emb)`: Mark embedded `conj` relations with the extension `emb`.
10. `Onf (chained_relations)`: Dependencies of type `conj` and `flat:*` grouped into a bouquet are reorganised into a chain.
11. `specific_expr_close`: Remove the specific nodes and edges introduced by the dual package `specific_expr_init`.
12. `Onf (unk_rel)`: Rename all non-SUD relations to `unk` (backoff package).
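As a rough illustration of what the `rel_extensions` step does to a label, the Python sketch below turns a UD subtype into a deep feature, as in the Polish `cop:locat` example. The function name and the `handled` parameter are hypothetical and exist only for this sketch; the actual conversion is implemented as Grew rewriting rules, not in Python.

```python
def subtype_to_deep_feature(label, handled=frozenset()):
    """Rewrite 'rel:sub' as 'rel@sub' unless the label is in `handled`.

    `handled` stands in for the subtyped relations already dealt with by
    the earlier `sub_relations` step (an assumption made for this sketch).
    """
    if label in handled or ":" not in label:
        return label
    base, subtype = label.split(":", 1)
    return f"{base}@{subtype}"


print(subtype_to_deep_feature("cop:locat"))  # cop@locat
```

Labels without a subtype, and labels listed in `handled`, are returned unchanged.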
Defining rules for reversing relations is tricky mainly for two reasons:
- When several relations to be reversed have the same head, the order of the reverse operations changes the output. Some mechanism to describe the wanted order is necessary.
- When reversing a relation from `N` to `M` into a relation from `M` to `N`, we have to decide for each dependent of `N` whether it should be lifted up to `M` or stay on `N`.
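The second point can be made concrete with a small Python sketch of a single reversal. It is not the Grew implementation: the tree encoding (a dict from token id to a `(head, label)` pair), the function `reverse_edge` and the `should_lift` callback are all assumptions made for illustration. The `comp:obj` label given to the reversed `case` edge follows the correspondence table of the SUD principles; the `obl` label is simply carried over here, whereas the real pipeline relabels it in an earlier step.

```python
def reverse_edge(tree, n, m, new_label, should_lift):
    """Reverse the edge n -> m into m -> n in a dependency tree.

    `tree` maps each token id to a (head id, label) pair, with head 0
    for the root. `should_lift(dep_id, label)` decides whether another
    dependent of n is re-attached to m or stays on n.
    """
    head_of_m, _ = tree[m]
    if head_of_m != n:
        raise ValueError("m must be a dependent of n")
    new_tree = dict(tree)
    # m takes over the attachment that n had in the tree
    new_tree[m] = tree[n]
    # n becomes a dependent of m
    new_tree[n] = (m, new_label)
    # every other dependent of n is either lifted to m or left on n
    for dep, (head, label) in tree.items():
        if head == n and dep != m and should_lift(dep, label):
            new_tree[dep] = (m, label)
    return new_tree


# "Peter talked to Mary": 1 Peter, 2 talked, 3 to, 4 Mary
ud = {1: (2, "nsubj"), 2: (0, "root"), 3: (4, "case"), 4: (2, "obl")}
sud = reverse_edge(ud, n=4, m=3, new_label="comp:obj",
                   should_lift=lambda dep, label: False)
print(sud[3], sud[4])  # (2, 'obl') (3, 'comp:obj')
```

Passing the lifting decision as a callback keeps the reversal itself separate from the policy for dependents, which is the part the text describes as tricky.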
Choosing the order when reversing relations #
To constrain the order, a numeric level is given to each edge to be reversed, and then:
1. the edge with the smallest level has the highest priority
2. if two edges have the same level and are on the same side of the head, the closest one has higher priority
3. if two edges have the same level and are on opposite sides of the head, the one after the head has higher priority.
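The three constraints can be read as a single sort key. The Python sketch below is one possible reading, not the Grew implementation: it assumes that, at equal level, any edge after the head outranks any edge before it, and it represents each edge as a hypothetical `(dependent_position, level)` pair.

```python
def reversal_order(edges, head_position):
    """Sort the edges of one head from highest to lowest priority.

    Each edge is a (dependent_position, level) pair (hypothetical encoding).
    """
    def key(edge):
        position, level = edge
        after_head = position > head_position
        distance = abs(position - head_position)
        # smaller level first, then edges after the head, then the closest
        return (level, 0 if after_head else 1, distance)

    return sorted(edges, key=key)


edges = [(1, 10), (2, 10), (4, 10), (5, 10), (6, 5)]
print(reversal_order(edges, head_position=3))
# [(6, 5), (4, 10), (5, 10), (2, 10), (1, 10)]
```

The edge with level 5 comes first regardless of its distance; among the level-10 edges, those after the head precede those before it, and the closer edge wins on each side.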
By default, the 4 relations `case`, `cop`, `aux` and `mark` (and their subtypes) are given the level 10.
We give below examples of conversions with multiple reversing of relations.
In Japanese or in German, the default rules are applied.
The order can be changed by assigning different levels to specific relations before calling the strategy `reverse_relations.main` (see examples below for French and Wolof).
Japanese #
In Japanese, all UD relations `case`, `cop`, `aux` and `mark` are left-headed. Constraint 2 applies.
German #
In German, there are many cases with edges on both sides. Constraint 3 applies here:
French #
In French, levels are set to:
- `case` or `case:*` → 10
- `cop` or `cop:*` → 20
- `aux:caus` or `aux:pass` → 30
- `aux` or `aux:*` (≠ `aux:caus` or `aux:pass`) → 40
- `mark` or `mark:*` → 50
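The French levels can be written down as a small lookup. The Python function below is only an illustrative sketch (the name `french_level` is hypothetical, and the real settings live in the Grew conversion rules); it returns `None` for relations that are not reversed.

```python
def french_level(relation):
    """Return the reversal level of a UD relation under the French settings.

    Returns None for relations that are not reversed.
    """
    # the two special aux subtypes get their own level
    if relation in ("aux:caus", "aux:pass"):
        return 30
    base = relation.split(":", 1)[0]
    return {"case": 10, "cop": 20, "aux": 40, "mark": 50}.get(base)


for rel in ("case", "cop", "aux:pass", "aux:tense", "mark"):
    print(rel, french_level(rel))
```

With these levels, a `case` edge is always reversed before a `cop` edge on the same head, and `mark` edges come last.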
From the UD annotation:
The universal conversion produces:
And the conversion with the French specific levels (see GitHub):
Wolof #
In Wolof, the lemma na must always be the head of the whole structure, so it must be the last relation to be reversed. This can be specified with a rule:
rule na {
pattern { e: V -[aux]-> A; A[lemma="na"] }
commands { e.level = 100 }
}
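For readers unfamiliar with Grew rules, the effect of the `na` rule can be paraphrased in Python. This is a loose paraphrase under stated assumptions: edges are hypothetical dicts with `head`, `dep`, `label` and `level` keys, lemmas are given in a separate dict, and the label test is assumed to match `aux` exactly.

```python
def apply_na_rule(edges, lemmas):
    """Set level 100 on every `aux` edge whose dependent has lemma "na".

    `edges` is a list of dicts with keys "head", "dep", "label", "level";
    `lemmas` maps token ids to lemmas (both encodings are assumptions).
    """
    for edge in edges:
        if edge["label"] == "aux" and lemmas.get(edge["dep"]) == "na":
            edge["level"] = 100
    return edges


# Toy input: token 3 is a verb with two auxiliaries, one of them "na".
# "other" is a placeholder lemma, not a real Wolof form.
lemmas = {1: "na", 2: "other"}
edges = [
    {"head": 3, "dep": 1, "label": "aux", "level": 10},
    {"head": 3, "dep": 2, "label": "aux", "level": 10},
]
apply_na_rule(edges, lemmas)
print([e["level"] for e in edges])  # [100, 10]
```

Because the smallest level has the highest priority, the edge raised to 100 is handled after the other `aux` edge.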
From the UD annotation:
The universal conversion produces:
And the conversion with the new `na` rule produces (see GitHub):
More examples of na as the head of a double `aux` construction: Grew-match.
Lifting dependencies #
TODO
Publications #
Papers about the SUD annotation scheme and SUD annotated corpora #
Sylvain Kahane, Bernard Caron, Emmett Strickland, Kim Gerdes. Annotation guidelines of UD and SUD treebanks for spoken corpora: a proposal in TLT 2021.
Sylvain Kahane, Martine Vanhove, Rayan Ziane, Bruno Guillaume. A morph-based and a word-based treebank for Beja in TLT 2021.
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. Starting a new treebank? Go SUD! Theoretical and practical benefits of the Surface-Syntactic distributional approach in DepLing 2021.
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. Improving Surface-syntactic Universal Dependencies (SUD): surface-syntactic relations and deep syntactic features in TLT 2019.
Kim Gerdes, Bruno Guillaume, Sylvain Kahane, Guy Perrier. SUD or Surface-Syntactic Universal Dependencies: An annotation scheme near-isomorphic to UD in UDW 2018.
Other publications related to SUD #
Some linguistic arguments in favor of SUD can be found in the Glossa article:
- Timothy Osborne, Kim Gerdes. The status of function words in dependency grammar: A critique of Universal Dependencies (UD)
Comparing syntactic complexity and cognitive constraint of SUD and UD:
- Yan, Jianwei, and Haitao Liu. Which annotation scheme is more expedient to measure syntactic difficulty and cognitive demand?. Presented at Quasy, SyntaxFest 2019.
SUD principles #
SUD is a Surface-syntax Universal Dependencies scheme. SUD follows the Surface syntax criteria (favoring functional heads) and can be automatically converted to the UD scheme. This page describes the universal principles used in SUD and the tagset. Some annotations are shared with UD. See details below.
SUD relations overview #
The picture below describes:
- :blue_square: in blue: the hierarchy of relations specific to SUD
- :green_square: in green: the relations shared with UD
- :orange_square: in orange: the UD relations not used in SUD
- :white_large_square: The light-blue boxes at the bottom correspond to the deep syntactic features.
Common principles between UD and SUD #
Please refer to UD for these aspects:
The Part Of Speech tagset follows the UD one. SUD also shares a number of syntactic relations with UD, which are listed below (with links to the related UD pages):
`vocative`, `compound`, `dislocated`, `discourse`, `appos`, `det`, `clf`, `conj`, `cc`, `flat`, `parataxis`, `orphan`, `goeswith`, `reparandum`, `punct`.
However, we must stress that there are some differences between the usage of some of these relations in UD and SUD. Namely, the relations `appos`, `conj` and `reparandum` are only used when analysing written texts. When analysing oral texts, we use instead the relations `conj:appos`, `conj:coord` and `conj:dicto` respectively (which are specific to SUD). We will explain the details in the section below.
Correspondences between UD and SUD #
SUD is a dependency-based annotation scheme. Annotation choices rely on surface-syntactic distributional criteria, while at the same time attempting to maintain convertibility with the UD annotation scheme as much as possible.
SUD represents an alternative rather than a competitor to UD, and was designed in such a way that the two can convey the same informational content. The two schemes enjoy a nearly perfect degree of two-way convertibility, meaning that conversions between the two schemes can take place without informational loss in most cases. Because of this, correspondences between the two are most often regular and predictable.
Correspondences between SUD and UD relations are shaped by several key properties. Firstly, SUD annotations are less redundant and more economical than UD annotations. For example, we can see in the table below that SUD uses a single `subj` relation which covers both the `nsubj` (nominal subject) and `csubj` (clausal subject) relations in UD. However, the information provided by UD's distinction between nominal and clausal subjects is not lost under the simpler SUD scheme: the differentiation can be recovered automatically from the POS of the subject and its context, though how this context is taken into account depends on the language. In total, a subset of 17 UD relations (`nsubj`, `csubj`, `obj`, `iobj`, `obl`, `xcomp`, `ccomp`, `amod`, `nmod`, `nummod`, `advmod`, `acl`, `advcl`, `aux`, `cop`, `case`, `mark`) is replaced by three major relations in SUD: `subj`, `comp`, `mod`, as well as `udep` to a marginal extent.
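A toy version of the subject recovery is sketched below in Python. It is deliberately naive and rests on an assumption that is not in the guidelines: that a subject whose head has one of the POS tags in `CLAUSAL_POS` is clausal. As the text notes, the real decision also uses context and depends on the language.

```python
# Assumption for this sketch only: POS tags taken to signal a clausal subject.
CLAUSAL_POS = {"VERB", "AUX", "SCONJ"}


def ud_subject_relation(subject_upos):
    """Guess the UD relation behind a SUD `subj` from the subject's POS."""
    return "csubj" if subject_upos in CLAUSAL_POS else "nsubj"


print(ud_subject_relation("NOUN"))  # nsubj
print(ud_subject_relation("VERB"))  # csubj
```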
In addition to its more economical set of labels, SUD also diverges from UD in the sense that it does not systematically label content words as heads. Instead, SUD treats adpositions, subordinating conjunctions, auxiliaries, and copulas as heads. This is because SUD identifies surface syntactic heads using the main criterion that they determine the distribution of the syntactic unit in question. For example, the SUD scheme would identify the preposition to in the sentence Peter talked to Mary as a head, since it determines the distribution of to Mary. The UD scheme would label Mary as a head based on the fact that it is a content word. Because of this difference, the direction of certain syntactic relations is reversed between SUD and UD. This applies in particular to the UD relations `aux`, `cop`, `case`, and `mark`, which are also highlighted in bold in the correspondence table below. You may also find more information about this aspect of SUD relations in the general principles section.
Table of correspondences between UD and SUD #
UD | SUD |
---|---|
nsubj | subj |
csubj | subj |
**aux** | comp:aux |
**cop** | comp:pred |
xcomp | comp:obj |
**case** | comp:obj |
**mark** | comp:obj |
obj | comp:obj |
ccomp | comp:obj |
ccomp | comp:obl |
obl | comp:obl |
iobj | comp:obl |
nmod | udep |
obl, acl | mod |
advcl | mod |
advmod | mod |
amod | mod |
nummod | mod |
fixed | encoded in node features (see here) |
det | det |
nummod | det |
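The correspondences can also be kept as a lookup structure. The Python sketch below follows one reading of the table above, and the row grouping should be treated as an assumption; `fixed` is left out because it is encoded in node features rather than as a relation. The names `UD_TO_SUD`, `REVERSED` and `sud_candidates` are hypothetical.

```python
# UD relation -> possible SUD relations, as read from the table (assumption).
UD_TO_SUD = {
    "nsubj": ["subj"],
    "csubj": ["subj"],
    "aux": ["comp:aux"],
    "cop": ["comp:pred"],
    "xcomp": ["comp:obj"],
    "case": ["comp:obj"],
    "mark": ["comp:obj"],
    "obj": ["comp:obj"],
    "ccomp": ["comp:obj", "comp:obl"],
    "obl": ["comp:obl", "mod"],
    "iobj": ["comp:obl"],
    "nmod": ["udep"],
    "acl": ["mod"],
    "advcl": ["mod"],
    "advmod": ["mod"],
    "amod": ["mod"],
    "nummod": ["mod", "det"],
    "det": ["det"],
}

# UD relations whose direction is reversed in SUD.
REVERSED = {"aux", "cop", "case", "mark"}


def sud_candidates(ud_relation):
    """Return the SUD relations a UD relation may correspond to."""
    return UD_TO_SUD.get(ud_relation, [])


print(sud_candidates("ccomp"))   # ['comp:obj', 'comp:obl']
print("case" in REVERSED)        # True
```

A UD relation with several candidates, such as `obl`, cannot be mapped by label alone.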
Example of a sentence annotated in SUD (above) and UD (below). #
General SUD principles #
SUD differs from UD in several general principles. The main differences with respect to UD are the following:
- The definition of relations is based on the syntactic position and not on semantic relations or the category of the dependents. In other words, two units that commute and exclude each other occupy the same position and must have the same function.
- Functional heads (instead of lexical heads): The head of a unit is the distributional head, that is, the element that controls the distribution of the unit. This points to the functional head in most cases. For instance, the adposition to is the head of to Mary because Mary and to Mary do not have the same distribution (at all).
  In some cases, this criterion does not give a clear answer because two words have head features. In this case, a second gradual criterion comes into play: we prefer to give the status of dependent to the word that changes the distribution of the unit less. According to this principle, a coordinating conjunction such as and does not govern the conjunct following it, because and Mary, and red, or and is sleeping occupy completely different positions. In the same way, the determiner is analyzed as a dependent of the noun because nouns partly control the distribution of a determiner-noun combination (this morning can work as a modifier of a verb, unlike this boy).
- SUD relations are organized in a taxonomic hierarchy: A relation that is the daughter of another one inherits its syntactic properties with the addition of specific properties. Indeed, sometimes we cannot take into account all possible distinctions, either because of the conversion from different treebanks not containing enough information, or because a sentence does not allow a clear decision. A way of naming a daughter of a relation `R` is to add an extension `EXT` to `R`, calling this new relation `R:EXT`.
- It is possible to distinguish between arguments and modifiers: Although this distinction involves semantic criteria (an argument of a lexical unit L is an obligatory participant in the semantic description of L), we consider that it is hard to avoid because, especially for verb dependents, most languages have special functions.
- A multiple coordination, such as John, Mary and Peter, is analyzed as a chain instead of a bouquet: One of the main arguments for the chain analysis is that it reduces the dependency length. See the page dedicated to coordination.
- There is a strict distinction between surface-syntactic relations and deep-syntactic features, which are expressed as extensions of syntactic relation names using the `@` symbol.
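The `R:EXT` naming convention makes part of the hierarchy checkable from the label alone. The Python sketch below is a minimal illustration with a hypothetical function name; it only covers daughters named by extension and knows nothing about parts of the hierarchy that are not expressed in the label.

```python
def is_subrelation(relation, ancestor):
    """True if `relation` is `ancestor` or extends it with ':EXT' parts.

    Any deep feature introduced by '@' is ignored for the comparison.
    """
    surface = relation.split("@", 1)[0]
    return surface == ancestor or surface.startswith(ancestor + ":")


print(is_subrelation("comp:obj", "comp"))   # True
print(is_subrelation("compound", "comp"))   # False
```

Matching on `ancestor + ":"` rather than on a bare prefix is what keeps `compound` from being treated as a daughter of `comp`.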
UD relations that are not used in SUD:
`nsubj`, `csubj`, `obj`, `iobj`, `obl`, `xcomp`, `ccomp`, `amod`, `nmod`, `nummod`, `advmod`, `acl`, `advcl`, `aux`, `cop`, `case`, `mark`.
These 17 relations are replaced by three major relations in SUD, `subj`, `comp` and `mod` (subject, complement, modifier), with possible sub-relations, in addition to the general `udep` (underspecified dependency) to a more marginal extent. The key differences between SUD and UD, as well as a table summarizing the most frequent correspondences, may be consulted here.
SUD has 4 specific syntactic relations and a few extended relations:
SUD deep features #
In SUD, dependency relations are designed to describe surface syntactic relations. Information related to deep syntax or semantics is given on dependencies with deep features, which are extensions to the dependency label introduced by the `@` symbol.
The main deep features are:
`@agent`, `@caus`, `@expl`, `@lvc`, `name`, `@pass`, `@relcl`, `@tense`, `@scrap`, `@x`.
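Since the deep feature is attached to the label with `@`, a label can be split into its surface relation and its deep feature mechanically. The Python sketch below shows one way to do this; `SudLabel` and `parse_label` are hypothetical names, and the sketch assumes at most one `@` per label.

```python
from typing import NamedTuple, Optional


class SudLabel(NamedTuple):
    relation: str          # surface-syntactic relation, e.g. "cop"
    deep: Optional[str]    # deep feature without the "@", or None


def parse_label(label):
    """Split a dependency label at the first '@' into relation and deep feature."""
    relation, sep, deep = label.partition("@")
    return SudLabel(relation, deep if sep else None)


print(parse_label("cop@locat"))  # SudLabel(relation='cop', deep='locat')
print(parse_label("subj"))       # SudLabel(relation='subj', deep=None)
```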
Tutorials #
You can find some exercises to practice SUD annotation here.
We recommend using the Arborator Grew platform for your annotation. You can find the documentation here.
We then encourage you to use Grew-Match to visualise your annotations and to analyse your corpus.