Idioms and Titles

Universal #

SUD offers several features that allow annotators to mark idiomatic expressions and titles while preserving the internal syntactic relations between their components. We distinguish these two categories from Multi-Word Expressions (MWEs), a broader category that also includes named entities.

For our purposes, a “title” is any title given to a film, book, painting, or other work of art, such as Planet of the Apes, Dark Side of the Moon, American Gothic, or Super Mario Bros. This excludes other named entities such as events, holidays, or locations, for example The Gulf War, Good Friday, or The Eiffel Tower.

Idioms, meanwhile, are figurative expressions whose precise meaning cannot be deduced directly from their constituents, ranging from classic examples like kick the bucket to extremely common phrases like in general. Pronominal verbs, such as those common in Romance languages, are also treated as idioms.

Idioms and titles are annotated in the following way:

  • The head of the idiom or title carries the feature Idiom=Yes or Title=Yes.

  • The head also carries an “external part of speech” feature (ExtPos), which denotes the function of the whole expression within the wider sentence. Please note that all titles carry the ExtPos value PROPN.

  • The remaining components of the idiom or title carry the feature InIdiom=Yes or InTitle=Yes.
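The rules above can be checked mechanically. The following is a minimal sketch, assuming CoNLL-U input in which FEATS is the sixth tab-separated column. The sample sentence and the helper names (`parse_feats`, `check_heads`) are hypothetical and written by hand for illustration; they are not taken from a SUD treebank or from any SUD tool.

```python
# Hand-written illustrative sample (not from a treebank): "kick the bucket".
SAMPLE = """\
1\tkick\tkick\tVERB\t_\tExtPos=VERB|Idiom=Yes\t0\troot\t_\t_
2\tthe\tthe\tDET\t_\tInIdiom=Yes\t3\tdet\t_\t_
3\tbucket\tbucket\tNOUN\t_\tInIdiom=Yes\t1\tcomp:obj\t_\t_
"""


def parse_feats(feats):
    """Turn a CoNLL-U FEATS string such as 'A=1|B=2' into a dict."""
    if feats == "_":
        return {}
    return dict(pair.split("=", 1) for pair in feats.split("|"))


def check_heads(conllu):
    """Return (token id, message) pairs for heads that break the rules."""
    problems = []
    for line in conllu.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        cols = line.split("\t")
        feats = parse_feats(cols[5])
        is_idiom = feats.get("Idiom") == "Yes"
        is_title = feats.get("Title") == "Yes"
        # Every idiom or title head needs an external part of speech.
        if (is_idiom or is_title) and "ExtPos" not in feats:
            problems.append((cols[0], "head lacks ExtPos"))
        # Titles always function as proper nouns.
        if is_title and feats.get("ExtPos", "PROPN") != "PROPN":
            problems.append((cols[0], "title head must have ExtPos=PROPN"))
    return problems
```

Calling `check_heads(SAMPLE)` returns an empty list for the sample, since the head carries both Idiom=Yes and ExtPos. The sketch only inspects heads; it does not verify that the remaining components carry InIdiom=Yes or InTitle=Yes.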

This approach has a key advantage: because these categories are marked with features rather than with a fixed relation, their internal syntactic relations are preserved.

Grew patterns:

  • pattern { N[Idiom] }
  • pattern { N[Title] }

NB: Until version 2.8, the head of an idiom carried the feature PhraseType=Idiom (now replaced by Idiom=Yes) and the head of a title carried the feature PhraseType=Title (now replaced by Title=Yes).
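For data annotated before this change, the renaming can be applied to each FEATS value. This is a minimal sketch under stated assumptions: the function name `migrate_feats` and the `LEGACY` table are hypothetical, and only the two head features named in the note are rewritten, since the note does not say how the other components were marked before version 2.8.

```python
# Legacy PhraseType values (up to version 2.8) and their replacements.
LEGACY = {"Idiom": "Idiom=Yes", "Title": "Title=Yes"}


def migrate_feats(feats):
    """Rewrite PhraseType=Idiom/Title in a CoNLL-U FEATS string."""
    if feats == "_":
        return feats  # no features to migrate
    updated = []
    for pair in feats.split("|"):
        name, _, value = pair.partition("=")
        if name == "PhraseType" and value in LEGACY:
            pair = LEGACY[value]
        updated.append(pair)
    # CoNLL-U orders features alphabetically by name, ignoring case.
    updated.sort(key=lambda p: p.split("=", 1)[0].lower())
    return "|".join(updated)
```

Any other PhraseType value is left untouched, so the function is safe to run over a whole FEATS column.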

With internal syntactic relations #

English

Spanish

Without internal syntactic relations #

When there is no clear internal syntactic structure, the relation unk is used.

English

French

French #

TODO

Overview #

Specific Pattern #

Haitian #

TODO

Overview #

Specific Pattern #