ISO standardisation

link this post written on 11/02/2010
Han Sloetjes:
E.g. I can imagine that 'as' elements could be used for a 'tier' (Keith, is that your suggestion?)

Yes. Hopefully things make more sense with my reply to Steve above. Every domain will have its own concept of tiers, layers, sets, etc. I know Nancy hates it when I say this, but at some level it is up to the application to specify what domain it lives in and what it thinks the @type URIs mean.

Han Sloetjes:

A node is now only a container for a sequence of links, each of which can link to multiple regions. 'as' can contain a sequence of 'a' elements, which are required to refer to either a node or an edge. So, an annotation can be linked to a time interval via a node. I can't see how an annotation can refer to another annotation (in case that might be needed). An 'edge' links two nodes, which now comes down to linking two (complex) regions.

Linking annotations is done by linking nodes with edges in the graph. I will admit this is a bit cumbersome, but in GrAF edges are first class citizens of the data model and get their own element.
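
To make the point about edges concrete, here is a minimal sketch of one annotation being related to another by linking their nodes with an edge. The class and field names (Annotation, Node, Edge, linked_annotations) are illustrative assumptions and are not taken from the GrAF schema.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    label: str
    features: dict = field(default_factory=dict)


@dataclass
class Node:
    id: str
    annotations: list = field(default_factory=list)


@dataclass
class Edge:
    # Edges are objects in their own right and can carry annotations too.
    id: str
    source: Node
    target: Node
    annotations: list = field(default_factory=list)


def linked_annotations(node, edges):
    """Annotations reachable from `node` by following one outgoing edge."""
    return [a for e in edges if e.source is node for a in e.target.annotations]


# Hypothetical data: a word node linked to a syllable node.
word = Node("n1", [Annotation("token", {"text": "dogs"})])
syllable = Node("n2", [Annotation("syllable", {"stress": "strong"})])
link = Edge("e1", word, syllable, [Annotation("contains")])

related = linked_annotations(word, [link])
```

The "token" annotation never points at the "syllable" annotation directly; the relation lives entirely in the edge between the two nodes.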

Han Sloetjes:

The 'a' element has an 'as' attribute which seems to be superfluous; either the 'a' element is contained in an 'as' element and belongs to that set or it is a direct child of 'graph' and is not part of a set?

The <as> element and @as attribute are meant to be used interchangeably. We will have to specify in the prose which takes precedence.
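
Until that precedence is specified, a tool can at least detect the cases where the two mechanisms disagree. The following is a rough sketch only: the sample XML, the "ref" and "label" attributes, the example URNs, and the assumption that @as holds the same identifier as the enclosing set's @type are all invented for illustration and are not the GrAF serialisation.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample, not real GrAF markup.
SAMPLE = """
<graph>
  <as type="urn:example:setA">
    <a label="token" ref="n1"/>
    <a label="token" ref="n2" as="urn:example:setB"/>
  </as>
  <a label="pos" ref="n1" as="urn:example:setA"/>
  <a label="note" ref="n3"/>
</graph>
"""


def set_memberships(graph):
    """Return (ref, label, set_from_element, set_from_attribute) per <a>."""
    rows = []
    # Direct children of <graph>: only the @as attribute can name a set.
    for a in graph.findall("a"):
        rows.append((a.get("ref"), a.get("label"), None, a.get("as")))
    # Children of an <as> element: both mechanisms may be present.
    for aset in graph.findall("as"):
        for a in aset.findall("a"):
            rows.append((a.get("ref"), a.get("label"), aset.get("type"), a.get("as")))
    return rows


rows = set_memberships(ET.fromstring(SAMPLE.strip()))

# An <a> is in conflict when both sources name a set and the names differ.
conflicts = [r for r in rows if r[2] and r[3] and r[2] != r[3]]
```

The sketch deliberately does not pick a winner; it only reports the annotations for which a precedence rule would be needed.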
The author has edited this post (on 11/02/2010)
link this post written on 12/02/2010
Keith, thanks for the updated data model; that makes things a bit clearer. I can see now that both Nodes and Edges can have many associated Annotations and each Annotation can have multiple feature structures. I'm still not clear on what having multiple annotations means, since storing more than one piece of information can be achieved via feature structures. There is a case for storing e.g. both the text of a word and its POS, or even two alternate POS tags; is that the use case for multiple annotations?

Keith Suderman:


Steve Cassidy:

If each Node has many Annotations, they are differentiated only by their label, should that be a type? How do I find the POS Annotation if one exists? (Give me all the POS annotations on this Node).


Do you mean replace @label with @type? GrAF tries to limit the use of @type attributes because GrAF itself has no concept of "annotation type"; not all annotation formats have a concept of annotation type. Any type information should come from an external schema, DCR, or type system description.

However, annotations are also distinguished by the annotation set they belong to. So, for example, a node may have multiple "token" annotations with each belonging to a different annotation set.
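
As a rough illustration of that answer, the sketch below looks up annotations on a single node by label and, optionally, by annotation set. The names (Annotation, aset, find) and the sample sets "sourceA" and "sourceB" are hypothetical and only stand in for whatever identifiers an application uses.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    label: str
    aset: str  # name of the annotation set this annotation belongs to
    features: dict = field(default_factory=dict)


def find(annotations, label, aset=None):
    """All annotations with `label`, optionally restricted to one set."""
    return [
        a for a in annotations
        if a.label == label and (aset is None or a.aset == aset)
    ]


# Hypothetical annotations attached to one node.
node_annotations = [
    Annotation("token", "sourceA", {"text": "can't"}),
    Annotation("token", "sourceB", {"text": "ca"}),
    Annotation("pos", "sourceA", {"tag": "MD"}),
]

all_tokens = find(node_annotations, "token")
tokens_b = find(node_annotations, "token", aset="sourceB")
pos_tags = find(node_annotations, "pos")
```

Here "give me all the POS annotations on this node" is a filter on the label, and the annotation set separates the two "token" annotations that share a label.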



I think it depends on how you conceive of "annotation type". I think all annotation systems use annotation types; it is only that some can contain just one type, which is always implicit. It's when you are merging annotations from different sources that you need the type system, so that you can know that these are Biber POS tags and those are syllables.

So I think what you're saying is that the pointer to an AnnotationSet is a de-facto type reference for the annotation - if I have a group of annotations of a given type (from a given source) then I put them in an AnnotationSet and they inherit the type of that set (which has a URI type attribute). That may well be sufficient and it clarifies for me what an AnnotationSet is (but then I'm not sure it should have layers). If my interpretation is correct, I might be happier if it had a different name and if the relation from Annotation weren't 'member'.

How then is an Annotation part of more than one AnnotationSet - if the set defines the type of the annotation?


I'll leave other questions to another post.

Steve


link this post written on 12/02/2010
Keith Suderman:


Steve Cassidy:

The main question is how Annotations are linked to Nodes. Annotation seems to be just a thin wrapper around a feature structure - why is it needed in that case? It adds only a label, not an id. What is the label in an annotation?


Yes, annotations are just thin wrappers around a feature structure; think of the label as an XML element name and the feature structure as the element attributes.
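
The analogy can be sketched literally: take a label and a flat feature structure and render them as one XML element. This only illustrates the analogy and is not how GrAF serialises annotations; the function name and the sample features are assumptions.

```python
import xml.etree.ElementTree as ET


def annotation_to_element(label, features):
    """Label becomes the element name, features become its attributes."""
    return ET.Element(label, {name: str(value) for name, value in features.items()})


# Hypothetical annotation: a token with two features.
element = annotation_to_element("token", {"text": "dogs", "pos": "NNS"})
xml_text = ET.tostring(element, encoding="unicode")

# Parsing the text back recovers the same label and features.
round_trip = ET.fromstring(xml_text)
```

An annotation with no features at all would simply come out as an empty element carrying nothing but its name.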


You mean an element name in the sense that I might embed some markup in a text to add annotation - I might write <TIMEX3> to mark a temporal expression or I might say <NP> to say that something's a noun phrase. I'm saying "this bit here is an X" where X is the label. So, it can be like a type "this bit is a word" and it can be like data "this bit here is a noun" - do we want to distinguish the two?

I'm trying to see how I'd convert a multimodal annotation to use this. I have a series of Phonemic segments, they have labels associated with them for Phoneme and Stress, what do I say the label of the annotation is? It's easy if I use a feature structure since I say {Phoneme: A, Stress: W} but I have to set a label. Phoneme isn't right, that's a type reference so I'm left choosing one of the two properties to be the 'primary' label. Is this what's intended?

Keith Suderman:


The reasons for including the annotation element are:

1. The <fs> element doesn't allow for any attributes other than @type. It is unlikely we could convince the ISO Feature Structure people to add any other attributes we needed. GrAF already "tweaks" the ISO FS standard to allow the f element to contain a @value attribute, and that was a huge fight. Using a simple wrapper around feature structures eliminates that problem.


As far as I can see, you could include zero or more feature structures as a property of GraphElement, use the type attribute to denote the AnnotationSet and you don't need any more attributes - except for label which as I said above, I don't really understand.

Keith Suderman:

2. Consistency and ease of processing. When parsing a feature structure we do not have to distinguish between features that are annotations (this is a token) and features that are features of annotations (that token's ID is X).


I don't really get this, are you saying that 'this is a token' is done by the label? If so, it's a type reference, not what I was thinking above.

Keith Suderman:

3. Having multiple annotations with the same name is difficult to represent.


I still don't see how you differentiate between annotations with the same name - there's got to be something different about them - they have a different type (AnnotationSet) most likely, you can do that with the type attribute on the feature structure.

Keith Suderman:

4. It provides for a simpler representation when an annotation does not have any features.


This is where it's just got a label? OK, but that's just a shorthand in the serialisation rather than a fundamental difference in the model.

Steve
link this post written on 21/02/2010
Hi all,

Thanks Keith for the Graph Data Model. Here are my spontaneous reactions as a linguist:

- in the media "folder" there must be some representation of a timeline, which is THE essential element for phonological annotations
- each annotation in the annotations "folder" needs to be linked directly to this timeline
- to be honest, I have no idea what the function and use of the entire Graphs "folder" is. I can see that you need edges and nodes to describe the data format of XML, but if - if I understood this correctly - the GraphDataModel is supposed to model the nature of annotations, the modelling of a data format is not necessary. Annotations can be modelled without reference to a specific data format.
- like Steve, I am not sure why the category AnnotationSet is necessary in the annotations "folder". Layers contain annotations, that's all you need to model them I believe.

Best,
Ulrike
link this post written on 27/06/2011

Hi Keith

 

On internal evidence, the RelaxNG and other schemas you linked to from here are created from a TEI ODD. Huzzah! But where is the source of that ODD?

 

best wishes

 

Lou

 

link this post written on 27/06/2011

Hi Lou,

 

Of course I use ODD, I love ODD/Roma when it comes to schema authoring!

 

However, I haven't put the ODD files themselves online as the standard STILL isn't finalized.  I have a few more minor tweaks (mostly renaming things in the header) and once everything has been finalized and approved I will put all the files online.

 

And while I am here… this is a little late, but:

 

Ulrike Gut:
Hi all, Thanks Keith for the Graph Data Model. Here are my spontaneous reactions as a linguist:

- in the media "folder" there must be some representation of a timeline, which is THE essential element for phonological annotations 

- each annotation in the annotations "folder" needs to be linked directly to this timeline 

It is possible to model timelines with anchors/regions; an "instant" would be a region defined by one anchor, and an "interval" (or timeline) would be modeled as a region defined by two or more anchors. More complex models could be constructed by linking anchors/regions together.
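
A small sketch of that idea, assuming anchors are plain time offsets in seconds. The Anchor and Region classes, the overlaps helper, and the sample values are illustrative assumptions rather than GrAF definitions.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Anchor:
    seconds: float  # position on the media timeline


@dataclass(frozen=True)
class Region:
    anchors: Tuple[Anchor, ...]

    def is_instant(self) -> bool:
        # One anchor: a single point in time.
        return len(self.anchors) == 1

    def bounds(self) -> Tuple[float, float]:
        # Two or more anchors: the interval they span.
        times = [a.seconds for a in self.anchors]
        return (min(times), max(times))


def overlaps(a: Region, b: Region) -> bool:
    """True when the two regions share at least one point in time."""
    a_start, a_end = a.bounds()
    b_start, b_end = b.bounds()
    return a_start <= b_end and b_start <= a_end


# Hypothetical regions on one recording.
click = Region((Anchor(1.25),))
vowel = Region((Anchor(1.0), Anchor(1.4)))
pause = Region((Anchor(2.0), Anchor(2.5)))
```

An annotation is then tied to the timeline by attaching it to a node that links to one of these regions, so the timeline needs no separate element of its own.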

 

- to be honest, I have no idea what the function and use of the entire Graphs "folder" is. I can see that you need edges and nodes to describe the data format of XML, but if - if I understood this correctly -

I think this is your point of confusion: the graph part of the data model has nothing to do with XML or the specific data format.

 

One of the primary goals of the GrAF data model is the separation of the data model for annotations from the data model for the artifact being annotated.  The "graph" part of the data model is then used to act as a bridge between the two and is used to express the relationship between annotations and the artifact.

 

Hopefully, once GrAF is finalized, and there is more support, this will be completely transparent to users. Users should no more have to worry about the GrAF data model than users today have to worry about XML Infosets just because their authoring tool saves its data as XML.

 

- like Steve, I am not sure why the category AnnotationSet is necessary in the annotations "folder". Layers contain annotations, that's all you need to model them I believe.

The AnnotationSet element is one of the things being renamed to reduce confusion. GrAF AnnotationSpaces (née AnnotationSets) are more similar to XML Namespaces than to a grouping mechanism such as layers/tiers (although putting annotations in different namespaces is a form of grouping). AnnotationSpaces are used solely to resolve naming conflicts if two sets of annotations, from different sources, use the same name. For example, the ANC has received tokenizations from several sources, each with slightly different rules for what a "token" is. By placing the tokens in different AnnotationSpaces we can distinguish between the various token annotations, their source, and the semantics for each.
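
The namespace comparison can be sketched by keying annotations on a (space, name) pair, much like a qualified name in XML. The space names "spaceA" and "spaceB" and the character offsets are made up for illustration; they do not come from the ANC data.

```python
from collections import defaultdict

# (annotation space, name, character span) for hypothetical tokenizations
# of the same five characters by two different sources.
annotations = [
    ("spaceA", "token", (0, 5)),  # source A treats the string as one token
    ("spaceB", "token", (0, 3)),  # source B splits it into two tokens
    ("spaceB", "token", (3, 5)),
]

# Group by qualified name so the two "token" definitions never collide.
by_qualified_name = defaultdict(list)
for space, name, span in annotations:
    by_qualified_name[(space, name)].append(span)

tokens_a = by_qualified_name[("spaceA", "token")]
tokens_b = by_qualified_name[("spaceB", "token")]
```

Both sources keep the plain name "token"; the space is what tells a consumer whose tokenization rules apply.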

 

Keith

Network: Corpus Phonology (creating, searching, archiving and sharing spoken language corpora for phonological research)
Host: Ulrike Gut
Created on: 02/08/2009
Members: 178
Language: English