Outcomes of Workshop July 2009 in Augsburg

Linguists' wish-list concerning tools and their developers

1. Mutual awareness
  • Take a broad range of heterogeneous linguists into account
    • Need for tool developers to have the users' tool chains made explicit

2. Interoperability
  • Need for interchange between existing tool chains – making tools talk to each other

3. Developments
  • Information about automatic updates of tools (and what they include)
    • Don't replicate tool infrastructures
    • Develop niche tools, e.g. task / domain specific tools ("time aligned transcription for community X")
    • Have common basic operative principles for working with tools

4. Standardisation
  • Standardize accepted vocabularies (LDC), but allow for extensions

5. Communication and information
  • Contact point between linguists and developers
    • One contact address (on the web, a URL) for resources
    • Have a non-regional "global" body / publication avenue / ... (e.g. UN or ....?) and non-tool-specific means of dissemination for resources (tools and data).
    • Announce your resources on the LinguistList webpage
    • Developers should provide links to the discussion group, bug tracker, Wiki... within the tool

6. Help with using the tools
  • Video tutorials, manuals, Wiki... for linguists on using and developing tools

Requirements for TEI as an acceptable cross-community format and vocabulary

  • TEI webpage should have graphical UI for annotators/linguists to use.
  • Need for a cross-community review of TEI categories and of what is nested inside what, to make sure that:
    • the nesting is reasonable both for deep research and for industrial time/cost constraints and requirements
  • Need for end-to-end scenarios
    • This will facilitate adoption by the broader community
    • TEI could define a set of prepared DTDs targeted to particular types of transcription work, as a recommendation to users, and to facilitate adoption
  • List of concepts workshop participants would like to be able to label with TEI tags:
    • feature, segment, syllable, onset, coda, prosodic word, utterance, turn, paratone
    • intonation phrase, pitch accents, contours, breaks
    • tone (lexical, grammatical, morphological), phonetic surface, underlying phonology, surface phonology
    • pitch range, key, reset
    • ATR harmony, nasality
    • pauses, filled and unfilled
    • variants, errors
    • repairs, hesitations
    • lexical accent, prominence, stress
    • types of voicing, laryngeal phenomena, jitter (should not be shift features)
    • translation, gloss
    • articulatory data as well as auditory
    • neurolinguistic data
      • be able to associate, e.g., ERP measures with points in time
      • need tags to describe whether something is a transcription of an audio file vs. “transcription” of brain response measures
    • word frequency
    • gestures
    • spectral tilt measures (energy distribution over the spectrum)
    • degree of certainty of the labeller
    • labeller observations
  • Open questions:
    • Does TEI have techniques for dealing with line crossing?
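
Several of the concepts listed above (utterances, pauses, points in time) can be illustrated with a small time-aligned fragment. The following is a minimal sketch only, assuming the element and attribute names of the TEI P5 module for transcriptions of speech (timeline, when, u, pause, with start, end, who, dur, interval and since) as commonly documented; these should be checked against the current Guidelines. The helper names build_timeline and add_utterance, the speaker ids and the sample words are hypothetical and were not discussed at the workshop.

```python
import xml.etree.ElementTree as ET

# ElementTree writes this qualified attribute name out as xml:id.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def build_timeline(parent, offsets):
    """Add a <timeline> with one <when> per offset (seconds from the first point)."""
    ids = [f"T{i}" for i in range(len(offsets))]
    timeline = ET.SubElement(parent, "timeline", {"unit": "s", "origin": f"#{ids[0]}"})
    ET.SubElement(timeline, "when", {XML_ID: ids[0]})
    for point_id, offset in zip(ids[1:], offsets[1:]):
        ET.SubElement(
            timeline, "when",
            {XML_ID: point_id, "interval": str(offset), "since": f"#{ids[0]}"},
        )
    return ids


def add_utterance(parent, speaker, start_id, end_id, tokens):
    """Add a <u>; tokens are strings or ("pause", seconds) tuples."""
    u = ET.SubElement(
        parent, "u",
        {"who": f"#{speaker}", "start": f"#{start_id}", "end": f"#{end_id}"},
    )
    last_child = None
    for token in tokens:
        if isinstance(token, tuple):
            kind, seconds = token
            last_child = ET.SubElement(u, kind, {"dur": f"PT{seconds}S"})
        elif last_child is None:
            u.text = (u.text or "") + token          # text before the first child
        else:
            last_child.tail = (last_child.tail or "") + token  # text after a child
    return u


div = ET.Element("div")
ids = build_timeline(div, [0.0, 1.8, 3.1])           # hypothetical time points
add_utterance(div, "spk1", ids[0], ids[1], ["so ", ("pause", 0.4), " we went"])
add_utterance(div, "spk2", ids[1], ids[2], ["mhm"])
fragment = ET.tostring(div, encoding="unicode")
print(fragment)
```

Other measures (for example ERP values or labeller certainty) could point at the same time points in the same way, but which tags would carry them is exactly the open issue raised in the list above.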

Requirements concerning future annotations stores and integration of tools

  • Format
    • Have dereferenceable persistent identifiers for your resources
    • Re-use of existing standards (HTTP, ...)
    • Provide mechanisms for version information
    • Provide a baseline of metadata for resources
    • Provide mechanisms for additional annotations / corrections
    • Have a lightweight & scalable implementation
    • Have a schema & a data model describing the interrelations between resources in the data store
    • Ability to switch between annotating and querying
    • Linkage between search results and editable corpora
    • Having the right balance between openness and closedness of the (schema and model) design

  • Access and usability
    • Have an offline & online component for the annotation store
    • Variable licensing of texts and annotations
    • Gathering access permission issues from research & industry
    • Have a simple & scalable interface for users
    • Well-defined interfaces with varying data structures below
    • Provide localized user interface
    • Provide easily accessible query mechanisms

  • Administration
    • Have a non-regional "global" body for hosting the annotation store
    • Co-ordinate with research bodies / researchers who create data stores for other kinds of data
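
Some of the format requirements above (persistent identifiers, version information, a baseline of metadata, additional annotations / corrections, switching between annotating and querying) can be made concrete with a small in-memory data model. This is a hedged sketch under stated assumptions, not a proposal agreed at the workshop: the class names, the required metadata keys and the example identifier are all hypothetical, and a real store would sit behind HTTP and a persistent back end.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical baseline of metadata every resource must carry.
REQUIRED_METADATA = {"title", "language", "licence"}


@dataclass(frozen=True)
class Annotation:
    layer: str            # e.g. "pause", "tone"
    start: float          # seconds
    end: float            # seconds
    label: str
    certainty: float = 1.0  # labeller's degree of certainty, 0..1


@dataclass(frozen=True)
class ResourceVersion:
    pid: str              # persistent identifier
    version: int
    metadata: Dict[str, str]
    annotations: Tuple[Annotation, ...]


class AnnotationStore:
    """Keeps every version of every resource; nothing is overwritten."""

    def __init__(self) -> None:
        self._history: Dict[str, List[ResourceVersion]] = {}

    def create(self, pid: str, metadata: Dict[str, str]) -> ResourceVersion:
        missing = REQUIRED_METADATA.difference(metadata)
        if missing:
            raise ValueError(f"missing baseline metadata: {sorted(missing)}")
        if pid in self._history:
            raise ValueError(f"identifier already in use: {pid}")
        first = ResourceVersion(pid, 1, dict(metadata), ())
        self._history[pid] = [first]
        return first

    def annotate(self, pid: str, annotation: Annotation) -> ResourceVersion:
        """Adding an annotation or correction creates a new version."""
        latest = self._history[pid][-1]
        updated = ResourceVersion(
            pid, latest.version + 1, latest.metadata,
            latest.annotations + (annotation,),
        )
        self._history[pid].append(updated)
        return updated

    def get(self, pid: str, version: Optional[int] = None) -> ResourceVersion:
        history = self._history[pid]
        return history[-1] if version is None else history[version - 1]

    def query(self, layer: str) -> List[Tuple[str, Annotation]]:
        """Search the latest version of each resource; results keep the pid."""
        return [
            (versions[-1].pid, a)
            for versions in self._history.values()
            for a in versions[-1].annotations
            if a.layer == layer
        ]


store = AnnotationStore()
pid = "hdl:0000/demo-1"  # hypothetical identifier
store.create(pid, {"title": "Demo recording", "language": "en", "licence": "CC-BY"})
store.annotate(pid, Annotation("pause", 1.2, 1.6, "unfilled", certainty=0.8))
store.annotate(pid, Annotation("tone", 0.0, 0.5, "H"))
print(store.get(pid).version, store.query("pause"))
```

Because query results carry the identifier of the resource they came from, a search hit can be followed back to an editable record, which is one way to read the "linkage between search results and editable corpora" item.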

Last changed by Ulrike Gut on 03/08/2009 at 17:24
