Challenge
The Netherlands Graduate School of Linguistics required a
Typological Database
System to provide access to multiple semi-structured
typological databases through a single
unified interface.
Their objective was to facilitate the discovery and verification
of typological implications through
sophisticated cross-database searches.
Solution
NetKernel was used to combine various databases underneath a
unified declarative querying mechanism that presents the resulting information
in XML documents.
The resulting XML documents and information about the query result sets
are presented visually to the user through a flexible web interface.
Result
NetKernel provided a fertile, flexible development environment
supporting rapid prototyping and subsequent development of the
Typological Database System.
Netherlands Graduate School of Linguistics Typological Database System (TDS)
The TDS project
The TDS project is managed by a research group in the Netherlands Graduate School
of Linguistics (LOT) and includes members from the universities of
Amsterdam, Leiden, Nijmegen, and Utrecht.
The initial phase commenced in September 2000, implementation began in May 2004,
and completion is scheduled for May 2007.
Funding is provided by the Netherlands Organization
for Scientific Research (NWO) and the participating universities.
The Typological Database System
The field of linguistic typology is defined as:
The study of the similarities and differences between languages, regardless of any genetic relation, and the resulting categorization
of language into 'types'.[1]
Typological linguistic research involves the collection of information about linguistic phenomena for a
representative sample of the languages of the world. Many researchers have stored these collections
in a digital form.
The purpose of the TDS project is to make the typological collections of participating
institutes available through one interface and to allow sophisticated searches across collection boundaries.
At the lower levels the system presents the classic data integration
and data warehousing challenges.
At a higher level, closer to the user, the system resembles a realization of the
vision for the semantic web.
Concepts represented in the various databases are often subject to the
colliding world views of both the collection owners and the individual user.
Knowledge representation technologies (e.g. ontologies) are envisioned
to describe the sometimes partial relationships between concepts
and so help the user find interesting information.
The challenge for the TDS project is to bridge
the gap between the semantic level and the collected data while
keeping the semantics intact.
Figure 1: TDS Architecture
The typology collections are static by nature.
The parts under development are commonly released
to the outside world only after the primary user group has published on the topic.
Thus the TDS receives periodic snapshots of information.
These snapshots are converted to XML documents using a common format,
and where possible common concepts.
The documents are subsequently merged
into a single XML document based on a common key, i.e. the SIL language code[2] or
a TDS specific code.
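The merge step described above can be illustrated with a minimal sketch. It is not the TDS implementation: the element names (`database`, `language`, `tds`), the `code` and `source` attributes, the language codes, and the property values are all hypothetical placeholders chosen for illustration. The sketch assumes each snapshot has already been converted to a common format in which every record carries the shared key as an attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical snapshots from two source databases, already converted to an
# assumed common format. Codes and property values are placeholders, not TDS data.
SNAPSHOT_A = """
<database name="A">
  <language code="NLD"><wordOrder>SOV</wordOrder></language>
  <language code="ENG"><wordOrder>SVO</wordOrder></language>
</database>
"""

SNAPSHOT_B = """
<database name="B">
  <language code="NLD"><caseMarking>none</caseMarking></language>
  <language code="JPN"><caseMarking>particles</caseMarking></language>
</database>
"""


def merge_snapshots(snapshots):
    """Merge snapshot documents into one tree, grouping records by language code."""
    merged = ET.Element("tds")
    by_code = {}  # common key -> merged <language> element
    for text in snapshots:
        root = ET.fromstring(text)
        source = root.get("name")
        for lang in root.findall("language"):
            code = lang.get("code")
            target = by_code.get(code)
            if target is None:
                # First time this key is seen: create its merged record.
                target = ET.SubElement(merged, "language", code=code)
                by_code[code] = target
            for prop in list(lang):
                # Record which database each property came from.
                prop.set("source", source)
                target.append(prop)
    return merged


merged = merge_snapshots([SNAPSHOT_A, SNAPSHOT_B])
print(ET.tostring(merged, encoding="unicode"))
```

Tagging each property with its source database is one way to keep provenance visible after the merge, so that differing views from different collection owners remain distinguishable in the combined document.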
A future enhancement is to migrate to a native XML database to increase XQuery performance.
The typology documents, together with meta-data describing their semantic contents, form
the basis of the TDS web interface.
This interface, consisting of the data integration, querying and navigation subsystems,
is served by NetKernel.
"NetKernel allows us to describe the various interactions between the user and the database
in a declarative
manner," says Menzo Windhouwer, the Principal System Designer, who concludes,
"this makes changing the system easy."
When asked about the key factors that had contributed to the success of the project, Windhouwer stated,
"Out-of-the-box, NetKernel provides us with a wide range of state-of-the-art
XML technologies without the need to learn APIs and delve into the inner details of the
libraries providing this functionality."
The first prototype system went online[3] during 2005.
The next phase will see an overhaul of the system in response to user feedback.
It will also focus more on performance, with a switch to an XQuery engine driven by an XML database.
Further integration with semantic web technology will be employed to allow the building
and answering of semantically sound queries.
"We’re confident that NetKernel is a fertile
development and deployment environment," enthuses Windhouwer,
"it is allowing us to easily refactor the system whilst continually
improving and extending it."
References
[1] OLAC Linguistic Subject Vocabulary, Open Language Archives Community (OLAC)
http://www.language-archives.org/REC/field.html#typology
[2] Ethnologue, Languages of the World, Summer Institute of Linguistics (SIL)
http://www.ethnologue.com/
[3] Online Typological Database System (TDS)
http://languagelink.let.uu.nl/tds/
or http://www.hum.uva.nl/tds/