NetKernel Case Study: Typological Database

Typological Database

Challenge

The Netherlands Graduate School of Linguistics required a Typological Database System to provide access to multiple semi-structured typological databases through a single unified interface.

Their objective was to facilitate the discovery and verification of typological implications through sophisticated cross-database searches.

Solution

NetKernel was used to combine the various databases under a unified declarative querying mechanism that returns its results as XML documents. These documents, together with information about the query result sets, are presented visually to the user through a flexible web interface.

Result

NetKernel provided a fertile, flexible development environment supporting rapid prototyping and subsequent development of the Typological Database System.

Netherlands Graduate School of Linguistics Typological Database System (TDS)

The TDS project

The TDS project is managed by a research group in the Netherlands Graduate School of Linguistics (LOT) and includes members from the universities of Amsterdam, Leiden, Nijmegen, and Utrecht. The initial phase commenced in September 2000 and implementation began in May 2004; completion is scheduled for May 2007. Funding is provided by the Netherlands Organization for Scientific Research (NWO) and the participating universities.

The Typological Database System

The field of linguistic typology is defined as:

The study of the similarities and differences between languages, regardless of any genetic relation, and the resulting categorization of language into 'types'.[1]

Typological linguistic research involves the collection of information about linguistic phenomena for a representative sample of the languages of the world. Many researchers have stored these collections in a digital form. The purpose of the TDS project is to make the typological collections of participating institutes available through one interface and to allow sophisticated searches across collection boundaries.
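To illustrate what a search across collection boundaries might look like, the sketch below checks a typological implication ("if a language has property X, it also has property Y") against per-language records whose properties come from different collections. It is only a minimal sketch: the function name `check_implication`, the property names, and the placeholder language codes are hypothetical and are not taken from the TDS itself.

```python
def check_implication(records, antecedent, consequent):
    """Split languages into those supporting and those contradicting
    the implication 'antecedent -> consequent'.

    records    : dict mapping a language code to a dict of property values
    antecedent : (property, value) pair that must hold first
    consequent : (property, value) pair expected to follow
    """
    a_key, a_val = antecedent
    c_key, c_val = consequent
    supporting, counterexamples = [], []
    for code, props in sorted(records.items()):
        if props.get(a_key) != a_val:
            continue  # antecedent does not hold: implication is not tested
        if c_key not in props:
            continue  # no collection recorded the consequent: undecidable
        if props[c_key] == c_val:
            supporting.append(code)
        else:
            counterexamples.append(code)
    return supporting, counterexamples


# Hypothetical toy data: each record combines properties that would
# originate from different collections. The codes are placeholders.
records = {
    "aaa": {"word_order": "SOV", "adposition": "postposition"},
    "bbb": {"word_order": "SVO", "adposition": "preposition"},
    "ccc": {"word_order": "SOV", "adposition": "preposition"},
    "ddd": {"word_order": "SOV"},  # adposition type not recorded
}

supporting, counterexamples = check_implication(
    records,
    antecedent=("word_order", "SOV"),
    consequent=("adposition", "postposition"),
)
print("supporting:", supporting)
print("counterexamples:", counterexamples)
```

Languages for which no collection records the consequent property are skipped rather than counted as counterexamples, since a missing value in one collection says nothing about the language itself.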

At the lower levels, the system presents classic data integration and data warehousing challenges. At a higher level, closer to the user, it resembles a realization of the vision for the semantic web. Concepts represented in the various databases are often subject to the colliding world views of the collection owners and the individual user. Knowledge representation technologies (e.g. ontologies) describe the relationships between concepts, which are sometimes only partial, and are intended to help the user find interesting information. The challenge for the TDS project is to bridge the gap between the semantic level and the collected data while keeping the semantics intact.

TDS architecture
Figure 1: TDS Architecture

The typology collections are static by nature. The parts under development are commonly released to the outside world only after the primary user group has published on the topic. Thus the TDS receives periodic snapshots of information. These snapshots are converted to XML documents using a common format and, where possible, common concepts. The documents are subsequently merged into a single XML document based on a common key, i.e. the SIL language code[2] or a TDS-specific code. A planned enhancement is to migrate to a native XML database to increase XQuery performance.
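The merge step described above can be sketched roughly as follows, using Python's standard `xml.etree.ElementTree` module. This is an illustration under stated assumptions, not the TDS implementation: the element names (`collection`, `language`, `languages`), the `code` attribute, and the function `merge_by_language_code` are all hypothetical, since the actual common format is not described here.

```python
import xml.etree.ElementTree as ET


def merge_by_language_code(snapshots):
    """Merge <language code="..."> records from several XML snapshots
    into a single document, keyed on the language code.

    The element and attribute names are assumptions made for this sketch.
    """
    merged = ET.Element("languages")
    by_code = {}  # language code -> merged <language> element
    for xml_text in snapshots:
        root = ET.fromstring(xml_text)
        for lang in root.iter("language"):
            code = lang.get("code")
            target = by_code.get(code)
            if target is None:
                # First time this code is seen: create its merged record.
                target = ET.SubElement(merged, "language", {"code": code})
                by_code[code] = target
            # Append this collection's properties to the merged record.
            target.extend(list(lang))
    return merged


# Two hypothetical snapshots from different collections.
snapshot_a = (
    '<collection name="A">'
    '<language code="aaa"><word_order>SOV</word_order></language>'
    '</collection>'
)
snapshot_b = (
    '<collection name="B">'
    '<language code="aaa"><adposition>postposition</adposition></language>'
    '<language code="bbb"><adposition>preposition</adposition></language>'
    '</collection>'
)

merged = merge_by_language_code([snapshot_a, snapshot_b])
print(ET.tostring(merged, encoding="unicode"))
```

In this sketch a language that appears in both snapshots ends up as one record carrying the properties from each collection, which is what makes queries across collection boundaries possible.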

The typology documents, together with meta-data describing their semantic contents, form the basis of the TDS web interface. This interface, consisting of the data integration, querying and navigation subsystems, is served by NetKernel. "NetKernel allows us to describe the various interactions between the user and the database in a declarative manner," says Menzo Windhouwer, the Principal System Designer, who concludes, "this makes changing the system easy." When asked about the key factors that had contributed to the success of the project, Windhouwer stated, "Out-of-the-box, NetKernel provides us with a wide range of state-of-the-art XML technologies without the need to learn APIs and delve into the inner details of the libraries providing this functionality."

The first prototype system went online[3] during 2005. The next phase will see an overhaul of the system in response to user feedback and will focus more on performance, with a switch to an XQuery engine driven by an XML database. Further integration with semantic web technology will allow semantically sound queries to be built and answered. "We’re confident that NetKernel is a fertile development and deployment environment," enthuses Windhouwer, "it is allowing us to easily refactor the system whilst continually improving and extending it."

References

[1] OLAC Linguistic Subject Vocabulary, Open Language Archives Community (OLAC) http://www.language-archives.org/REC/field.html#typology

[2] Ethnologue, Languages of the World, Summer Institute of Linguistics (SIL) http://www.ethnologue.com/

[3] Online Typological Database System (TDS) http://languagelink.let.uu.nl/tds/ or http://www.hum.uva.nl/tds/

1060 and NetKernel are, respectively, a registered trademark and a trademark of 1060 Research Limited
© 2002-2008, 1060 Research Limited