Okay, slow day today, as I took some time out to get my talk ready for this afternoon. In the meantime, it gives me a chance to reflect on a few ideas. For example, I’ve seen many talks this week on data sharing, and had a very juicy discussion over dinner last night with Bryan. Two key challenges seem to stand out in this work: (1) the lack of a shared ontology between the scientific sub-communities that want to share datasets, and (2) the inevitable separation of data from commentaries on that data (which includes problems of knowledge provenance and meta-data management). The latter problem is endemic because the data sources and the commentary sources are in different communities, operate asynchronously, and often have very different goals.
There seem to be plenty of people considering the technical aspects of these problems, exploring the use of ontology description languages (e.g. OWL and its relatives) and markup languages (e.g. the many application schemas of GML). But very little emphasis has been placed on the human side of this problem. First, the sociological problem of ontological drift has been ignored – the problem that objects gradually change their meaning as they pass between different communities. (Chalmers gives an overview of various responses to this problem.)
A key point here is that scientific communities have collaborated reasonably effectively in previous centuries through the use of boundary objects (first characterized by Leigh Star). Boundary objects are…
… both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use. They may be abstract or concrete. They have different meanings in different social worlds but their structure is common enough to more than one world to make them recognizable means of translation. [Wikipedia defn]
Examples include taxonomies, maps, scientific methods, etc. I talked a little with Leigh about boundary objects in the early ’90s, and I observed that boundary objects are very effective because they represent a minimal shared understanding – just enough so that different communities have some common frames of reference, but not so much that anyone has to work hard on ensuring they have the same mental models.
The problem is that boundary objects fool us into thinking that we share much more meaning than we really do. When we try to embed our boundary objects into computer systems (thinking that we have a shared semantics), we bind the boundary objects to a particular interpretation, and hence they lose their plasticity. As soon as we do this, they are no longer boundary objects. They are brittle reflections of the real boundary objects. Whereas before, the objects themselves could be adapted to each community’s needs, now the communities themselves must do the adapting, to the fixed definitions of these frozen objects. No wonder everyone finds this hard!
So what do we do? I don’t really know, but I have some ideas. Lots of small local ontologies, with loose, flexible mappings between them, created on the fly by the communities that use them, social-tagging style? Or heavier-weight tools from psychology for teasing out mappings between the terminologies and concepts used by different expert communities, such as repertory grids (e.g. Shaw and Gaines developed a technique applying Rep Grids for exploring conflicting terminology)? Or perhaps more flexible ontology languages built over paraconsistent logics? I’ve played with all these ideas in the past, and think they all have some value. How to exploit them in practical data sharing systems remains a big open question.