A Science Analogy

In sketching an architecture for a flexible data manager it's helpful to look at other (artificial) systems that have been successful as adaptive classifiers. The pre-eminent such system is the scientific enterprise itself. Scientists try to understand complex systems (the universe, the human mind, plate tectonics, whatever) just as data managers would try to understand a complex system (a human user, the universe of data, other data managers).

Scientists work in an open society. Experiments can be secret but no result will be accepted until it is published and thoroughly replicated and investigated by any other scientist who cares to do so. Publication itself only happens after a few other scientists in the area accept the result. Similarly, anyone can add anything to their personal data manager, but if they hope to have those additions used by others they must show that the additions will be useful to others.

All scientists have their own slightly modified views of what's important and how to look for it, and no scientist knows very much about the current scientific worldview or has enough time or knowledge to fathom it all. Similarly, no data manager can know very much about the problem under study because the problems are so much more complex than the systems trying to solve them.

Each scientist's personality alters perceptions of what's important. So two scientists can see exactly the same phenomenon and notice different things about it. In a sense, they are striking the phenomenon against previously built structures in their minds and looking for sparks. Wherever the sparks fly is where they focus their attention, ignoring all other possible dimensions of the phenomenon under study. Similarly, data managers needn't all have the same structure; it's likely to be beneficial for several differently structured ones to collaborate to solve a classification problem.

Scientists have a reasonably well-accepted shared worldview, and they suggest hypotheses to explain discrepancies between the predictions of that worldview and the complex system under study. Scientists also need to communicate with each other to share information and test hypotheses. Further, there are many scientists, each with their own agenda, clumped into cooperating, communicating, and competing units. The units themselves clump into cooperating, communicating, and competing units.

Scientists also clump into groups in a different way--some worry mostly about generating new hypotheses (the theorists), some worry mostly about testing those hypotheses (the experimentalists), some worry mostly about integrating those hypotheses into the worldview (the generalists). Through this complex web of activity, scientists build up an understanding of the universe they inhabit. And they have been remarkably effective considering the difficulty of the task.

This sort of system is not restricted to scientists organizing themselves to attack complex problems; it is similar to other adaptive systems on other levels of complexity. For example, the molecules inhabiting a cell act in similar ways. No single molecule is in charge in the cell. No single molecule grades the other molecules on their performance. No single molecule alone carries out the metabolism of the cell, and so on.

This series of observations suggests two analogies. The first concerns the structure of a data manager: it should be composed of multiple independent actors, each of which is extremely limited, with no single one in charge. The second concerns the programming of a data manager: that work, too, can be carried out by multiple independent actors, each of which is limited. Both the structure and the programming of data managers can be modeled along these lines.
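To make the structural half of the analogy concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a proposed design: the actor functions (by_extension, by_size, by_sender), the item fields, the size threshold, and the simple vote tally are all hypothetical. Each actor looks at one narrow aspect of an item and may abstain, and no actor is in charge or grades the others.

```python
from collections import Counter
from typing import Callable, Optional

# Hypothetical: an actor sees an item and proposes a label, or abstains (None).
Actor = Callable[[dict], Optional[str]]

def by_extension(item: dict) -> Optional[str]:
    # Knows only about file-name endings.
    name = item.get("name", "")
    if name.endswith(".txt"):
        return "document"
    if name.endswith(".png"):
        return "image"
    return None

def by_size(item: dict) -> Optional[str]:
    # Knows only about size; the threshold is an arbitrary assumption.
    size = item.get("size")
    if size is None:
        return None
    return "image" if size > 100_000 else "document"

def by_sender(item: dict) -> Optional[str]:
    # Knows only about who sent the item.
    return "document" if item.get("sender") == "editor" else None

def classify(item: dict, actors: list[Actor]) -> Optional[str]:
    # No actor is in charge: every opinion counts once, abstentions are skipped.
    votes = Counter()
    for actor in actors:
        label = actor(item)
        if label is not None:
            votes[label] += 1
    if not votes:
        return None  # nobody had an opinion
    return votes.most_common(1)[0][0]

ACTORS = [by_extension, by_size, by_sender]
```

Calling classify({"name": "photo.png", "size": 500_000}, ACTORS) lets two actors vote while the third abstains. A plain tally is the simplest way to combine opinions; a fuller system would presumably let actors communicate and compete, as the scientists in the analogy do.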