The KnownSpace Datamanager
"Halfway To Anywhere"

Linking Pages Into a Space

The design of today's operating systems dates from a time when computers were much less powerful than they are now. Consequently, page attributes are few, fixed, and uniform over the set of all pages (typically: name, location, date, size, and type). To make things simple for the computer, pages are treated as black boxes, users as sources of random commands, and the desktop as a tree. Users are therefore forced to label each black box and place it in a fixed location in a strict (and effectively linear) hierarchy.

Instead, the system should try to relieve the user of some of that burden by:

* Analyzing pages so that the system as well as the user has some idea of the many attributes of the pages, and so some idea of the relative similarity of different pages.
* Embedding pages in a memorable, perceptible, and interactive space so that page similarity can be indicated visually.
* Analyzing the user's behavior so that each search is personalized to the user's demonstrated interests.
* Autonomously accumulating new candidate pages to help the user find new information.

To accomplish these aims, the system must analyze and store much more information about its pages, its user, and its desktop than operating systems do today. It must build a database.

Each page within the database has a number of attributes attached to it. These attributes record the page's many categories, how the user tends to use the page, and where the page's displayed form fits in a space of other pages.

Here are some example attributes:

Attributes associated with the page's category:
* what type of page it is: text, image, audio, video, webpage, executable, composite page, a cluster of pages, and so on,
* how big it is,
* when it was created, downloaded, or saved,
* whether another page within the database points to it,
* what other pages or types of pages are associated with the page in content, structure, type, or date,
* if it's an executable page, when last it was executed and what page it was executed on,
* if it's a webpage, whether it points to other pages within the database and if so, which pages it points to,
* if it's a composite page, whether it contains images, and if so, how big they are, and how many there are,
* if it's a text page, what proper names it contains, if any, what email addresses, http references, phone numbers, and so on,
* if it's a text page, what title it has, if any, and what its word distribution is.
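
As a rough illustration of how a few of the text-page attributes above might be gathered, here is a minimal Python sketch. The function name `analyze_text_page`, the dictionary keys, and the rule that the first line is the title are all assumptions made for the example, not part of the design described here. The regular expressions are deliberately simple and would miss many real-world cases.

```python
import re
from collections import Counter

# Deliberately simple patterns for illustration only; robust matching of
# email addresses and URLs needs far more care than this.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[A-Za-z']+")


def analyze_text_page(text):
    """Return a dict of category attributes for a text page (hypothetical schema)."""
    lines = text.strip().splitlines()
    # Assumption: the first non-empty line serves as the title, if there is one.
    title = lines[0].strip() if lines else None
    words = [w.lower() for w in WORD_RE.findall(text)]
    return {
        "type": "text",
        "size": len(text),  # size in characters
        "title": title,
        "emails": sorted(set(EMAIL_RE.findall(text))),
        "http_references": sorted(set(URL_RE.findall(text))),
        "word_distribution": Counter(words),  # lowercased word -> count
    }
```

Proper names and phone numbers are left out because recognizing them reliably takes more than a one-line pattern.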

Attributes associated with the user's use of the page:
* the name of the page, whether assigned by the user or generated by the system,
* any comments the user has attached to the page,
* any page clusters the user puts the page in,
* how often the user has looked at it, and for how long,
* how often it's been used in searches,
* what other pages or types of pages the user examines at around the same time as this page,
* whether it was created by the user, saved by the user, fetched by the system, or saved by some other program (and if so, which program),
* what program last accessed it, and when,
* how often it's been printed, and when (if ever),
* whether it has been mailed, faxed, or posted, and if so, to whom, and when,
* if it was edited or otherwise altered, when last it was altered and by which program, and where its earlier versions are,
* how much it has changed since it was last altered, if ever.

Attributes associated with the page's display on the screen:
* what its displayed form looks like (color, shape, appearance, transparency, z-order, blink rate, associated animations, sounds, and so on),
* where its displayed form is normally located on the screen,
* whether the user has ever moved its displayed form around on the screen, and if so, when and to where,
* what other displayed forms are close to its displayed form on the screen.
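
The last attribute in this list, which displayed forms are close to a given one, could be computed from screen positions. The sketch below assumes each displayed form has a single (x, y) position and that closeness means straight-line distance within some radius. The names `DisplayedForm` and `neighbors` are hypothetical, and a real system might use a quite different notion of closeness.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayedForm:
    """Hypothetical record of where a page's displayed form sits on screen."""
    page_name: str
    x: float
    y: float


def neighbors(target, forms, radius):
    """Return the names of forms within `radius` of `target`, nearest first."""
    scored = []
    for form in forms:
        if form is target:
            continue  # a form is not its own neighbor
        distance = math.hypot(form.x - target.x, form.y - target.y)
        if distance <= radius:
            scored.append((distance, form.page_name))
    return [name for _, name in sorted(scored)]
```

A linear scan like this is adequate for a small screen; with many displayed forms, a spatial index would be the usual replacement.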

Attributes of the attributes of a page:
* how many attributes a page has attached to it,
* a list of the attributes attached to a page,
* which attributes are more important than which other attributes in different situations,
* which of the attributes are multi-valued (like the list of pages that point to a page), which are single-valued (like the page's size), which are continuous (the time the page was created), and which are discrete (the number of words on the page).
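
One way to picture these attributes of attributes is as a small registry that describes each attribute, separate from the values stored on any page. The following is only a sketch under assumed names (`AttributeInfo`, `REGISTRY`, `summarize`); the four registered entries mirror the examples given in the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Cardinality(Enum):
    SINGLE = "single-valued"
    MULTI = "multi-valued"


class Scale(Enum):
    CONTINUOUS = "continuous"
    DISCRETE = "discrete"


@dataclass(frozen=True)
class AttributeInfo:
    """Hypothetical description of one attribute, independent of any page."""
    name: str
    cardinality: Cardinality
    scale: Optional[Scale] = None  # None: not classified in this sketch


REGISTRY = {
    info.name: info
    for info in [
        AttributeInfo("pointed_to_by", Cardinality.MULTI),
        AttributeInfo("size", Cardinality.SINGLE),
        AttributeInfo("created", Cardinality.SINGLE, Scale.CONTINUOUS),
        AttributeInfo("word_count", Cardinality.SINGLE, Scale.DISCRETE),
    ]
}


def summarize(page_attributes):
    """Report how many attributes a page has, their names, and which are multi-valued."""
    names = sorted(page_attributes)
    multi = [
        n for n in names
        if n in REGISTRY and REGISTRY[n].cardinality is Cardinality.MULTI
    ]
    return {"count": len(names), "names": names, "multi_valued": multi}
```

The relative importance of attributes in different situations is not modeled here, since the text does not say how it would be determined.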

A larger database could hold tens of thousands of pages, and some of those pages might have hundreds of attributes, each of which may have dozens of values. The sheer size of the database is a major issue in itself. The problem is compounded by the dynamism and non-uniformity of the database:

* Any two pages could differ in the set of attributes attached to them (for example, an executable and a text page might have little in common).
* An attribute attached to a particular page might have its own set of values that are not shared with the same attribute of another page (for example, two webpages will likely point to two totally different sets of other webpages).
* At any time the system could add new attributes to any page (for example, a new attribute might be that the page was just used in a search).
* At any time the system could add new pages (with entirely new attributes).
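
To make the non-uniformity concrete, the sketch below stores each page as its own open-ended mapping from attribute names to sets of values, so that no fixed schema is shared across pages. This is a toy in-memory illustration under assumed names (`PageStore` and its methods), not the datamanager's actual design, and it ignores the scale problem entirely.

```python
from collections import defaultdict


class PageStore:
    """Toy store in which every page carries its own attribute set."""

    def __init__(self):
        # page name -> attribute name -> set of (hashable) values
        self._pages = {}

    def add_page(self, name, **attributes):
        """Add a page with any initial attributes; no fixed schema is required."""
        self._pages[name] = defaultdict(set)
        for attr, value in attributes.items():
            self.add_value(name, attr, value)

    def add_value(self, name, attr, value):
        """Attach a value, creating the attribute on this page if it is new."""
        self._pages[name][attr].add(value)

    def attributes(self, name):
        return set(self._pages[name])

    def values(self, name, attr):
        # Use .get so that asking about a missing attribute does not create it.
        return set(self._pages[name].get(attr, ()))

    def shared_attributes(self, a, b):
        """Attribute names that two pages have in common (possibly very few)."""
        return self.attributes(a) & self.attributes(b)


store = PageStore()
store.add_page("notes.txt", type="text", word_count=120)
store.add_page("tool.exe", type="executable")
# A new attribute can appear on one page at any time.
store.add_value("notes.txt", "used_in_search", "query-1")
```

Here the text page and the executable share only the `type` attribute, and `used_in_search` exists on one page without affecting the other.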