DataIO package

One problem with the design of the data access package is that we don't know, a priori, the format of the data or its source. It could be stored in different file formats, or in some kind of database (each with its own command syntax), or somewhere out on the internet. One would like to be able to handle all of these possibilities. Additionally, in order for the framework to support different problem domains, it needs some way of identifying what the data is.
Our solution to these problems is to define an abstract interface that represents what NVisF expects from the data source. Then, for each different source, a concrete class implements the functionality of that interface in a sensible way for that source. Currently only reading operations have been defined, and are contained in the DataReader interface. This interface defines three types of methods: connection methods, metadata methods, and data access methods.
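As a rough sketch, the interface might group those three kinds of methods like this. Only openConnection(source), getScalar, getVector, and the fact that clients can obtain a TypeInfo object are taken from the description above; the other method names, the return types, and the exception declarations are illustrative guesses, not the actual NVisF API.

    // Sketch of a DataReader interface; names and signatures beyond those
    // mentioned in the text are illustrative assumptions.
    public interface DataReader {
        // Connection methods: manage the connection to the data source.
        void openConnection(String source) throws DataIOException;
        void closeConnection() throws DataIOException;
        boolean isConnected();

        // Metadata methods: describe the available data objects and their fields.
        TypeInfo getTypeInfo();

        // Data access methods: fetch one field for the objects selected by the predicate.
        float[] getScalar(String predicate, String dataField)
                throws DataIOException, java.text.ParseException;
        float[][] getVector(String predicate, String dataField)
                throws DataIOException, java.text.ParseException;
    }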

Connection methods deal with the state of the connection to the data source. Put simply, data can be read from an open connection but not from a closed one: you have to open the connection before you can read. How openConnection(source) is implemented can vary; for example, a file reader might read the entire contents into memory, or might simply open the file for read access and read data only as it is requested. The implementing class decides the best way to provide the data.
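For instance, a file-based reader taking the read-everything-up-front approach might implement the connection methods along these lines (the class name, fields, and the assumption that the exception classes take a message string are all hypothetical):

    // Hypothetical file reader illustrating an eager openConnection strategy;
    // a lazy reader would instead remember the path and read on demand.
    public abstract class AsciiFileReader implements DataReader {
        private java.util.List<String> lines;   // null while the connection is closed
        private boolean connected = false;

        public void openConnection(String source) throws DataIOException {
            if (connected) {
                // Assumes the exception classes accept a message string.
                throw new OpenConnectionException("connection already open: " + source);
            }
            try {
                // Eager strategy: pull the whole file into memory now.
                lines = java.nio.file.Files.readAllLines(java.nio.file.Paths.get(source));
                connected = true;
            } catch (java.io.IOException e) {
                throw new DataIOException("cannot open " + source + ": " + e.getMessage());
            }
        }

        public void closeConnection() {
            lines = null;
            connected = false;
        }

        public boolean isConnected() {
            return connected;
        }

        // Metadata and data access methods are omitted from this fragment.
    }
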
Classes implementing the DataReader interface are required to construct a TypeInfo object that describes their data. Data is assumed to be object-oriented; that is, there are data objects that have a name and a set of fields, each of which has a name, a type, and a value appropriate to that type. This structure is represented by the TypeInfo object, which contains a list of ObjectInfo objects, one for each type of data object. Each ObjectInfo object has a list of DataField objects that describe the fields defined for that data object type. Currently the only supported data field types are SCALAR_FIELD_ELEMENT (that is, a float) and VECTOR_FIELD_ELEMENT (an array of floats), but this set is easily extended. Clients can request a copy of the TypeInfo object through the DataReader interface.
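In code, the metadata structure could be pictured roughly as follows; the actual NVisF classes may differ, and the accessor names and constant values here are invented for illustration.

    // Rough picture of the metadata classes; accessor names are invented.
    class TypeInfo {
        private final java.util.List<ObjectInfo> objects = new java.util.ArrayList<>();
        public java.util.List<ObjectInfo> getObjectInfos() { return objects; }
    }

    class ObjectInfo {
        private final String name;                                   // e.g. "gas" or "star"
        private final java.util.List<DataField> fields = new java.util.ArrayList<>();
        public ObjectInfo(String name) { this.name = name; }
        public String getName() { return name; }
        public java.util.List<DataField> getFields() { return fields; }
    }

    class DataField {
        public static final int SCALAR_FIELD_ELEMENT = 0;           // a single float
        public static final int VECTOR_FIELD_ELEMENT = 1;           // an array of floats
        private final String name;                                   // e.g. "temperature"
        private final int type;                                      // one of the constants above
        public DataField(String name, int type) { this.name = name; this.type = type; }
        public String getName() { return name; }
        public int getType() { return type; }
    }
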
Data access occurs via the methods getScalar(predicate, dataField) and getVector(predicate, dataField) (additional methods would be needed to support other data types). The dataField parameter specifies which data field to get (field names are available from the metadata in the TypeInfo object). The predicate is used to select a subset of the available data; for example, meaningful predicates would be "all gas particles with temperature over 5000 degrees" or "all star particles within 100 kpc of this point". Supporting such predicates requires some sort of query language in which to express them. Since we haven't selected a query language yet, real predicates aren't supported: the only valid predicate is the name of a data object type, which returns data for all objects of that type. Support for real queries will come later.
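Until then, a client call looks something like the fragment below; the source file name and the "gas", "temperature", and "position" names are made up for illustration and would really come from the reader's TypeInfo metadata.

    // Illustrative client fragment; names are hypothetical.
    void readGasData(DataReader reader) throws DataIOException, java.text.ParseException {
        reader.openConnection("simulation.dat");

        // With no query language yet, the predicate is just the name of a data
        // object type, so each call returns that field for every object of that type.
        float[]   temperatures = reader.getScalar("gas", "temperature");
        float[][] positions    = reader.getVector("gas", "position");

        reader.closeConnection();
    }
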
An improvement planned in the next version is to define standard data types, such as POSITION, VELOCITY, etc. This standardization helps with name resolution between application writers and authors of data readers.

Since a DataReader provides access to many different kinds of data sources, there is a wide variety of potential errors. Currently these errors are caught and rethrown as one of the provided DataIOException classes. Some subclasses exist, for example OpenConnectionException and NoConnectionException, which are used when an operation is not possible in the current connection state. Exceptions that don't fit a provided category yield a plain DataIOException. One exception (heh) to this is that errors in processing a predicate produce a ParseException, which is part of the standard Java class library.
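Assuming the subclasses extend DataIOException, a caller would catch the specific exceptions before the general one, roughly as in this sketch (the file and field names are again hypothetical):

    // Sketch of error handling around a DataReader; specific exceptions first.
    void readWithErrorHandling(DataReader reader) {
        try {
            reader.openConnection("simulation.dat");
            float[] temperatures = reader.getScalar("gas", "temperature");
            reader.closeConnection();
        } catch (java.text.ParseException e) {
            System.err.println("bad predicate: " + e.getMessage());
        } catch (OpenConnectionException e) {
            System.err.println("connection problem: " + e.getMessage());
        } catch (NoConnectionException e) {
            System.err.println("read attempted without an open connection: " + e.getMessage());
        } catch (DataIOException e) {
            System.err.println("data access failed: " + e.getMessage());
        }
    }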

