Examples of persistence mechanism evolution
We wanted to see what we would have to do to make IRF use a storage
mechanism different from the basic flat file one currently used. The
two chosen examples were the freely available object databases
storedObjects and Ozone. In both cases we were working with the versions
current at the time (March 2000); both packages are changing rapidly, so
our comments may no longer apply. In any case, our findings are
preliminary and are not intended as either a final endorsement or a
criticism of these two packages.
Ease of use
Ozone
Ozone can be downloaded from http://www.ozone-db.org and is
100% pure Java.
It installs neatly into a directory with gnumake, provided the directory
is empty beforehand. It won't install with the regular Sun make utility,
which fails with a cryptic error message.
A small tutorial helps with the first steps in Ozone. The system can be
used to develop a persistence-based application, but the early design of
the objects must take it into account: objects to be stored with Ozone
have to extend OzoneObject and will then necessarily have a proxy.
Writing code for Ozone is tricky: an application is always the client of
a server (there is no way to embed the server in the client), and both
parts execute code, so it can be hard to tell whether a given piece of
code will run in the client or in the server (and it may sometimes run
in both at different steps of the program).
In Ozone, every persistent object is used through an interface that
exposes all of its methods; client code never knows whether it is
talking to the real class (which you write) or to the proxy class (which
an Ozone tool generates automatically).
storedObjects
storedObjects can be downloaded from http://www.jdbms.org and
is 100% pure Java.
storedObjects is very simple to install (javac *.java in the source
directory). It comes with many example programs and a lot of
documentation, but no tutorial (one is planned). At first sight it looks
very much like the commercial product ObjectStore by ObjectDesign, or at
least like its version 3. Alongside the project files there is a schema
file on which you run the Schema Generator; this creates two DB
representation files (one for the server, MAIN.DBM, and one for the
client, CLIENT.DBM). After that, no post- or preprocessor is needed to
run the application. The server can be embedded in the client thanks to
the "fakeTCP" mode, so that both ends run in the same JVM.
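As a rough illustration of this workflow: only DBClient, the Schema
Generator, the *.DBM files, and the "fakeTCP" mode are mentioned above;
every constructor argument and method name below is our guess, not the
actual storedObjects API.

    // Hypothetical sketch of the run-time side.
    // Step 1 (done beforehand): list the persistent classes in a schema
    // file and run the Schema Generator on it, producing MAIN.DBM
    // (server) and CLIENT.DBM (client).
    public class EmbeddedClientSketch {
        public static void main(String[] args) throws Exception {
            // Step 2: in "fakeTCP" mode the server is embedded, so both
            // ends run in this JVM. Constructor and calls are guessed.
            DBClient db = new DBClient("fakeTCP", "CLIENT.DBM");
            db.store(new java.util.Hashtable());   // guessed store call
            db.close();                            // guessed close call
        }
    }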
Performance / suitability
Ozone is based on Java serialization and reflection, so it runs into the
same problem we had at the beginning with IRF: efficiency. Creating,
saving, and restoring objects is very slow. This can be improved with
the same kind of tricks we used for IRF (tuned serialization), but there
is no guarantee it would go much further.
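For reference, the kind of "tuned serialization" trick we used in IRF is
the standard java.io.Externalizable approach, sketched here on a
hypothetical feature-list class (the class and field names are
illustrative, not current IRF code):

    import java.io.*;

    // Writing the fields by hand avoids the reflective walk that default
    // serialization performs over the whole object graph.
    public class TunedFeatureList implements Externalizable {
        private int[] featureIds = new int[0];
        private float[] weights = new float[0];

        public TunedFeatureList() {}   // required by Externalizable

        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(featureIds.length);
            for (int i = 0; i < featureIds.length; i++) {
                out.writeInt(featureIds[i]);
                out.writeFloat(weights[i]);
            }
        }

        public void readExternal(ObjectInput in) throws IOException {
            int n = in.readInt();
            featureIds = new int[n];
            weights = new float[n];
            for (int i = 0; i < n; i++) {
                featureIds[i] = in.readInt();
                weights[i] = in.readFloat();
            }
        }
    }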
storedObjects is very fast: at first sight, about 100 times faster than
Ozone for the same kind of test application. It also provides several
facilities for indexing objects by field values, managing referential
integrity, and so on.
Achievements
In both cases we were able to get a test application that manages a
small index containing pseudo indexing features, and in both cases
retrieval works. We had no chance to test whether the indexing
facilities of the two ODBMSs could be used, because a bigger issue came
first: since the indexes we want to manage are huge, we need a way to
keep only part of them in memory. That brings the need for on-demand
materialization.
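What we mean by on-demand materialization is sketched below with a lazy
wrapper: the underlying data is only fetched from the store when first
touched and can be dropped again to free memory. The Store interface and
the class names are ours, not part of either ODBMS.

    // Hypothetical sketch of on-demand materialization.
    interface Store {
        Object restore(long oid);
    }

    class LazyPostingList {
        private final Store store;
        private final long oid;
        private Object materialized;   // null until first use

        LazyPostingList(Store store, long oid) {
            this.store = store;
            this.oid = oid;
        }

        synchronized Object get() {
            if (materialized == null) {
                materialized = store.restore(oid);  // fetched only when needed
            }
            return materialized;
        }

        synchronized void release() {
            materialized = null;   // let the GC reclaim the data
        }
    }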
With Ozone, we couldn't write code that would let us restore an index
partially into memory, materialize the parts needed, and release them
when necessary. When an Index was restored, it was loaded entirely; if
memory wasn't big enough, the application crashed.
storedObjects is designed to support on-demand materialization, so we
investigated how we could use it. It turned out that on-demand
materialization didn't work properly with arrays (which are heavily used
inside Vectors and Hashtables). We submitted a bug report and received a
small patch that didn't prove to be enough. The storedObjects
experiments therefore stopped with this finding, but they could be
resumed as soon as this problem is fixed.
What about IRF ?
In neither case were we able to get beyond the test application with the
ODBMS tried.
With Ozone, the data organization looks very much like the one we used
to have in IRF: an interface implemented by both the proxy and the real
class. To use it with IRF, the first step would be to define cleanly
which classes need a proxy. Only instances of those classes could then
be easily restored just by knowing their names (which means there would
be a name per object, i.e., more than an OID). Indexes, Documents, and
DEs seem to be the most appropriate classes to have proxies. The other
classes that currently have proxies in IRF (IndexingFeatures,
FeatureLists) could be managed directly by Ozone. Ozone itself would
ensure the uniqueness of restored proxies and the restoration of
proxy-less objects. This shows the main drawback of using Ozone with
IRF: Ozone needs to be taken into consideration from the beginning of
the design phase, and thus doesn't easily allow persistence to be added
to an already existing application.
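For instance, restoring an IRF Index just from its name would look
roughly like this. The client calls (ExternalDatabase.openDatabase,
objectForName) reflect our recollection of the March 2000 Ozone API and
may be wrong; Index stands for the hypothetical proxied interface
discussed above.

    // Hypothetical sketch: open the Ozone database and restore an Index
    // proxy by name. The API calls are our recollection and may differ.
    import org.ozoneDB.ExternalDatabase;

    public class RestoreIndexByName {
        public static void main(String[] args) throws Exception {
            ExternalDatabase db =
                ExternalDatabase.openDatabase("ozonedb:remote://localhost:3333");
            Index mainIndex = (Index) db.objectForName("irf.mainIndex");
            // ... use mainIndex through its interface; Ozone hands back
            //     the proxy, never the real IndexImpl ...
            db.close();
        }
    }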
With storedObjects, the use of proxies is transparent: you never know
which kind of object you're using, because the server manages this
aspect. You can restore objects by their OID, or query the collection to
find the one you need (knowing, for example, its class and the content
of a field). To manage partial restoration of objects correctly, we
think we would have to define our own Vector class (like FeatureList),
because the storedObjects collections may not meet this need. Otherwise,
sO provides a hashtable class that could be enough (no need for a
redefinition as in IRF; getActualKey can be emulated with sO's
indexing/querying facilities). The work could then be easier than with
Ozone. The only problem is that, unlike with Ozone, an sO application
always works through a DBClient. This class lets the user talk to the
DBServer, asking it to store, restore, and delete objects. Thus, all
classes managing persistence project-wide must be aware of this
connection. The need for an sOBroker then begins to appear, keeping
track of this connection. There shouldn't be any need for a broker per
class, just a general one, a bit like the way the
PersistentObjectManager currently works in IRF.
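The broker we have in mind would look roughly like the sketch below.
DBClient is the name used by storedObjects, but the methods called on it
here (store, restore, close) are illustrative guesses, as is everything
else.

    // Hypothetical sketch of the sOBroker idea: one project-wide object
    // owns the DBClient connection, so persistent classes never see it.
    public final class SOBroker {
        private static SOBroker instance;
        private final DBClient client;

        private SOBroker(DBClient client) { this.client = client; }

        public static synchronized void init(DBClient client) {
            instance = new SOBroker(client);
        }

        public static synchronized SOBroker get() {
            if (instance == null) {
                throw new IllegalStateException("broker not initialised");
            }
            return instance;
        }

        public void store(Object o)     { client.store(o); }         // guessed call
        public Object restore(long oid) { return client.restore(oid); } // guessed call
        public void shutdown()          { client.close(); }          // guessed call
    }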
And now ?
storedObjects seems to us the better solution. On the user mailing list,
after we made our problem known, the project manager described
approaches that could allow sO to work with huge numbers of objects.
These may be implemented in the future, and we hope the
partial-restoration problem with arrays will be addressed quickly.
The main goal of these tests was to implement a different storage
mechanism and see whether IRF accepts such a change easily. That
couldn't actually be tested, but at least we learned how two (very)
different ODBMSs work. We think IRF could accept such a change quickly,
provided we make it a little more dynamic (some kind of registration for
brokers, proxies, etc., so that only one class has to change to get a
new storage mechanism, not a group of them).
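The kind of registration we mean is sketched below: IRF code would only
ever talk to a broker interface, and switching from flat files to Ozone
or storedObjects would mean registering a different implementation. All
names here are ours, not current IRF classes.

    import java.util.Hashtable;

    // The single point of contact for persistence in the application.
    interface StorageBroker {
        void store(String name, Object o) throws Exception;
        Object restore(String name) throws Exception;
    }

    // Registry: swapping storage mechanisms only changes what gets
    // registered here, not the classes that use the broker.
    final class BrokerRegistry {
        private static StorageBroker current;

        static synchronized void register(StorageBroker broker) {
            current = broker;
        }

        static synchronized StorageBroker broker() {
            if (current == null) {
                throw new IllegalStateException("no storage broker registered");
            }
            return current;
        }
    }

    // Trivial in-memory implementation, standing in for a flat-file,
    // Ozone- or storedObjects-backed broker.
    class InMemoryBroker implements StorageBroker {
        private final Hashtable objects = new Hashtable();
        public void store(String name, Object o) { objects.put(name, o); }
        public Object restore(String name)       { return objects.get(name); }
    }

At start-up the application would call, for example,
BrokerRegistry.register(new InMemoryBroker()); everything else would go
through BrokerRegistry.broker().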
Other References
For a closer look at what has been done with storedObjects, which was
the main test, see Using storedObjects for IRF.
Last updated: Tuesday, 01-Aug-2000 12:34:28 UTC
Date created: Monday, 31-Jul-00
For further information contact Paul Over (over@nist.gov) with
copy to Darrin Dimmick (ddimmick@nist.gov)