Use of storedObjects
The purpose of this document is to show how a different persistent
storage engine could be linked to IRF.
Would sO be OK for IRF?
Is partial materialization available?
Yes, thanks to the DB.FLAT parameter of the restore() method.
When an object has been restored this way, its inner references cannot
be resolved yet; they first have to be restored with restoreMore().
However, restoreMore() only rematerializes one level of indirection
(the same as restore() with DB.FLAT). To restore the complete
hierarchy from a given reference, a trick is needed: call getOID()
first; restore() can then be called on this reference and restores the
complete hierarchy from that point.
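To make the difference between a flat and a complete restore concrete, here is a small in-memory model. It is a sketch, not the sO API: ToyStore, Node, StoredRecord, restoreFlat(), restoreMore() and restoreDeep() are hypothetical names, and stored objects are plain entries keyed by an integer OID. restoreFlat() plays the role of restore() with DB.FLAT, restoreMore() resolves exactly one more level, and restoreDeep() stands for the effect of the getOID()-then-restore() trick.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of partial materialization (not the sO API).
// What the "database" holds for each OID: a name and the OIDs it references.
final class StoredRecord {
    final String name;
    final List<Integer> childOids;

    StoredRecord(String name, List<Integer> childOids) {
        this.name = name;
        this.childOids = childOids;
    }
}

// An object in memory: each child slot is either a resolved Node or
// null, meaning only the child's OID is known so far.
final class Node {
    final int oid;
    final String name;
    final List<Integer> childOids;
    final List<Node> children = new ArrayList<>();

    Node(int oid, StoredRecord r) {
        this.oid = oid;
        this.name = r.name;
        this.childOids = r.childOids;
        for (int i = 0; i < childOids.size(); i++) {
            children.add(null); // unresolved reference
        }
    }

    boolean isResolved(int i) {
        return children.get(i) != null;
    }
}

final class ToyStore {
    private final Map<Integer, StoredRecord> db = new HashMap<>();
    int reads = 0; // number of records materialized so far

    void put(int oid, String name, List<Integer> childOids) {
        db.put(oid, new StoredRecord(name, childOids));
    }

    // Counterpart of restore() with DB.FLAT: one object, references unresolved.
    Node restoreFlat(int oid) {
        reads++;
        return new Node(oid, db.get(oid));
    }

    // Counterpart of restoreMore(): resolve exactly one more level.
    void restoreMore(Node n) {
        for (int i = 0; i < n.childOids.size(); i++) {
            if (!n.isResolved(i)) {
                n.children.set(i, restoreFlat(n.childOids.get(i)));
            }
        }
    }

    // Complete restore of the hierarchy below the given OID.
    Node restoreDeep(int oid) {
        Node n = restoreFlat(oid);
        for (int i = 0; i < n.childOids.size(); i++) {
            n.children.set(i, restoreDeep(n.childOids.get(i)));
        }
        return n;
    }
}
```

With an index (OID 1) holding a doc (OID 2) that contains two Des (OIDs 3 and 4), restoreFlat(1) followed by restoreMore() yields the doc with its Des still unresolved; restoreDeep(2) then materializes the whole subtree, at the cost of one read per object in it.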
Arrays?
They are handled correctly by sO and are even optimized, so that, for
instance, fields of classes stored within an array can be indexed.
Vectors, Hashtables?
At first, IRF used regular Java Vectors and Hashtables, but efficiency
concerns made us switch to IrfHashtables and FeatureLists, so we no
longer have to handle the regular classes. It may still be useful to
be able to use them with a different storage mechanism, so how does sO
manage them? First, both of them are handled, but there seems to be a
problem with the way we would use them in IRF (Vectors used as values
in Hashtables). Storing redefined Vectors (i.e., FeatureLists) in
regular Hashtables is not a problem. Redefining Hashtables, however,
seems to be one: first drafts compile but crash at runtime.
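The combination reported to work, redefined Vectors stored as values in a regular Hashtable, has the following in-memory shape. This sketch does not involve sO at all: FeatureList here is a hypothetical stand-in for IRF's class, FeatureTable is an illustrative wrapper, and String keys and values stand in for Des and Docs.

```java
import java.util.Hashtable;
import java.util.Vector;

// Hypothetical stand-in for IRF's FeatureList: a redefined Vector.
class FeatureList extends Vector<String> {
    private static final long serialVersionUID = 1L;

    // Illustrative extra method a redefined Vector might offer.
    int numberOfFeatures() {
        return size();
    }
}

// A regular Hashtable whose values are redefined Vectors.
class FeatureTable {
    private final Hashtable<String, FeatureList> table = new Hashtable<>();

    // Append a feature to the list stored under key, creating the list if needed.
    void add(String key, String feature) {
        table.computeIfAbsent(key, k -> new FeatureList()).add(feature);
    }

    FeatureList get(String key) {
        return table.get(key);
    }
}
```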
Efficiency?
Definitely very fast. There is a little overhead at the beginning and
at the end, due to database management, but in between it is nearly
as fast as working directly in RAM.
Current State
A test application manages an index in which Docs containing Des
(DataElements) can be indexed with an IF (IndexingFeature) class.
To Do:
- Redefine hashCode() and equals() in the Des and Docs (in the sO test
  application) so that they behave the same as in IRF.
  Done. No problem.
- Add DeInterns.
  Actually, instead of appearing as keys, they will appear as values in
  the hashtable, with a reference to the IrfVector, or they will even be
  incorporated into the IrfVector (with only score and numberOfFeatures
  to manage) if IoAddrIntern can be handled the same way.
- Test retrieval after those features are added.
  Without DeIntern, it already works fine: redefining hashCode() and
  equals() was enough.
- Write an efficient getActualKey().
  This can be avoided with the method described above.
- Avoid complete materialization during score computation? (In IRF
  this currently works thanks to the OID comparison.)
  Still to do.
- Test restore/restoreMore.
  OK. These two methods will allow the previous item to be completed.
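Why redefining hashCode() and equals() is enough for retrieval: the De used as a lookup key (for example one built from the argument of SmallIrf ret de1) is a different Java object from the De stored in the index, so the hashtable has to compare Des by value rather than by identity. A minimal sketch, assuming, as in De.java, that a De mainly consists of a String; the class bodies and the SmallIndex helper are illustrative, not the contents of the actual test files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Hashtable;
import java.util.List;
import java.util.Objects;

// Illustrative DataElement: mainly a String, compared by value.
final class De {
    private final String text;

    De(String text) {
        this.text = Objects.requireNonNull(text);
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof De && ((De) o).text.equals(text);
    }

    @Override
    public int hashCode() {
        return text.hashCode(); // consistent with equals()
    }

    @Override
    public String toString() {
        return text;
    }
}

// Illustrative Doc: a named collection of Des.
final class Doc {
    final String name;
    final List<De> des;

    Doc(String name, De... des) {
        this.name = name;
        this.des = Arrays.asList(des);
    }
}

// Illustrative index: each De maps to the Docs that contain it.
final class SmallIndex {
    private final Hashtable<De, List<Doc>> postings = new Hashtable<>();

    void index(Doc doc) {
        for (De d : doc.des) {
            postings.computeIfAbsent(d, k -> new ArrayList<>()).add(doc);
        }
    }

    List<Doc> retrieve(De query) {
        return postings.getOrDefault(query, new ArrayList<>());
    }
}
```

Without the two overrides, Hashtable falls back to Object identity, so retrieve(new De("de1")) would find nothing even though an equal De is in the index.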
All the files are available in the directory; they are:
- De.java: A very basic implementation of a DataElement. It mainly
  consists of a String.
- Doc.java: A very simple collection of Des.
- IF.java: The IndexingFeature class.
- Index.java: Contains the complete hierarchy that IRF implements in
  Index, IdxIntern and PDKC.
- IrfHashtable.java: A translation of the IrfHashtable for sO.
- IrfVector.java: A dedicated Vector.
- MAIN.SETUP: The schema file.
- SmallIrf.java: A small application that creates Docs with Des, stores
  them in an index and then retrieves them, possibly after a shutdown
  (hopefully).
What would the architecture be for using sO in IRF?
Storing an object in sO and restoring it is quite straightforward, but
it requires talking to the local DBClient. The DBClient should live
either in a kind of broker or in the IrfManager. Alternatively, the
PersistentObjectManager could take care of this aspect, restoring
objects either flat or completely, depending on which buildX() method
is called. This would also keep the design very close to the current
architecture. The main difference would lie in the proxy mechanism,
for which there are two possibilities:
- Completely remove the proxies.
  The real objects must then manage their own state themselves
  (knowing whether each field has been loaded or not), which requires
  extra code. I think this solution works, but it is not very clean and
  does not follow the philosophy we have used so far for IRF.
- Keep the proxies.
  Each proxy just carries a kind of flag telling whether its object is
  loaded at all and, if so, whether completely (recursively) or only
  partially. Proxies talk to a broker that makes the DBClient fetch the
  objects needed. This solution fits the current architecture best. The
  proxy code may have to evolve a little, but the real objects can stay
  as they are, and the overall communication path
  (proxy->broker->DB->real object) remains the same.
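The "keep the proxies" option, together with the OID comparison mentioned in the To Do list, can be sketched as follows. Broker, Proxy, StoredObject and State are hypothetical names, and the broker reads from an in-memory map standing in for the DBClient. Each proxy records whether its object is absent, partially loaded (the object itself only) or completely loaded (recursively), asks the broker only when data is actually needed, and compares itself with other proxies by OID, so comparisons such as those in score computation never force materialization.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the proxy option; not the sO or IRF API.
enum State { ABSENT, PARTIAL, COMPLETE }

// What the database returns for one OID: the object's data and its references.
final class StoredObject {
    final String name;
    final List<Integer> childOids;

    StoredObject(String name, List<Integer> childOids) {
        this.name = name;
        this.childOids = childOids;
    }
}

// Stand-in for the broker in front of the DBClient; counts fetches.
final class Broker {
    private final Map<Integer, StoredObject> db = new HashMap<>();
    int fetches = 0;

    void store(int oid, String name, List<Integer> childOids) {
        db.put(oid, new StoredObject(name, childOids));
    }

    StoredObject fetch(int oid) {
        fetches++;
        return db.get(oid);
    }
}

final class Proxy {
    final int oid;
    private final Broker broker;
    private State state = State.ABSENT;
    private StoredObject real;                              // null while ABSENT
    private final List<Proxy> children = new ArrayList<>(); // filled on complete load

    Proxy(int oid, Broker broker) {
        this.oid = oid;
        this.broker = broker;
    }

    State state() {
        return state;
    }

    // Flat access: fetch this object only, leaving its references unloaded.
    String name() {
        if (state == State.ABSENT) {
            real = broker.fetch(oid);
            state = State.PARTIAL;
        }
        return real.name;
    }

    // Complete access: fetch the whole hierarchy below this proxy.
    void loadCompletely() {
        if (state == State.COMPLETE) {
            return;
        }
        name(); // ensures this object itself is loaded
        for (int childOid : real.childOids) {
            Proxy child = new Proxy(childOid, broker);
            child.loadCompletely();
            children.add(child);
        }
        state = State.COMPLETE;
    }

    List<Proxy> children() {
        return children;
    }

    // Identity is the OID, so comparing proxies never touches the database.
    @Override
    public boolean equals(Object o) {
        return o instanceof Proxy && ((Proxy) o).oid == oid;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(oid);
    }
}
```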
A few useful command lines.
rm *.class ; rm *_DB.java ; rm MAIN.DB MAIN.DBM CLIENT.DBM
javac *.java ; java org.storedobjects.db.DBMagic;
rm MAIN.DB ; java SmallIrf create 2 2
java SmallIrf ret de1
java org.storedobjects.db.DBDebug > ! debug.txt
Last updated: Tuesday, 01-Aug-2000 12:34:41 UTC
Date created: Monday, 31-Jul-00
For further information contact Paul Over (over@nist.gov) with a
copy to Darrin Dimmick (ddimmick@nist.gov)