The main principle behind this modelling strategy is to move as much of the processing as possible to the batch process (run once) and away from the presentation process (run often and on demand).
This improves the performance of both batch processing and final consumer presentation.
Again, object relationships considered as Type 1 are handled more than adequately by this strategy.
Object relationships considered as Type 2, however, fall somewhat short of the ideal because they require a temporal lookup in either batch or presentation.
The lookup-once strategy is not perfect, but it is the more resource-frugal option; maintenance of late-arriving reference data remains an issue, however.
The implementation of speed objects may go some way towards alleviating the downsides of Type 2 objects.
A speed object is an object that is rebuilt at batch time, either in its entirety or just for the events in that batch, whichever is more relevant.
Let us look again at the example that caused us an issue before:
| AccDHKey | AccDHGen | Tax point | Invoice | Value |
| --- | --- | --- | --- | --- |
| c4ca4238a0b923820dcc509a6f75849b | 1 | 01/01/2010 | 1 | 100 |
| c4ca4238a0b923820dcc509a6f75849b | 1 | 01/02/2010 | 2 | 20 |
| c4ca4238a0b923820dcc509a6f75849b | 2 | 01/03/2011 | 3 | 30 |
| c4ca4238a0b923820dcc509a6f75849b | 2 | 01/04/2011 | 4 | 50 |
| c4ca4238a0b923820dcc509a6f75849b | 2 | 01/05/2011 | 5 | 70 |
| c4ca4238a0b923820dcc509a6f75849b | 2 | 01/06/2011 | 6 | 30 |
Suppose that, rather than looking up the AccDHGen, we instead used the epoch (at date level for ease) of the tax point:
| AccDHKey | TPepoch | Tax point | Invoice | Value |
| --- | --- | --- | --- | --- |
| c4ca4238a0b923820dcc509a6f75849b | 14610 | 01/01/2010 | 1 | 100 |
| c4ca4238a0b923820dcc509a6f75849b | 14641 | 01/02/2010 | 2 | 20 |
| c4ca4238a0b923820dcc509a6f75849b | 15034 | 01/03/2011 | 3 | 30 |
| c4ca4238a0b923820dcc509a6f75849b | 15064 | 01/04/2011 | 4 | 50 |
| c4ca4238a0b923820dcc509a6f75849b | 15094 | 01/05/2011 | 5 | 70 |
| c4ca4238a0b923820dcc509a6f75849b | 15125 | 01/06/2011 | 6 | 30 |
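As a minimal sketch of how a date-level epoch such as TPepoch might be derived, the snippet below counts whole days since 1 January 1970. It assumes that "epoch" here means days since that date and that tax points are written as DD/MM/YYYY; the function names `parse_tax_point` and `epoch_day` are hypothetical and not part of the model. It works on calendar dates only, so an epoch derived from local timestamps could differ by a day.

```python
from datetime import date

# Assumption: the date-level epoch is the number of whole days since 1970-01-01.
EPOCH_START = date(1970, 1, 1)


def parse_tax_point(text: str) -> date:
    """Parse a DD/MM/YYYY string into a calendar date."""
    day, month, year = (int(part) for part in text.split("/"))
    return date(year, month, day)


def epoch_day(d: date) -> int:
    """Return the number of whole days between 1970-01-01 and d."""
    return (d - EPOCH_START).days


if __name__ == "__main__":
    for tax_point in ("01/01/2010", "01/02/2010", "01/03/2011"):
        print(tax_point, epoch_day(parse_tax_point(tax_point)))
```

Because the value is computed from the fact instance alone, no lookup against the dimensional object is needed at this stage.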
As with the Type 1 case, this completely disconnects the fact and dimensional objects in batch processing.
We have previously seen the Account object using the epoch of the instance rather than a simple generation number:
| Account no | Address | Startpoint | DHkey | Generation |
| --- | --- | --- | --- | --- |
| 1 | 1 Acacia Avenue | 01/01/2000 | c4ca4238a0b923820dcc509a6f75849b | 10957 |
| 1 | 101 High Street | 02/02/2010 | c4ca4238a0b923820dcc509a6f75849b | 14642 |
| 1 | 62 West Wallaby Street | 01/11/2010 | c4ca4238a0b923820dcc509a6f75849b | 14914 |
| 1 | 52 Festive Road | 30/04/2011 | c4ca4238a0b923820dcc509a6f75849b | 15093 |
A possible speed object for these two fact and dimensional objects would have the elements DHkey, Generation and TPepoch:
| DHkey | Generation | TPepoch |
| --- | --- | --- |
| c4ca4238a0b923820dcc509a6f75849b | 10957 | 14610 |
| c4ca4238a0b923820dcc509a6f75849b | 10957 | 14641 |
| c4ca4238a0b923820dcc509a6f75849b | 14914 | 15034 |
| c4ca4238a0b923820dcc509a6f75849b | 14914 | 15064 |
| c4ca4238a0b923820dcc509a6f75849b | 15093 | 15094 |
| c4ca4238a0b923820dcc509a6f75849b | 15093 | 15125 |
This speed table provides an equality relationship between the fact object and the speed object, and another equality relationship between the speed object and the dimensional object.
The relationship between the speed object and the fact object may be one-to-many or one-to-one; each speed object instance resolves to exactly one dimensional instance.
This strategy eliminates any temporal range-based relationship in the presentation layer, improving the efficiency of the presentation layer and the resolution of final consumer queries.
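The two equality relationships can be sketched as a pair of exact-match lookups. The rows are copied from the invoice, speed and Account tables in this section; the in-memory tuples, the dictionaries and the function name `resolve_addresses` are illustrative assumptions, not a prescribed implementation.

```python
KEY = "c4ca4238a0b923820dcc509a6f75849b"

# Fact rows: (DHkey, TPepoch, invoice, value)
facts = [
    (KEY, 14610, 1, 100), (KEY, 14641, 2, 20), (KEY, 15034, 3, 30),
    (KEY, 15064, 4, 50), (KEY, 15094, 5, 70), (KEY, 15125, 6, 30),
]

# Speed rows: (DHkey, Generation, TPepoch)
speed = [
    (KEY, 10957, 14610), (KEY, 10957, 14641), (KEY, 14914, 15034),
    (KEY, 14914, 15064), (KEY, 15093, 15094), (KEY, 15093, 15125),
]

# Dimension rows: (DHkey, Generation, address)
accounts = [
    (KEY, 10957, "1 Acacia Avenue"), (KEY, 14642, "101 High Street"),
    (KEY, 14914, "62 West Wallaby Street"), (KEY, 15093, "52 Festive Road"),
]


def resolve_addresses(facts, speed, accounts):
    """Attach an address to each invoice using equality lookups only."""
    # fact -> speed: exact match on (DHkey, TPepoch)
    generation_of = {(key, tp): gen for key, gen, tp in speed}
    # speed -> dimension: exact match on (DHkey, Generation)
    address_of = {(key, gen): addr for key, gen, addr in accounts}
    resolved = []
    for key, tp, invoice, value in facts:
        generation = generation_of[(key, tp)]
        resolved.append((invoice, value, address_of[(key, generation)]))
    return resolved


if __name__ == "__main__":
    for row in resolve_addresses(facts, speed, accounts):
        print(row)
```

No comparison against a start or end point appears anywhere in the lookup, which is the property the speed object is meant to give the presentation layer.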
As this is driven by fact data (as with the buckets and dummies), a practical implementation would be to reuse the bucket, expanding it to contain the epoch element of the fact instances:
| Account no Hkey | Account No | Start date | FctEpoch |
| --- | --- | --- | --- |
| c4ca4238a0b923820dcc509a6f75849b | 1 | 01/02/2000 | 10988 |
| c81e728d9d4c2f636f067f89cc14862c | 2 | 01/02/2000 | 10988 |
| eccbc87e4b5ce2fe28308fd9f2a7baf3 | 3 | 02/03/2010 | 14670 |
| c4ca4238a0b923820dcc509a6f75849b | 1 | 01/02/2000 | 10988 |
| c81e728d9d4c2f636f067f89cc14862c | 2 | 01/02/2000 | 10988 |
| eccbc87e4b5ce2fe28308fd9f2a7baf3 | 3 | 02/04/2011 | 15065 |
The bucket now has two uses:
1. To identify failed relationships and generate a dummy instance for referential integrity.
2. To build a speed table in the case of a relationship considered to be of Type 2.
The bucket contains all the fact object keys produced by the current batch, so in its simplest form the strategy would be to add new instances to the speed object for the contents of the bucket.
This does require a range relationship in the processing, but this is a one-time batch process rather than a many-times presentation process.
Assume the dimensional object has completed processing and the presentation objects have been generated:
| Account no | Address | Startpoint | DHkey | Generation | Endpoint |
| --- | --- | --- | --- | --- | --- |
| 1 | 1 Acacia Avenue | 01/01/2000 | c4ca4238a0b923820dcc509a6f75849b | 10957 | 14641 |
| 1 | 101 High Street | 02/02/2010 | c4ca4238a0b923820dcc509a6f75849b | 14642 | 14913 |
| 1 | 62 West Wallaby Street | 01/11/2010 | c4ca4238a0b923820dcc509a6f75849b | 14914 | 15092 |
| 1 | 52 Festive Road | 30/04/2011 | c4ca4238a0b923820dcc509a6f75849b | 15093 | |
| 2 | 7 Block Lane | 01/01/2000 | c81e728d9d4c2f636f067f89cc14862c | 10988 | |
| 3 | DUMMY | 02/03/2010 | eccbc87e4b5ce2fe28308fd9f2a7baf3 | 0 | 14668 |
| 3 | 10 Upping Street | 01/03/2010 | eccbc87e4b5ce2fe28308fd9f2a7baf3 | 14669 | |
Resolving each bucket entry against these start and end points yields the instances to be inserted into the speed object for the fact instances processed in the current batch.
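The batch-time range lookup might look something like the sketch below. It takes the (hash key, FctEpoch) pairs from the bucket and the Generation and endpoint values from the dimensional object shown in this section. Treating an empty endpoint as open-ended, and simply skipping entries that find no generation, are assumptions made here for illustration; the function names are hypothetical.

```python
from typing import Optional

K1 = "c4ca4238a0b923820dcc509a6f75849b"
K2 = "c81e728d9d4c2f636f067f89cc14862c"
K3 = "eccbc87e4b5ce2fe28308fd9f2a7baf3"

# Dimension rows: (DHkey, Generation, endpoint); None means open-ended (assumption).
dimension = [
    (K1, 10957, 14641), (K1, 14642, 14913), (K1, 14914, 15092), (K1, 15093, None),
    (K2, 10988, None),
    (K3, 0, 14668), (K3, 14669, None),
]

# Bucket rows for the current batch: (hash key, FctEpoch)
bucket = [
    (K2, 10988), (K1, 10988), (K3, 14670),
    (K2, 10988), (K1, 10988), (K3, 15065),
]


def resolve_generation(dimension, dhkey: str, fct_epoch: int) -> Optional[int]:
    """Range lookup: find the generation whose interval contains fct_epoch."""
    for key, generation, endpoint in dimension:
        in_range = generation <= fct_epoch and (endpoint is None or fct_epoch <= endpoint)
        if key == dhkey and in_range:
            return generation
    return None


def speed_rows_for_bucket(dimension, bucket):
    """Return distinct (DHkey, Generation, TPepoch) rows for the bucket contents."""
    rows = set()  # a set drops the repeated bucket entries
    for dhkey, fct_epoch in bucket:
        generation = resolve_generation(dimension, dhkey, fct_epoch)
        if generation is not None:  # unresolved entries are skipped in this sketch
            rows.add((dhkey, generation, fct_epoch))
    return sorted(rows)


if __name__ == "__main__":
    for row in speed_rows_for_bucket(dimension, bucket):
        print(row)
```

The range comparison is paid once per bucket entry in the batch, after which the presentation layer only ever needs the equality match.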
Care must be taken to avoid duplicating instances in the speed object and to eliminate bad or outdated references.
It may be of some value to recreate the speed table from scratch, should processing and batch contingency allow.
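One way to guard against duplicates and outdated references is to key the speed object on (DHkey, TPepoch), so that each fact epoch can hold only one generation and a fresh resolution overwrites a stale one. The sketch below assumes an in-memory dictionary stands in for the speed object; `merge_speed_rows` and `rebuild_speed` are hypothetical names, and the stale row pointing at generation 0 is an invented example used only to show the overwrite.

```python
K3 = "eccbc87e4b5ce2fe28308fd9f2a7baf3"


def merge_speed_rows(speed, new_rows):
    """Merge (DHkey, Generation, TPepoch) rows into a copy of the speed object.

    The speed object is a dict keyed by (DHkey, TPepoch) -> Generation, so a
    repeated row cannot create a duplicate and a newer resolution replaces an
    outdated one.
    """
    merged = dict(speed)
    for dhkey, generation, tp_epoch in new_rows:
        merged[(dhkey, tp_epoch)] = generation
    return merged


def rebuild_speed(all_rows):
    """Recreate the speed object from scratch from a full set of resolved rows."""
    return merge_speed_rows({}, all_rows)


if __name__ == "__main__":
    # Hypothetical existing state: a fact epoch still pointing at generation 0.
    existing = {(K3, 14670): 0}
    batch_rows = [(K3, 14669, 14670), (K3, 14669, 14670), (K3, 14669, 15065)]
    print(merge_speed_rows(existing, batch_rows))
```

A full rebuild uses the same logic starting from an empty object, which removes any stale entries that an incremental merge would leave untouched.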