Type 2 dimensions are where the elements are intended to have a lifespan rather than be a single element for each natural key
This requires more than just the natural key to be applied to exist in the relationship between dimension objects and fact objects.
This does add an overhead to the fact processing as it cannot readily be done without reference to the dimension object.
e.g.
An object containing addresses of accounts
Account no |
Address |
Startpoint |
DHkey |
Generation |
1 |
1 Acacia Avenue |
01/01/2000 |
c4ca4238a0b923820dcc509a6f75849b |
1 |
2 |
7 Block Lane |
01/01/2000 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
1 |
101 High Street |
02/02/2010 |
c4ca4238a0b923820dcc509a6f75849b |
2 |
3 |
DUMMY |
02/03/2010 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
0 |
3 |
12 Acme Road |
03/04/2011 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
We can infer from
this object that the valid addresses are:
Account 1: 1 Acacia Avenue from 01/01/2000 until 01/02/2010 and 101 High Street from 02/02/2010 onwards
Account 2: 7 Block Lane from 01/01/2000 onwards
Account 3: DUMMY from 02/03/2010 until 02/04/2011 and 12 Acme Road from 03/04/2011 onwards
That is the generation is valid until the startpoint of the next generation.
(in this case the start point is the start of the date (00:00:00) and the end point is the end of the previous date (23:59:59) to the startpoint of the following generation)
This example is limited to date however a lower granularity of time would be the norm.
Current Time Event
The current time event is where an instance of a fact object is considered to have occurred at current time
That the dimensional reference is the latest at process time
The instance has no element that denotes the actual event time that is to be considered.
Consider the fact object
Account no |
invoice |
Value |
Value to date |
1 |
1 |
100 |
100 |
2 |
1 |
20 |
20 |
3 |
1 |
30 |
30 |
And the presented Dimension Object (current)
Account no |
Address |
Startpoint |
DHkey |
Generation |
2 |
7 Block Lane |
01/01/2000 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
1 |
101 High Street |
02/02/2010 |
c4ca4238a0b923820dcc509a6f75849b |
2 |
3 |
No Fixed Abode |
02/03/2010 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
In the same way as type 1 previously we hash the key elements of the fact instances
AccDHKey |
invoice |
Value |
Value to date |
c4ca4238a0b923820dcc509a6f75849b |
1 |
100 |
100 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
20 |
20 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
30 |
30 |
However we require more than just the hash key to identify the dimension instance when considering type 2 reference
In this case we MUST reference the dimension object
And look up the generation and apply that to the facts also – if we fail to find a generation as in the case of poor referential data then we apply a generation of zero – in the knowledge that this instance will be generated by the bucket process.
So we get as a result the fact instances of:
AccDHKey |
AccDHGen |
invoice |
Value |
Value to date |
c4ca4238a0b923820dcc509a6f75849b |
2 |
1 |
100 |
100 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
1 |
20 |
20 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
1 |
30 |
30 |
Subsequently the resultant fact object would relate to the base dimension (i.e. not the presented latest) by Dimensional key and generation with respect to type 2.
This same fact object is related to the presented latest dimension object by Dimensional key only with respect to type 1.
This way we can view the data as both type 1 and type 2 without any reprocessing.
This also allows for What if? Scenarios to be applied to the data without reprocessing.
Point in Time Event
A point in time event is where a fact instance has a specified point in time when we are to consider it occurred.
This is somewhat more difficult and requires more processing
Considering the same dimensional object we need to build an object where we can relate the event time to a point in time on the dimensional history.
A temporal dimension object such as the following:
Account no |
Address |
Startpoint |
DHkey |
Generation |
endpoint |
1 |
1 Acacia Avenue |
01/01/2000 |
c4ca4238a0b923820dcc509a6f75849b |
1 |
01/02/2010 |
2 |
7 Block Lane |
01/01/2000 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
|
1 |
101 High Street |
02/02/2010 |
c4ca4238a0b923820dcc509a6f75849b |
2 |
|
3 |
No Fixed Abode |
02/03/2010 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
02/04/2011 |
3 |
12 Acme Road |
03/04/2011 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
2 |
|
And the fact object:
Account no |
invoice |
Tax point |
Value |
Value to date |
1 |
1 |
01/02/2000 |
100 |
100 |
2 |
1 |
01/02/2000 |
20 |
20 |
3 |
1 |
04/04/2011 |
30 |
30 |
Where there is no end point that can be considered to be forever (or a significant time in the future)
Again we hash the key and relate that hashed key to the temporal dimension by dimensional key and event point in time (tax point) being between startpoint and endpoint
Giving us
AccDHKey |
AccDHGen |
Tax point |
|
invoice |
Value |
Value to date |
c4ca4238a0b923820dcc509a6f75849b |
1 |
01/02/2000 |
|
1 |
100 |
100 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
01/02/2000 |
|
1 |
20 |
20 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
2 |
04/04/2011 |
|
1 |
30 |
30 |
Again we can relate this fact object to the base dimension as type 2 or the presented latest dimension as type 1.
The temporal dimension can be discarded post processing.
Hash Key and Generation Together
Additionally it may be of some use to hash the key again together with the generation
This has the advantage of a single relationship between fact object and dimension object.
For instance:
Dimensional object
Account no |
Address |
Startpoint |
DHkey |
Gen |
AccDHGenKey |
1 |
1 Acacia Avenue |
01/01/2000 |
c4ca4238a0b923820dcc509a6f75849b |
1 |
15ab8357abeb6eacf1b591a1b5b1aedd |
2 |
7 Block Lane |
01/01/2000 |
c81e728d9d4c2f636f067f89cc14862c |
1 |
eaf418aefb7a9c97a6abeff9e7d827a0 |
1 |
101 High Street |
02/02/2010 |
c4ca4238a0b923820dcc509a6f75849b |
2 |
80cf219b4993391f6a6a0c08096c5ecb |
3 |
DUMMY |
02/03/2010 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
0 |
c2f7da772962b648f4f7eea8c791a118 |
3 |
12 Acme Road |
03/04/2011 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
1 |
24eb93af4cdb15fb538b115674a21f7f |
Fact Object
AccDHKey |
AccDHGenKey |
inv |
Val |
Val |
c4ca4238a0b923820dcc509a6f75849b |
80cf219b4993391f6a6a0c08096c5ecb |
1 |
100 |
100 |
c81e728d9d4c2f636f067f89cc14862c |
eaf418aefb7a9c97a6abeff9e7d827a0 |
1 |
20 |
20 |
eccbc87e4b5ce2fe28308fd9f2a7baf3 |
24eb93af4cdb15fb538b115674a21f7f |
1 |
30 |
30 |
This again adds some complexity – but the advantage is a single element relationship between the fact object and dimensional object for both type 1 and type 2 considerations.