Type 2 Dimensions

Type 2 dimensions are where the elements are intended to have a lifespan rather than be a single element for each natural key

This requires more than just the natural key to be applied to exist in the relationship between dimension objects and fact objects.

This does add an overhead to the fact processing as it cannot readily be done without reference to the dimension object.



e.g.

An object containing addresses of accounts

Account no

Address

Startpoint

DHkey

Generation

1

1 Acacia Avenue

01/01/2000

c4ca4238a0b923820dcc509a6f75849b

1

2

7 Block Lane

01/01/2000

c81e728d9d4c2f636f067f89cc14862c

1

1

101 High Street

02/02/2010

c4ca4238a0b923820dcc509a6f75849b

2

3

DUMMY

02/03/2010

eccbc87e4b5ce2fe28308fd9f2a7baf3

0

3

12 Acme Road

03/04/2011

eccbc87e4b5ce2fe28308fd9f2a7baf3

1



We can infer from this object that the valid addresses are:

Account 1: 1 Acacia Avenue from 01/01/2000 until 01/02/2010 and 101 High Street from 02/02/2010 onwards

Account 2: 7 Block Lane from 01/01/2000 onwards

Account 3: DUMMY from 02/03/2010 until 02/04/2011 and 12 Acme Road from 03/04/2011 onwards



That is the generation is valid until the startpoint of the next generation.



(in this case the start point is the start of the date (00:00:00) and the end point is the end of the previous date (23:59:59) to the startpoint of the following generation)



This example is limited to date however a lower granularity of time would be the norm.



Current Time Event



The current time event is where an instance of a fact object is considered to have occurred at current time

That the dimensional reference is the latest at process time

The instance has no element that denotes the actual event time that is to be considered.

Consider the fact object

Account no

invoice

Value

Value to date

1

1

100

100

2

1

20

20

3

1

30

30



And the presented Dimension Object (current)



Account no

Address

Startpoint

DHkey

Generation

2

7 Block Lane

01/01/2000

c81e728d9d4c2f636f067f89cc14862c

1

1

101 High Street

02/02/2010

c4ca4238a0b923820dcc509a6f75849b

2

3

No Fixed Abode

02/03/2010

eccbc87e4b5ce2fe28308fd9f2a7baf3

1



In the same way as type 1 previously we hash the key elements of the fact instances



AccDHKey

invoice

Value

Value to date

c4ca4238a0b923820dcc509a6f75849b

1

100

100

c81e728d9d4c2f636f067f89cc14862c

1

20

20

eccbc87e4b5ce2fe28308fd9f2a7baf3

1

30

30



However we require more than just the hash key to identify the dimension instance when considering type 2 reference

In this case we MUST reference the dimension object

And look up the generation and apply that to the facts also – if we fail to find a generation as in the case of poor referential data then we apply a generation of zero – in the knowledge that this instance will be generated by the bucket process.

So we get as a result the fact instances of:



AccDHKey

AccDHGen

invoice

Value

Value to date

c4ca4238a0b923820dcc509a6f75849b

2

1

100

100

c81e728d9d4c2f636f067f89cc14862c

1

1

20

20

eccbc87e4b5ce2fe28308fd9f2a7baf3

1

1

30

30



Subsequently the resultant fact object would relate to the base dimension (i.e. not the presented latest) by Dimensional key and generation with respect to type 2.

This same fact object is related to the presented latest dimension object by Dimensional key only with respect to type 1.



This way we can view the data as both type 1 and type 2 without any reprocessing.

This also allows for What if? Scenarios to be applied to the data without reprocessing.

Point in Time Event

A point in time event is where a fact instance has a specified point in time when we are to consider it occurred.

This is somewhat more difficult and requires more processing

Considering the same dimensional object we need to build an object where we can relate the event time to a point in time on the dimensional history.



A temporal dimension object such as the following:

Account no

Address

Startpoint

DHkey

Generation

endpoint

1

1 Acacia Avenue

01/01/2000

c4ca4238a0b923820dcc509a6f75849b

1

01/02/2010

2

7 Block Lane

01/01/2000

c81e728d9d4c2f636f067f89cc14862c

1


1

101 High Street

02/02/2010

c4ca4238a0b923820dcc509a6f75849b

2


3

No Fixed Abode

02/03/2010

eccbc87e4b5ce2fe28308fd9f2a7baf3

1

02/04/2011

3

12 Acme Road

03/04/2011

eccbc87e4b5ce2fe28308fd9f2a7baf3

2




And the fact object:



Account no

invoice

Tax point

Value

Value to date

1

1

01/02/2000

100

100

2

1

01/02/2000

20

20

3

1

04/04/2011

30

30



Where there is no end point that can be considered to be forever (or a significant time in the future)



Again we hash the key and relate that hashed key to the temporal dimension by dimensional key and event point in time (tax point) being between startpoint and endpoint

Giving us

AccDHKey

AccDHGen

Tax point


invoice

Value

Value to date

c4ca4238a0b923820dcc509a6f75849b

1

01/02/2000


1

100

100

c81e728d9d4c2f636f067f89cc14862c

1

01/02/2000


1

20

20

eccbc87e4b5ce2fe28308fd9f2a7baf3

2

04/04/2011


1

30

30



Again we can relate this fact object to the base dimension as type 2 or the presented latest dimension as type 1.

The temporal dimension can be discarded post processing.


Hash Key and Generation Together

Additionally it may be of some use to hash the key again together with the generation

This has the advantage of a single relationship between fact object and dimension object.



For instance:

Dimensional object

Account no

Address

Startpoint

DHkey

Gen

AccDHGenKey

1

1 Acacia Avenue

01/01/2000

c4ca4238a0b923820dcc509a6f75849b

1

15ab8357abeb6eacf1b591a1b5b1aedd

2

7 Block Lane

01/01/2000

c81e728d9d4c2f636f067f89cc14862c

1

eaf418aefb7a9c97a6abeff9e7d827a0

1

101 High Street

02/02/2010

c4ca4238a0b923820dcc509a6f75849b

2

80cf219b4993391f6a6a0c08096c5ecb

3

DUMMY

02/03/2010

eccbc87e4b5ce2fe28308fd9f2a7baf3

0

c2f7da772962b648f4f7eea8c791a118

3

12 Acme Road

03/04/2011

eccbc87e4b5ce2fe28308fd9f2a7baf3

1

24eb93af4cdb15fb538b115674a21f7f



Fact Object

AccDHKey

AccDHGenKey

inv

Val

Val

c4ca4238a0b923820dcc509a6f75849b

80cf219b4993391f6a6a0c08096c5ecb

1

100

100

c81e728d9d4c2f636f067f89cc14862c

eaf418aefb7a9c97a6abeff9e7d827a0

1

20

20

eccbc87e4b5ce2fe28308fd9f2a7baf3

24eb93af4cdb15fb538b115674a21f7f

1

30

30


This again adds some complexity – but the advantage is a single element relationship between the fact object and dimensional object for both type 1 and type 2 considerations.

 

Contents   Next