Thursday, May 1, 2008

Common indexing techniques for Datawarehouses

B-Tree Index: Two representations (rowid and bitmap) are implemented at the leaves of the index depending on the cardinality of the data.

Advantages
- It speeds up knownqueries.
- It is well suited for high cardinality.
- The space requirement is independent of the cardinality of the indexed column.
- It is relatively inexpensive when we update the indexed column since individual rows are locked.

Disadvantages
- It performs inefficiently with low cardinality data
- It does not support ad hoc queries. More I/O operations are needed for a wide range of queries.
- The indexes can not be combined before fetching the data. ·

Databases that implement this type of indexing
Most of commercial products (like Oracle, Informix, Red Brick) implement this type of indexing

Pure Bitmap Index: An array of bits is utilized to represent each unique column value of each row in a table, setting the bits corresponding to the row either ON(valued 1) or OFF(valued 0). The equality encoding scheme is used.

Advantages
- It is well suited for lowcardinality columns.
- It utilizes bitwise operations.
- The indexes can be combined before fetching raw data.
- It uses low space
- It works well with parallel machine.
- It is easy to build.
- It performs efficiently with columns involving scalar functions (e.g., COUNT).
- It is easy to add new indexed value.
- It is suitable for OLAP.

Disadvantages
- It performs inefficiently with high cardinality data.
- It is very expensive when we update index column. The whole bitmap segment of the updated row is locked so the other row can not be updated until the lock is released.
- It does not handle spare data well.

Databases that implement this type of indexing
- Oracle
- Informix
- Sybase
- Informix
- Red Brick
- DB2

Encoded Bitmap Index: The index is the binary Bit-Sliced Index built on the attribute domain

Advantages
- It uses space efficiently.
- It performs efficiently with wide range query.

Disadvantages
- It performs inefficiently with equality queries.
- It is very difficult to find a good encoding scheme.
- It is rebuilt every time when a new indexed value for which we run out of bit to represent is added.

Databases that implement this type of indexing
- DB2

Bitmap Join Index: The index is built by restriction of a column on the dimension table in the fact table.

Advantages
- It is flexible.
- It performs efficiently.
- It supports star queries.

Disadvantages
- The order of indexed column is important.

Databases that implement this type of indexing
- Oracle
- Informix
- Red Brick

Projection Index: The index is built by storing actual values of column(s) of indexed table.

Advantages
- It speeds up the performance when a few columns in the table are retrieved.

Disadvantages
- It can be used only to retrieve raw data (i.e., column list in selection).

Databases that implement this type of indexing
- Sybase