OLTP, OLAP Database Modelling on a Kaggle Dataset
In this article, I took a Kaggle dataset and tried to design data models for OLTP (Online transaction processing) and OLAP (Online analytical processing) databases.
The dataset I chose for this purpose is the credit card transactions which are synthetically generated. It contains the transaction, customer and merchant details.
https://www.kaggle.com/kartik2112/fraud-detection
What is an OLTP Database?
- The data here are basically from active/operational systems or from applications in real time
- It is designed for fast write performance (large number of short, online, real time transactions like insert, update and delete)
- Completely normalized (usually in 3NF)
- Example: Used in bank transactions, ticket booking systems etc.
From our example dataset, we can safely derive the below design for OLTP database
One customer can have one and only one address, while one address can point to one or many customers who can possibly live in the same address.
One address can have one and only one city, while a city can have many addresses.
One customer can initiate one or many transactions, while one transaction must have one and only one customer.
One Transaction must have one and only one merchant, while a merchant can be involved in one or many transactions.
What is an OLAP Database?
- The data here are basically from the OLTP databases
- Effectiveness measured in the terms of response time (deals with historical data)
- Data stored in multi-dimensional schemas (star, snowflake, galaxy/fact constellation etc.)
- Main use cases can be business intelligence reporting, business planning etc. For example, the OLAP database in our case below, can answer the question, what was the total transaction amount of a customer “xyz” from a city “abc”, four years ago?