Source to Target Mappings
Date Dim¶
Table Name: date_dim
Table Type: Dimension
Description: Each row in this table represents one calendar day.
Load Frequency: Static (load once)
Source: Loaded once when the DW is deployed from a DDL file.
Target Schema: | Target Table: | Target Column: | Target Datatype: | Key: | SCD Type: | Transformation: | Note: |
---|---|---|---|---|---|---|---|
dw_mystore | date_dim | date_key | date | PK | N/A | Using the date datatype for the key as it will allow range restricted scans on fact tables. | |
dw_mystore | date_dim | day_date | date | NK | N/A | Source column is repeated here for user convenience as primary keys are often hidden in BI presentation layers. | |
dw_mystore | date_dim | month_seq | int | N/A | |||
dw_mystore | date_dim | week_seq | int | N/A | |||
dw_mystore | date_dim | quarter_seq | int | N/A | |||
dw_mystore | date_dim | year | int | N/A | |||
dw_mystore | date_dim | dow | int | N/A | |||
dw_mystore | date_dim | moy | int | N/A | |||
dw_mystore | date_dim | dom | int | N/A | |||
dw_mystore | date_dim | qoy | int | N/A | |||
dw_mystore | date_dim | fy_year | int | N/A | |||
dw_mystore | date_dim | fy_quarter_seq | int | N/A | |||
dw_mystore | date_dim | fy_week_seq | int | N/A | |||
dw_mystore | date_dim | day_name | varchar(9) | N/A | |||
dw_mystore | date_dim | quarter_name | varchar(6) | N/A | |||
dw_mystore | date_dim | holiday | varchar(1) | N/A | |||
dw_mystore | date_dim | weekend | varchar(1) | N/A | |||
dw_mystore | date_dim | following_holiday | varchar(1) | N/A | |||
dw_mystore | date_dim | first_dom | int | N/A | |||
dw_mystore | date_dim | last_dom | int | N/A | |||
dw_mystore | date_dim | same_day_ly | int | N/A | |||
dw_mystore | date_dim | same_day_lq | int | N/A | |||
dw_mystore | date_dim | current_day | varchar(1) | N/A | |||
dw_mystore | date_dim | current_week | varchar(1) | N/A | |||
dw_mystore | date_dim | current_month | varchar(1) | N/A | |||
dw_mystore | date_dim | current_quarter | varchar(1) | N/A | |||
dw_mystore | date_dim | current_year | varchar(1) | N/A | |||
dw_mystore | date_dim | dw_insert_date | timestamp | N/A | SYSDATE on Insert | SYSDATE on Insert | |
dw_mystore | date_dim | dw_update_date | timestamp | N/A | SYSDATE on Update | SYSDATE on Update |
Time Dim¶
Table Name: time_dim
Description: Each row in this table represents one second.
Load Frequency: Static (load once)
Source: Loaded once when the DW is deployed from a DDL file.
Target Schema: | Target Table: | Target Column: | Target Datatype: | Key: | SCD Type: | Transformation: |
---|---|---|---|---|---|---|
dw_mystore | time_dim | time_key | int | PK | N/A | |
dw_mystore | time_dim | hour | int | NK | N/A | |
dw_mystore | time_dim | minute | int | NK | N/A | |
dw_mystore | time_dim | second | int | NK | N/A | |
dw_mystore | time_dim | am_pm | varchar(2) | N/A | ||
dw_mystore | time_dim | shift | varchar(20) | N/A | ||
dw_mystore | time_dim | sub_shift | varchar(20) | N/A | ||
dw_mystore | time_dim | meal_time | varchar(20) | N/A | ||
dw_mystore | time_dim | dw_insert_date | timestamp | N/A | SYSDATE on Insert | |
dw_mystore | time_dim | dw_update_date | timestamp | N/A | SYSDATE on Update |
Customer Dim¶
Table Name: customer_dim
Table Type: Dimension
Description: Each row in this table represents a customer.
Load Frequency: Dynamic
Source Tables: store_customer, store_customer_address
Source Table Joins: FROM stg_mystore.v_store_customer c
LEFT OUTER JOIN stg_mystore.v_store_customer_address a
ON c.address_id = a.address_id
Transformation Notes:
All the columns from the store_customer table are treated as SCD1 in the target.
All the columns from the store_customer_address table are treated as SCD2 in the target.
Null source values are transformed to 'Unknown' or '#' in the target, depending on the target column width. Null source dates/timestamps are transformed to '01/01/1000'.
The source data lake tables are HUDI tables. HUDI returns timestamps in the Unix time (epoch) format. The Unix time columns are converted via a SQL function and are encapsulated in views: v_store_customer and v_store_customer_address.
Source Data Lake Layer: | Source Database: | Source Table: | Source Column: | Source Datatype: | Target Schema: | Target Table: | Target Column: | Target Datatype: | Key: | SCD Type: | Transformation: | Note: |
---|---|---|---|---|---|---|---|---|---|---|---|---|
dw_mystore | customer_dim | customer_key | int | PK | N/A | Create new surrogate key for new records and SCD2 changes. | ||||||
clean | ara_clean_data_ |
store_customer | customer_id | string | dw_mystore | customer_dim | customer_id | varchar(16) | NK | N/A | ||
clean | ara_clean_data_ |
store_customer | salutation | string | dw_mystore | customer_dim | salutation | varchar(10) | 1 | The full history is updated on SCD1 columns for a change in the source. |
||
clean | ara_clean_data_ |
store_customer | first_name | string | dw_mystore | customer_dim | first_name | varchar(20) | 1 | |||
clean | ara_clean_data_ |
store_customer | last_name | string | dw_mystore | customer_dim | last_name | varchar(30) | 1 | |||
clean | ara_clean_data_ |
store_customer | birth_country | string | dw_mystore | customer_dim | birth_country | varchar(20) | 1 | |||
clean | ara_clean_data_ |
store_customer | email_address | string | dw_mystore | customer_dim | email_address | varchar(50) | 1 | |||
clean | ara_clean_data_ |
store_customer | birth_date | date | dw_mystore | customer_dim | birth_date | date | 1 | |||
clean | ara_clean_data_ |
store_customer | gender | string | dw_mystore | customer_dim | gender | varchar(10) | 1 | if 'M' then 'Male' if 'F' then 'Female' else gender |
||
clean | ara_clean_data_ |
store_customer | marital_status | string | dw_mystore | customer_dim | marital_status | varchar(10) | 1 | if 'D' then 'Divorced' if 'M' then 'Married' if 'S' then 'Single' if 'U' then 'Unknown' if 'W' then 'Widowed' else marital_status |
||
clean | ara_clean_data_ |
store_customer | education_status | string | dw_mystore | customer_dim | education_status | varchar(10) | 1 | |||
clean | ara_clean_data_ |
store_customer | purchase_estimate | bigint | dw_mystore | customer_dim | purchase_estimate | int | 1 | |||
clean | ara_clean_data_ |
store_customer | credit_rating | string | dw_mystore | customer_dim | credit_rating | varchar(10) | 1 | |||
clean | ara_clean_data_ |
store_customer | buy_potential | string | dw_mystore | customer_dim | buy_potential | varchar(10) | 1 | |||
clean | ara_clean_data_ |
store_customer | vehicle_count | bigint | dw_mystore | customer_dim | vehicle_count | int | 1 | |||
clean | ara_clean_data_ |
store_customer | lower_bound | bigint | dw_mystore | customer_dim | income_band_lower_bound | int | 1 | |||
clean | ara_clean_data_ |
store_customer | upper_bound | bigint | dw_mystore | customer_dim | income_band_upper_bound | int | 1 | |||
clean | ara_clean_data_ |
store_customer | customer_datetime | date | dw_mystore | customer_dim | start_date | date | 1 | f_from_unixtime(customer_datetime) | ||
clean | ara_clean_data_ |
store_customer_address | address_id | string | dw_mystore | customer_dim | address_id | varchar(16) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | address_datetime | string | dw_mystore | customer_dim | address_date | timestamp | 2 | f_from_unixtime(address_datetime) | ||
clean | ara_clean_data_ |
store_customer_address | street | string | dw_mystore | customer_dim | street | varchar(20) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | city | string | dw_mystore | customer_dim | city | varchar(60) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | county | string | dw_mystore | customer_dim | county | varchar(30) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | state | string | dw_mystore | customer_dim | state | varchar(2) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | zip | string | dw_mystore | customer_dim | zip | varchar(10) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | country | string | dw_mystore | customer_dim | country | varchar(20) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | gmt_offset | string | dw_mystore | customer_dim | gmt_offset | numeric(5) | 2 | |||
clean | ara_clean_data_ |
store_customer_address | location_type | string | dw_mystore | customer_dim | location_type | varchar(20) | 2 | |||
dw_mystore | customer_dim | scd_start_date | timestamp | N/A | On SCD2 change: - store_customer.customer_datetime on first record. - store_address.address_datetime on the new record |
|||||||
dw_mystore | customer_dim | scd_end_date | timestamp | N/A | On SCD2 change: - store_address.address_datetime on the existing current record - '31/12/2999' on the new record |
|||||||
dw_mystore | customer_dim | scd_current_flag | varchar(1) | N/A | On SCD2 change: - 'N' on existing record - 'Y' on new record |
|||||||
dw_mystore | customer_dim | dw_insert_date | timestamp | N/A | SYSDATE on Insert | |||||||
dw_mystore | customer_dim | dw_update_date | timestamp | N/A | SYSDATE on Update |
Sale Fact¶
Table Name: sale_fact
Table Type: Fact
Description: Each row in this table represents a single line item from a sale.
Load Frequency: Dynamic
Source Tables:
Lake: store_sale
DW: date_dim, time_dim, customer_dim
Source Data Lake Layer: | Source Database: | Source Table: | Source Column: | Source Datatype: | Target Schema: | Target Table: | Target Column: | Target Datatype: | Key: | Transformation: | Note: |
---|---|---|---|---|---|---|---|---|---|---|---|
N/A | Sourced from DW dw_mystore schema | date_dim | date_key | date | dw_mystore | sale_fact | sold_date_key | date | Surrogate key lookup on: SELECT NVL(dd.date_key, TO_DATE('01-JAN-1900', 'DD-MON-YYYY')) FROM clean_store_sale ss LEFT OUTER JOIN dw_mystore.date_dim dd ON TRUNC(ss.sale_datetime) = dd.day_date |
||
N/A | Sourced from DW dw_mystore schema | time_dim | time_key | int | dw_mystore | sale_fact | sold_time_key | int | Surrogate key lookup: SELECT NVL(td.time_key, -1) FROM store_sale ss LEFT OUTER JOIN dw_mystore.time_dim td ON EXTRACT(hour from ss.sale_datetime) = td."hour" AND EXTRACT(min from ss.sale_datetime) = td."minute" AND EXTRACT(sec from ss.sale_datetime) = td."second" |
||
clean | ara_clean_data_ |
store_sale | sale_datetime | timestamp | dw_mystore | sale_fact | sold_date | timestamp | PK | ||
N/A | Sourced from DW dw_mystore schema | customer_dim | customer_key | int | dw_mystore | sale_fact | customer_key | int | Surrogate key lookup: SELECT NVL(cd.customer_key, -1) FROM store_sale ss LEFT OUTER JOIN dw_mystore.customer_dim cd ON ss.customer_id = cd.customer_id AND ss.sold_datetime >= cd.scd_start_date AND ss.sold_datetime < cd.scd_end_date |
||
clean | ara_clean_data_ |
store_sale | ticket_id | bigint | dw_mystore | sale_fact | ticket_id | varchar(20) | PK | ||
clean | ara_clean_data_ |
store_sale | item_id | bigint | dw_mystore | sale_fact | item_id | int | PK | ||
clean | ara_clean_data_ |
store_sale | quantity | bigint | dw_mystore | sale_fact | quantity | int | |||
clean | ara_clean_data_ |
store_sale | wholesale_cost | decimal(15,2) | dw_mystore | sale_fact | wholesale_cost | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | list_price | decimal(15,2) | dw_mystore | sale_fact | list_price | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | sales_price | decimal(15,2) | dw_mystore | sale_fact | sales_price | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | ext_discount_amt | decimal(15,2) | dw_mystore | sale_fact | ext_discount_amt | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | ext_sales_price | decimal(15,2) | dw_mystore | sale_fact | ext_sales_price | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | ext_wholesale_cost | decimal(15,2) | dw_mystore | sale_fact | ext_wholesale_cost | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | ext_list_price | decimal(15,2) | dw_mystore | sale_fact | ext_list_price | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | ext_tax | decimal(15,2) | dw_mystore | sale_fact | ext_tax | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | coupon_amt | decimal(15,2) | dw_mystore | sale_fact | coupon_amt | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | net_paid | decimal(15,2) | dw_mystore | sale_fact | net_paid | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | net_paid_inc_tax | decimal(15,2) | dw_mystore | sale_fact | net_paid_inc_tax | decimal(15,2) | |||
clean | ara_clean_data_ |
store_sale | net_profit | decimal(15,2) | dw_mystore | sale_fact | net_profit | decimal(15,2) | |||
customer_dim | dw_insert_date | timestamp | SYSDATE on Insert | ||||||||
customer_dim | dw_update_date | timestamp | SYSDATE on Update |