Copy table from AWS Redshift to Hive ORC ACID table using Apache NIFI

Ebeb
May 24, 2023

The flow below is a fast way to copy table data from AWS Redshift, or any other RDBMS, into a Hive ORC ACID table using Apache NIFI.

NIFI flow to copy an AWS Redshift table to a Hive ORC ACID table.
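  1. QueryDatabaseTableRecord: Point its Database Connection Pooling Service at the Redshift (or other RDBMS) database and set the Record Writer to ParquetRecordSetWriter, so the query results are written as Parquet.
  2. DeleteHDFS: Clears out the old external Hive Parquet table files left by prior runs.
  3. PutHDFS: Stores the table rows as a Parquet file in HDFS, in the directory that the external Parquet table points to.
  4. ReplaceText: Set the following properties: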

Search Value=.*

Replacement Value=INSERT OVERWRITE TABLE db.managed_orc_acid_table SELECT * FROM db.external_parquet_table;

Replacement Strategy= Always Replace

Evaluation Mode=Entire text

  5. PutHive3QL: Set the Hive Database Connection Pooling Service to a Hive3ConnectionPool that points at your Hive instance.
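The steps above assume that two Hive tables already exist: an external Parquet table whose LOCATION is the HDFS directory that DeleteHDFS clears and PutHDFS writes into, and a managed, transactional ORC table that receives the INSERT OVERWRITE. The post does not show their DDL, so here is a minimal sketch that generates a plausible pair. The column list, the HDFS path `/data/staging/external_parquet_table`, and the helper name `build_ddl` are hypothetical; the column names and types must match the Parquet schema that ParquetRecordSetWriter produces for your source table. Run the printed statements once in any Hive client before starting the flow.

```python
"""Hypothetical helper that builds the HiveQL DDL this flow relies on.

Assumptions (not from the original post): the column list, the HDFS
staging path, and the function name are illustrative only.
"""


def build_ddl(columns, staging_dir,
              external_table="db.external_parquet_table",
              managed_table="db.managed_orc_acid_table"):
    """Return (external_ddl, managed_ddl) as HiveQL strings.

    columns: list of (name, hive_type) pairs matching the source table.
    staging_dir: HDFS directory that PutHDFS writes Parquet files into
    and DeleteHDFS clears before each run.
    """
    col_defs = ",\n  ".join(f"{name} {hive_type}" for name, hive_type in columns)

    # External table: Hive only reads the Parquet files that PutHDFS drops here.
    external_ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {external_table} (\n  {col_defs}\n)\n"
        "STORED AS PARQUET\n"
        f"LOCATION '{staging_dir}'"
    )

    # Managed ACID table: full ACID tables in Hive must be stored as ORC
    # and flagged as transactional.
    managed_ddl = (
        f"CREATE TABLE IF NOT EXISTS {managed_table} (\n  {col_defs}\n)\n"
        "STORED AS ORC\n"
        "TBLPROPERTIES ('transactional'='true')"
    )
    return external_ddl, managed_ddl


if __name__ == "__main__":
    # Hypothetical columns standing in for the source Redshift table.
    cols = [("id", "BIGINT"), ("name", "STRING")]
    ext, managed = build_ddl(cols, "/data/staging/external_parquet_table")
    print(ext + ";\n")
    print(managed + ";")
```

Keeping the staging data in a separate external table means DeleteHDFS and PutHDFS only ever touch plain files, while all writes to the ACID table go through Hive itself.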
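The ReplaceText step is what turns the data FlowFile into a Hive command. With Replacement Strategy set to Always Replace and Evaluation Mode set to Entire text, the Parquet bytes that PutHDFS has already written are dropped from the FlowFile, and its whole content becomes the INSERT OVERWRITE statement, which PutHive3QL then executes as HiveQL. The rows never pass through PutHive3QL; Hive reads them from the external table. The following is a simplified Python model of that behavior for illustration only. It is not NiFi's implementation, and the `FlowFile` class and `replace_text_always_entire` function are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class FlowFile:
    """Hypothetical stand-in for a NiFi FlowFile: raw content plus attributes."""
    content: bytes
    attributes: dict = field(default_factory=dict)


INSERT_SQL = ("INSERT OVERWRITE TABLE db.managed_orc_acid_table "
              "SELECT * FROM db.external_parquet_table;")


def replace_text_always_entire(ff: FlowFile, replacement: str) -> FlowFile:
    """Model ReplaceText with Always Replace + Entire text.

    The incoming content (here, the Parquet file already written by
    PutHDFS) is ignored entirely; only the attributes carry over.
    """
    return FlowFile(content=replacement.encode("utf-8"),
                    attributes=dict(ff.attributes))


if __name__ == "__main__":
    parquet_ff = FlowFile(content=b"PAR1<binary row data>PAR1",
                          attributes={"filename": "part-0001.parquet"})
    hive_ff = replace_text_always_entire(parquet_ff, INSERT_SQL)
    # PutHive3QL would now execute this content as a HiveQL statement.
    print(hive_ff.content.decode("utf-8"))
```

Since every FlowFile yields the same statement, each run overwrites the managed table with whatever Parquet files are currently in the external table's directory, which is why DeleteHDFS clears the previous run's files first.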

That's all. Your data will be ingested quickly into the Hive ORC ACID table, which can be queried immediately because it is an ACID table.
