The flow below is a fast way to copy table data from AWS Redshift, or any other RDBMS, into a Hive ORC ACID table using Apache NiFi.
- QueryDatabaseTableRecord: Set the Record Writer to ParquetRecordSetWriter
- DeleteHDFS: This will clean out the old external Hive Parquet table files from prior runs.
- PutHDFS: This will store the table rows as a Parquet file in HDFS, in the directory backing the external Hive Parquet table.
- ReplaceText: Set the following values
Search Value=.*
Replacement Value=INSERT OVERWRITE TABLE db.managed_orc_acid_table SELECT * FROM db.external_parquet_table;
Replacement Strategy=Always Replace
Evaluation Mode=Entire text
- PutHive3QL: Use a Hive3ConnectionPool controller service to run the statement produced by ReplaceText.
That's all. Your data will be ingested very quickly into the Hive ORC ACID table, and because it is an ACID table it can be queried immediately.
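The INSERT OVERWRITE statement in the ReplaceText step assumes that both db.external_parquet_table and db.managed_orc_acid_table already exist. A minimal sketch of what that DDL might look like is shown here. The column list (id, name, updated_at) and the HDFS path are placeholders invented for illustration, not taken from the original flow. Replace them with your source table's schema and the directory that DeleteHDFS and PutHDFS point at.

```sql
-- Sketch only: columns and LOCATION are hypothetical placeholders.

-- External staging table that reads the Parquet files written by PutHDFS.
CREATE EXTERNAL TABLE IF NOT EXISTS db.external_parquet_table (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
STORED AS PARQUET
LOCATION '/tmp/nifi/external_parquet_table';  -- assumed path; must match the PutHDFS Directory

-- Managed target table, stored as ORC with ACID (transactional) enabled.
CREATE TABLE IF NOT EXISTS db.managed_orc_acid_table (
  id         BIGINT,
  name       STRING,
  updated_at TIMESTAMP
)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```

Because the statement uses SELECT *, the two tables should declare the same columns in the same order.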