Cloud Dataflow and Dataprep – ETL on GCP
Dataflow - Data pipeline service for transformations such as de-duplicating, grouping and aggregating
- ETL, with pipelines written using the open-source Apache Beam SDK
- Scales automatically to millions of queries per second (QPS)
- Stream and Batch Processing
- Sink can be BigQuery or Bigtable
- Source can be Kafka, Avro files, Datastore or Pub/Sub
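
The three transformations named above (de-duplicating, grouping, aggregating) can be sketched in plain Python. This is only an illustration of the logic, not Dataflow or Beam code: the function names and the sample events are made up for the example. In a Beam pipeline the rough counterparts would be transforms such as `beam.Distinct()`, `beam.GroupByKey()` and `beam.CombinePerKey(sum)`.

```python
from collections import defaultdict


def dedupe(records):
    """Drop exact duplicate records, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for record in records:
        if record not in seen:
            seen.add(record)
            unique.append(record)
    return unique


def group_by_key(pairs):
    """Collect the values of (key, value) pairs into one list per key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def aggregate(grouped, fn=sum):
    """Reduce each key's list of values to a single value with fn."""
    return {key: fn(values) for key, values in grouped.items()}


# Hypothetical input: (user, purchase_amount) events, one of them duplicated.
events = [("alice", 10), ("bob", 5), ("alice", 10), ("alice", 7)]

totals = aggregate(group_by_key(dedupe(events)))
print(totals)  # {'alice': 17, 'bob': 5}
```

The sketch works on an in-memory list, so it only mirrors the batch case. Dataflow applies the same kind of steps in parallel, and for streams it does so over windows of data.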
Dataprep - Another ETL tool
- Visual UI - no programming required. Recipe based
- Explore, clean and prep data for Machine Learning
- Suggests Ideal Data Transformation
- Dataprep reads from BigQuery (or Cloud Storage) and runs the recipe as a Dataflow job, which writes the result back to BigQuery (or Cloud Storage).
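
One way to picture a recipe is as an ordered list of cleaning steps applied to the data one after another. The sketch below is a loose analogy in plain Python, not how Dataprep is implemented: the step functions, field names and sample rows are all hypothetical, and in Dataprep the steps are built in the visual UI rather than written as code.

```python
def run_recipe(rows, recipe):
    """Apply each recipe step to the rows, in order."""
    for step in recipe:
        rows = step(rows)
    return rows


def drop_missing_age(rows):
    """Cleaning step: discard rows whose 'age' field is missing."""
    return [row for row in rows if row.get("age") is not None]


def normalize_name(rows):
    """Cleaning step: trim whitespace and title-case the 'name' field."""
    return [{**row, "name": row["name"].strip().title()} for row in rows]


# Hypothetical recipe and raw input (standing in for a BigQuery table
# or a Cloud Storage file).
recipe = [drop_missing_age, normalize_name]
raw = [
    {"name": "  ada lovelace ", "age": 36},
    {"name": "bob", "age": None},
]

clean = run_recipe(raw, recipe)
print(clean)  # [{'name': 'Ada Lovelace', 'age': 36}]
```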