Also read: BigQuery Basics

Dataflow - Data Pipeline Service - Deduplicating, Grouping, Aggregating - Transformations

  • ETL service built on the open-source Apache Beam SDK
  • Autoscales to millions of queries per second (QPS)
  • Handles both stream and batch processing
  • Sink can be BigQuery or Bigtable
  • Source can be Kafka, Avro files, Datastore or Pub/Sub
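
To make the three transformations in the heading concrete, here is a minimal sketch in plain Python. It does not use the Beam SDK or any Dataflow API; it only shows what deduplicating, grouping and aggregating do to a small batch of records. The record layout (event_id, user, amount) and the function names are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical input records: (event_id, user, amount).
events = [
    ("e1", "alice", 10),
    ("e2", "bob", 5),
    ("e1", "alice", 10),  # duplicate delivery of e1
    ("e3", "alice", 7),
]


def dedupe(records):
    """Keep only the first record seen for each event_id."""
    seen = set()
    for record in records:
        event_id = record[0]
        if event_id not in seen:
            seen.add(event_id)
            yield record


def group_by_user(records):
    """Collect the amounts of all records that share a user."""
    groups = defaultdict(list)
    for _, user, amount in records:
        groups[user].append(amount)
    return dict(groups)


def aggregate(groups):
    """Reduce each user's list of amounts to a single total."""
    return {user: sum(amounts) for user, amounts in groups.items()}


totals = aggregate(group_by_user(dedupe(events)))
print(totals)  # {'alice': 17, 'bob': 5}
```

In a real pipeline each of these steps would be a transform applied to a distributed collection rather than an in-memory list, but the data flow is the same: drop repeats, bucket by key, then reduce each bucket.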

Dataprep - Other ETL Tools

  • Visual UI - no programming required; transformations are defined as recipes
  • Explore, clean and prepare data for machine learning
  • Suggests suitable data transformations
  • Dataprep reads data from BigQuery (or Cloud Storage) and hands it to Dataflow, which can then write the results back to BigQuery (or Cloud Storage).
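
One way to picture a "recipe" is as an ordered list of cleaning steps applied to the data one after another. The sketch below is plain Python and is not how Dataprep represents recipes internally; the rows, the step functions and run_recipe are all hypothetical names chosen for illustration.

```python
# Hypothetical raw rows, as they might arrive from a source table or file.
rows = [
    {"name": " Alice ", "age": "34"},
    {"name": "Bob", "age": ""},
    {"name": "carol", "age": "29"},
]


def trim_names(rows):
    """Strip surrounding whitespace from the name column."""
    return [{**r, "name": r["name"].strip()} for r in rows]


def drop_missing_age(rows):
    """Remove rows whose age column is empty."""
    return [r for r in rows if r["age"] != ""]


def age_to_int(rows):
    """Convert the age column from text to an integer."""
    return [{**r, "age": int(r["age"])} for r in rows]


# The recipe is just the ordered list of steps.
recipe = [trim_names, drop_missing_age, age_to_int]


def run_recipe(recipe, rows):
    """Apply each step in order, feeding its output to the next step."""
    for step in recipe:
        rows = step(rows)
    return rows


cleaned = run_recipe(recipe, rows)
print(cleaned)  # [{'name': 'Alice', 'age': 34}, {'name': 'carol', 'age': 29}]
```

Order matters here: age_to_int would fail on the empty string if drop_missing_age had not already run.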