An example (or project idea) of data engineering: pull tweets from the Twitter API, stream them through a Kafka topic, store them on Hadoop, and run text analysis to determine sentiment. The same pipeline can also be built on the Spark ecosystem.
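
A minimal sketch of that pipeline's shape, using only the Python standard library. Everything here is a stand-in and an assumption for illustration: the hard-coded `sample_tweets` list replaces the Twitter API, a `queue.Queue` replaces Kafka, a plain list replaces Hadoop storage, and the tiny word lexicon replaces a real sentiment model.

```python
import queue
import re

# Hypothetical toy lexicon; a real project would use a trained sentiment model.
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "sad"}


def score_sentiment(text):
    """Label text by counting positive and negative lexicon words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def produce(tweets, channel):
    """Stand-in for the producer that would publish tweets to Kafka."""
    for tweet in tweets:
        channel.put(tweet)
    channel.put(None)  # sentinel marking the end of the stream


def consume(channel):
    """Stand-in for the consumer that would store and analyze messages."""
    storage = []  # plays the role of the Hadoop storage layer
    while True:
        tweet = channel.get()
        if tweet is None:
            break
        storage.append({"text": tweet, "sentiment": score_sentiment(tweet)})
    return storage


# Assumed sample data in place of a live Twitter API stream.
sample_tweets = [
    "I love this great product",
    "This is a terrible, bad day",
    "Just landed in Paris",
]

channel = queue.Queue()
produce(sample_tweets, channel)
results = consume(channel)

for row in results:
    print(row["sentiment"], "|", row["text"])
```

In a real build, the producer and consumer would run as separate processes, and the sentiment step could run as a batch or streaming job over the stored data.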

  • Collect data from various internal and external sources
  • Transform data into usable formats
  • Load data into convenient and controllable locations for other teams to use
  • Build and maintain infrastructure
  • In short, a software engineer who also knows data
  • More verbosely: someone who helps organizations structure and access their data with the speed and scalability they need, and who enables teams to deliver insights and analytics from that data
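
The collect, transform, and load steps in the list above can be sketched as a small ETL job. This is an illustration only, not a production design: the inline CSV and JSON strings stand in for an internal and an external source, and the field names, the `payments` table, and the in-memory SQLite database are all hypothetical.

```python
import csv
import io
import json
import sqlite3

# Assumed sample inputs: an "internal" CSV export and an "external" JSON feed.
INTERNAL_CSV = "user_id,amount\n1,10.50\n2,3.25\n"
EXTERNAL_JSON = '[{"user": 1, "amt": "4.00"}, {"user": 3, "amt": "7.5"}]'


def extract():
    """Collect raw records from both sources."""
    rows = list(csv.DictReader(io.StringIO(INTERNAL_CSV)))
    records = json.loads(EXTERNAL_JSON)
    return rows, records


def transform(rows, records):
    """Normalize both shapes into (user_id, amount) tuples with real types."""
    unified = [(int(r["user_id"]), float(r["amount"])) for r in rows]
    unified += [(int(r["user"]), float(r["amt"])) for r in records]
    return unified


def load(unified, conn):
    """Write the cleaned records to a table other teams can query."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments (user_id INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?)", unified)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(*extract()), conn)

# A downstream team can now query one consistent table.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM payments GROUP BY user_id ORDER BY user_id"
).fetchall()
print(totals)
```

Keeping extract, transform, and load as separate functions makes it easier to swap a source or a destination without touching the other steps.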

Data Lake

Satori