Pandas, NumPy, Matplotlib, scikit-learn

ELK: open-source stack consisting of Elasticsearch, Logstash, and Kibana

Tableau

  • Analyze the data to provide business teams with data-driven insights
  • Validate or invalidate experiments via data analysis
  • Glean useful information (data-driven insights, often used to increase sales, optimize drug dosage, and so on)

https://support.microsoft.com/en-us/office/find-and-remove-duplicates-00e35bea-b46a-4d5d-b28e-66a552dc138d
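
The link above covers Excel. As a rough sketch, the same find-and-remove-duplicates step could be done in pandas as shown below. The sample data and column names (customer_id, city) are made up for illustration and are not from any real dataset.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions for illustration.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 3],
    "city": ["Oslo", "Lima", "Lima", "Pune", "Cairo"],
})

# Find: boolean mask marking rows that exactly repeat an earlier row.
dup_mask = df.duplicated()

# Remove: drop exact duplicate rows, keeping the first occurrence.
deduped = df.drop_duplicates().reset_index(drop=True)

# Alternative: treat rows as duplicates when only the key column matches.
deduped_by_id = df.drop_duplicates(subset="customer_id", keep="first")
```

Whether to deduplicate on the whole row or on a key column depends on the data; the second form silently discards the conflicting "Cairo" row, so it is worth inspecting dup_mask before removing anything.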

Perform EDA (exploratory data analysis)

  • Find the independent and dependent variables
  • The independent variable is the cause
  • The dependent variable is the effect (or outcome)

  • Look for correlation between variables
  • Are the variables categorical (example: blood type) or continuous (example: population)?
  • Are the samples independent, or is it a time series?
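
A minimal pandas sketch of the first EDA checks above: separating continuous from categorical columns and looking at correlation. The data is invented, and treating dose_mg as the independent variable and response as the dependent one is an assumption made only for this example.

```python
import pandas as pd

# Hypothetical data: dose_mg is the assumed independent variable,
# response the assumed dependent variable, blood_type a categorical one.
df = pd.DataFrame({
    "dose_mg": [10, 20, 30, 40, 50],
    "response": [1.1, 2.0, 2.9, 4.2, 5.0],
    "blood_type": ["A", "B", "O", "AB", "O"],
})

# Split columns into continuous (numeric) and categorical (everything else).
continuous_cols = df.select_dtypes(include="number").columns.tolist()
categorical_cols = df.select_dtypes(exclude="number").columns.tolist()

# Pearson correlation matrix for the continuous variables.
corr = df[continuous_cols].corr()
dose_response_r = corr.loc["dose_mg", "response"]

# Frequency table for a categorical variable.
blood_type_counts = df["blood_type"].value_counts()
```

A high correlation here only says the two columns move together; it does not by itself confirm which variable is the cause.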

spaghetti plot
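
A spaghetti plot overlays many individual lines (for example, one per subject over time) on the same axes. A rough Matplotlib sketch follows; the helper name spaghetti_plot, the axis labels, and the random-walk data are all assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def spaghetti_plot(x, series_by_label):
    """Draw every series as its own thin line on one shared set of axes."""
    fig, ax = plt.subplots()
    for label, y in series_by_label.items():
        # Partial transparency keeps overlapping lines distinguishable.
        ax.plot(x, y, alpha=0.6, linewidth=1, label=label)
    ax.set_xlabel("time")
    ax.set_ylabel("value")
    ax.legend()
    return fig, ax


# Hypothetical data: five random walks standing in for five subjects.
rng = np.random.default_rng(0)
x = np.arange(12)
series = {f"subject {i}": np.cumsum(rng.normal(size=x.size)) for i in range(5)}

fig, ax = spaghetti_plot(x, series)
# Call plt.show() or fig.savefig("spaghetti.png") to view the result.
```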

Mode.com

  • Upload some data (beware: uploaded data becomes public)
  • Write a query
  • Build a notebook