The frequentist approach

Photo by Anna Nekrashevich from Pexels

Hypothesis testing is a common statistical tool used in research and data science to quantify how much confidence findings deserve. The aim of testing is to determine how likely it is that an apparent effect in a random data sample arose by chance. This article provides a detailed explanation of the key concepts in frequentist hypothesis testing, using problems from the business domain as examples.

A hypothesis is often described as an “educated guess” about a specific parameter or population. Once it is defined, one can collect data to determine whether the data provide enough evidence to support the hypothesis.

In hypothesis testing…


How understanding the grammar of graphics will help you build any plot with ggplot2

Photo by Hasan Almasi on Unsplash

ggplot2 is a widely used graphics library in R and part of the tidyverse. The tidyverse is a collection of R packages that share a common design philosophy, grammar, and data structures. Other packages in the tidyverse include dplyr and purrr.

The grammar of graphics

ggplot2 is built on the grammar of graphics. Hadley Wickham, one of the developers of ggplot2, described a grammar as

“the fundamental principles or rules of an art or science”.

Therefore, the grammar of graphics is a way to describe and create a wide range of plots. …


Hands-On Tutorial On Treating Outliers — Winsorizing and Imputation

Photo by RF._.studio from Pexels

An Exploratory Data Analysis (EDA) is crucial when working on data science projects. Understanding your underlying data, its nature, and its structure can simplify decision making on features, algorithms, or hyperparameters. A critical part of the EDA is the detection and treatment of outliers. Outliers are observations that deviate strongly from the other data points in a random sample of a population.

In two previously published articles, I discussed how to detect different types of outliers using well-known statistical methods. One article focuses on univariate and the other on multivariate outliers.

In this final post, I want to discuss how to…


Hands-On Tutorial On Multivariate Outliers

Photo by LegioSeven from Pexels

An Exploratory Data Analysis (EDA) is essential when working on data science projects. Understanding your underlying data, its nature, and structure can simplify decision making on features, algorithms, and hyperparameters. One crucial part of the EDA is the detection of outliers. Outliers are observations that are far away from the other data points in a random sample of a population.

In a previously posted article, I introduced statistical methods commonly used in practice to detect univariate outliers. In this post, I want to discuss what multivariate outliers are and how they can be detected and visualized during EDA. …


Freshen up your resume, optimize your job hunt and master the initial interview

Photo by mentatdgt from Pexels

During my job search at the beginning of this year, I had the opportunity to meet a recruitment specialist from a renowned business school. Although his usual work involves dealing with business students, he shared some valuable advice with me that, I believe, could benefit any job seeker. Please note that the examples I use for illustration in this article are tailored to my personal job hunt in data science but can be adapted to other roles.

Tips for your resume

1. Have a short and customized “About me” section

At the top of your CV, include an “About me” section consisting of 2–3 sentences where you briefly describe yourself, state…


Hands-On Tutorial On Univariate Outliers

Image by Will Myers on Unsplash

An Exploratory Data Analysis (EDA) is crucial when working on data science projects. Knowing your data inside and out can simplify decision making concerning the selection of features, algorithms, and hyperparameters. One essential part of the EDA is the detection of outliers. Simply put, outliers are observations that are far away from the other data points in a random sample of a population.

In data science, we often want to make assumptions about a specific population. Extreme values, however, can have a significant impact on conclusions drawn from data or machine learning models. …


From building your brand and interview tips to the essential technical skills

Photo by Angelique Rademakers from Reshot

There are many free resources on the web for becoming a better data scientist, and podcasts are one of them. Podcasts are a useful way to learn from professionals with experience in the field, pick up technical hacks, or get used to data science jargon. Most episodes run 30–60 minutes, which makes them a great companion for your daily trip to work or university, for working out, or for a short break in between.

You can find some great collections of the best podcasts in data science, engineering, and business already on Medium. While I find some of these podcasts very technical, often covering…

Alicia Horsch

Data Scientist / Idea sharing / Learning & Personal Growth
