STEM

Data cleansing challenge: non-ASCII characters

Non-ASCII characters can pose challenges in data cleansing for several reasons […]. It is therefore good practice to standardize or normalize text data to ASCII when possible, or otherwise to make sure non-ASCII characters are handled correctly. This helps maintain data integrity and simplifies subsequent data processing. Superscripts, subscripts, and other "special" characters often look like ASCII characters […]
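As a rough illustration of normalizing to ASCII, here is a minimal sketch using only Python's standard `unicodedata` module. It is not necessarily the approach taken in the full post. The helper names `non_ascii_chars` and `to_ascii` are hypothetical. NFKD compatibility decomposition turns look-alike characters such as superscript and subscript digits into plain digits, and splits accented letters into a base letter plus a combining mark. Anything still non-ASCII after that is dropped.

```python
import unicodedata


def non_ascii_chars(text: str) -> list[tuple[str, str]]:
    """List each non-ASCII character in `text` with its Unicode name (hypothetical helper)."""
    return [(c, unicodedata.name(c, "UNKNOWN")) for c in text if not c.isascii()]


def to_ascii(text: str) -> str:
    """Approximate `text` using ASCII only (hypothetical helper).

    NFKD maps compatibility characters (e.g. "²" -> "2", "₂" -> "2") and
    decomposes accented letters ("é" -> "e" + combining acute accent).
    encode(..., "ignore") then drops whatever is still non-ASCII.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(non_ascii_chars("m²"))         # [('²', 'SUPERSCRIPT TWO')]
print(to_ascii("Café m² H₂O"))       # Cafe m2 H2O
```

This conversion is lossy. Characters with no ASCII decomposition, such as Greek or Cyrillic letters, are silently removed. Listing the offending characters first with something like `non_ascii_chars` makes it easier to decide whether dropping them is acceptable.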

Comparing and merging lists in Excel, Python

Identifying anomalies and duplicates, and keeping data up to date, requires comparing information from various sources. Doing this accurately is crucial, whether you work solely in spreadsheets or combine several tools and languages with sources such as databases and web services. In this post, I will demonstrate various methods for comparing lists of identical or differing sizes across different […]
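For the Python side, a minimal sketch of comparing two lists of possibly different sizes might rely on set operations for overlaps and differences, `collections.Counter` for duplicates, and `dict.fromkeys` for an order-preserving merge. The helper names `compare_lists`, `duplicates` and `merge_unique`, as well as the sample lists, are hypothetical and not taken from the post.

```python
from collections import Counter


def compare_lists(a, b):
    """Split two lists into shared and one-sided items (hypothetical helper).

    Sets ignore list length and order, so lists of different sizes work too.
    sorted() assumes the items are mutually comparable (e.g. all strings).
    """
    set_a, set_b = set(a), set(b)
    return {
        "in_both": sorted(set_a & set_b),
        "only_in_a": sorted(set_a - set_b),
        "only_in_b": sorted(set_b - set_a),
    }


def duplicates(items):
    """Return the items that occur more than once (hypothetical helper)."""
    return sorted(item for item, count in Counter(items).items() if count > 1)


def merge_unique(a, b):
    """Merge two lists, dropping repeats but keeping first-seen order (hypothetical helper)."""
    return list(dict.fromkeys(list(a) + list(b)))


old = ["apple", "banana", "cherry", "cherry"]
new = ["banana", "cherry", "date"]

print(compare_lists(old, new))
# {'in_both': ['banana', 'cherry'], 'only_in_a': ['apple'], 'only_in_b': ['date']}
print(duplicates(old))          # ['cherry']
print(merge_unique(old, new))   # ['apple', 'banana', 'cherry', 'date']
```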

Data Normalization & Rescaling

Normalizing data is a common task in many applications, especially when working with large datasets, machine learning, or statistical analysis. Two common statistical methods for normalization are Min-Max Scaling and Standardization (Z-score Normalization), but there are other ways too, which I will demonstrate in the examples below. 1. Min-Max Scaling (Normalizes Data to Between […]
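The two methods named above can be sketched in plain Python with the standard `statistics` module. The sketch below is illustrative rather than the post's own examples. Min-Max Scaling maps each value x to new_min + (x - min) * (new_max - new_min) / (max - min), which gives the range [0, 1] by default. Standardization maps x to z = (x - mean) / standard deviation. The function names are hypothetical, and using the population standard deviation (`pstdev`) rather than the sample one (`stdev`) is an assumption you may want to change.

```python
from statistics import mean, pstdev


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values into [new_min, new_max] (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined when all values are equal")
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1 (hypothetical helper).

    Uses the population standard deviation; swap in statistics.stdev
    for the sample standard deviation.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("z-scores are undefined when all values are equal")
    return [(v - mu) / sigma for v in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]   # mean 5, population standard deviation 2

print(min_max_scale(data))   # first value 0.0, last value 1.0
print(z_scores(data))        # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Min-Max Scaling preserves the shape of the distribution but is sensitive to outliers, because a single extreme value stretches the range. Z-scores are not bounded to a fixed interval but express every value in units of standard deviation.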

Comparing Apples with Oranges?

The familiar saying, “Comparing apples to oranges,” suggests that it’s illogical to compare two distinct items. However, in the realm of statistics, such comparisons are not only possible but sometimes necessary. By establishing a uniform standard or metric, we can evaluate items that, at first glance, appear incomparable. In this article, I present some straightforward […]
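One simple example of a "uniform standard" is the percentile rank: express each item as the percentage of its own group that it outranks. This is offered as an illustration under stated assumptions, not as one of the approaches the article presents. The helper `percentile_rank` and the fruit weights are hypothetical. Ties are counted as half, which is one of several common percentile-rank definitions.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(sample, x):
    """Percent of `sample` below x, counting values equal to x as half (hypothetical helper)."""
    ordered = sorted(sample)
    below = bisect_left(ordered, x)             # number of values strictly less than x
    equal = bisect_right(ordered, x) - below    # number of values equal to x
    return 100.0 * (below + 0.5 * equal) / len(ordered)


# Hypothetical weights in grams, each compared only within its own group.
apple_weights = [150, 160, 170, 180, 190]
orange_weights = [120, 130, 140, 150, 160]

print(percentile_rank(apple_weights, 180))    # 70.0
print(percentile_rank(orange_weights, 160))   # 90.0
```

Although the 160 g orange is lighter than the 180 g apple in absolute terms, it is heavier relative to other oranges than the apple is relative to other apples. The common scale makes that comparison meaningful.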
