Introduction With the summer of soccer and beautiful football having come to an end, let’s dive into some team and player statistics and assay the tournament. I have collected various stats and facts and organized them into a digestible format as a tribute to the fantastic players and the tournament. In this post, I’ll cover some […]
Tag: datascience
Division, Floor Division (Python)
In Python, the single forward slash ‘/’ performs floating-point division unless you’re using Python 2.x, in which case it performs integer division when both operands are integers. The double forward slash ‘//’, on the other hand, is the floor division operator, which performs an integer division and returns the largest integer less than or equal to the […]
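As a quick illustration of the excerpt above, here is a minimal sketch assuming Python 3 semantics. The variable names are made up for this example and are not taken from the post.

```python
# Python 3: "/" is true division, "//" is floor division.
true_div = 7 / 2         # always a float in Python 3
floor_div = 7 // 2       # floor of 3.5
neg_floor = -7 // 2      # floor of -3.5 rounds toward negative infinity
float_floor = 7.0 // 2   # a float operand gives a float result

print(true_div, floor_div, neg_floor, float_floor)  # 3.5 3 -4 3.0
```

Note that floor division is not the same as truncation: for negative results it rounds down, away from zero.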
GENETIC QUIRKS OF THE WORLD 2024
Here are some interesting facts (oddities?) from around the world and their associated visuals in Excel. Data source: worldpopulationreview.com. Data rounded to one decimal place where applicable.
Generate Bar Codes, QR Codes in Excel: Quick & Easy Way
In this post, I’ll show you a quick, easy, and free method that you can use today to generate QR codes, bar codes in UPC-A and UPC-E formats, and custom bar codes based on the product information you enter in Excel as text. How it works: Two things are at play here that make it […]
How to calculate streaks in Excel
Streaks refer to trends in the data. These can be linear, exponential, damped, seasonal, irregular/random, stationary, or cyclical. Streaks are important for several reasons: There isn’t any built-in function in Excel for calculating streaks, but there are different ways we can make Excel do some of that heavy lifting by using a combination of functions such as […]
Data cleansing challenge: non-ASCII characters
Non-ASCII characters can pose challenges in data cleansing for several reasons: Therefore, it’s a good practice to standardize or normalize text data to ASCII when possible, or ensure correct handling of non-ASCII characters. This helps to maintain data integrity and simplifies subsequent data processing tasks. Superscripts, subscripts, or “special” characters often look like ASCII characters […]
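One possible way to normalize text to ASCII, sketched with Python's standard `unicodedata` module. This is not necessarily the method used in the post, and the helper name `to_ascii` is hypothetical.

```python
import unicodedata

def to_ascii(text: str) -> str:
    """Best-effort ASCII version of text (hypothetical helper).

    NFKD decomposition splits accented letters into base letter plus
    combining mark and maps superscripts/subscripts to plain digits.
    Anything still outside ASCII afterwards is dropped.
    """
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(to_ascii("café"))  # cafe
print(to_ascii("x²"))    # x2
print(to_ascii("H₂O"))   # H2O
```

Characters with no ASCII decomposition are silently removed by this approach, so it is worth checking the output when dropping data would be a problem.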
Comparing and merging lists in Excel, Python
Identifying anomalies, duplicates, and updating data necessitates comparing information from various sources. Accurate execution of these tasks is crucial, whether working solely with spreadsheets or using a mix of tools and languages like databases and web services. In this post, I will demonstrate various methods for comparing lists of identical or differing sizes across different […]
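A minimal Python sketch of comparing two lists of different sizes, using set operations. The function name and sample values are assumptions for illustration only and are not the post's own examples.

```python
def compare_lists(left, right):
    """Report items unique to each list and items shared by both.

    Hypothetical helper: duplicates within a list are collapsed,
    and results are sorted for stable output.
    """
    left_set, right_set = set(left), set(right)
    return {
        "only_left": sorted(left_set - right_set),
        "only_right": sorted(right_set - left_set),
        "in_both": sorted(left_set & right_set),
    }

result = compare_lists(["a", "b", "c"], ["b", "c", "d", "e"])
print(result)
# {'only_left': ['a'], 'only_right': ['d', 'e'], 'in_both': ['b', 'c']}
```

Sets ignore order and repeated values, so this fits membership checks. Detecting duplicates within a single list would need a counting approach instead.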
How much does it cost to retire in each state?
Recently, I collected data on cost of living and average longevity in every state + D.C. From the data, I derived the COLI, or Cost of Living Index, which is then normalized to 100 (where 100 represents the national average cost of living). Additionally, using data from the Bureau of Labor Statistics (BLS), I populated by […]
Data Normalization & Rescaling
Normalizing data is a common task in many applications, especially when working with large datasets, machine learning, or statistical analysis. There are two common statistical methods for normalization: Min-Max Scaling and Standardization (Z-score Normalization). But there are other ways too, which I will demonstrate in the examples below. 1. Min-Max Scaling (Normalizes Data to Between […]
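The two methods named above can be sketched in plain Python as follows. This is an illustrative sketch rather than the post's own code. The function names are hypothetical, the min-max version assumes a target range of 0 to 1, and the z-score version assumes the population standard deviation.

```python
from statistics import mean, pstdev

def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # constant data: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:                    # constant data: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]

print(min_max_scale([10, 20, 30]))         # [0.0, 0.5, 1.0]
print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```

Min-max scaling is sensitive to outliers because a single extreme value sets the range. Z-scores are less so, but they do not bound the output.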
Comparing Apples with Oranges?
The familiar saying, “Comparing apples to oranges,” suggests that it’s illogical to compare two distinct items. However, in the realm of statistics, such comparisons are not only possible but sometimes necessary. By establishing a uniform standard or metric, we can evaluate items that, at first glance, appear incomparable. In this article, I present some straightforward […]