I’m not a statistician by profession or training. However, even with only the basics under my belt, I find statistics fascinating and keep finding a plethora of practical uses for it. Without it, we’re really ignorant. With it, we’re equipped, but not always well educated either. I heard the phrase “Correlation does not equal causation!” again and again, but it was only through some real-world experiences that this truth sank in. We should always be aware of the ethical side of statistics, because it’s really a “trick” for taking shortcuts on questions that would take an enormous, even impractical, amount of time and resources to answer. Statistics is there so you don’t have to spend a large amount of $$ and time to conjure up something 😉 And yet, it’s a beautiful protection against blind faith.
Today, I’ll touch on one of the most critical and common applications of statistics, one that EVERYONE hears about almost every day, and even more so near elections: things like “Margin of Error”, “Surveys”, “Polls”, etc.
Unfortunately, most people don’t understand what these terms mean, nor do they understand (which is even worse in my opinion) how much to trust the numbers or how they were calculated. Let’s start with the basic understanding that NOT ALL SURVEYS ARE EQUAL! Many of them, unfortunately, are spun to push an agenda or to lean the audience in a specific direction. But there are things we can do, such as finding out how the survey was conducted, what parameters were used, and who was surveyed and where. Without that knowledge, we might as well play LOTTO all day.
In the next couple of blogs on this statistics topic, I’ll try to unveil this magical process in layman’s terms with real-world applications and examples. Then, I’ll take it further and actually write a program in Python, and for the non-tech types, I’ll explain how you can do some of these calculations in Excel instead of by hand. So, please stay tuned for the follow-up blogs in this series and read them in order if you can.
First, let’s talk about how those surveys are made. Surveying everyone is not practical because it would take too much time and too many resources. If you want to find out who everyone in your state is going to vote for in 2020, you’ll have to make a LOT of calls, send out paper and mail surveys, online surveys, emails, etc., and after a LOT of frustration you may collect a pretty credible result, but most likely the election will be over by then 🙂
So, in reality, only a handful of people are surveyed for any specific experiment. Of course, if you went to a DNC rally and surveyed 20 of the attendees, asking “Will you vote for Democrats in 2020?”, the answer should be overwhelmingly “yes”. However, that survey would be neither of much real value nor fairly representative. For it to be of real use, the survey takers should ideally be chosen from diverse demographics and geographic locations, in groups of about equal size. Also, we should acknowledge that there will be errors, that some people will not respond in an either/or format, and that we will NEVER cover the ENTIRE population, just a tiny portion of it. That portion is called the sample, and the number of people in it is the sample size.
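To make the rally example concrete, here is a small Python sketch of the idea. It is only an illustration, not a real poll: the 52% and 95% support figures are numbers I made up, and `simulate_poll` is a hypothetical helper that pretends each person answers “yes” with a given probability.

```python
import random


def simulate_poll(true_support, sample_size, rng):
    """Ask `sample_size` people a yes/no question.

    Each respondent answers "yes" with probability `true_support`.
    Returns the share of "yes" answers in the sample.
    """
    yes_answers = sum(rng.random() < true_support for _ in range(sample_size))
    return yes_answers / sample_size


rng = random.Random(2020)  # fixed seed so the run is repeatable

# Made-up numbers, purely for illustration.
STATEWIDE_SUPPORT = 0.52  # assumed share of "yes" voters in the whole state
RALLY_SUPPORT = 0.95      # assumed share of "yes" voters among rally attendees

# 20 people picked at the rally versus 1,000 people picked at random statewide.
rally_estimate = simulate_poll(RALLY_SUPPORT, 20, rng)
random_estimate = simulate_poll(STATEWIDE_SUPPORT, 1000, rng)

print(f"Assumed statewide support:  {STATEWIDE_SUPPORT:.0%}")
print(f"Rally survey (20 people):   {rally_estimate:.0%}")
print(f"Random survey (1,000):      {random_estimate:.0%}")
```

Under these assumptions, the rally survey should land far above the statewide figure no matter how carefully the 20 people are counted, while the random survey should land close to it. The problem with the rally survey is who was asked, not the arithmetic.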
How do we find the Sample Size? There is a method to this madness. In the next blog, I’ll explain the method and formula step by step, raise some questions, and pose some answers.
This is part of a 3-part series on the topic. Please read the posts in order for maximum clarity and context: