Monday, September 22, 2025

Find Superscripts, Subscripts, and Unicode in a text file (Python)

Occasionally, it becomes necessary to search a text document for special characters such as superscripts, subscripts, symbols, emojis, or any other non-ASCII Unicode characters. This is crucial when working with data files that should not contain such characters unless they are explicitly required and handled. Most editors, including Word, lack a ‘Find’ feature that reveals all Unicode characters in a file without searching for a specific, known character. I need to detect all such characters without knowing in advance whether any are present. In this post, I share my Python code that does exactly that.

The script opens the file with UTF-8 encoding, which can decode Unicode characters, and reads it line by line. On each line, it uses a regular expression to match any character that is not basic ASCII (i.e., any character whose code point is greater than 127). The re.findall() function returns a list of all such matches, and the script prints the number of every line where at least one was found. Below is the code:
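A minimal sketch of a script along these lines follows; it is not necessarily identical to my original listing. It assumes the file path is passed as the first command-line argument. The `describe()` helper, which adds each character's code point and Unicode name via the standard `unicodedata` module, is an illustrative addition beyond the line numbers described above, and it makes superscripts and subscripts easy to recognize in the report.

```python
import re
import sys
import unicodedata

# Matches any character outside the 7-bit ASCII range (code point > 127).
NON_ASCII = re.compile(r"[^\x00-\x7F]")


def find_non_ascii(path):
    """Return a list of (line_number, characters) for lines with non-ASCII text."""
    results = []
    # Decoding as UTF-8 turns superscripts, subscripts, emojis, etc. into
    # single Python characters that the pattern can match.
    with open(path, encoding="utf-8") as f:
        for line_number, line in enumerate(f, start=1):
            matches = NON_ASCII.findall(line)
            if matches:
                results.append((line_number, matches))
    return results


def describe(ch):
    """Format a character as its code point, Unicode name, and repr()."""
    # repr() escapes invisible characters such as U+200B instead of hiding them.
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<no name>')} {ch!r}"


if __name__ == "__main__":
    # Assumption: the file path is given as the first command-line argument.
    if len(sys.argv) != 2:
        print(f"Usage: python {sys.argv[0]} FILE")
    else:
        for line_number, matches in find_non_ascii(sys.argv[1]):
            print(f"Line {line_number}: " + ", ".join(describe(c) for c in matches))
```

Saved as, say, find_unicode.py (a hypothetical name), it would be run as `python find_unicode.py data.txt`. A line containing "x²" would be reported with an entry like `U+00B2 SUPERSCRIPT TWO '²'`. If the file is not valid UTF-8, open() raises UnicodeDecodeError; passing errors="replace" to open() would instead turn undecodable bytes into U+FFFD, which the same pattern then flags.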

Here’s a sample session output:



