Arrays, NumPy, Vectors, Tensors: 0 to 4+ Dimensions Demystified

In Python programming, data structures are the pillars of efficient and dynamic coding. From straightforward one-dimensional lists (arrays) to multidimensional arrays and tensors, Python provides a robust framework to handle and manipulate data. These are not just academic concepts; they are the building blocks of machine learning (ML/AI), scientific computing, deep neural networks and many other real-world applications.

In this blog, I’ll try to demystify these concepts. I’ll provide examples starting with Python’s built-in lists, before venturing into the more powerful capabilities of numpy arrays, vectors, and tensors, with comparable examples from one dimension (1D) up to higher-dimensional structures such as 4D+.

Whether you’re a coding novice eager to understand the basics or an experienced developer looking to refine your knowledge, this post attempts to provide a comprehensive and practical insight into these essential data structures.

One-dimensional arrays (1D)

Basic Python 1D array (aka list):

We can declare an array in a single dimension using basic Python, without requiring any specialized library. For example: array_1D = [1, 2, 3, 4, 5]

Arrays are called lists in Python. In the declaration, array_1D’s shape is basically 5 elements; no rows or columns, but a sequence of numbers/elements in a single dimension (a linear data structure). We access any element using a zero-based index.

To access the first element in the list, which is 1, the statement would be: array_1D[0]. Similarly, to access the third element in the list, which is 3, the statement would be: array_1D[2].

If we simply print this list, its output will be: [1, 2, 3, 4, 5]
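Here is a small sketch of the zero-based indexing described above. The negative index and the slice are not covered in the text; they are included only because they follow the same indexing rules.

```python
array_1D = [1, 2, 3, 4, 5]

first = array_1D[0]      # zero-based: index 0 is the first element
third = array_1D[2]      # index 2 is the third element
last = array_1D[-1]      # negative indices count from the end
middle = array_1D[1:4]   # slice covering indices 1, 2 and 3 (end index excluded)

print(first, third, last, middle)
print(len(array_1D))     # number of elements in the list
```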

NumPy 1D array:

Python’s basic lists are versatile, built-in data structures that allow for the storage of mixed data types and provide intuitive operations like indexing, slicing, and appending. However, they are not optimized for numerical computations and can become inefficient when handling large datasets or performing mathematical operations. NumPy arrays are specifically designed for numerical and scientific computing. They offer a more compact memory layout, support for element-wise operations, and an extensive range of mathematical functions. NumPy arrays are the go-to choice for tasks requiring fast, efficient numerical processing, while Python lists remain a more flexible option for general-purpose applications.
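To make the "element-wise operations" point concrete, here is a minimal sketch comparing the same operator on a list and on a numpy array. The variable names are only illustrative.

```python
import numpy as np

values = [1, 2, 3, 4, 5]
array = np.array(values)

list_times_two = values * 2      # on a list, * repeats the whole sequence
array_times_two = array * 2      # on a numpy array, * multiplies each element

# With a plain list, element-wise doubling needs an explicit loop or comprehension
doubled_list = [v * 2 for v in values]

print(list_times_two)
print(array_times_two)
print(doubled_list)
```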

We can create the same list as above as a numpy array, which requires importing the numpy library and using its array constructor. This is how it’s done:

import numpy as np
array_1D = np.array([1, 2, 3, 4, 5])

If we print the array and its shape with: print(f"{array_1D}\tShape:{array_1D.shape}")
we get: [1 2 3 4 5]    Shape:(5,)
This means it is a 1D array with 5 elements. A 1D array doesn’t inherently have a concept of rows or columns, but in a 2D context you could think of it as 1 row with 5 elements if that helps. Like a basic Python list, it is still a sequence of numbers. We can access any element in this numpy array just like a Python list, using the zero-based index. So, to access the third element, which is 3, the statement would be: array_1D[2].
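As a sketch of the "1 row with 5 elements" idea, the snippet below checks the dimension count and then reshapes the 1D array into explicit 2D forms. The names as_row and as_column are just illustrative.

```python
import numpy as np

array_1D = np.array([1, 2, 3, 4, 5])

print(array_1D.ndim)    # number of dimensions
print(array_1D.shape)   # a one-element tuple: just the length

as_row = array_1D.reshape(1, 5)      # explicit 2D: 1 row, 5 columns
as_column = array_1D.reshape(5, 1)   # explicit 2D: 5 rows, 1 column

print(as_row.shape, as_column.shape)
print(as_row[0, 2])     # third element, now addressed by row and column
```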

Two-dimensional arrays (2D)

Basic Python 2D array:

We can declare a two-dimensional Python array as this:

array_2D = [ [1, 2, 3], [4, 5, 6] ]

If we print this array, we get: [[1, 2, 3], [4, 5, 6]]

This means, we have two rows of numbers, each containing 3 elements. The first row contains elements 1, 2, 3. The second row has 4, 5, 6.

To print this array row by row, we can iterate with a for loop:
for row in array_2D:
    print(row)

and the output will be:
[1, 2, 3]
[4, 5, 6]

To access a specific element in a 2D array (e.g., second row, third column), save it in a variable and print it, the statements would be: element = array_2D[1][2]; print(f"Element at row 2, column 3: {element}")

and the output will be: Element at row 2, column 3: 6

NumPy 2D array:

We can define the same 2D array using numpy with (assuming, we have already imported the numpy library as: import numpy as np):

array_2D = np.array([[1, 2, 3], [4, 5, 6]])

and if we print its content and shape with:
print("2D Array (numpy):")
print(f"{array_2D}\tShape:{array_2D.shape}")

we get this output:
2D Array (numpy):
[[1 2 3]
 [4 5 6]]    Shape:(2, 3)
meaning 2 rows with 3 columns in each row, just as before.
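Beyond printing, a numpy 2D array can be indexed with a single pair of brackets and sliced by column, which nested lists cannot do directly. A short sketch (the variable names are illustrative):

```python
import numpy as np

array_2D = np.array([[1, 2, 3], [4, 5, 6]])

element = array_2D[1, 2]        # row index 1, column index 2; same value as array_2D[1][2]
second_row = array_2D[1]        # the whole second row
third_column = array_2D[:, 2]   # every row, column index 2

print(element)
print(second_row)
print(third_column)
```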

If you have experimented with different values and shapes with 1D and 2D with both Python and Numpy arrays, you should be clear about their structures and how to access any element in them. Now, let’s move on to higher dimensions!

Three-dimensional arrays (3D)

Basic Python 3D array:

To declare a 3D array in Python, we can write: array_3D = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

The same 3D array above can be re-written for readability as:
array_3D = [ [ [1, 2], [3, 4] ], [ [5, 6], [7, 8] ] ]

To access a specific element, such as the second element of the first list in the second block, save it in a variable and print it, we can write: element = array_3D[1][0][1]; print(f"Accessed 2nd element in second block (3D array): {element}")

and the output will be: Accessed 2nd element in second block (3D array): 6

PRO TIP: Count the consecutive opening square brackets at the start of the declaration (or the closing ones at the end; the counts must match for the syntax to be valid). Here there are three, and that’s a giveaway that this is a 3D array!
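The bracket-counting tip can be turned into a tiny helper. This is only a sketch: nesting_depth is a hypothetical function name, and it assumes the nesting is regular, so it only follows the first element at each level.

```python
def nesting_depth(value):
    """Count how many list levels wrap the first innermost element."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:        # an empty list has nothing further to descend into
            break
        value = value[0]     # follow the first element down one level
    return depth


print(nesting_depth([1, 2, 3]))                              # 1D list
print(nesting_depth([[1, 2, 3], [4, 5, 6]]))                 # 2D list
print(nesting_depth([[[1, 2], [3, 4]], [[5, 6], [7, 8]]]))   # 3D list
```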

Before we can access the elements in three dimensions, we need to visualize the structure. It helps to think of this 3D array as composed of blocks (or matrices) first, then lists (or rows) in each block, and lastly the elements (or columns) in each row. So, this structure has 2 blocks, each containing a matrix made of two lists; the first block contains the lists [1,2] and [3,4]. The second block contains the lists [5,6] and [7,8]. The first list of the first block is [1,2], and the second list of the second block is [7,8]. You get the idea.

Therefore, to access a specific element such as the second element from second block’s first list we need to address it as: array_3D[1][0][1]

The first [1] refers to the second block (remember, Python indexing is zero-based), second [0] means first list, the final [1] means the second element, which is: 6

NumPy 3D array:

The same structure is defined in numpy as: array_3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

The conceptualization explained above applies to this structure as well, so the code to access a specific element is exactly the same! Here too, array_3D[1][0][1] accesses the element 6.

If you print its shape (see code examples above), it will be: (2, 2, 2). The first 2 refers to the number of blocks. The second 2 refers to the number of lists (rows) in each block. The third 2 refers to the number of elements in each row…which jibes exactly with what we saw with the 3D array we defined using basic Python lists.
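A brief sketch tying the shape tuple to the blocks/rows/elements picture. The comma-separated index array_3D[1, 0, 1] is numpy's shorthand for the chained form used above.

```python
import numpy as np

array_3D = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])

blocks, rows, columns = array_3D.shape   # unpack the three sizes
print(blocks, rows, columns)
print(array_3D.ndim)                     # number of dimensions

element = array_3D[1, 0, 1]              # second block, first row, second element
second_block = array_3D[1]               # a 2D array: the whole second block

print(element)
print(second_block)
```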

Feeling brave? Great! Let’s continue to the 4th dimension. A 4D array is structured in four dimensions. Yes, we can still define it using basic Python lists, and of course numpy, as we’ll explore below. First, recognize that a 4D array is a collection of 3D arrays, and is commonly used in applications like image processing, simulations, or scientific computations.

Sure, we can create a structure resembling a 4D array using nested Python lists. However, Python lists don’t have built-in multidimensional functionality like numpy arrays, so it involves manual nesting. A Python list designed to represent a 4D array would be a list containing lists, which themselves contain lists, and so on—essentially creating a hierarchy of lists up to four levels deep.

Before I share examples though, I must warn you that the hierarchical structure of a 4D array can vary greatly depending on how the data is organized and what each dimension represents. In one scenario, we might have a 4D array with a deeper hierarchy, such as groups containing blocks, blocks containing matrices, matrices containing lists, and lists containing individual elements.

[group_1 -> block_1 -> matrix_1 -> list of elements]
[group_1 -> block_1 -> matrix_2 -> list of elements]
[group_1 -> block_2 -> matrix_2 -> list of elements]

[group_2 -> block_1 -> matrix_2 -> list of elements]

…etc.

In another case, we might simplify the hierarchy and have only blocks, each containing matrices, and so forth.

[block_1 -> matrix_1 -> list of elements]
[block_1 -> matrix_2 -> list of elements]
[block_2 -> matrix_1 -> list of elements]
…etc.

Keep this in mind as I share the examples below (the variations can be much more diverse, but the key is to grasp the concept and then adapt the structure to the use case at hand).

It’s very much like JSON, where the hierarchical structure isn’t fixed or universal but instead adapts to the specific data and its context. Just as JSON organizes information into nested key-value pairs or arrays tailored to represent the data model, a 4D array’s hierarchy is shaped by the semantics of the data it encodes.
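To illustrate how the meaning of each dimension comes from the data, here is a sketch of a 4D array read as a batch of images. The layout (images, height, width, channels), the sizes, and the RGB channel order are all assumptions chosen for illustration; real libraries and datasets use different conventions.

```python
import numpy as np

# Assumed layout: (images, height, width, channels)
images = np.zeros((10, 32, 32, 3), dtype=np.uint8)

# Set one value: image 0, row 5, column 7, channel 0 (red, assuming RGB order)
images[0, 5, 7, 0] = 255

first_image = images[0]            # a 3D array: height x width x channels
red_channel = images[0, :, :, 0]   # a 2D array: the red values of image 0

print(images.ndim, images.shape)
print(first_image.shape, red_channel.shape)
```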

Four-dimensional arrays (4D)

Basic Python 4D array:

Creating a 4D array in Python means nesting lists: a structure that’s a list of lists within lists, nested 4 levels deep. Each “outer list” acts like a block. Within each block, we have matrices (2D arrays). Each matrix contains rows and columns. This is a 4D Python array:
array_4D = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]], [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]]
or, you could space it this way (both are the same structure):
array_4D = [ [ [[1, 2], [3, 4]], [[5, 6], [7, 8]] ], [ [[9, 10], [11, 12]], [[13, 14], [15, 16]] ] ]

Again, you can count the number of open square brackets to understand the number of dimensions.

To access a specific element, such as the second element of the first list in the first matrix of the second block, you would write: array_4D[1][0][0][1]

Think of the indexing as this: [block_index][matrix_index][list_index][item_index]. And the output will be: 10

Similarly, access a specific element: second element from second block’s second list/array (still in Matrix 1 of Block 2 because each matrix has 2 lists): array_4D[1][0][1][1]

and the output will be: 12
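The chained index can be unpacked one level at a time, which makes the [block_index][matrix_index][list_index][item_index] reading explicit. A minimal sketch with illustrative intermediate names:

```python
array_4D = [[[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
            [[[9, 10], [11, 12]], [[13, 14], [15, 16]]]]

block = array_4D[1]    # second block
matrix = block[0]      # first matrix in that block
row = matrix[0]        # first list (row) in that matrix
item = row[1]          # second element in that row

print(block)
print(matrix)
print(row)
print(item)
```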

However, as I stated earlier, not all 4D arrays are organized the same way. Here’s another 4D array; it holds the same values with the same nesting, but we read its hierarchy differently:
second_array_4D = [
    [
        [
            [1, 2], [3, 4]
        ],
        [
            [5, 6], [7, 8]
        ]
    ],
    [
        [
            [9, 10], [11, 12]
        ],
        [
            [13, 14], [15, 16]
        ]
    ]
]

In this example, we have 2 groups at the highest level. Each group contains 2 blocks. Each block contains 2 lists. Each list contains 2 elements. To access the element 10, we have to write: second_array_4D[1][0][0][1]

NumPy 4D array:

In numpy, we can declare a 4D array like this:
array_4D = np.array([
    [
        [[1, 2], [3, 4]],
        [[5, 6], [7, 8]]
    ],
    [
        [[9, 10], [11, 12]],
        [[13, 14], [15, 16]]
    ]
])

The first dimension represents the number of “blocks” (2 blocks). The second dimension represents the number of 2D arrays within each block (e.g. 2 matrices per block). The third and fourth dimensions represent the rows and columns (or lists) within each matrix (e.g. 2 rows and 2 columns).

To access the element 10, we have to write: array_4D[1][0][0][1] (just as with a Python 4D array).
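A short sketch confirming the four dimensions described above through the shape, and showing the comma-separated form of the same index:

```python
import numpy as np

array_4D = np.array([
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    [[[9, 10], [11, 12]], [[13, 14], [15, 16]]],
])

print(array_4D.shape)   # blocks, matrices per block, rows, columns
print(array_4D.size)    # total number of elements

element = array_4D[1, 0, 0, 1]    # same element as array_4D[1][0][0][1]
first_matrix = array_4D[1, 0]     # first matrix of the second block

print(element)
print(first_matrix)
```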

Phew! Okay, we got this…but I need to talk about Vectors and Tensors too! How are those related to Python arrays, numpy arrays, and all that???

Vectors

A vector is simply a 1D array (a normal list) of numbers, given a context such as representing magnitude and direction in space. A simple vector may look like this: vector = [1, 2, 3]

(the variable named ‘vector’ in this example can be named anything; ‘vector’ is not a keyword)

In mathematical terms, this is a 3D vector because it has three components, even though it’s represented in Python as a 1D list. What the values [1, 2, 3] mean depends entirely on the context in which the vector is used. [1, 2, 3] could mean a point or direction in 3D Cartesian coordinates, e.g. 1 is the x value (horizontal axis), 2 is the y value (vertical axis), and 3 is the z value (depth or height). Or, in a velocity context, it could mean 1 = velocity along the x-axis, 2 = velocity along the y-axis, 3 = velocity along the z-axis; combined, they define the overall velocity vector. Or, in an abstract representation, 1 could represent time, 2 temperature, and 3 pressure. Or, it could be RGB values. In short, a vector doesn’t inherently have any meaning until we define the context.

Let’s say we want to define a vector for the velocity of a car going east at 60 mph. We could simply declare: velocity = [60, 0], where 60 is the x-component (here, the eastward direction of travel) and 0 is the y-component, meaning the car is moving purely eastward.

If the car were moving northeast instead, we’d use trigonometry to break the velocity into components: x would be speed × cos(theta) and y would be speed × sin(theta). Here theta is the angle of the motion measured from a reference axis (e.g., how far “northeast” the motion is from purely east); in polar coordinates, theta denotes the angular displacement of a point from the positive x-axis. As you can see, vectors are just sequences of numbers, but their usage depends heavily on the context.
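The trigonometry can be sketched in a few lines. This assumes "northeast" means exactly 45 degrees from east, with east as the positive x-axis and north as the positive y-axis; those conventions are my assumption, not something fixed by the text.

```python
import math

speed = 60.0                   # mph
theta = math.radians(45)       # assumed: northeast is 45 degrees from east

# Break the velocity into x (east) and y (north) components
velocity = [speed * math.cos(theta), speed * math.sin(theta)]

# Going the other way: recover the speed and angle from the components
recovered_speed = math.hypot(velocity[0], velocity[1])
recovered_angle = math.degrees(math.atan2(velocity[1], velocity[0]))

print(velocity)
print(recovered_speed, recovered_angle)
```

With theta set to 0, the same formulas give [60.0, 0.0], matching the purely eastward case.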

Tensors

A tensor generalizes matrices or vectors to higher dimensional arrays. A scalar is a 0D tensor. A vector is a 1D tensor. A matrix is a 2D tensor. This is important to remember.
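The scalar/vector/matrix ladder can be checked with numpy's ndim attribute. This is a sketch using numpy arrays to stand in for tensors:

```python
import numpy as np

scalar = np.array(7)                  # 0D: a single number
vector = np.array([1, 2, 3])          # 1D: a sequence of numbers
matrix = np.array([[1, 2], [3, 4]])   # 2D: rows and columns

for name, value in [("scalar", scalar), ("vector", vector), ("matrix", matrix)]:
    print(name, value.ndim, value.shape)
```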

A 3D tensor example:
tensor = [
    [
        [1, 2], [3, 4]
    ],
    [
        [5, 6], [7, 8]
    ]
]

This tensor has 3 dimensions. The first (outermost) dimension has 2 blocks. Each block is a matrix, so the second dimension is the 2 rows of each matrix, and the third dimension is the 2 elements (columns) in each row. We can print this tensor block by block with:
for block in tensor: print(block)

and we will get:
[[1, 2], [3, 4]]
[[5, 6], [7, 8]]

To access a specific element such as 7, we then need to write: tensor[1][1][0] (second block, second row, first element of that row). As with ‘vector’, ‘tensor’ is not a keyword in Python.

Hmm…what to use?

Python lists lack features such as element-wise operations, broadcasting, or efficient memory handling that numpy arrays excel at. Using Python lists for multidimensional data structures works but can be cumbersome and inefficient for larger-scale computations. It’s typically better to opt for numpy arrays if you’re working with numerical or multidimensional data.
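Broadcasting is easiest to see with a small example. In this sketch (names and values are illustrative), a 1D array of three offsets is added to every row of a 2x3 array without writing any loop:

```python
import numpy as np

matrix = np.array([[1, 2, 3],
                   [4, 5, 6]])
offsets = np.array([10, 20, 30])

# The 1D offsets are "broadcast" across both rows of the 2D matrix
shifted = matrix + offsets

# The equivalent with nested Python lists needs explicit loops
shifted_lists = [[value + offset for value, offset in zip(row, [10, 20, 30])]
                 for row in [[1, 2, 3], [4, 5, 6]]]

print(shifted)
print(shifted_lists)
```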

Suppose I want an array that contains multiple cars, and for each car, I want to store its max speed, range, fuel tank capacity, MPG, and current speed. Technically, I could use a Python list, a numpy array or a vector or a tensor or something else.

But the best choice depends on how we’ll work with this data.
– If the data is straightforward and we don’t need advanced mathematical operations, then we use a Python list of dictionaries.
– If we need numerical computations (e.g., average speed or comparisons, etc.), we use a numpy array for efficient numerical operations (however, we’d lose the descriptive labels that the dictionaries gave us).
– If we want both structure and ease of use with labeled columns, we use a pandas dataframe.
– If we plan to expand the data to higher dimensions (e.g., tracking multiple attributes over time), we use a tensor.

Here’s how we can use a list of dictionaries for this purpose:
cars = [
    {"max_speed": 150, "range": 400, "fuel_tank_capacity": 15, "MPG": 30, "current_speed": 60},
    {"max_speed": 120, "range": 350, "fuel_tank_capacity": 12, "MPG": 25, "current_speed": 40},
    {"max_speed": 180, "range": 500, "fuel_tank_capacity": 18, "MPG": 28, "current_speed": 90}
]

Then to access the third car’s max speed, we would write: cars[2]['max_speed'] # output: 180
To access the second car’s range, we would write: cars[1]['range'] # output: 350
and so on.
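Simple calculations are still possible with the list of dictionaries; they just need explicit iteration. A sketch using the same data (average_speed and fastest are illustrative names):

```python
cars = [
    {"max_speed": 150, "range": 400, "fuel_tank_capacity": 15, "MPG": 30, "current_speed": 60},
    {"max_speed": 120, "range": 350, "fuel_tank_capacity": 12, "MPG": 25, "current_speed": 40},
    {"max_speed": 180, "range": 500, "fuel_tank_capacity": 18, "MPG": 28, "current_speed": 90},
]

# Average current speed across all cars
average_speed = sum(car["current_speed"] for car in cars) / len(cars)

# The car (dictionary) with the highest max speed
fastest = max(cars, key=lambda car: car["max_speed"])

print(round(average_speed, 2))
print(fastest["max_speed"])
```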

Or, we could use a numpy array instead:

import numpy as np
cars = np.array([
    [150, 400, 15, 30, 60],  # Car 1: [max_speed, range, fuel_tank_capacity, MPG, current_speed]
    [120, 350, 12, 25, 40],  # Car 2
    [180, 500, 18, 28, 90]   # Car 3
])

And get the first car’s max speed with: cars[0, 0]
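With the numpy version, whole columns can be processed at once. In this sketch the column positions (0 for max_speed, 4 for current_speed) come from the ordering in the comment above; the constant names are illustrative and stand in for the labels the array itself no longer carries.

```python
import numpy as np

cars = np.array([
    [150, 400, 15, 30, 60],
    [120, 350, 12, 25, 40],
    [180, 500, 18, 28, 90],
])

MAX_SPEED, CURRENT_SPEED = 0, 4   # column positions, since numpy stores no labels

average_current_speed = cars[:, CURRENT_SPEED].mean()   # mean of one column
top_max_speed = cars[:, MAX_SPEED].max()                # largest value in one column
column_means = cars.mean(axis=0)                        # one mean per attribute

print(average_current_speed)
print(top_max_speed)
print(column_means)
```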

Or, we can use a pandas dataframe:
import pandas as pd
cars = pd.DataFrame({
    "max_speed": [150, 120, 180],          # car 1, car 2, car 3 max speed
    "range": [400, 350, 500],              # car 1, car 2, car 3 range
    "fuel_tank_capacity": [15, 12, 18],    # car 1, car 2, car 3 fuel cap
    "MPG": [30, 25, 28],                   # car 1, car 2, car 3 MPG
    "current_speed": [60, 40, 90]          # car 1, car 2, car 3 current speed
})

And get the first car’s max speed with: cars.iloc[0]['max_speed']

Or, we can use a tensor (via a numpy array):
cars_tensor = np.array([
[150, 400, 15, 30, 60], # Car 1: max speed, range, fuel_tank_capacity, MPG, current_speed
[120, 350, 12, 25, 40], # Car 2: max speed, range, fuel_tank_capacity, MPG, current_speed
[180, 500, 18, 28, 90] # Car 3: max speed, range, fuel_tank_capacity, MPG, current_speed
])

A 3D tensor could add another dimension (e.g., attributes at different times), such as with:
cars_tensor = cars_tensor.reshape((3, 5, 1))
For demonstration, this reshapes the array, without altering the actual data, into a 3-dimensional array with 3 blocks, each block containing 5 rows, each row containing 1 column.

So, the original numpy array:
[ [150, 400, 15, 30, 60], [120, 350, 12, 25, 40], [180, 500, 18, 28, 90] ]

becomes reshaped to (3,5,1) as:
[
    [ [150], [400], [15], [30], [60] ],  # Block 1
    [ [120], [350], [12], [25], [40] ],  # Block 2
    [ [180], [500], [18], [28], [90] ]   # Block 3
]

This extra dimension is often required for ML tasks (like feeding data into neural networks).
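The reshape above adds a dimension of size 1. To sketch what "attributes at different times" could look like, the snippet below also stacks two snapshots along a new time axis. The second snapshot's current_speed readings (65, 45, 85) are made-up values for illustration only.

```python
import numpy as np

snapshot_t0 = np.array([
    [150, 400, 15, 30, 60],
    [120, 350, 12, 25, 40],
    [180, 500, 18, 28, 90],
])

# Reshape as in the text: same 15 values, now 3 blocks of 5 rows with 1 column
reshaped = snapshot_t0.reshape((3, 5, 1))

# Hypothetical later snapshot: only the current_speed column (index 4) changes
snapshot_t1 = snapshot_t0.copy()
snapshot_t1[:, 4] = [65, 45, 85]

# Stack along a new leading axis: (time, car, attribute)
history = np.stack([snapshot_t0, snapshot_t1])

print(reshaped.shape)
print(history.shape)
print(history[1, 0, 4])   # time step 1, car 1, current_speed
```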

Additional Pointers

Python lists (which we use to build arrays without libraries) can contain a mix of data types, including numbers, characters, strings, and even other objects.

NumPy arrays can also contain mixed types (e.g. numbers and characters/strings in a list/matrix), but with a catch! NumPy is designed to work best with homogeneous arrays—arrays where all elements share the same data type (e.g., all integers, all floats, etc.). If we introduce mixed types like numbers and characters, NumPy automatically converts them into a common type, typically a string.
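The automatic conversion is easy to observe through the dtype attribute. A minimal sketch (the exact string width numpy reports may vary, so only the kind of dtype is checked here):

```python
import numpy as np

mixed = np.array([1, 2, "three"])     # numbers and a string together
print(mixed)
print(mixed.dtype.kind)               # 'U' means a Unicode string dtype

ints_and_floats = np.array([1, 2.5])  # integers and floats together
print(ints_and_floats.dtype)          # promoted to a floating-point type

# The numbers in the mixed array are now strings, so arithmetic no longer applies
print(repr(mixed[0]))
```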

Vectors and tensors typically contain only numerical values. Non-numeric data, like categorical data or strings, is usually encoded as numbers first, especially in machine learning or natural language processing (e.g., word embeddings). In libraries like NumPy or PyTorch, tensors and vectors are typically designed to work with numbers because numerical computations are their main purpose. But basic Python lists can be used to create generalized “tensors” with mixed data types.

Python lists, numpy arrays, tensors are both objects and data structures!
In Python, everything is an object.
As for Tensors, it depends on the library we use. In libraries like PyTorch or TensorFlow, tensors are objects because they are instances of tensor-specific classes (like torch.Tensor or tensorflow.Tensor).
As a data structure, tensors are essentially multidimensional arrays (generalizations of vectors and matrices) used for machine learning and deep learning.

This was a lot to take in! If you have read the whole post this far and digested it, you deserve a coffee break, or choose your favorite item from an array 😉 I hope this was educational and helpful. Come back for a refresher, or for more fresh content on new topics! Thanks for reading!
