Numpy automatically converting array of strings to array of numbers

I have an array of strings that is composed of number-like strings such as 010. I am trying to build a 2D numpy array by creating an empty numpy array, and then filling in the rows with my array of strings. However, it seems like whenever I assign a row in the numpy array, it converts the number-like strings into numbers. The main issue with this behavior is that I am losing leading zeroes from my strings.

I wrote a simple example to show what is happening:

import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]
np_arr = np.empty((num_rows, len(arr)), dtype=str)

for i in range(len(np_arr)):
    np_arr[i] = arr

print(np_arr)

The resulting output is:

[['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']
 ['0' '0' '1' '1' '0']]

vs. the expected output:

[['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']
 ['010' '011' '111' '100' '001']]

I do not understand this behavior and am hoping to find a solution to my problem and understand if this type conversion is being done by numpy or by Python. I have tried quite a few variations to this small example but have not found a working solution.

Thanks!



Solution 1:[1]

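The issue is the dtype of the array. With dtype=str and no data to infer a length from, np.empty creates an array of one-character strings (<U1), so each value you assign is truncated to its first character; nothing is actually converted to a number. You need to set an array-protocol type string that is wide enough for your data, like <U3: if you change dtype=str to dtype='<U3' it will work.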
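To make the truncation visible, here is a minimal sketch based on the question's example. The names truncated and width are only illustrative, and sizing the dtype from the longest string is an assumption about what you want rather than the only option.

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# With dtype=str and no data to inspect, NumPy allocates one character per element.
truncated = np.empty((num_rows, len(arr)), dtype=str)
print(truncated.dtype)  # <U1 on a little-endian machine

# Size the dtype from the longest string so nothing gets cut off.
width = max(len(s) for s in arr)
np_arr = np.empty((num_rows, len(arr)), dtype=f"<U{width}")

for i in range(len(np_arr)):
    np_arr[i] = arr

print(np_arr)
```

If later rows may contain longer strings, pick a width large enough for all of them up front, since assignments into a fixed-width string array are silently truncated.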

Solution 2:[2]

Here's a solution:

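import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# Turn your list into a numpy array of strings.
# Because the data is available here, NumPy infers a wide enough dtype (<U3).
n = np.array(arr, dtype=str)

# Repeat the row as many times as needed, then reshape to (num_rows, len(arr)).
n = np.tile(n, num_rows).reshape(num_rows, len(n))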

Let me know if you have any questions.

A note for the future: in many cases you can replace Python for loops with NumPy functions, which tend to be faster because the looping is vectorised and runs in compiled code.
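As one possible illustration of that note, the row can be stacked without any explicit loop or reshape. This is a sketch, not the only way to do it; the names row, tiled and repeated are just for this example.

```python
import numpy as np

num_rows = 5
arr = ["010", "011", "111", "100", "001"]

# np.array infers a string dtype wide enough for the longest element.
row = np.array(arr)

# A (num_rows, 1) repetition tuple stacks the row directly into a 2D array.
tiled = np.tile(row, (num_rows, 1))

# Equivalent alternative: add a leading axis, then repeat along it.
repeated = np.repeat(row[np.newaxis, :], num_rows, axis=0)

print(tiled)
```

Both calls return new arrays that keep the leading zeroes, because the dtype is inferred from the data instead of defaulting to a single character.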

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 lemon
Solution 2 AJH