Pandas: Unexpected behavior for apply function with torch.tensor()
I am confused by the behavior of the pandas apply() function. I want to convert a column containing lists of ints to a torch.tensor. Here is some sample code showing the behavior:
import pandas as pd
import torch
df_test = pd.DataFrame([3,3,3], columns=['value'])
df_test.value = df_test.value.apply(lambda x: [y for y in range(x)])
print(df_test)
# Output:
# value
# 0 [0, 1, 2]
# 1 [0, 1, 2]
# 2 [0, 1, 2]
print(df_test.value.apply(lambda x: torch.tensor(x)))
# Output:
# value
# 0 [tensor(0), tensor(1), tensor(2)]
# 1 [tensor(0), tensor(1), tensor(2)]
# 2 [tensor(0), tensor(1), tensor(2)]
print(df_test.value.apply(lambda x: x + [12]))
# Output:
# 0 [0, 1, 2, 12]
# 1 [0, 1, 2, 12]
# 2 [0, 1, 2, 12]
print(torch.tensor([1,2,3]))
# Output:
# tensor([1, 2, 3])
I would have expected one tensor with three elements per row, but instead apply creates a list of single-element tensors. For testing, I added an example that appends an element to the list, to make sure that x is the list itself. As you can see, it behaves as expected. Can anyone explain this behavior?
Is there a workaround? I don't want to use torch.tensor(df.values), since I need to apply the tensor transformation to multiple columns and want to keep them in the dataframe. Thanks!
Solution 1:[1]
The reason is that the apply function implicitly converts the tensor to a list, because the type of df_test.value[0] is list. When you convert a tensor to a list, this is the result:
print(df_test.value[0]) # list
x = torch.tensor([1,2,3])
print(list(x)) # convert a tensor to a list
[tensor(1), tensor(2), tensor(3)]
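To see which of these you actually have in the column, it can help to inspect the stored object instead of relying on the printed Series. The following is only a minimal sketch: the names `s`, `converted` and `first` are made up for illustration, and the last line prints the element type without assuming what it will be in your pandas version.

```python
import pandas as pd
import torch

x = torch.tensor([1, 2, 3])

# Iterating over a 1-D tensor yields one 0-dimensional tensor per element,
# which is what list(x) collects.
as_list = list(x)
print(as_list)           # [tensor(1), tensor(2), tensor(3)]
print(as_list[0].dim())  # 0

# x.tolist() returns plain Python ints instead.
print(x.tolist())        # [1, 2, 3]

# Inspect what apply stored in the first row (hypothetical sample data).
s = pd.Series([[0, 1, 2]] * 3)
converted = s.apply(lambda v: torch.tensor(v))
first = converted.iloc[0]
print(type(first))
```

If `type(first)` turns out to be a tensor, you can keep working with the elements directly even though the printed column looks like a list of scalars.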
You expected tensor([1, 2, 3]) to replace each list in df_test["value"]. But keep in mind that the column type would then be tensor, which is not a valid pandas dtype.
To solve this, convert the DataFrame to a NumPy array and then to a tensor. You can then do all your transformations and convert the result back to NumPy and then to pandas.
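A rough sketch of that round trip for a single column could look like the code below. It assumes every list in the column has the same length (so the lists stack into a 2-D array), and the doubling step is only a placeholder for whatever transformation you actually need.

```python
import numpy as np
import pandas as pd
import torch

df_test = pd.DataFrame({"value": [[0, 1, 2], [0, 1, 2], [0, 1, 2]]})

# Stack the equal-length lists into a 2-D NumPy array, then build one tensor.
values = np.array(df_test["value"].tolist())
tensor = torch.tensor(values)  # shape (3, 3)

# Do the tensor work outside the DataFrame (placeholder transformation).
result = tensor * 2

# Write plain Python lists back so the column holds ordinary objects again.
df_test["value"] = pd.Series(result.tolist(), index=df_test.index)
print(df_test)
```

For several columns, the same steps can be repeated per column in a loop, which keeps everything in the DataFrame between transformations.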
If you try this code:
df_test["new"]= torch.tensor([1,2,3])
type(df_test.new.dtype) # not a tensor type but a NumPy dtype: the tensor was converted implicitly
numpy.dtype[int64]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Phoenix |
