Using astype on a koalas column gives a strange column dtype of <U0

I have a column in my koalas dataframe called purchase_date. In a Databricks notebook on runtime 10.3, when I run the following lines of code, the dtype of the purchase_date column becomes <U0. I am not able to understand why this is happening.

My code which caused this is as follows (in Databricks runtime 10.3):

import databricks.koalas as ks

print("Datatype of purchase_date before astype:", my_ks_dataframe['purchase_date'].dtype)  # Datatype of purchase_date before astype: object

# Using the astype
my_ks_dataframe['purchase_date'] = my_ks_dataframe['purchase_date'].astype('str')

print("Datatype of purchase_date after astype:", my_ks_dataframe['purchase_date'].dtype)  # Datatype of purchase_date after astype: <U0

I am not sure why I see this behaviour in Databricks runtime 10.3. When I execute the same code in Databricks runtime 8.1, I get the desired datatype for purchase_date as object before and after astype usage.

# print result in Databricks runtime 8.1

Datatype of purchase_date before astype: object
Datatype of purchase_date after astype: object




Solution 1:[1]

Koalas is included on clusters running Databricks Runtime 7.3 through 9.1. For clusters running Databricks Runtime 10.0 and above, use the pandas API on Spark (the pyspark.pandas module) instead.

Reference: https://docs.microsoft.com/en-us/azure/databricks/languages/koalas
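
As a rough illustration of the version ranges quoted above, the sketch below picks which module to import for a given runtime. The helper names pandas_api_module_name and load_pandas_api are hypothetical (they are not part of Koalas or PySpark), and the code assumes the runtime version is passed in as a plain "major.minor" string such as "10.3". The answer gives no guidance for runtimes below 7.3, so the sketch raises an error there rather than guess.

```python
import importlib


def pandas_api_module_name(runtime_version):
    """Return the module to import for a Databricks Runtime version.

    Hypothetical helper. The ranges come from the answer above:
    Koalas on 7.3 through 9.1, pandas API on Spark from 10.0 onward.
    """
    # Assumes a "major.minor" string such as "10.3".
    major, minor = (int(part) for part in runtime_version.split(".")[:2])
    if (major, minor) >= (10, 0):
        return "pyspark.pandas"
    if (7, 3) <= (major, minor) <= (9, 1):
        return "databricks.koalas"
    # The answer says nothing about other runtimes, so do not guess.
    raise ValueError(f"No guidance for Databricks Runtime {runtime_version}")


def load_pandas_api(runtime_version):
    """Import and return the matching module (requires a Spark environment)."""
    return importlib.import_module(pandas_api_module_name(runtime_version))
```

On runtime 10.3 this amounts to replacing "import databricks.koalas as ks" with "import pyspark.pandas as ps" in the question's code. The answer does not state what dtype astype('str') reports after that switch, so it is worth checking in the notebook.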

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 PratikLad-MT