concatenate 2 vaex dataframes causing columns issue

I am facing some issues concatenating 2 vaex data frames. When I concat both data frames, the column names are ignored.

First I read a CSV file using vaex

>>> import vaex as vx

>>> df = vx.read_csv("fl_name", header=None)

>>> df.column_names

['0', '1', '2', '3', '4', '5', '6']

Then I try to concat this data frame to an existing one

>>> df_original.column_names

['A', 'B', 'C', 'D', 'E', 'F', 'G']

To enable that, I matched the new data frame column names to existing ones

>>> df.column_names = df_original.column_names

>>> df_original.concat(df)

When I checked the resulting data frame columns, I got

['A', 'B', 'C', 'D', 'E', 'F', 'G', '0', '1', '2', '3', '4', '5', '6']

Is there any way to solve this issue and make vaex respect the column names?



Solution 1:[1]

After searching for some possible solutions, I found one in pandas.

The easiest way is to read the file with the desired column names:

df = vx.read_csv("fl_name", header=None, names=['A', 'B', 'C', 'D', 'E', 'F', 'G'])

Doing that, the code will work properly.

Solution 2:[2]

I think the best way to do this is to use the rename function together with the vaex.concat function:

import vaex

df1 = vaex.from_dict({
    "0":list(range(5)),
    "1":list(range(5)),
    "2":list(range(5)),
    "3":list(range(5)),
    "4":list(range(5))
})

df2 = vaex.from_arrays(
    A=list(range(5)),
    B=list(range(5)),
    C=list(range(5)),
    D=list(range(5)),
    E=list(range(5))
)

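# Rename df1's columns to df2's names, pairing them by position
for old, new in zip(df1.get_column_names(), df2.get_column_names()):
    df1.rename(old, new)

# Both frames now share the same column names, so they can be concatenated
vaex.concat([df2, df1])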

Note that df1 and df2 must have the same number of columns. zip stops at the shorter list, so any extra columns would silently keep their old names.
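The column-count requirement can be checked before renaming anything. The following is only a minimal sketch in plain Python: build_rename_map is a hypothetical helper (not part of vaex), and the column lists are made-up stand-ins for what get_column_names() would return.

```python
def build_rename_map(old_names, new_names):
    """Pair old column names with new ones by position.

    Hypothetical helper, not a vaex function. Raises ValueError when the
    two lists differ in length instead of letting zip() truncate silently.
    """
    old_names = list(old_names)
    new_names = list(new_names)
    if len(old_names) != len(new_names):
        raise ValueError(
            f"column count mismatch: {len(old_names)} vs {len(new_names)}"
        )
    # Skip pairs that are already identical, so no pointless rename is issued.
    return {old: new for old, new in zip(old_names, new_names) if old != new}


# Assumed example column lists, mirroring the names used in the question.
mapping = build_rename_map(["0", "1", "2"], ["A", "B", "C"])
print(mapping)
```

Each (old, new) pair in the resulting mapping could then be passed to df1.rename(old, new) as in the loop above, with the mismatch surfacing as an error before any column is touched.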

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution 1: alcarnielo
Solution 2: Ben Epstein