'Taking the column names from the first row that has less than x Nan's

I have data as follows:

import pandas as pd

url_cities="https://population.un.org/wup/Download/Files/WUP2018-F12-Cities_Over_300K.xls"
df_cities = pd.read_excel(url_cities)
print(df_cities.iloc[0:20,])

The column names can be found in row 15, but I would like this row number to be automatically determined. I thought the best way would be to take the first row for which the values are non-Nan for less than 10 items.

I combined this answer, to find this answer to do the following:

amount_Nan = df_cities.shape[1] - df_cities.count(axis=1)
# OR df.isnull().sum(axis=1).tolist()

print(amount_Nan)
col_names_index = next(i for i in amount_Nan if i < 3)
print(col_names_index)

df_cities.columns = df_cities.iloc[col_names_index]

The problem is that col_names_index keeps returning 0, while it should be 15. I think it is because amount_Nan returns rows and columns because of which next(i for i in amount_Nan if i < 3) works differently than expected.

The thing is that I do not really understand why. Can anyone help?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source