Pandas: Create dataframe based on specific columns from another dataframe

I want to create the clinical dataframe with a sex column based on the Sex column in the raw_clinical_patient dataframe, but the code below raises a KeyError.

import pandas as pd

raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4) # Skip first 4 rows

clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient.loc[:,"Sex"]
clinical["last_fu"] = raw_clinical_patient.loc[:,"Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]

Traceback:

KeyError: 'Sex'


Solution 1:[1]

Column labels are case sensitive, so the column in your raw_clinical_patient data frame is probably named something other than Sex, for example sex or SEX. Print raw_clinical_patient.columns to see the exact labels.
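A minimal sketch of how to check the labels and look one up without depending on case or stray whitespace. The sample data, the " SEX " label, and the find_column helper are hypothetical, made up for illustration; the real file may use different names.

```python
import pandas as pd

# Hypothetical stand-in for raw_clinical_patient; the real labels may differ.
raw_clinical_patient = pd.DataFrame(
    {"PATIENT_ID": ["P1", "P2"], " SEX ": ["Female", "Male"]}
)

# Show the exact labels, including casing and any surrounding whitespace.
print(raw_clinical_patient.columns.tolist())


def find_column(df, wanted):
    """Return the actual label matching `wanted`, ignoring case and outer whitespace."""
    target = wanted.strip().lower()
    matches = [c for c in df.columns if str(c).strip().lower() == target]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one column like {wanted!r}, found {matches}")
    return matches[0]


# Look up the real label, then build the new frame from it.
sex_label = find_column(raw_clinical_patient, "Sex")
clinical = pd.DataFrame({"sex": raw_clinical_patient[sex_label]})
```

If the lookup finds no match or more than one, the helper raises a KeyError listing what it found, which is usually enough to spot the real column name.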

Solution 2:[2]

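You can select both columns in one step and then rename them. The labels still have to match the ones in raw_clinical_patient exactly, otherwise you get the same KeyError.

clinical = raw_clinical_patient[["Sex", "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]].copy()  # copy so clinical is independent of the raw frame
clinical.columns = ["sex", "last_fu"]  # new names, in the same order as the selection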
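As a variation, the same result can be sketched with DataFrame.rename and a mapping from old to new labels, so the new names do not depend on column order. The sample rows and the column_map name are assumptions for illustration; only the two long labels come from the question.

```python
import pandas as pd

# Hypothetical data using the labels from the question.
raw_clinical_patient = pd.DataFrame({
    "Patient ID": ["P1", "P2"],
    "Sex": ["Female", "Male"],
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": [120, 450],
})

# Old label -> new label; the keys double as the list of columns to keep.
column_map = {
    "Sex": "sex",
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": "last_fu",
}

# Select the wanted columns, then rename them; rename returns a new DataFrame.
clinical = raw_clinical_patient[list(column_map)].rename(columns=column_map)
```

The raw frame keeps its original labels, and adding another column later only means adding one entry to the mapping.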

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 OD1995
Solution 2 Grall