Pandas: Create dataframe based on specific columns from another dataframe
I want to create the clinical dataframe with a sex column based on the Sex column in the raw_clinical_patient dataframe.
import pandas as pd
raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4) # Skip first 4 rows
clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient.loc[:,"Sex"]
clinical["last_fu"] = raw_clinical_patient.loc[:,"Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]
Traceback:
KeyError: 'Sex'
Solution 1:[1]
Column labels are case sensitive, so I think your raw_clinical_patient data frame probably has a sex column rather than a Sex column. Printing raw_clinical_patient.columns shows the exact labels that were read from the file.
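To check this, here is a minimal sketch of inspecting the labels and looking one up without regard to case or stray whitespace. The small in-memory frame and the find_column helper are hypothetical stand-ins made up for illustration; the real labels in data_clinical_patient.txt may differ.

```python
import pandas as pd

# Hypothetical stand-in for raw_clinical_patient (the real file is not used here).
raw_clinical_patient = pd.DataFrame({
    " SEX ": ["Male", "Female"],
    "Other": [1, 2],
})

# repr() makes case differences and stray whitespace in the labels visible.
print([repr(c) for c in raw_clinical_patient.columns])


def find_column(df, name):
    """Return the actual label matching `name`, ignoring case and surrounding whitespace.

    Hypothetical helper, not part of pandas.
    """
    wanted = name.strip().lower()
    matches = [c for c in df.columns if str(c).strip().lower() == wanted]
    if len(matches) != 1:
        raise KeyError(f"expected exactly one column like {name!r}, found {matches}")
    return matches[0]


sex_label = find_column(raw_clinical_patient, "Sex")

clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient[sex_label]
```

If the printed labels look nothing like column names, the header row passed to read_csv may be the wrong one.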
Solution 2:[2]
You may simply write
clinical = raw_clinical_patient[["Sex", "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]]
clinical.columns = ["sex", "last_fu"]  # rename accordingly
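A variant of the same idea, as a sketch: select and rename in one step with an explicit mapping, so each new name is tied to its source column rather than to its position. The sample values below are made up for illustration, and this still raises KeyError if the source labels do not match exactly.

```python
import pandas as pd

# Hypothetical sample data; only the two column names come from the question.
raw_clinical_patient = pd.DataFrame({
    "Sex": ["Male", "Female"],
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": [120, 340],
    "Other": ["x", "y"],
})

# Map each source label to the name wanted in the new frame.
column_map = {
    "Sex": "sex",
    "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value": "last_fu",
}

# Select only the mapped columns, then rename them; rename returns a new DataFrame,
# so raw_clinical_patient keeps its original labels.
clinical = raw_clinical_patient[list(column_map)].rename(columns=column_map)
```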
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | OD1995 |
Solution 2 | Grall |