Pandas: Create dataframe based on specific columns from another dataframe
I want to create a clinical dataframe whose sex column is taken from the Sex column of the raw_clinical_patient dataframe, but indexing by that name fails.
import pandas as pd
raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4) # header=4: skip the first 4 lines and use the 5th line as the column names
clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient.loc[:,"Sex"]
clinical["last_fu"] = raw_clinical_patient.loc[:,"Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]
Traceback:
KeyError: 'Sex'
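The KeyError only says that no column is labelled exactly Sex. A quick way to see what pandas actually parsed is to print the column labels right after reading. The sketch below assumes a small, made-up in-memory file (the metadata lines and column names are hypothetical, not taken from the real gbm_tcga data) to show how header=4 picks the fifth line as the header:

```python
import io

import pandas as pd

# Hypothetical stand-in for the tab-separated file: four metadata lines,
# then the real header row, then one data row per patient.
raw_text = (
    "#Patient Identifier\tSex\tOverall Survival (Months)\n"
    "#Identifier for a patient\tSex of the patient\tSurvival time\n"
    "#STRING\tSTRING\tNUMBER\n"
    "#1\t1\t1\n"
    "PATIENT_ID\tSEX\tOS_MONTHS\n"
    "P-0001\tMale\t12.5\n"
    "P-0002\tFemale\t30.1\n"
)

# header=4 means "use line index 4 (the fifth line) as the column names";
# the four lines above it are discarded.
raw_clinical_patient = pd.read_csv(io.StringIO(raw_text), sep="\t", header=4)

# Inspect the labels pandas really parsed before indexing by name.
print(list(raw_clinical_patient.columns))
```

In this hypothetical layout the usable label would be SEX, so raw_clinical_patient["Sex"] would raise the same KeyError.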
Solution 1:[1]
Column labels are case sensitive, so your raw_clinical_patient data frame probably has a sex column rather than a Sex column.
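A minimal sketch of the case-sensitivity problem and one way around it. The frame below is hypothetical, and find_column is an illustrative helper written for this example, not part of pandas:

```python
import pandas as pd

# Hypothetical frame whose column is spelled "sex", not "Sex".
raw_clinical_patient = pd.DataFrame({"sex": ["Male", "Female"], "age": [61, 54]})

# Exact-match lookup fails because column labels are case sensitive.
try:
    raw_clinical_patient["Sex"]
except KeyError as err:
    print("KeyError:", err)


def find_column(df, name):
    """Return the real label in df that matches name, ignoring case and
    surrounding whitespace. Hypothetical helper; raises KeyError if none match."""
    matches = [col for col in df.columns if str(col).strip().lower() == name.lower()]
    if not matches:
        raise KeyError(name)
    return matches[0]


clinical = pd.DataFrame()
clinical["sex"] = raw_clinical_patient[find_column(raw_clinical_patient, "Sex")]
print(clinical)
```

Looking the label up once and then indexing with the exact name keeps the rest of the code unchanged.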
Solution 2:[2]
You can select both columns in one step and then rename them:
clinical = raw_clinical_patient[["Sex", "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"]].copy()
clinical.columns = ["sex", "last_fu"]  # new names, in the same order as the selected columns
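A variant of the same idea, sketched on a hypothetical two-patient frame: rename through an explicit mapping, so each new name is tied to its source column rather than to its position in the list.

```python
import pandas as pd

LAST_FU_COL = "Last Alive Less Initial Pathologic Diagnosis Date Calculated Day Value"

# Hypothetical data using the column names from the question.
raw_clinical_patient = pd.DataFrame({
    "Sex": ["Male", "Female"],
    LAST_FU_COL: [120, 365],
    "Other": ["x", "y"],
})

# Select both columns at once, then rename via a mapping; rename returns a new
# DataFrame, so the original frame is left untouched.
clinical = raw_clinical_patient[["Sex", LAST_FU_COL]].rename(
    columns={"Sex": "sex", LAST_FU_COL: "last_fu"}
)
print(clinical)
```

If the real label turns out to be sex or SEX, only the keys of the selection list and the mapping need to change.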
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | OD1995 |
| Solution 2 | Grall |
