How do I rename a pandas DataFrame column?
I want to merge raw_clinical_patient and raw_clinical_sample dataframes.
However, the SAMPLE_ID column in raw_clinical_sample should be relabeled as PATIENT_ID before the merge, because it was wrongly labelled. I used pandas' rename function, but it did not change SAMPLE_ID to PATIENT_ID.
I want to merge the two dataframes on the new PATIENT_ID column.
import pandas as pd
# Clinical patient info
raw_clinical_patient = pd.read_csv("./gbm_tcga/data_clinical_patient.txt", sep="\t", header=4).drop(labels="OTHER_PATIENT_ID", axis=1).set_index("PATIENT_ID")
raw_clinical_patient = raw_clinical_patient.sort_index()
# Clinical sample info
raw_clinical_sample = pd.read_csv("./gbm_tcga/data_clinical_sample.txt", sep="\t", header=4).set_index("SAMPLE_ID").drop(labels=["PATIENT_ID", "OTHER_SAMPLE_ID"], axis=1)
raw_clinical_sample = raw_clinical_sample.sort_index()
raw_clinical_sample.rename(columns={'SAMPLE_ID':'PATIENT_ID'}, inplace=True)
# Merge both dataframes
raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")
raw_clin
Solution 1:[1]
You set SAMPLE_ID as the index, so there is no column with that name for rename(columns=...) to change. If you want to change the name of that index, you can use raw_clinical_sample.rename_axis(index='PATIENT_ID', inplace=True)
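A minimal sketch of the difference, using a made-up two-row frame in place of the real sample file (the values and the CANCER_TYPE column are assumptions for illustration only):

```python
import pandas as pd

# Hypothetical toy data standing in for data_clinical_sample.txt
sample = pd.DataFrame(
    {"SAMPLE_ID": ["P1", "P2"], "CANCER_TYPE": ["GBM", "GBM"]}
).set_index("SAMPLE_ID")

# rename(columns=...) only looks at column labels; SAMPLE_ID is now the
# index name, so nothing matches and the call silently does nothing.
unchanged = sample.rename(columns={"SAMPLE_ID": "PATIENT_ID"})
name_after_rename = unchanged.index.name

# rename_axis changes the name of the index itself.
renamed = sample.rename_axis(index="PATIENT_ID")
name_after_rename_axis = renamed.index.name

print(name_after_rename)       # SAMPLE_ID
print(name_after_rename_axis)  # PATIENT_ID
```

rename does not raise here because its errors parameter defaults to "ignore", which is why the original call appeared to succeed without changing anything.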
That said, you don't need to rename it at all, because you are joining on the index. By default, join matches index on index, so just drop the on argument.
Change
raw_clin = raw_clinical_patient.join(raw_clinical_sample, on="PATIENT_ID", lsuffix="_left")
to
raw_clin = raw_clinical_patient.join(raw_clinical_sample, lsuffix="_left")
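As a rough illustration of the index-on-index join, here is a sketch with invented toy frames (the AGE and CANCER_TYPE columns and all values are hypothetical). It assumes the index values of both frames hold the same identifiers; the index names themselves do not have to match.

```python
import pandas as pd

# Hypothetical stand-ins for the two clinical tables
patient = pd.DataFrame(
    {"PATIENT_ID": ["P1", "P2"], "AGE": [60, 55]}
).set_index("PATIENT_ID")

sample = pd.DataFrame(
    {"SAMPLE_ID": ["P1", "P2"], "AGE": [61, 56], "CANCER_TYPE": ["GBM", "GBM"]}
).set_index("SAMPLE_ID")

# With no `on` argument, join aligns the two frames by their index values.
# lsuffix is appended to overlapping column names from the left frame.
joined = patient.join(sample, lsuffix="_left")

print(list(joined.columns))  # ['AGE_left', 'AGE', 'CANCER_TYPE']
```

Rows are matched purely on index values, so the SAMPLE_ID index name on the right frame does not prevent the join.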
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
