Populate empty pandas dataframe with specific conditions
I want to create a pandas DataFrame with 5000 columns (n=5000) and one row (row G). Each value in row G should be 1 (in 10% of samples) or 0 (in 90% of samples).
import numpy as np
import pandas as pd
df = pd.DataFrame({"G": np.random.choice([1,0], p=[0.1, 0.9], size=5000)}).T
I also want to name the columns "Cell" followed by 1 through 5000:
| | Cell1 | Cell2 | Cell3 | Cell5000 |
|---|---|---|---|---|
| G | 0 | 0 | 1 | 0 |
Solution 1:[1]
The columns will default to a RangeIndex from 0 to 4999. You can add 1 to the column values and then use DataFrame.add_prefix to add the string "Cell" before all of the column names.
df.columns += 1
df = df.add_prefix("Cell")
print(df)
Cell1 Cell2 Cell3 ... Cell5000
G 0 0 0 ... 0
For a one-liner, you can add 1 and prefix with "Cell" in a single expression by converting the column labels to strings manually.
df.columns = "Cell" + (df.columns + 1).astype(str)
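As a rough check, the sketch below applies both renaming approaches to copies of the same frame and compares the resulting labels. The variable names `shifted` and `renamed` are only illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same starting frame as in the question: one row "G", columns 0..4999.
df = pd.DataFrame({"G": np.random.choice([1, 0], p=[0.1, 0.9], size=5000)}).T

# Approach 1: shift the integer labels by one, then prefix them.
shifted = df.copy()
shifted.columns = shifted.columns + 1
shifted = shifted.add_prefix("Cell")

# Approach 2: build the string labels in a single expression.
renamed = df.copy()
renamed.columns = "Cell" + (renamed.columns + 1).astype(str)

# Both approaches should produce the same labels, Cell1 through Cell5000.
same_labels = list(shifted.columns) == list(renamed.columns)
print(same_labels, shifted.columns[0], shifted.columns[-1])
```

Working on copies keeps the original `df` untouched, so either approach can be tried without rebuilding the data.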
To make a single-row DataFrame, I would construct the data with numpy in the correct shape instead of transposing a DataFrame. You can also pass in the columns as you want them numbered, along with the index label.
import numpy as np
import pandas as pd
size = 5000  # number of columns
df = pd.DataFrame(
    np.random.choice([1,0], p=[.1, .9], size=(1, size)),
    columns=np.arange(1, size+1),
    index=["G"]
).add_prefix("Cell")
print(df)
Cell1 Cell2 Cell3 ... Cell4999 Cell5000
G 0 0 0 ... 0 0
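Note that drawing with `p=[.1, .9]` makes each cell 1 with probability 0.1 independently, so the share of ones is only approximately 10%. The sketch below checks that share using a seeded generator. The seed value is arbitrary. The second part assumes that exactly 10% ones are wanted, which the question does not state explicitly. The names `share` and `df_exact` are illustrative.

```python
import numpy as np
import pandas as pd

size = 5000
rng = np.random.default_rng(0)  # arbitrary seed, only for reproducibility

# Independent draws: each cell is 1 with probability 0.1.
values = rng.choice([1, 0], p=[0.1, 0.9], size=(1, size))
df = pd.DataFrame(
    values,
    columns=np.arange(1, size + 1),
    index=["G"],
).add_prefix("Cell")

# Share of ones in row G: close to 0.1, but generally not exactly 0.1.
share = df.loc["G"].mean()
print(share)

# Assumption: exactly 10% ones are required. Shuffle a fixed array instead.
n_ones = size // 10
exact = rng.permutation(np.repeat([1, 0], [n_ones, size - n_ones]))
df_exact = pd.DataFrame(exact.reshape(1, size), columns=df.columns, index=["G"])
print(int(df_exact.loc["G"].sum()))
```

If an approximate proportion is acceptable, the first construction is enough. The shuffled variant is only needed when the count of ones must be fixed.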
Solution 2:[2]
Another method could be:
import numpy as np
import pandas as pd
size = 5000
pd.DataFrame.from_dict(
    {"G": np.random.choice([1,0], p=[0.1, 0.9], size=size)},
    columns=(f'Cell{x}' for x in range(1, size+1)),
    orient='index'
)
Output:
Cell1 Cell2 Cell3 Cell4 Cell5 Cell6 Cell7 Cell8 Cell9 ... Cell4992 Cell4993 Cell4994 Cell4995 Cell4996 Cell4997 Cell4998 Cell4999 Cell5000
G 0 0 0 0 0 1 0 1 0 ... 0 0 0 0 0 0 0 0 0
[1 rows x 5000 columns]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | |
Solution 2 | BeRT2me |