Pandas: create new column in df with random integers from range
I have a pandas data frame with 50k rows. I'm trying to add a new column that is a randomly generated integer from 1 to 5.
If I want 50k random numbers I'd use:
df1['randNumCol'] = random.sample(xrange(50000), len(df1))
but for this I'm not sure how to do it.
Side note: in R, I'd do:
sample(1:5, 50000, replace = TRUE)
Any suggestions?
Solution 1:[1]
To add a column of random integers, use np.random.randint(low, high, size). There's no need to waste memory allocating range(low, high); that could be a lot of memory if high is large. Note that high is exclusive, so integers from 1 to 5 need low=1 and high=6:
df1['randNumCol'] = np.random.randint(1, 6, size=len(df1))
Notes:
- When we're just adding a single column, size is just an integer. In general, if we want to generate an array/dataframe of randint()s, size can be a tuple (as in "Pandas: How to create a data frame of random integers?").
- In Python 3.x, range(low, high) no longer allocates a list (potentially using lots of memory); it produces a range() object.
- Use np.random.seed(...) for determinism and reproducibility. Python's own random.seed(...) does not affect NumPy's generator.
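The notes above can be put together in a short sketch. It assumes the goal is integers from 1 to 5 inclusive, as in the question; the small frame built here is only a hypothetical stand-in for the asker's df1, and the column names a, b, c are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 50k-row frame in the question.
df1 = pd.DataFrame({"x": range(50_000)})

# Seed NumPy's global generator so the column is reproducible.
np.random.seed(0)

# high is exclusive, so (1, 6) draws from 1, 2, 3, 4, 5.
df1["randNumCol"] = np.random.randint(1, 6, size=len(df1))

# Passing a tuple as size gives a 2-D array, e.g. three random columns at once.
block = np.random.randint(1, 6, size=(len(df1), 3))
df_rand = pd.DataFrame(block, columns=["a", "b", "c"])
```

Re-running the script with the same seed should reproduce the same column, which is handy when the random column feeds a train/test split or similar.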
Solution 2:[2]
An option that doesn't require an additional import for numpy:
df1['randNumCol'] = pd.Series(range(1,6)).sample(len(df1), replace=True).array
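A minimal sketch of this pandas-only approach follows, again using a hypothetical small stand-in for df1. The random_state argument is an addition here for reproducibility and is not part of the original answer. The trailing .array matters: the sampled Series carries a repeated index (0 to 4), so dropping the index lets the values be assigned by position rather than aligned by label.

```python
import pandas as pd

# Hypothetical stand-in for the frame in the question.
df1 = pd.DataFrame({"x": range(50_000)})

# Sample with replacement from the values 1..5, one draw per row.
# random_state is an assumption added for reproducibility.
draws = pd.Series(range(1, 6)).sample(len(df1), replace=True, random_state=0)

# .array drops the sampled Series' index, so assignment is positional.
df1["randNumCol"] = draws.array
```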
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | shortorian |
