Sequence representation method for random sequences
I have a set of sequences of ABCD units, for example with A = 0, B = 1, C = 2, D = 4. I can represent the sequence AAABBBCCCDDD as the vector [000111222444] with no problem. However, what if I only know the percentages of the A, B, C and D units, and the sequence itself is statistically random, so the order of the units could be anything? What would be the best way to represent such a sequence in Python?
The end goal would be to feed such sequences to a machine learning model.
Thank you for your help!
Vito
Solution 1:[1]
You can generate random sequences with the right proportion of letters A, B, C, D like so:
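import random

length = 10
ratio_a = 0.4
ratio_b = 0.3
ratio_c = 0.2
ratio_d = 0.1

# Build a pool that contains each letter in the requested proportion.
population = (
    'A' * round(ratio_a * length) +
    'B' * round(ratio_b * length) +
    'C' * round(ratio_c * length) +
    'D' * round(ratio_d * length)
)

# Shuffle the pool. Rounding can make len(population) differ from length,
# so sample the whole pool rather than exactly `length` items.
seq = random.sample(population, k=len(population))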
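Since the end goal is a machine learning model, one possible follow-up is to draw several shuffled realizations for the same proportions and encode each one with the integer mapping from the question. The sketch below assumes that mapping (A = 0, B = 1, C = 2, D = 4). The names `UNIT_CODES`, `random_encoded_sequence` and `ratios` are hypothetical, introduced here for illustration, and are not part of the original answer.

```python
import random

# Mapping taken from the question (note that D is 4 there, not 3).
UNIT_CODES = {'A': 0, 'B': 1, 'C': 2, 'D': 4}


def random_encoded_sequence(ratios, length):
    """Return one shuffled sequence encoded as a list of integers.

    `ratios` maps each unit letter to its fraction of the sequence.
    Hypothetical helper for illustration only.
    """
    pool = []
    for unit, ratio in ratios.items():
        # Rounding means the pool may not contain exactly `length` items.
        pool.extend([unit] * round(ratio * length))
    random.shuffle(pool)  # in-place shuffle of the list
    return [UNIT_CODES[unit] for unit in pool]


ratios = {'A': 0.4, 'B': 0.3, 'C': 0.2, 'D': 0.1}

# Several independent realizations with the same composition.
samples = [random_encoded_sequence(ratios, length=10) for _ in range(3)]
```

Each element of `samples` has the same unit counts but, in general, a different order, so a model could be shown many realizations of one composition. Whether that is preferable to feeding the percentages directly depends on the model and is left open here.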
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Jacques Gaudin |
