Correct function for dataset fragmentation?
I am trying to train an LSTM model for Human Activity Recognition.
The dataset has 11 features (8 piezo sensors and 3 acc-gyro axes) and four classes: Walking, Jogging, StairsUp and Standing. The sampling rate is 1750 Hz.
The dataset goes through a fragmentation function so that x_train ends up with a shape of (number_of_blocks, 3500, 11).
I have two different functions for the fragmentation step, but I don't know which one prepares the data correctly for training.
First function:

    def get_segments(dataSet, frame_size, N_FEATURES, step):
        frames = []
        labels = []
        for i in range(0, len(dataSet) - frame_size, step):
            x = dataSet['x'].values[i: i + frame_size]
            y = dataSet['y'].values[i: i + frame_size]
            z = dataSet['z'].values[i: i + frame_size]
            S1 = dataSet['S1'].values[i: i + frame_size]
            S2 = dataSet['S2'].values[i: i + frame_size]
            S3 = dataSet['S3'].values[i: i + frame_size]
            S4 = dataSet['S4'].values[i: i + frame_size]
            S5 = dataSet['S5'].values[i: i + frame_size]
            S6 = dataSet['S6'].values[i: i + frame_size]
            S7 = dataSet['S7'].values[i: i + frame_size]
            S8 = dataSet['S8'].values[i: i + frame_size]
            label = stats.mode(dataSet['label'][i: i + frame_size])[0][0]
            frames = np.append(frames, [x, y, z, S1, S2, S3, S4, S5, S6, S7, S8])
            labels.append(label)
        frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
        labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)
        return frames, labels
The second function:

    def get_segments(dataSet, frame_size, N_FEATURES, step):
        segments = []
        labels = []
        for i in range(0, len(dataSet) - frame_size, step):
            for j in range(i, i + frame_size):
                xs = dataSet['x'].values[j]
                ys = dataSet['y'].values[j]
                zs = dataSet['z'].values[j]
                S1 = dataSet['S1'].values[j]
                S2 = dataSet['S2'].values[j]
                S3 = dataSet['S3'].values[j]
                S4 = dataSet['S4'].values[j]
                S5 = dataSet['S5'].values[j]
                S6 = dataSet['S6'].values[j]
                S7 = dataSet['S7'].values[j]
                S8 = dataSet['S8'].values[j]
                segments.append([[xs], [ys], [zs], [S1], [S2], [S3], [S4], [S5], [S6], [S7], [S8]])
            label = stats.mode(dataSet['label'][i: i + frame_size])[0][0]
            labels.append(label)
        reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, frame_size, N_FEATURES)
        labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)
        return reshaped_segments, labels
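
One way to see how the two functions differ is to run their core steps on a toy window. The sketch below is only an illustration: it assumes two made-up features (`x`, `y`) and a `frame_size` of 3 instead of the real 11 features and 3500 samples, and it reproduces just the append-and-reshape step of each function.

```python
import numpy as np

frame_size, n_features = 3, 2          # toy sizes, not the real 3500 and 11
x = np.array([1, 2, 3])                # hypothetical feature values
y = np.array([10, 20, 30])

# First function: np.append without an axis flattens feature by feature,
# giving [x0, x1, x2, y0, y1, y2] before the reshape.
flat = np.append([], [x, y])
first = flat.reshape(-1, frame_size, n_features)

# Second function: one entry per time step, each holding every feature.
rows = [[[x[j]], [y[j]]] for j in range(frame_size)]
second = np.asarray(rows, dtype=np.float32).reshape(-1, frame_size, n_features)

print(first[0])   # rows mix values from different time steps
print(second[0])  # each row is (x[t], y[t]) for one time step t
```

In this toy case the second layout keeps each row as one time step across all features, which is the (samples, time steps, features) order an LSTM input of shape (number_of_blocks, 3500, 11) usually implies. The first layout would need the feature list transposed before the reshape to match it.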
Both functions are called with frame_size = 3500 (two seconds of data at 1750 Hz) and step = 3000.
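
The window arithmetic behind these parameters can be sketched as follows. The recording length of 10,000 samples is a made-up value for illustration, and `window_starts` is a hypothetical helper that mirrors the `range(0, len(dataSet) - frame_size, step)` loop used in both functions.

```python
SAMPLING_RATE_HZ = 1750
FRAME_SIZE = 3500
STEP = 3000

def window_starts(n_samples, frame_size, step):
    """Start indices visited by range(0, n_samples - frame_size, step)."""
    return list(range(0, n_samples - frame_size, step))

window_seconds = FRAME_SIZE / SAMPLING_RATE_HZ   # duration of one window
overlap_samples = FRAME_SIZE - STEP              # samples shared by neighbours

# Hypothetical recording of 10,000 samples (not from the original dataset).
starts = window_starts(10_000, FRAME_SIZE, STEP)

print(window_seconds, overlap_samples, starts)
```

Because the `range` stop value is exclusive, a window that starts exactly at `len(dataSet) - frame_size` is not produced, even though it would fit.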
I am currently using the first function, but I want to confirm which of the two makes more sense.
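
For comparison, here is a minimal sketch of a slicing-based variant that keeps one row per time step without the inner loop. It is not one of the two functions above: the name `get_segments_sliced` is hypothetical, the column names are taken from the code in the question, and the majority label is computed with `np.unique` rather than `stats.mode`, which is a substitution made for this sketch.

```python
import numpy as np
import pandas as pd

# Column names as used in the question's code.
FEATURE_COLS = ['x', 'y', 'z', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8']

def get_segments_sliced(dataSet, frame_size, step, feature_cols=FEATURE_COLS):
    """Hypothetical variant: slice whole windows, one row per time step."""
    values = dataSet[feature_cols].to_numpy(dtype=np.float32)
    label_values = dataSet['label'].to_numpy()
    frames, labels = [], []
    for i in range(0, len(dataSet) - frame_size, step):
        # Shape (frame_size, n_features): rows are time steps, columns features.
        frames.append(values[i: i + frame_size])
        # Majority label in the window (ties resolve to the first sorted label).
        uniq, counts = np.unique(label_values[i: i + frame_size], return_counts=True)
        labels.append(uniq[np.argmax(counts)])
    frames = np.asarray(frames, dtype=np.float32).reshape(-1, frame_size, len(feature_cols))
    # One-hot encode; columns follow the sorted order of the label names.
    labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)
    return frames, labels
```

Under these assumptions, calling `get_segments_sliced(dataSet, 3500, 3000)` would return frames of shape (number_of_blocks, 3500, 11) together with one-hot labels.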
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
