Correct function for dataset fragmentation?

I am trying to train an LSTM model for Human Activity Recognition.

The dataset has 11 features (8 piezo sensors and 3 acc-gyro axes) and four classes: Walking, Jogging, StairsUp and Standing. The sampling rate is 1750 Hz.

The dataset goes through a fragmentation function that produces a final shape of (number_of_blocks, 3500, 11) for x_train.

I have two different functions for the fragmentation step, but I don't know which one prepares the data correctly for training.

First function:

import numpy as np
import pandas as pd
from scipy import stats

def get_segments(dataSet, frame_size, N_FEATURES, step):
    frames = []
    labels = []
    for i in range(0, len(dataSet) - frame_size, step):
        # One 1-D array of frame_size samples per feature
        x = dataSet['x'].values[i: i+frame_size]
        y = dataSet['y'].values[i: i+frame_size]
        z = dataSet['z'].values[i: i+frame_size]
        S1 = dataSet['S1'].values[i: i+frame_size]
        S2 = dataSet['S2'].values[i: i+frame_size]
        S3 = dataSet['S3'].values[i: i+frame_size]
        S4 = dataSet['S4'].values[i: i+frame_size]
        S5 = dataSet['S5'].values[i: i+frame_size]
        S6 = dataSet['S6'].values[i: i+frame_size]
        S7 = dataSet['S7'].values[i: i+frame_size]
        S8 = dataSet['S8'].values[i: i+frame_size]

        # Most frequent label in the window
        label = stats.mode(dataSet['label'][i: i+frame_size])[0][0]
        # np.append flattens the 11 arrays one feature after another
        frames = np.append(frames, [x, y, z, S1, S2, S3, S4, S5, S6, S7, S8])
        labels.append(label)

    frames = np.asarray(frames).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)
    return frames, labels
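
One way to check what the final reshape in this function does is to run the same steps on a toy window whose values encode their own feature and time index. This is only a sketch: it assumes the intended layout is (blocks, timesteps, features), as the x_train shape above suggests, and the names window, reshaped and transposed are illustrative, not part of the original code.

```python
import numpy as np

frame_size, n_features = 4, 3

# Toy window: feature k at time t holds the value 10*k + t,
# so every number tells you which feature and timestep it came from.
window = [np.array([10 * k + t for t in range(frame_size)])
          for k in range(n_features)]

# Same steps as the function above: flatten feature by feature, then reshape.
flat = np.append([], window)
reshaped = flat.reshape(-1, frame_size, n_features)

# Alternative: stack to (features, time), then transpose to (time, features).
transposed = np.stack(window, axis=0).T[np.newaxis, ...]

print(reshaped[0, 0])    # values 0, 1, 2: three time samples of feature 0
print(transposed[0, 0])  # values 0, 10, 20: one sample of each feature at t=0
```

In this toy case the plain reshape puts three consecutive samples of a single feature in the first row, while the transposed array puts one sample of each feature there, which is what a (timesteps, features) layout would normally mean.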

Second function:

def get_segments(dataSet, frame_size, N_FEATURES, step):
    segments = []
    labels = []
    for i in range(0, len(dataSet) - frame_size, step):

        for j in range(i, i + frame_size):
            xs = dataSet['x'].values[j]
            ys = dataSet['y'].values[j]
            zs = dataSet['z'].values[j]
            S1 = dataSet['S1'].values[j]
            S2 = dataSet['S2'].values[j]
            S3 = dataSet['S3'].values[j]
            S4 = dataSet['S4'].values[j]
            S5 = dataSet['S5'].values[j]
            S6 = dataSet['S6'].values[j]
            S7 = dataSet['S7'].values[j]
            S8 = dataSet['S8'].values[j]
            segments.append([[xs], [ys], [zs], [S1], [S2], [S3], [S4], [S5], [S6], [S7], [S8]])

        label = stats.mode(dataSet['label'][i: i + frame_size])[0][0]
        labels.append(label)

    reshaped_segments = np.asarray(segments, dtype=np.float32).reshape(-1, frame_size, N_FEATURES)
    labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)

    return reshaped_segments, labels
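
For comparison, the same per-timestep windowing can be sketched with array slicing instead of a per-sample loop. This is a minimal sketch, not a drop-in replacement: it assumes the column names used above plus a 'label' column, get_segments_vectorized and FEATURE_COLS are hypothetical names, and the most frequent label is found with np.unique rather than stats.mode.

```python
import numpy as np
import pandas as pd

# Assumed column order, taken from the functions above.
FEATURE_COLS = ['x', 'y', 'z', 'S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8']

def get_segments_vectorized(data, frame_size, step, feature_cols=FEATURE_COLS):
    """Hypothetical helper: windows shaped (blocks, frame_size, features)."""
    # 2-D array with one row per timestep and one column per feature.
    values = data[feature_cols].to_numpy(dtype=np.float32)
    label_values = data['label'].to_numpy()

    segments, labels = [], []
    # Same window start positions as the loops above.
    for start in range(0, len(data) - frame_size, step):
        stop = start + frame_size
        # Slicing rows keeps each timestep's features together.
        segments.append(values[start:stop])
        # Most frequent label in the window (ties go to the first in sort order).
        uniq, counts = np.unique(label_values[start:stop], return_counts=True)
        labels.append(uniq[np.argmax(counts)])

    segments = np.asarray(segments, dtype=np.float32).reshape(
        -1, frame_size, len(feature_cols))
    labels = np.asarray(pd.get_dummies(labels), dtype=np.float32)
    return segments, labels
```

Because each window is a row slice of a (timesteps, features) array, no reshape across the feature axis is needed, and the result should have the same layout as the per-sample loop above.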

Both functions are called with frame_size=3500 (two seconds at 1750 Hz) and step=3000.
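
As a rough check on these settings, the number of windows and their overlap can be worked out directly from the loop bounds. The helper name window_stats and the example length of 100,000 samples are assumptions made for illustration; only frame_size, step and the 1750 Hz rate come from the question.

```python
def window_stats(n_samples, frame_size=3500, step=3000, rate_hz=1750):
    """Hypothetical helper: summarize the sliding-window settings."""
    # Same bounds as the loops above: range(0, n_samples - frame_size, step).
    n_windows = len(range(0, n_samples - frame_size, step))
    overlap = frame_size - step
    return {
        'n_windows': n_windows,
        'window_seconds': frame_size / rate_hz,
        'overlap_samples': overlap,
        'overlap_seconds': overlap / rate_hz,
        'overlap_fraction': overlap / frame_size,
    }

# Example with an assumed recording length of 100,000 samples.
print(window_stats(100_000))
```

With these values, consecutive windows share 500 samples, roughly 0.29 s or 14% of a window. Note also that the upper bound len(dataSet) - frame_size is exclusive, so a last window that fits exactly at the end of the data is skipped; using len(dataSet) - frame_size + 1 would include it.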

I am currently using the first function, but I want to make sure it is the one that makes more sense.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
