This function accepts a dataframe of a multi-channel signal and segments it into epoch windows whose length is specified by breaks.

segment_data(df, breaks, st = NULL)

Arguments

df

dataframe. Input dataframe of the multi-channel signal. The first column contains the timestamps in POSIXct format and the following columns contain accelerometer values.

breaks

character. An epoch length string in the format accepted by the breaks argument of cut for date-times, e.g., "1 min" or "15 sec".

st

character or POSIXct timestamp. An optional start time. When set, the breaks are generated relative to this start time. If it is NULL, the function uses the first timestamp in the timestamp column as the start time for generating breaks. Setting it is useful when you are processing a stream of data and want a common start time for segmenting every chunk. Default is NULL.
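The interaction between breaks and st can be illustrated with base R alone. The following is a minimal sketch, not the package's implementation: it assumes hypothetical 1 Hz timestamps and uses cut and seq from base R to show how a start time 10 seconds before the data shifts the window boundaries.

```r
# Hypothetical timestamps: 20 samples at 1 Hz starting at 11:00:00 UTC
ts <- as.POSIXct("2016-01-15 11:00:00", tz = "UTC") + 0:19

# No start time: 15-second windows begin at the first timestamp
seg_default <- as.integer(cut(ts, breaks = "15 sec"))

# Common start time 10 seconds earlier: build explicit break points from it
st <- as.POSIXct("2016-01-15 10:59:50", tz = "UTC")
break_points <- seq(st, max(ts) + 15, by = "15 sec")
seg_shifted <- as.integer(cut(ts, breaks = break_points, right = FALSE))

# Samples per window in each case
table(seg_default)
table(seg_shifted)
```

In this sketch the first window holds 15 samples without a start time, but only 5 samples when the windows are anchored at 10:59:50, because the first boundary after the data begins then falls at 11:00:05.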

Value

dataframe. The same format as the input dataframe, but with an extra column "SEGMENT" at the end that specifies the epoch window each sample belongs to.
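As a rough illustration of working with the returned format, the sketch below builds a small hypothetical dataframe that already carries a SEGMENT column (the values are made up, not taken from the sample data) and inspects it per segment with base R.

```r
# Hypothetical segmented output: 6 samples, the last two in a second window
segmented <- data.frame(
  HEADER_TIME_STAMP = as.POSIXct("2016-01-15 11:00:00", tz = "UTC") + 0:5,
  X = c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6),
  SEGMENT = c(1, 1, 1, 1, 2, 2)
)

# Number of samples that fall in each epoch window
samples_per_segment <- table(segmented$SEGMENT)

# One dataframe per epoch window, keyed by the segment index
by_segment <- split(segmented, segmented$SEGMENT)
names(by_segment)
nrow(by_segment[["2"]])
```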

How is it used in MIMS-unit algorithm?

This is a utility function used in various parts of the algorithm whenever a dataframe needs to be segmented, e.g., before aggregating values over epoch windows.
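The "segment, then aggregate" pattern can be sketched as follows. This is an illustration only: the dataframe is hypothetical, and a per-segment mean stands in for whatever aggregation the algorithm actually applies over each epoch window.

```r
# Hypothetical segmented data: 6 samples across 2 epoch windows
segmented <- data.frame(
  HEADER_TIME_STAMP = as.POSIXct("2016-01-15 11:00:00", tz = "UTC") +
    c(0, 5, 10, 15, 20, 25),
  X = c(1, 2, 3, 4, 5, 6),
  Y = c(0, 0, 0, 1, 1, 1),
  Z = c(-1, -1, -1, -2, -2, -2),
  SEGMENT = c(1, 1, 1, 2, 2, 2)
)

# Aggregate each axis within each epoch window (mean used as a stand-in)
epoch_means <- aggregate(cbind(X, Y, Z) ~ SEGMENT, data = segmented, FUN = mean)
epoch_means
```

Grouping on the SEGMENT column keeps the aggregation step independent of how the windows were generated, so the same code works with or without a custom start time.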

Examples

  # Use sample data
  df = sample_raw_accel_data

  # segment data into 1 minute segments
  output = segment_data(df, "1 min")

  # check the 3rd segment, each segment covers up to 1 minute of data;
  # the summary is all NA when no samples fall in the segment
  summary(output[output['SEGMENT'] == 3,])
#>  HEADER_TIME_STAMP       X             Y             Z          SEGMENT   
#>  Min.   :NA        Min.   : NA   Min.   : NA   Min.   : NA   Min.   : NA  
#>  1st Qu.:NA        1st Qu.: NA   1st Qu.: NA   1st Qu.: NA   1st Qu.: NA  
#>  Median :NA        Median : NA   Median : NA   Median : NA   Median : NA  
#>  Mean   :NaN       Mean   :NaN   Mean   :NaN   Mean   :NaN   Mean   :NaN  
#>  3rd Qu.:NA        3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA   3rd Qu.: NA  
#>  Max.   :NA        Max.   : NA   Max.   : NA   Max.   : NA   Max.   : NA  

  # segment data into 15 second segments
  output = segment_data(df, "15 sec")

  # check the 1st segment, each segment covers up to 15 seconds of data
  summary(output[output['SEGMENT'] == 1,])
#>  HEADER_TIME_STAMP                      X               Y            
#>  Min.   :2016-01-15 11:00:00.00   Min.   :0.129   Min.   :-2.715000  
#>  1st Qu.:2016-01-15 11:00:01.50   1st Qu.:1.004   1st Qu.:-0.309000  
#>  Median :2016-01-15 11:00:03.00   Median :1.205   Median :-0.125000  
#>  Mean   :2016-01-15 11:00:03.00   Mean   :1.273   Mean   :-0.008275  
#>  3rd Qu.:2016-01-15 11:00:04.50   3rd Qu.:1.583   3rd Qu.: 0.011000  
#>  Max.   :2016-01-15 11:00:06.00   Max.   :2.652   Max.   : 2.871000  
#>        Z              SEGMENT 
#>  Min.   :-1.7300   Min.   :1  
#>  1st Qu.:-0.3490   1st Qu.:1  
#>  Median :-0.1840   Median :1  
#>  Mean   :-0.2462   Mean   :1  
#>  3rd Qu.:-0.0740   3rd Qu.:1  
#>  Max.   : 1.1090   Max.   :1  

  # segment data into 1 hour segments
  output = segment_data(df, "1 hour")

  # because the input data spans less than 1 hour,
  # there will be only 1 segment in the output
  unique(output['SEGMENT'])
#>   SEGMENT
#> 1       1
  summary(output)
#>  HEADER_TIME_STAMP                      X               Y            
#>  Min.   :2016-01-15 11:00:00.00   Min.   :0.129   Min.   :-2.715000  
#>  1st Qu.:2016-01-15 11:00:01.50   1st Qu.:1.004   1st Qu.:-0.309000  
#>  Median :2016-01-15 11:00:03.00   Median :1.205   Median :-0.125000  
#>  Mean   :2016-01-15 11:00:03.00   Mean   :1.273   Mean   :-0.008275  
#>  3rd Qu.:2016-01-15 11:00:04.50   3rd Qu.:1.583   3rd Qu.: 0.011000  
#>  Max.   :2016-01-15 11:00:06.00   Max.   :2.652   Max.   : 2.871000  
#>        Z              SEGMENT 
#>  Min.   :-1.7300   Min.   :1  
#>  1st Qu.:-0.3490   1st Qu.:1  
#>  Median :-0.1840   Median :1  
#>  Mean   :-0.2462   Mean   :1  
#>  3rd Qu.:-0.0740   3rd Qu.:1  
#>  Max.   : 1.1090   Max.   :1  

  # use a manually set start time
  output = segment_data(df, "15 sec", st='2016-01-15 10:59:50.000')

  # check the 1st segment; because the start time is 10 seconds before the
  # start time of the actual data, the first segment will only include
  # 5 seconds of data.
  summary(output[output['SEGMENT'] == 1,])
#>  HEADER_TIME_STAMP                      X               Y            
#>  Min.   :2016-01-15 11:00:00.00   Min.   :0.129   Min.   :-2.715000  
#>  1st Qu.:2016-01-15 11:00:01.25   1st Qu.:1.004   1st Qu.:-0.312500  
#>  Median :2016-01-15 11:00:02.50   Median :1.223   Median :-0.125000  
#>  Mean   :2016-01-15 11:00:02.50   Mean   :1.276   Mean   :-0.006125  
#>  3rd Qu.:2016-01-15 11:00:03.74   3rd Qu.:1.582   3rd Qu.: 0.020000  
#>  Max.   :2016-01-15 11:00:04.99   Max.   :2.652   Max.   : 2.871000  
#>        Z             SEGMENT 
#>  Min.   :-1.730   Min.   :1  
#>  1st Qu.:-0.350   1st Qu.:1  
#>  Median :-0.184   Median :1  
#>  Mean   :-0.251   Mean   :1  
#>  3rd Qu.:-0.074   3rd Qu.:1  
#>  Max.   : 1.109   Max.   :1