Introduction

In my research with the Neuroscience of Dance in Health and Disability Laboratory, I was tasked with verifying that the participant had truly traveled 3 meters during the recorded section of motion capture data. We chose 3 meters because it is the best-sampled portion of the walk for analysis and is consistent with past studies. I wrote this notebook in the hope that future studies can use it, so that researchers do not need to calculate the distance by hand when verifying and selecting a section of a recording.

from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))  # widen notebook cells to the full browser width

Import libraries for processing

  • pandas for the DataFrames
  • random for generating simulated test data
  • glob for finding the tsv files to import
import pandas as pd
import random
import glob

General Function That Calculates and Outputs Differences

In this function, all files with the .tsv extension in the same directory as this notebook are loaded into a list called filenames. The first for loop then iterates through filenames, reads each tsv file into a pandas DataFrame, and appends it to a list called dataframes.

The function takes an input called marker, which names the column we are interested in.

Then we use the second loop to iterate through the dataframes list. In each iteration, the metercalc function computes the displacement of the chosen marker from the first and last rows, and the result is stored in the diff variable. Next, we build a dictionary d containing the current filename (filenames[i]) and the calculated difference (diff). At the end of the iteration, d is appended to a list called data.

At the end of the function, data is converted into the DataFrame df_all and returned, giving one row per file with its filename and calculated displacement.

def walkDisplacement(marker):
    filenames = glob.glob("*.tsv")  # all tsv files in the same directory as this notebook
    dataframes = []
    data = []

    # read every tsv file into a DataFrame
    for f in filenames:
        dataframes.append(pd.read_csv(f, delimiter='\t', encoding='utf-8'))

    for i in range(len(dataframes)):
        # stop with a readable message if the marker column is missing from this file
        if marker not in dataframes[i].columns:
            raise KeyError(f"marker '{marker}' not found in {filenames[i]}; "
                           f"available columns: {list(dataframes[i].columns)}")
        diff = metercalc(dataframes[i], marker)
        d = {"Filename": str(filenames[i]), "Difference (meters)": diff}
        data.append(d)

    df_all = pd.DataFrame(data)
    return df_all

Functions For Calculating Differences

This function calculates the difference between the first and last rows of a DataFrame column.

The inputs are a DataFrame and a string naming the column we are interested in. I called the parameter marker because, in our case, the columns hold marker values.

def difference(df,marker):
    first = df[marker].iloc[0]
    last = df[marker].iloc[-1]
    diff = last - first
    return diff
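A possible extension, sketched here as an assumption rather than part of the original workflow: because df.iloc[0] and df.iloc[-1] return whole rows as Series, subtracting them gives the first-to-last difference for every column at once. The second marker column (left heel) and its values are made up for illustration.

```python
import pandas as pd

# hypothetical marker columns, used only for illustration
df = pd.DataFrame({
    "right heel": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "left heel": [10, 12, 15, 13, 11, 9, 8, 6, 4],
})

# last row minus first row, computed for every column in one step
all_diffs = df.iloc[-1] - df.iloc[0]
print(all_diffs)  # right heel: 8, left heel: -6
```

This keeps the same "last minus first" definition as difference, just applied to all marker columns together.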

This function calculates the displacement between the first and last rows of the walk and returns it in meters.

The function is an adaptation of the logic Andrea wrote for calculating the distance walked from the position data.

I added a second conditional for cases where the difference is negative.

Example for TDP MS W MRI 001 10MWT2014223mmatlab.qtm

  1. Get the starting position xi (= 2717.08 mm) of the right heel.
  2. Calculate d by subtracting 3000 mm from xi (because 3 m = 3000 mm): d = 2717.08 - 3000 = -282.92 mm.
  3. Get the finish position xf of the right heel at the end of the cropped section (= -321.77 mm).
  4. If xf does not equal d, compute the exact difference between them: xf - d = -321.77 - (-282.92) = -38.85 mm.
  5. The exact distance is then 3000 + 38.85 = 3038.85 mm, which is about 3.039 m.
def metercalc(df, marker):
    xi = df[marker].iloc[0]    # starting position (mm)
    d = xi - 3000              # xi minus 3000 mm, as in step 2
    xf = df[marker].iloc[-1]   # finishing position (mm)
    if xf == d:                # finished exactly 3000 mm from the start
        return 3000 / 1000
    diff = xf - d              # otherwise take the difference between xf and d
    if diff < 0:               # checks if the difference is negative
        exact = 3000 - diff
        return exact / 1000    # returns 3000 mm plus the overshoot, in meters
    return diff / 1000         # returns the positive difference, in meters
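As a sanity check, the arithmetic in Andrea's example can be reproduced directly. This small sketch repeats steps 1 to 5 with the numbers listed above, using variable names that mirror metercalc. Substituting step 2 into step 4 also shows that step 5 equals xi - xf.

```python
import math

xi = 2717.08            # step 1: starting position of the right heel (mm)
d = xi - 3000           # step 2: -282.92 mm
xf = -321.77            # step 3: finishing position of the right heel (mm)
diff = xf - d           # step 4: -38.85 mm
exact_mm = 3000 - diff  # step 5: 3000 + 38.85 = 3038.85 mm

# 3000 - (xf - (xi - 3000)) simplifies to xi - xf
assert math.isclose(exact_mm, xi - xf)
print(f"{exact_mm:.2f} mm")  # → 3038.85 mm
```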

The Actual Function in use

To use walkDisplacement, place all raw tsv files in the same directory as this notebook, then pass in the column name of the marker you are interested in. The following example uses the right heel.

walkDisplacement("right heel")
    Filename  Difference (meters)
0  test1.tsv                3.859
1  test2.tsv                2.314
2  test3.tsv                3.073
3  test4.tsv                0.760

IF YOU GET A KeyError, THE MARKER YOU ENTERED DOES NOT EXIST IN THE TSV COLUMNS.
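One way to make that error easier to act on is to suggest which existing columns the user probably meant. The sketch below uses a hypothetical helper, check_marker, with an in-memory DataFrame standing in for a real tsv file; difflib.get_close_matches from the Python standard library finds similarly spelled column names.

```python
import difflib
import pandas as pd

def check_marker(df, marker):
    """Hypothetical helper: raise a KeyError listing close column names if marker is missing."""
    if marker in df.columns:
        return
    columns = [str(c) for c in df.columns]
    # compare in lowercase so "Right Heel" still suggests "right heel"
    lowered = {c.lower(): c for c in columns}
    close = difflib.get_close_matches(marker.lower(), list(lowered), n=3)
    suggestions = [lowered[c] for c in close]
    raise KeyError(f"marker {marker!r} not found; closest columns: {suggestions or columns}")

df = pd.DataFrame({"right heel": [3516, 4375], "left heel": [3400, 4300]})
check_marker(df, "right heel")  # exists, so nothing happens

try:
    check_marker(df, "Right Heel")
except KeyError as err:
    print(err)
```

If no column is similar enough, the message falls back to listing every available column.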

Sample Data Tests

I made a sample data set to test the initial difference function

numbers =[1,2,3,4,5,6,7,8,9]
df_n = pd.DataFrame({"right heel": numbers})
df_n
   right heel
0           1
1           2
2           3
3           4
4           5
5           6
6           7
7           8
8           9

This is a simulated data set I generated to test the metercalc function on positive values (random integers from 1000 to 6000).

data = []
for i in range(10):
    lower = random.randint(1000,6000)
    d = {"right heel": lower}
    data.append(d)
    
sim = pd.DataFrame(data)
sim
   right heel
0        5778
1        5214
2        3172
3        5992
4        5218
5        4846
6        5449
7        1962
8        1267
9        1334
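Because no seed is set, the simulated values above change every time the cell runs, so any result computed from them cannot be reproduced exactly. A minimal sketch of a reproducible version, assuming any fixed seed is acceptable (42 here is arbitrary), gives the simulation its own seeded random.Random generator. The helper name simulate_heel is hypothetical.

```python
import random
import pandas as pd

def simulate_heel(n_rows=10, seed=42):
    """Hypothetical helper: n_rows random right-heel positions from 1000 to 6000 inclusive."""
    rng = random.Random(seed)  # private generator, unaffected by other random calls
    return pd.DataFrame({"right heel": [rng.randint(1000, 6000) for _ in range(n_rows)]})

sim_a = simulate_heel()
sim_b = simulate_heel()
print(sim_a.equals(sim_b))  # → True: same seed, same data
```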

Andrea's example

ae = [2717.08,-321.77]
df_ae = pd.DataFrame({"right heel" : ae})
df_ae
   right heel
0     2717.08
1     -321.77

These lines of code test the .iloc indexer; I tried it here before using it in the difference function.

first = df_n["right heel"].iloc[0]
last = df_n["right heel"].iloc[-1]
diff = last - first
diff
8
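The reason to use .iloc rather than plain [] indexing: .iloc is position-based, so iloc[0] and iloc[-1] are always the first and last rows, even after a recording has been cropped and its index no longer starts at 0. A small illustration with the same 1 to 9 values, where the crop condition is arbitrary:

```python
import pandas as pd

df_n = pd.DataFrame({"right heel": [1, 2, 3, 4, 5, 6, 7, 8, 9]})

# drop the first three rows, as a stand-in for a cropped walk
cropped = df_n[df_n["right heel"] > 3]

print(cropped.index[0])               # → 3: rows keep their original labels
print(cropped["right heel"].iloc[0])  # → 4: first row by position
print(cropped["right heel"].iloc[-1]) # → 9: last row by position
print(0 in cropped.index)             # → False: label 0 was cropped away
```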

Test cases for the function

Below are tests I ran to check the accuracy of the functions

I wrote this test case first to check that my difference function outputs the correct value.

The initial value is 1 and the final value is 9, so the expected output is 8 (9 - 1).

right_diff = difference(df_n, "right heel")
print(right_diff)
8
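The same check can be written as assertions, so it fails loudly instead of relying on reading the printed value. This sketch repeats difference so the block runs on its own; the second case reuses the values from Andrea's example and is an extra check, not one of the original tests.

```python
import pandas as pd

def difference(df, marker):
    # last value minus first value of the marker column
    return df[marker].iloc[-1] - df[marker].iloc[0]

def test_difference():
    increasing = pd.DataFrame({"right heel": [1, 2, 3, 4, 5, 6, 7, 8, 9]})
    assert difference(increasing, "right heel") == 8  # 9 - 1

    andrea = pd.DataFrame({"right heel": [2717.08, -321.77]})
    # floats are compared with a tolerance rather than ==
    assert abs(difference(andrea, "right heel") - (-3038.85)) < 1e-9

test_difference()
print("difference tests passed")
```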

I wrote this test case to check the output of the metercalc function on random data in the range 1000 to 6000.

actual_test = metercalc(sim, "right heel")
actual_test
2.698

Testing ways to import the data

I wanted to make a DataFrame that outputs the values; this is still under construction.

Right now it is a for loop that takes a list of DataFrames, runs the metercalc function on each entry, and outputs a DataFrame of the calculations. This section contains the tests I did while building the general function.

If I am given more time, I will turn this into a function that can take all the tsv files in a directory, convert them into dataframes, and make a list of all the dataframes that will be processed

data = []
for i in range(len(dataframes)):
    diff = metercalc(dataframes[i], "right heel")
    d = {"filename": str(filenames[i]), "Difference (meters)": diff}
    data.append(d)

df_all = pd.DataFrame(data)
df_all
    filename  Difference (meters)
0  test1.tsv                3.859
1  test2.tsv                2.314
2  test3.tsv                3.073
3  test4.tsv                0.760

I did some research and found a page that showed how to import files and load them into DataFrames.

I asked sigpwny for help, and rats_irl helped me write some code based on the link I showed him.

filenames stores a list of the tsv files in the current directory.

dataframes is a list that holds one DataFrame per file.

The for loop iterates through filenames, reads each tsv file into a DataFrame, and appends it to dataframes.

With this, I can make a general function.

filenames = glob.glob("*.tsv")
dataframes = []
for f in filenames:
    dataframes.append(pd.read_csv(f,delimiter='\t',encoding='utf-8'))
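glob.glob returns files in an arbitrary, filesystem-dependent order, so the row order of the results may differ between machines. A sketch of the same loading loop with sorted filenames, run against a temporary directory holding two made-up tsv files (walk_a.tsv and walk_b.tsv are hypothetical names) so it is self-contained:

```python
import glob
import os
import tempfile
import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # write two small example files out of alphabetical order; values are made up
    pd.DataFrame({"right heel": [2000, 5000]}).to_csv(os.path.join(folder, "walk_b.tsv"), sep="\t", index=False)
    pd.DataFrame({"right heel": [1000, 4000]}).to_csv(os.path.join(folder, "walk_a.tsv"), sep="\t", index=False)

    # sorted() makes the file order, and so the row order of any results, reproducible
    filenames = sorted(glob.glob(os.path.join(folder, "*.tsv")))
    dataframes = [pd.read_csv(f, delimiter="\t", encoding="utf-8") for f in filenames]
    names = [os.path.basename(f) for f in filenames]

print(names)  # → ['walk_a.tsv', 'walk_b.tsv']
```

The DataFrames stay in memory after the temporary directory is removed, because read_csv has already loaded them.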

Checking The Outputs of filenames and dataframes

print("name of the file = " + str(filenames[0]))
display(dataframes[0])
name of the file = test1.tsv
   Unnamed: 0  right heel
0           0        3516
1           1        2689
2           2        5181
3           3        1408
4           4        2545
5           5        2229
6           6        5266
7           7        1686
8           8        1558
9           9        4375
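The Unnamed: 0 column is what pandas produces when a file was saved with its row index as an unlabeled first column, which is what DataFrame.to_csv does by default; the test files may have been written that way, though that is an assumption. Passing index_col=0 to read_csv uses that column as the index instead. A self-contained sketch with io.StringIO standing in for a tsv file:

```python
import io
import pandas as pd

# write a DataFrame the way to_csv does by default: index included, header cell left empty
buffer = io.StringIO()
pd.DataFrame({"right heel": [3516, 2689, 5181]}).to_csv(buffer, sep="\t")
text = buffer.getvalue()

plain = pd.read_csv(io.StringIO(text), delimiter="\t")
indexed = pd.read_csv(io.StringIO(text), delimiter="\t", index_col=0)

print(list(plain.columns))    # → ['Unnamed: 0', 'right heel']
print(list(indexed.columns))  # → ['right heel']
```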
for i in range(len(dataframes)):
    display(dataframes[i])
   Unnamed: 0  right heel
0           0        3516
1           1        2689
2           2        5181
3           3        1408
4           4        2545
5           5        2229
6           6        5266
7           7        1686
8           8        1558
9           9        4375
   Unnamed: 0  right heel
0           0        3030
1           1        5689
2           2        1525
3           3        4448
4           4        5904
5           5        3180
6           6        1661
7           7        2981
8           8        5494
9           9        2344
   Unnamed: 0  right heel
0           0        5448
1           1        4231
2           2        5242
3           3        1325
4           4        3561
5           5        2519
6           6        2443
7           7        4251
8           8        3286
9           9        2375
   Unnamed: 0  right heel
0           0        5273
1           1        2096
2           2        4345
3           3        2171
4           4        2170
5           5        1516
6           6        4599
7           7        1919
8           8        5891
9           9        3033