Verifying a 3 Meter Walk
Introduction
In my research with the Neuroscience of Dance in Health and Disability Laboratory, I was tasked with verifying that the participant had truly traveled 3 meters in the recorded section of motion capture data. We chose 3 meters because it is the best-sampled portion of the walk for analysis and is consistent with past studies. I wrote this notebook in the hope that it will be used in future studies, so that a researcher does not need to calculate by hand when verifying and picking a section of the recording.
from IPython.display import display, HTML  # public import path; IPython.core.display is deprecated for this
display(HTML("<style>.container { width:100% !important; }</style>"))
import pandas as pd
import random
import glob
General Function That Calculates and Outputs Differences
In this function, all files with the tsv extension in the same directory as this notebook are loaded into a list called filenames. The first for loop then iterates through the filenames list and builds a dataframes list in which each tsv file has been converted into a Pandas DataFrame.
The function takes an input called marker, which indicates which column we are interested in.
The second loop then iterates through the dataframes list. In each iteration, the distance between the first and last rows of the marker column (for example, right heel) is calculated by metercalc and stored in the diff variable. Next we create a dictionary d that holds the current filename (filenames[i]) and the calculated difference (diff). At the end of the iteration, the dictionary is appended to a list called data.
At the end of the function, the list is converted into the DataFrame df_all, which is returned with one row per file containing the file name and the calculated displacement.
def walkDisplacement(marker):
    filenames = glob.glob("*.tsv")  # every tsv file next to this notebook
    dataframes = []
    data = []
    allowedMarkers = [] # going to have a if statement to check if marker variable exists, otherwise python will just yell
    # load each tsv file into a DataFrame
    for f in filenames:
        dataframes.append(pd.read_csv(f,delimiter='\t',encoding='utf-8'))
    # compute the distance for each file and collect one row per file
    for i in range(len(dataframes)):
        diff = metercalc(dataframes[i],marker)
        d = {"Filename": str(filenames[i]),"Difference (meters)" : diff}
        data.append(d)
    df_all = pd.DataFrame(data)
    return df_all
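The allowedMarkers placeholder above hints at a check that the requested marker exists before any calculation is attempted. A minimal sketch of one way to do that is shown here; the helper name check_marker, the error message, and the small example frame are hypothetical and not part of the original notebook.

```python
import pandas as pd


def check_marker(df, marker):
    """Raise a readable error if `marker` is not a column of `df` (hypothetical helper)."""
    if marker not in df.columns:
        available = ", ".join(str(c) for c in df.columns)
        raise KeyError(f"Marker '{marker}' not found. Available columns: {available}")
    return True


# Small made-up frame standing in for one loaded tsv file
example = pd.DataFrame({"right heel": [2717.08, -321.77], "left heel": [2500.0, -500.0]})
check_marker(example, "right heel")  # passes silently

try:
    check_marker(example, "right toe")  # this column does not exist
except KeyError as err:
    print(err)
```

Calling a check like this at the top of the second loop would stop with a message listing the available columns, rather than failing on a bare column lookup.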
def difference(df,marker):
    first = df[marker].iloc[0]
    last = df[marker].iloc[-1]
    diff = last - first
    return diff
This function calculates the distance traveled between the first and last rows of the walk.
The function is an adaptation of the logic Andrea wrote for calculating the distance walked from the position data.
I added a second conditional for cases where the difference is negative.
Example for TDP MS W MRI 001 10MWT2014223mmatlab.qtm
- Get the starting position xi (= 2717.08 mm) of the right heel.
- Calculate the expected finish position d by subtracting 3000 mm from xi (because 3 m = 3000 mm): d = 2717.08 - 3000 = -282.92 mm.
- Get the finish position xf (= -321.77 mm) of the right heel at the end of the cropped section.
- If xf does not equal d, compute the exact difference between them. For example, with d = -282.92 mm and xf = -321.77 mm, the difference is xf - d = -38.85 mm.
- The exact distance is then 3000 + 38.85 = 3038.85 mm, which is approximately 3.039 m.
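The arithmetic in the steps above can be reproduced directly as a quick sanity check. This is only a sketch using the two positions quoted in the example; the variable names target, offset, exact_mm, and exact_m are illustrative.

```python
xi = 2717.08    # starting position of the right heel (mm)
xf = -321.77    # finishing position of the right heel (mm)
target = 3000   # 3 m expressed in mm

d = xi - target              # expected finish position for an exact 3 m walk
offset = xf - d              # how far the actual finish is from the expected one
exact_mm = target - offset   # a negative offset means the walk was longer than 3 m
exact_m = exact_mm / 1000    # convert mm to m

print(round(d, 2), round(offset, 2), round(exact_mm, 2), round(exact_m, 5))
```

Note that target - offset simplifies algebraically to xi - xf, so for this example the result is the straight displacement between the two heel positions.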
def metercalc(df,marker):
    xi = df[marker].iloc[0]   # starting position (mm)
    d = xi - 3000             # expected finish position after exactly 3000 mm
    xf = df[marker].iloc[-1]  # actual finish position (mm)
    if (xf != d): # checks if xf differs from the expected finish position d
        diff = xf - d # if so, take the difference between the two values
        if(diff < 0): # checks if the difference is negative
            exact = 3000 - (diff)
            return exact/1000 # will return flipped difference if negative
        return diff/1000 # will return the positive difference
    return 3000/1000 # xf equals d, so exactly 3 meters were traveled
walkDisplacement("right heel")
numbers =[1,2,3,4,5,6,7,8,9]
df_n = pd.DataFrame({"right heel": numbers})
df_n
This was a simulated data set I generated to test the metercalc function for values over 0
data = []
for i in range(10):
    lower = random.randint(1000,6000)
    d = {"right heel": lower}
    data.append(d)
sim = pd.DataFrame(data)
sim
Andrea's example
ae = [2717.08,-321.77]
df_ae = pd.DataFrame({"right heel" : ae})
df_ae
These lines of code test out the .iloc indexer. I tried it here first before using it in the difference function.
first = df_n["right heel"].iloc[0]
last = df_n["right heel"].iloc[-1]
diff = last - first
diff
I initially wrote this test case to see whether my difference function output the correct value.
The initial value was 1 and the final value was 9, so the difference that should be output is 8 = (9 - 1).
right_diff = difference(df_n, "right heel")
print(right_diff)
I wrote this test case to check that the metercalc function outputs the correct value given random data in the range 1000-6000.
actual_test = metercalc(sim, "right heel")
actual_test
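Because the rows above are drawn at random, the expected answer changes on every run. One possible way to make such a test repeatable is to seed the generator and simulate a walk whose length is known in advance. The sketch below assumes positions in millimeters that decrease along the walk; simulate_walk and displacement_m are hypothetical helpers, not functions from this notebook.

```python
import random

import pandas as pd


def simulate_walk(start_mm, length_mm, samples=10):
    """Build a one-column frame of evenly spaced positions from start to start - length."""
    step = length_mm / (samples - 1)
    positions = [start_mm - step * i for i in range(samples)]
    return pd.DataFrame({"right heel": positions})


def displacement_m(df, marker):
    """Absolute first-to-last displacement of a marker column, in meters."""
    return abs(df[marker].iloc[-1] - df[marker].iloc[0]) / 1000


random.seed(0)  # fixed seed so the simulated start position is the same on every run
start = random.randint(1000, 6000)
walk = simulate_walk(start, length_mm=3000)

print(displacement_m(walk, "right heel"))
```

Since the simulated walk is built to be 3000 mm long, the printed value should be 3 meters up to floating-point rounding, which gives a known target to compare a distance function against.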
Testing ways to import the data
I wanted to make a DataFrame that outputs the values; this is still under construction.
Right now it is a for loop that takes a list of DataFrames, runs the metercalc function on each entry, and outputs a DataFrame of the calculations. This section contains the tests I did to build the general function.
If I am given more time, I will turn this into a function that can take all the tsv files in a directory, convert them into DataFrames, and make a list of all the DataFrames that will be processed.
data = []
for i in range(len(dataframes)):
    diff = metercalc(dataframes[i], "right heel")
    d = {"filename": str(filenames[i]), "Difference (meters)": diff}
    data.append(d)
df_all = pd.DataFrame(data)
df_all
I did some research and found a page that showed how to import files and load them into DataFrames.
I asked sigpwny for help, and rats_irl helped me write some code based on the link I showed him.
filenames is a variable that stores a list of the tsv files in a directory, and dataframes is a list. The for loop iterates through filenames and converts each tsv file into a DataFrame, which is appended to dataframes.
With this I can make a general function.
filenames = glob.glob("*.tsv")
dataframes = []
for f in filenames:
    dataframes.append(pd.read_csv(f,delimiter='\t',encoding='utf-8'))
print("name of the file = " + str(filenames[0]))
display(dataframes[0])
for i in range(len(dataframes)):
    display(dataframes[i])
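The loading steps above could be wrapped into a reusable function that accepts a directory, as mentioned earlier in this section. A minimal sketch follows; load_tsv_directory is a hypothetical name, and the function assumes every tsv file in the directory is tab-delimited with a header row, as in the loop above.

```python
import glob
import os

import pandas as pd


def load_tsv_directory(directory="."):
    """Return a dict mapping each tsv filename in `directory` to its DataFrame."""
    pattern = os.path.join(directory, "*.tsv")
    frames = {}
    # sorted() keeps the file order stable between runs
    for path in sorted(glob.glob(pattern)):
        frames[os.path.basename(path)] = pd.read_csv(path, delimiter="\t", encoding="utf-8")
    return frames


frames = load_tsv_directory(".")
for name, frame in frames.items():
    print(name, frame.shape)
```

Returning a dictionary keyed by filename keeps each DataFrame paired with its source file, so the two parallel lists (filenames and dataframes) no longer need to be indexed together.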