Nutrient Density and Value Data Visualization
This is a project I made to analyze the amount of protein in the various animal and vegetable products people can consume.
Table of Contents
- 1 Nutrient Density and Value Data Visualization
- 2 About
- 3 Importing libraries
- 4 Raw Data
- 5 Largest values
- 6 Plotting Function
- 7 Charts
- 7.1 All Energy Values Comparison
- 7.2 All Protein Values Comparison
- 7.3 All Fat Values Comparison
- 7.4 All Saturated Fat Values Comparison
- 7.5 All Cholesterol Values Comparison
- 7.6 All Vitamin Values Comparison
- 7.7 All Sodium Values Comparison
- 7.8 All Phosphorus Values Comparison
- 7.9 All Iron Values Comparison
- 7.10 All Zinc Values Comparison
- 7.11 All Retail Cost Comparison
- 8 Testing
- 9 Previous versions of functions
About
This is a project I made to analyze the amount of protein in the various animal and vegetable products people can consume.
In this post, I learned the basics of using matplotlib. A future post might refine this one.
The data was imported from Bohrer, B. M. (2017). Review: Nutrient density and nutritional value of meat products and non-meat foods high in protein. Trends in Food Science & Technology, 65, 103-112. doi:10.1016/j.tifs.2017.04.016.
Importing libraries
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Reset matplotlib to its default settings
plt.rcdefaults()
Raw Data
df1 = pd.read_csv("Nutrition.csv", encoding= 'unicode_escape')
df2 = pd.read_csv("nutrition-cost.csv", encoding= 'unicode_escape')
# categories
meat = "Meat, raw/unprepared unless noted otherwise"
fish = "Fish, raw/unprepared"
non_meat = "Non-meat, raw/unprepared"
from IPython.display import display, HTML
display(HTML("<style>.container { width:100% !important; }</style>"))
df1
df2
Largest values
df2.nlargest(n = 10, columns = "Energy value (kcal/US$)")
Plotting Function
This is the function I wrote to plot nutrient columns such as energy value and protein content.
Arguments
The arguments are a dataframe, a string called category, and a string called columnTitle. The category argument determines whether the function plots one specific protein product category or all products. The columnTitle argument selects the column to plot.
What Occurs During a Function Call
When the function is called, the variables skipCheck, categoryCase, and columnTitleCase are initialized. The variables that end in Case convert the lowercase argument to title case so that it matches the dictionary entries. Next, dictionaries are initialized for the categories of protein products and for the column names. An if statement then checks which dataframe we are processing: since df2 has the retail cost, we look for the retail cost column. If the column exists, columnDictionary is set to columnCostDictionary; otherwise it defaults to columnRegularDictionary. After choosing the dictionary, we check whether the category argument exists in categoryDictionary. If category is neither "all" nor in categoryDictionary, the function exits with a message telling the user the argument was misspelled.
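The selection and validation steps described above can be tried in isolation. The following is only a minimal sketch: choose_column_dictionary and validate_category are hypothetical helper names that do not appear in the notebook, the dictionaries are shortened, and raising a ValueError for a misspelled category is an assumption about how one might want the function to exit.

```python
# Shortened copies of the lookup tables (illustrative only).
CATEGORY_DICTIONARY = {
    "Meat": "Meat, raw/unprepared unless noted otherwise",
    "Fish": "Fish, raw/unprepared",
    "Non_Meat": "Non-meat, raw/unprepared",
}
COLUMN_REGULAR_DICTIONARY = {
    "Energy Value": "Energy value (kcal)",
    "Protein": "Protein (g)",
}
COLUMN_COST_DICTIONARY = {
    "Retail Cost": "Retail Cost/100\xa0g (US$)",
    "Protein": "Protein (g/US$)",
}


def choose_column_dictionary(column_names):
    """Hypothetical helper: pick the cost dictionary only if the retail cost column exists."""
    if "Retail Cost/100\xa0g (US$)" in column_names:
        return COLUMN_COST_DICTIONARY
    return COLUMN_REGULAR_DICTIONARY


def validate_category(category):
    """Hypothetical helper: title-case the argument and reject unknown categories."""
    category_case = category.title()  # "non_meat" becomes "Non_Meat"
    if category_case != "All" and category_case not in CATEGORY_DICTIONARY:
        raise ValueError("spelling error in category argument: " + category_case)
    return category_case


# A dataframe without the retail cost column falls back to the regular dictionary.
print(choose_column_dictionary(["Product", "Category", "Protein (g)"])["Protein"])
print(validate_category("non_meat"))
```

Passing a list of column names instead of a dataframe keeps the sketch free of pandas; with a real dataframe, df.columns can be passed in the same way.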
Once verification is finished, we set the plot-specific settings. The first if statement checks whether categoryCase (the first letter of every word capitalized) is in categoryDictionary. If categoryCase matches Meat, Fish, or Non_Meat, the plot covers only that specific protein product category. Otherwise, we use the settings for a graph containing all protein products.
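The filter-then-sort step can be illustrated on a tiny dataframe. This is a sketch under stated assumptions: the products and protein values are invented for the example (the real values come from Nutrition.csv), and select_rows is a hypothetical helper name, not part of the notebook.

```python
import pandas as pd

# Toy data invented for illustration only.
df = pd.DataFrame(
    {
        "Product": ["Product A", "Product B", "Product C", "Product D"],
        "Category": [
            "Fish, raw/unprepared",
            "Meat, raw/unprepared unless noted otherwise",
            "Fish, raw/unprepared",
            "Non-meat, raw/unprepared",
        ],
        "Protein (g)": [20.0, 26.0, 18.0, 8.0],
    }
)

category_dictionary = {"Fish": "Fish, raw/unprepared"}


def select_rows(df, category_case, column):
    """Hypothetical helper: keep one category if it is known, then sort ascending."""
    if category_case in category_dictionary:
        df = df[df["Category"] == category_dictionary[category_case]]
    return df.sort_values(by=[column])


fish_only = select_rows(df, "Fish", "Protein (g)")
everything = select_rows(df, "All", "Protein (g)")
print(fish_only["Product"].tolist())   # only the two fish rows, lowest protein first
print(everything["Product"].tolist())  # all four rows, lowest protein first
```

Sorting in ascending order means the largest bar ends up at the top of a horizontal bar chart, because matplotlib draws the first row at the bottom.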
With the plot-specific settings complete, we set the general plot settings and return the plot object. Outside of the function, we use the .show() command to display the plot we generated. I left .show() outside of the function so that I could prevent myself from running out of CPU.
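The idea of building a chart inside a function and displaying it outside can also be sketched with matplotlib's object-oriented interface. This is an alternative shown for illustration, not the approach used in this post: make_bar_chart is a hypothetical name, the data is invented, and the Agg backend is selected only so the sketch runs without a display (in a notebook that line would be dropped and plt.show() called instead).

```python
import matplotlib

matplotlib.use("Agg")  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt


def make_bar_chart(labels, values, title):
    """Hypothetical helper: build a horizontal bar chart and return it without showing it."""
    fig, ax = plt.subplots()
    positions = list(range(len(labels)))
    ax.barh(positions, values, align="center", alpha=0.5)
    ax.set_yticks(positions)
    ax.set_yticklabels(labels)
    ax.set_title(title)
    return fig


fig = make_bar_chart(["Product A", "Product B"], [10.0, 25.0], "Toy Comparison")
bar_count = len(fig.axes[0].patches)  # one rectangle per bar
chart_title = fig.axes[0].get_title()

# The caller decides what happens next: show the figure, save it, or close it.
plt.close(fig)
```

Returning the figure rather than the pyplot module lets the caller close each chart explicitly, which helps when many charts are generated in a row.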
def plotNutrition(df, category, columnTitle):
    """
    Plot a horizontal bar chart from the following arguments:
    df, dataframe to be plotted
    category, the following arguments are allowed:
        All
        Meat
        Fish
        Non_Meat
    columnTitle, short name of the column to be plotted
    """
    # Variable initialization
    skipCheck = False
    categoryCase = category.title()
    columnTitleCase = columnTitle.title()
    # Dictionaries to reference
    categoryDictionary = {"Meat": "Meat, raw/unprepared unless noted otherwise", "Fish": "Fish, raw/unprepared", "Non_Meat" : "Non-meat, raw/unprepared"}
    columnRegularDictionary = {"Energy Value" : "Energy value (kcal)", "Protein": "Protein (g)", "Fat" : "Fat (g)", "Saturated Fat" : "Saturated fat (g)", "Cholesterol" : "Cholesterol (mg)", "Vitamin" : "Vitamin B12 (mcg)", "Sodium" :"Na (mg)", "Phosphorus" : "P (mg)", "Iron" : "Fe (mg)", "Zinc" : "Zn (mg)"}
    columnCostDictionary = {"Retail Cost" : "Retail Cost/100\xa0g (US$)", "Energy Value" : "Energy value (kcal/US$)", "Protein" : "Protein (g/US$)", "Vitamin" : "Vitamin B12 (mcg/US$)", "Phosphorus" : "P (mg/US$)", "Iron" : "Fe (mg/US$)", "Zinc" : "Zn (mg/US$)" }
    # Check which dataframe we are using; if it has the retail cost column, use the cost dictionary
    allList = df.columns.tolist()
    if "Retail Cost/100\xa0g (US$)" in allList:
        columnDictionary = columnCostDictionary
    else:
        columnDictionary = columnRegularDictionary
    # Check that the category argument exists in the dictionary; "All" skips the check
    if categoryCase == "All":
        skipCheck = True
    if categoryCase not in categoryDictionary and not skipCheck:
        # Raise instead of returning None so the caller's .show() does not fail with a confusing error
        raise ValueError("spelling error in category argument, your category argument was " + categoryCase)
    # Create the figure first so the bars are drawn on it at the requested dpi
    plt.figure(dpi=900)
    # If the category name is in the category dictionary, plot only that category
    if categoryCase in categoryDictionary:
        df_specific = df[df["Category"] == categoryDictionary[categoryCase]].sort_values(by = [columnDictionary[columnTitleCase]])
        objects = df_specific["Product"]
        performance = df_specific[columnDictionary[columnTitleCase]]
    else:
        # Otherwise plot every product, with smaller labels so they all fit
        df = df.sort_values(by = [columnDictionary[columnTitleCase]])
        objects = df["Product"]
        performance = df[columnDictionary[columnTitleCase]]
        plt.tick_params(axis='y', which='major', labelsize=6)
        #plt.tick_params(axis='x', which='major', labelsize=6)
    # General plot settings
    y_pos = np.arange(len(objects))
    plt.barh(y_pos, performance, align='center', alpha=0.5)
    plt.yticks(y_pos, objects)
    plt.xticks(rotation=45)
    plt.xlabel(columnDictionary[columnTitleCase])
    plt.title(categoryCase + ' ' + columnTitleCase + ' Comparison')
    return plt
Charts
plotNutrition(df1, "all", "energy value").show()
plotNutrition(df1, "meat", "energy value").show()
plotNutrition(df1, "fish", "energy value").show()
plotNutrition(df1, "non_meat", "energy value").show()
plotNutrition(df1, "all", "protein").show()
plotNutrition(df1, "meat", "protein").show()
plotNutrition(df1, "fish", "protein").show()
plotNutrition(df1, "non_meat", "protein").show()
plotNutrition(df1, "all", "fat").show()
plotNutrition(df1, "meat", "fat").show()
plotNutrition(df1, "fish", "fat").show()
plotNutrition(df1, "non_meat", "fat").show()
plotNutrition(df1, "all", "saturated fat").show()
plotNutrition(df1, "meat", "saturated fat").show()
plotNutrition(df1, "fish", "saturated fat").show()
plotNutrition(df1, "non_meat", "saturated fat").show()
plotNutrition(df1, "all", "Cholesterol").show()
plotNutrition(df1, "meat", "Cholesterol").show()
plotNutrition(df1, "fish", "Cholesterol").show()
plotNutrition(df1, "non_meat", "Cholesterol").show()
plotNutrition(df1, "all", "Vitamin").show()
plotNutrition(df1, "meat", "Vitamin").show()
plotNutrition(df1, "fish", "Vitamin").show()
plotNutrition(df1, "non_meat", "Vitamin").show()
plotNutrition(df1, "all", "Sodium").show()
plotNutrition(df1, "meat", "Sodium").show()
plotNutrition(df1, "fish", "Sodium").show()
plotNutrition(df1, "non_meat", "Sodium").show()
plotNutrition(df1, "all", "Phosphorus").show()
plotNutrition(df1, "meat", "Phosphorus").show()
plotNutrition(df1, "fish", "Phosphorus").show()
plotNutrition(df1, "non_meat", "Phosphorus").show()
plotNutrition(df1, "all", "Iron").show()
plotNutrition(df1, "meat", "Iron").show()
plotNutrition(df1, "fish", "Iron").show()
plotNutrition(df1, "non_meat", "Iron").show()
plotNutrition(df1, "all", "Zinc").show()
plotNutrition(df1, "meat", "Zinc").show()
plotNutrition(df1, "fish", "Zinc").show()
plotNutrition(df1, "non_meat", "Zinc").show()
allRetailCost = ["retail cost", "energy value", "protein", "vitamin", "phosphorus", "iron", "zinc"]
# Plot every cost column for all products and for each category
for columnTitle in allRetailCost:
    for category in ["all", "meat", "fish", "non_meat"]:
        plotNutrition(df2, category, columnTitle).show()
Testing
test1 = False
test2 = False
name = "meat"
name2 = "Saturated Fat"
ref2 = {"Meat": "Meat, raw/unprepared unless noted otherwise", "Fish": "Fish, raw/unprepared", "Non_meat" : "Non-meat, raw/unprepared"}
if name.title() in ref2:
    test1 = True
print(test1)
ref = {"meat": "Meat, raw/unprepared unless noted otherwise", "fish": "Fish, raw/unprepared", "non_meat" : "Non-meat, raw/unprepared"}
ref1 = {meat: "Meat", fish: "Fish", non_meat: "Non_meat"}
ref2 = {"Meat, raw/unprepared unless noted otherwise" : "meat", "Fish": "Fish, raw/unprepared", "Non_meat" : "Non-meat, raw/unprepared"}
ref3 = {"retail cost" : "Retail Cost/100 g (US$)", "energy value" : "Energy value (kcal/US$)", "protein" : "Protein (g/US$)", "vitamin" : "Vitamin B12 (mcg/US$)", "phosphorus" : "P (mg/US$)", "iron" : "Fe (mg/US$)", "zinc" : "Zn (mg/US$)" }
ref4 = {"Energy Value" : "Energy value (kcal)", "Protein": "Protein (g)", "Fat" : "Fat (g)", "Saturated Fat": "Saturated fat (g)", "Cholesterol" : "Cholesterol (mg)", "Vitamin B12" : "Vitamin B12 (mcg)", "Sodium" : "Na (mg)", "Phosphorus" : "P (mg)", "Iron" : "Fe (mg)", "Zinc" : "Zn (mg)"}
#ref1["Meat, raw/unprepared unless noted otherwise"]
ref["meat"]
ref3["energy Value".casefold()]
ref4["Saturated Fat"]
if name2.title() in ref4:
    test1 = True
print(test1)
category = "meat"
columnTitle = "retail cost"
categoryCase = category.casefold()
columnTitleCase = columnTitle.casefold()
categoryDictionary = {"meat": "Meat, raw/unprepared unless noted otherwise", "fish": "Fish, raw/unprepared", "non_meat" : "Non-meat, raw/unprepared"}
columnDictionary = {"retail cost" : "Retail Cost/100 g (US$)", "energy value" : "Energy value (kcal/US$)", "protein" : "Protein (g/US$)", "vitamin" : "Vitamin B12 (mcg/US$)", "phosphorus" : "P (mg/US$)", "iron" : "Fe (mg/US$)", "zinc" : "Zn (mg/US$)" }
#df_specific = df2[df2["Category"] == categoryDictionary[categoryCase]].sort_values(by = [columnDictionary[columnTitleCase]])
#df_specific
columnDictionary[columnTitleCase]
allList = df2.columns.tolist()
if "Retail Cost/100\xa0g (US$)" in allList:
    print("worked")
else:
    print("didn't work")
allList
p= []
p.append(ref)
df3 = pd.DataFrame(p)
df3
if name in p:
    test1 = True
Previous versions of functions
These are previous iterations of the plotNutrition() function. I decided to keep them at the end in case you were curious about the evolution of the function. When I started this project, I had an early deadline. I started with a longer function so that I could plot graphs for specific columns.
At the time, I did not realize that lines 11-15 were relatively the same. I was also not very comfortable with substituting dictionary entries for strings. When I had more time, I started to work on making the plot function more modular. In the final product you will notice that I use formal dictionary calls.
Dictionary calls make the code more modular, but they also abstract the code more. If you are not careful about keeping track of your dictionary calls, diagnosing errors in the code becomes much harder.
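One way to keep dictionary calls easier to diagnose is to wrap the lookup so that a miss reports the valid keys. This is a minimal sketch rather than code from the notebook: lookup_column is a hypothetical helper name and the dictionary is shortened.

```python
# Shortened copy of the column lookup table (illustrative only).
COLUMN_DICTIONARY = {
    "Energy Value": "Energy value (kcal)",
    "Protein": "Protein (g)",
}


def lookup_column(column_title):
    """Hypothetical helper: translate a short name to a column name, listing valid keys on a miss."""
    key = column_title.title()
    try:
        return COLUMN_DICTIONARY[key]
    except KeyError:
        valid = ", ".join(sorted(COLUMN_DICTIONARY))
        raise KeyError(f"{key!r} is not a known column title; expected one of: {valid}") from None


print(lookup_column("protein"))
```

With a bare dictionary call, a misspelled title raises a KeyError that shows only the bad key; the wrapper also shows which keys would have worked.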
def plotEV(df, name):
    ref = {"meat": "Meat, raw/unprepared unless noted otherwise", "fish": "Fish, raw/unprepared", "non_meat" : "Non-meat, raw/unprepared"}
    caseInsensitiveName = name.casefold()
    if caseInsensitiveName in ref:
        df_specific = df[df["Category"] == ref[caseInsensitiveName]].nsmallest(n = len(df), columns = "Energy value (kcal)")
        objects = df_specific["Product"]
        y_pos = np.arange(len(objects))
        performance = df_specific["Energy value (kcal)"]
        plt.barh(y_pos, performance, align='center', alpha=0.5)
        plt.yticks(y_pos, objects)
        plt.xlabel('Energy Value (kcal)')
        plt.title(name.title() + ' Energy Values Comparison')
        plt.figure(dpi=300)  # note: opens a new, empty figure
    else:
        df = df.nsmallest(n = len(df), columns = "Energy value (kcal)")
        objects = df["Product"]
        y_pos = np.arange(len(objects))
        performance = df["Energy value (kcal)"]
        plt.barh(y_pos, performance, align='center', alpha=0.5)
        plt.yticks(y_pos, objects)
        plt.xlabel('Energy Value (kcal)')
        plt.title('All Energy Values Comparison')
        plt.tick_params(axis='y', which='major', labelsize=6)
        plt.figure(dpi=300)  # note: opens a new, empty figure
    return plt
def plotProtein(df, name):
    ref = {"meat": "Meat, raw/unprepared unless noted otherwise", "fish": "Fish, raw/unprepared", "non_meat" : "Non-meat, raw/unprepared"}
    if name in ref:
        df_specific = df[df["Category"] == ref[name]].nsmallest(n = len(df), columns = "Protein (g)")
        objects = df_specific["Product"]
        y_pos = np.arange(len(objects))
        performance = df_specific["Protein (g)"]
        plt.barh(y_pos, performance, align='center', alpha=0.5)
        plt.yticks(y_pos, objects)
        plt.xlabel('Protein (g)')
        plt.title(name.title() + ' Protein Values Comparison')
        plt.figure(dpi=300)  # note: opens a new, empty figure
    else:
        df = df.nsmallest(n = len(df), columns = "Protein (g)")
        objects = df["Product"]
        y_pos = np.arange(len(objects))
        performance = df["Protein (g)"]
        plt.barh(y_pos, performance, align='center', alpha=0.5)
        plt.yticks(y_pos, objects)
        plt.xlabel('Protein (g)')
        plt.title('All Protein Values Comparison')
        plt.tick_params(axis='y', which='major', labelsize=5)
        plt.figure(dpi=300)  # note: opens a new, empty figure
    return plt