To Serve or Not to Serve?
Using Exploratory Data Analysis and Hypothesis Testing to Decide Whether to Bartend or Serve Tables
This expedition began with a search for restaurant data. After having served and bartended for so long, I know restaurants generate a lot of data and have been eager to see how they can put it to use!
Specifically, I was ready to get my hands dirty with all types of crazy stuff like:
- Product Sales Mix
- Beverage Cost Calculators
- Table turn time
- Guest satisfaction
- Price points and menu mix
- Server competency
- Labor cost & food cost
Alas, apparently a lot of data is proprietary business intelligence, and not so easy to find. It’s like passing by a restaurant you can’t afford (yet…) — looks great, but totally inaccessible! So this dining data dive is BYOD (Bring Your Own Data).
A Brief Introduction to Tip Data
Tipped employees (mostly restaurant front-of-house workers…) are required by the IRS to track 100% of their tips in order to accurately declare their income. Yes, the restaurant will track our tips for us as well, but there are various opportunities for imperfection on their end. For example, the restaurant in this dataset only tracks credit card tips automatically, so it’s up to each server or bartender to calculate their own “tip-out”: the amount we share with the people who bring our food, the people who bus our tables, sometimes the bartender, and I’ve heard in some places the host as well. I could write a million theses on the topic, but apparently data analysis is about shrinking the scope of analysis, not expanding it.
TL;DR — The IRS makes me do data entry every shift I work! It ends up on a sheet like this, except handwritten:
As a huge fan of productivity, I was NOT a fan of writing data down with pen and paper in 2021. How useless! So I started using the ServerLife app (HIGHLY recommend for all my industry folk out there!) a while ago. The main motivation for collecting this data effectively was to ensure that I was making the most money possible per shift and per hour worked. After “graduating” from server status to bartender status, and as a bartender evaluating management opportunities, I needed to be able to quickly analyze my tips to make the most prudent choices about which shifts to work. The ServerLife app captured SO much detail, so that’s the dataset I have now.
The major caveat here is that this was manual data entry. The data entered will reflect the priority I placed on tips taken home per hour worked. Before analyzing the data, my personal “business” question was something like:
How can I make the most money?
(Trick question — the real answer is ALL of them :-P ). …that’s not a real business question. Lacks scope, context, specificity, resource restrictions…
For now, let’s just see what it looks like…
From here we see 24 columns! Since I can’t see them all at once, I’ll print them:
Looking at these columns I’ve decided to remove several that I know are irrelevant:
It looks like the Hourly Wage column is only partially filled in, from the times that I manually entered the tipped minimum wage. So I’m going to eliminate it and create a new “Hourly” column: a constant hourly wage of 3.89 plus tips per hour (tips earned divided by the number of hours worked).
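The new column can be sketched as hourly = 3.89 + tips / hours. Here is a minimal pandas version, assuming hypothetical column names `Tips` and `Hours` (the real ServerLife export may name them differently) and a few made-up rows rather than my actual shifts:

```python
import pandas as pd

TIPPED_MINIMUM_WAGE = 3.89  # constant hourly base wage from the text

# Made-up shifts; the column names "Tips" and "Hours" are assumptions
shifts = pd.DataFrame({
    "Hourly Wage": [3.89, None, None],  # partially filled, as described
    "Tips": [120.0, 80.0, 200.0],
    "Hours": [6.0, 4.0, 8.0],
})

# Drop the partially filled column and rebuild a consistent one
shifts = shifts.drop(columns=["Hourly Wage"])
shifts["Hourly"] = TIPPED_MINIMUM_WAGE + shifts["Tips"] / shifts["Hours"]

print(shifts)
```

A shift logged with 0 hours would produce an infinite value here, so those rows would need to be filtered out first.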
And finally, I’ll create and explore a new pandas dataframe with just what I think will be helpful:
Whew! A lot of data cleaning to do there…I would never hire myself for data entry if I received something like this!
Cleaning the data allowed for great visualizations and summary statistics. After having bartended and served for long enough, I’ve developed certain intuitions about which shifts are more lucrative or worth my time. While this does introduce a certain amount of bias into the analysis, I wanted to see what new information I might be able to glean from the data. I learned a few things:
From the pairplot
- Data entry was lacking! Looking at the graphs in the “Covers” column, there are a ton of values at 0. Sometimes I just didn’t have the time to count my exact covers, or I decided it was unimportant for the IRS. Unfortunately, this means that I cannot dig further into sales per cover (e.g., how many Mango Margaritas did I get my friends to buy in one sitting) or the number of covers per hour (e.g., how efficient/productive I was during the shift).
- For most pairs of quantitative variables, the points cluster tightly around what looks like a linear trend at small and mid-size values, then become more dispersed as the values get higher.
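The extent of the missing covers can be checked directly instead of eyeballing the plot. A small sketch with made-up cover counts (only the `Covers` column name comes from the dataset; the values are invented):

```python
import pandas as pd

# Made-up cover counts, not the real data
shifts = pd.DataFrame({"Covers": [0, 42, 0, 35, 0, 51]})

# Count shifts where covers were never entered (logged as 0)
zero_covers = (shifts["Covers"] == 0).sum()
share_missing = zero_covers / len(shifts)

print(f"{zero_covers} of {len(shifts)} shifts have 0 covers ({share_missing:.0%})")
```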
From using Pandas’ groupby function (by position, shift)
- The one manager data point was from a different restaurant, but the amount paid per shift was the same. So while I would normally toss it, in this case I kept it since it might provide a decent benchmark in wage comparisons when deciding whether or not to jump from bartender/server to manager.
- The hourly mean is not nearly as far apart as I had expected between positions! The difference is possibly not even statistically significant, something we can test in a couple of steps.
- Between shifts, PM close clearly stole the show, but not more than ^ away from the mean of lunch close. Further, the standard deviation on PM close shifts demonstrates that the range of expected tips varies greatly from the center. As one classmate put it: just a glance at the deviation seen in PM close demonstrates that the shift is high risk, high reward.
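The grouped summaries behind these observations can be sketched roughly like this. The column names `Position`, `Shift`, and `Hourly` are assumptions, and the rows are made up for illustration, so the numbers will not match mine:

```python
import pandas as pd

# Made-up shifts; column names are assumptions, values are invented
shifts = pd.DataFrame({
    "Position": ["Bartender", "Bartender", "Server", "Server", "Server", "Bartender"],
    "Shift": ["PM close", "Lunch close", "PM close", "PM close", "Lunch close", "PM close"],
    "Hourly": [30.0, 22.0, 35.0, 15.0, 20.0, 26.0],
})

# Mean shows the typical hourly rate; std shows how risky the shift is
by_position = shifts.groupby("Position")["Hourly"].agg(["count", "mean", "std"])
by_shift = shifts.groupby("Shift")["Hourly"].agg(["count", "mean", "std"])

print(by_position)
print(by_shift)
```

Keeping `count` next to `mean` and `std` makes it obvious when a group (like the single manager shift) rests on too few rows to say much.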
My last steps after importing these modules and observing the data provide a foundation upon which to continue statistical analysis. Questions I may try to answer:
1. Is there a relationship between sales, hourly
2. Is lunch close more lucrative than pm close, based on hourly?
3. Regression analysis — is the extent in variance between certain items strong (or even existent)?
4. Hypothesis Testing — Do bartenders make more than servers?
Going with number 4: H0: there is no difference in hourly wage between bartenders and servers; H1: there is a difference in hourly wage.
This data is much cleaner, and keeps only the shifts worked as bartender or server, enabling hypothesis testing through scipy:
Because the p-value of .16 is higher than the alpha threshold of .05, we CANNOT reject the null hypothesis. In other words, this data shows no statistically significant difference in hourly wage between serving and bartending.
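For anyone who wants to try the same kind of test, here is a minimal sketch on made-up hourly numbers (not my data, so the p-value will differ from the .16 above). It assumes a two-sided, two-sample t-test via `scipy.stats.ttest_ind`, and the Welch variant (`equal_var=False`) is my choice for the illustration, since the two groups may not share the same variance:

```python
from scipy import stats

ALPHA = 0.05  # significance threshold from the text

# Made-up hourly earnings per shift, for illustration only
bartender_hourly = [28.5, 31.0, 24.0, 35.5, 22.0, 29.0]
server_hourly = [26.0, 30.5, 21.0, 33.0, 25.5, 27.0]

# Two-sided Welch's t-test; equal_var=False is an assumption of this sketch
result = stats.ttest_ind(bartender_hourly, server_hourly, equal_var=False)

if result.pvalue < ALPHA:
    print(f"p = {result.pvalue:.3f}: reject H0")
else:
    print(f"p = {result.pvalue:.3f}: fail to reject H0")
```

Failing to reject H0 does not prove the two positions pay the same; it only means this sample does not give enough evidence of a difference.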