Udacity Project — A/B Testing
As part of my goal to change careers to enter the data field, I’m going back to school at Western Governors University for their Data Management/Data Analytics Bachelor’s Degree. The later classes in the program take you through Udacity’s Data Analyst Nanodegree, broken up into 5 classes. This is the project for class 2 of 5, the Practical Statistics class.
The data for this project consisted of 2 csv files. The first, ab_data.csv, showed a user ID, whether the user was in the control or treatment group, whether they got the new page or the old page, and whether they converted. The second, countries.csv, held the same ID and showed the country the user was from.
After importing the libraries, I read in the ab_data.csv file, then got some information about the dataset.
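Here is a minimal sketch of that first look at the data. It assumes pandas, and it uses a tiny inline stand-in instead of the real ab_data.csv so it runs on its own. The column names (user_id, group, landing_page, converted) and every value in the sample are assumptions for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for ab_data.csv. Column names and values are assumptions,
# not rows from the real dataset.
SAMPLE_CSV = """user_id,group,landing_page,converted
1,control,old_page,0
2,treatment,new_page,1
3,treatment,old_page,0
4,control,old_page,1
2,treatment,new_page,1
"""

# With the real file this would be: pd.read_csv("ab_data.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

n_rows = len(df)                                # total number of rows
n_unique_users = df["user_id"].nunique()        # distinct user IDs
overall_rate = df["converted"].mean()           # share of rows that converted
missing_values = int(df.isna().sum().sum())     # count of missing cells

print(df.head())
df.info()
print(n_rows, n_unique_users, overall_rate, missing_values)
```

Comparing the row count with the number of unique user IDs is a quick way to spot repeated users before any cleaning.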
There was also an issue with the data that needed to be looked at. If someone is in the treatment group, they should also have their landing page as new_page. I needed to find the rows where the group and the landing page did not line up, then drop those rows.
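One way to find and drop the mismatched rows is sketched below. The small DataFrame, the column names, and the variable name df2 are assumptions made for illustration. The idea is that a row is consistent only when "is in treatment" and "got new_page" are both true or both false.

```python
import pandas as pd

# Hypothetical sample; the real data comes from ab_data.csv.
df = pd.DataFrame({
    "user_id": [1, 2, 3, 4, 5, 6],
    "group": ["control", "treatment", "treatment",
              "control", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "old_page",
                     "new_page", "new_page", "old_page"],
    "converted": [0, 1, 0, 1, 0, 0],
})

in_treatment = df["group"] == "treatment"
got_new_page = df["landing_page"] == "new_page"

# A row is mismatched when exactly one of the two conditions holds.
mismatched = in_treatment != got_new_page
n_mismatched = int(mismatched.sum())

# Keep only the rows where group and landing page agree.
df2 = df[~mismatched].reset_index(drop=True)

print(n_mismatched)
print(df2)
```

This also catches the reverse case, where someone in the control group was shown the new page.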
Then I looked to see if there were any duplicate user_id rows. One was found, and it was dropped.
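A sketch of the duplicate check, again on made-up data. Keeping the first copy of a repeated user_id is an assumption here. Either copy could be kept, as long as only one row per user remains.

```python
import pandas as pd

# Hypothetical sample with one repeated user_id.
df2 = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "group": ["control", "treatment", "treatment", "control"],
    "landing_page": ["old_page", "new_page", "new_page", "old_page"],
    "converted": [0, 1, 1, 0],
})

# Show every row whose user_id appears more than once.
duplicate_rows = df2[df2["user_id"].duplicated(keep=False)]
print(duplicate_rows)

# Drop the repeat, keeping the first occurrence (an assumption).
df2 = df2.drop_duplicates(subset="user_id", keep="first").reset_index(drop=True)
print(df2)
```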
I looked into the probability of someone converting regardless of which page they received, then the probability of someone converting if they were in the treatment or control group.
I also looked at the split of people who got the new page vs the old page, to see if it was skewed in one direction or the other. I found there was a 50/50 split between people who got the new vs old page, about as even a divide as you can get.
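The probabilities and the page split described above can be sketched like this. The data is a hypothetical sample, so the numbers it prints are not the project's results.

```python
import pandas as pd

# Hypothetical cleaned sample: 4 control rows and 4 treatment rows.
df2 = pd.DataFrame({
    "group": ["control"] * 4 + ["treatment"] * 4,
    "landing_page": ["old_page"] * 4 + ["new_page"] * 4,
    "converted": [0, 1, 0, 1, 1, 0, 0, 0],
})

# Probability of converting regardless of page.
p_overall = df2["converted"].mean()

# Probability of converting within each group.
rate_by_group = df2.groupby("group")["converted"].mean()

# Share of users who received the new page.
p_new_page = (df2["landing_page"] == "new_page").mean()

print(p_overall)
print(rate_by_group)
print(p_new_page)
```

Because converted is coded as 0 or 1, the mean of the column is the conversion rate.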
A/B test
I’m going to say that the null hypothesis is that Pnew <= Pold, and the alternative hypothesis is that Pnew > Pold. Assuming Pnew = Pold, I came up with the following:
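A sketch of the values this setup needs. Using the pooled conversion rate for both pages is an assumption on my part about what "Assuming Pnew = Pold" means in practice. The sample data is hypothetical.

```python
import pandas as pd

# Hypothetical cleaned sample.
df2 = pd.DataFrame({
    "landing_page": ["old_page"] * 5 + ["new_page"] * 5,
    "converted": [0, 1, 0, 0, 0, 1, 0, 0, 1, 0],
})

# Under the null, both pages share one conversion rate.
# Assumption: that shared rate is the pooled rate over all rows.
p_new = p_old = df2["converted"].mean()

# Number of users who saw each page.
n_new = int((df2["landing_page"] == "new_page").sum())
n_old = int((df2["landing_page"] == "old_page").sum())

print(p_new, p_old, n_new, n_old)
```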
I then simulated a sample for the new_page_converted and old_page_converted based on n_new and n_old that I got above.
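One simulated sample under the null could look like the sketch below. It assumes NumPy, and the rate and sample sizes are placeholder values rather than the project's numbers.

```python
import numpy as np

rng = np.random.default_rng(42)  # fixed seed so the sketch is repeatable

# Placeholder values (assumptions, not the project's numbers).
p_null = 0.12
n_new, n_old = 1000, 1000

# Each user converts (1) or not (0) with the same probability on both pages.
new_page_converted = rng.binomial(1, p_null, size=n_new)
old_page_converted = rng.binomial(1, p_null, size=n_old)

# Difference in simulated conversion rates for this one sample.
sim_diff = new_page_converted.mean() - old_page_converted.mean()
print(sim_diff)
```

A single simulated difference says little by itself. It becomes useful once the draw is repeated many times.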
After that, I did the following:
Then I created a histogram from that data.
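The sketch below shows one common way to get from a simulation to a histogram and a p-value. It is an assumption about the approach, not a record of the exact steps: it repeats the simulation 10,000 times, bins the differences, and measures how often a simulated difference exceeds the observed one. Every number in it is a placeholder.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

# Placeholder values (assumptions, not the project's numbers).
p_null = 0.12
n_new, n_old = 1000, 1000
obs_diff = -0.001   # hypothetical observed difference (new rate - old rate)
n_sims = 10_000     # hypothetical number of simulated experiments

# Each draw is the number of conversions in one simulated group;
# dividing by the group size turns it into a conversion rate.
new_rates = rng.binomial(n_new, p_null, size=n_sims) / n_new
old_rates = rng.binomial(n_old, p_null, size=n_sims) / n_old
p_diffs = new_rates - old_rates

# Histogram counts; matplotlib's plt.hist(p_diffs) would draw the same bins.
counts, bin_edges = np.histogram(p_diffs, bins=30)

# One-sided p-value for the alternative Pnew > Pold.
p_value = float((p_diffs > obs_diff).mean())
print(p_value)
```

A p-value above 0.05 here means the observed difference is well within what the null alone would produce.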
Through all this, I’ve found that the p-value is above the 5% significance level (the Type I error rate I was willing to accept). In this case, we fail to reject the null hypothesis and keep the old version of the website. There is no statistically significant evidence that changing the site will result in more conversions.