Forecasting eCommerce Sales - Using data science to add value to your digital data
Posted 16 months ago
Collecting historical analytics data can provide extremely valuable insights into how your business has been performing. However, organisations are realising that it is more and more crucial to consider not only how the business has been performing, but also how it will perform in the future.
Imagine for a second that your business is a car and you’re behind the wheel - to steer yourself successfully from A to B you will definitely need your mirrors, but the most important place you should be looking is at what is in front of you!
So, what sort of questions can forecasting help me solve?
There is a whole range of both simple and complex business questions that can be solved with forecasting, such as:
What is an effective stretch target to set my sales team next month?
I want to justify hiring additional employees, but how can I know what is likely to happen to our sales results next quarter?
Every rainy day it feels like our sales tank. How can we measure the impact weather will have on our sales next month?
Will our sales keep trending up next month similar to the last few months?
I can kind of guess what our sales will be next month, but how much variation is there in this guess?
What is our likely worst case scenario next month?
Let’s get forecasting!
Now I am going to show you some simple forecasting methods. I’m going to use the Google Analytics demo account, which shows data from the Google Merchandise Store (which sells Google-branded merch, funnily enough!). This account is really helpful for our examples because its data is a clear representation of how an eCommerce site’s data typically looks.
So as you can see below I’ve selected two months of daily transactions for July and August.
From the two months of data I’ve selected I want to generate a forecast for September.
Let’s start with the most recent day of transactions as this is the ‘freshest’ data. This should be the most reliable, right?
Not really, this is actually a very ‘naive’ way of forecasting. While this method can work well in some contexts, such as for financial markets, it doesn’t quite work for the majority of businesses. This is because it is too sensitive to daily fluctuations.
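To make the 'naive' idea concrete, here is a minimal Python sketch. The function name and the transaction counts are made up for illustration and are not taken from the demo account.

```python
def naive_forecast(history, horizon):
    """Repeat the most recent observation for every future day."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [history[-1]] * horizon


# Hypothetical daily transaction counts (illustrative only).
daily_transactions = [42, 38, 45, 51, 47, 20, 18]
naive = naive_forecast(daily_transactions, horizon=3)
print(naive)  # [18, 18, 18]
```

Because the last day here happens to be a quiet one, the whole forecast is dragged down with it, which is exactly the sensitivity to daily fluctuations described above.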
Why don’t we try to take the average of the data we have to make our forecast more ‘robust’?
Now this looks more reasonable! It also matches our rough guess of a typical day of transactions based on our historical data.
Although, there is still one clear weakness - the line is completely flat. Obviously daily sales numbers fluctuate, with some days having lots of transactions, and others having relatively few.
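The flat average forecast discussed above can be sketched in a few lines of Python. Again, the function name and the numbers are hypothetical and only there to show the mechanics.

```python
from statistics import mean


def mean_forecast(history, horizon):
    """Forecast every future day as the average of all observed days."""
    if not history:
        raise ValueError("history must contain at least one observation")
    return [mean(history)] * horizon


# Hypothetical daily transaction counts (illustrative only).
daily_transactions = [40, 38, 46, 50, 46, 20, 12]
flat = mean_forecast(daily_transactions, horizon=3)
print(flat)
```

Every forecast value is identical, which is why the line on the chart is completely flat.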
The next question to ask is whether there is any pattern to these fluctuations.
Given we have daily data, we might assume our data follows some sort of weekly seasonal cycle. For sales and consumer behaviour this assumption is fairly reasonable - our purchase behaviour will be different on weekdays compared to weekends etc.
Now let’s try grouping our historical sales data by the day of week the transaction occurred on and compare them.
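As a rough sketch of that grouping step, the Python below averages daily counts by day of week. The start date, the counts and the helper name are assumptions made for illustration. In practice you would export the real dates and transactions from your analytics tool.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def average_by_weekday(start, values):
    """Average daily values by day of week; values[i] is for start + i days."""
    buckets = defaultdict(list)
    for offset, value in enumerate(values):
        day = start + timedelta(days=offset)
        buckets[DAY_NAMES[day.weekday()]].append(value)
    return {name: mean(counts) for name, counts in buckets.items()}


# Two hypothetical weeks of daily transactions, starting on a Monday.
start_date = date(2018, 7, 2)
transactions = [48, 52, 50, 47, 44, 21, 19,
                51, 49, 53, 46, 42, 23, 17]
weekday_averages = average_by_weekday(start_date, transactions)
for name in DAY_NAMES:
    print(name, weekday_averages[name])
```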
Clearly weekends see fewer transactions than weekdays. Armed with this new insight we can make our forecast more accurate. Instead of basing our forecast on yesterday’s data, let’s make each forecast day equal to the same day of the week in the most recent week of data we observed. This will help us capture the ‘seasonal’ pattern in the data.
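This 'same day last week' approach is commonly called a seasonal naive forecast. A minimal Python sketch, assuming a 7-day season and using made-up numbers, could look like this:

```python
def seasonal_naive_forecast(history, horizon, season_length=7):
    """Forecast each day as the value from the same position in the last season."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[step % season_length] for step in range(horizon)]


# Two hypothetical weeks of daily transactions (illustrative only).
transactions = [48, 52, 50, 47, 44, 21, 19,
                51, 49, 53, 46, 42, 23, 17]
seasonal_naive = seasonal_naive_forecast(transactions, horizon=10)
print(seasonal_naive)
```

The forecast simply cycles through the final seven observations, so the first week of history has no influence at all.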
This looks great! It picks up the most recent week’s variation nicely.
But if we look closely we can see it is effectively a copy of last week’s data. While we know intuitively more recent data is better than old, stale data, there is also merit in using historical data to capture longer term trends and patterns.
It would be nice to use a forecasting method that blends long term trends with the recent, and more short-term movements.
Enter the Holt-Winters Forecasting method!!!
This method is super useful as it is based on exponential smoothing. This works like a weighted average of your historical data, where the weights decay exponentially as we go back in time. We can control (and even algorithmically determine) the parameters that influence this smoothing. This in turn allows us to change the relative importance of recent versus historical data in our forecast. Holt-Winters also lets us include the effects of the long term trend and seasonal patterns.
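To show the mechanics, here is a simplified sketch of additive Holt-Winters smoothing in plain Python. It is not the implementation behind the charts in this post (the references point to the R forecast package). The smoothing parameters `alpha`, `beta` and `gamma` are arbitrary illustrative values rather than fitted ones, the initialisation is a simple heuristic, and the data is made up.

```python
def holt_winters_additive(history, horizon, season_length=7,
                          alpha=0.3, beta=0.1, gamma=0.2):
    """Additive Holt-Winters with a heuristic start from the first two seasons."""
    m = season_length
    if len(history) < 2 * m:
        raise ValueError("need at least two full seasons of history")

    first, second = history[:m], history[m:2 * m]
    level = sum(first) / m                          # average of the first season
    trend = (sum(second) - sum(first)) / (m * m)    # average change per day
    seasonal = [y - level for y in first]           # deviation from the level

    # Update level, trend and seasonal terms one observation at a time.
    for t in range(m, len(history)):
        y = history[t]
        prev_level, prev_trend, prev_seasonal = level, trend, seasonal[t % m]
        level = alpha * (y - prev_seasonal) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = (gamma * (y - prev_level - prev_trend)
                           + (1 - gamma) * prev_seasonal)

    # Forecast: extend the trend and reuse the latest seasonal term for each day.
    n = len(history)
    return [level + h * trend + seasonal[(n + h - 1) % m]
            for h in range(1, horizon + 1)]


# Hypothetical weekly pattern repeated for four weeks (illustrative only).
weekly_pattern = [50, 52, 49, 47, 45, 22, 18]
hw_forecast = holt_winters_additive(weekly_pattern * 4, horizon=7)
print([round(value, 1) for value in hw_forecast])
```

Values of `alpha`, `beta` and `gamma` closer to 1 put more weight on recent observations, while values closer to 0 lean more on older history. In practice these parameters are usually estimated from the data rather than chosen by hand.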
Nice! Now we get a sense of the weekly seasonal patterns. The forecast also captures the recent slight downtrend in the data and extrapolates it forward.
One final touch is to recognise that, from a statistical perspective, this forecast line just represents the mean or average of a range of possible outcomes. So while it’s useful, we know this line is never going to be completely accurate. We can represent this uncertainty using prediction intervals. Here they are set at the 80% and 95% levels (represented by the dark blue and light blue regions respectively).
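As a rough illustration of where such bands come from, the sketch below builds a one-step-ahead interval from the spread of past forecast errors. It assumes the errors are roughly normal and uncorrelated, and the residuals and point forecast are made-up numbers. Forecasting packages use more careful, method-specific formulas, and the intervals typically widen the further ahead you forecast.

```python
from statistics import NormalDist, stdev


def prediction_interval(point_forecast, residuals, coverage=0.95):
    """One-step interval: point forecast +/- z * standard deviation of residuals."""
    sigma = stdev(residuals)
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.28 for 80%, 1.96 for 95%
    return point_forecast - z * sigma, point_forecast + z * sigma


# Hypothetical past forecast errors (actual minus forecast) and point forecast.
past_errors = [-4, 3, -2, 5, -1, 2, -3]
low80, high80 = prediction_interval(40, past_errors, coverage=0.80)
low95, high95 = prediction_interval(40, past_errors, coverage=0.95)
print(round(low80, 1), round(high80, 1))
print(round(low95, 1), round(high95, 1))
```

The 95% band is always wider than the 80% band, which matches the lighter region sitting outside the darker one on the chart.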
What did we learn?
Generating a reliable forecast has allowed a much deeper analysis and now enables us to answer questions about the future.
While it was pretty easy to come up with a basic or heuristic forecast, it was clear that some more advanced analysis better captured the nuanced signal in our data.
We now have the data we need to plan ahead and make decisions to save time and money in our operations.
We have only scratched the surface on forecast methods. The methods used here have been adapted from the excellent ‘Forecasting: principles and practice’ by Hyndman and Athanasopoulos (2018).
Want to learn more?
For more information on forecasting or how to use data to solve complex problems, get in touch with our data science team today.
Our team of data science experts are constantly helping clients by using data to solve business problems. Regardless of what data you are capturing, whether it’s in Google Analytics or in a spreadsheet, we can add value and provide solutions to your complex problems.
Google Merchandise Store (Google Analytics demo account). https://support.google.com/analytics/answer/6367342?hl=en
Hyndman and Athanasopoulos (2018) Forecasting: principles and practice, 2nd edition, OTexts: Melbourne, Australia. https://OTexts.org/fpp2/
Hyndman R, Athanasopoulos G, Bergmeir C, Caceres G, Chhay L, O’Hara-Wild M, Petropoulos F, Razbash S, Wang E, Yasmeen F (2018). forecast: Forecasting functions for time series and linear models. R package version 8.4, <URL: http://pkg.robjhyndman.com/forecast>.
Hyndman RJ, Khandakar Y (2008). “Automatic time series forecasting: the forecast package for R.” Journal of Statistical Software, 27(3), 1-22. <URL: http://www.jstatsoft.org/article/view/v027i03>