Тёмный

Seasonality and Trend Forecasting using Multiple Linear Regression with Dummy Variables as Seasons 

Excel@Analytics - Dr. Canbolat
Подписаться 12 тыс.
Просмотров 46 тыс.
50% 1

In this video I demonstrate how to download search data from Google Trends, how to use pivot tables to manipulate the data, and how to use a multiple linear regression forecasting model in Excel to capture seasonality and trend.
Warning: This video requires a halftime break that involves coffee and almond croissants.

Опубликовано:

 

27 дек 2018

Поделиться:

Ссылка:

Скачать:

Готовим ссылку...

Добавить в:

Мой плейлист
Посмотреть позже
Комментарии : 82   
@XXBASSOON1STXX
@XXBASSOON1STXX Год назад
You're an absolute life saver! I would have never even gotten close to completing my assignment without this guide!
@tsibir3800
@tsibir3800 Год назад
WELL DONE Mustafa! Amazing Work! Your Videos Helped me very much with my job as Invetory and a Buyer manager!! We are able to predict the futere to our inventory preety good! Thank's a lot. Love from Greece
@turkanseymurqizihasanzade6608
Than you so so much! I had no idea how I was going to complete my assignment. This video saved my life.!
@emilk4743
@emilk4743 11 месяцев назад
thank you so much! had an assignment for this and couldn't wrap my head around it.
@marthacastrejon9688
@marthacastrejon9688 2 года назад
Thank you so much for this. It was so helpful with a homework assignment.
@brianeasley9343
@brianeasley9343 4 года назад
Fantastic video, thank you!
@almota7
@almota7 5 лет назад
Really nice! Well done.
@manishbhaskar2917
@manishbhaskar2917 5 лет назад
Superb Explanation!
@jthie6802
@jthie6802 2 года назад
Thank you, this video was really helpful for my assigment
@jimmybrooks9812
@jimmybrooks9812 5 лет назад
Thank you for the help!
@warne232000
@warne232000 3 года назад
Super helpful! Thank you 😊 Could you please take us through the regression statistics and how to interpret them? Adjusted R2, coefficients, Anova table etc. Thanks!!
@mcanbolat
@mcanbolat 3 года назад
I already posted four videos on regression analysis. Please see my video list.
@fxcksalt1
@fxcksalt1 3 года назад
Thank you! Subscribed
@LovinduWijesinghe
@LovinduWijesinghe 3 года назад
Great Video..!
@unseen1231
@unseen1231 3 года назад
WOW! Thanks a bunch
@MohammedHawat
@MohammedHawat 4 года назад
thank you, very nice
@Deniz-si8qu
@Deniz-si8qu 3 года назад
teşekkürler, çok yararlı oldu.
@pinkaura6983
@pinkaura6983 3 года назад
Perfect!
@TheRohit9463
@TheRohit9463 3 года назад
superb
@yasserfatehy
@yasserfatehy 4 года назад
Thank u
@6toolbaseball
@6toolbaseball 3 года назад
Great job! Love the MMULT function I haven’t thought of that. Also, I’m surprised you didn’t remove the Nov variable due to its high p-value and re-run the regression analysis. However, then you wouldn’t be able to forecast November. What do you recommend in such situations?
@mcanbolat
@mcanbolat 3 года назад
When a categorical variable is used, you cannot remove the categories just because they have a high p-value. They are the part of the original variable, they are not variables themselves.The meaning of that p-value is quite different here. It means that the category which has a high p-value is not significantly different from the baseline category. So November sales here are not significantly different from December sales. Think it as an ANOVA model.
@6toolbaseball
@6toolbaseball 3 года назад
@@mcanbolat Thank you this is a great lesson to takeaway for future regressions!
@IAKhan-km4ph
@IAKhan-km4ph Год назад
VERY NICE. i NEED ITS MATHS EQUATION
@adirayhan6379
@adirayhan6379 Год назад
Helpful. One question though! what if some of the coefficients are negative?
@karimsebak5960
@karimsebak5960 3 года назад
Hi Mustafa much thx for the great work it really worked fine with me, i have a question is this model can work with any type of analysis? ie, if there is no seasonality or non cyclical trends...etc?
@mcanbolat
@mcanbolat 3 года назад
Thanks. For data with no seasonality or trend, you can use models that are suitable for stationary data. These include moving averages and exponential smoothing. I have videos covering them.
@patriario9379
@patriario9379 4 года назад
Hello Mustafa, this is a great video. It's very helpful for my works. So thank you so much. However, I would like to ask you, why you don't include Dec on your dummy variable? if my data start from Jan, did i must to not include Jan for my dummy variable too or not? what happen if I include Jan as dummy variable? really appreciate your advice
@mcanbolat
@mcanbolat 4 года назад
Thanks. Dec is already included in the model as the baseline. If you want to forecast for December, each of the other dummies will be equal to zero. You can choose any month as your baseline, you just need to use (n-1) dummy variables if there are (n) categories.
@YAlife721
@YAlife721 2 года назад
I have questions.. can u make on inventory forecasting and budget planning
@ganeshn3800
@ganeshn3800 Год назад
Hi Mustfa, thank you. How to do a forecasting when you have commitment months and Lead time. How to predict delays based on this? I don't have order date. Just have commit months, LT and order was on time Or delayed column. How to forecast delays based on LT?
@dalalbackups5572
@dalalbackups5572 2 года назад
Thank you for share this !! I have a question, if I have a data for one year can I find the seasonality index? Or can I find the trend ?
@mcanbolat
@mcanbolat 2 года назад
I would not recommend using it for seasonality as you will have only one data value per season (month). But you can make your seasons as days instead of months if there is a daily seasonal pattern. You can also just look at the overall trend.
@robialmaynaufalmahdy5420
@robialmaynaufalmahdy5420 4 года назад
Nice work! I have a question about time series with multiple variable. What i read from college book about time series, they introduce you to use exactly like you did (using a column of t as period, also other as dummy variables). But a lot of sources online says that it is incorrect to use multiple regression method as it doesn't count time component (or so they say, from my understanding that column t acts as time component). Also it seems that many sources directed me to use vector autoregression, or box jenkin, which are above my capability for now. Does those VAR and box jenkin superior to multiple regression method, maybe in term of forecast accuracy? Let me know what you think. Thank you, sorry for the long post.
@mcanbolat
@mcanbolat 4 года назад
The time component is taken into consideration here with t being an independent variable. We cannot say one method is superior to another one as the performance really depends on the dataset you are using. I think for most cases MLR provides clear and practical results.
@robialmaynaufalmahdy5420
@robialmaynaufalmahdy5420 4 года назад
How about adding another time series data that might be causing demand to fluctuate, for example foreign exchange rate or other economic data? How do i fit those that into this method? And can i use seasonality by including monthly period index (1 to 12 for first to last month and reapeating for the new year, sequential just like period t) as the independent variable?
@mcanbolat
@mcanbolat 4 года назад
You can add other variables too. It would not be problem. It would be a mistake using months as numbers from one to twelve. You cannot assume a linear change in demand when you increase the month value.
@juanpablohorn6642
@juanpablohorn6642 3 года назад
Hi Mustafa! I've watched your video and I found it very interesting. I have a question. Why do we not use December as a dummy variable? You said it is a baseline. What exactly do you mean by this? Since you are basically ignoring all the values for December arent you?
@mcanbolat
@mcanbolat 3 года назад
Hi Juan, it is the way that dummy coding works here. We are using the December values but December is controlled by the intercept. If the month is December, the estimated mean of the first December is the intercept.
@juanpablohorn6642
@juanpablohorn6642 3 года назад
@@mcanbolat so, the values for slope for December are the same as the intercept?
@mcanbolat
@mcanbolat 3 года назад
No, there is no slope for December. The mean estimated value for December is Intercept + Coefficient of t x t
@hoaphanthi5267
@hoaphanthi5267 2 года назад
1, When all of the month index = 0, the month must be December. It didn't disappear, it was just shown in a different way. Hence, the teacher didn't write it in the table. 2, December_value = intercept + error. We always have error values when we estimate. I think so. If anyone has any other ideas, feel free to respond to me. Thank you. P/s: My English is not good, I hope you can understand my explanation.
@andresvega3892
@andresvega3892 4 года назад
When I try to do the regression Excel shows an alert saying that the model contains non.numeric data. When I change the text to numbers it then says that X Range and Y Range must have the same number of rows (sample values), regardless of labels. What am I doing wrong?
@mcanbolat
@mcanbolat 4 года назад
If you are including the titles (the first row) make sure that you check the box "Labels".
@Templarium
@Templarium 4 года назад
I'm getting the same error. I did check 'labels'. Did you figure out how to fix it?
@Templarium
@Templarium 4 года назад
I figured out what the problem is. For the Y range the video makes it seem you're only selecting the top row with the months but you're actually selecting that and the whole dummy variable matrix. I did that and the problem was fixed.
@mohiuddinshojib2647
@mohiuddinshojib2647 Год назад
Hello Mustafa, Can you explain why you don't copy the December, while coping the month and make it transpose ?
@arielaweiner2338
@arielaweiner2338 2 года назад
Hey, why did you use per month averages rather than the sum? If I have daily sales data and want to forecast demand, should I be looking at the average monthly sales or sum of sales in each month?
@mcanbolat
@mcanbolat 2 года назад
You will use the sum of the sales in each month but you will average the sums (assuming you have multiple years) to find your seasonality indices.
@danelleduardorapozoramirez1030
@danelleduardorapozoramirez1030 3 года назад
Hi Mustafa. How would you forecast sales when there was an extreme value in the historical data and it's unlikely for that event to happen again. For example, sales completely stopped (went to 0) in april due to government lockdown because of covid. We believe that for next year that won't happen again. I think maybe that changing the real value of April (0) equal to march's vaule is a reasonable approach. Thanks in advance, great video Edit: what do you think about excel's forecast function using exponential triple smoothing? Is this a similar method?
@mcanbolat
@mcanbolat 3 года назад
Hi Danell, you can put the previous year’s April value or the value of March as you suggested. You can also use the median or mean sales value. Excel’s forecasting tool is using a different approach and it may or may not be superior to the model here for different datasets. You need to check and compare the error measures. The advantage of constructing your own model is that you don’t use a black box approach like Excel’s and you may be able to generate further insights on your dataset.
@danelleduardorapozoramirez1030
@danelleduardorapozoramirez1030 3 года назад
@@mcanbolat great. Thanks
@danelleduardorapozoramirez1030
@danelleduardorapozoramirez1030 3 года назад
@@mcanbolat another question. If we were to make a risk analysis using this forecast (and i say this because the projection here is made based on how the number of corolla searches changes over time), how would you calculate the probability that the corrolla searches fall between a range? From the graph alone i can tell that the searches somehow fall into a normal distribution.
@mcanbolat
@mcanbolat 3 года назад
You then just create a frequency chart-histogram for all values ignoring the time periods. The histogram will give you an idea about the probabilities for certain ranges.
@manhtuannguyen7375
@manhtuannguyen7375 3 года назад
Hi @@mcanbolat, regarding Danell's question about extreme value during Covid. Can we add a factor called pandemic in the model and use binary coding (1 if the month is in Covid time and 0 if not)? I assume it might help explain the jump or dip of sales or consumption during the pandemic but not sure if there is any flaw with that approach
@littlecreator4838
@littlecreator4838 2 года назад
Hi Mustafa. When there's a seasonality, we usually use triple exponential method.how is this different and what advantages does this have over triple exponential
@mcanbolat
@mcanbolat 2 года назад
Holt-Winters (the triple exponential smoothing) may or may not be better than this method and it really depends on the dataset. The regression approach gives you a better understanding of average differences between seasons and it also quantifies the trend. I personally prefer the regression approach because of its interpretability.
@dayamathishetty7225
@dayamathishetty7225 3 года назад
Sir how to use it for large number of materials ,company where I'm working has more than 5000 spares to forecast and analyse please help if u can . Thank u.
@mcanbolat
@mcanbolat 3 года назад
If all those parts have some kind of a monthly seasonality then you can implement the following method: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE--WK0Db7hY2U.html But the best way is to learn more about different forecasting methods and move to R or Python.
@dayamathishetty7225
@dayamathishetty7225 3 года назад
@@mcanbolat Thank u for ur help , Yea I'm planning to learn python till then was in need of some method to forecast n analyse.
@yaramostafa6319
@yaramostafa6319 3 месяца назад
Hello, does it matter which month we choose as baseline? Does it affect the results? Also, I dont get why you chose the average of the month and not the total demand per month. Also, for November the p-value is greater than 0.05, does it mean that November doesn't have a significant effect on December baseline? Do we need to remove it from the variables then? Because in other forms of variables, the insignificant variable on the "y" is usually removed. Thanks!
@mcanbolat
@mcanbolat 3 месяца назад
Hi, 1. No, any month could be a baseline. 2. Average is used because some months have more occurences than others. 3. All of those month dummy variables are part of a single variable. You don’t remove them. It means that November sales are not significantly different from December sales.
@yaramostafa6319
@yaramostafa6319 3 месяца назад
Thank you so much! Well explained.@@mcanbolat
@Ruben-mx8gk
@Ruben-mx8gk 3 года назад
Hello, how can I forecast seasonality and trend with daily data. I'm working on forecasting daily deposits and want to know day by day the forecast
@mcanbolat
@mcanbolat 3 года назад
Your seasons will be days instead of months. You cannot use the regression model in Excel because there will be 364 dummies and Excel won’t let you use more than I believe 18 variables. An alternative could be using R or check my other video on seasonality and trend forecasting model.
@estellanyimba4729
@estellanyimba4729 Год назад
Hi Dr. I noticed the November coefficient has an insignificant p value. What does this mean? Can we remove November from our data on this basis?
@mcanbolat
@mcanbolat Год назад
Hi, it means that November is not significantly different from the baseline month (the month that we did not have a coefficient for). You won’t remove it as it is part of one variable (month).
@estellanyimba4729
@estellanyimba4729 Год назад
@@mcanbolat Thank you. So if i got you, we keep it in the equation even if the coefficient is insignificant? what would be the implication of dropping it as one of the variables
@xinyuetang9332
@xinyuetang9332 3 года назад
Hello Mustafa, I don’t understand why you see Dec as baseline and choose not to put Dec as a variable. Could you please explain it in more detail?
@mcanbolat
@mcanbolat 3 года назад
This is a typical way of modeling categorical variables. Think it in this way: if you have a gender variable you would not need to create two columns one for female and one for male. You will use a single column with 1 being female or male and 0 otherwise. If you code 1 as female, the intercept value will give you the mean for male. See also: stats.idre.ucla.edu/other/mult-pkg/faq/general/faqwhat-is-dummy-coding/
@xinyuetang9332
@xinyuetang9332 3 года назад
@@mcanbolat Thank you so much, it’s very helpful!!
@littlecreator4838
@littlecreator4838 2 года назад
Sir would you also let me know,how do I adjust my forecast in case of positive bias. I am looking for the formula pls
@mcanbolat
@mcanbolat 2 года назад
Regression based models should provide unbiased estimations. If you are using another model, you can find the average bias (average of all forecast errors) and then simply subtract it from each forecast value.
@littlecreator4838
@littlecreator4838 2 года назад
Thank you for the the explanation. It was very useful. Sir it would also be very beneficial if you make videos on exploring different time series models in Python pls.
@gabrielrochasouza8796
@gabrielrochasouza8796 2 года назад
Thanks for the video Dr. Mustafa. I tried using the MMULT formula like you showed here, but receive a 0 amount for my forecast. Do you know what might be causing that to happen? Thank you
@mcanbolat
@mcanbolat 2 года назад
I am not sure what the issue is but first make sure that you use Ctrl+Shift+Enter for the MMULT formula. If it still does not work, you can put both arrays in the same way (both horizontal) and use SUMPRODUCT(), or you can multiply them one by one :)
@davidlangridge2246
@davidlangridge2246 3 года назад
Do you not encounter problems of multicolinearity by not checking the box that runs constant = 0?
@mcanbolat
@mcanbolat 3 года назад
Good question. Because we use (12-1) dummy variables to code 12 months, we do not need to remove the intercept. If we had 12 dummies we would have to remove the intercept.
@davidlangridge2246
@davidlangridge2246 3 года назад
Dear @@mcanbolat , thank you your response, is much appreciated. So do we interpret the coefficients as deviations from the average values of the period of focus i.e. the month that you do not create a dummy for?
@mcanbolat
@mcanbolat 3 года назад
Yes, but only if their p-values are significant.
@user-ve2qs4pg1m
@user-ve2qs4pg1m 9 месяцев назад
That work is fantastic Dr. Mustafa,!! It was quite helpful. In one of your videos (Seasonality and Trend Forecasting), you forecasted using (Intercept & Slope functions) for a similar data set (ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-RoKGvvMJAME.html). However, when I applied both methods-(Intercept & Slope functions) and (The regression analysis tool as shown in this video-I got different results (the calculated intercept value is slightly different). My question is why the results are different, and which one should we use? Note: I applied both methods on the same dataset (Corolla).
@mcanbolat
@mcanbolat 9 месяцев назад
If it is a slight difference that is okay. You may have not included the first data point, or other minor errors.
Далее
Creating a dummy variable for regression
7:24
Просмотров 314 тыс.
What is Time Series Analysis?
7:29
Просмотров 173 тыс.
Excel Multiple Regression
6:33
Просмотров 147 тыс.
Forecasting in Excel using Linear Regression
12:44
Просмотров 264 тыс.