
Time Series Analysis using Python | ARIMA & SARIMAX Model Implementation | Stationarity Handling

Learnerea
18K subscribers
29K views

"What is Time Series Analysis", "How to Make a Time Series Forecasting Model with ARIMA or SARIMAX in Python", "What is Stationarity in Time Series Analysis and How to Handle It", "What is ACF, PACF in Time Series Analysis"... If you have any questions of this kind and want to build an understanding from the beginner level, this video will clarify all of these concepts.
You may also like to watch -
Time Series Playlist - • Time Series
Pandas all in one - • Python Pandas Complete...
Pandas Full Playlist - • Python Pandas Tutorial...
Numpy Full Playlist - • NumPy
Matplotlib Full Playlist- • Python Matplotlib Tuto...
Seaborn Full Playlist - • Seaborn Beginner to Pr...
You can find the code file here - github.com/LEA...
Tags -
Time Series Analysis,
Time Series Modelling,
Components of Time Series,
Trending Time Series,
Cyclic Time Series,
Seasonal Time Series,
#DataScience #TimeSeries #PyhonProgramming #Python #learnerea

Published: 20 Oct 2024

Comments: 100
@iustinatorul7579 1 year ago
One of the best ARIMA implementation tutorials I have seen. I'm a bit frustrated that I found it after I had used ARIMA for a project. I can't even tell you how much time I wasted going online and on forums, trying to understand how it works. But hey, now that I learned it the hard way, it had better stick. 😂 Appreciate it!
@learnerea 1 year ago
Glad it helped!
@cvrbcheppali8214 10 months ago
This is one of the best videos on time series on YouTube. Well explained. The content is very nice.
@learnerea 10 months ago
Glad you liked it
@rajaganesh3462 1 year ago
I have come across many blogs and videos to understand the time series process, but I didn't get a clear picture. However, this video gave me a clear understanding of the process. Really great work! Much appreciated.
@learnerea 1 year ago
Glad it was helpful!
@fayezullah655 6 months ago
One of the best videos I have ever seen on time series on YouTube. Thanks for making it.
@learnerea 5 months ago
Glad it was helpful
@Beanzmai 2 months ago
Incredible video, thank you! I kept trying to train my model with the differenced data and was not getting good results, but I caught my error because of this video.
@learnerea 1 month ago
Glad it helped!
@oladayoojekunle1732 1 year ago
You really did justice to this topic. Very well done!
@learnerea 1 year ago
Thank you very much
@oladayoojekunle1732 1 year ago
@learnerea Please, can you make a video on how to use the transformed data, especially data obtained using log, sqrt and shift? I have been trying to figure that out. The part that confuses me is how to transform the data back to the original format. Thank you
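Editor's note: the back-transformation asked about above can be sketched as follows. This is a minimal illustration, not code from the video; the short series `y` is a hypothetical stand-in for the passenger counts, and the transform shown is log followed by a one-step shift difference. A square-root transform would be undone by squaring in the same way.

```python
import numpy as np
import pandas as pd

# Hypothetical short series standing in for the monthly passenger counts.
y = pd.Series([112.0, 118.0, 132.0, 129.0, 121.0, 135.0, 148.0, 148.0])

# Forward transform: log, then first difference (shift by one and subtract).
log_y = np.log(y)
log_diff = log_y.diff().dropna()

# Inverse transform: a cumulative sum rebuilds the log level once the first
# log value is added back, and exp undoes the log.
restored_log = log_diff.cumsum() + log_y.iloc[0]
restored = np.exp(pd.concat([log_y.iloc[:1], restored_log]))

print(np.allclose(restored.to_numpy(), y.to_numpy()))
```

The same idea applies to forecasts made on the transformed scale: cumulatively sum the predicted differences starting from the last known log value, then exponentiate.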
@rishabhpandey3609 6 months ago
It's really a crazy explanation. I would recommend this in my org, Jio. Keep it up, man. God bless you!
@learnerea 5 months ago
Thank you, I will
@priyankamore4030 1 month ago
Really appreciate this video!! 👍
@pepsibrandambassador 5 months ago
You are great! Helped me with my project at the last minute. Thanks for the video!!
@learnerea 5 months ago
Glad I could help!
@melainetape 6 months ago
So informative. I do not see a relation between the transformations (log, sqrt, shift) that make the data stationary and the ARIMA model you build. I'm so confused at this step. I tried with my data and noted that the ShiftDiff transformation makes my data stationary, but when it comes to building the model, it does not fit well. Thanks in advance.
@nothing_to_love 3 months ago
Thanks for this amazing video!!!
@learnerea 2 months ago
Glad it was helpful!!
@asanteka.2403 18 days ago
Thanks very much for this video, really helpful. However, I have a question: ARIMAX can be used on a panel dataset, right? Do you have a tutorial on implementing ARIMAX?
@erinbai8510 2 months ago
Thanks for the video. There is a minor mistake I noticed in the ADF part: you cannot accept the null hypothesis; you can only reject it or fail to reject it.
@surendrabera2878 7 months ago
Your content is too good. I am not able to understand why you have such low views on this video. One suggestion: please make the thumbnail a little more eye-catching.
@learnerea 2 months ago
Noted
@vanikmalhotra6586 5 months ago
Basic question: why did we run the model on the original dataset, when towards the end you mentioned running the model on the altered dataset (basically diff/square root)?
@RavinderKumar-m7u1j 10 months ago
Hi, the content is very good and very well explained. Thanks for sharing it. Can you please help me understand: we tried to identify stationarity but did not use it in modelling, and even the stationarity analysis was not concluded. We did not get the desired results.
@learnerea 10 months ago
Thank you very much for watching it. Yes, that was primarily because it was a beginner-level video and hence we did not want to spend a lot of time on reverting the transformations. We will certainly make another one where we conclude and utilize the stationarity.
@borisgisagara 6 months ago
As you said, you were trying to keep it at a beginner's level, which is why it's so understandable. Except you got it wrong about the model: it's not the ARIMA model that is working badly, it's that you are trying to predict a whole range of values from the same training data. It would work well on the first few values but not for all of them. You have to use walk-forward validation, which basically means updating your training set each time you predict a new value. That's my idea. And thank you for the good video.
@RishiKahtri 1 year ago
Thank you so much for this video. I have been studying for the last 3 years and have taken some expensive courses; this is the best explanation. It kept me motivated to explore and learn throughout the video. Let us know how we can support you to make more learning videos. Thanks.
@learnerea 1 year ago
You are most welcome, and I'm glad that it was helpful. Keep watching
@queenx3572 7 months ago
If you use the time shift method, d will be the interval for the shift. What happens if you use any other method like the log or square root? What will d be?
@KeshavKarki-c2v 8 months ago
Great tutor, thanks for the video ❤❤
@learnerea 8 months ago
Glad you liked it!
@madhuripatel5250 8 months ago
If I am dealing with time series data at hourly frequency, collected over 2 years, what should I take as the lag (shift) value?
@razinust2579 4 months ago
Brother, your work is extremely helpful. I looked for the rolling statistics video link but couldn't find it. Please share it. Thanks in anticipation.
@julianatorressanchez5250 5 months ago
You are amazing. I love the way you explain. Can you do the same for multidimensional datasets?
@learnerea 2 months ago
Yes, soon
@sellamimohamedkhaled4527 1 year ago
Really good work 👌, keep it up
@learnerea 1 year ago
Thanks a lot 😊
@Vizia219 7 months ago
Hi, I was using your tutorial to learn how to implement ARIMA models. I then went and implemented my own with some of my own data that I'm using for a school project. However, while my model fit my data very well, my forecasts are flat and strange. Could you help me in any way?
@PriyeshM-yj8wi 6 months ago
I have sales data consisting of a time period and other features, including 7-8 different schemes that are active in some months, so they are basically categorical variables containing 0 or 1. Should I go ahead with ARIMA for forecasting? If yes, how should I account for those categorical variables?
@thegroup3261 9 months ago
The best tutorials, bro
@learnerea 9 months ago
Glad it was helpful
@erinbai8510 2 months ago
I have a doubt: at 54 mins, when you use the ARIMA model, you start with the original data. Why did you transform the data to make it stationary if you then used the original data instead? Thank you so much.
@zaedgtr6910 1 month ago
Exactly, I have this question as well, because we were taught to fit the model on transformed data. Please reply, it would be very helpful.
@timetraveller7513 9 months ago
Can't thank you enough 🙏
@learnerea 8 months ago
Glad it was helpful
@saurabharbal2684 1 year ago
Hello sir, I don't know what your mistake was, but I got the desired results using the ARIMA model at 1:13:45. Instead of the line at the bottom, I got the desired results. And I followed everything you taught.
@saurabharbal2684 1 year ago
Using the ARIMA model I got 43 as my mean squared error
@learnerea 1 year ago
Super
@scientensity 1 year ago
In a SARIMA model, I found that for d=0, D=1 (I did one seasonal differencing and no non-seasonal differencing) the prediction fits the whole data except the initial 22 values (it predicts almost 0 for the initial 22 values), and 22 is the seasonality of my data. Can you explain why this is happening? I hope you got my question.
@learnerea 1 year ago
Assuming you are using the same data as in the video, please share your code at learnerea.edu@gmail.com so that we can take a look and guide you more specifically. Include the data as well if it's different from the video.
@MrDevnandan 6 months ago
Did you mistakenly plot the PACF of airP['arimaPred'] at timestamp 1:15:52? I am not sure why you would plot the PACF of predicted values. 😕
@srinivasreddy8134 6 months ago
For that we have to take airP['12diff'], as it is the seasonal difference
@mattsamelson4975 10 months ago
I have a situation where I can make reasonable training and predictions with the original (non-stationary) data. When I transform the data, I am able to successfully make it stationary BUT it loses all autocorrelation, so predictions are junk. Have you ever seen this? I have found some things online that say this is possible but it depends very much on the characteristics of the time series.
@learnerea 10 months ago
Yes, the situation you're describing is not uncommon in time series analysis, and it's often a delicate balance to strike between achieving stationarity and preserving important characteristics like autocorrelation. When you difference or transform a time series to achieve stationarity, you are essentially altering the original data to make it more amenable to modeling. However, as you've observed, too aggressive a transformation can result in the loss of autocorrelation, which is crucial for capturing temporal dependencies in the data. Here are a few considerations and potential approaches to handle this situation:
Selective Transformation: Instead of applying a uniform transformation to the entire time series, consider selectively applying transformations to specific components. For example, you might difference the data only where it's necessary or apply different transformations to different seasonal components.
Partial Transformation: Rather than making the entire time series stationary, consider transforming only certain parts of it. For instance, you might apply differencing or another transformation to the trend component while leaving the seasonal component untouched.
Different Models for Different Components: If your time series exhibits both trend and seasonality, you might consider using models that can handle each component separately. Seasonal-trend decomposition using LOESS (STL) is one such approach, where the time series is decomposed into trend, seasonal, and residual components, and each can be modeled independently.
Advanced Models: Explore advanced models that can handle non-stationary data more effectively. Long Short-Term Memory (LSTM) networks and other recurrent neural networks (RNNs) are known for their ability to capture temporal dependencies in data.
Ensemble Approaches: Combine predictions from models trained on the original data and models trained on the transformed data. Ensemble methods can sometimes capture the strengths of different models.
Grid Search and Cross-Validation: Systematically experiment with different combinations of transformations and models. Use grid search and cross-validation to evaluate the performance of various configurations and find the optimal solution.
It's worth noting that the ideal approach can vary depending on the specific characteristics of your time series data. Experimentation and a deep understanding of the data's behavior are key. If possible, consider consulting with domain experts or seeking feedback from colleagues who have experience with similar time series patterns. Remember that achieving stationarity is a means to an end (better model performance), and the goal is to strike a balance that preserves the essential characteristics of the data while making it amenable to modeling.
@mattsamelson4975 10 months ago
Thanks for your detailed reply. How do you conduct a partial transformation? For example, do I difference only a section of the source data that I'm training the model on? How would I then even reverse-transform the predictions?
@Shiva-zn4nz 1 year ago
This was so informative. Thank you a bunch! I understood time series. Do you have similar videos for regressions? Thank you! Subscribed
@learnerea 1 year ago
Glad it was helpful. The one below is on linear regression - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-IigoyVON0eM.html
Here is a problem we solved using regression and other best-fit models - ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-2YAheiIHNzI.html
I recommend you have a look at the whole data science playlist - ru-vid.com/group/PL4GjoPPG4VqOmyh7hQ730evtLaz04LwSf
@Shiva-zn4nz 1 year ago
@learnerea Thank you so much. Love you guys!
@ismailhosni7760 1 year ago
Hello Dr, thanks a lot for sharing the information and teaching us. I have a little question, with your permission: if we estimate our ARIMA model and find that there is autocorrelation between the residuals of the model, how can we fix this problem? Thanks again 🤗🙏🙏🧡❤
@learnerea 1 year ago
There are several potential approaches you can take if you find autocorrelation in the residuals of your ARIMA model. Here are a few options you could consider:
Adding additional AR or MA terms to the model: If the autocorrelation is due to a pattern that has not been captured by the current model, adding additional terms may help to capture this pattern and improve model performance.
Differencing the data: If the autocorrelation is due to a trend in the data, differencing the data may help to remove this trend and improve model performance.
Using a different model: If the ARIMA model is not suitable for the data, you may need to consider using a different model altogether. For example, a seasonal ARIMA (SARIMA) model may be more appropriate for data with seasonal patterns.
Modeling the residuals: If none of the above approaches work, you can try modeling the residuals as a separate time series. This can help to capture any remaining patterns in the data that are not accounted for by the primary model.
@ismailhosni7760 1 year ago
@learnerea 🥰🥰🥰🥰🥰❤❤🧡💛 thanks a lot 🙏🙏
@Cs11-CanhNau 5 months ago
The original data series is not stationary, and I see you converted it to a stationary series. But why do you use the original data when training the model when it is not a stationary series?
@NutritionandMetabolism-uq5kf 5 months ago
I have the same query as well. I can understand the section on checking for stationarity, but I don't see how that gets incorporated into the subsequent training and model fitting. If the original dataset can be used for training rather than the transformed dataset, what's the use of determining whether the data is stationary or not? Did I miss something? Otherwise, excellent video, clearly explained. I would be interested to see videos on time series analysis using other models such as XGBoost and Prophet. Thank you sir.
@esranurgunay1776 1 year ago
Hello sir, at 35:35 I didn't get the same result as you when I executed the line df.head()
@learnerea 1 year ago
>> You may want to revisit the code you have created
>> You can put the code here as well; we will analyze the difference and can help
@arnabmodak3377 8 months ago
ARIMA Model Building starts here: 56:47
@esranurgunay1776 1 year ago
If we were not going to use the stationarity results, why did we calculate them?
@learnerea 1 year ago
As a data scientist, you have to explore all the possibilities. As explained in the video, the decision was based on analysis where it was observed that it wouldn't perform comparatively better. It was also suggested that we will try to make another video where we use the stationary data to see how it performs. As a learner, your question makes sense. Keep asking questions for clarity.
@JuanCarlosLópez-w6i 10 months ago
I was wondering the same
@apsaraG-k7r 1 year ago
What does diff(12) mean?
@learnerea 1 year ago
diff computes the difference between values in a series: by default it subtracts the previous value from each value, and diff(12) subtracts the value 12 periods earlier instead. If you can provide the timestamp here, we will be able to give you more specific guidance.
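Editor's note: a small sketch of what `diff(12)` does in pandas, using a made-up monthly series in which every month sits exactly 10 above the same month of the previous year. The values are hypothetical and chosen only to make the arithmetic easy to check.

```python
import pandas as pd

idx = pd.date_range("2020-01-01", periods=24, freq="MS")

# Year one runs 0..11, year two runs 10..21.
values = [10 * (i // 12) + (i % 12) for i in range(24)]
s = pd.Series(values, index=idx, dtype=float)

first_diff = s.diff()       # s[t] - s[t-1]: month-over-month change
seasonal_diff = s.diff(12)  # s[t] - s[t-12]: change versus the same month last year

print(first_diff.head(3).tolist())       # first entry is NaN: nothing to subtract
print(seasonal_diff.dropna().unique())   # [10.]
```

With monthly data, `diff(12)` removes a yearly seasonal pattern, which is why the first 12 results are NaN.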
@177-l2v 1 year ago
Suppose the month attribute is missing and you only have the year attribute. In that case, how can you make the data stationary? I mean, you only have the year and passenger attributes. Please explain.
@learnerea 1 year ago
Stationarity can be assessed on a yearly basis as well. When you're dealing with time series data that only has a yearly frequency, the approach to making the data stationary is similar to what you'd do with more frequent data, but with some specifics to consider.
Visualizing the Data: Start by plotting the data. This will give you an idea of the overall trend, seasonality, and variance. Since the data is yearly, you might not observe any distinct seasonality.
Python code -
    import matplotlib.pyplot as plt
    plt.plot(year, passenger)
    plt.xlabel('Year')
    plt.ylabel('Passenger')
    plt.title('Yearly Passenger Count')
    plt.show()
Differencing: A common approach to making time series data stationary is differencing. Differencing helps to remove trends in the data. You subtract the previous year's observation from the current year's observation.
Python code -
    passenger_diff = passenger.diff().dropna()
After differencing, plot the data again to see if it appears more stationary.
Checking for Stationarity: The Augmented Dickey-Fuller test is commonly used to check the stationarity of a time series.
Python code -
    from statsmodels.tsa.stattools import adfuller
    result = adfuller(passenger_diff)
    print('ADF Statistic:', result[0])
    print('p-value:', result[1])
A low p-value (typically ≤ 0.05) lets you reject the null hypothesis of a unit root, which suggests that the time series is stationary.
Transformations: If differencing isn't enough, consider other transformations like:
Log transformation: To stabilize variance.
Python code -
    import numpy as np
    passenger_log = np.log(passenger)
Rolling means: To smooth out short-term fluctuations and highlight longer-term trends.
Python code -
    rolling_mean = passenger.rolling(window=5).mean()  # 5-year window as an example
    passenger_detrended = passenger - rolling_mean
    passenger_detrended.dropna(inplace=True)
Decomposition: Even though the data is yearly, if you suspect a multi-year cycle or a strong trend, you can use decomposition. Seasonal-trend decomposition using LOESS (STL) from the statsmodels library can be useful.
Python code -
    from statsmodels.tsa.seasonal import STL
    # STL needs a cycle length of at least 2; pass period=<cycle length in years>
    # when it cannot be inferred from the index.
    stl = STL(passenger, seasonal=13)
    result = stl.fit()
    trend = result.trend
    seasonal = result.seasonal
    residual = result.resid  # what remains after removing trend and seasonality
You can then work with the residuals from the decomposition process, which should ideally be stationary.
@sanjaisrao484 6 months ago
Thanks
@learnerea 5 months ago
Welcome
@أبويزيد-ض5ي 1 year ago
Hi, you did not upload a video where stationary data was used.
@learnerea 1 year ago
I think not yet.
@abhilashpatel1361 1 year ago
Hi, can you please help me understand why the lag for the PACF is 20?
@learnerea 1 year ago
It would be great if you could share the timestamp where you spotted this point
@JuanCarlosLópez-w6i 10 months ago
Hi, I cannot find the dataset. Could you help me please! =D
@learnerea 10 months ago
The dataset is part of the seaborn library. You can just run this code -
    import seaborn as sns
    df = sns.load_dataset('flights')
You can also download the notebook from the GitHub link provided in the description.
@harisri-p4d 1 year ago
Great
@learnerea 1 year ago
Thank you very much for watching
@2380raj 9 months ago
👌
@amazonamazon6510 11 months ago
How to approach forecasting with the lockdown data?
@learnerea 11 months ago
That's an excellent problem statement to choose. A little more detail would have helped:
>> what sort of model you want to develop
>> what the main purpose/scope of the model is, etc.
Let's assume that you want to build a credit risk model and the data under consideration includes the COVID period as well. (Before I start, make sure that the data is relatively balanced in quantity and period.) Below is the approach you can take -
Data Collection: Gather historical credit risk data, including loan performance, defaults, delinquencies, and relevant economic indicators. Include data specific to the COVID-19 period, such as unemployment rates, government stimulus programs, and financial relief measures.
Data Preprocessing: Clean and preprocess the data by addressing missing values, outliers, and data inconsistencies. Create relevant features, such as lagged values of credit risk indicators and economic variables, to capture time dependencies.
Exploratory Data Analysis (EDA): Perform EDA to understand the data's characteristics and relationships. Explore trends, seasonality, and patterns, paying specific attention to changes during the COVID-19 period.
Define the Target Variable: Define the credit risk metric you want to predict, such as default probability or loan delinquency.
Feature Selection: Identify relevant features that may influence credit risk. This includes economic indicators, loan characteristics, borrower information, and external factors.
Time Series Decomposition: Decompose the time series data to understand underlying trends, seasonality, and residuals, considering the effects of COVID-19.
Create a Historical Train-Test Split: Split the data into training and testing sets, ensuring that the testing set includes the COVID-19 period.
Model Selection: Choose a suitable forecasting model. In this case, time series models like ARIMA, SARIMA, or Prophet may be appropriate. Consider using machine learning models like Gradient Boosting, Random Forest, or LSTM if you have sufficient data.
Model Training: Train the selected model on the historical data, excluding the testing period.
Model Validation: Evaluate the model's performance using the testing data, specifically during the COVID-19 period. Use appropriate evaluation metrics, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), or classification metrics for binary outcomes.
Model Interpretation: Interpret the model's predictions to understand which factors contribute to credit risk during the COVID-19 period.
Feature Importance: Analyze feature importance to identify key drivers of credit risk during the pandemic.
Model Refinement: Fine-tune the model and hyperparameters if the initial model's performance is suboptimal.
Scenario Analysis: Conduct scenario analysis to assess credit risk under different economic conditions related to COVID-19, such as varying levels of unemployment or government interventions.
Model Deployment: Deploy the trained model for ongoing credit risk assessment and predictions.
Monitoring and Feedback Loop: Continuously monitor the model's performance and retrain it as new data becomes available.
Regulatory Compliance: Ensure that your credit risk model complies with regulatory requirements and standards relevant to your industry.
Documentation: Document the entire modeling process, including data sources, preprocessing steps, model selection, and evaluation metrics.
Keep in mind that the unique challenges posed by the COVID-19 pandemic may require you to adapt your model and data sources to reflect changing economic conditions and government policies. Regularly update and refine your credit risk prediction model to account for these dynamics.
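Editor's note: the "Historical Train-Test Split" and "Model Validation" steps in the reply above can be sketched roughly as follows. Everything here is a stand-in: the monthly series is synthetic, the lockdown dip is simulated, and the naive forecast is only a placeholder for whatever model is actually chosen.

```python
import numpy as np
import pandas as pd

idx = pd.date_range("2017-01-01", periods=72, freq="MS")
rng = np.random.default_rng(6)
y = pd.Series(100 + rng.normal(scale=5, size=72), index=idx)
y.loc["2020-04":"2020-09"] -= 40  # simulated lockdown dip

# Chronological split: the test window deliberately covers the unusual period.
train, test = y.loc[:"2019-12"], y.loc["2020-01":]

# Placeholder forecast (last training value repeated); swap in ARIMA/SARIMA output.
naive = pd.Series(train.iloc[-1], index=test.index)

mae = float((test - naive).abs().mean())
mse = float(((test - naive) ** 2).mean())
print(f"train={len(train)} test={len(test)}  MAE={mae:.2f}  MSE={mse:.2f}")
```

Never shuffle rows when splitting a time series; the split must respect time order so the model is always evaluated on data that comes after its training window.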
@anghulingalolop3630 9 months ago
Can you forecast this?
@siddharthakar9369 4 months ago
Where is the dataset?
@Devra380 1 year ago
But sir, the new statsmodels seems to have different functions
@learnerea 1 year ago
You can mention the names of the statsmodels functions used in the video that you can no longer find. We will try to help you with the closest alternative if a function no longer exists.
@Devra380 1 year ago
@learnerea Can you make a new video on implementing ARIMA on a share market dataset or a weather dataset?
@RadwickMuvhu 7 months ago
My data is in the form of year, week
@saniyashahin-zp6oz 11 months ago
Share your Python notebook, sir @Learnerea
@learnerea 11 months ago
Here you go - github.com/LEARNEREA/Data_Science/blob/main/Scripts/time_series_air_passengers.py
@meronika1400 1 year ago
Can you share this Jupyter notebook with me via mail?
@learnerea 1 year ago
Hi Meronika, you can find it using -
file name - time_series_air_passengers.py
url - github.com/LEARNEREA/Data_Science/tree/main/Scripts
@micahdelaurentis6551 5 months ago
The D parameter is the number of differences you take on your data, which is not what you said. This is as basic as it gets, man, come on
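Editor's note: the point above is that the differencing parameter counts how many times the series is differenced; it is not the lag of a shift. A minimal numpy sketch with a made-up quadratic trend shows the difference between differencing once and twice.

```python
import numpy as np

t = np.arange(10, dtype=float)
y = 3.0 + 2.0 * t + 0.5 * t**2  # quadratic trend

d1 = np.diff(y, n=1)  # differenced once: still trending (linear in t)
d2 = np.diff(y, n=2)  # differenced twice: constant, so the trend is gone

print(d1)  # 2.5, 3.5, 4.5, ...
print(d2)  # all 1.0
```

In ARIMA(p, d, q) this series would call for d=2. A lag such as 12 belongs to the seasonal part of a SARIMA model (the period s), where D again counts how many seasonal differences are taken.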