Hi! First of all, thank you for this great tutorial! I have a question about the train-test split when using lag/window features. When you compute lag/window features on the whole dataset and then make the split, doesn't that lead to data leakage, since information from the test data ends up in the training set? As I understand it, in this case 30 days of unseen test data were used in the training set through the lag features. Am I wrong?
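One way to check this concern is to build the features twice, once on the full dataset and once on the training period alone, and compare the training rows. The sketch below uses made-up data and a hypothetical `add_features` helper, and it assumes the features are strictly backward-looking (`shift(1)` and a trailing `rolling(30)`). Under that assumption the training rows come out identical, so no test information reaches them. A centered window or a negative shift used as a feature would behave differently.

```python
import numpy as np
import pandas as pd

# Made-up daily temperatures; the column name is illustrative only.
rng = np.random.default_rng(0)
index = pd.date_range("2020-01-01", periods=200, freq="D")
df = pd.DataFrame({"temp_max": 60 + rng.normal(0, 5, len(index))}, index=index)

def add_features(frame):
    """Hypothetical helper: backward-looking lag and rolling features."""
    out = frame.copy()
    out["lag_1"] = out["temp_max"].shift(1)                # yesterday's value
    out["roll_30"] = out["temp_max"].rolling(30).mean()    # trailing 30-day mean
    return out

split_date = "2020-06-01"

# Features computed on everything, then restricted to the training period.
train_from_full = add_features(df).loc[:split_date]

# Features computed on the training period only.
train_only = add_features(df.loc[:split_date])

# True when the training rows are unaffected by the later (test) rows.
same = np.allclose(
    train_from_full.to_numpy(), train_only.to_numpy(), equal_nan=True
)
print(same)
```

The first test rows do draw on the last training days through their lags, but those days are already in the past at prediction time, so that direction is not leakage.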
First of all, that was a well-explained project. However, I do have a problem with my code. When I try to run line 45 of your notebook in my own notebook, I receive the following error: Expected 2D array, got 1D array instead: array=[6.0e-02 6.2e+01 4.4e+01]. Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample. How can I fix this?
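A minimal sketch of the two reshapes the error message mentions, using the array from the message. Which one applies depends on what the three numbers are; the assumption here is that they are one day's predictor values, in which case the model expects a single row.

```python
import numpy as np

# The 1D array from the error message.
sample = np.array([6.0e-02, 6.2e+01, 4.4e+01])
print(sample.shape)  # (3,)

# One sample with three features: a single row (assumed to be the case here).
row = sample.reshape(1, -1)
print(row.shape)  # (1, 3)

# Three samples of a single feature: a single column.
column = sample.reshape(-1, 1)
print(column.shape)  # (3, 1)
```

Passing `row` instead of `sample` to `predict` would satisfy the 2D requirement under that assumption.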
Just noticed that at longer forecast times, a lag appears to develop in the model. Is this normal or an issue with my coding? For example, at a forecast time of 9 months instead of 1 month, the MSE is quite high. However, when I shift the predictions back 9 months it matches up much better with what actually happened.
Thank you. There are a lot of examples like this, but they are not useful. You can't reliably predict tomorrow's temperature using only the previous days. You must assess weather patterns, and for that you need every variable you can get as inputs (features): solar radiation, geopotential heights, wind directions at various levels, humidity at various levels, temperature at various levels, convergence, divergence, ideally surface and soil temperatures and moisture, and so on. Then you need to find which of those affect temperature by checking correlations, and remove the inputs that are not useful. Then you might really get somewhere...
Firstly, I would like to express my sincere gratitude for the invaluable tutorial you provided. It has been incredibly helpful in our coding journey so far. However, while implementing the concepts from the tutorial, we encountered a small issue related to the following code snippet:
weather["month_day_max"] = weather["month_max"] / weather["t_max"]
weather["max_min"] = weather["t_max"] / weather["t_min"]
Unfortunately, we noticed that some values in our dataset for t_min or t_max are zero, resulting in division by zero and subsequently producing infinite values. As a consequence, we encounter errors later during the execution of our code. I would greatly appreciate your guidance on how to overcome this problem. Are there any alternative approaches or modifications we can make to the code in order to avoid these errors? Thank you once again for your time and assistance. I eagerly await your response.
I know I may be a little late, but you could use np.where to add a condition that ensures the denominator is not zero. If the denominator is zero, you can set the value to np.nan and then fill it properly later.
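The np.where suggestion could look roughly like this. It is a sketch on a tiny made-up frame that reuses the column names from the question. The fill step (forward fill, then backward fill) is just one possible choice and an assumption here, not something the tutorial prescribes.

```python
import numpy as np
import pandas as pd

# Tiny made-up frame with zeros in the denominators.
weather = pd.DataFrame({
    "month_max": [50.0, 52.0, 54.0],
    "t_max": [48.0, 0.0, 60.0],
    "t_min": [30.0, 35.0, 0.0],
})

# Swap zero denominators for NaN so the ratio becomes NaN instead of inf.
safe_t_max = np.where(weather["t_max"] != 0, weather["t_max"], np.nan)
safe_t_min = np.where(weather["t_min"] != 0, weather["t_min"], np.nan)

weather["month_day_max"] = weather["month_max"] / safe_t_max
weather["max_min"] = weather["t_max"] / safe_t_min

# Fill the gaps afterwards; forward/backward fill is one option among several.
ratio_cols = ["month_day_max", "max_min"]
weather[ratio_cols] = weather[ratio_cols].ffill().bfill()
print(weather)
```

Because the NaN is introduced before the division, no infinite values are ever created, and the later fill only has to deal with missing values.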
@RafeuLopo Figured out how to check for 0 using .where():
core_weather["month_day_max"] = core_weather["month_max"] / core_weather["temp_max"].where(core_weather["temp_max"] != 0)
core_weather.loc[core_weather["month_day_max"].isnull(), "month_day_max"] = core_weather["month_max"] / 0.1
core_weather["max_min"] = core_weather["temp_max"] / core_weather["temp_min"].where(core_weather["temp_min"] != 0)
core_weather.loc[core_weather["max_min"].isnull(), "max_min"] = core_weather["temp_max"] / 0.1
Is dividing by 0.1 the "proper" result though?
Hello, thanks for this video. I'm getting an error on line 66/67 saying "TypeError: incompatible index of inserted column with frame index". Here is my line of code:
core_weather["monthly_avg"] = core_weather["temp_max"].groupby(core_weather.index.month).apply(lambda x: x.expanding(1).mean())
If it makes a difference, I'm running this from VS Code. Everything has worked fine so far, except that I didn't get the plots.
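This error often shows up on newer pandas versions, where `groupby(...).apply(...)` prepends the group key to the result's index, so the result no longer lines up with the frame. Assuming that is the cause here, passing `group_keys=False` keeps the original date index. The sketch below uses made-up data with the same column names.

```python
import numpy as np
import pandas as pd

# Made-up daily data covering three months.
index = pd.date_range("2020-01-01", periods=90, freq="D")
core_weather = pd.DataFrame(
    {"temp_max": np.linspace(40.0, 70.0, len(index))}, index=index
)

# group_keys=False stops pandas from adding the month as an extra index level,
# so the result keeps the original date index and can be assigned as a column.
core_weather["monthly_avg"] = (
    core_weather["temp_max"]
    .groupby(core_weather.index.month, group_keys=False)
    .apply(lambda x: x.expanding(1).mean())
)
print(core_weather.head())
```

Each value is the running mean of `temp_max` over all earlier rows that fall in the same calendar month.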
Hi, first, thanks for this tutorial. I'm having some difficulty getting the same CSV as you in my notebook. In mine there's no date column, STATION NAME, ACMH, etc. Could you help me, please?
Hey Dataquest! I have a question :) I followed your video and it was pretty straightforward and well explained. But I'm trying to adapt this to a personal case for my studies. I took another dataset with 3 values (Temperature / Humidity / Wind) and I "randomized" them. By random, I mean Temperature is always between 18 and 25, and Humidity is Temperature + 10. When I predict Temperature, the predictions are nearly all 19.5, so when I plot them I get almost a flat line. Any idea why this happens? I thought that with Humidity = Temperature + 10 and those kinds of relations between my values, I could get a decent prediction range, but it looks like I'm not understanding something. Thank you for the answer :)
Machine learning models can't predict if the values are random. Tomorrow's temperature would need to be correlated with today's temperature to be able to make future predictions. I would check the correlations between what you're using to predict, and what you're trying to predict.
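A quick way to run that correlation check is sketched below on made-up data that mimics the setup described in the question: uniformly random temperatures and humidity equal to temperature plus 10. The column names are illustrative.

```python
import numpy as np
import pandas as pd

# Random temperatures between 18 and 25, humidity tied to the same day's value.
rng = np.random.default_rng(0)
df = pd.DataFrame({"temperature": rng.uniform(18, 25, size=1000)})
df["humidity"] = df["temperature"] + 10

# The value being predicted: tomorrow's temperature.
df["target"] = df["temperature"].shift(-1)
df = df.iloc[:-1]  # the last row has no "tomorrow"

# Correlation of every column with the target.
correlations = df.corr()["target"]
print(correlations)
```

Humidity is perfectly correlated with the same day's temperature, but neither predictor says anything about the next day's value, so a model has little to go on beyond predicting something close to the average.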
Great video! One question I have is about how to make a forecast using this. Right now we are only able to see the model's predictions for the test time frame and how accurate they are. For example, my dataset ends 07-01-22, so the last value predicted by the model is for June 30th. What code should I use to let the model make a forecast for 07-02?
So if you want to make a prediction for tomorrow, just feed in the data for today. So if the max temp today was 50, and the min temp was 40, you can feed that into the algorithm. The prediction you get will be for the next day. So if you're using data for 7-1-2022 to generate the predictions, your prediction will be for 7-2-2022.
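A minimal sketch of that idea, on made-up data ending 2022-07-01. The Ridge model, its `alpha` value, and the two predictor columns are assumptions for illustration. The point is that the last day has no target, so it is left out of training but can still be fed to the model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge

# Made-up daily data ending on 2022-07-01.
rng = np.random.default_rng(0)
index = pd.date_range("2022-01-01", "2022-07-01", freq="D")
temp_max = 60 + np.cumsum(rng.normal(0, 1, len(index)))
core_weather = pd.DataFrame(
    {"temp_max": temp_max, "temp_min": temp_max - 10 + rng.normal(0, 1, len(index))},
    index=index,
)

# Target is the next day's max temperature; the final day has none.
core_weather["target"] = core_weather["temp_max"].shift(-1)
predictors = ["temp_max", "temp_min"]

# Train only on rows that have a target.
train = core_weather.dropna(subset=["target"])
reg = Ridge(alpha=0.1)  # assumed model settings
reg.fit(train[predictors], train["target"])

# Feed in the most recent day to get a forecast for the day after it.
latest = core_weather[predictors].iloc[[-1]]
forecast = reg.predict(latest)[0]
forecast_date = latest.index[0] + pd.Timedelta(days=1)
print(forecast_date.date(), round(forecast, 1))
```

Here the row for 2022-07-01 produces the forecast for 2022-07-02, without that row ever being used for fitting.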
@Dataquestio Oh ok, that makes sense. So if I remove the line coreweather = coreweather.iloc[:-1,:].copy(), I will then get the forecast for the next day?
@lakshya6909 Yes I did:
train = df.loc['1950-01-01':'2000-12-01']
test = df.loc['2001-01-01':]
reg.fit(train[predictors], train['target'])
predictions = reg.predict(test[predictors])
To generate a prediction, you use the code above. Lmk if you have any questions.
Hi Dataquest, may I ask how to predict future max and min temperatures? For example, my data runs from 1990 to 2021, and I want to get predictions for 2030-2060. How would I do that? Is there an example in the video?
Hey, I am new to programming and ML. In fact, this was my first project. Can anyone please tell me where I should input data for today, so as to obtain predictions for tomorrow? Basically I understood how we trained the model and all, but how do I now use it to obtain results?
Hello Dataquest, I have a question. I want to predict 90 days of temperature and rain. Do you have a script to predict a series of many days with these models? Regards, friend
Hi Magno - I don't have the code, but you can modify this code to make predictions for several days out. You just have to change the target being predicted. -Vik
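Changing the target could look roughly like this: shift the column by the number of days you want to look ahead. The data is made up and `horizon` is a hypothetical parameter name. Predicting a whole 90-day series this way would mean one target (and one fitted model) per horizon, which is an assumption about how to extend the approach rather than something shown in the video.

```python
import pandas as pd

# Made-up data: ten days of max temperatures.
index = pd.date_range("2022-01-01", periods=10, freq="D")
weather = pd.DataFrame({"temp_max": list(range(50, 60))}, index=index)

horizon = 3  # hypothetical: how many days ahead to predict

# Each row's target is the max temperature `horizon` days later.
weather["target"] = weather["temp_max"].shift(-horizon)

# The last `horizon` rows have no target yet, so drop them before training.
weather = weather.iloc[:-horizon].copy()
print(weather)
```

With `horizon = 1` this reduces to the next-day target used in the tutorial.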
Excellent video, it's my first contact with machine learning. I have a question: I work with 10 years of meteorological data, and I would like to reconstruct the past time series over about 20 years (the climatological normal) and then make a forecast for the coming years. Would that be possible? What would be the best approach? I currently work with hourly wind speed data in Brazil. Thank you. Regards
Hi Vikki - the video shows how to predict the weather for the next day. This is in the second half of the video, when we're training a machine learning algorithm.
I am getting the following error. I am not sure where it is coming from or how to fix it: ValueError: Input X contains infinity or a value too large for dtype('float64').
Hi Vik, thanks for this video! I used the dataset from JFK Airport and wanted to keep snow and snow_depth in. However, towards the end of the project when I write:
error, combined = create_predictions(predictors, core_weather, reg)
I get the following error:
ValueError Traceback (most recent call last)
/var/folders/d7/q_fznsr95_97r6lp_mx_vp640000gn/T/ipykernel_57500/1727150671.py in
----> 1 error, combined = create_predictions(predictors, core_weather, reg)
... and then ...
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').
Any ideas how to solve this? I think I have some large numbers somewhere. Everything up until this point is fine.
You can use pd.isna or pd.isnull to filter the dataframe and check for missing data. Note that these don't flag infinite values; np.isinf does. For very large values, you can filter to check for numbers above a certain value. You can also use the fillna method to replace any missing data.
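Those checks could look roughly like this, on a small made-up frame where a ratio feature produces both NaN and inf. Replacing inf with NaN and then forward/backward filling is one possible cleanup, shown here as an assumption rather than the tutorial's method.

```python
import numpy as np
import pandas as pd

# Made-up frame: zeros and a missing value lead to NaN and inf in the ratio.
core_weather = pd.DataFrame({
    "temp_max": [60.0, 0.0, 55.0, np.nan],
    "temp_min": [45.0, 0.0, 0.0, 40.0],
})
core_weather["max_min"] = core_weather["temp_max"] / core_weather["temp_min"]

# Count missing and infinite values per numeric column.
numeric = core_weather.select_dtypes(include="number")
nan_counts = numeric.isna().sum()
inf_counts = np.isinf(numeric).sum()
print(nan_counts)
print(inf_counts)

# One possible cleanup: turn inf into NaN, then fill the gaps.
cleaned = core_weather.replace([np.inf, -np.inf], np.nan).ffill().bfill()
print(cleaned)
```

Columns with a nonzero count in either check are the ones that would trigger the ValueError when passed to the model.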
Did you eventually resolve this? I had the same issue. I looked at the min and max values of the new predictors:
max(core_weather['month_max'])
min(core_weather['month_max'])
max(core_weather['month_day_max'])
min(core_weather['month_day_max'])
max(core_weather['max_min']) # inf
min(core_weather['max_min'])
Then I changed the formulation of max_min from a ratio to a difference (it makes more sense to me that way):
core_weather["max_min"] = core_weather["temp_max"] - core_weather["temp_min"]
Problem solved.