Hi Hutsons Hacks. Could you please help with a couple of challenges I've met using prediction in caret for elastic nets? 1. How can I share the model without sharing the original training dataset? When I look at the structure of the model, it seems to always carry the original dataset, even when it predicts on another dataset. 2. How can I apply the model to datasets with another structure? So far, the predict function demands exactly the same number of predictors as in the initial dataset, no more and no less. To me it would seem reasonable to only need the predictors whose weights are not equal to 0. 3. How can I call predict on a testing dataset where one of the predictors chosen during training is missing? 4. Why does the predict function for glm not produce the same predicted values as the sum of each predictor multiplied by its weight?
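On question 4: the sum of weight × predictor is the linear predictor on the link scale, and a glm's predict (with type = "response") additionally applies the inverse link function, which is why the two numbers differ. A minimal stdlib-Python sketch of that distinction, using made-up logistic-regression coefficients:

```python
import math

# Hypothetical coefficients from a fitted logistic regression (logit link):
intercept = -1.2
weights = {"age": 0.03, "bmi": 0.05}
row = {"age": 50, "bmi": 27.0}

# The linear predictor IS the sum of each predictor times its weight...
linear_predictor = intercept + sum(weights[k] * row[k] for k in weights)

# ...but predict() on the response scale then applies the inverse link,
# here the logistic function, so the two values differ.
probability = 1 / (1 + math.exp(-linear_predictor))

print(linear_predictor)  # log-odds scale
print(probability)       # response (probability) scale
```

For a glm with an identity link (ordinary linear regression) the two would agree; for logit, probit, log, etc. they will not.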
Thank you so much for creating this package and for providing this video! I am wondering if you could explain (or point me toward some readings) for your point at 6:09 that a statistically significant intercept connotes omitted variable bias? I haven't heard this before and would love to learn more. -Todd
So, you might want to host the docker image on an AWS ECS or EC2 instance, so that other people can access your model from the web through their browser. That way anyone can use the model even if they don't have R installed on their computer.
Hi buddy, your code works nicely, but when I try to download 300+ images the code keeps running continuously. I think the Google page doesn't have 300+ images, so how can I tell how many images are actually available to download?
Yeah, that is the problem with Selenium: Google knows that you are running automation and blocks it if you try too many images. You haven't done anything wrong, it's just a general issue with Google.
I'm trying to use google lens to download images similar to the image I have. How should I change the code to auto download images in google lens ? I would love any help.
Hello, how should I change the code to auto-download images in Google Lens? Can you give me an idea? I used the link as in Google Images but it does not download.
Just one question: you did the pre-processing (scaling) and balancing before splitting the dataset. Is this correct? I read that this needs to be done after splitting to avoid leakage into the test dataset.
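The commenter's reading is the standard advice: split first, fit any scaling (and do any balancing) on the training fold only, then apply the fitted statistics to the test fold. A minimal stdlib sketch of that ordering, with toy numbers:

```python
# Sketch: fit scaling parameters on the training split only, then apply
# them to the test split, so no test-set information leaks into training.
train = [1.0, 2.0, 3.0, 4.0]
test = [10.0, 12.0]

# Statistics come from the TRAIN split alone.
mean = sum(train) / len(train)
var = sum((x - mean) ** 2 for x in train) / len(train)
std = var ** 0.5

scaled_train = [(x - mean) / std for x in train]
scaled_test = [(x - mean) / std for x in test]  # reuse TRAIN statistics
```

Computing the mean/std over the full dataset before splitting would let the test values influence the transform, which is exactly the leak being described.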
Awesome video, works well! Just want to note here the errors that you faced at 38:18 for my reference and others in the future — 3 errors:
- line 81 — path needs to end with '/': player_path = './images/nottingham_forest/'
- line 91 — forgot the 'url' parameter: urls = get_images_from_google(wd, 0.2, TOTAL_NUMBER_OF_EXAMPLES, url_current)
- lines 94-97 — arguments were outside of the brackets: download_image(down_path=f'images/nottingham_forest/{lbl}/', url=url, file_name=str(idx+1)+'.jpeg', verbose=True)
Great tutorial - worked for me on Mac! Just one question - is there a way to edit the code such that the photos downloaded are of a higher resolution (i.e., allowing the photo to load for a bit then downloading the image rather than the thumbnail?)
Good question. You would have to actually go to the website where the image is hosted and extract the right <img> tag in HTML. This would make the code much less performant.
@@hutsons-hacks3668 I think opening the preview of the image and downloading it from there also works. It should be full resolution without the need to actually access the image host.
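For anyone wanting to try the "extract the right &lt;img&gt; tag" route, a minimal stdlib sketch of pulling image URLs out of fetched page HTML with html.parser — the markup here is hypothetical, and real host pages will use different classes and attributes:

```python
from html.parser import HTMLParser

# Hypothetical host-page markup; real Google Lens / image-host HTML will
# differ, so treat the tag and attribute layout here as an assumption.
PAGE = '<html><body><img class="full-res" src="https://example.com/photo.jpg"></body></html>'

class ImgSrcFinder(HTMLParser):
    """Collect the src attribute of every <img> tag in a page."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if "src" in attrs:
                self.sources.append(attrs["src"])

finder = ImgSrcFinder()
finder.feed(PAGE)
print(finder.sources)
```

The extracted URL could then be downloaded directly, at the cost of one extra page fetch per image (the performance hit mentioned above).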
Looking at the error it is to do with how your JSON is being passed. Can you examine your JSON string and make sure it matches the pattern needed to pass a request to the API?
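A quick way to check a payload before sending it is to parse it locally and verify the expected fields are present; a small sketch (the field names are hypothetical, substitute whatever your API's pattern requires):

```python
import json

# Sketch: confirm a payload string is valid JSON and contains the fields
# the API expects before posting it. Field names here are made up.
payload = '{"age": 50, "bmi": 27.0}'

try:
    body = json.loads(payload)
except json.JSONDecodeError as err:
    raise SystemExit(f"Malformed JSON near position {err.pos}: {err.msg}")

required = {"age", "bmi"}
missing = required - body.keys()
assert not missing, f"Missing fields: {missing}"
```

json.JSONDecodeError reports the character position of the problem, which usually pinpoints things like trailing commas or unquoted keys.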
Thank you, this is a magnificent addition to my reports! A question though, since I'm migrating from caret to tidymodels: with tidymodels, do you need to specify something for a multinomial classifier?
Hey, thank you for this, but your videos are overstimulating and not easy to understand or implement, because you continuously skip through important steps and beginners cannot follow you.
very basic question but could you give an example of how the API might be used in a practical scenario (i.e. on a job or even at school). my background is in academic research so seeing the power of these types of things can be difficult sometimes! Thank you for the great video!
You train a model so that you can pass it unseen data to predict on. For example, I created an emergency department predictor based on patient variables; once this was trained we needed a way to pass production data to it. A good way to do this is a platform-agnostic API that accepts JSON, meaning the API could be used from R, Python, JavaScript, etc.
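As a concrete illustration of the platform-agnostic part, here is a sketch of calling such a prediction API from stdlib Python — the endpoint URL and patient fields are hypothetical, so swap in your own service:

```python
import json
from urllib import request

# Hypothetical endpoint and fields: substitute your own deployed model's URL.
URL = "http://localhost:8000/predict"
patient = {"age": 50, "arrival_mode": "ambulance"}

# Build a POST request carrying the patient record as JSON.
req = request.Request(
    URL,
    data=json.dumps(patient).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# response = request.urlopen(req)   # uncomment against a live API
# prediction = json.load(response)
print(req.get_full_url(), req.get_method())
```

The same JSON body could just as easily be posted from R (httr), JavaScript (fetch), or a shell (curl) — that is the point of putting the model behind an API.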
Original intention was for odds. But if you can find a way to add the effects and estimates, please consider forking the package, making the additions and you would then be a contributor.
Hi, thanks, very useful. 1) Can you please clarify whether MLDataR can be used when your outcome is a continuous variable (e.g., age, birth weight, etc.) rather than a categorical variable like you have shown? 2) Can MLDataR be used to visualise the outcomes, such as predicted vs actual, as well as ROC?
Yes, you would fit a regression to the problem, such as ElasticNet. The process would be much the same in TidyModels, but you would need to set the mode to regression.
Just going to second that default variable labels are quite difficult to work with. I've used glm from the stats package for the data input into odds_plot, and the variable labels include both the dataframe and the vector name. If there's a way to make the default variable name the vector only, that would be a huge help.
@@hutsons-hacks3668 Hi there! Just following up about coefficient names from glm() outputs. If a factored level is supplied (i.e. insurance with Private and Government sub-levels), the output name becomes the name of the vector plus the sub-level (i.e. insurancePrivate, insuranceGovernment). To my understanding, there's no way to currently rename those variables (i.e. inputting a character vector like c("Private Insurance", "Government Insurance")). Similar to OddsPlotty, the texreg package creates coefficient plots that allow you to rename these variables with the custom.coef.names and custom.coef.map arguments. Is there any way a similar command to rename automatically generated coefficient variables could be implemented in OddsPlotty?
Please note the ConfusionTableR package has changed. Please see how to use: cran.r-project.org/web/packages/ConfusionTableR/vignettes/ConfusionTableR.html
It is null for multiclass models; you need to refer to the McNemar-Bowker version. However, this is a wrapper for caret's confusion matrix, and that does not have this implemented for multiple classification. I hope this answers your question?
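For reference, Bowker's generalisation of McNemar's test for a k x k table can be computed directly from the off-diagonal counts; a stdlib sketch with made-up counts:

```python
from itertools import combinations

# Sketch of the McNemar-Bowker statistic for a k x k table:
# sum over cell pairs (i, j), i < j, of (n_ij - n_ji)^2 / (n_ij + n_ji).
table = [
    [30, 5, 2],
    [3, 40, 4],
    [1, 6, 50],
]

stat = 0.0
k = len(table)
for i, j in combinations(range(k), 2):
    diff = table[i][j] - table[j][i]
    total = table[i][j] + table[j][i]
    if total:  # skip empty off-diagonal pairs
        stat += diff ** 2 / total
# Under the symmetry hypothesis, stat follows a chi-square distribution
# with k * (k - 1) / 2 degrees of freedom.
print(stat)
```

For k = 2 this reduces to the ordinary McNemar statistic, which is why caret only reports it in the binary case.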
Awesome stuff! Looking forward to annealing/lasso and more!! One thing I would like to request is whether you can allow handling of missing data, for example using the rfImpute call? Also, is it possible to combine the highly correlated features rather than dropping them completely?
I'm enjoying the pace and delivery of these Python videos, thanks. Just to mention that you have misrepresented what the set union does: it doesn't require the same number of members in each set and doesn't really have any relationship to indexing, unless I am missing some nuance. It's hard to get everything right on live video, and it got me thinking, but I'm just mentioning it in case anyone is confused.
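To illustrate the commenter's point, union simply merges members — the sets can have different sizes, duplicates collapse, and there is no notion of position or index because sets are unordered:

```python
# Set union merges members; sizes can differ and there is no indexing,
# because sets are unordered collections of unique elements.
a = {1, 2, 3}
b = {3, 4}

merged = a | b          # same as a.union(b)
print(merged)           # the value is {1, 2, 3, 4}
print(a.union(b) == b.union(a))  # union is symmetric
```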
Thanks a lot for sharing. I'm really looking out for a tutorial on model stacking, preferably using the stacks package. Please kindly put something out, cheers!
Another helpful video, it's great to follow along - I'm using a Jupyter notebook. Gary is improvising on live video, so some minor inconsistencies are inevitable, and trouble-shooting the code helps cement your learning. My tip is to keep an eye on the variable names in the enumeration sections at the end ;-) e.g. make teams_list contain all the countries, not ('England', 'England', ...)
Thanks for another useful tutorial Gary. Just to show I'm paying attention, you cast 'reality' as an integer in line 93 (around time 10:30 in the video) which negates the need to then fix the format in the print command as far as I can see. Got me thinking though so all good.
Three questions: (1) Is there not a risk, when doing feature reduction using a random forest method, in then afterwards using the reduced feature set as the basis for your random forest model? (2) Can this package be used in some way on categorical/factor features? (3) How does it integrate with the #TidyModels framework? In the recipes package one can add specified pre-processing recipe steps for dealing with highly correlated features. Could you please explain how it differs compared to the preprocessing steps available in recipe()? Thank you!
Not really, no, because you are removing the redundant features using mean decrease in accuracy prior to fitting your master model. This way you have a reduced set that will make any other model trained afterwards work more quickly and have better accuracy, due to the right features being in the model.
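The mean-decrease-in-accuracy idea mentioned here is permutation importance: shuffle one feature at a time and see how far the model's accuracy falls. A toy stdlib sketch, with a hypothetical "fitted" model and made-up data where only x1 matters:

```python
import random

# Toy permutation-importance sketch: shuffle one feature at a time and
# measure the drop in accuracy. Data and "model" are made up: y == x1,
# while x2 is pure noise that the model ignores.
random.seed(0)
rows = [(i % 2, random.random(), i % 2) for i in range(200)]  # (x1, x2, y)

def model(x1, x2):
    return x1  # pretend this was learned from the data

def accuracy(data):
    return sum(model(x1, x2) == y for x1, x2, y in data) / len(data)

base = accuracy(rows)

importances = {}
for idx, name in [(0, "x1"), (1, "x2")]:
    column = [r[idx] for r in rows]
    random.shuffle(column)  # break the feature's link to the outcome
    permuted = [
        (v, r[1], r[2]) if idx == 0 else (r[0], v, r[2])
        for v, r in zip(column, rows)
    ]
    importances[name] = base - accuracy(permuted)
```

Shuffling x1 destroys the accuracy while shuffling x2 changes nothing, so x1 gets a large importance and x2 gets zero — the redundant feature is the one you can safely drop before fitting the master model.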
Recipes could be used with this; you would need to apply these steps before model training. RFE is not available at the moment in Recipes, as the resampling would cause a slowdown on that part of the pipe. As far as I know there are steps for zero-variance removal and other types, such as resampling. I have made this work with other tools like caret and mlr3 in R.
Hey, this is really cool! How does this package perform on regression problems? I would like to reduce the number of predictors for my sub-models. I am using #TidyModels machine learning with stacks() to create an ensemble model, but I originally have >40 predictor features.