Hello all,

The above code requires some changes. This is the modified code:

df_cat = df.select_dtypes(include=['object'])  # Select categorical columns
df_num = df.select_dtypes(include=['number'])  # Select numeric columns

# Convert categorical variables to numerical using one-hot encoding
df_cat_encoded = pd.get_dummies(df_cat, drop_first=True)  # Apply one-hot encoding

# Identify boolean columns and convert them to numeric
boolean_cols = df_cat_encoded.columns[df_cat_encoded.dtypes == 'bool']
df_cat_encoded[boolean_cols] = df_cat_encoded[boolean_cols].astype(int)

# Concatenate the numerical and encoded categorical columns
df_ml = pd.concat([df_num, df_cat_encoded], axis=1)  # Concatenate along columns (axis=1)

# Display the first few rows of the combined DataFrame
df_ml.head(2)
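For anyone who wants to try this without the original dataset, here is a minimal sketch of the same steps on a toy DataFrame. The column names and values are made up for illustration and are not from this thread. It also passes `dtype=int` to `pd.get_dummies`, which should make the separate boolean-to-int conversion unnecessary; that is a variation on the code above, not a replacement for it.

```python
import pandas as pd

# Hypothetical toy data (not from the original thread).
# dtype=object is set explicitly so select_dtypes(include=["object"])
# picks these columns up regardless of the default string dtype.
df = pd.DataFrame({
    "age": [25, 32, 47],
    "city": pd.Series(["Paris", "Lima", "Paris"], dtype=object),
    "plan": pd.Series(["free", "pro", "free"], dtype=object),
})

df_cat = df.select_dtypes(include=["object"])  # categorical columns
df_num = df.select_dtypes(include=["number"])  # numeric columns

# One-hot encode; dtype=int asks for 0/1 integers instead of booleans
df_cat_encoded = pd.get_dummies(df_cat, drop_first=True, dtype=int)

# Put the numeric and encoded columns side by side
df_ml = pd.concat([df_num, df_cat_encoded], axis=1)
print(df_ml)
```

With `drop_first=True`, the first category of each column (in sorted order) is dropped, so a column with two distinct values ends up as a single 0/1 column.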
I tried the code below:

df = data.apply(lambda x: pd.factorize(x)[0])

It worked: it converted all the categorical values to numerical codes and returned a new dataset.
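One thing to watch with this approach: `apply` runs `pd.factorize` on every column, so numeric columns get replaced by integer codes as well. A small sketch of the difference, assuming a toy `data` frame with made-up columns, and restricting the encoding to object columns as one possible workaround:

```python
import pandas as pd

# Hypothetical toy data (not from the original thread)
data = pd.DataFrame({
    "age": [25, 32, 25],
    "city": pd.Series(["Paris", "Lima", "Paris"], dtype=object),
})

# factorize applied to every column: "age" is turned into codes too
all_codes = data.apply(lambda x: pd.factorize(x)[0])

# factorize applied only to the object columns; numeric columns are kept
encoded = data.copy()
cat_cols = encoded.select_dtypes(include=["object"]).columns
encoded[cat_cols] = encoded[cat_cols].apply(lambda x: pd.factorize(x)[0])

print(all_codes)
print(encoded)
```

Note that `pd.factorize` assigns codes in order of first appearance, so the integers carry no meaning beyond identity. Some models will still treat them as ordered numbers, which is the situation one-hot encoding is meant to avoid.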