#5: Scikit-learn 3: Preprocessing 3: Scaling a sparse matrix, CSR, CSC format

learndataa

Published: 22 Aug 2024

Comments: 8

@johnschultz4079 11 months ago

The previously defined 'scaler' (the MaxAbsScaler() instance) was used for the StandardScaler transformation. The StandardScaler() instance had been defined as 'scalar'. Note the typo.
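One way to avoid this kind of mix-up is to give each scaler a descriptive name instead of near-identical ones such as 'scaler' and 'scalar'. The sketch below is not the code from the video; the matrix X and the variable names are made up for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import MaxAbsScaler, StandardScaler

# Small illustrative sparse matrix (hypothetical data, not from the video).
X = sparse.csr_matrix(np.array([[1.0, 0.0, 2.0],
                                [0.0, 0.0, 3.0],
                                [4.0, 0.0, 0.0]]))

# Distinct, descriptive names make it hard to call the wrong object.
max_abs_scaler = MaxAbsScaler()
standard_scaler = StandardScaler(with_mean=False)  # with_mean=False is required for sparse input

X_maxabs = max_abs_scaler.fit_transform(X)      # each column divided by its max absolute value
X_standard = standard_scaler.fit_transform(X)   # each column divided by its standard deviation

print(X_maxabs.toarray())
print(X_standard.toarray())
```

The two printed arrays differ, so accidentally reusing the MaxAbsScaler object would silently produce the wrong scaling rather than an error.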

@learndataa 11 months ago

Thank you. I have updated the video description for the 11:53 timepoint.

@sinatorkzadeh7872 1 year ago

It seems like you use both a row index and a column index for each element, but that is not how the CSC format works.

@learndataa 1 year ago

Thank you for the comment. To make sure I understand the suggestion, here is the description from Wikipedia: "CSC is similar to CSR except that values are read first by column, a row index is stored for each value, and column pointers are stored." Link: en.wikipedia.org/wiki/Sparse_matrix
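The Wikipedia description can be checked against SciPy's own CSC and CSR classes. This is a minimal sketch with a made-up 3x3 matrix A; in scipy.sparse the three arrays are exposed as data (the values), indices (row indices for CSC, column indices for CSR), and indptr (the column or row pointers).

```python
import numpy as np
from scipy import sparse

# Hypothetical example matrix with four nonzero entries.
A = np.array([[1, 0, 0],
              [0, 0, 2],
              [3, 4, 0]])

csc = sparse.csc_matrix(A)
csr = sparse.csr_matrix(A)

# CSC: values are read column by column.
print(csc.data)     # values:        [1 3 4 2]
print(csc.indices)  # row index per value: [0 2 2 1]
print(csc.indptr)   # column pointers:     [0 2 3 4]

# CSR: values are read row by row.
print(csr.data)     # values:           [1 2 3 4]
print(csr.indices)  # column index per value: [0 2 0 1]
print(csr.indptr)   # row pointers:           [0 1 2 4]

# Column j of the CSC matrix lives in data[indptr[j]:indptr[j + 1]].
j = 0
print(csc.data[csc.indptr[j]:csc.indptr[j + 1]])  # [1 3]
```

Only one index array is stored per format: CSC keeps a row index for each value and compresses the column positions into pointers, while CSR does the reverse.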

@sridharans9400 10 months ago

Hi Sir, can we say that the StandardScalar() method is more suitable for dense matrices, since the average will be representative of the data points? Whereas in sparse data, because of the null values, using the mean to scale will be misleading. Is my thought process right?

@learndataa 9 months ago

While sklearn.preprocessing.StandardScaler() can be used on sparse matrices, the right scaler depends on your data and the requirements of your specific machine learning problem. The scikit-learn documentation gives the formula "z = (x - u) / s" and notes: "This scaler can also be applied to sparse CSR or CSC matrices by passing with_mean=False to avoid breaking the sparsity structure of the data."

Having said that, your thought process is on the right track! StandardScaler (not StandardScalar) is often a more natural fit for dense data, because it scales each feature using its mean and standard deviation, which works best when the feature is roughly normally distributed and has few null or missing values. In sparse data the many null values are treated as zeros, so they pull the mean toward zero and can skew the scaling. Subtracting that mean would also turn every stored zero into a nonzero value and destroy the sparsity, which is why with_mean=False skips the centering step and only divides by the standard deviation.

Hope it helps! Thanks for watching.
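The effect of centering on sparsity can be seen in a small experiment. This is a sketch with made-up data (X_dense and the other variable names are illustrative): fitting the default StandardScaler() on a sparse matrix raises a ValueError, with_mean=False keeps the zero pattern, and centering the dense copy leaves no zeros at all.

```python
import numpy as np
from scipy import sparse
from sklearn.preprocessing import StandardScaler

# Hypothetical data: mostly zeros, two nonzero entries.
X_dense = np.array([[0.0, 10.0],
                    [0.0, 0.0],
                    [0.0, 0.0],
                    [4.0, 0.0]])
X_sparse = sparse.csr_matrix(X_dense)

# Default settings (with_mean=True) refuse sparse input.
try:
    StandardScaler().fit(X_sparse)
    centering_error = None
except ValueError as exc:
    centering_error = exc
print(centering_error)

# Without centering, zeros stay zeros and the result is still sparse.
scaled = StandardScaler(with_mean=False).fit_transform(X_sparse)
print(scaled.nnz, X_sparse.nnz)  # same number of stored nonzeros

# Centering the dense copy shifts every zero away from zero.
centered = StandardScaler().fit_transform(X_dense)
print(np.count_nonzero(centered), centered.size)  # every entry is now nonzero
```

If the column maxima are a more meaningful reference than the standard deviation for a given dataset, MaxAbsScaler is another option that also preserves sparsity.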