Two-step Cluster Analysis in SPSS

James Gaskin

42K subscribers

204K views

Published: 30 Oct 2024

Comments

@Gaskination 3 years ago

Here's a fun pet project I've been working on: udreamed.com/. It is a dream analytics app. Here is the RU-vid channel where we post a new video almost three times per week: ru-vid.com/show-UCiujxblFduQz8V4xHjMzyzQ Also available on iOS: apps.apple.com/us/app/udreamed/id1054428074 And Android: play.google.com/store/apps/details?id=com.unconsciouscognitioninc.unconsciouscognition&hl=en Check it out! Thanks!

@duran099 10 years ago

Thank you! I enjoyed the back and forth of your problem shooting at the start for which variables to use. Made it more real, and gave some context from a theory perspective.

@blackchallice 11 years ago

WOW, you have explained cluster analysis very clearly. This is the first time I'm learning CA and I totally get it. Thank you!

@vide0gameCaster 9 years ago

Dude, you don't understand how much this vid helped me with my statistics exam. I aced my test thanks to you! You just gained a subscriber!

@lefkiospaik 10 years ago

Great presentation! Moreover, the "not suitable" variables you chose at the beginning really helped me understand more about cluster analysis. Thanks

@yuriveneziani8029 6 years ago

Amazing explanation... clear and direct! Thank you!

@samirsarsamss 10 years ago

Many thanks dear James Gaskin for this helpful video, please go ahead with other different aspects or even tools.

@ЕленаПономарёва-л5х6ъ 4 years ago

thank you for awesome explanation! wish you good luck! I've found all your videos very very very helpful

@krismatthews7550 9 years ago

You seriously just saved my Quantitative Analysis project :] THANK YOU!

@JohnParavantis 5 years ago

If I may, at 9:01 I would like to correct your reference to the boxplot: the middle line does indeed represent the median, but the left and right edges of the box lie at the first and third quartiles respectively. So, rather than representing one standard deviation below and above the mean, the box represents the middle 50% of the observations. Thank you very much for the video, very lucid explanation of swamping variables, still very useful in 2019!
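
To make the correction above concrete, here is a minimal Python sketch (not from the video) comparing the edges of a boxplot's box with the mean plus or minus one standard deviation. The sample values are made up, and `box_summary` is a hypothetical helper name. Quartile conventions differ slightly between tools, so SPSS may report marginally different cut points.

```python
import statistics

def box_summary(values):
    """Quartiles that define a boxplot's box, plus mean and SD for comparison."""
    # quantiles(n=4) returns the three cut points: Q1, median, Q3
    q1, median, q3 = statistics.quantiles(values, n=4)
    inside = sum(q1 <= v <= q3 for v in values)
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "share_inside_box": inside / len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),
    }

# Hypothetical right-skewed sample (e.g. years on the job)
sample = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 9, 20]
summary = box_summary(sample)
print("box edges:", summary["q1"], summary["q3"])
print("mean -/+ 1 SD:", summary["mean"] - summary["sd"], summary["mean"] + summary["sd"])
```

With skewed data like this, the box (middle 50% of observations) and the mean plus or minus one SD interval cover quite different ranges, which is the point of the correction.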

@Gaskination 5 years ago

Thanks!

@TulioMaia 12 years ago

Thank you so much! I'm a beginner with SPSS. I'm an R user, but I'm going to start with SPSS from now on! Thanks again!

@Xirukah 11 years ago

You're a great guy!! I study SPSS in college at three levels: Introduction to Data Analysis, Univariate Data Analysis, and Multivariate Data Analysis for the 3rd level. At the moment I'm on the 3rd, and this process is really useful! Thank you!!

@vshapoval 11 years ago

I do not have questions, but I found your video extremely helpful, with a very good explanation. So I only wanted to say thank you. Your video was a great help. =)

@Gaskination 12 years ago

Funny you should ask! I was just considering doing this yesterday. I will probably do a K-means cluster, and also show how to segment the data and explore clusters for sub-populations. This is definitely on my to do list.

@Gaskination 11 years ago

Not a stupid question because I had to look up the answer :) The SPSS help manual says that the two-step cluster analysis assumes normally distributed data for all continuous variables, but that tests have shown it to be robust enough to handle non-normal data fairly well.

@talhelmt 11 years ago

Thanks! I appreciate the time you put into making this.

@snailbby6664 7 years ago

"These are the ones you'll probably punish by making them managers" 😂

@ildilovasz2982 4 years ago

Thank you for this video, very clear and it helped me write my thesis.

@AlbertGavino 9 years ago

Great, simple video on 2-step clustering (great for categorical or binary variables) with some continuous variables. But I like 2-step since it creates its own clusters, which I don't have to specify (unlike in K-means).

@Gaskination 11 years ago

Look at the sig value. If it is less than 0.05, then the groups are significantly different on that variable of comparison. If it is poor quality, then you might try a three factor model. Not sure you can rely on the cluster groups when they are poor. This means that the membership assignment was inconsistent based on the indicators used for the clustering, e.g., sometimes males went into cluster 1, sometimes into cluster 2.

@koenovisch 11 years ago

Thank you for your reaction! I will continue looking for it!

@tekonen 9 years ago

Thanks for sharing your knowledge!

@mxm001 9 years ago

Thank you SO much, James. This was very helpful.

@petradubajovamarinakova9268 10 years ago

Your video helped me. Thank you very much :)

@MarcRodrigues10 6 years ago

Thank you! This video helped me a lot, especially with the results analysis.

@Gaskination 11 years ago

You can certainly try k-means. It just depends on what your research intentions are. I actually prefer k-means over two-step. I just learned two-step first, so that's what I made the video for. I should probably make one for k-means sometime...

@TheCopginger 12 years ago

That's great indeed! Well, I also have some ideas on how you could make it better from a learner's point of view. 1. Explaining why to use a certain/specific methodology for clustering 2. Progressing from basic to advanced methodology 3. Probably using data across industries/sectors. I don't know how much time you have to spend on these or whether you would want to; however, I can provide you data which will enhance the quality of your analysis (and of course your self-marketing value).

@chrisnahm 7 years ago

Really enjoyed this and was very helpful. Thank you!

@DaDonnyZhang 10 years ago

Great video! Thank you so much!

@Gaskination 11 years ago

Did you double click it? You have to double click it to make it show up.

@emindeger. 4 years ago

Hi thank you very much for this video series. I have a question, I would appreciate it if you answer. Do we need to normalize the data in spss?

@Gaskination 11 years ago

1. No references come to mind. When you run comparisons later on between clusters, if one cluster is much larger than another, then this will affect the critical ratio (t, f, or z statistic) since critical ratios are sensitive to sample size. Thus, working with similar sizes is ideal when making comparisons. 2. SPSS makes n+1 groups, where the extra 1 is those who did not fit in anywhere else. To figure out which clusters are which, look at the cluster output number in the output window.

@Jemoeder86 10 years ago

Very informative! Thanks

@Gaskination 12 years ago

Thanks for the ideas. I just do these when the need arises or when I have the time. I'll probably have some time to do a couple next week. I have some data that has grouping variables, so no need to send me yours. Thank you though.

@zhexiongtao2167 11 years ago

really interesting and helpful! Hope you can also make one for K-Means

@sureshpatel3992 3 years ago

Hello James, can Two-step Cluster Analysis handle mixed variable type? Eg. some variables that are output of factor analysis (that will have negative values too), and some binary variables?

@Gaskination 3 years ago

Yes. The two-step method can handle all types of variables. The only thing you need to watch out for is highly skewed or kurtotic variables, or discrete (categorical/nominal) variables without adequate representation from each group/category.

@sureshpatel3992 3 years ago

@Gaskination thanks so much for your reply, this would really help!

@polisherci 8 years ago

Hey, can you run a regression clustered by a certain variable in SPSS, like the regress ... cluster (.. ) command in Stata?

@Gaskination 8 years ago

I'm not sure. I haven't used Stata much. You can run a cluster analysis, and then use those clusters as grouping variables when running regressions.

@Gaskination 11 years ago

That is what I meant, but those are undesirable sample sizes. You might also look at indicator importance to see if one variable is swamping out the others. If so, you might consider removing it. Or you can try K-means clustering... I haven't made any video for that yet...

@sticky924 10 years ago

Thank you for this video, it is very helpful

@bayankhalifa1543 10 years ago

Very helpful, Prof. I did clustering for 2 continuous variables and 4 clusters, but how can I represent them in a high-high, high-low, low-high, and low-low matrix? Also, the two variables are highly correlated, will it be bad for clustering? Thanks.

@Gaskination 10 years ago

1. How to represent them: If you click on one of the button options in the table of clusters, you will see their distributions along the scale of measurement (low/high). The button looks like a distribution. This should help you represent them. 2. If they are highly correlated, then it might just be difficult to find low-high and high-low since they are probably mostly low-low and high-high.

@bayankhalifa1543 10 years ago

James Gaskin thanks, but which cluster is the high-high quarter in the matrix, which one is the high-low quarter, etc.? thanks

@Gaskination 10 years ago

bayan khalifa Just look at the distributions it shows you. If the bulk of the distribution is on the right, then it is high (assuming your scale went from low to high), if it is on the left, then it is low.

@joseedupont2409 10 years ago

Very helpful! What version of SPSS are you using?

@Gaskination 10 years ago

Probably v20 or 21 in this video. Maybe 19...

@juliaworldwide 8 years ago

Thank you very much for that !

@Gaskination 11 years ago

I have not. Best of luck. But, basically it is like an R-squared analysis. It shows how much of the variance is being explained by each indicator.

@Gaskination 11 years ago

Glad to be helpful. Hope you'll subscribe and tell your friends. :)

@spss-for-research6518 9 years ago

I have a dumb problem and I wonder if someone could help me. SPSS shows the cluster comparisons only for the inputs, but NOT for the descriptive variables. It just shows a message: "the cluster comparison view encountered a problem and cannot display correctly" or something like that. Why? I can't figure it out.

@Gaskination 9 years ago

spss-for-research I'm not sure. It may have something to do with the variables included. Try removing one variable at a time to see if you can identify which one is causing the problem. If it isn't that, then it may be a conflict in one of the libraries being utilized to run the analysis. If that is the case, then you might need to reinstall SPSS, or you might need to update your Java or .NET version (not sure which one SPSS uses).

@OPaixao13 8 years ago

Hi James How can I get the Cubic Criterion Values at different number of clusters under consideration?? I think it's also a good way to justify why X number of clusters instead of Y, right??

@Gaskination 8 years ago

I'm not sure. I've never heard of the cubic criterion. Best of luck to you.

@azianwacko 8 years ago

Hello again James, can you explain how the analysis actually creates the clusters? I've tried using it for categorical variables and I'm not fully understanding just how it determines the clusters. Thank you

@Gaskination 8 years ago

Here are some resources to help you understand 2 step cluster analysis better:
1. www.ibm.com/support/knowledgecenter/SSLVMB_21.0.0/com.ibm.spss.statistics.help/idh_twostep_main.htm
2. www.spss.ch/upload/1122644952_The%20SPSS%20TwoStep%20Cluster%20Component.pdf
3. www.ryerson.ca/~rmichon/mkt700/SPSS/TwoStep%20Cluster%20Analysis.htm
4. ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-2Lz2bU-sBGA.html

@jeromeboissel2793 3 years ago

Dear James, what references did you use on this occasion? Besides, which would be most appropriate: K-means or two-step? In the paper I am working on, I have used both analyses and, while the number of clusters remains the same, the number of respondents in each cluster differs quite significantly depending on which technique I use. Any tips?

@Gaskination 3 years ago

I'm not much of an expert on cluster analysis. I've just used the Hair et al 2010 book. As for which approach to use, I think two-step is considered the most useful and valid, since it combines hierarchical and non-hierarchical methods.

@ΑλεξάνδραΘεοφίλη-ο3ε 6 years ago

HELP! I am using the spss v.17 and I don't get the model index ... what is going wrong?

@Gaskination 6 years ago

I'm not sure what you mean by model index. Do you mean you are not getting the silhouette index? I'm not sure what might be causing that either way though... Sorry about that.

@DisconnectHack 9 years ago

Hi James, did you say "swarming variable" or "swapping variable"? I couldn't figure it out, and I have tried looking for definitions for both, only found "swapping variable" for computer science, were you talking about the same ?

@Gaskination 9 years ago

+DisconnectHack Swamping. I don't know what the technical term would be (or if there is one).

@DisconnectHack 9 years ago

+James Gaskin Thanks James, it appears there isn't one.

@tomh3675 10 years ago

Thanks for the video, do you have an example of doing a cluster analysis as a way of illustrating factor analysis/factor scores?

@Gaskination 10 years ago

No, but I do have several videos about how to do factor analysis and extract factor scores.

@nihonbunka 7 years ago

Is it possible to analyse cluster NOT around central concepts like intelligence or years on the job but upon family relationship (binary relationship closeness in a network with the absence of commonalities, as is the case in real families).

@Gaskination 7 years ago

That's an interesting idea, but I don't know how to do it in a two-step. You might be able to do it with multiple alignment algorithms, but I'm not sure if SPSS has those...

@nihonbunka 7 years ago

Thank you very much indeed. I have found a partial solution in the software here socnetv.org/downloads which has a network analysis network community detection algorithm which can be used on the correlation matrix produced by SPSS factor analysis. Others have had the idea before journals.plos.org/plosone/article?id=10.1371/journal.pone.0051558 using a different community detection algorithm Full statement of problem and partial solution www.talkstats.com/showthread.php/69145-Family-Relationship-version-of-Factor-analysis-for-Japanese-Groups?p=199672&highlight=#post199672

@Gaskination 7 years ago

cool! Thanks!

@rajeshpandit3634 9 years ago

Great video. I just want to check: since you put in both continuous and categorical variables, do you standardize them? By standardize I mean converting to z-scores, since you are putting scale, binary, and categorical variables together.

@Gaskination 9 years ago

+Rajesh Pandit SPSS automatically standardizes all continuous variables when doing a 2-step cluster analysis. You can see this in the options area when doing the 2-step.
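
As an illustration of what standardizing a continuous variable means, here is a minimal Python sketch of z-scoring. It assumes the sample standard deviation as the divisor; the thread does not say which convention SPSS uses internally. The variable values are made up and `z_scores` is a hypothetical helper name.

```python
import statistics

def z_scores(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return [(v - mean) / sd for v in values]

# Two hypothetical continuous inputs on very different scales
income = [28000, 35000, 41000, 52000, 90000]
age = [23, 31, 38, 45, 60]

# After standardizing, both variables are on a comparable scale, so the
# larger raw numbers of income no longer dominate a distance calculation.
print([round(z, 2) for z in z_scores(income)])
print([round(z, 2) for z in z_scores(age)])
```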

@ntaalya 10 years ago

Thank You very much!

@Thanh-ThaoTPham 7 years ago

Hi James, thanks for your valuable sharing. However, is there any source for the acceptable size of smallest cluster and threshold of ratio of sizes? Thanks in advance.

@Gaskination 7 years ago

I'm not sure. I'm really not an expert on cluster analysis. Those numbers just "feel" right, which I realize is not very scientific of me. I guess they feel right because they are practically useful - i.e., clusters of those sizes are usable in subsequent analyses and cluster ratios of that proportion break the data up into roughly equivalent groups.
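
For readers who want to check cluster sizes and the ratio of sizes outside the model viewer, here is a minimal Python sketch that works on a saved cluster membership column. It assumes the ratio is defined as largest cluster size divided by smallest, and that an outlier label (here `-1`) should be excluded; both are assumptions, and `size_ratio` is a hypothetical helper name.

```python
from collections import Counter

def size_ratio(memberships, outlier_label=-1):
    """Return cluster sizes and the largest-to-smallest size ratio."""
    # Assumption: records labelled with outlier_label belong to no cluster
    sizes = Counter(m for m in memberships if m != outlier_label)
    largest = max(sizes.values())
    smallest = min(sizes.values())
    return dict(sizes), largest / smallest

# Hypothetical membership column: 61 cases in cluster 1, 20 in cluster 2,
# and 3 cases that did not fit anywhere
memberships = [1] * 61 + [2] * 20 + [-1] * 3
sizes, ratio = size_ratio(memberships)
print(sizes, ratio)
```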

@Thanh-ThaoTPham 7 years ago

Thanks so much for your reply. Anw, I really love your tutorial series ^^

@olofreichenberg6885 11 years ago

Very helpful!

@Gaskination 12 years ago

I don't yet, but people keep asking for one, so I should probably do one.

@Zopzuita 11 years ago

I can't double-click since the model viewer doesn't show up at all. It writes the clusters in the column but that's it, even though I activated the option... Any ideas what could be wrong? Thanks a lot in advance!

@yifanli4312 5 years ago

Thank you! This video is very helpful!

@hem135 6 years ago

Hi James - This video is very helpful, thank you! Within the model viewer, I can see the average silhouette statistic for the cluster result. My understanding is this number is the average fit across the items in the clusters. Is there a way to find the silhouette data for each item separately? For context, I'm using cluster analysis to identify exemplar scenarios for different types of behavior. I'm clustering scenarios based on participant ratings (e.g., this scenario represents X behavior, yes/no). I'd like to compare fit across a few different types of participant groups using an ANOVA of the silhouettes for each item. Thanks in advance!

@Gaskination 6 years ago

If there is a way, I'm not sure how to do it.
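
One possible route outside SPSS is to export the clustering variables and the saved cluster membership, then compute a silhouette value per item yourself. The Python sketch below uses the classic definition s(i) = (b - a) / max(a, b) with Euclidean distance. It is an assumption that this matches the silhouette measure SPSS reports, so the average may not reproduce the model viewer's number exactly. The points and labels are made up.

```python
import math

def silhouettes(points, labels):
    """Per-item silhouette s(i) = (b - a) / max(a, b), Euclidean distance.

    a = mean distance from item i to the other items in its own cluster
    b = lowest mean distance from item i to the items of any other cluster
    """
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        dists = {}  # cluster label -> distances from item i to that cluster's items
        for j, (q, lab) in enumerate(zip(points, labels)):
            if i != j:
                dists.setdefault(lab, []).append(math.dist(p, q))
        others = [sum(d) / len(d) for lab, d in dists.items() if lab != own]
        if own not in dists or not others:
            scores.append(0.0)  # convention for singletons or a single cluster
            continue
        a = sum(dists[own]) / len(dists[own])
        b = min(others)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return scores

# Hypothetical standardized inputs and saved cluster memberships
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = [0, 0, 1, 1]
print([round(s, 3) for s in silhouettes(points, labels)])
```

The resulting per-item values could then be used as the dependent variable in a follow-up comparison across participant groups.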

@azianwacko 8 years ago

Hello James, can you explain evaluation fields and whether something like a scale of mental health would go in there?

@Gaskination 8 years ago

+Thomas Chan Evaluation fields are used to see differences in evaluation variables based on cluster membership. It is sort of like doing an ANOVA on those variables, using the cluster membership as the factoring variable. The evaluation variables will not be used to determine cluster membership.
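
To illustrate the "sort of like doing an ANOVA" comparison, here is a minimal Python sketch that takes a saved cluster membership column and one evaluation variable, then computes the mean per cluster and a one-way ANOVA F statistic. This is a textbook calculation, not SPSS's internal procedure. The data are made up and `one_way_f` is a hypothetical helper name.

```python
import statistics
from collections import defaultdict

def one_way_f(values, clusters):
    """Cluster means and one-way ANOVA F for one evaluation variable."""
    groups = defaultdict(list)
    for v, c in zip(values, clusters):
        groups[c].append(v)
    grand = statistics.mean(values)
    means = {c: statistics.mean(g) for c, g in groups.items()}
    # Between-cluster and within-cluster sums of squares
    ss_between = sum(len(g) * (means[c] - grand) ** 2 for c, g in groups.items())
    ss_within = sum((v - means[c]) ** 2 for c, g in groups.items() for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return means, f_stat

# Hypothetical evaluation variable (not used to form the clusters)
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cluster = [1, 1, 1, 2, 2, 2, 3, 3, 3]
means, f_stat = one_way_f(scores, cluster)
print(means, f_stat)
```

A large F relative to its degrees of freedom suggests the clusters differ on the evaluation variable; a p-value would still need an F distribution table or a statistics package.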

@alfonspriessner8556 8 years ago

Hi James! Very helpful video - you saved me a lot of time. :-) Unfortunately, I have two additional questions, and it would be great if you could help me. I am sure you are the expert who can help me!

1) Let's assume SPSS proposes 3 clusters based on a set of variables. What statistical tests are used in the background to select 3 clusters instead of 2 or 4? I read in some papers that, e.g., the likelihood ratio (L2) and its p-value, the Bayesian Information Criterion (BIC), and the number of parameters (Npar) could be examples of these statistical tests (there are surely others). And if some of these tests are conducted by SPSS in the background, is there a way to create an output chart of these statistical parameters in SPSS? In other words, since SPSS tells me 3 clusters, I would like to show why 3 clusters and not 4, based on a few statistical tests.

2) Let's assume we still have the 3 clusters from question 1, which were created based on a set of variables. But I have another variable (e.g., age) which I did not use for the cluster analysis. How (if there is any option in SPSS) can I calculate the mean of age for each of the 3 identified clusters and show it in an output table (ideally for more than one additional variable)?

I hope you understand my questions. I would appreciate your help and guidance! Thanks a lot in advance! Regards, Alfons

@Gaskination 8 years ago

1. SPSS lets you choose the AIC or the BIC as the clustering criterion, or you can use the silhouette measure that shows in the output. The silhouette is considered fairly robust. You can force it to 2 or 4 clusters as well to see what the silhouette score is for those. 2. Watch this video at the 2:16 mark. It will show how to do this using the Output button.
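
For reference, the textbook definitions of the two criteria can be sketched in Python as below. This shows only the general formulas (lower is better for both); how SPSS computes the log-likelihood and parameter count for a two-step solution is not covered in this thread, and the candidate numbers are made up for illustration.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# Hypothetical candidate solutions: clusters -> (log-likelihood, parameters)
candidates = {2: (-512.4, 8), 3: (-470.1, 12), 4: (-465.9, 16)}
n_obs = 200  # made-up sample size

for k, (ll, p) in candidates.items():
    print(k, round(aic(ll, p), 1), round(bic(ll, p, n_obs), 1))
```

Because BIC's penalty grows with ln(n), it tends to favor fewer clusters than AIC once the sample is more than a handful of cases.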

@tayeenulhoque1637 9 years ago

Can you please explain or suggest which cluster analysis should be applied to Likert-scale ordinal data? Is it K-means, hierarchical, or two-step? Is it necessary to conduct CATPCA (categorical principal component analysis) prior to starting the cluster analysis, and can you please tell me how to proceed with the cluster analysis after CATPCA? I have four exogenous variables which contain 20 items.

@Gaskination 9 years ago

Usually we would use factor analysis for this kind of data. However, if you want to do a cluster, then I would do the EFA first and generate factor scores for each construct. Then use these factor scores in a cluster analysis. 2-step or k-means each offer slightly different features and analyses, so you could try both.

@tayeenulhoque1637 9 years ago

Thank you Mr. James.. i really appreciate your valuable comments

@nassimfard867 9 years ago

Thanks for the videos. Can you please tell me if a set of data can be clustered by only one variable? And if yes, is two-step clustering or k-means clustering more appropriate? I want to categorize a set of data into three groups based on one variable, and I don't know how to define the cutoff or range for each category. I would be glad if you could help me.

@Gaskination 9 years ago

Nassim Fard If it is just one variable, then clustering algorithms won't help. If the variable is categorical, then just group them based on the category values. For example, if the variable is religion, then group them by which religion they affiliate with. If the variable is continuous or ordinal, then make logical cutoff points into low, med, high.
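
One simple way to pick cutoff points for a single continuous variable is to split at the terciles so the three groups are roughly equal in size. The Python sketch below does that. Using terciles is an assumption made for illustration; theory-driven cutoffs may be more "logical" for a given scale. `tercile_groups` is a hypothetical helper name and the values are made up.

```python
import statistics

def tercile_groups(values):
    """Label each value low / med / high using the two tercile cut points."""
    low_cut, high_cut = statistics.quantiles(values, n=3)

    def label(v):
        if v <= low_cut:
            return "low"
        if v <= high_cut:
            return "med"
        return "high"

    return (low_cut, high_cut), [label(v) for v in values]

# Hypothetical scores on one continuous variable
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts, groups = tercile_groups(scores)
print(cuts)
print(groups)
```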

@sugun1993 6 years ago

Thank you for the quick tutorial. I am performing two-step clustering on data from a recent study, but I want to somehow fit this new data into the clusters generated from past data. It is kind of like supervised learning, but unfortunately neither the coefficients of the model of the past data nor the data itself are available. Is there a way to solve this, or is this case hopeless? P.S. To get the project done in time, without access to any tools, I tried to put the new records into clusters manually, respecting the features/characteristics of the previously generated clusters. Since time is my major constraint and the data is just 40 new entries, I have already done it (could you give me some idea about my options to justify doing the job this way?). But I am just curious to know the right way.

@Gaskination 6 years ago

If the new data is using the exact same variables as the original data, then you can simply add the new rows to the dataset and re-run the cluster analysis. That is the easiest way. If the new data is not using the same variables, then there is no statistical way to cluster them along the same lines.

@DrMMRaziq 12 years ago

Good tutorial

@medosman23 9 years ago

great video thank you

@adidash3247 10 years ago

At 11:18, the column "TSC_5282", what does the number represent?

@Gaskination 10 years ago

It represents the cluster (group) that record belongs to.

@adidash3247 10 years ago

Thank You...

@researcher53 11 years ago

Thanks, very helpful

@SharonaTLevy-nl4dc 8 years ago

thank you, very helpful

@kaykums5350 10 years ago

very helpful and easy to understand. Can I use a multiple non-unique ID for Data Mining?

@Gaskination 10 years ago

I don't think I understand your question. Do you mean you want to do datamining on a dataset that has multiple IDs that are the same? If so, then no, you should combine those rows that have the same ID (if they are actually the same case and not a different one with just a duplicate ID), or create unique IDs for unique cases.

@kaykums5350 10 years ago

James Gaskin Thanks James for the prompt response. That answers my question. can I send u my sample data so you can guide me further.

@kaykums5350 10 years ago

Kayode Kumapayi they are not totally the same case. some duplicates have different field records from the order in a given ID.

@Gaskination 10 years ago

Kayode Kumapayi If you have issues you simply cannot resolve, I might be able to guide you a bit. I receive dozens of requests per day though, so please only email me if you are really stuck. Thanks!

@TheAce0 6 years ago

You mention that when having SPSS determine clusters automatically, Euclidean distance measurement is more appropriate but when specifying the number of clusters, Log-likelihood is preferred. Could you perhaps elaborate on why this is the case? Would you know any papers that go into a bit of detail about this?

@Gaskination 6 years ago

oooh, this has been a while. The literature I read at the time suggested these things, but I can't remember which articles and books I read, or what they had to say about it. Sorry about that. If cluster analysis was something I did more often, I would have a better answer for you. But I haven't done a cluster analysis again since making this video...

@TheAce0 6 years ago

Ah, okay, fair enough. I'm dealing with cluster analysis right now and need to figure out which parameters are appropriate and why :)

@wassdepp1 8 years ago

Thank you, It made my day

@thomasbulitta3817 8 years ago

Hi James, thank you for that video. It was very helpful. Do you know what actually happens "inside" SPSS when you run this two-step cluster? Which forms of clustering are used? Single linkage and hierarchical cluster analysis?

@Gaskination 8 years ago

+Thomas Bulitta It performs a hierarchical and a non-hierarchical step. I'm not sure which specific algorithms, but I bet the SPSS manual says.

@educationalconsultant9880 4 years ago

Can I use cluster analysis for stepwise classification, e.g. first classify cases as asymptomatic or symptomatic, then classify within asymptomatic in terms of symptoms?

@Gaskination 4 years ago

I think it should be possible. You could do the classification and save the cluster membership number. Then, filter the dataset so that only the rows that are part of asymptomatic clusters remain. Then, cluster again to see if they cluster by symptom. Another route would be to just use evaluation variables in the two-step clustering. These variables aren't used to determine membership in clusters, but each cluster is evaluated post-hoc by symptoms.

@educationalconsultant9880 4 years ago

@Gaskination Thank you very much for your reply

@123canuckfan 11 years ago

God I wish you were my stats teacher!

@Byzantic 11 years ago

I get 'predictor importance' instead of 'variable importance'. Is there a difference?

@Gaskination 11 years ago

No difference

@mldsg72 9 years ago

James, nice job, very well done! Do you mind to make a little comment about AIC and BIC on 2-step cluster?

@Gaskination 9 years ago

Marcelo Gabriel I was not aware you could generate AIC and BIC in SPSS during a 2-step cluster analysis. I've gone back to it to fiddle with it, but I can't figure it out if it is possible.

@mldsg72 9 years ago

James, thanks for your reply. At least on versions 20 and 22, you must check the "Clustering Criterion" by choosing BIC or AIC. I'm more inclined to consider AIC than BIC due to its characteristics. Your comment would be nice. Regards

@Gaskination 9 years ago

Marcelo Gabriel Thanks for pointing me to that. I played with it and looked into it and it appears that the results are often the same (with my data), but that in general, AIC is preferred to BIC. Here is an informative explanation of why as well as some useful references: en.wikipedia.org/wiki/Akaike_information_criterion#Comparison_with_BIC

@gs19921 8 years ago

Thank you for this video. I have done 4 different k-means clusterings and I need a method to choose the best cluster analysis. Can I do it with two-step or another method?

@Gaskination 8 years ago

+gs19921 Two step will provide a "fit" measure to let you know if the clustering solution was good. You can also examine the AIC (try to minimize it).

@koenovisch 11 years ago

James, do you know a video in which the IPA (importance/performance analysis) is being explained? Have you made such a video?

@DrMMRaziq 11 years ago

What if one of the item after applying post hoc shows a non significant p value e.g. you differentiate clusters on a variable, and then find that two of the clusters do not significantly differ on one item.

@JessicaRodrigues-wz3xo 7 years ago

Hi! How can I choose which variables are significant enough to use? Is there a statistical test to help? I have a lot of variables and I want to know how I should choose them, and whether there is a criterion.

@Gaskination 7 years ago

Usually it is chosen theoretically, rather than statistically.

@JessicaRodrigues-wz3xo 7 years ago

Thank you for responding! I have several variables to draw a social and demographic profile of my population. Theoretically all these variables are important, but when I do the analysis with all of them, the results are not good. In other versions of SPSS there was a cut in those variables, a critical value, but I do not know how to identify this in SPSS 22. Can you help me, please?

@Gaskination 7 years ago

Jéssica Rodrigues you can look at the cluster quality or at the variable importance graph. These will give you indications of the overall value of the variables for clustering into groups.

@cynthiagallagher75 8 years ago

Is here a video that provides more detail on interpreting the clusters themselves? It would be helpful to understand how the clusters are being selected and how the clusters are developed.

@Gaskination 8 years ago

The only other two-step cluster analysis video that I have is part of the Rosen College SEM Boot Camp: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-2Lz2bU-sBGA.html

@shahzadfarid6446 7 years ago

Sir, Please upload detail lectures on Optimal scaling in SPSS (i.e. MCA, CATPCA and non-linear canonical correlation). These lectures are not available on RU-vid. I searched in your channel , with the hope ... , but unfortunately ....

@Gaskination 7 years ago

I have never done those, so I cannot make videos on them. Any time I learn a new analysis, I make a video for it. If I ever have occasion to do these, I'll make videos for them. Best of luck to you.

@harsin009 6 years ago

Can these profiles really be used as a moderator in SEM analysis? Because I thought SEM only uses continuous variables since it analyzes relationship between multiple variables through regression analysis. For a while, I thought you were referring to Hierarchical Regression Analysis. Thank you!

@Gaskination 6 years ago

It can be used as a multigroup moderator for multigroup analysis, which is a form of moderation.

@louizekahina9985 9 years ago

Thank you for your video. I have a problem with my statistics. I ran two factor analyses for two questions in my questionnaire. After that, I ran a cluster analysis with the factor scores from the second factor analysis. The first time I got 2 clusters, but I saw that the cluster membership column, which is created automatically by the system, contained -1 as a cluster membership, and I didn't understand why. I then ran the cluster analysis again, but this time I deleted the scores from the first factor analysis and kept just the factor scores from the second analysis, which I needed for the cluster analysis. This time I got 3 clusters. My question is: is there any relation between the two factor analyses I did before? In my clustering I use the scores from just one analysis. Why did I get different results if the scores were just variables with no interaction between them?

@Gaskination 9 years ago

The -1 means it didn't fit in any cluster. I don't understand your other questions.

@louizekahina9985 9 years ago

James Gaskin Thank you for your answer. As for the other questions, I found out why I had different results. I want to know if it's possible to explain more about clusters generated by using factor scores rather than the variables in our variable list. Thank you

@louizekahina9985 9 years ago

Hello, if I have a ratio of sizes of 3.05, can I keep the 3 clusters I got, or are the cluster sizes not well balanced because this ratio is greater than 3? Thank you

@Gaskination 9 years ago

Louize Kahina That ratio is fine. Also, as for using factor scores in cluster analysis, this is fine because the factor scores are just weighted averages based on the factor loadings. So, this is totally fine and requires no special interpretation.

@louizekahina9985 9 years ago

James Gaskin Thank you. I was asking about the interpretation of the clusters with regard to the two factor scores I used. I didn't understand what exactly the medians for each factor mean. Are these the clusters' centers?

@Jbrandalise 6 years ago

Hi James, I'm doing a two-step cluster analysis and my ratio of sizes was nearly 18.0. Is there something in the literature that talks about this? Thank you so much!

@Gaskination 6 years ago

I can't remember which literature talks about it. The concern is that you just want to make sure you have adequate representation from each group.

@Jbrandalise 6 years ago

James Gaskin I have 20 observations, and there is one cluster with 19 and another with just 1. The single case is China, and my topic is international competitiveness. I think it's fine. Is there some test I can do to make sure that the clustering is good? Thank you again!

@Gaskination 6 years ago

If you only have 20 responses, and all but 1 are part of a single cluster, then this is not a good cluster solution. You might try removing the nationality variable to see if that fixes the clustering.
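
For reference, the size ratio discussed in this exchange is the size of the largest cluster divided by the size of the smallest. A minimal Python sketch, assuming the cluster membership column has been exported as a plain list and that -1 marks cases that did not fit any cluster (as mentioned earlier in the thread); the function name `cluster_size_ratio` is hypothetical and not part of SPSS:

```python
from collections import Counter


def cluster_size_ratio(memberships, noise_label=-1):
    """Return (sizes, largest/smallest ratio), ignoring unassigned cases."""
    # Count cases per cluster, skipping the label used for unassigned cases.
    sizes = Counter(m for m in memberships if m != noise_label)
    if len(sizes) < 2:
        raise ValueError("need at least two clusters to compute a size ratio")
    ratio = max(sizes.values()) / min(sizes.values())
    return dict(sizes), ratio


# Hypothetical membership column: 19 cases in cluster 1, one case in cluster 2.
memberships = [1] * 19 + [2]
sizes, ratio = cluster_size_ratio(memberships)
print(sizes, ratio)
```

With 19 cases in one cluster and 1 in the other, the ratio is 19, far above the 2 to 3 range mentioned elsewhere in these comments.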

@Jbrandalise 6 years ago

James Gaskin I did this and resolved it. Thank you so much!

@arieprabowo4675 7 years ago

Do you have an installer for SPSS 13? I guess two-step cluster can only be run in SPSS version 13. Thanks in advance

@Gaskination 7 years ago

13? That is very old. SPSS is now on version 24. My version 24 runs the two step just fine. I don't have an installer though, as I'm not a licensed distributor.

@roxy629 9 years ago

Awesome! So clear and informational :) James, what would be the major differences between cluster analysis and factor analysis? Is it the profiling aspect? Can CA do things that FA cannot? Thanks again!

@Gaskination 9 years ago

roxy629 Cluster analysis clusters rows. Factor analysis "clusters" columns.

@roxy629 9 years ago

James Gaskin ahhh!!! that's why it's called "profiling" makes so much sense thanks james :)

@jameszanzarelli9255 6 years ago

is it possible to score new customers using an existing clustering model?

@Gaskination 6 years ago

If you mean to assign them to an existing cluster, yes. You can do this with Multiple Discriminant Analysis. Here is a video on it: ru-vid.com/video/%D0%B2%D0%B8%D0%B4%D0%B5%D0%BE-70vGOdEvYaM.html
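
As a rough illustration of what "assigning a new case to an existing cluster" means, here is a minimal Python sketch. Note that it uses nearest-centroid assignment, a simpler stand-in, and not the Multiple Discriminant Analysis recommended above. It assumes continuous, already-standardized variables; the data and function names are hypothetical.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equal-length numeric rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def assign_to_cluster(new_case, centroids):
    """Return the label of the centroid closest (Euclidean) to new_case."""
    return min(centroids, key=lambda label: math.dist(new_case, centroids[label]))


# Hypothetical, already-standardized cases grouped by existing cluster label.
existing = {
    1: [[0.1, 0.2], [0.0, 0.4], [0.2, 0.3]],
    2: [[2.0, 1.9], [2.2, 2.1], [1.8, 2.0]],
}
centroids = {label: centroid(rows) for label, rows in existing.items()}

print(assign_to_cluster([0.3, 0.1], centroids))  # new case near cluster 1
print(assign_to_cluster([1.9, 2.2], centroids))  # new case near cluster 2
```

Nearest-centroid assignment ignores categorical variables and cluster shape, which is one reason a discriminant model may be the better choice in practice.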

@jameszanzarelli9255 6 years ago

Thanks James! :)

@kamalpreetrakhra8071 4 years ago

Can you recommend any books for two-step cluster analysis?

@Gaskination 4 years ago

I think I learned it from Hair et al 2010 Multivariate Data Analysis, but I'm not sure. It is not my primary methodology, so I'm not too familiar with the literature around it.

@kamalpreetrakhra8071 4 years ago

@Gaskination thanks.

@mcole6234 11 years ago

James, very informative. You mention the need for over 30 cases in the smallest cluster and a largest-to-smallest ratio between 2 and 3. I am doing a PhD and wondered where these numbers came from. Do you have any academic reference(s) I could cite? Also, at the end of the video you ran an ANOVA on the newly formed variables in SPSS. I ran different analyses and never had more than 4 clusters, but there were 5 new variables, all with uninformative names. How do I know which ones to use?

@larasgilangrahmany289 2 years ago

Hi, sir. I hope you are doing well and have a wonderful holiday and a merry Christmas, sir! I want to ask a question about the steps of doing a two-step cluster analysis: do we have to use the CF Tree first for the PRECLUSTERING phase before doing the final clustering using BIC/AIC? I really hope you can answer me this time, sir :') Thank you so much

@Gaskination 2 years ago

I do not think so. You should be able to jump straight to BIC/AIC as long as the solution has some face validity.

@larasgilangrahmany289 2 years ago

@Gaskination thank you! Can I know what factors affect how much a variable contributes to creating the clusters? I'm talking about the purple table in predictor importance

@Gaskination 2 years ago

@larasgilangrahmany289 It is determined by the shared variance among all variables, and can be influenced strongly by discrete values, such as binary (e.g., single/married) or multinomial (e.g., age group: child, young adult, adult, middle-aged, post-middle-aged).

@larasgilangrahmany289 2 years ago

@Gaskination Thank you so much, sir! :') thanks a lot

@larasgilangrahmany289 2 years ago

And it is also determined by how varied the responses are across the options, right? I observe that if all of the clusters choose almost the same option (e.g., woman), it's less than 0.5, but if each cluster chooses different options (e.g., woman and man), the value will be more than 0.5. Is that right?

@xiaoyanggong2006 8 years ago

Thanks!

@MrNicks86 10 years ago

Thanks for the great video - very useful! I was just wondering if you could explain (in a nutshell) the difference between this Two-Step cluster analysis and k-means? Thanks

@Gaskination 10 years ago

The main difference is that two-step allows you to distinguish between categorical and continuous variables, and it processes them differently, whereas k-means just treats them all the same. So, if you have categorical variables, two-step would be a more accurate clustering.
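
To illustrate why treating categorical and continuous variables the same way can mislead, here is a minimal Python sketch. It does not reproduce the distance measure SPSS TwoStep actually uses; it only contrasts plain Euclidean distance on integer-coded categories (what a k-means-style distance would see) with a simple mixed distance that counts a category mismatch as 0 or 1. The data and function names are hypothetical.

```python
import math


def euclidean_on_codes(a, b):
    """Treat every field as a number, as a plain k-means distance would."""
    return math.dist(a, b)


def mixed_distance(a, b, categorical):
    """Squared difference for continuous fields, 0/1 mismatch for categorical ones."""
    total = 0.0
    for i, (x, y) in enumerate(zip(a, b)):
        if i in categorical:
            total += 0.0 if x == y else 1.0
        else:
            total += (x - y) ** 2
    return math.sqrt(total)


# Hypothetical cases: (standardized income, region code 1..4).
# Region codes are arbitrary labels, so region 4 is no "further" from
# region 1 than region 2 is.
a = (0.5, 1)
b = (0.5, 2)
c = (0.5, 4)

# Euclidean distance makes c look three times further from a than b is.
print(euclidean_on_codes(a, b), euclidean_on_codes(a, c))
# The mixed distance treats both mismatches identically.
print(mixed_distance(a, b, {1}), mixed_distance(a, c, {1}))
```

The arbitrary numeric codes drive the Euclidean result, which is the kind of distortion that separate handling of categorical variables is meant to avoid.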

@MrNicks86 10 years ago

Thanks for your reply. So with continuous data like domestic energy use, would k-means be more appropriate? And is it right to say that k-means treats each variable as independent of the next, which in the case of domestic energy use is not quite the case? Many thanks again!

@Gaskination 10 years ago

Nicholas Samson Unfortunately, I'm not an expert in cluster analyses. So your question surpasses my immediate knowledge. I would just have to look it up. I know that there are some good documents and articles that discuss the differences between two-step and k-means. I just googled it. Best of luck to you.

@MrNicks86 10 years ago

Thanks James!

@souksomphoneanothay1149 11 years ago

good video

@olfabenarfa3790 10 years ago

Very informative video and extremely helpful as usual. My only concern is that when I did it the first time it gave me 3 groups, and when I ran it again it gave me 2 groups. I did it many times and noticed that the results are not stable! How can the same steps and the same algorithm give different results? Did anyone else face this issue with two-step cluster analysis? Thanks.

@Gaskination 10 years ago

That is bizarre... I'm not sure what would be causing that. It should be the same every time I think.
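
One possible explanation, not confirmed anywhere in this thread: clustering procedures that build clusters sequentially can depend on the order of the cases, and the SPSS documentation for TwoStep notes that its solution may depend on case order. If the file was sorted or reordered between runs, the result could change. The Python sketch below is a toy "leader" clustering, not the TwoStep algorithm, and is only meant to show how the same data can give a different number of clusters when the order changes. All names and data are hypothetical.

```python
def leader_clusters(values, threshold):
    """Sequential 'leader' clustering on 1-D data.

    Each value joins an existing cluster if it lies within `threshold`
    of that cluster's leader; otherwise it founds a new cluster.
    Returns the list of leaders, one per cluster.
    """
    leaders = []
    for v in values:
        if not any(abs(v - lead) <= threshold for lead in leaders):
            leaders.append(v)
    return leaders


# Same three cases, two different orderings.
order_a = [0.0, 1.0, 2.0]
order_b = [1.0, 0.0, 2.0]

print(len(leader_clusters(order_a, threshold=1.0)))  # 2 clusters
print(len(leader_clusters(order_b, threshold=1.0)))  # 1 cluster
```

If order effects are the cause, sorting the data file the same way before every run would be one way to check.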

@DrMMRaziq 12 years ago

Great!

@TheCopginger 12 years ago

By the way, I was performing cluster analysis based on your video. However, I have a few questions to ask you: 1. Is it possible to assign a weight to each individual record when performing segmentation? 2. If a weight is already available for each record (based on other criteria), how can it be used in the segmentation process?