@@returnexitsuccess Isn't it the same with his pH example? The pH scale is logarithmic, not linear, and it measures the concentration of hydrogen ions. So I got the feeling that it wasn't really the best example for what he was trying to say.
@@returnexitsuccess I am so furious with you because you taught me that Kelvin is not referred to with degrees. I hate this. I want to write 3° K. But because of you I won't. I'll write 3 K instead. Thanks a lot for ruining my day.
I'm a bit disappointed that the intro didn't continue for the whole "what is data?" video, with Mike holding up different things and asking "is this data?" for up to 12 minutes
When the four data types were enumerated, it should have been said that the third, interval data, allows two main operations: 1. Differences (or deltas): 30 degrees minus 20 degrees = 10 degrees is a meaningful calculation; 2. Ratios of such differences, e.g. (50-30)/(30-20) = 2 is meaningful in that it takes twice as much heat energy to change a body's temperature from 30 to 50 degrees as to change it from 20 to 30.
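To make the point about ratios of differences concrete, here is a small sketch in Python. The helper names (`c_to_f`, `diff_ratio`) are made up for illustration. It shows that the ratio of two temperature differences stays the same after converting Celsius to Fahrenheit, while the plain ratio of two readings does not.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9.0 / 5.0 + 32.0


def diff_ratio(a, b, c, d):
    """Ratio of two differences: (a - b) / (c - d)."""
    return (a - b) / (c - d)


# Ratio of differences, computed in Celsius and again in Fahrenheit.
ratio_in_c = diff_ratio(50, 30, 30, 20)
ratio_in_f = diff_ratio(c_to_f(50), c_to_f(30), c_to_f(30), c_to_f(20))

# Plain ratio of two readings, which depends on where the zero sits.
plain_in_c = 50 / 30
plain_in_f = c_to_f(50) / c_to_f(30)

print(ratio_in_c, ratio_in_f)  # both are 2.0
print(plain_in_c, plain_in_f)  # these two disagree
```

The offset of 32 cancels when you subtract, and the 9/5 factor cancels when you divide, which is why only the ratio of differences survives the unit change.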
4:47 I think we CAN tell if a pH is double another one. A pH of 6 means a hydrogen ion concentration exactly 10 times higher than a pH of 7 (i.e. each step of one pH unit changes the molar concentration of hydrogen ions by a factor of 10).
Also, about degrees Celsius: sure, you can't directly say that 100 is 2 times hotter than 50, but if you convert them to kelvin you get 373 and 323, so indirectly you can say that 100 degrees Celsius is about 15% hotter than 50, right?
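A quick check of that arithmetic, as a sketch in Python (the function name `celsius_to_kelvin` is just an illustrative helper, and 273.15 is the standard Celsius-to-kelvin offset):

```python
def celsius_to_kelvin(celsius):
    """Shift a Celsius reading onto the kelvin scale, whose zero is absolute zero."""
    return celsius + 273.15


naive_ratio = 100.0 / 50.0  # treats 0 degrees C as "no temperature"
kelvin_ratio = celsius_to_kelvin(100.0) / celsius_to_kelvin(50.0)

print(naive_ratio)             # 2.0
print(round(kelvin_ratio, 3))  # 1.155, i.e. roughly 15% higher
```

Whether "15% hotter" is a useful statement is a separate question, but the ratio is at least well defined once the zero is a true zero.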
As a data analyst, I see a lot of people get really caught up in R and Python. They are great tools for sure, but if you have access to a database then nothing beats good old-fashioned SQL for sorting, cleansing, transforming and quick analysis.
Except SQL and big data may be difficult to combine if there is more data than SQL is designed for. In those cases, where your data is stored in something like Hadoop, you may not be able to produce a schema that allows you to use SQL.
@@HenrikDalsager fair point. My point is that not all data analysis involves such massive data sets, and when discussing data analysis for new people SQL should be mentioned too.
Isn't pH a ratio type, but on a log scale? Yes, there is no "zero", but you can calculate what is double a pH value, or more easily what is ten times a given pH value.
This is two videos. Up to 5:24 you could just cut that and have it as a separate video. It's weird having been in computing and never having had it explained this way to me before. They didn't even teach this on my computer science degree. Before I saw this video, data to me was just anything in a format that enables it to be processed by machine. Because of that though, I'm suspicious: Does data really always fall neatly into one of these categories? Colours, for example, seem to fall into more than one: Yes, you can have nominal red, green, blue, cyan, magenta, yellow, white, but you can also have them stored as RGB values, or as HSV/HSB values. You can have it represented on a CIE graph, and you can have it as radiation frequencies. In the real world, does data really always fall neatly into one of the jars?
I rewound the video and I think it's correctly placed in the "interval data" category. It can't be used for ratios directly, though. If you divide pH 1 by pH 0, you get infinity, a nonsensical answer. To get a meaningful ratio, you must convert to the concentration of H+, and then you get 0.1. However, you are now working with DIFFERENT numbers, which are on a ratio scale: their zero is "true".
Hexanitrobenzene, that argument doesn’t hold. I could just as easily say that if you divide 1 child by 0 children you get an undefined result. That doesn’t mean it isn’t a type of ratio data. The rational numbers themselves are ratios of integers, and they also break if you try to have a ratio of anything over 0.
I think the point is that you can't use pH for ratio directly, you must convert it to concentration of H+ ions. The same for degrees Celsius - there is a well known relationship between Celsius and Kelvin scales, however, degrees Celsius can't be used for ratios because their zero is "not true".
If you have a kind of interval data like pH, you can't say that pH 10 is twice as much as pH 5, ok. But pH is a kind of data that can be converted to ratio data (since pH = -log10([H+])). So, in a sense, since you can convert easily with a formula, pH itself might not be rational, but you could rationalize it with just a couple more steps.
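For anyone who wants to see the conversion spelled out, here is a minimal sketch in Python. The function names are illustrative, and it assumes the usual textbook definition pH = -log10([H+]) with [H+] in mol/L.

```python
import math


def ph_to_concentration(ph):
    """Hydrogen ion concentration in mol/L for a given pH."""
    return 10.0 ** (-ph)


def concentration_to_ph(concentration):
    """pH for a given hydrogen ion concentration in mol/L."""
    return -math.log10(concentration)


# pH 5 versus pH 10: the pH values differ by 5, the concentrations by 10**5.
conc_ph5 = ph_to_concentration(5)
conc_ph10 = ph_to_concentration(10)
conc_ratio = conc_ph5 / conc_ph10
print(conc_ratio)

# Doubling the concentration does not double the pH; it lowers it by log10(2).
ph_after_doubling = concentration_to_ph(2 * conc_ph5)
print(ph_after_doubling)
```

So ratios make sense on the concentration numbers, and on the pH scale they show up as differences, which fits the "interval" label.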
Data analysis in Excel is generally not done because it has lots of quirks once you go deep into the formulas. Microsoft has kept a lot of bugs for the sole purpose of staying backwards compatible. Things like dates and references introduce mistakes that are often hard to spot.
Does ratio data necessarily have a linear scale? That is, is 2 necessarily half of 4? Some measures are expressed as a logarithm, such as the Richter scale, while still having a zero that means there's none, so in this case Richter 6 is 10X Richter 5. Of course a logarithm can always be expanded to a unitary value, so my question is more about practicality.
Isn't all interval data also ratio? For example, if you subtract the leftmost value in the interval representation to get the zero? Or is the problem here that you don't always know the leftmost value, since you only know a sample that represents the underlying truth? I was just confused since he said that degrees Celsius is interval data, while it can easily be translated to Kelvin, which is ratio data.
It seems like the line between a nominal attribute and an ordinal one is a little blurry. I'd argue that you can order weather types by how conducive they are to outdoor activity (e.g., sunny > overcast > rainy).
I realize you used temperature merely as an example of the interval data set type, but this particular measurement can be converted to a ratio data type (by converting C or F to K). Would there be any rational (pun intended) reason to do this?
Are examples like this standardised or something? In my uni Databases course I made a decision tree for whether or not tennis would be played based on information like weather...
I don't think it's standardized, it's more like use the most basic algorithm that solves the problem for you and call it a day. In your case a decision tree solved your problem, and you called it a day.
You actually can determine what "double the pH" of something is, since pH is actually just the negative base ten logarithm of the concentration of hydronium ions. Edit: tl;dr it's logarithmic, so doubling the hydronium concentration lowers the pH by log10(2), about 0.3
I'd also distinguish discrete and continuous data (for the lack of a better word). Discrete data: can take only certain values like Yes, No, Maybe ( 😁 ), integers. It makes no sense to calculate an arithmetic mean, aka average. Frequency counting is OK. A median is also appropriate. A typical misuse of actually discrete data is when they publish that the typical family has 2.3 children. WHAT? This includes nominal, ordinal, and interval data. Continuous data: I'd put the data you mentioned as "ratio" in this group.
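To illustrate the "2.3 children" complaint, here is a small sketch in Python using the standard `statistics` module. The list of family sizes is invented for the example, not real survey data.

```python
import statistics
from collections import Counter

# Hypothetical number of children in ten families (made-up data).
children = [1, 2, 2, 2, 3, 3, 3, 2, 2, 3]

mean_children = statistics.mean(children)      # a value no family actually has
median_children = statistics.median(children)  # middle of the sorted counts
frequency = Counter(children)                  # how often each count occurs

print(mean_children)
print(median_children)
print(frequency.most_common())
```

The mean lands on a value that no single family can have, while the median and the frequency counts stay on values that actually occur in the data.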