Another video brought to you by BeardedDev, bringing you tutorials on Data Engineering, Business Intelligence, T-SQL Programming and Data Analysis.
In this video I talk through when DISTINCT is evaluated, sometimes I see people write queries with a unique identifier and then wonder why the results include duplicates (aside from the unique identifier, of course), understanding how data is processed is very important to writing accurate SQL queries.
I introduce Derived Tables to take the results from a query that produces unique rows using DISTINCT then add a unique identifier using ROW_NUMBER.
This video was created on SQL Server 2019 using SQL Server Management Studio but you can following along with Azure SQL Database or Azure Synapse Analytics and using another client tool such as Azure Data Studio.
If you wish to follow along with the video, you can use the SQL Scripts below:
IF OBJECT_ID(N'dbo.Customer', N'U') IS NOT NULL
DROP TABLE dbo.Customer;
CREATE TABLE dbo.Customer
(
FirstName VARCHAR(50),
LastName VARCHAR(50)
);
INSERT INTO dbo.Customer (FirstName, LastName)
VALUES
('Sarah', 'Power'),
('Sarah', 'Power'),
('Nathan', 'Jones'),
('Nathan', 'Jones'),
('Nathan', 'Jones'),
('Nathan', 'Jones'),
('Michelle', 'Arnold'),
('Michelle', 'Arnold'),
('Michelle', 'Arnold');
-- unique customers
SELECT DISTINCT
FirstName,
LastName
FROM dbo.Customer;
-- unique customers with unique identifier
SELECT
ROW_NUMBER() OVER(ORDER BY FirstName, LastName) AS RowNum,
FirstName,
LastName
FROM
(
SELECT DISTINCT
FirstName,
LastName
FROM dbo.Customer
) AS D;
15 сен 2024