#spark #interviewquestions #dataengineers #pyspark #sparksql
Question: write Spark (or SQL) code to find the employee count under each manager.
input:
data = [('4529', 'Nancy', 'Young', '4125'),
('4238','John', 'Simon', '4329'),
('4329', 'Martina', 'Candreva', '4125'),
('4009', 'Klaus', 'Koch', '4329'),
('4125', 'Mafalda', 'Ranieri', 'NULL'),
('4500', 'Jakub', 'Hrabal', '4529'),
('4118', 'Moira', 'Areas', '4952'),
('4012', 'Jon', 'Nilssen', '4952'),
('4952', 'Sandra', 'Rajkovic', '4529'),
('4444', 'Seamus', 'Quinn', '4329')]
schema = ['employee_id' ,'first_name', 'last_name', 'manager_id']
output:
manager_id | manager_name | no_of_emp
4125       | Mafalda      | 2
4329       | Martina      | 3
4529       | Nancy        | 2
4952       | Sandra       | 2
Spark SQL solution:
df = spark.createDataFrame(data=data, schema=schema)
df.createOrReplaceTempView('EMP')
df.show()
# Self-join EMP to itself: alias e is the employee row, alias m is that
# employee's manager. The INNER JOIN drops rows whose manager_id has no
# matching employee_id (e.g. Mafalda's manager_id is the string 'NULL').
query = '''
SELECT e.manager_id,
       m.first_name AS manager_name,
       COUNT(e.employee_id) AS no_of_emp
FROM emp e
INNER JOIN emp m
  ON m.employee_id = e.manager_id
GROUP BY e.manager_id, m.first_name
'''
result = spark.sql(query)
result.show()
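If you want to sanity-check the expected output without spinning up a Spark session, the same self-join-and-count logic can be sketched in plain Python (a minimal sketch over the post's sample data, not part of the original solution):

```python
from collections import Counter

# Same rows as in the Spark example above
data = [('4529', 'Nancy', 'Young', '4125'),
        ('4238', 'John', 'Simon', '4329'),
        ('4329', 'Martina', 'Candreva', '4125'),
        ('4009', 'Klaus', 'Koch', '4329'),
        ('4125', 'Mafalda', 'Ranieri', 'NULL'),
        ('4500', 'Jakub', 'Hrabal', '4529'),
        ('4118', 'Moira', 'Areas', '4952'),
        ('4012', 'Jon', 'Nilssen', '4952'),
        ('4952', 'Sandra', 'Rajkovic', '4529'),
        ('4444', 'Seamus', 'Quinn', '4329')]

# employee_id -> first_name: plays the role of the "m" side of the join
names = {emp_id: first for emp_id, first, _, _ in data}

# Count employees per manager, keeping only manager_ids that exist as
# employees themselves (this mirrors the INNER JOIN, so 'NULL' is dropped)
counts = Counter(mgr for *_, mgr in data if mgr in names)

for mgr_id, n in sorted(counts.items()):
    print(mgr_id, names[mgr_id], n)
# → 4125 Mafalda 2
#   4329 Martina 3
#   4529 Nancy 2
#   4952 Sandra 2
```

This makes it easy to see why Mafalda never appears on the employee side of the result: her manager_id is the literal string 'NULL', which matches no employee_id, so the inner join filters her row out.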
Credit: poojatripathi0697, 4 Jun 2024