
day 8 | capgemini interview question | pyspark scenario based interview questions and answers 

DEwithDhairy
Subscribe · 5K subscribers
1.9K views

pyspark scenario based interview questions and answers
capgemini interview question and answers
Create DataFrame:
================
lift_data = [
    (1, 300),
    (2, 350)
]
lift_schema = "id int, capacity_kg int"
lift_df = spark.createDataFrame(data=lift_data, schema=lift_schema)

lift_passengers_data = [
    ('Rahul', 85, 1),
    ('Adarsh', 73, 1),
    ('Riti', 95, 1),
    ('Viraj', 80, 1),
    ('Vimal', 83, 2),
    ('Neha', 77, 2),
    ('Priti', 73, 2),
    ('Himanshi', 85, 2)
]
lift_passengers_schema = "passenger_name string, weight_kg int, lift_id int"
lift_passengers_df = spark.createDataFrame(data=lift_passengers_data, schema=lift_passengers_schema)
lift_df.display()
lift_passengers_df.display()
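
The description never states the task itself, so here it is as inferred from the data above and the solutions in the comments below (an assumption, not confirmed by the video): for each lift, board passengers in ascending weight order while the running total stays within capacity_kg. A quick plain-Python check of that reading:

lifts = {1: 300, 2: 350}
passengers = [
    ('Rahul', 85, 1), ('Adarsh', 73, 1), ('Riti', 95, 1), ('Viraj', 80, 1),
    ('Vimal', 83, 2), ('Neha', 77, 2), ('Priti', 73, 2), ('Himanshi', 85, 2),
]

# Greedy boarding: lightest first, skip anyone who would push the total over capacity.
for lift_id, capacity in lifts.items():
    total, boarded = 0, []
    for name, kg, lid in sorted(passengers, key=lambda p: p[1]):
        if lid == lift_id and total + kg <= capacity:
            total += kg
            boarded.append(name)
    print(lift_id, boarded, total)

# 1 ['Adarsh', 'Viraj', 'Rahul'] 238            (Riti's 95 kg would push it past 300)
# 2 ['Priti', 'Neha', 'Vimal', 'Himanshi'] 318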
Need help? Connect with me 1:1 - topmate.io/dew...
Let's connect on LinkedIn : / dhirajgupta141
pyspark 30 days challenge : • pyspark 30 days challenge
DSA In Python Interview Series : • dsa for data engineer ...
PySpark Interview Series : • pyspark interview ques...
Pandas Interview Series : • pandas interview quest...
SQL Interview Series : • sql interview question...
PySpark Installation and Setup : • Spark Installation | P...
#pyspark #capgemini #capgeminioffcampus

Published: 26 Sep 2024

Comments: 6
@ParmeshwarSalunke-lo8zy · 4 months ago
str = 'HelloabcdefHelloxyz'
pattern = 'Hello'
output: pattern found at 0, 11

How to solve this in Python? I don't understand the logic.
@grvdjkg · 3 months ago
Maybe the question asks where the occurrences of the substring 'Hello' are. In the given string 'HelloabcdefHelloxyz', "Hello" starts at indices 0 and 11. So that is what this question asks. Below is your solution:

import re

def find_pattern_indices(text, pattern):
    # The 'finditer' method from the re module finds all occurrences of the pattern
    matches = re.finditer(pattern, text)
    # Extract the starting index of each match
    indices = [match.start() for match in matches]
    print(f"pattern found at {', '.join(map(str, indices))}")

text = 'HelloabcdefHelloxyz'
pattern = 'Hello'

# function call
find_pattern_indices(text, pattern)
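
For readers who, like the asker, find the regex version opaque: the same indices can be found with a plain str.find loop, no re module needed. A minimal stdlib-only sketch:

def find_pattern_indices(text, pattern):
    # str.find returns the next match at or after `start`, or -1 when none is left.
    indices = []
    start = text.find(pattern)
    while start != -1:
        indices.append(start)
        start = text.find(pattern, start + 1)  # +1 also catches overlapping matches
    return indices

print(find_pattern_indices('HelloabcdefHelloxyz', 'Hello'))  # [0, 11]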
@sarathkumar-tr3is · 4 months ago
Hi, it's a good question. I have solved it, please have a look and let me know.

from pyspark.sql.window import Window
from pyspark.sql.functions import sum, col, collect_list

# Running total of weights per lift, lightest passenger first
window = Window.partitionBy('lift_id').orderBy('weight_kg')
tr_df = lift_passengers_df.withColumn('Running_Sum', sum('weight_kg').over(window))

# Keep only passengers whose running total stays below the lift's capacity
joined_df = (tr_df.join(lift_df, tr_df.lift_id == lift_df.id, 'left')
                  .filter(tr_df.Running_Sum < lift_df.capacity_kg))

joined_df.groupBy('lift_id') \
         .agg(collect_list(col('passenger_name')).alias('passenger_name')) \
         .show(truncate=False)
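
One caveat on the filter above: Running_Sum < capacity_kg drops a passenger whose boarding brings the total to exactly the capacity, whereas <= matches the usual reading of "up to capacity". With this particular dataset both give the same result.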
@mojijojo-r1d · 3 months ago
Bro, can u solve this?

File 1 - DF1
+------------------------------------+----------------------------------+
|id                                  |segments                          |
+------------------------------------+----------------------------------+
|aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee|2036,2476,2396,1366,1456,1466,1516|
+------------------------------------+----------------------------------+

File 2 - DF2
+------------------------------------+----------------------------------+
|id                                  |segments                          |
+------------------------------------+----------------------------------+
|aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee|2036,2476,2396,1366,1456,1466,1515|
+------------------------------------+----------------------------------+

finalDF
+------------------------------------+---------------------------------------+
|id                                  |segments                               |
+------------------------------------+---------------------------------------+
|aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee|2036,2476,2396,1366,1456,1466,1515,1516|
+------------------------------------+---------------------------------------+

Need to remove duplicates and join them to get the final DataFrame where the id is the same.
@DEwithDhairy · 2 months ago
Bro, the question is not clear over here. Can you share the question on LinkedIn (an image of the question)?
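
For what it's worth, a plausible reading of the question above is: union the two DataFrames, merge each id's comma-separated segments, and drop duplicates. A minimal sketch under that assumption (df1/df2 and the column names are taken from the printed output; note that array_distinct keeps first-seen order, so an explicit sort may be needed to match the exact finalDF shown):

from pyspark.sql import functions as F

# Assuming df1 and df2 each have columns: id (string), segments (comma-separated string)
final_df = (
    df1.unionByName(df2)
       .withColumn("segments", F.split("segments", ","))   # string -> array
       .groupBy("id")
       .agg(F.array_distinct(F.flatten(F.collect_list("segments"))).alias("segments"))
       .withColumn("segments", F.concat_ws(",", "segments"))  # array -> string
)
final_df.show(truncate=False)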