There are a few different approaches. Check out this link, which goes through various ways you can set up CI/CD to use GitHub to store your DAG code: docs.astronomer.io/astro/ci-cd-templates/github-actions
Can Airflow be used to orchestrate a Spark Streaming job on YARN that pulls data from Kafka and writes to HDFS? The idea is that if the Spark Streaming job ends up queued, Airflow can monitor it, detect the problem, alert on it, and restart it automatically.
Oh, definitely it can! Check out this link for the different options you have for managing Spark via Airflow. You'll probably want to use a Spark hook: registry.astronomer.io/providers/apache-airflow-providers-apache-spark/versions/4.1.5
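For the "detect a queued job" part, here is a rough sketch of the kind of check an Airflow task could run. It is not from the linked docs. It assumes you have already fetched the application list as JSON from the YARN ResourceManager REST endpoint (/ws/v1/cluster/apps) and that the payload has the usual `apps` / `app` / `name` / `state` shape. The function name `streaming_job_status` and the job name `kafka_to_hdfs` are made up for illustration.

```python
from __future__ import annotations

from typing import Any

# YARN state that means the job is alive and processing data.
HEALTHY_STATES = {"RUNNING"}
# YARN states that mean the job was submitted but is still waiting to start.
QUEUED_STATES = {"NEW", "NEW_SAVING", "SUBMITTED", "ACCEPTED"}


def streaming_job_status(apps_payload: dict[str, Any], app_name: str) -> str:
    """Classify a named job as 'running', 'queued', or 'missing'.

    apps_payload is assumed to be the parsed JSON body from the YARN
    ResourceManager endpoint /ws/v1/cluster/apps (hypothetical input here).
    """
    # Tolerate a missing or null "apps" / "app" field by treating it as empty.
    apps = (apps_payload.get("apps") or {}).get("app") or []
    # Collect every state reported for applications with the given name.
    states = {app.get("state") for app in apps if app.get("name") == app_name}
    if states & HEALTHY_STATES:
        return "running"
    if states & QUEUED_STATES:
        return "queued"
    # Finished, failed, killed, or never submitted.
    return "missing"
```

A task in your DAG could call something like this on a schedule, then alert on "queued" and resubmit on "missing", for example with the SparkSubmitOperator from the Spark provider linked above. How long a job may sit queued before you act on it is a choice you would need to make for your own cluster.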