Run a SQL query on a DataFrame via PySpark
I would like to run a SQL query on a DataFrame. Do I have to create a view on the DataFrame first, or is there an easier way?
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([
    ('a', 1, 1), ('a', 1, None), ('b', 1, 1),
    ('c', 1, None), ('d', None, 1), ('e', 1, 1),
]).toDF('id', 'foo', 'bar')
I want to run some complex queries against this DataFrame. For example, I can do:
df.createOrReplaceTempView("temp_view")
df_new = spark.sql("select id, max(foo) from temp_view group by id")
but do I have to convert it to a view first before querying it? I know there is a DataFrame-equivalent operation; the query above is only an example.
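On recent Spark versions there is in fact an easier way: since Spark 3.4, spark.sql accepts a DataFrame as a format-style keyword argument, so no explicitly named view is needed. A minimal sketch, assuming Spark 3.4 or later and the df defined above:

df_new = spark.sql("select id, max(foo) from {df} group by id", df=df)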
Solution 1:[1]
You can just do
df.select('id', 'foo')
This returns a new Spark DataFrame with the columns id and foo, the equivalent of select id, foo in SQL, with no view required.
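For the grouped query in the question specifically, the DataFrame-API equivalent is a groupBy followed by an aggregate. A minimal sketch, using pyspark.sql.functions and the df from the question (the alias max_foo is just an illustrative name):

from pyspark.sql import functions as F

# Same result as: select id, max(foo) from temp_view group by id
df_new = df.groupBy('id').agg(F.max('foo').alias('max_foo'))

This produces the same result as the SQL above, again without creating a view.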
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | fuzzy-memory |
