'How to convert a presto query output to a python data frame
I want to convert my query output to a python data frame to draw Line graph
import prestodb
import pandas as pd
conn=prestodb.dbapi.connect(
host='10.0.0.101',
port=8081,
user='hive',
catalog='hive',
schema='ong',
)
cur = conn.cursor()
query="SELECT dtime,tagName FROM machine where tagname is not null
limit 1000"
cur.execute(query)
rows = cur.fetchall()
print(rows)
df = pd.DataFrame(query, columns=['x_axis','tagName'])
This is my sample output from query
[['2018-09-08 00:00:00.000', 26], ['2018-09-08 01:00:00.000', 26],
['2018-09-08 02:00:00.000', 26], ['2018-09-08 03:00:00.000', 27],
['2018-09-08 04:00:00.000', 27], ['2018-09-08 05:00:00.000', 27]]
how to convert this query output to a data frame using python
Solution 1:[1]
It's very simple, I would suggest you to use pyhive.presto
connector (see: https://github.com/dropbox/PyHive), to connect to presto, but also the one you use should work the same way.
Then you have a couple of options:
1 - Use presto connection and pandas read_sql_query
2 - Use presto cursor and use the output of fetchall as input data of the dataframe.
# option 1
import pandas as pd
from pyhive import presto
connection = presto.connect(user='my-user', host='presto.my.host.com', port=8889)
df = pd.read_sql_query("select 100", connection)
print(
df.head()
)
or
# option 2
import pandas as pd
from pyhive import presto
connection = presto.connect(user='my-user', host='presto.my.host.com', port=8889)
cur = connection.cursor()
cur.execute("select 100")
df = pd.DataFrame(cur.fetchall())
print(
df.head()
)
Solution 2:[2]
df = pd.DataFrame(cur.fetchall()) print(df)
Solution 3:[3]
To get data and column names into a dataframe, use this:
df = pd.DataFrame(cur.fetchall(), columns=[i[0] for i in cur.description])
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Hammond95 |
Solution 2 | |
Solution 3 | scottlittle |