'Why does the polybase pushdown filter for join not work?
Recently put polybase into production and have noticed that pushdown isn't working for joins. Take the following query:
CREATE TABLE #VendIDs
(VendorAN int primary key)
INSERT INTO #VendIDs(VendorAN)
VALUES (1),(89980),(89993),(90002),(90003),(90008),(90004),(90015),(90018),(97140),(97139),(97138)
,(97137),(97136),(97135),(97134),(97059),(97058),(97057),(97056),(97055),(97054),(97053),(97052)
SELECT VW.VendorAN, [Type], Code, [Name],Address1, Address2, City,State, Zip,Country, Phone,
ShipAddress1, ShipAddress2, ShipCity, ShipState, ShipZip,Latitude, Longitude
FROM vwVendor VW
JOIN #VendIDs FV ON VW.VendorAN = FV.VendorAN
The execution plan shows 22k rows from the 'remote query', which just happens to match the number of rows in the external table. After enabling trace flag 6408, it shows 22k records on the external side.
If I do a simply where vendorAN = XXXXXX, I can clearly see 1 row being returned via the remote query and the filtering be done on the remote side.
Anyone have a thought on why I'm not seeing pushdown filtering on the join as shown above? Makes no sense to me based upon what I've read to date.
Referencing this article for how to know if pushdown filtering is occurring: https://docs.microsoft.com/en-us/sql/relational-databases/polybase/polybase-how-to-tell-pushdown-computation?view=sql-server-ver15#--use-tf6408
Solution 1:[1]
- Is your external table created with PUSHDOWN=ON or your query uses OPTION(ENABLE EXTERNALPUSHDOWN)?
- Have you created statistics from the external table?
- Is the remote table partitioned?
- How is vwVendor created? Is it a view on the external table joined to other tables?
You also need to take a look at sys.dm_exec_distributed_request_steps and sys.dm_exec_distributed_sql_requests to see what is occurring under the hood.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pabechevb |
