'Azure Synapse Union - How do I merge tables instead of adding rows

I am looking for a way to merge three seperate datasets (.csv format) into one in Azure Synapse and then store it as a new .csv in Azure Blob Storage. I am using the Union data flow based on this tutorial: https://www.youtube.com/watch?v=vFCNbHqWct8

Generally speaking, the extraction and saving of the new file works. However, when merging the files I receive 3x the number of rows as in the source datasets. Each source dataset has 36 entries each. CustomerID ranges from 1-36 in each dataset.

Dataset 1 has 2 columns: CustomerID, loyalty_level Dataset 2 has 3 columns: CustomerID, name, email Dataset 3 has 2 columns: CustomerID, salestotal

When I run it, I get a dataset with 108 rows, instead of the aspired 36. Where is my mistake? Am I approaching the process incorrectly?

enter image description here



Solution 1:[1]

You are getting 108 rows because the union transformation is combining the 3 separate datasets into 1. If you watch the video in the union transformation documentation page it describes the behavior of this transformation.

To get your desired results you need to use the join transformation. Using the CustomerID as your join condition this will join the datasets together keeping your row count at 36.

One thing to watch out for is the type of join you choose. If you have customers in one file that are not in another you can drop records. This post describes the different types of joins very well. I suggest you get a firm understanding of this different types of joins.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Chris Albert