Duplicate associated Entities fetched for Parent Entity

I have 3 entities.

  1. RequestHeader
  2. RequestDetail
  3. RequestDetailResponse

RequestHeader is one-to-many with RequestDetail (RD is many-to-one with RH).

RequestDetail is one-to-many with RequestDetailResponse, which is named DetailResponse in the code below (DR is many-to-one with RD).

Here are the relationship definitions:

RequestHeader -> RequestDetail (note the field is declared as a Collection; more on that later)

@OneToMany(mappedBy = "requestHeader", fetch = FetchType.EAGER)
Collection<RequestDetail> requestDetails = new HashSet<>();

RequestDetail -> RequestHeader

@ManyToOne
@JoinColumn(name="requestHeaderID")
private RequestHeader requestHeader;

RequestDetail -> DetailResponse

@OneToMany(mappedBy = "requestDetail", fetch = FetchType.EAGER)
Set<DetailResponse> detailResponses = new HashSet<>();

DetailResponse -> RequestDetail

@ManyToOne
@JoinColumn(name="requestDetailID")
private RequestDetail requestDetail;

My question: when I obtain a collection of RequestDetails from a RequestHeader entity (through a public getter), I get 'duplicate' RequestDetails. Playing around with SQL, it seems the 'extra' details reflect the join with the DetailResponses. For instance,

SELECT * FROM RequestHeader rh INNER JOIN RequestDetail rd ON rd.RequestHeaderID = rh.RequestHeaderID

returns, say, 5 records, whereas

SELECT * FROM RequestHeader rh INNER JOIN RequestDetail rd ON rd.RequestHeaderID = rh.RequestHeaderID INNER JOIN DetailResponse drs ON drs.RequestDetailID = rd.RequestDetailID

returns, say, 10 results, because each RequestDetail row is repeated once for every DetailResponse it joins with.

When I obtain the collection of RequestDetails from my RequestHeader object, I get a collection of size 10. If I declare the RequestDetails field on the RequestHeader entity as a Set rather than a Collection, I avoid this problem. But since I have defined a primary key on the RequestDetail entity, I would expect the JPA provider to be smart enough to know that there cannot be a duplicate in this context. What have I got wrong? Is using a Set declaration the correct way to go about this?

By the way, here are my primary key declarations, in case they are significant.

RequestHeader:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Integer requestHeaderID;

RequestDetail:

@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Integer requestDetailID;

DetailResponse:

@Id
@TableGenerator(name = "DetailResponseStore", table = "PRIMARY_KEYS",
    pkColumnName = "KEY_NAME", pkColumnValue = "DETAIL_RESPONSE", valueColumnName = "NEXT_VALUE", initialValue = 1, allocationSize = 1)
@GeneratedValue(strategy = GenerationType.TABLE, generator = "DetailResponseStore")
private Integer detailResponseID;


Solution 1:[1]

(Answering my own question)

The short and sweet answer is that, yes, Hibernate expects the developer to ensure that collections do not contain duplicates. The first suggested method is, as in the question, to store the collection in a Set.

This is an FAQ topic in the Hibernate documentation, quoted here:

Hibernate does not return distinct results for a query with outer join fetching enabled for a collection (even if I use the distinct keyword)?

First, you need to understand SQL and how OUTER JOINs work in SQL. If you do not fully understand and comprehend outer joins in SQL, do not continue reading this FAQ item but consult a SQL manual or tutorial. Otherwise you will not understand the following explanation and you will complain about this behavior on the Hibernate forum.

Typical examples that might return duplicate references of the same Order object:

List result = session.createCriteria(Order.class)  
                        .setFetchMode("lineItems", FetchMode.JOIN)  
                        .list();  



<class name="Order">  
    ...  
    <set name="lineItems" fetch="join">  
  
List result = session.createCriteria(Order.class)  
                        .list();  



List result = session.createQuery("select o from Order o left join fetch o.lineItems").list();  

All of these examples produce the same SQL statement:

SELECT o.*, l.* FROM ORDER o LEFT OUTER JOIN LINE_ITEMS l ON o.ID = l.ORDER_ID

Want to know why the duplicates are there? Look at the SQL resultset, Hibernate does not hide these duplicates on the left side of the outer joined result but returns all the duplicates of the driving table. If you have 5 orders in the database, and each order has 3 line items, the resultset will be 15 rows. The Java result list of these queries will have 15 elements, all of type Order. Only 5 Order instances will be created by Hibernate, but duplicates of the SQL resultset are preserved as duplicate references to these 5 instances. If you do not understand this last sentence, you need to read up on Java and the difference between an instance on the Java heap and a reference to such an instance.
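To make the instance-versus-reference point concrete, here is a minimal plain-Java sketch. It does not use Hibernate at all. The Order class and the resultList and distinctInstances helpers are hypothetical stand-ins that mimic the 5 orders with 3 line items each from the FAQ text, under the assumption that every joined row contributes one reference to the same per-order object.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Plain-Java illustration (no Hibernate): many list entries, few objects.
class ReferenceVsInstanceDemo {

    // Hypothetical stand-in for the FAQ's Order entity.
    static final class Order {
        final int id;
        Order(int id) { this.id = id; }
    }

    // One list entry per joined row; every row of an order reuses that order's instance.
    static List<Order> resultList(int orders, int lineItemsPerOrder) {
        List<Order> result = new ArrayList<>();
        for (int o = 1; o <= orders; o++) {
            Order order = new Order(o);
            for (int i = 0; i < lineItemsPerOrder; i++) {
                result.add(order);
            }
        }
        return result;
    }

    // Counts distinct objects by reference identity (==), not by equals().
    static int distinctInstances(List<Order> result) {
        Set<Order> seen = Collections.newSetFromMap(new IdentityHashMap<Order, Boolean>());
        seen.addAll(result);
        return seen.size();
    }

    public static void main(String[] args) {
        List<Order> result = resultList(5, 3);
        System.out.println(result.size());                  // 15 references in the list
        System.out.println(distinctInstances(result));      // 5 objects on the heap
        System.out.println(result.get(0) == result.get(1)); // true: same instance twice
    }
}
```

The list size tracks the number of joined rows, while the identity count tracks the number of distinct parents.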

(Why a left outer join? If you'd have an additional order with no line items, the result set would be 16 rows with NULL filling up the right side, where the line item data is for other order. You want orders even if they don't have line items, right? If not, use an inner join fetch in your HQL).

Hibernate does not filter out these duplicate references by default. Some people (not you) actually want this. How can you filter them out?

Like this:

Collection result = new LinkedHashSet( session.create*(...).list() );  

A LinkedHashSet filters out duplicate references (it's a set) and it preserves insertion order (order of elements in your result). That was too easy, so you can do it in many different and more difficult ways:
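As a rough sketch of the LinkedHashSet approach, the helper below (distinctInOrder is a made-up name, not a Hibernate API) removes repeats while keeping first-seen order. Strings stand in for the entity references a query would return. Note that strings are compared with equals(), whereas entities that do not override equals() and hashCode() are compared by reference.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class DistinctResultsDemo {

    // Hypothetical helper: drops repeated elements, keeps first-seen order.
    static <T> List<T> distinctInOrder(List<T> withDuplicates) {
        return new ArrayList<>(new LinkedHashSet<>(withDuplicates));
    }

    public static void main(String[] args) {
        // Each value is repeated as it would be in a joined result list.
        List<String> rows = Arrays.asList("order-2", "order-2", "order-1", "order-1", "order-3");
        System.out.println(distinctInOrder(rows)); // [order-2, order-1, order-3]
    }
}
```

A plain HashSet would also remove the repeats, but it makes no promise about iteration order, which matters if the query had an ORDER BY.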

Note: the issue in the FAQ is not exactly my issue. I am not running a query through a session; I am letting Hibernate populate a collection of child entities as a field on the parent. The child collection expands because Hibernate also joins on the children's own children, in order to populate a collection of grandchildren as a field on each child. The similarity is that, looking at the SQL Hibernate generates, I can see the big left join, and I glean from the FAQ that Hibernate does not bake in intelligence (such as looking at the ID primary key annotation on the entity) to filter out duplicates (some people want it that way, they say). The FAQ also says explicitly to use a Set to filter out duplicates at the code level.
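For the field-level case from the question, the effect of the declared type can be imitated in plain Java. This is only a simulation under the assumption that the child collection receives one add per joined row. It is not Hibernate's actual loading code, and Detail and populate are hypothetical names. The numbers match the question: 5 details, 2 responses each.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;

class FieldTypeDemo {

    // Hypothetical stand-in for the RequestDetail entity.
    static final class Detail {
        final int id;
        Detail(int id) { this.id = id; }
    }

    // Simulates filling the parent's child collection from joined rows:
    // each detail is added once per DetailResponse row it was joined with.
    static Collection<Detail> populate(Collection<Detail> target, int details, int responsesPerDetail) {
        for (int d = 1; d <= details; d++) {
            Detail detail = new Detail(d);
            for (int r = 0; r < responsesPerDetail; r++) {
                target.add(detail);
            }
        }
        return target;
    }

    public static void main(String[] args) {
        // List-like backing (bag semantics): every joined row shows up.
        System.out.println(populate(new ArrayList<Detail>(), 5, 2).size()); // 10
        // Set backing: repeated adds of the same instance are ignored.
        System.out.println(populate(new HashSet<Detail>(), 5, 2).size());   // 5
    }
}
```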

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 metadaddy