How to do an efficient contains search across two columns?
I have a DriverEntity that has a private property, DriverInfoEntity, which holds the driver's first name and last name.
Sample code:

```java
public class DriverEntity {
    @JsonManagedReference(value = "driver-info")
    @OneToOne(mappedBy = "driver", cascade = CascadeType.ALL)
    private DriverInfoEntity driverInfo;

    // ... a few other private attributes
}

public class DriverInfoEntity {
    @Column(name = "FIRST_NAME")
    private String firstName;

    @Column(name = "MIDDLE_NAME")
    private String middleName;

    @Column(name = "LAST_NAME")
    private String lastName;

    // ... a few other private attributes
}
```
And I am doing a contains search on the firstName and lastName columns using a specification builder. The idea is to search by the provided driverName, which may be only firstName, firstName + ' ' + lastName, or only lastName.
The logic is:
```java
public Specification<DriverEntity> buildSpecification(Specification<DriverEntity> specification) {
    if (driverSearchRequestFilterDTO.getDriverName() != null) {
        List<Object> referenceList = new ArrayList<>();
        referenceList.add(DriverEntity_.driverInfo);
        specification = specification.and(specificationBuilder
                .getSpecificationForConcatField(referenceList, DriverInfoEntity_.firstName,
                        DriverInfoEntity_.lastName, driverSearchRequestFilterDTO.getDriverName()));
    }
    return specification;
}
```
```java
public static Predicate buildConcatPredicateWithJoin(Root root, Join joinCondition,
        CriteriaBuilder criteriaBuilder, SingularAttribute firstColumn,
        SingularAttribute secondColumn, String inputQuery) {
    // Root and Join both extend From, so a single code path covers both cases.
    From from = Objects.nonNull(root) ? root : joinCondition;
    // Note: the columns are lower-cased but the query is not, so callers are
    // expected to pass inputQuery already in lower case.
    String pattern = inputQuery.replaceAll("\\*", "%");

    Expression<String> first = criteriaBuilder.lower(from.get(firstColumn));
    Expression<String> second = criteriaBuilder.lower(from.get(secondColumn));

    Predicate firstColumnLike = criteriaBuilder.like(first, pattern);
    Predicate secondColumnLike = criteriaBuilder.like(second, pattern);

    // combination of firstColumn + " " + secondColumn
    Expression<String> firstSecond =
            criteriaBuilder.concat(criteriaBuilder.concat(first, " "), second);
    Predicate firstSecondColumnLike = criteriaBuilder.like(firstSecond, pattern);

    // combination of secondColumn + " " + firstColumn
    Expression<String> secondFirst =
            criteriaBuilder.concat(criteriaBuilder.concat(second, " "), first);
    Predicate secondFirstColumnLike = criteriaBuilder.like(secondFirst, pattern);

    return criteriaBuilder.or(firstColumnLike, secondColumnLike,
            firstSecondColumnLike, secondFirstColumnLike);
}
```
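For clarity, the OR of the four predicates built above can be simulated in plain Java. The sketch below is illustrative only (the class and method names are mine, not part of the original code): it converts the LIKE pattern to a regex and applies the same four checks; the query is lower-cased here, which in the original code is the caller's responsibility.

```java
import java.util.regex.Pattern;

public class NameMatchSim {
    // Converts a LIKE pattern ('%' wildcard) into a regex and tests it
    // against the lower-cased value, mirroring lower(col) LIKE pattern.
    private static boolean like(String value, String pattern) {
        String regex = Pattern.quote(pattern).replace("%", "\\E.*\\Q");
        return value.toLowerCase().matches(regex);
    }

    // Mirrors the OR of the four predicates in buildConcatPredicateWithJoin:
    // first, last, "first last", and "last first".
    public static boolean matches(String first, String last, String query) {
        String p = query.toLowerCase().replaceAll("\\*", "%");
        return like(first, p) || like(last, p)
                || like(first + " " + last, p) || like(last + " " + first, p);
    }
}
```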
And the main SQL query that gets generated:

The whole code works fine, but it has a major impact on performance when we have lots of data, since the backend processing takes a long time for a huge data set. I would appreciate some guidance on making this process quick enough to filter a huge set of drivers efficiently.
Info regarding the front end: the UI has an auto-complete feature that shows only 20 records at a time if the search is successful, while the records-per-page logic is implemented at the backend. That is why I am using LIMIT in the SQL query.
As you can see, it shows the results in the drop-down.
Solution 1:
If the two name columns have a case-insensitive collation, you don't need the LOWER() calls. Then the WHERE clause might be as simple as

```sql
WHERE first_name = ? OR last_name = ?
```

If you are planning to allow wildcards, then

```sql
WHERE first_name LIKE ? OR last_name LIKE ?
```
What the heck is the CONCAT() for?
However, my suggestion is mostly a simplification, it still requires checking every row. That is because of the OR.
About the only way to get rid of the OR is to use a UNION, but that gets messy:
```sql
( SELECT ...
    FROM ...
    JOIN ... ON ...
    WHERE first_name LIKE ?
    LIMIT ? )
UNION DISTINCT
( SELECT ...
    FROM ...
    JOIN ... ON ...
    WHERE last_name LIKE ?
    LIMIT ? )
LIMIT ?
```

plus

```sql
INDEX(first_name),
INDEX(last_name)
```
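On the JPA side, the Criteria API has no UNION operator, so one common workaround is to run the two indexed queries separately (each with its own `setMaxResults`) and merge the results in memory. The de-duplication and final limit can be modeled as in this sketch (class and parameter names are illustrative; the two input lists stand in for the two fetched result sets):

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class UnionMerge {
    // Models "... UNION DISTINCT ... LIMIT n" applied to two already-fetched
    // result lists: LinkedHashSet removes duplicates while preserving order.
    public static List<String> unionDistinctLimit(List<String> byFirstName,
                                                  List<String> byLastName, int limit) {
        Set<String> merged = new LinkedHashSet<>(byFirstName);
        merged.addAll(byLastName);
        List<String> result = new ArrayList<>(merged);
        return result.subList(0, Math.min(limit, result.size()));
    }
}
```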
Then there are other issues...

- A `LIMIT` without an `ORDER BY` does not make much sense.
- If you will also have an `OFFSET`, it gets messier.
- If the user provides a leading wildcard on the `LIKE` string, then it will still have to do a table scan; in fact, two scans.
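One way to guard against the leading-wildcard table scan from the application side is to normalize the user's input into a prefix-only pattern before binding it. A small sketch (the helper name is mine, not from the original code):

```java
public class LikePatternUtil {
    // Turns raw user input into a prefix-only LIKE pattern: lower-cases it,
    // escapes LIKE metacharacters so user text is matched literally,
    // and appends a single trailing '%' so indexes stay usable.
    public static String toPrefixPattern(String input) {
        String cleaned = input.trim().toLowerCase();
        cleaned = cleaned.replace("\\", "\\\\")
                         .replace("%", "\\%")
                         .replace("_", "\\_");
        return cleaned + "%";
    }
}
```

Bound this way, the pattern never starts with a wildcard, so `WHERE first_name LIKE ?` can use the index on `first_name`.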
A FULLTEXT index could be used, but it has limitations, and you would need to combine the two columns into a single column. That way one name would match either first or last very efficiently. But the user would have to enter the full first and/or last name, not just an initial.
Auto-complete
Since the purpose is an auto-complete UI, I will assume there is no leading wildcard on the LIKE. But if you still need to match the entry against either the first or the last name, then the UNION is necessary.
Have the JavaScript avoid asking for names until the user has entered at least two letters. This will limit the returned list to under 1% of the total (usually). I suspect that currently you are asking for a list after the first letter, and it is looking at 10% of the names.
Without the indexes and union, up to 100% of the table is read each time. My suggestions above limit it to "up to 1%", thereby making things 100x faster. (Well, for various reasons, do not count on a full 100x.)
You must decide what to do about ordering of the subqueries of the Union and of the union as a whole. Otherwise, there will be some surprising lists coming back. The user experience will be sub-par.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow