'MySQL join query taking far too long
The following query is taking forever in MySQL 5.7.23, on a macOS 10.13.6 on 2018 Macbook Pro (A1990) with 50GB of free space:
INSERT INTO `a_sg1lib` (`book_id`,`title`, `isbn`, `Zstatus_retrieve_TOC`, `Zstatus_retrieve_classifybyoclc`,
`classifybyoclc_respcode`, `classifybyoclc_calln_lcc`, `classifybyoclc_calln_ddc`, `classifybyoclc_calln_nlm`,
`classifybyoclc_fast`, `classifybyoclc_owi`,`reference`, `data`, `crcomment`, `groupname`, `code`, `indentation`, `path`, `specialtag`,
`specialtag_cat`, `specialtag_master`, `specialtag_master_cat`, `specialtag_override_cat_moddetails`,
`specialtag_override_cat_timestamp`, `specialtag_override_cat`, `pageno`, `specialtag_escalated`, `history`,
`history_classification`, `specialtag_esc_start`, `specialtag_esc_start_type`, `callCore`,
`d1`, `d2`, `d3`, `d4`, `d5`, `d6`, `d7`, `d8`, `d9`, `d10`, `d11`, `d12`)
SELECT `RU_sg1lib_classifybyoclc`.`ID`,`Title`, `IdentifierWODash`, `Zstatus_retrieve_TOC`, `Zstatus_retrieve_classifybyoclc`, `classifybyoclc_respcode`, `classifybyoclc_calln_lcc`,
`classifybyoclc_calln_ddc`, `classifybyoclc_calln_nlm`,
`classifybyoclc_fast`, `classifybyoclc_owi`,`reference`, `data`,
`crcomment`, `groupname`, `code`, `indentation`, `path`, `specialtag`, `specialtag_cat`,
`specialtag_master`, `specialtag_master_cat`, `specialtag_override_cat_moddetails`,
`specialtag_override_cat_timestamp`, `specialtag_override_cat`, `pageno`, `specialtag_escalated`, `history`,
`history_classification`, `specialtag_esc_start`, `specialtag_esc_start_type`, `callCore`,
`d1`, `d2`, `d3`, `d4`, `d5`, `d6`, `d7`, `d8`, `d9`, `d10`, `d11`, `d12`
FROM `RU_sg1lib_classifybyoclc`
RIGHT JOIN `loc_classification`.`LOC_Classification_Text_zFULL_YYY`
ON `RU_sg1lib_classifybyoclc`.`classifybyoclc_calln_lcc`=
`loc_classification`.`LOC_Classification_Text_zFULL_YYY`.`callCore`;
Number of rows:
RU_sg1lib_classifybyoclc - 1+ million rows - MyISAM
LOC_Classification_Text_zFULL_YYY - 440000+ rows - InnoDB
a_sg1lib - InnoDB, initially empty
Schema:
CREATE TABLE `RU_sg1lib_classifybyoclc` (
`ID` int(15) unsigned NOT NULL AUTO_INCREMENT,
`Title` varchar(2000) DEFAULT '',
`VolumeInfo` varchar(100) DEFAULT '',
`Series` varchar(300) DEFAULT '',
`Periodical` varchar(200) DEFAULT '',
`Author` varchar(1000) DEFAULT '',
`Year` varchar(14) DEFAULT '',
`Edition` varchar(60) DEFAULT '',
`Publisher` varchar(400) DEFAULT '',
`City` varchar(100) DEFAULT '',
`Pages` varchar(100) DEFAULT '',
`PagesInFile` int(10) unsigned NOT NULL DEFAULT '0',
`Language` varchar(150) DEFAULT '',
`Topic` varchar(500) DEFAULT '',
`Library` varchar(50) DEFAULT '',
`Issue` varchar(100) DEFAULT '',
`Identifier` varchar(300) DEFAULT '',
`ISSN` varchar(9) DEFAULT '',
`ASIN` varchar(200) DEFAULT '',
`UDC` varchar(200) DEFAULT '',
`LBC` varchar(200) DEFAULT '',
`DDC` varchar(45) DEFAULT '',
`LCC` varchar(45) DEFAULT '',
`Doi` varchar(45) DEFAULT '',
`Googlebookid` varchar(45) DEFAULT '',
`OpenLibraryID` varchar(200) DEFAULT '',
`Commentary` varchar(10000) DEFAULT '',
`DPI` int(6) unsigned DEFAULT '0',
`Color` varchar(1) DEFAULT '',
`Cleaned` varchar(1) DEFAULT '',
`Orientation` varchar(1) DEFAULT '',
`Paginated` varchar(1) DEFAULT '',
`Scanned` varchar(1) DEFAULT '',
`Bookmarked` varchar(1) DEFAULT '',
`Searchable` varchar(1) DEFAULT '',
`Filesize` bigint(20) unsigned NOT NULL DEFAULT '0',
`Extension` varchar(50) DEFAULT '',
`MD5` char(32) DEFAULT '',
`Generic` char(32) DEFAULT '',
`Visible` char(3) DEFAULT '',
`Locator` varchar(733) DEFAULT '',
`Local` int(10) unsigned DEFAULT '0',
`TimeAdded` timestamp NOT NULL DEFAULT '2000-01-01 13:00:00',
`TimeLastModified` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
`Coverurl` varchar(200) DEFAULT '',
`Tags` varchar(500) DEFAULT '',
`IdentifierWODash` varchar(300) DEFAULT '',
`Zstatus_retrieve_TOC` varchar(255) DEFAULT NULL,
`Zstatus_retrieve_classifybyoclc` varchar(255) DEFAULT NULL,
`classifybyoclc_respcode` text,
`classifybyoclc_calln_lcc` varchar(256) DEFAULT NULL,
`classifybyoclc_calln_ddc` text,
`classifybyoclc_calln_nlm` text,
`classifybyoclc_fast` text,
`classifybyoclc_owi` text,
`classifybyoclc_timest` text,
`pid` text,
`ida` int(11) DEFAULT NULL,
PRIMARY KEY (`ID`),
UNIQUE KEY `MD5` (`MD5`),
KEY `Generic` (`Generic`) USING BTREE,
KEY `VisibleTimeAdded` (`Visible`,`TimeAdded`) USING BTREE,
KEY `TimeAdded` (`TimeAdded`) USING BTREE,
KEY `Topic` (`Topic`(3)) USING BTREE,
KEY `VisibleID` (`Visible`,`ID`) USING BTREE,
KEY `VisibleTimeLastModified` (`Visible`,`TimeLastModified`,`ID`) USING BTREE,
KEY `TimeLastModifiedID` (`TimeLastModified`,`ID`) USING BTREE,
KEY `DOI_INDEX` (`Doi`) USING BTREE,
KEY `Identifier` (`Identifier`),
KEY `classifybyoclc_calln_lcc` (`classifybyoclc_calln_lcc`),
KEY `classifybyoclc_fast` (`classifybyoclc_fast`(300)),
FULLTEXT KEY `Title` (`Title`),
FULLTEXT KEY `Author` (`Author`),
FULLTEXT KEY `Language` (`Language`),
FULLTEXT KEY `Extension` (`Extension`),
FULLTEXT KEY `Publisher` (`Publisher`),
FULLTEXT KEY `Series` (`Series`),
FULLTEXT KEY `Year` (`Year`),
FULLTEXT KEY `Title1` (`Title`,`Author`,`Series`,`Publisher`,`Year`,`Periodical`,`VolumeInfo`),
FULLTEXT KEY `Tags` (`Tags`),
FULLTEXT KEY `Identifierfulltext` (`IdentifierWODash`)
) ENGINE=InnoDB AUTO_INCREMENT=3246566 DEFAULT CHARSET=utf8
CREATE TABLE `LOC_Classification_Text_zFULL_YYY` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`reference` longtext COLLATE utf8mb4_unicode_ci,
`data` longtext COLLATE utf8mb4_unicode_ci,
`crcomment` longtext COLLATE utf8mb4_unicode_ci,
`groupname` longtext COLLATE utf8mb4_unicode_ci,
`code` longtext COLLATE utf8mb4_unicode_ci,
`indentation` int(10) unsigned DEFAULT NULL,
`path` longtext COLLATE utf8mb4_unicode_ci,
`specialtag` longtext COLLATE utf8mb4_unicode_ci,
`specialtag_cat` mediumtext COLLATE utf8mb4_unicode_ci,
`specialtag_master` mediumtext COLLATE utf8mb4_unicode_ci,
`specialtag_master_cat` mediumtext COLLATE utf8mb4_unicode_ci,
`specialtag_override_cat_moddetails` mediumtext COLLATE utf8mb4_unicode_ci,
`specialtag_override_cat_timestamp` mediumtext COLLATE utf8mb4_unicode_ci,
`specialtag_override_cat` mediumtext COLLATE utf8mb4_unicode_ci,
`pageno` longtext COLLATE utf8mb4_unicode_ci,
`specialtag_escalated` longtext COLLATE utf8mb4_unicode_ci,
`history` longtext COLLATE utf8mb4_unicode_ci,
`history_classification` longtext COLLATE utf8mb4_unicode_ci,
`specialtag_esc_start` longtext COLLATE utf8mb4_unicode_ci,
`specialtag_esc_start_type` longtext COLLATE utf8mb4_unicode_ci,
`callCore` varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
`d1` longtext COLLATE utf8mb4_unicode_ci,
`d2` longtext COLLATE utf8mb4_unicode_ci,
`d3` longtext COLLATE utf8mb4_unicode_ci,
`d4` longtext COLLATE utf8mb4_unicode_ci,
`d5` longtext COLLATE utf8mb4_unicode_ci,
`d6` longtext COLLATE utf8mb4_unicode_ci,
`d7` longtext COLLATE utf8mb4_unicode_ci,
`d8` longtext COLLATE utf8mb4_unicode_ci,
`d9` longtext COLLATE utf8mb4_unicode_ci,
`d10` longtext COLLATE utf8mb4_unicode_ci,
`d11` longtext COLLATE utf8mb4_unicode_ci,
`d12` longtext COLLATE utf8mb4_unicode_ci,
PRIMARY KEY (`id`),
KEY `indentation` (`indentation`),
KEY `callCore` (`callCore`),
FULLTEXT KEY `specialtag` (`specialtag`)
) ENGINE=InnoDB AUTO_INCREMENT=496218 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci |
| a_sg1lib | CREATE TABLE `a_sg1lib` (
`ida` int(11) DEFAULT '0',
`classifybyoclc_respcode` text CHARACTER SET utf8mb4,
`classifybyoclc_timest` text CHARACTER SET utf8mb4,
`isbn` text CHARACTER SET utf8mb4,
`choseStatus` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
`choseStatus2` varchar(1000) CHARACTER SET utf8mb4 DEFAULT NULL,
`choseStatusSpecial` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
`choseStatusMarker` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
`title` text CHARACTER SET utf8mb4,
`subtitle` text CHARACTER SET utf8mb4,
`publisher_name` text CHARACTER SET utf8mb4,
`imprint_name` text CHARACTER SET utf8mb4,
`publication_date` text CHARACTER SET utf8mb4,
`edition_number` text CHARACTER SET utf8mb4,
`authors` text CHARACTER SET utf8mb4,
`editors` text CHARACTER SET utf8mb4,
`others` text CHARACTER SET utf8mb4,
`contributors` text CHARACTER SET utf8mb4,
`is_activated` text CHARACTER SET utf8mb4,
`public_url` text CHARACTER SET utf8mb4,
`date_added` text CHARACTER SET utf8mb4,
`format` text CHARACTER SET utf8mb4,
`rating` text CHARACTER SET utf8mb4,
`engagement_score` text CHARACTER SET utf8mb4,
`university_list_count` text CHARACTER SET utf8mb4,
`published_list_count` text CHARACTER SET utf8mb4,
`award_count` text CHARACTER SET utf8mb4,
`mobile_disabled` text CHARACTER SET utf8mb4,
`categories` text CHARACTER SET utf8mb4,
`year` text CHARACTER SET utf8mb4,
`subjects` text CHARACTER SET utf8mb4,
`topics` text CHARACTER SET utf8mb4,
`topics_facet_filter` text CHARACTER SET utf8mb4,
`topics_detailed` text CHARACTER SET utf8mb4,
`main_subject` text CHARACTER SET utf8mb4,
`main_topic` text CHARACTER SET utf8mb4,
`keywords` text CHARACTER SET utf8mb4,
`description` text CHARACTER SET utf8mb4,
`language_id` text CHARACTER SET utf8mb4,
`language` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
`sales_rights` text CHARACTER SET utf8mb4,
`cover_image` text CHARACTER SET utf8mb4,
`related_isbns` text CHARACTER SET utf8mb4,
`file_size` text CHARACTER SET utf8mb4,
`mobile_disabled_v3` text CHARACTER SET utf8mb4,
`is_restricted` text CHARACTER SET utf8mb4,
`organisation_list` text CHARACTER SET utf8mb4,
`objectID` text CHARACTER SET utf8mb4,
`_highlightResult` text CHARACTER SET utf8mb4,
`chapters` text CHARACTER SET utf8mb4,
`Zstatus_retrieve_TOC` text CHARACTER SET utf8mb4,
`Zstatus_retrieve_classifybyoclc` text CHARACTER SET utf8mb4,
`pid` text CHARACTER SET utf8mb4,
`ida2` int(11) DEFAULT NULL,
`withdrawal_date` text CHARACTER SET utf8mb4,
`book_id` int(13) DEFAULT NULL,
`classifybyoclc_calln_lcc` varchar(256) CHARACTER SET utf8mb4 DEFAULT NULL,
`classifybyoclc_calln_ddc` text CHARACTER SET utf8mb4,
`classifybyoclc_calln_nlm` text CHARACTER SET utf8mb4,
`classifybyoclc_fast` varchar(960) CHARACTER SET utf8mb4 DEFAULT NULL,
`classifybyoclc_owi` text CHARACTER SET utf8mb4,
`classifybyoclc_fast_max` varchar(512) CHARACTER SET utf8mb4 DEFAULT NULL,
`id` int(10) unsigned DEFAULT '0',
`reference` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`data` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`crcomment` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`groupname` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`code` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`indentation` int(10) unsigned DEFAULT NULL,
`path` varchar(1000) DEFAULT NULL,
`specialtag` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_master` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_master_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_override_cat_moddetails` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_override_cat_timestamp` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_override_cat` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`pageno` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_escalated` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`history` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`history_classification` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_esc_start` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`specialtag_esc_start_type` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`callCore` varchar(256) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
`d1` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d2` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d3` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d4` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d5` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d6` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d7` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d8` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d9` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d10` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d11` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`d12` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
`idmaster` int(11) NOT NULL AUTO_INCREMENT,
PRIMARY KEY (`idmaster`),
KEY `choseStatus` (`choseStatus`),
KEY `classifybyoclc_calln_lcc` (`classifybyoclc_calln_lcc`),
KEY `book_id` (`book_id`),
KEY `path` (`path`),
KEY `choseStatusSpecial` (`choseStatusSpecial`),
KEY `choseStatusMarker` (`choseStatusMarker`),
KEY `choseStatus2` (`choseStatus2`(256)) USING BTREE,
FULLTEXT KEY `title` (`title`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 |
I thought what makes the query so slow is the fact that one of the tables is MyISAM, and tried to change that, but even that takes forever. Altering the columns involved in the join from TEXT to VARCHAR also takes forever ;(;(. Basically I am stuck ;(
May I know what is wrong, and how can I make this faster?
Solution 1:[1]
SHOW CREATE TABLE is more descriptive than DESCRIBE. In particular, I need to see if the indexes are composite and, if so, what order the columns are in.
RU_sg1lib_classifybyoclc: INDEX(classifybyoclc_calln_lcc)
If it turns out that that is really a JOIN instead of a RIGHT JOIN, then this might be useful:
LOC_Classification_Text_zFULL_YYY: INDEX(callCore)
I suspect that most of the columns are declared larger than necessary. Shrinking them may help some. For example, the 8-byte BIGINT is usually bigger than will ever be needed. year and some ids are unnecessarily (and inefficiently) TEXT.
Solution 2:[2]
forgodsakehold, We should talk if this is not clear. View my profile for contact info, please.
Observation 1,
RU_sg1lib_classifybyoclc.ID appears to be INT(15) unsigned
SELECTED for push into a_sg11lib table as book_id
and
a_sg1lib.book_id appears to be simple INT(13) (NOT unsigned and diff lengths)
most often we see same column attributes when moving data.
Observation 2, RIGHT JOIN of calln_lcc = to callCore
RU_sg1lib_classifbyoclc.classifybyoclc_calln_lcc is simple varchar(256)
and
LOC_Classification_Text_zFULL_YYY.callcore is varchar(256) COLLATE utf8mb4_unicode_ci
most often we see matched column definitions for left and right side of =.
Solution 3:[3]
What's going on here to make this slow? A few things.
You have big tables and no WHERE clause, so this statement handles a lot of data. It's never going to be sub-minute fast.
Your tables have many CLOB (mediumtext, longtext) columns. The way MySQL stores those columns makes retrieving and storing them surprisingly expensive. Your query insists on retrieving them all and storing them again.
You correctly identified the mix of MyISAM and InnoDB tables as a possible performance problem. But, your table definitions show InnoDB tables, so you may have successfully converted one of them. That's good.
You're joining on this:
ON `RU_sg1lib_classifybyoclc`.`classifybyoclc_calln_lcc` = `loc_classification`.`LOC_Classification_Text_zFULL_YYY`.`callCore`;It's very important to performance that the columns in your ON-clauses have precisely the same definition. The declaration of
classifybyoclc_calln_lccinRU_sg1lib_classifybyoclclacks both its own COLLATE clause and a default COLLATE for the table, so its collation is unknown. It is the database's default, which we don't know.varchar(256) CHARACTER SET utf8mb4 DEFAULT NULLThe declaration of
callCoreinLOC_Classification_Text_zFULL_YYYis this.varchar(256) COLLATE utf8mb4_unicode_ci DEFAULT NULL,If your database default collation is also
utf8mb4_unicode_ciyou should be in good shape. Otherwise alter the first table to specify the collation forclassifybyoclc_calln_lccexplicitly.It's possible you could store those OCLC call numbers using
CHARACTER SET latin1 COLLATE latin1_bin. They may not need to be case-insensitive and they may not contain characters outside the ASCII and western European character sets. If your call numbers work like that, you might alter your tables to use the more efficient collation. Latin-1 text takes a quarter as many bytes to store in indexes as utfmb4 text. If you need case-insensitive searching on the call numbers, useCOLLATE latin1_general_ci.RIGHT JOINis just plain weird. Almost everybody usesLEFT JOIN, reversing the order of your two tables. Do you really need an outer join like that? If you don't need an outer join, use an ordinaryINNERjoin.
What can you do about this? You didn't tell us why you need that a_sg1lib table. If you're using it with queries containing WHERE clauses, to find just a few rows at a time, you could create a VIEW rather than a TABLE. Each lookup will then use the indexes on your existing tables, and you won't have to copy the data.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Rick James |
| Solution 2 | Wilson Hauck |
| Solution 3 | O. Jones |
