Wednesday, May 25, 2011

InnoDB error : Total number of locks exceeds the lock table size

Recently, one of the clients that I'm working with had a problem with one of their replication slaves where an sql statement gave an error while executing on the slave which got executed successfully on the Master server.

The slave stopped with the error as below.

mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: reporting.com
Master_User: replication
Master_Port: 3306
Connect_Retry: 10
Master_Log_File: mysql3308-bin-log.000488
Read_Master_Log_Pos: 340658086
Relay_Log_File: mysql3319-relay-log.000156
Relay_Log_Pos: 4983506
Relay_Master_Log_File: mysql3308-bin-log.000462
Slave_IO_Running: Yes
Slave_SQL_Running: No
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 1206
Last_Error: Error 'The total number of locks exceeds the lock table size' on query. Default database: 'production'. Query: 'create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1'
Skip_Counter: 0
Exec_Master_Log_Pos: 4983353
Relay_Log_Space: 14212286599
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: NULL
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 1206
Last_SQL_Error: Error 'The total number of locks exceeds the lock table size' on query. Default database: 'production'. Query: 'create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1'
1 row in set (0.00 sec)


I have started the slave and monitored the query running status using "show engine innodb status\G". According to the EXPLAIN plan of the query, it is estimated to scan and lock about 910 million rows. The below InnoDB status showed that it already locked 677 million rows and the slave got stopped again with the same error.

This is an error specific to InnoDB tables. The query mentioned above attempts to set locks on all records scanned, to ensure the data returned doesn’t change before creation of the new table. These locks are put into an area of memory allocated within innodb_buffer_pool_size variable.




---TRANSACTION 5722EE5D, ACTIVE 18100 sec, process no 13052, OS thread id 1222195536 fetching rows
mysql tables in use 2, locked 2
2653776 lock struct(s), heap size 327759856, 677089582 row lock(s)
MySQL thread id 2860, query id 1863366 Copying to tmp table on disk
create table Summary SELECT Dimension.calendarYearMonth, count(*) as EmailsSent, count(distinct Email_Sent.UserID) as "Unique Recipients"
FROM Dimension, Email_Sent
WHERE
Email_Sent.DateID = Dimension.id
AND Dimension.date >= '2010-11-01'
GROUP BY 1
TABLE LOCK table `production`.`Email_Sent` trx id 5722EE5D lock mode IS
RECORD LOCKS space id 5623 page no 11 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
TABLE LOCK table `production`.`Date_Dimension` trx id 5722EE5D lock mode IS
RECORD LOCKS space id 5611 page no 34 n bits 240 index `PRIMARY` of table `production`.`Date_Dimension` trx id 5722EE5D lock mode S locks rec but not gap
RECORD LOCKS space id 5623 page no 12 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 15 n bits 272 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 18 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 21 n bits 224 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 29 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
RECORD LOCKS space id 5623 page no 31 n bits 440 index `ClientID` of table `production`.`Email_Sent` trx id 5722EE5D lock mode S
TOO LOCKS PRINTED FOR THIS TRX: SUPPRESSING FURTHER PRINTS


We found the main difference in MySQL configuration between Master and Slave was amount of memory allocated to InnoDB_buffer_pool_size because Slave was running several MySQL instances that were used for backup purposes.

Need to increase the memory of innodb_buffer_pool_size variable where the locks table can fit in the configuration file and restart the database should fix the problem.

InnoDB uses buffer pool to cache data and indexes of its tables, but it also uses it to store system information, like locks table. The query set locks on all the records it scans. If we have a small buffer pool size, then the InnoDB lock table may indeed grow so big that it does not fit in the buffer pool. Locks table size for the 677 million rows was 327M according to the innodb status. So, we have increased the innodb_buffer_pool_size from 500MB to 1GB which was enough to hold all the locks and restarted the database server. Now the SQL query completed successfully on the slave.


More details on the Innodb row level locks and the size it takes can be found here.