Using InnoDB table compression in MySQL 5.5

We have a database that stores XML BLOBs. The database is big: it takes more than 900GB of disk space. A compressed mysqldump, in contrast, takes about 60GB. Since MySQL 5.5 went GA last Friday, this database is an obvious candidate for the new InnoDB data compression feature. We have been playing with MySQL 5.5 since Google released its version of MySQL and Sun adopted and published it as the 5.4 beta. I was excited by the new ability to compress data in an InnoDB table and tried to enable it on one of our replicas. It was not that successful: MySQL segfaulted while importing a very simple data chunk into a compression-enabled table. Now that we are promised 5.5 has reached production quality, I decided to try it once again on our XML BLOB database.

There are a few things we should know before we just run "alter table blobs ROW_FORMAT=COMPRESSED;" on a MySQL 5.5.8 server. Table compression can be enabled only on tables stored in the new InnoDB file format called Barracuda. MySQL documentation is traditionally messed up and confusing. While this documentation entry says that "In MySQL 5.5.5 and higher, the default value is 'Barracuda'", that is not actually true: a newly installed MySQL 5.5.8 server defaults to the old file format, called "Antelope". This is noted in the new manual document, where you can see that Barracuda was the default in versions >= 5.5.0 and <= 5.5.6, but versions >= 5.5.7 default to the old "Antelope" file format. Needless to say, "This applies only for tables that have their own tablespace, so for it to have an effect, innodb_file_per_table must be enabled."

So after installing the new 5.5.8 GA on our server, the following lines must be present in the [mysqld] section of the /etc/my.cnf file:

innodb_file_per_table = 1
innodb_file_format = Barracuda

The currently running settings can be validated this way:

mysql> show variables like 'innodb_file%';
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| innodb_file_format       | Barracuda |
| innodb_file_format_check | ON        |
| innodb_file_format_max   | Barracuda |
| innodb_file_per_table    | ON        |
+--------------------------+-----------+
4 rows in set (0.01 sec)

After the server is started with this config, we can create a compressed table with the following statement:

CREATE TABLE blobs (
id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
payload longtext,
dateupdated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 DEFAULT CHARSET=utf8;

I specified a compressed page size of 16K because the database is used for storing XML blobs. As noted in the MySQL documentation, "Using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR or TEXT columns, because such data often compress well, and might therefore require fewer 'overflow' pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form".

This is it, our blobs table is now compressed. It can be verified by:

mysql> show table status like 'blobs'\G
*************************** 1. row ***************************
Name: blobs
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 4
Avg_row_length: 4096
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: 5
Create_time: 2010-12-18 17:56:57
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=16
Comment:
1 row in set (0.01 sec)

Converting existing database using mysqldump

Altering the existing tables will not free up already allocated space. Instead, we can dump the existing database with mysqldump, change the CREATE TABLE statements inside the dump file, and restore it on another server. After that we can set up replication and swap the databases when everything is in sync.

Dumping the existing database:

mysqldump --single-transaction --flush-logs --master-data=2 -R blob_db | sed 's%ENGINE=InnoDB%ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 %g' > blob_db_compressed.sql

If we already have a dump file, it can be converted this way (the source dump stays gzipped):

gzcat blob_db.sql.gz | sed 's%ENGINE=InnoDB%ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 %g' > blob_db_compressed.sql
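One thing to watch with a global substitution is that it also rewrites any row data that happens to contain the text ENGINE=InnoDB. A minimal sketch of a stricter variant follows. It assumes mysqldump writes the closing line of each CREATE TABLE statement starting with ") ENGINE=", and the function name add_compression and the sample lines are made up for illustration:

```shell
# Hypothetical helper: add compression options only on lines that look like
# the closing line of a CREATE TABLE statement in a dump (assumed layout).
add_compression() {
  sed '/^) ENGINE=InnoDB/ s%ENGINE=InnoDB%ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16%'
}

# Illustrative input: a table definition tail, a MyISAM table and a data row.
printf '%s\n' \
  ') ENGINE=InnoDB AUTO_INCREMENT=5 DEFAULT CHARSET=utf8;' \
  ') ENGINE=MyISAM DEFAULT CHARSET=utf8;' \
  "INSERT INTO blobs VALUES (1,'ENGINE=InnoDB');" | add_compression
```

Only the first sample line should come out changed. The function reads stdin and writes stdout, so it can stand in for the sed call in either pipeline above.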

On the destination server:

  1. Create database. Run mysql shell and execute:

    mysql> create database blob_db default charset=utf8;
    Query OK, 1 row affected (0.00 sec)

  2. Restore the database from the dump file

    mysql blob_db < blob_db_compressed.sql

  3. Set up replication using the master info from the dump file. In the mysql shell, run the statement below and then start replication with START SLAVE:

    CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.014093', MASTER_LOG_POS=107, master_host='source_server', master_user='replication_user', master_password='';
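The binary log coordinates for step 3 come from the dump itself: with --master-data=2, mysqldump writes them near the top of the file as a commented-out CHANGE MASTER TO statement. A small sketch for pulling that line out is below. The function name master_coordinates is hypothetical, and the sample input simply reuses the coordinates from this post:

```shell
# Hypothetical helper: print the first commented-out CHANGE MASTER TO
# statement found on stdin, without the leading "-- ", then stop reading.
master_coordinates() {
  sed -n '/^-- CHANGE MASTER TO/{s/^-- //p;q;}'
}

# Illustrative input shaped like the head of a --master-data=2 dump.
printf '%s\n' \
  '-- MySQL dump' \
  "-- CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.014093', MASTER_LOG_POS=107;" \
  'CREATE TABLE blobs (' | master_coordinates
```

Against the real file this would be run as master_coordinates < blob_db_compressed.sql. The host, user and password still have to be added by hand, as in the statement above.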

Now, once "show slave status" indicates that the databases are in sync, we can switch the application server over to the newly migrated compressed database.
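The "in sync" check can be scripted rather than eyeballed. The sketch below is only one possible definition of in sync: it assumes that both replication threads reporting Yes and Seconds_Behind_Master being 0 is good enough for the switchover. The function name replica_in_sync and the sample input are made up for illustration:

```shell
# Hypothetical helper: read "SHOW SLAVE STATUS\G" output on stdin and exit 0
# only if both replication threads are running and the reported lag is 0.
replica_in_sync() {
  awk -F': ' '
    { gsub(/^[ \t]+/, "", $1) }            # drop the left padding of the field name
    $1 == "Slave_IO_Running"      { io  = $2 }
    $1 == "Slave_SQL_Running"     { sql = $2 }
    $1 == "Seconds_Behind_Master" { lag = $2 }
    END { exit !(io == "Yes" && sql == "Yes" && lag == "0") }
  '
}

# Illustrative input containing only the three fields the check looks at.
printf '%s\n' \
  '     Slave_IO_Running: Yes' \
  '    Slave_SQL_Running: Yes' \
  'Seconds_Behind_Master: 0' | replica_in_sync && echo "replica is in sync"
```

On the destination server the same function could be fed from the client, for example mysql -e 'SHOW SLAVE STATUS\G' | replica_in_sync. A lag of 0 at a single moment is a weak signal, so checking it a few times before switching is probably wise.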