MySQL 5.5.8 in production – First impressions

As I recently noted, we upgraded our Wix.com production databases to the newly released MySQL 5.5.8 and moved it to Solaris 10.

Here is the first experience and notes:

  1. "mysqldump --single-transaction --flush-logs" no longer produces a consistent dump
    This is how I usually set up a new first slave. After I restored the first three databases, I tried to set up replication using the "--master-data=2" output from the dump file. Starting replication on the newly created slave immediately failed with error 1062; "show slave status\G" says: "1062 | Error 'Duplicate entry '...' for key 1' on query. Default database: '...'. Query: 'INSERT INTO ...". This never happened in previous versions, and I don't know yet exactly what happens there. A brief check of a few tables shows that all records are in place. Perhaps several transactions were flushed twice and a small portion of the transaction log was replayed on existing data. I carefully checked the new mysqldump documentation; there is no change in --single-transaction behavior. I made a post in the MySQL forum. The forum is somewhat sleepy, waiting for replies…
    Reading the mysqldump documentation revealed a few quite useful features added to mysqldump in version 5.5:

    • --dump-slave[=value] – This option is very useful for cloning an existing slave that replicates data from the same master. It causes mysqldump to include a "CHANGE MASTER" statement exactly as --master-data does, but the master info is that of the dumped slave's master. If the running MySQL slave cannot be stopped for a faster cloning procedure, this option can be very handy.
    • --apply-slave-statements – This one makes life easier in conjunction with --dump-slave when the resulting dump should be reapplied on an existing slave. It tells mysqldump to include STOP SLAVE prior to the CHANGE MASTER statement and START SLAVE at the end of the resulting dump.
  2. InnoDB Table Compression works even better than expected
    Compression seems to work quite smoothly and stably. Using 16K pages for a table containing large XML blobs helped reduce a 1TB database to approximately 250GB, which was expected. But the performance gain is pretty surprising. Look at the New Relic chart:


    Note the brown graph. This is the behavior of the most popular get-XML call. We switched the app server to the new compressed database at 3:00. Since then, fetching blobs from the database is much faster.
  3. Solaris packages still do not have DTrace probes enabled
    This is a bad surprise. One of my big expectations after Sun acquired MySQL was that Solaris support in MySQL would be significantly improved. I reported the lack of DTrace in 5.5.7-RC in the past. I was really hoping that this was just a matter of the release candidate build and that it would be solved when 5.5 went to the GA line. After 5.5.8 was deployed on our servers, we found that "dtrace -l | grep mysql" still comes back empty. I made another post about it in the MySQL forum. No replies so far.
    So while the "What Is New in MySQL 5.5" article announces "Enhanced Solaris Support", in practice it is not that enhanced. There is also good news regarding Solaris support: "innodb_flush_method = O_DIRECT" is now working (it caused MySQL to produce an error on startup in 5.5.7-RC).

The bottom line so far is that MySQL 5.5.8 is ready for production. This is a big jump for the MySQL community. We can say goodbye to the buggy 5.1 line and really enjoy the next generation of this amazing piece of software.

Making PTR records from BIND zone files

This will make PTR records out of a BIND zone file in one line:

cat domain_name.db | grep -e '^subdomain' | tr '\t' ' ' | tr -s ' ' | awk -F' ' '{print $4" IN PTR "$1".domain_name.com"}' | sed 's/\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\(.*\)/\4.\3.\2\5/g' >> db.172
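To see what the pipeline does, here is a dry run on a single hypothetical zone line ("host1" and the address are made up): the A record is flipped into a PTR line with the octets reversed and the leading octet dropped, matching the db.172 reverse zone.

```shell
# Hypothetical input line from domain_name.db
# (tab-separated fields: name, class, type, address)
printf 'host1\tIN\tA\t172.16.22.154\n' \
  | tr '\t' ' ' | tr -s ' ' \
  | awk -F' ' '{print $4" IN PTR "$1".domain_name.com"}' \
  | sed 's/\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\.\([0-9]*\)\(.*\)/\4.\3.\2\5/g'
# → 154.22.16 IN PTR host1.domain_name.com
```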


Don’t forget to raise the SOA serial and reload the zone.

Cloning MySQL slave to another server

Many times we need to clone an existing database for various purposes. Sometimes it should be a live replicated copy. When there is an existing slave that we can shut down, it is pretty easy. Just follow these instructions.

One of the most common mistakes is forgetting to set permissions on the copied MySQL files on the destination server. It is really hard to tell from the MySQL logs that the server doesn't start because the mysql user doesn't own the data directory.

But if every single thing in the above manual is done and you still see these errors in the new slave's MySQL log:

101220 16:22:48 [ERROR] Failed to open the relay log './old_server-relay-bin.000292' (relay_log_pos 1038154170)
101220 16:22:48 [ERROR] Could not find target log during relay log initialization
101220 16:22:48 [ERROR] Failed to initialize the master info structure

and replication doesn't start, most likely you changed my.cnf params WHILE THE NEW SLAVE WAS RUNNING. Please note: if you change my.cnf parameters according to the doc while MySQL is running and then do "/etc/init.d/mysql restart", it will not work.

Here is a quick procedure checklist:

  1. Change data directory ownership. "chown -R mysql:mysql /var/lib/mysql"
  2. Stop the new slave. Do "STOP SLAVE" in the mysql shell and shut down MySQL with "/etc/init.d/mysql stop"
  3. Edit /etc/my.cnf. The following options should be there:

    server_id = [unique integer]
    relay-log=[old_server_name]-relay-bin # old relay file basename
    relay-log-index=[old_server_name]-relay-bin # old relay file basename

  4. Start MySQL and do "START SLAVE"

Validate that replication is working again with "show slave status".
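As a quick sanity check, a tiny helper like this can grade the output (a sketch; the function name is ours, not a standard tool):

```shell
# Succeeds only when both replication threads report "Yes".
# Usage on the real slave: mysql -e 'SHOW SLAVE STATUS\G' | replication_ok
replication_ok() {
  [ "$(grep -cE 'Slave_(IO|SQL)_Running: Yes')" -eq 2 ]
}
```

On a healthy slave both Slave_IO_Running and Slave_SQL_Running show Yes, and Seconds_Behind_Master trends toward 0 as it catches up.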

This is it, everything should be fixed now.

Using InnoDB table compression in MySQL 5.5

We have a database that stores XML BLOBs. The database is big: it takes more than 900GB of disk space. A compressed mysqldump, in contrast, takes about 60GB. Since MySQL 5.5 went GA last Friday, it is quite obvious that this database is a good candidate for the new InnoDB data compression feature. We have been playing with MySQL 5.5 since Google released its version of MySQL and it was adopted and published by Sun as the 5.4 beta. I was excited by the new ability to compress table data in an InnoDB table, and I tried to enable it on one of our replicas. It was not that successful: MySQL segfaulted on importing a very simple data chunk into a compression-enabled table. Now that we are promised that 5.5 has reached production quality, I decided to try it once again on our XML BLOB database.

There are a few things we should know before we just do "ALTER TABLE blobs ROW_FORMAT=COMPRESSED;" on a database running on a MySQL 5.5.8 server. Table compression can be enabled only on tables stored in the new InnoDB file format called Barracuda. MySQL documentation is traditionally messed up and confusing. While this documentation entry says that "In MySQL 5.5.5 and higher, the default value is Barracuda", it is actually not true: a newly installed MySQL 5.5.8 server defaults to the old file format, "Antelope". This is noted in the new manual document: Barracuda was the default in versions >= 5.5.0 and <= 5.5.6, but >= 5.5.7 defaults back to the old "Antelope" file format. Needless to say, "This applies only for tables that have their own tablespace, so for it to have an effect, innodb_file_per_table must be enabled."

So after we install the new 5.5.8 GA on our server, the following lines must be present in the /etc/my.cnf file:

innodb_file_per_table = 1
innodb_file_format = Barracuda

Currently running settings can be validated this way:

mysql> show variables like 'innodb_file%';
+--------------------------+-----------+
| Variable_name            | Value     |
+--------------------------+-----------+
| innodb_file_format       | Barracuda |
| innodb_file_format_check | ON        |
| innodb_file_format_max   | Barracuda |
| innodb_file_per_table    | ON        |
+--------------------------+-----------+
4 rows in set (0.01 sec)

After the server is started with this config, we are able to create a compressed table with the following statement:

CREATE TABLE blobs (
id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
payload longtext,
dateupdated timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 DEFAULT CHARSET=utf8;

I specified a compressed page size of 16K, as the database is used for storing XML blobs. As noted in the MySQL documentation, "Using a 16K compressed page size can reduce storage and I/O costs for BLOB, VARCHAR or TEXT columns, because such data often compress well, and might therefore require fewer 'overflow' pages, even though the B-tree nodes themselves take as many pages as in the uncompressed form".
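For an existing table, the same options can be applied in place with the ALTER TABLE form mentioned earlier (a sketch; note that this rebuilds the table but, as discussed further below, does not shrink the already allocated tablespace):

```sql
-- Requires innodb_file_per_table and the Barracuda file format
ALTER TABLE blobs ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16;
```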

This is it, our blobs table is now compressed. It can be verified by:

mysql> show table status like 'blobs'\G
*************************** 1. row ***************************
Name: blobs
Engine: InnoDB
Version: 10
Row_format: Compressed
Rows: 4
Avg_row_length: 4096
Data_length: 16384
Max_data_length: 0
Index_length: 0
Data_free: 0
Auto_increment: 5
Create_time: 2010-12-18 17:56:57
Update_time: NULL
Check_time: NULL
Collation: utf8_general_ci
Checksum: NULL
Create_options: row_format=COMPRESSED KEY_BLOCK_SIZE=16
Comment:
1 row in set (0.01 sec)

Converting existing database using mysqldump

Instead of altering existing tables, which will not free up already allocated space, we can dump the existing database with mysqldump, change the CREATE TABLE statements inside the dump file and restore it on another server. After that we can set up replication and swap the databases when everything is in sync.

Dumping existing database:

mysqldump --single-transaction --flush-logs --master-data=2 -R blob_db | sed 's%ENGINE\=InnoDB%ENGINE\=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 %g' > blob_db_compressed.sql

In case we have an existing dump file, it can be converted this way (we keep it compressed):

gzcat blob_db.sql.gz | sed 's%ENGINE\=InnoDB%ENGINE\=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 %g' > blob_db_compressed.sql
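The sed substitution simply rewrites the table-options tail of every CREATE TABLE statement. A dry run on a made-up footer line shows the effect:

```shell
# Hypothetical CREATE TABLE tail line passed through the same sed
echo ") ENGINE=InnoDB DEFAULT CHARSET=utf8;" \
  | sed 's%ENGINE\=InnoDB%ENGINE\=InnoDB ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16 %g'
# The output line now carries ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=16
```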

On the destination server:

  1. Create database. Run mysql shell and execute:

    mysql> create database blob_db default charset=utf8;
    Query OK, 1 row affected (0.00 sec)

  2. Restore the database from the dump file

    mysql blob_db < blob_db_compressed.sql

  3. Set up replication using master info from the dump file. In the mysql shell do

    CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.014093', MASTER_LOG_POS=107, master_host='source_server', master_user='replication_user', master_password='';

Now, when "show slave status" indicates that the databases are in sync, we can switch the application server over to the newly migrated compressed database.

Getting Solaris 10 patches with smpatch, PCA and Oracle Support ID (CSI)

Short story:

Download PCA for systems registered with an Oracle CSI. This version of PCA (Patch Check Advanced) allows maintaining Solaris OS patches using an Oracle Solaris Premier Subscription and an Oracle CSI account instead of a SunSolve account with a Sun Contract.

Long Story:

After a quite successful evaluation of running MySQL on Solaris 10, we decided to move all our production database servers to Oracle Solaris. We purchased Oracle Solaris Premier Subscriptions for Non-Oracle x86 Systems (our servers are Dell PowerEdge R710). I registered my subscription with Oracle support and successfully installed Oracle Configuration Manager on the server. Patch analysis and recommendations didn't work for the server OS. Attempting to download the recommended patch cluster didn't work either: the Sun server where the actual patch file is located responded with 403 (Forbidden) and the "You are not entitled to retrieve this content." message. Trying to get to the OS patches via SunSolve finally showed the following picture: the Sun servers that host all Solaris patches don't know about the Oracle CSI (Customer Support ID), and the Oracle support system knows nothing about the Sun Contract Number, which is required for getting any Solaris patch other than public security patches.

My next thought was to get the system updated with the built-in smpatch utility. I figured that if Oracle is now packaging the Solaris 10 distribution, everything shipped with the OS should work. Naive me…

The server was installed with the minimal install of the Solaris OS, and smpatch is not installed as part of this choice. Installing just the SUNWmga package doesn't work, as it depends on a bunch of other packages that are not installed during a minimal install. To make the story short, you should do the following to get smpatch functional:

  1. Install smpatch related packages from Solaris 10 DVD:
    • Insert the DVD and find out its device:
      ls -al /dev/sr* | awk '{print "/" $11}'
    • Mount it with:
      mount -F hsfs -o ro /dev/<device name> /mnt/dvd (where <device name> is the one you got in the above step; make sure /mnt/dvd exists)
    • execute as root:
      # pkgadd -d /mnt/dvd/Solaris_10/Product/ SUNWupdatemgru SUNWupdatemgrr SUNWccccr SUNWccccrr SUNWccccfg SUNWcctpx SUNWccfw SUNWccsign SUNWbrg SUNWcsmauth SUNWscnprm SUNWscnsom SUNWsensor SUNWscn-base SUNWsam SUNWscnprmr SUNWcacaort SUNWscn-base-r SUNWsamr SUNWbrgr SUNWzoneu SUNWzoner SUNWpool SUNWxcu4 SUNWsensorr SUNWscnsomr SUNWjdmk-base
  2. Follow this excellent article by Kevin Pendleton about registering Solaris 10 and updating patches from the command line (CLI)

So now smpatch does something useful, other than throwing weird hundred-line Java exception stack traces. Frankly, I don't understand the Sun engineers who decided to develop such a simple and vital system utility in Java with dependencies on a bunch of other packages. The most annoying thing is that the program doesn't tell you "the SUNWjdmk-base package is missing, please install it". Instead, it throws weird unreadable Java errors. But the funniest thing is that if you are an Oracle customer and you don't have an existing Sun contract, you cannot do much with smpatch. smpatch also goes for patch files to the SunSolve servers, which know nothing about the fact that Oracle sells $1K-per-CPU support subscriptions for the operating system it is running on.

While I was digging the Internet for a solution for applying patches to my system, I found two very important things.

One of them is PCA (Patch Check Advanced) – an outstanding patching utility for Solaris written by Martin Paul. It is an extremely small Perl script that just does the work in the right way. This is exactly what this kind of OS utility should look like. I'm going to manage all my new Solaris systems with PCA. Once you get it working, you no longer need any of the heavy Java-based patch management software provided by Sun/Oracle.

Another very useful piece of information was an Oracle article explaining patch download automation for Sun products using wget. In fact, Oracle has migrated all Sun servers hosting Solaris patches to its own servers. Presumably, this is not yet integrated into the Oracle Support portal and the Solaris system utilities. I hope Oracle is not going to abandon Solaris 10 in favor of their new Solaris 11 Express release. So if you have a valid Oracle CSI, you can manually download and install required Solaris patches using this howto. I did it for the recommended patch cluster (10_x86_Recommended.zip).

And finally, when my system was up to date, the only thing I was missing was a patching tool for ongoing system maintenance. I looked into the PCA code; it was quite easy to change the base patch URLs to point at the new Oracle locations. As a result I've got a PCA that is fully functional and works with Oracle CSI credentials.

Download it: PCA for system registered with Oracle CSI

Update (December, 13)

I just received an email from Martin Paul saying that he has also prepared an updated version of PCA and it is going to be released any time soon. Please check his site for the update. Oracle completed the migration of the Sun servers over the last weekend. I will check the My Oracle Support site for patch availability and its integration with Oracle Configuration Manager.

Update (December, 14)

The Blastwave PCA package is still not updated with the latest PCA version. Download it manually from the PCA home page. Oracle Configuration Manager is not updated either. Patches should be downloaded manually, although with PCA it is much easier to do.

Moving MySQL from RHEL (CentOS) to Solaris

After a quite successful evaluation we have finally decided to move our production MySQL database servers to Solaris. Why Solaris? Well, there are a couple of good reasons for it.

First of all, a few words about the database part of the system architecture design. If your system is designed according to SOA principles, you probably have one (or more) relatively small databases for each service app. Let's assume that the term small database describes a database of up to 200-300GB overall size and up to 50 million rows in its largest table (no more than 1-2 such tables in a db). This is where MySQL performs quite well. If your database is much bigger and/or it cannot be segmented because of the application's nature, then perhaps MySQL is not the right choice.

The traffic pattern is also an important factor in system architecture design. In the most common, Web-oriented applications, the data access pattern usually doesn't differ much from a read/write ratio of 90/10. In practice, many systems that I have seen deal with 95% reads against 5% write requests. For example, if we were designing a simple Users Directory service, we would expect a lot of logins (reads) and a modest amount of new user registrations (writes).

Scalability. This is one of the most popular arguments in almost every technical discussion about the architecture of an application backed by a database engine. The truth is that the database scalability issue in most cases exists only for read requests. Read-only database traffic scales out quite well using the simple technique of separating read and write queries onto different servers using MySQL replication. With this approach, the application queries MySQL slave(s) when reading data, and all data modification statements go to the master. The main complexity in the software design is that the application code should distinguish between read and write database connections. And, of course, there is the overhead of setting up MySQL replication. The pros and cons of this modus operandi are beyond the scope of this discussion. The only fact that matters is that the master database instance has a much higher SLA than a slave. One more important requirement for the MySQL server is that, for the sake of maintainability of the DB servers, only the database(s) that belong to the application should exist on the master and slave instances of MySQL. This limitation comes up because MySQL defines replication per instance and not per database. This requirement restricts putting any additional (even micro-small) databases on the master server.

Hardware. Normally, we prefer cheap and small servers in our system. The standard server is a 1U Dell PowerEdge R410-like machine. We made an exception for database hosts and set them up on the bigger Dell PowerEdge R710 with 12 cores and 64GB of RAM. We equipped the database servers with redundant power supplies and remote access (DRAC) cards. Each database host runs multiple instances of MySQL. Currently, the master database is the only piece of the system that must not have any downtime and requires immediate attention in case of failure.

Platforms and versions. The system is 100% Linux. We love Debian. For the database server we made an exception again and installed CentOS on it. The reason for this is MySQL. The packages shipped with Debian are MySQL 5.0, which is as stable as Debian. But 5.0 lacks many features that modern web development demands. The newer 5.1 packages are not maintained by the Debian team. Respecting the well-known fact that MySQL 5.1 is a relatively buggy piece of software, we decided to stick with the official binary builds provided by Sun (Oracle now). Today, prebuilt packages are provided for the following Linux distros: RedHat, SUSE Enterprise and Generic Linux, the last being basically an archive that should be manually unpacked and placed in the right places. We chose CentOS as the most popular system binary-compatible with RedHat.

MySQL setup layout. We configured our CentOS server for running multiple instances of MySQL. Each instance has its own data directory and listens on its own TCP port and UNIX socket. To achieve this, customized versions of the init script (/etc/init.d/mysql.<instance_name>) and configuration file (/etc/my.<instance_name>.cnf), plus separate data and log directories, must be created for each instance.
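For illustration, a per-instance config could look roughly like this (the instance name "users", the port and the paths are made-up examples, not our actual layout):

```ini
# /etc/my.users.cnf – one file per instance
[mysqld]
port      = 3307
socket    = /var/lib/mysql.users/mysql.sock
datadir   = /var/lib/mysql.users
log-error = /var/log/mysql.users/error.log
```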

Having run the system for more than a year now, I can confirm that it works fairly well, with the following exceptions:

  • Maintenance is somewhat complicated. Configuring each instance separately is overhead.
  • Scripting, tracing and debugging require custom/parametrized scripts. Even the console mysql client requires connection parameters for the right instance. A funny and annoying fact is that the command history of the mysql client is shared between all instances. Sometimes it drives me crazy. 🙂
  • There is no way to isolate system resources. We needed it only once or twice in all this history, and I cannot say that this is a big problem. But having such a possibility would be very nice.
  • RedHat/CentOS software packages are very, very outdated. For example, running simple jobs and scripts written in Python (which we use a lot) often becomes a nightmare, as CentOS ships Python 2.4. Code developed for Python 2.5 and later has no chance to run there without massive refactoring.

So we looked at Solaris as the main platform for our database servers. MySQL packages are built by mysql.com, and both Solaris and MySQL now belong to the same company, so Solaris seems to be a good choice for the database server OS. In particular, we wanted to evaluate the features described below.

  1. Using Solaris Containers
    In the past I played with a few virtualization technologies, attempting to use them for running a MySQL database. VMware ESXi was a complete disaster for me: IO performance is horrible, and it simply kills MySQL. OpenVZ performs much better, as it uses a similar approach of operating-system-level virtualization and the performance penalty is minimal there. But it still locks us down to RedHat/CentOS. Solaris Containers, with their excellent management tools, allow running multiple instances of MySQL without notable performance loss. Containers are very flexible in everything related to resource management: almost every system aspect can be either shared between zones or reserved for each container. MySQL installation and management are greatly simplified. We have one single version of the my.cnf configuration file and init script, and the ports and directory layout are simply set to defaults.
  2. Using DTrace
    Tracing, monitoring and debugging MySQL is not a trivial task. DTrace is an awesome troubleshooting technology, and there is DTrace probe support in MySQL. It makes finding and solving server problems much easier. We just want it. Period.
  3. Using ZFS replication and snapshots for DB backup and HA
    Backing up MySQL is always a big headache. mysqldump is slow and sometimes not that reliable. There is no binary backup support out of the box. InnoDB Hot Backup is dead: it was renamed to MySQL Enterprise Backup and is now sold as part of the commercial editions of MySQL. Percona engineers have developed XtraBackup, which does a great job, but it still interferes with the MySQL engine and is quite slow. ZFS replication and snapshots work at a much lower level and are extremely fast. Restore is also almost immediate and reliable.

We have been running our new Solaris servers in production for more than two weeks now. Everything has gone smoothly so far. We will share our experience each time we have something new to tell about our MySQL life on Solaris.