Session held at LDC Toronto May 2012.
Presented by Tim Taplin, AccessTelluride
MySQL is the database server of choice for many website and web application developers. Whether we came from the realm of Filemaker development, or from other languages and open source solutions which leverage MySQL in a variety of ways, MySQL is simple to deploy, comes pre-installed on many platforms and out of the box has excellent performance.
There are a number of simple, free or low cost tools that allow us to build the database structure that we need for our solutions. Frameworks create database communication abstractions that help us to be more isolated from that schema and control the query construction.
As developers, we can start with a simple idea and continue to add more functionality, build out the data base structure, add more queries and hopefully our success will lead to larger audiences and heavier traffic for our sites and applications.
When the server begins to overload and things start bogging down or crashing, we start looking at new hardware (Expensive), web server configuration (Tedious), languages or frameworks (Expensive labor and Tedious) and database configuration (Tedious) to get us thru the pinch.
Based on the experiences I’ve had over the last few years working with clients in the Lasso community and elsewhere to perform those optimizations, I’ve found that going over a few basic modifications in MySQL can yield impressive gains in performance and concurrency. I’m going to go over three areas of interest that will be useful to anyone who would like their site/application to be faster and handle more users.
I’m going to approach them as remedial modifications but once you’ve started to think about them, they will inform your decisions even in early phases of development.
We’ll look over the basic MySQL configuration settings so that you have a basic starting point as well as some thoughts on how to maintain them as your solution grows and changes. We’ll review schema and query usage to glean performance gains. We’ll look into monitoring performance directly both via management tools and via log files.
Hopefully a better understanding of our database server of choice will allow us to better maintain them as well as write better, more efficient code.
MySQL out of the box has basic configuration settings that are significantly outdated. We should review a few rules of thumb in beginning to tweak the settings. We need to know whether the database server is dedicated to mysql or if it is part of a complete stack of webserver, middleware and database server. We can almost certainly assume that we’ll increase most memory settings. Lets target a few specific settings.
The key buffer holds the indexes of tables in memory and a bigger key buffer results in faster row lookups. Bigger is better, but prevent swapping at all costs. A good rule of thumb seems to be to use 1/4 of available system memory.
Temp tables are used for sorting and grouping. Since this is created on the fly, don’t set it too high, but know that if this buffer cannot contain the data, a temp file will be created on disk with related performance drag.
To determine what you should set 'thread_cache_size' to, pay close attention to the 'threads_created' status variable. If it keeps going up it means your 'thread_cache_size' is set too low. Just keep bumping up 'thread_cache_size' until 'threads_created' no longer increments. This could be anywhere from 16 to 500 depending on the type of load your system handles. The thread cache should not be higher than the max_connections value.
Number of tables that a thread can keep open at a time. Ideally, this should be able to handle all tables your server hosts, but typically doesnt need to be higher than 512.
Defaults to being turned off, yet can have a significant impact on both basic performance and overall concurrency.
Turn on and allow for as much cache as your applications will use given the limitations of your system.
A valuable tool in monitoring the results of your changes is MySQL Workbench, which can not only show current statistics as well as graphing real time performance.
Another area that can bring great performance results is indexing.
Properly utilized indexing can have a significant positive impact on the performance of your queries. However, we should consider what and how we index so that our queries can take advantage of the gains.
One thing to keep in mind is that mysql chooses which indexes to use for each query and we can find out how a query will be executed using the explain statement. With any select statement you can find the execution plan by prepending Explain to the beginning of the statement and the result set will describe the execution plan.
For each table utilized in the query, there will be an entry in the result set. Each line will have the select_type, table, type, possible keys, key used, key length, type of reference, number of rows inspected and additional information.
The most useful pieces of this data are the possible keys, key used and number of rows inspected. Possible keys will tell you what keys were available in that table and key used will tell which was selected. Number of rows inspected tells us how many rows had to be inspected in order to build this result set.
An important feature of indexing in mysql is the use of multi-column indexes. This means that we use the values of two or more columns to build a unique key. The key can then be used to limit the number of rows that actually have to be inspected directly.
Multiple column indexes can be an extremely powerful tool, but because they bring more overhead on the insert side, we dont want to just make them up without testing that they can be used.
Our friend, the explain statement will tell us what was used, how efficient the change was and whether temporary tables or sort files had to be built to create our result set.
Lets examine a sample statement.
SELECT SQL_CALC_FOUND_ROWS * FROM latitude_db.class_start WHERE (`category` LIKE '36%' AND `form_done` LIKE 'yes%' AND (`status` LIKE 'Charge Card Approved%' OR `status` LIKE 'Free Ad Approved%' OR `status` LIKE 'Live on Website%') AND `created` >= '2009-02-19' ) ORDER BY `date_display` DESC LIMIT 0,100
The class_start table has no indexes other than the primary and this query takes 25ms on average. The query is built by Lasso 8.6 standard inline behavior.
Adding an index on the category column brings substantial improvement, as we can see in the explain statement, before we were scanning all 5409 rows, but now the index allows for mysql to scan just 419 rows before returning 65 rows of results in 7.5 to 10ms.
Lets make it a multiple column index to see if we get further improvements. Create an index on category and form_done which now brings the rows scanned down to 389 with not a huge potential for performance gains, we get a little tighter average closer to the 7-8ms range.
This query has potential to be improved itself, as well as potential for improvements to the schema of the table, but we got a better than 50% increase in performance thru indexing.
So, how can we find the low hanging fruit of opportunities to gain performance? Our next friend is the slow query log and the log_queries_not_using_indexes setting. If we turn on this feature and then monitor the slow query log, we can spot the queries which are not using indexes and determine which have the greatest potential for gains.
Normally, I’m looking for queries with large numbers of rows to scan which are not using indexes.
In one example, here is a query that scans almost 10k rows repeatedly returning 1-10 total rows as a resultset. At the beginning, it takes 50-70 ms to execute even though simple single column indexes are in place and one of them is being used.
SELECT * FROM `refers` AS r WHERE 1 AND `r`.`ref_po_id` = "269" AND `r`.`ref_vh_id` = "436";
Single column index on ref_po_id is used but has low cardinality (16) for the amount of data to be examined.
Indexing the two columns in the where statement yields significant improvement, with results in 1-5ms
The server can now handle 10x more of these requests. This particular query may be called a dozen times or more on a single summary calculation page, yielding a significant improvement in response time.
The following query took 708 ms to search 14k rows of data
select SQL_CALC_FOUND_ROWS listing_id, primaryphoto, display_name, publicdescription, listingoffice, listprice, streetnumber, streetname, streetsuffix, subdivisionname, city, stateorprovince, bedstotal, bathstotal, propertytype from flex_listings where mlsstatus = "active" and listingoffice like "%Nevasca%" order by listprice desc limit 0, 20
After we added an index on mlsstatus the time reduced to 25-35ms
Upgrading to a multi-column index on mlsstatus and listingoffice reduces that to 12-20ms
Even queries like “select distinct city from flex_listings” can use an index. Indexing the city column reduced the number of rows to scan from over 1500 to 233 in the 14k row db with related performance increases.
To summarize this discussion, MySQL can be a powerful tool but also can be a significant blind spot in our development if we don’t pay attention to it. With proper care and feeding, it can be both precise and surprisingly quick. The tips and pointers that I’ve made today should be only the beginning of the journey that will allow you to better manage your application’s data.
Author: Tim Taplin
Created: 26 May 2012
Last Modified: 15 Jun 2012
Please note that periodically LassoSoft will go through the notes and may incorporate information from them into the documentation. Any submission here gives LassoSoft a non-exclusive license and will be made available in various formats to the Lasso community.