MySQL Optimization - A Little Tip

Published in Programming and Scripts on Wednesday, August 8th, 2007

I see the topic of MySQL optimization come up frequently in discussions. Understandably so, given that it doesn't take that much data for a poorly optimized query to become the bottleneck in an otherwise well-thought-out process.

Some simple suggestions

While I don't consider myself an expert, I do have experience working with larger datasets, and there are a couple of things that I always do to keep queries performing well.

Optimize Queries with EXPLAIN

EXPLAIN is your friend; get to know it well. If you take the time to read through the EXPLAIN documentation on the MySQL site, you will find some valuable information, some of which is highlighted below.
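Using it is as simple as prefixing a query with the keyword. A minimal, hypothetical example (the users table and its email column are just placeholders):

EXPLAIN SELECT * FROM users WHERE email = 'bob@example.com';

-- The columns to watch in the output:
-- table:         the table this row of the plan refers to
-- type:          the access type (ALL = full table scan, ref/eq_ref = indexed lookup)
-- possible_keys: the indexes MySQL could use
-- key:           the index MySQL actually chose
-- rows:          an estimate of how many rows MySQL must examine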

Optimizing joins

Single sweep what?

MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.

Why is this important? Imagine a main table - tableA - with 80,000 rows of data. This table has a corresponding n:n mapping table that links entries in tableA to entries in a locations table. A query could be written as:

SELECT tableA.*, locations.location FROM tableA
LEFT JOIN tableA2locations ON
    tableA2locations.tableA_id = tableA.id
LEFT JOIN locations ON
    tableA2locations.location_id = locations.id
WHERE locations.location = 'sometown'
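For reference, here is a minimal sketch of the schema this example assumes; the column names come from the queries in this post, but the types are illustrative guesses:

CREATE TABLE tableA (
    id INT AUTO_INCREMENT PRIMARY KEY,
    foo VARCHAR(50)
    -- ...other columns...
);

CREATE TABLE locations (
    id INT AUTO_INCREMENT PRIMARY KEY,
    location VARCHAR(100)
);

-- the n:n mapping table
CREATE TABLE tableA2locations (
    tableA_id INT NOT NULL,
    location_id INT NOT NULL
);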

Keeping the above quote in mind, MySQL will read a row from the first table, join the corresponding data from the joined tables for that row, and then sweep through the rest of the data, joining as it goes along.

This leads us into the following section.

Number of rows needed to execute a query

You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query.
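For example, on an unindexed three-table join the relevant part of the EXPLAIN output might look something like this (the figures are invented for illustration):

-- table              type   key    rows
-- tableA             ALL    NULL   1000
-- tableA2locations   ALL    NULL   1000
-- locations          ALL    NULL   1000
--
-- estimated rows examined: 1000 * 1000 * 1000 = 1,000,000,000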

From the above, you can determine that for a query on tables that have not been properly indexed, a join can quickly become unwieldy even when dealing with just three tables with records in the thousands (1000 * 1000 * 1000 = a billion rows examined, in other words a slow query). See HackMySQL for a good example of this.
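Indexing the join and filter columns is what keeps that product down. A sketch, assuming the schema above (the index names are arbitrary):

-- index the mapping table's foreign keys so each join becomes an indexed lookup
CREATE INDEX idx_a2l_tableA_id   ON tableA2locations (tableA_id);
CREATE INDEX idx_a2l_location_id ON tableA2locations (location_id);

-- index the column used in the WHERE clause
CREATE INDEX idx_locations_location ON locations (location);

-- re-run EXPLAIN afterwards: type should move from ALL towards ref/eq_ref,
-- and the rows estimates should drop accordingly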

Reducing the number of rows needed to execute a query

Even when you have indexed properly for your joins, you can still end up with a query that runs in a way that causes a bottleneck.

Taking our example from above, imagine that we use a WHERE clause that limits the tableA selection to half the table (tableA.foo = 'bar' below):

SELECT tableA.*, locations.location FROM tableA
LEFT JOIN tableA2locations ON
    tableA2locations.tableA_id = tableA.id
LEFT JOIN locations ON
    tableA2locations.location_id = locations.id
WHERE locations.location = 'sometown' AND tableA.foo = 'bar'

This starts us out with 40,000 rows of tableA data to examine. Bring a further 2,000 rows from tableA2locations into play and, by the product rule above, that's 80,000,000 row combinations in the worst case. Significant, to say the least, and with a fourth or fifth table in the join, things could get ugly. What to do? The answer may be obvious to some: select first from the most limiting table:

SELECT tableA.*, locations.location FROM locations
LEFT JOIN tableA2locations ON
    tableA2locations.location_id = locations.id
LEFT JOIN tableA ON
    tableA2locations.tableA_id = tableA.id
WHERE locations.location = 'sometown' AND tableA.foo = 'bar'

This starts us out with one row selected from the locations table, then 2,000 from tableA2locations. If the join between tableA2locations and tableA is indexed correctly, we are then left with an indexed join on the ID, rather than having to initially select 40,000 rows from tableA as in the previous example.
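A related note: MySQL's optimizer will usually try to pick a sensible join order on its own, but if you want to force it to use exactly the order you wrote, MySQL offers the STRAIGHT_JOIN modifier. A sketch against the same tables (inner joins are used here, since the WHERE clause filters out the unmatched rows a LEFT JOIN would keep anyway):

SELECT STRAIGHT_JOIN tableA.*, locations.location FROM locations
JOIN tableA2locations ON
    tableA2locations.location_id = locations.id
JOIN tableA ON
    tableA2locations.tableA_id = tableA.id
WHERE locations.location = 'sometown' AND tableA.foo = 'bar'

Use it sparingly, though: forcing the join order means the optimizer can no longer pick a better one as your data changes.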

When I first started programming, it made sense to me to select from the main table (tableA) and join the lookups. But once you add some data to the mix and start to play with EXPLAIN, you quickly realize that selecting from the limiting table can make your server's life a little easier.

For further reading on the topic, I always send people to HackMySQL when they ask, so for more tips and tricks, be sure to have a read through the optimize section of that site.

Comments and Feedback

Makes sense, but I never thought about it that way.
Thanks!

I'm glad it made sense, Bill. It's a bit abstract without good hard data behind the examples...

For me this is pretty obvious. To get the best performance you start with the table that needs to scan the fewest rows, to make the joins smarter. Other databases like MS SQL Server do that automatically.

However, I don't do that myself. The reason is that it is not very intuitive to start with the table that will yield the fewest scanned rows. You generally start with a master table, or whichever table makes the relations easiest to think about.

I would only go back and change the structure of my SQL after I get the right results.

Thanks for the tip - I've also never thought about it that way.

Just using this, I reduced a query time from 34.2s to 0.087s (5 joins).

Which tells you I had it all wrong first time round ;)

Thanks for letting me know that Barry! Well done, glad it helped :-)

Thanks for the tips. ^_^

I believe everybody should learn EXPLAIN and try to optimize every query they write for better and faster results.

Very nice tips, thanks. I have solved my structure issues.

Thanks for sharing, really useful.

