Published in Programming and Scripts on Wednesday, August 8th, 2007
I see the topic of MySQL optimization come up frequently in discussions. Understandably so: it doesn't take much data for a poorly optimized query to become the bottleneck in an otherwise well-thought-out process.
While I don't consider myself an expert, I do have experience working with larger datasets, and there are a couple of things I always do to keep queries performing well.
Explain is your friend, so get to know it well. If you take the time to read through the
Explain documentation on the MySQL site, you will find some valuable information, some of which is highlighted below.
MySQL resolves all joins using a single-sweep multi-join method. This means that MySQL reads a row from the first table, and then finds a matching row in the second table, the third table, and so on. When all tables are processed, MySQL outputs the selected columns and backtracks through the table list until a table is found for which there are more matching rows. The next row is read from this table and the process continues with the next table.
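The single-sweep behavior described above is easy to picture as a set of nested loops. Here is a rough Python sketch of the idea (the table data and column names are invented for illustration; this is not MySQL's actual implementation):

```python
# Rough sketch of a single-sweep (nested-loop) multi-join over three
# hypothetical tables represented as lists of dicts.
def nested_loop_join(t1, t2, t3):
    results = []
    for a in t1:                      # read a row from the first table
        for b in t2:                  # find matching rows in the second
            if b["a_id"] == a["id"]:
                for c in t3:          # then the third, and so on
                    if c["b_id"] == b["id"]:
                        results.append((a, b, c))
    return results

t1 = [{"id": 1}, {"id": 2}]
t2 = [{"id": 10, "a_id": 1}, {"id": 11, "a_id": 2}]
t3 = [{"id": 100, "b_id": 10}]
print(nested_loop_join(t1, t2, t3))
```

Note how the inner tables get rescanned for every row of the outer table; indexes exist precisely to avoid those inner scans.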
Why is this important? Imagine a main table - tableA - with 80,000 rows of data. This table has a corresponding n:n (many-to-many) table that maps entries in tableA to entries in a locations table. A query could be written as:
SELECT tableA.*, locations.location
FROM tableA
LEFT JOIN tableA2locations ON tableA2locations.tableA_id = tableA.id
LEFT JOIN locations ON tableA2locations.location_id = locations.id
WHERE locations.location = 'sometown'
Keeping the above quote in mind, MySQL will read a row from the first table, join the corresponding data from the joined tables for that row, and then sweep through the rest of the data, joining as it goes along.
This leads us into the following section.
You can get a good indication of how good a join is by taking the product of the values in the rows column of the EXPLAIN output. This should tell you roughly how many rows MySQL must examine to execute the query.
From the above, you can determine that for tables that have not been properly indexed, a join can quickly become unwieldy even when dealing with just three tables with records in the thousands (1000 * 1000 * 1000 = a billion rows to examine = a slow query). See HackMySQL for a good example of this.
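To make that arithmetic concrete, here is a quick Python sketch of the row-product rule; the `rows` values are hypothetical EXPLAIN estimates, not output from a real server:

```python
from math import prod

# Hypothetical `rows` estimates from EXPLAIN, one per joined table.
unindexed = [1000, 1000, 1000]  # full scan of each inner table per outer row
indexed = [1000, 1, 1]          # indexed lookups on the join columns

print(prod(unindexed))  # a billion rows to examine: a slow query
print(prod(indexed))    # a thousand: barely noticeable
```

The same data, the same query shape, and a factor of a million difference in work, which is why checking the rows column is usually the first thing to do.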
Even with proper indexes on your join columns, you can still end up with a query that runs in a way that causes a bottleneck.
Taking our example from above, imagine that we add a where clause that limits the tableA selection to half (tableA.foo = 'bar' below):
SELECT tableA.*, locations.location
FROM tableA
LEFT JOIN tableA2locations ON tableA2locations.tableA_id = tableA.id
LEFT JOIN locations ON tableA2locations.location_id = locations.id
WHERE locations.location = 'sometown'
  AND tableA.foo = 'bar'
This starts us out with 40,000 rows of tableA data to examine. If there are a further 2,000 rows in tableA2locations, that's 80,000,000 row combinations to examine. That's a significant amount of work for one query, and if this were a 3- or 4-table join, things could get ugly. What to do? The answer may be obvious to some: select first from the most limiting table:
SELECT tableA.*, locations.location
FROM locations
LEFT JOIN tableA2locations ON tableA2locations.location_id = locations.id
LEFT JOIN tableA ON tableA2locations.tableA_id = tableA.id
WHERE locations.location = 'sometown'
  AND tableA.foo = 'bar'
This starts us out with a single matching row from the locations table, then 2,000 rows from tableA2locations. If the join between tableA2locations and tableA is indexed correctly, we are then left with an indexed lookup by id, rather than having to initially select 40,000 rows from tableA as in the previous example.
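A back-of-the-envelope comparison of the two join orders, using the hypothetical row counts from the example above:

```python
# Hypothetical sizes from the example: 40,000 tableA rows after the
# foo = 'bar' filter, 2,000 mapping rows, 1 matching location.
tablea_rows = 40_000
mapping_rows = 2_000
matching_locations = 1

# Order 1: start from tableA, scan the mapping table for each row.
start_from_tablea = tablea_rows * mapping_rows

# Order 2: start from the one matching location, scan the mapping table
# once, then fetch each tableA row by indexed id (~1 row per lookup).
start_from_locations = matching_locations * mapping_rows * 1

print(start_from_tablea)     # 80000000
print(start_from_locations)  # 2000
```

Same result set, four orders of magnitude less work, purely from the order the tables are visited in.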
When I first started programming, it made sense to me to select from the main table (tableA) and join the lookups. But once you add some data to the mix and start to play with
Explain, you quickly realize that selecting from the limiting table can make your server's life a little easier.
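If you want to play with this without a MySQL server handy, here is a self-contained sketch of the example schema using Python's built-in sqlite3 module (SQLite standing in for MySQL; the data is invented, and SQLite's query planner and EXPLAIN output differ from MySQL's):

```python
import sqlite3

# Minimal runnable version of the article's schema, with indexes on the
# mapping table's join columns so the lookups are id-based, not scans.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE tableA (id INTEGER PRIMARY KEY, foo TEXT);
CREATE TABLE locations (id INTEGER PRIMARY KEY, location TEXT);
CREATE TABLE tableA2locations (tableA_id INTEGER, location_id INTEGER);
CREATE INDEX idx_map_location ON tableA2locations (location_id);
CREATE INDEX idx_map_tableA   ON tableA2locations (tableA_id);
INSERT INTO tableA VALUES (1, 'bar'), (2, 'baz');
INSERT INTO locations VALUES (1, 'sometown');
INSERT INTO tableA2locations VALUES (1, 1), (2, 1);
""")

# The "select from the most limiting table first" form of the query.
rows = db.execute("""
SELECT tableA.*, locations.location
FROM locations
LEFT JOIN tableA2locations ON tableA2locations.location_id = locations.id
LEFT JOIN tableA ON tableA2locations.tableA_id = tableA.id
WHERE locations.location = 'sometown' AND tableA.foo = 'bar'
""").fetchall()
print(rows)  # [(1, 'bar', 'sometown')]
```

Prefix the SELECT with EXPLAIN QUERY PLAN in SQLite (or plain EXPLAIN in MySQL) to watch how the planner treats each table.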
For further reading on the topic, I always send people to HackMySQL when they ask, so for more tips and tricks, be sure to have a read through the optimize section of that site.