Kevin van Zonneveld

On Development and Internet System Engineering

Improve MySQL Insert Performance

| Comments

Sometimes MySQL needs to work hard. I’ve been working on an import script that fires a lot of INSERTs. Normally our database server handles 1,000 inserts / sec. That wasn’t enough. So I went looking for methods to improve the speed of MySQL inserts and was finally able to increase this number to 28,000 inserts per second. Checkout my late night benchmarking adventures.

I’m going to show you the result of 3 approaches that I tried to boost the speed of ‘bulk’ queries:

  • Delayed Insert
  • Transaction
  • Load Data

This article focusses on the InnoDB storage engine.

Delayed Insert

MySQL has an INSERT DELAYED feature. Despite the name this is actually meant to speedup your queries ; ) And from what I understand it does a very good job.

Unfortunately it only works with MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables.

That rules out my favorite storage engine of the moment: InnoDB.

So where to turn?

Transaction

A Transaction basically combines multiple queries in 1 ‘package’. If 1 query in this package fails: you can ‘cancel’ all the queries within that package also.

So that provides additional integrity to your relational data because if record A could not be deleted but depends on record B which could be deleted, you have a broken dependency in your database and that corruption could have easily been avoided using a Transaction.

Let me show you how easy a transaction really is in basic PHP/SQL terms:

1
2
3
4
<?php
mysql_query("START TRANSACTION");
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>
1
2
3
<?php
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>
1
2
3
<?php
mysql_query("INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')");
?>
1
2
3
<?php
mysql_query("COMMIT"); // Or "ROLLBACK" if you changed your mind
?>

OK moving on : )

Transaction performance - The Theory

I showed you the integrity gain. That’s reason enough to ‘go Transactional’ right now. But as an added bonus, Transactions could also be used for performance gain. How?

  • Normally your database table gets re-indexed after every insert. That’s some heavy lifting for you database.

But when your queries are wrapped inside a Transaction, the table does not get re-indexed until after this entire bulk is processed. Saving a lot of work.

Bulk processing will be the key to performance gain.

Bench results

So far the theory. Now let’s benchmark this. What does it gain us in queries per second (qps) terms:

...0&chs=570x250&chxt=x%2Cy%2Ct%2Cr

As you can see

  • I was not able to put this theory into practice and get good results.
  • There is some overhead in the Transaction which actually causes a performance to drop for bulks with less than 50 queries.

I tried some other forms of transaction (showed in a graph below) but none of them really hit the jackpot.

OK so Transactions are good to protect your data, and in theory can have performance gain, but I was unable to produce that.

Clearly this wasn’t the performance boost I was hoping for.

Moving on.

Load Data - The Mother Load

MySQL has a very powerful way of processing bulks of data called LOAD DATA INFILE. The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed.

Bench results

In the following graph I tried to inserts different sized bulks of inserts using different methods. I recorded & calculated in how much time each query could be executed. I use the total time necessary for the entire operation, and divide that by the number of queries. So what you see is really what you get.

OK enough with these so-called facts ; ) Back the the excitement :D

At 10,000 records I was able to get a performance gain of 2,124.09%

...0&chs=570x250&chxt=x%2Cy%2Ct%2Cr

As you can see

  • Where the Transaction method had maximum throughput of 1,588 inserts per second, Load Data allowed MySQL to process process a staggering 28,108 inserts per second.
  • There is no siginifcant overhead in Load Data. e.g. you can use this with 2 queries per bulk and still have a performance increase of 153%.
  • There is a saturation point around bulks of 10,000 inserts. After this point the queries per second rate (qps) didn’t show an increase anymore.
  • My advice would be to start a new bulk every 1,000 inserts. It’s what I concider to be the sweetspot because it keeps buffers small and you will still benefit from a performance gain of 2027.13%.

The next step will make your buffer 1000% bigger and it will only give you an additional performance gain of 4%.

So if you have a heavy-duty MySQL job that currently takes 1 hour to run, this approach could make it run within 3 minutes! Enjoy the remaining 57 minutes of your hour! :D

Load Data Quirks

Of course there is a price to pay for this performance win. Before the data is loaded, The data-file must be:

This is probably not something you want to be bothered with. So why not create a PHP function that handles these quirks for us?

Wrapping this up in a PHP Function

Let’s save this logic inside a function so we can easily reuse it to our benefit.

We’ll name this function mysqlBulk and use it like this:

  • Collect our queries or data in an array (the bulk).
  • Feed that array along with the table name to the mysqlBulk function
  • Have it return the qps for easy benchmarking. Or false on failure.

Source (still working on this, will be updated regularly):

(mysqlBulk.inc.php) download
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
<?php
/**
 * Executes multiple queries in a 'bulk' to achieve better
 * performance and integrity.
 *
 * @param array  $data   An array of queries. Except for loaddata methods. Those require a 2 dimensional array.
 * @param string $table
 * @param string $method
 * @param array  $options
 *
 * @return float
 */
function mysqlBulk(&$data, $table, $method = 'transaction', $options = array()) {
  // Default options
  if (!isset($options['query_handler'])) {
      $options['query_handler'] = 'mysql_query';
  }
  if (!isset($options['trigger_errors'])) {
      $options['trigger_errors'] = true;
  }
  if (!isset($options['trigger_notices'])) {
      $options['trigger_notices'] = true;
  }
  if (!isset($options['eat_away'])) {
      $options['eat_away'] = false;
  }
  if (!isset($options['in_file'])) {
      // AppArmor may prevent MySQL to read this file.
      // Remember to check /etc/apparmor.d/usr.sbin.mysqld
      $options['in_file'] = '/dev/shm/infile.txt';
  }
  if (!isset($options['link_identifier'])) {
      $options['link_identifier'] = null;
  }

  // Make options local
  extract($options);

  // Validation
  if (!is_array($data)) {
      if ($trigger_errors) {
          trigger_error('First argument "queries" must be an array',
              E_USER_ERROR);
      }
      return false;
  }
  if (empty($table)) {
      if ($trigger_errors) {
          trigger_error('No insert table specified',
              E_USER_ERROR);
      }
      return false;
  }
  if (count($data) > 10000) {
      if ($trigger_notices) {
          trigger_error('It\'s recommended to use <= 10000 queries/bulk',
              E_USER_NOTICE);
      }
  }
  if (empty($data)) {
      return 0;
  }

  if (!function_exists('__exe')) {
      function __exe ($sql, $query_handler, $trigger_errors, $link_identifier = null) {
          if ($link_identifier === null) {
              $x = call_user_func($query_handler, $sql);
          } else {
              $x = call_user_func($query_handler, $sql, $link_identifier);
          }
          if (!$x) {
              if ($trigger_errors) {
                  trigger_error(sprintf(
                      'Query failed. %s [sql: %s]',
                      mysql_error($link_identifier),
                      $sql
                  ), E_USER_ERROR);
                  return false;
              }
          }

          return true;
      }
  }

  if (!function_exists('__sql2array')) {
      function __sql2array($sql, $trigger_errors) {
          if (substr(strtoupper(trim($sql)), 0, 6) !== 'INSERT') {
              if ($trigger_errors) {
                  trigger_error('Magic sql2array conversion '.
                      'only works for inserts',
                      E_USER_ERROR);
              }
              return false;
          }

          $parts   = preg_split("/[,\(\)] ?(?=([^'|^\\\']*['|\\\']" .
                                "[^'|^\\\']*['|\\\'])*[^'|^\\\']" .
                                "*[^'|^\\\']$)/", $sql);
          $process = 'keys';
          $dat     = array();

          foreach ($parts as $k=>$part) {
              $tpart = strtoupper(trim($part));
              if (substr($tpart, 0, 6) === 'INSERT') {
                  continue;
              } else if (substr($tpart, 0, 6) === 'VALUES') {
                  $process = 'values';
                  continue;
              } else if (substr($tpart, 0, 1) === ';') {
                  continue;
              }

              if (!isset($data[$process])) $data[$process] = array();
              $data[$process][] = $part;
          }

          return array_combine($data['keys'], $data['values']);
      }
  }

  // Start timer
  $start = microtime(true);
  $count = count($data);

  // Choose bulk method
  switch ($method) {
      case 'loaddata':
      case 'loaddata_unsafe':
      case 'loadsql_unsafe':
          // Inserts data only
          // Use array instead of queries

          $buf    = '';
          foreach($data as $i=>$row) {
              if ($method === 'loadsql_unsafe') {
                  $row = __sql2array($row, $trigger_errors);
              }
              $buf .= implode(':::,', $row)."^^^\n";
          }

          $fields = implode(', ', array_keys($row));

          if (!@file_put_contents($in_file, $buf)) {
              $trigger_errors && trigger_error('Cant write to buffer file: "'.$in_file.'"', E_USER_ERROR);
              return false;
          }

          if ($method === 'loaddata_unsafe') {
              if (!__exe("SET UNIQUE_CHECKS=0", $query_handler, $trigger_errors, $link_identifier)) return false;
              if (!__exe("set foreign_key_checks=0", $query_handler, $trigger_errors, $link_identifier)) return false;
              // Only works for SUPER users:
              #if (!__exe("set sql_log_bin=0", $query_handler, $trigger_error)) return false;
              if (!__exe("set unique_checks=0", $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          if (!__exe("
             LOAD DATA INFILE '${in_file}'
             INTO TABLE ${table}
             FIELDS TERMINATED BY ':::,'
             LINES TERMINATED BY '^^^\\n'
             (${fields})
         ", $query_handler, $trigger_errors, $link_identifier)) return false;

          break;
      case 'transaction':
      case 'transaction_lock':
      case 'transaction_nokeys':
          // Max 26% gain, but good for data integrity
          if ($method == 'transaction_lock') {
              if (!__exe('SET autocommit = 0', $query_handler, $trigger_errors, $link_identifier)) return false;
              if (!__exe('LOCK TABLES '.$table.' READ', $query_handler, $trigger_errors, $link_identifier)) return false;
          } else if ($method == 'transaction_keys') {
              if (!__exe('ALTER TABLE '.$table.' DISABLE KEYS', $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          if (!__exe('START TRANSACTION', $query_handler, $trigger_errors, $link_identifier)) return false;

          foreach ($data as $query) {
              if (!__exe($query, $query_handler, $trigger_errors, $link_identifier)) {
                  __exe('ROLLBACK', $query_handler, $trigger_errors, $link_identifier);
                  if ($method == 'transaction_lock') {
                      __exe('UNLOCK TABLES '.$table.'', $query_handler, $trigger_errors, $link_identifier);
                  }
                  return false;
              }
          }

          __exe('COMMIT', $query_handler, $trigger_errors, $link_identifier);

          if ($method == 'transaction_lock') {
              if (!__exe('UNLOCK TABLES', $query_handler, $trigger_errors, $link_identifier)) return false;
          } else if ($method == 'transaction_keys') {
              if (!__exe('ALTER TABLE '.$table.' ENABLE KEYS', $query_handler, $trigger_errors, $link_identifier)) return false;
          }
          break;
      case 'none':
          foreach ($data as $query) {
              if (!__exe($query, $query_handler, $trigger_errors, $link_identifier)) return false;
          }

          break;
      case 'delayed':
          // MyISAM, MEMORY, ARCHIVE, and BLACKHOLE tables only!
          if ($trigger_errors) {
              trigger_error('Not yet implemented: "'.$method.'"',
                  E_USER_ERROR);
          }
          break;
      case 'concatenation':
      case 'concat_trans':
          // Unknown bulk method
          if ($trigger_errors) {
              trigger_error('Deprecated bulk method: "'.$method.'"',
                  E_USER_ERROR);
          }
          return false;
          break;
      default:
          // Unknown bulk method
          if ($trigger_errors) {
              trigger_error('Unknown bulk method: "'.$method.'"',
                  E_USER_ERROR);
          }
          return false;
          break;
  }

  // Stop timer
  $duration = microtime(true) - $start;
  $qps      = round ($count / $duration, 2);

  if ($eat_away) {
      $data = array();
  }

  @unlink($options['in_file']);

  // Return queries per second
  return $qps;
}

Using the Function

The mysqlBulk function supports a couple of methods.

Array input with method: loaddata (preferred)

What would really give it wings, is if you can supply the data as an array. That way I won’t have to translate your raw queries to arrays, before I can convert them back to CSV format. Obviously skipping all that conversion saves a lot of time.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
<?php
$data   = array();
$data[] = array('level' => 'err', 'msg' => 'foobar!');
$data[] = array('level' => 'err', 'msg' => 'foobar!');
$data[] = array('level' => 'err', 'msg' => 'foobar!');

if (false === ($qps = mysqlBulk($data, 'log', 'loaddata', array(
    'query_handler' => 'mysql_query'
)))) {
    trigger_error('mysqlBulk failed!', E_USER_ERROR);
} else {
    echo 'All went well @ '.$qps. ' queries per second'."n";
}
?>

Most of the time it’s even easier cause you don’t have to write queries.

SQL input with method: loadsql_unsafe

If you can really only deliver raw insert queries, use the loadsql_unsafe method. It’s unsafe because I convert your queries to arrays on the fly. That also makes it 10 times slower (still twice as fast as other methods).

This is what the basic flow could look like:

1
2
3
4
<?php
$queries   = array();
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>
1
2
3
<?php
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>
1
2
3
<?php
$queries[] = "INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!')";
?>
1
2
3
4
5
6
7
8
9
<?php
if (false === ($qps = mysqlBulk($queries, 'log', 'loadsql_unsafe', array(
    'query_handler' => 'mysql_query'
)))) {
    trigger_error('mysqlBulk failed!', E_USER_ERROR);
} else {
    echo 'All went well @ '.$qps. ' queries per second'."n";
}
?>

Safe SQL input with method: transaction

Want to do a Transaction?

1
2
3
<?php
mysqlBulk($queries, 'transaction');
?>

Options

Change the ‘query_handler’ from mysql_query to your actual query function. If you have a DB Class with an execute() method, you will have to encapsulate them inside an array like this:

1
2
3
4
5
6
7
8
<?php
$db = new DBClass();
mysqlBulk($queries, 'log', 'none', array(
    'query_handler' => array($db, 'execute')
);
// Now your $db->execute() function will actually
// be used to make the real MySQL calls
?>

Don’t want mysqlBulk to produce any errors? Use the trigger_errors option.

1
2
3
4
5
<?php
mysqlBulk($queries, 'log', 'transaction', array(
    'trigger_errors' => false
);
?>

Want mysqlBulk to produce notices? Use the trigger_notices option.

1
2
3
4
5
<?php
mysqlBulk($queries, 'log', 'transaction', array(
    'trigger_notices' => true.
);
?>

Have ideas on this? Leave me a comment.

Benchmark Details - What did I use?

Of course solid benching is very hard to do and I already failed once. This is what I used.

Table structure

I created a small table with some indexes & varchars. Here’s the structure dump:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
--
-- Table structure for table `benchmark_data`
--

CREATE TABLE `benchmark_data` (
  `id` int(10) unsigned NOT NULL auto_increment,
  `user_id` smallint(5) unsigned NOT NULL,
  `a` varchar(20) NOT NULL,
  `b` varchar(30) NOT NULL,
  `c` varchar(40) NOT NULL,
  `d` varchar(255) NOT NULL,
  `e` varchar(254) NOT NULL,
  `created` timestamp NOT NULL default CURRENT_TIMESTAMP,
  PRIMARY KEY  (`id`),
  KEY `a` (`a`,`b`),
  KEY `user_id` (`user_id`)
) ENGINE=InnoDB  DEFAULT CHARSET=latin1;

Table data

I filled the table with ~2,846,799 records containing random numbers & strings of variable length. No 1000 records are the same.

Machine

I had the following configuration to benchmark with:

1
2
3
4
5
6
7
Product Name: PowerEdge 1950
Disks: 4x146GB @ 15k rpm in RAID 1+0
Memory Netto Size: 4 GB
CPU Model: Intel(R) Xeon(R) CPU E5335 @ 2.00GHz
Operating System: Ubuntu 8.04 hardy (x86_64)
MySQL: 5.0.51a-3ubuntu5.4
PHP: 5.2.4-2ubuntu5.5

- Provided by True.nl

Finally

This is my first second benchmark so if you have some pointers that could improve my next: I’m listening.

Imported comments

These were imported from my old blog. Please use disqus below for new comments

Wei on 2012-09-04 04:21:03
Nice and useful article!

Any idea on why ‘load data infile’ is so much faster than other approaches?

SAIL on 2012-05-09 07:38:47
Col: A B C D E F G H I

Row1: 1 EAR 0 D 20120508 FR
Row2: 3 7 E 20120509 ES COL 10.05
Row3: 7 DEW S KM XZV
Row4: 8 FU 9 JK 3.8 1000

How can I insert these data(variable column data) in bulk method i.e. using

LOAD DATA INFILE of MySql

Similar provision is available in oracle and sqlserver also.


http://www.orafaq.com/wiki/SQL*Loader_FAQ

http://www.abestweb.com/forums/programming-datafeeds-tools-188/mysql-load-data-infile-question-70276.html


Thanks.
Sail

Mikolaj Misiurewicz on 2012-03-12 11:08:51
Just so people won’t get a wrong idea from this article:

Depending on size of your data and types of indexes you have in your InnoDB tables, transactions can either be useless (from the speed-up point of view) or super useful.

DO NOT assume that this is not a good way to speed your data insertion. Test it first.
It only takes ‘START TRANSACTION’ and ‘COMMIT’ around you inserts, so you can have a benchmark results in a matter of minutes.

For the data I work on for years now, transactions are a superb way of making things work faster.

As mentioned in the article LOAD DATA INFILE is even faster than transactions, but I woudn’t use it in normal database usage, unless you really really have to.
Not only it requires a very dangerous ‘FILE’ privilege, but you also have to completely rewrite your code, you can’t use any database abstraction layer (like PDO), and it’s really easy to save wrong data into a file and spend hours debugging it.

In my work, if I’m tasked with making the queries work faster, when I come to this step of optimization I try transactions first. That usually fixes the problem.
I manage tables in which using one large transaction on all supplied data speeded up the insert from ~50 minutes to less than 5.

As mentioned, that doesn’t work all the time, or sometimes the speed-up is not enough.
If you have no other choice - use LOAD DATA INFILE.

pas on 2012-02-02 20:55:14
Could you give an example of the form of the
& quot;query function& quot; that is required by & quot;loaddata& quot; ?

Thanks!

fireh on 2011-12-18 07:59:06
Hi, nice share, sped up import when I tried in acquia devcloud from ~1hr down to ~10mins.

I’ve made some modifications, specifically the regex part:


$parts = preg_split(& quot;/[,()] ?(?=([^'|^\']*['|\']& quot; .
& quot;[^'|^\']*['|\'])*[^'|^\']& quot; .
& quot;*[^'|^\']$)/& quot;, $sql);
$process = 'keys';
$data = array();

foreach ($parts as $k=& gt;$part) {
$tpart = strtoupper(trim($part));
if (substr($tpart, 0, 6) === 'INSERT') {
continue;
} else if (substr($tpart, 0, 6) === 'VALUES') {
$process = 'values';
continue;
} else if (substr($tpart, 0, 1) === ';') {
continue;
}

if (!isset($data[$process])) $data[$process] = array();
$data[$process][] = $part;
}


Is replaced with:

$parts = array();
mb_ereg_search_init($sql, '-?\d+(?:\.\d+)?(?:e-\d+)?|NULL|'(?:\\'|[^'])*'|;$|VALUES|INSERT INTO `\w+`|`\w+`', 'i');
while ($tmp = mb_ereg_search_regs()) {
$parts[] = $tmp[0];
}

$process = 'keys';
$data = array();

$it = new ArrayIterator($parts);
foreach ($it as $k=& gt;$part) {
$tpart = strtoupper(trim($part));
if (substr($tpart, 0, 6) === 'INSERT') {
continue;
} else if (substr($tpart, 0, 6) === 'VALUES') {
$process = 'values';
continue;
} else if (substr($tpart, 0, 1) === ';') {
continue;
}

if (!isset($data[$process])) $data[$process] = array();
if (is_string($part) & amp;& amp; strlen($part) & amp;& amp; $part[0] === ''') {
$part = substr($part, 1, -1);
}
$data[$process][] = $part;
}


Tested against values like:
– 0.00
– -23
– 23.00e-23
– ‘asdd(ssd,s)sfs’sfsf’

Tried preg_match() but I ran into https://bugs.php.net/bug.php?id=45735 :S

Err, your comment’s security code didn’t show in Chrome v15.0 (linux).

Jim on 2011-11-29 16:58:12
Great article and good comments.

I also had an issue with LOAD DATA INFILE not being available to MySQL. Tried all the suggestions in the comments but none of them worked for me.

What DID work for me was editing (as root) the my.cnf file to turn on that feature (‘set-variable=local-infile=1’ on my system), and restarting MySQL.

Thanks again for the sharing this useful information.

Novadenizen on 2011-04-14 23:37:28
Why haven’t you tried multiple-row inserts?


mysql_query(& quot;INSERT INTO `log` (`level`, `msg`) VALUES ('err', 'foobar!'),('err', 'foobar!'),('err', 'foobar!')& quot;);

Brian on 2011-04-14 19:55:42
Yeah this isn’t a really good idea, depending on your isolation level and the amount of concurrency in your system, you would be better off using different methods as your buffer pools and transaction logs will get rather large quickly.

Your thinking correct but you need to adjust what you are doing. Transactions are for grouping logical changes together, not arbitrary large bulk insert operations.

First I am not sure why you are inserting so much data into your table, based upon what you are telling me, innodb will not be a suitable table for this task especially if you have indexes enabled. B-tress do not scale at the level you are talking about.

Suitable table types would be tukodb (fractal), NDB (hash - in memory), Archive (compressed, primary only), Myisam Fixed length type (non-update,non-delete scenario, continuous insert), or an analytics column store table type like ICE. You should also review the benefits of an alternative insert strategy like partitioning.

I would also highly recommend dropping your keys, unless your query strategy is highly lopsided to certain points in your values.

The final point would be to use the multi insert syntax as well, if you re going to use a sql method as opposed to an infile method. it would have more benefits than said above method

stloyd on 2010-10-13 14:34:41
@ Kevin: Quoting dev.mysql.com/docs:
& quot;If you use LOAD DATA INFILE on an empty MyISAM table, all nonunique indexes are created in a separate batch (as for REPAIR TABLE). Normally, this makes LOAD DATA INFILE much faster when you have many indexes. In some extreme cases, you can create the indexes even faster by turning them off with ALTER TABLE … DISABLE KEYS before loading the file into the table and using ALTER TABLE … ENABLE KEYS to re-create the indexes after loading the file.& quot;

Kevin on 2010-09-08 21:53:26
@ Peter: I may need to doublecheck but it was my understanding that load data infile already turns off indexes.

Peter on 2010-08-30 16:35:35
Hi!

Foreach row in the file the server checks and updates all indexes.

In my case I had a table with 20 million datasets. A LOAD-DATA-INFILE of a file with 2.5mio rows took round about 15 minutes.

It was useful to turn all indexes off before inserting the data:

1. ALTER TABLE `abc` DISABLE KEYS
2. LOAD DATE INFILE …
3. ALTER TABLE `abc` ENABLE KEYS

This reduced the time to 30 seconds for the LOAD-DATA-statement!

Cheers,
Peter

Kevin on 2010-06-10 21:02:44
@ Ryan: no, cheers to you, Ryan! : )

@ gabe: Was not part of the benchmark, no. But I'd doubt It'd be any faster than the CSV option that lies under the hood of this

gabe on 2010-06-04 23:36:46
Did you try a single insert with multiple rows in it.
eg: insert ignore into table (fields) values (….),(….),(…)

or similar?

Ryan on 2010-05-06 00:57:41
Excellent. Just what I was looking for. Cheers, man!

Kevin on 2010-03-27 13:23:21
@ sean: These are results to get really excited about. It's like saying, I've just tried out this new car. It's 24 times as fast as my current one :D
Thanks for sharing!

sean on 2010-03-26 22:59:55
Kevin, thanks for turning me on to LOAD DATA INFILE. I had to insert 2.5M rows! Initially it took almost an entire day! After making some code updates, the time was cut down to an hour! You 'da man!

Kevin on 2010-03-24 10:24:06
@ Joe Li: Thanks for your insightful remarks.

@ Lukas: You're welcome!

Lukas on 2010-03-23 12:30:45
Thanks so much. It helped us to save a time and server performance tremendously.

Joe Li on 2009-07-08 07:20:28
Two reminders:
1. To execute LOAD DATA INFILE queries, the user must have the FILE privilege. It may be a concern as many website hosting provider does not allow to grant this to user.

2. Transaction: LOAD DATA INFILE outperforms at record insertion, but it may be a problem for data integrity. I am not sure if LOCK/UNLOCK TABLES queries are still required for such action. Feel free to share and discuss.

Kevin on 2009-06-26 10:40:21
@ FreudianSlip: Thanks for sharing!

FreudianSlip on 2009-06-23 11:15:53
Just to answer my own question below: The helpful chaps (ramirez in particular) in #mysql on irc.freenode spotted the fact that my mysql client versions were back-level on the centos box. I was running 5.0.27 compared to 5.1.32 on the fedora box.

Still a top article though ;-)

FreudianSlip on 2009-06-23 10:43:49
Brilliant article, works fantastic on my fedora 11 install out of the box, however I get a message from MYSQL on a centos 5 machine: \& quot;PHP Fatal error: Query failed. The used command is not allowed with this MySQL version.\& quot;

I've confirmed that –local-infile is ON (in the show variables output from MySQL). I've changed the /dev/shm to be 777 (as on my fedora system) from 755, confirmed the infile.txt file is being created in there ok. I've even changed the infile location to /tmp and done a mkfifo /tmp/infile.txt; chmod 777 /tmp/infile.txt but I get the same message.

Any clues? I'm all googled out…

Tech Blog on 2009-04-20 15:47:53
Thanks for the info, find this very helpful :)

Kevin on 2009-04-19 20:58:25
@ Waqas: thx :) it took some time, but it will save me more :D

Waqas on 2009-04-14 11:28:19
oh man, that is awesome, nice work.

and in fact this bulk of information will definitely save a hell of time.

Keep up the good work.

regards,
Waqas

Comments