Jazzy Coding

The Art of Structured Improvisation. By Carsten Nielsen

Category Archives: Workarounds

When Giants Step On Your Shoulder

Recently I had a job to migrate a MySQL-database of about 20M entries and a guessed size of about 10GB. At this project I stumbled over some issues regarding the speed, memory and disk-space on the servers. This article will hopefully give you some hints about tweaking a (MySQL-) migration process to make it faster and easier.

The database was used by a forum and some of the tables had to be converted and remapped to make them work with the new forum-application. Remapping means in this case for example, that we had to feed three tables of the new database with the composed and manipulated content of five tables of the old database.

One of the biggest quests was to keep the downtime (or read-only-time) as short as possible – so here are some experiences with MySQL in this case.

The migration-process itself had the following dead-simple-structure:

1. Set the production-system to read-only-mode
2. Copy the actual database to the new server
3. Run the migrations for the database

My first idea was to use PDO. I created a simple class and some functions to read the old data, convert it and store it in the new database.
Here is a sketch of the class:

class migrateJob
  public function __construct(array $source, array $target){}

  public function convertToHtml($text){}

  public function migrateUsers(){}
    echo "Importing Users:\n";

    //Setup some default-values for the user in the new db
    $user = array(
      //some default values for the user-dataset

    //Load the user info from two tables of the old database
    $old_data = $this->source->query("SELECT * FROM old_user_table, old_profile_table
                                      WHERE old_user_table.id = old_profile_table.user_id");

    //Prepare the storing statement
    $user_query = $this->target->prepare("REPLACE INTO new_user_table (".implode(",", array_keys($user)).")
                                          VALUES (:".implode(",:", array_keys($user)).");");

    //Iterate through the results and store them
    echo "\nStarting user-migration.\n";
    while($old_user = $old_data->fetch(PDO::FETCH_ASSOC))
      //Create a new user-array from the defaults.
      $new_user = $user;

      //Do the conversions
      $new_user['bio'] = $this->convertToHtml($old_user['profile']);

      //Copy the old values to the new user
      $new_user['id'] = $old_user['id'];
      $new_user['name'] = $user['login'];
      //... some more re-mappings

      //Store the new user
        //maybe nice to have a counter and some status output here.
        $err = $user_query->errorInfo();
        echo "\nError: ".$err[2]."\n";
    echo "\nDone.\n";

With this class the different migrations should run quite easy – I was partly wrong. When running this on the user-tables (about 50k entries) everything went fine. but when I tried to run the same on the forum-posts-table with about 20M entries, the process silently got slower and slower.
Hmmm. I got nagged by a memory leak in my PDO-implementation? I tried different things to get rid of it, but didn’t find any solution for this.

Next try was to create a script which used standard php-mysql-module. Okay, this should be simple. Connect to the database, read the data convert and save it.
I started old-school-style:

$old_db_connection = mysql_connect("localhost", $source_db['username'], $source_db['password'], TRUE);
$result = mysql_query("SELECT id,user_id,user_name,body,title,topic_id FROM posts", $old_db_connection);

$new_db_connection = mysql_connect("localhost", $target_db['username'], $target_db['password'], TRUE);

while($old_data = mysql_fetch_assoc($result))
  //do some conversions to the old _data and store it
  mysql_query("INSERT INTO posts ...", $new_db_connection);

Dammit! Same problem here: The queries got slower and slower.

A closer look at the problem shows that the reading process of the database tries to load the complete select-result-data into the memory and blocks the process sooner or later. An iterating “LIMIT 0,1000” reduces this a bit but then every new SELECT has to search the index to get the next position, which takes more and more time too.

The fix for this is mysql_unbuffered_query(). This fetches the result without buffering the large result-set automatically and saves a considerable amount of memory.

To speed up the query even a bit more we write-lock the table before sending any query, which prevents MySQL to check the integrety of the indexes while I’m processing the tables:

mysql_query('LOCK TABLES posts WRITE',$old_db_connection);

That seems to be everything  we can do for the read-process. Another try shows that the process – even with this leightweight optimization – takes hours. Then let’s check the write-actions too.

The writing process has nearly the same issues as the reading-queries. The used memory is rising query after query and to add new rows requires more and more time when MySQL tries to ensure the integrity of the indexes.

After some testing I found a fix for this – but that’s not that simple. The fastest option to add a large amount of data seems to be to pass a SQL-file with SOURCE directly to the database. This has the positive effect, that I can put several INSERT-queries into one statement – making the insert-process even faster.

INSERT INTO posts (id, user_id, body, title, date) VALUES
(1, 12, "Here the body", "Here the title.", "2011-01-05"),
(2, 14, "Here the next body", "Here the next title.", "2011-04-15"),

I splitted the INSERT-queries into chunks of 25 queries per file, every query inserting a maximum of 2.000 rows = 50.000 inserted rows per file.

I wanted to launch the import-process directly from the command-line so that I’m able to raise the max-allowed-package size so that MySQL don’t tries swap to the disk during the operations.
I stored the commands in a file pointing to the generated files:

SOURCE /path/to/generated-file-00.sql
SOURCE /path/to/generated-file-01.sql

Now we can import our stuff in the script

system("mysql --max_allowed_packet=256M -u 'username' -p'password' 'new_db' < import.sql");

As those are MyISAM we can even speed up the things a bit more by switching off the indexing. This option is only available if you have root access to the data-files of the MySQL-server.

stop mysql
cd /var/lib/mysql/new_db
myisamchk --keys-used=0 -rq posts.MYI
start mysql

After the migration process we have to switch it on again, which can take some time as the indexes are completely rebuild:

stop mysql
cd /var/lib/mysql/new_db
myisamchk -rq posts.MYI --sort_buffer_size=2G
start mysql

As you can see the process brought some special problems up – I think the mixed up parts of setting up this process by tweaking the queries, the database-connection and the server itself is somewhat picky.
I really wonder how this would look like with PostgreSQL! (@Magnus Hagander!)

After this the complete migration looks like this:

1. Lock the source table
2. Select the old data by using mysql_unbuffered_query().
3. Store the converted data as multiple-inserts into sql-files.
4. Create a sql-file that uses SOURCE to import the sql-files.
5. Deactivate the indexing of the MyISAM-tables
6. Pass the this sql-file with a raised –max_allowed_packet=256M to the mysql-client
7. Activate the indexing of the MyISAM tables again.

Good Luck with your next migration!

gnome-shell: reload after freeze

Ok, it happened again.

In a nice productive coding session my gnome-shell unexpectedly freezed. The gnome-session got completely unresponsive… or nearly – I was able to switch to another terminal-session.

Using the top command I was able to see that really the gnome-shell-session was eating memory and cpu-time like hell. But how can I send a gnome-shell-reload-command to another terminal-session? No results after a research on the lazy-web… I had to hard-reset my system reset the gdm-windowmanager – nearly loosing all my unsaved code. (Mainly avoided, lucky as I am with a git commit of the cached files in the coding folder – but that’s another story…)

Hmm – what to do then? I need a method to reset the gnome-shell from another session…

I can’t send any commands to another tty (as far as I know). But hey! The sessions are using the same file system!

As a workaround I wrote a small script that runs under the actual gnome-shell-session, checks the existence of a file every 10 seconds and if this file don’t exist simply reloads the gnome-shell-session.


#!/bin/bash cd ~/.config/gnome-shell-restarter/

while true ;do
if [ -f delete-to-reload-gnome-shell ] ;then
sleep 10;
touch delete-to-reload-gnome-shell;
gnome-shell --replace

As you can see I’ve put the script in the .config folder. I’ve added it to my gnome-autostart by using the “gnome-session-properties” gui-tool.

In case my gnome-shell freezes again, I simply switch to another session with Ctrl-Alt-F1 and delete the “delete-to-reload-gnome-shell” file. Voilá! the gnome-session restarts!

If you have a better solution for this or maybe any hints on gnome-shell, feel free to comment!