Jazzy Coding

The Art of Structured Improvisation. By Carsten Nielsen

git statistics: bash-one-liners

Are you interested in the statistics of your project? Who has made how many commits? On which day of the week, or at which hour of the day?
Well, here's some food for you. Simple one-liners for the shell:

The committer ranking by number of commits.

$> git shortlog  -ns

The commits of a project by weekday.

$> for i in Mon Tue Wed Thu Fri Sat Sun; do echo $( echo " $i: "; git shortlog  -n --format='%ad %s'| grep "$i " | wc -l); done

The commits of a project by weekday since a defined date. Interesting to see the difference, e.g. after a project has been released as open source or the project team has changed. You can change the option ‘since’ to ‘until’ if you want to count the commits before a defined date.

$> for i in Mon Tue Wed Thu Fri Sat Sun; do echo $( echo " $i: "; git shortlog -n --format='%ad %s' --since='2011-06-30'| grep "$i " | wc -l); done

The commits of a project by hour of day. (Updated after the hint from Patrik. Thanks!)

$> for i in `seq -w 0 23`; do echo $( echo " $i:"; git shortlog  -n --format='%ad %s' | grep " $i:" | wc -l); done

The commits of a project by hour of day before a defined date. Here too you can change the option ‘until’ to ‘since’ if needed.

$> for i in `seq -w 0 23`; do echo $( echo " $i:"; git shortlog -n --format='%ad %s' --until='2011-06-30' | grep " $i:" | wc -l); done
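For completeness: in git versions much newer than this post (2.6+, where --date=format: was added), the whole grep loop can be replaced by one pipeline. A sketch, under that version assumption:

```shell
# Histogram of commits per hour of day; --date=format:'%H'
# prints only the commit hour (needs git >= 2.6).
git log --date=format:'%H' --pretty=%ad | sort | uniq -c
```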

Signing git commits with your GPG key

With the release of Git v1.7.9 it's now possible to sign your commits.
To activate this on your machine, you have to configure git:

1. Get your key-id in the terminal:

$> gpg --list-keys

pub   1024D/123ABC89 2011-09-27
uid                  Carsten Nielsen

Locate your relevant key and copy the ID to git:

$> git config --global user.signingkey 123ABC89

Now you are able to sign your commits by using the option -S:

$> git commit -S
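A side note beyond v1.7.9: if you sign most of your commits anyway, later git releases (2.0+) can do it by default. An assumed, optional setup:

```shell
# Sign every commit automatically instead of passing -S each time
# (the commit.gpgsign option requires git >= 2.0):
git config --global commit.gpgsign true
```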

If you want to check the origin of a commit, you can now show the signature with:

$> git log --show-signature

commit 01b817f3e3bfe1fb1143a172539162bf6bf7aca5
gpg: Signature made 2012-02-09T11:30:57 CET using RSA key ID 123ABC89
gpg: Good signature from "Carsten Nielsen "
Author: Carsten Nielsen 
Date:   Thu Feb 9 11:30:48 2012 +0100

    Lets sign something

With this we should be able to create an infrastructure on the repo or deploy server to check and ensure the origin of the commits.
But this is another story for another time…
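As a first sketch of such a check (hedged: the %G? pretty-format placeholder for the signature status is not available in every old git release), one could list all commits whose signature is not rated “G” for good:

```shell
# Print hash, signature status and subject for every commit whose
# signature is not rated good (N = none, B = bad, U = untrusted, ...):
git log --pretty=format:'%h %G? %s' | awk '$2 != "G"'
```

Run on the repo- or deploy-server, an empty output would mean every commit carries a good signature.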

When Giants Step On Your Shoulder

Recently I had a job migrating a MySQL database of about 20M entries and an estimated size of about 10GB. In this project I stumbled over some issues regarding speed, memory and disk space on the servers. This article will hopefully give you some hints for tweaking a (MySQL) migration process to make it faster and easier.

The database was used by a forum, and some of the tables had to be converted and remapped to make them work with the new forum application. Remapping means in this case, for example, that we had to feed three tables of the new database with the composed and manipulated content of five tables of the old database.

One of the biggest challenges was to keep the downtime (or read-only time) as short as possible – so here are some experiences with MySQL in this case.

The migration process itself had the following dead-simple structure:

1. Set the production-system to read-only-mode
2. Copy the current database to the new server
3. Run the migrations for the database

My first idea was to use PDO. I created a simple class and some functions to read the old data, convert it and store it in the new database.
Here is a sketch of the class:

class migrateJob
{
  private $source;
  private $target;

  public function __construct(array $source, array $target)
  {
    //$source and $target hold the connection settings of each db
    $this->source = new PDO($source['dsn'], $source['username'], $source['password']);
    $this->target = new PDO($target['dsn'], $target['username'], $target['password']);
  }

  public function convertToHtml($text)
  {
    //convert the old forum markup to HTML
  }

  public function migrateUsers()
  {
    echo "Importing Users:\n";

    //Setup some default-values for the user in the new db
    $user = array(
      //some default values for the user-dataset
    );

    //Load the user info from two tables of the old database
    $old_data = $this->source->query("SELECT * FROM old_user_table, old_profile_table
                                      WHERE old_user_table.id = old_profile_table.user_id");

    //Prepare the storing statement
    $user_query = $this->target->prepare("REPLACE INTO new_user_table (".implode(",", array_keys($user)).")
                                          VALUES (:".implode(",:", array_keys($user)).");");

    //Iterate through the results and store them
    echo "\nStarting user-migration.\n";
    while ($old_user = $old_data->fetch(PDO::FETCH_ASSOC)) {
      //Create a new user-array from the defaults.
      $new_user = $user;

      //Do the conversions
      $new_user['bio'] = $this->convertToHtml($old_user['profile']);

      //Copy the old values to the new user
      $new_user['id'] = $old_user['id'];
      $new_user['name'] = $old_user['login'];
      //... some more re-mappings

      //Store the new user
      if (!$user_query->execute($new_user)) {
        //maybe nice to have a counter and some status output here.
        $err = $user_query->errorInfo();
        echo "\nError: ".$err[2]."\n";
      }
    }
    echo "\nDone.\n";
  }
}

With this class the different migrations should have run quite easily – I was partly wrong. When running it on the user tables (about 50k entries) everything went fine. But when I tried the same on the forum-posts table with about 20M entries, the process silently got slower and slower.
Hmmm. Had I run into a memory leak in my PDO implementation? I tried different things to get rid of it, but didn't find any solution for this.

The next try was a script using the standard php-mysql module. Okay, this should be simple: connect to the databases, read the data, convert it and save it.
I started old-school-style:

$old_db_connection = mysql_connect("localhost", $source_db['username'], $source_db['password'], TRUE);
$result = mysql_query("SELECT id,user_id,user_name,body,title,topic_id FROM posts", $old_db_connection);

$new_db_connection = mysql_connect("localhost", $target_db['username'], $target_db['password'], TRUE);

while ($old_data = mysql_fetch_assoc($result)) {
  //do some conversions to the old data and store it
  mysql_query("INSERT INTO posts ...", $new_db_connection);
}

Dammit! Same problem here: The queries got slower and slower.

A closer look at the problem shows that the reading process tries to load the complete select result into memory, which blocks the process sooner or later. Iterating with “LIMIT 0,1000” reduces this a bit, but then every new SELECT has to search the index to find the next position, which takes more and more time too.

The fix for this is mysql_unbuffered_query(). It fetches the result without automatically buffering the whole result set and thus saves a considerable amount of memory.

To speed up the query even a bit more we write-lock the table before sending any query, which prevents MySQL from checking the integrity of the indexes while I'm processing the tables:

mysql_query('LOCK TABLES posts WRITE',$old_db_connection);

That seems to be everything we can do for the read process. Another try shows that the process – even with this lightweight optimization – takes hours. So let's check the write actions too.

The writing process has nearly the same issues as the reading queries: the used memory rises query after query, and adding new rows requires more and more time while MySQL tries to ensure the integrity of the indexes.

After some testing I found a fix for this too – but it's not that simple. The fastest option to add a large amount of data seems to be to pass a SQL file with SOURCE directly to the database. This has the positive side effect that I can put several rows into one INSERT statement – making the insert process even faster.

INSERT INTO posts (id, user_id, body, title, date) VALUES
(1, 12, "Here the body", "Here the title.", "2011-01-05"),
(2, 14, "Here the next body", "Here the next title.", "2011-04-15");

I split the INSERT statements into chunks of 25 statements per file, every statement inserting a maximum of 2,000 rows = 50,000 inserted rows per file.
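The chunking itself doesn't need much code. Assuming the converter writes one complete INSERT statement per line into a single inserts.sql (the file names here are made up), GNU split(1) can produce the 25-statement files:

```shell
# 25 statements (lines) per chunk file; -d gives numeric suffixes.
# --additional-suffix needs GNU coreutils >= 8.16.
split -l 25 -d --additional-suffix=.sql inserts.sql generated-file-
```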

I wanted to launch the import process directly from the command line so that I'm able to raise the max_allowed_packet size, so that MySQL doesn't try to swap to disk during the operations.
I stored the commands in a file pointing to the generated files:

SOURCE /path/to/generated-file-00.sql
SOURCE /path/to/generated-file-01.sql
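Writing that driver file by hand gets old quickly; a tiny loop over the (made-up) chunk path does it:

```shell
# Emit one SOURCE line per generated chunk file, in order:
for f in /path/to/generated-file-*.sql; do
  echo "SOURCE $f"
done > import.sql
```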

Now we can import our stuff from the script:

system("mysql --max_allowed_packet=256M -u 'username' -p'password' 'new_db' < import.sql");

As those are MyISAM tables we can speed things up even a bit more by switching off the indexing. This option is only available if you have root access to the data files of the MySQL server.

stop mysql
cd /var/lib/mysql/new_db
myisamchk --keys-used=0 -rq posts.MYI
start mysql

After the migration process we have to switch it on again, which can take some time as the indexes are completely rebuilt:

stop mysql
cd /var/lib/mysql/new_db
myisamchk -rq posts.MYI --sort_buffer_size=2G
start mysql

As you can see the process brought up some special problems – the mix of tweaking the queries, the database connection and the server itself to set up this process is somewhat picky.
I really wonder what this would look like with PostgreSQL! (@Magnus Hagander!)

After this the complete migration looks like this:

1. Lock the source table
2. Select the old data by using mysql_unbuffered_query().
3. Store the converted data as multiple-inserts into sql-files.
4. Create a sql-file that uses SOURCE to import the sql-files.
5. Deactivate the indexing of the MyISAM-tables
6. Pass this sql-file with a raised --max_allowed_packet=256M to the mysql-client.
7. Activate the indexing of the MyISAM-tables again.

Good Luck with your next migration!

Userstories: google+ circle intersections

As a g+ user
I want to create intersections of circles
So that I can manage my recipients fast and effective

Scenario: Send stream to intersection

Given I am creating a post
And I have selected the circle “developers”
And I have selected the circle “funny”
When I select “intersect”
And I click “Share”
Then the post should be sent to “funny developers” only

What’s this?

This is a userstory with a scenario noted in Gherkin – a domain-specific language. I'm using it mainly for Behaviour-Driven-Development with Behat and Mink. And as you can see, it's quite useful for noting down ideas quickly and cleanly too.

Testing Eclipse Indigo

I’m primarily developing PHP on an Ubuntu system with the Eclipse IDE. Even if the startup is slow, the performance is not really amazing and the Galileo version drove me nuts with annoying null-pointer exceptions – I like Eclipse. Maybe it’s just a quirk, like a mechanic who always uses the same screwdriver or a painter who uses the same brush – as long as Eclipse is not breaking my workflow I’ll continue to use it.

Recently I’ve tried the current Indigo release and it seems quite stable. No problems with installing plugins, acceptable startup speed. I’ll try Indigo for the next weeks in production – with a working Helios as backup ;-) – and will post my experiences after a while.

Every time I check a new Eclipse release I try to reduce the number of plugins and only install additional features if I really need them. So I guess this list will expand over time.

I’m testing Eclipse Indigo with the following plugins:

  • PDT
    The base plugin for PHP coding. I’m using the latest release from http://download.eclipse.org/tools/pdt/updates/3.0/milestones/
  • Marketplace
    Really a nice feature. The marketplace-concept makes it very easy to find required plugins or check out interesting stuff.
    Open “Help > Install New Software…” and select the standard Indigo repo. Search for “market” and you’ll get it.

Actually, by now nearly everything can be installed by using “Help > Eclipse Marketplace…”

Screenshot: The Marketplace is tightly integrated into Eclipse

Plugins continued:

  • PathTools
    Simple and effective context menu to open the terminal or the file explorer in a given folder.
  • Zen Coding
    Nice script that makes especially nested HTML coding much easier.

Let’s see which plugins I’ll add – sooner or later.

Do you have any suggestions for usable but not overloaded plugins for PHP/web development in Eclipse?

gnome-shell: reload after freeze

Ok, it happened again.

In the middle of a nice productive coding session my gnome-shell unexpectedly froze. The gnome session got completely unresponsive… or nearly – I was still able to switch to another terminal session.

Using the top command I was able to see that it really was the gnome-shell session eating memory and CPU time like hell. But how can I send a gnome-shell reload command from another terminal session? No results after some research on the lazyweb… I had to hard-reset my system to reset the gdm window manager – nearly losing all my unsaved code. (Mostly avoided, lucky as I am, with a git commit of the cached files in the coding folder – but that’s another story…)

Hmm – what to do then? I need a method to reset the gnome-shell from another session…

I can’t send any commands to another tty (as far as I know). But hey! The sessions are using the same file system!

As a workaround I wrote a small script that runs under the actual gnome-shell session, checks for the existence of a file every 10 seconds and, if this file doesn’t exist, simply reloads the gnome-shell session.


#!/bin/bash

cd ~/.config/gnome-shell-restarter/

while true; do
  if [ -f delete-to-reload-gnome-shell ]; then
    sleep 10
  else
    # file was deleted from another tty: recreate it and reload the shell
    touch delete-to-reload-gnome-shell
    gnome-shell --replace &
  fi
done

As you can see I’ve put the script into the .config folder. I’ve added it to my gnome autostart by using the “gnome-session-properties” GUI tool.

In case my gnome-shell freezes again, I simply switch to another session with Ctrl-Alt-F1 and delete the “delete-to-reload-gnome-shell” file. Voilà! The gnome session restarts!
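So the whole reset from the other tty boils down to one command, matching the path used by the script:

```shell
# Removing the watch file makes the running loop restart gnome-shell:
rm ~/.config/gnome-shell-restarter/delete-to-reload-gnome-shell
```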

If you have a better solution for this or maybe any hints on gnome-shell, feel free to comment!