Sunday, June 15, 2008

WideFinder II

My entry for WideFinder II has performed a lot better than expected.

WideFinder Results

Ideally you would not have to write any customised code, so you would end up with just the following config file and Groovy report.

Config File

Groovy Report

For performance reasons, however, my entry uses an optimised (non-regex) line parser.
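To give an idea of what a non-regex line parser can look like, here is a minimal sketch for access log lines. It is not Kolja's actual parser: the class name `FastAccessLogParser` and its `parse` method are made up for illustration, and it assumes space-separated fields where `"..."` and `[...]` each count as one field.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of a non-regex parser for access log lines.
 * Not Kolja's real parser; all names here are illustrative.
 */
final class FastAccessLogParser {

    /** Splits a line on spaces, treating "..." and [...] as single fields. */
    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        int i = 0;
        int n = line.length();
        while (i < n) {
            char c = line.charAt(i);
            if (c == ' ') {                 // skip field separators
                i++;
                continue;
            }
            int end;
            if (c == '"' || c == '[') {
                // Delimited field: scan forward to the closing delimiter.
                char close = (c == '"') ? '"' : ']';
                end = line.indexOf(close, i + 1);
                if (end < 0) {
                    end = n;                // unterminated: take the rest of the line
                }
                fields.add(line.substring(i + 1, end));
                i = end + 1;
            } else {
                // Plain field: runs up to the next space.
                end = line.indexOf(' ', i);
                if (end < 0) {
                    end = n;
                }
                fields.add(line.substring(i, end));
                i = end;
            }
        }
        return fields;
    }
}
```

The point of this style is that each character is visited roughly once with plain `indexOf` and `substring` calls, with no backtracking regex engine involved.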

Customised Config

Java Report

With this combination, it ran the 42GB dataset in 13 minutes 26 seconds, which I am happy enough with to leave it at that. I think I could get it below 10 minutes, because currently the 32 threads seem to be either all doing IO at the same time or all processing a chunk of the file at the same time, rather than overlapping the two.
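One possible way to overlap IO and processing is to have a single reader feed batches of lines through a bounded queue to a pool of workers. The sketch below only illustrates that idea and is not taken from the entry itself; `PipelinedProcessor`, `countMatching` and the batch and queue sizes are assumptions made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: one reader thread does IO while workers process batches. */
final class PipelinedProcessor {

    // Sentinel batch telling a worker to stop; compared by identity.
    private static final List<String> POISON = new ArrayList<>();

    /** Counts lines containing needle, reading on the calling thread. */
    static long countMatching(Reader source, String needle, int workers, int batchSize)
            throws IOException, InterruptedException, ExecutionException {
        BlockingQueue<List<String>> queue = new ArrayBlockingQueue<>(workers * 2);
        AtomicLong matches = new AtomicLong();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        List<Future<?>> futures = new ArrayList<>();
        try {
            for (int w = 0; w < workers; w++) {
                futures.add(pool.submit(() -> {
                    try {
                        for (List<String> batch = queue.take(); batch != POISON; batch = queue.take()) {
                            long local = 0;
                            for (String line : batch) {
                                if (line.contains(needle)) {
                                    local++;
                                }
                            }
                            matches.addAndGet(local);
                        }
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                }));
            }
            // The calling thread is the only one touching the input.
            try (BufferedReader in = new BufferedReader(source)) {
                List<String> batch = new ArrayList<>(batchSize);
                for (String line = in.readLine(); line != null; line = in.readLine()) {
                    batch.add(line);
                    if (batch.size() == batchSize) {
                        queue.put(batch);   // blocks if the workers fall behind
                        batch = new ArrayList<>(batchSize);
                    }
                }
                if (!batch.isEmpty()) {
                    queue.put(batch);
                }
            }
            for (int w = 0; w < workers; w++) {
                queue.put(POISON);          // one stop signal per worker
            }
            for (Future<?> f : futures) {
                f.get();                    // wait for all workers to drain the queue
            }
        } finally {
            pool.shutdownNow();             // also unblocks workers if reading failed
        }
        return matches.get();
    }
}
```

Because the queue is bounded, the reader can stay a few batches ahead of the workers without loading the whole file into memory.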

I don't expect Kolja to ever beat a custom-designed, low-level approach, since Kolja does a lot of extra work because it's a generalised approach.

However, the key advantage of this approach is that once you have written the config file you can do any of the following:

- View the file interactively in a much easier format
- Tail the file
- Run your own specific report
- Run an existing report, e.g. the frequency report on the url field
- Run a report with a multithreaded version or across machines via GridGain
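As a rough illustration of what a frequency report on the url field boils down to, here is a minimal multithreaded sketch in plain Java. It does not use Kolja's report classes: `UrlFrequencyReport`, `urlOf` and `count` are hypothetical names, and the URL is assumed to be the second token of the quoted request field.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

/** Hypothetical sketch of a URL frequency report; not Kolja's report API. */
final class UrlFrequencyReport {

    /** Returns the URL from a request field such as "GET /index.html HTTP/1.1", or null. */
    static String urlOf(String line) {
        int quote = line.indexOf('"');
        if (quote < 0) {
            return null;
        }
        int start = line.indexOf(' ', quote);      // space after the HTTP method
        if (start < 0) {
            return null;
        }
        int end = line.indexOf(' ', start + 1);    // space before the protocol
        if (end < 0) {
            return null;
        }
        return line.substring(start + 1, end);
    }

    /** Counts hits per URL, spreading the work over the common fork-join pool. */
    static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                .map(UrlFrequencyReport::urlOf)
                .filter(Objects::nonNull)          // drop lines without a request field
                .collect(Collectors.groupingByConcurrent(url -> url, Collectors.counting()));
    }
}
```

Calling `UrlFrequencyReport.count(lines)` returns a map from URL to hit count, which can then be sorted to print the most requested pages.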

Kolja Log Tools

My occasional side-project, Kolja, is getting quite feature complete.

It's a set of tools for viewing log files, designed to be more developer-oriented, improved versions of less, cat, tail, awk and sed.

The general approach is to define a config file for your log file format, specifying:

- Line Parser
- Output Format (pretty printing log lines)
- Important Events, e.g. exceptions or 500 errors
- Request Grouping, i.e. a request id in each log line
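To make those four pieces concrete, here is a small Java sketch of what each one does for a made-up log format, `<time> <level> [<requestId>] <message>`. None of this is Kolja's real API or config syntax; the class `LogFormatSketch`, its methods and the field names are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the four pieces a log format definition supplies. */
final class LogFormatSketch {

    /** Line parser: turns "<time> <level> [<requestId>] <message>" into named fields. */
    static Map<String, String> parse(String line) {
        String[] parts = line.split(" ", 4);
        Map<String, String> fields = new LinkedHashMap<>();
        if (parts.length < 4) {
            // Too short to be a full log line, e.g. part of a stack trace.
            fields.put("message", line);
            return fields;
        }
        fields.put("time", parts[0]);
        fields.put("level", parts[1]);
        fields.put("requestId", parts[2].replace("[", "").replace("]", ""));
        fields.put("message", parts[3]);
        return fields;
    }

    /** Output format: pretty-prints selected fields in fixed-width columns. */
    static String format(Map<String, String> fields) {
        return String.format("%-8s %-5s %s",
                fields.getOrDefault("time", ""),
                fields.getOrDefault("level", ""),
                fields.get("message"));
    }

    /** Important events: here, any ERROR line or any message naming an exception. */
    static boolean isImportant(Map<String, String> fields) {
        return "ERROR".equals(fields.get("level"))
                || fields.get("message").contains("Exception");
    }

    /** Request grouping: collects raw lines by the request id carried in each line. */
    static Map<String, List<String>> groupByRequest(List<String> lines) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : lines) {
            String id = parse(line).get("requestId");
            if (id != null) {
                groups.computeIfAbsent(id, k -> new ArrayList<>()).add(line);
            }
        }
        return groups;
    }
}
```

Once a format is described this way, viewing, tailing and reporting can all work on the named fields instead of on raw text.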

An example for HTTP access log files


But this can be customised, e.g. with your own custom LineParser.

Here are some demos

Interactive, i.e. less on steroids

- View file with pretty printing or plain text output
- Search for a regex with highlighting
- Find significant events, e.g. exceptions
- Jump to a specific request

Command Line tools

- Tail a file with pretty printing and pause
- Run scripted reports on your files
- Run existing (customised) reports, e.g. frequency reports

Here is the source repository, currently the only substitute for the missing documentation.