Sunday, June 15, 2008

WideFinder II

My entry for WideFinder II has performed a lot better than expected.

WideFinder Results

Ideally you would not have to write any customised code, so you would end up with just the following config file and Groovy report.

Config File

Groovy Report

For performance reasons, however, my entry uses an optimised (non-regex) line parser.

Customised Config

Java Report
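
To give a feel for what a non-regex line parser can look like, here is a minimal sketch. It is not the parser Kolja actually uses; the class name SimpleLineParser and the assumption that the input is an Apache-style access log line (space-separated fields, with "..." and [...] each treated as a single field) are mine. It scans the line once with indexOf and substring instead of running a regular expression, and it does not handle escaped quotes inside a quoted field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only, not Kolja's real parser.
public class SimpleLineParser {

    // Splits a log line into fields in a single left-to-right scan.
    // Fields are separated by spaces; "quoted" and [bracketed] runs
    // are returned as one field with the delimiters removed.
    public static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        int n = line.length();
        int i = 0;
        while (i < n) {
            char c = line.charAt(i);
            if (c == ' ') {
                i++; // skip separators
                continue;
            }
            if (c == '"' || c == '[') {
                char close = (c == '"') ? '"' : ']';
                int end = line.indexOf(close, i + 1);
                if (end < 0) {
                    end = n; // unterminated field: take the rest of the line
                }
                fields.add(line.substring(i + 1, end));
                i = end + 1;
            } else {
                int end = line.indexOf(' ', i);
                if (end < 0) {
                    end = n; // last field on the line
                }
                fields.add(line.substring(i, end));
                i = end;
            }
        }
        return fields;
    }
}
```

With a line such as `1.2.3.4 - - [01/Jan/2008:00:00:00 -0800] "GET /ongoing/ HTTP/1.1" 200 1234`, the request ends up as a single field, which a report can then split further if it only needs the url.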

With this combination, it ran the 42GB dataset in 13 minutes 26 seconds, which I am happy enough with to leave it at that. I think I could get it below 10 minutes, because currently the 32 threads seem to all be doing IO at the same time, or all be processing a chunk of the file at the same time, rather than overlapping the two.

I don't expect Kolja to ever beat a custom-designed, low-level approach, since Kolja does a lot of extra work because it's a generalised approach.

However, the key advantage of this approach is that once you have written the config file you can do any of the following:

- View the file interactively in a much easier format
- Tail the file
- Run your own specific report
- Run an existing report, e.g. the frequency report on the url field
- Run a report with a multithreaded version, or across machines via GridGain
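
As a rough idea of what a frequency report on the url field boils down to, here is a small standalone sketch. It does not use Kolja's report API or GridGain; the class UrlFrequencyReport and its methods are hypothetical, it assumes Apache-style log lines where the url is the second token of the quoted request, and it uses a plain Java parallel stream as a stand-in for the multithreaded version.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration only, not one of Kolja's reports.
public class UrlFrequencyReport {

    // Extracts the url from a line like: ... "GET /path HTTP/1.1" ...
    // Returns null if the line does not have the expected shape.
    public static String urlOf(String line) {
        int quote = line.indexOf('"');
        if (quote < 0) return null;
        int start = line.indexOf(' ', quote);   // space after the method
        if (start < 0) return null;
        int end = line.indexOf(' ', start + 1); // space after the url
        if (end < 0) return null;
        return line.substring(start + 1, end);
    }

    // Counts how often each url occurs, processing lines in parallel.
    public static Map<String, Long> count(List<String> lines) {
        return lines.parallelStream()
                .map(UrlFrequencyReport::urlOf)
                .filter(url -> url != null)
                .collect(Collectors.groupingBy(
                        url -> url, TreeMap::new, Collectors.counting()));
    }
}
```

The point of the generalised approach is that this kind of counting logic is written once as a report and then reused against any file that has a config describing its fields.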
