My entrant to WideFinder II has performed a lot better than expected.
Ideally you would not have to write any customised code, so you would end up with just the following: a config file and a Groovy report.
But for performance reasons, it uses an optimised (non-regex) line parser.
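To illustrate the trade-off between a regex parser and a hand-written one, here is a minimal sketch in Python. It is not Kolja's parser (Kolja is Java); the sample line, the field names and both functions are assumptions made up for the example.

```python
import re

# Hypothetical sample line in common access-log style (not from the WideFinder data).
LINE = '127.0.0.1 - - [15/Jun/2008:10:06:00 +0000] "GET /some/page.html HTTP/1.1" 200 5120'

# Regex version: flexible, but every line goes through the general-purpose matcher.
PATTERN = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')


def parse_regex(line):
    m = PATTERN.match(line)
    if m is None:
        return None
    host, when, method, url, status, size = m.groups()
    return {"host": host, "time": when, "method": method,
            "url": url, "status": int(status), "bytes": size}


def parse_split(line):
    """Hand-rolled version: walk the known delimiters with index() and slicing."""
    try:
        host_end = line.index(" ")
        t0 = line.index("[", host_end) + 1
        t1 = line.index("]", t0)
        q0 = line.index('"', t1) + 1
        q1 = line.index('"', q0)
        method, url, _proto = line[q0:q1].split(" ", 2)
        status, size = line[q1 + 2:].split(" ")[:2]
        status = int(status)
    except ValueError:
        # Any missing delimiter or field means the line is not in the expected format.
        return None
    return {"host": line[:host_end], "time": line[t0:t1], "method": method,
            "url": url, "status": status, "bytes": size}
```

The hand-written version is less forgiving about format variations, which is the price paid for skipping the regex engine.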
With this combination, it ran the 42GB dataset in 13 minutes 26 seconds, which I am happy enough with to leave it at that. I think I could get it below 10 minutes, because currently the 32 threads seem to either all be doing IO or all be processing a chunk of the file at the same time, rather than overlapping the two.
I don't expect Kolja to ever beat a custom-designed low-level approach, since Kolja does a lot of extra work because it's a generalised approach.
However, the key advantage of this approach is that once you have written the config file you can do any of the following:
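One way to get the IO and the processing to overlap is to give the reading to a single thread that feeds a bounded queue, while worker threads drain it. The sketch below shows that shape in Python; it is not what the WideFinder entry does, and the function names, chunk size and "count the first field" workload are assumptions for illustration. In CPython the GIL limits CPU parallelism, so this shows the structure rather than the speed-up.

```python
import io
import queue
import threading
from collections import Counter


def read_chunks(stream, chunk_size, out_q, n_workers):
    """Reader: the only thread doing IO. Cuts chunks on line boundaries."""
    leftover = b""
    while True:
        block = stream.read(chunk_size)
        if not block:
            break
        block = leftover + block
        cut = block.rfind(b"\n") + 1   # hand over whole lines only
        leftover = block[cut:]
        if cut:
            out_q.put(block[:cut])     # blocks when the queue is full
    if leftover:
        out_q.put(leftover)            # final line without a trailing newline
    for _ in range(n_workers):
        out_q.put(None)                # one stop marker per worker


def process_chunks(in_q, totals, lock):
    """Worker: counts the first space-delimited field of every line."""
    local = Counter()
    while True:
        chunk = in_q.get()
        if chunk is None:
            break
        for line in chunk.splitlines():
            if line:
                local[line.split(b" ", 1)[0]] += 1
    with lock:
        totals.update(local)           # merge once per worker, not once per line


def count_first_field(stream, n_workers=4, chunk_size=64 * 1024):
    q = queue.Queue(maxsize=2 * n_workers)
    totals, lock = Counter(), threading.Lock()
    workers = [threading.Thread(target=process_chunks, args=(q, totals, lock))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    read_chunks(stream, chunk_size, q, n_workers)
    for w in workers:
        w.join()
    return totals
```

The bounded queue keeps the reader at most a few chunks ahead, so memory stays flat while the disk and the CPUs are both kept busy.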
- View the file interactively in a much easier format
- Tail the file
- Run your own specific report
- Run an existing report e.g. the frequency report on the url field.
- Run a report with a multithreaded version or across machines via GridGain.
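As a rough idea of what a frequency report on one field amounts to, here is a small Python sketch. The record layout and the `frequency_report` function are hypothetical; Kolja's actual reports are written in Groovy and are not shown here.

```python
from collections import Counter


def frequency_report(records, field, top=10):
    """Return the `top` most common values of `field` as (value, count) pairs."""
    counts = Counter(r[field] for r in records if field in r)
    return counts.most_common(top)


# Made-up parsed log records, as a line parser might produce them.
records = [
    {"url": "/a", "status": 200},
    {"url": "/b", "status": 200},
    {"url": "/a", "status": 404},
    {"url": "/a", "status": 200},
    {"url": "/b", "status": 500},
    {"url": "/c", "status": 200},
]

for value, count in frequency_report(records, "url", top=2):
    print(count, value)
```

Because the report only depends on the parsed field names, the same function works for any log format that has a parser.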
Sunday, June 15, 2008
Kolja Log Tools
My occasional side-project, Kolja, is getting quite feature complete.
It's a set of tools for viewing log files, designed to be improved, more developer-oriented versions of less, cat, tail, awk and sed.
The general approach is to define a config file to support your log file format.
- Line Parser
- Output Format (pretty printing log lines)
- Important Events e.g. Exceptions or 500 errors
- Request Grouping i.e. request id in each log line
An example for HTTP access log files
But this can be customised, e.g. with your own custom LineParser.
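To make two of the config concepts above more concrete (request grouping and important events), here is a minimal Python sketch that works on already-parsed lines. The field names (`request_id`, `status`, `message`) and both functions are assumptions for illustration, not Kolja's config format or API.

```python
from collections import defaultdict


def group_by_request(records, id_field="request_id"):
    """Collect log records that share the same request id, keeping their order."""
    groups = defaultdict(list)
    for record in records:
        groups[record.get(id_field)].append(record)
    return dict(groups)


def important_events(records):
    """Pick out records that look like 500 errors or exceptions."""
    return [r for r in records
            if r.get("status", 0) >= 500 or "Exception" in r.get("message", "")]


# Made-up parsed records.
records = [
    {"request_id": "r1", "status": 200, "message": "start"},
    {"request_id": "r2", "status": 200, "message": "start"},
    {"request_id": "r1", "status": 500, "message": "NullPointerException in handler"},
    {"request_id": "r2", "status": 200, "message": "done"},
]

for event in important_events(records):
    # Jumping to a request then means showing every line in the event's group.
    print(event["request_id"], len(group_by_request(records)[event["request_id"]]))
```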
Here are some demos
Interactive i.e. Less on steroids
- View file with pretty printing or plain text output
- Search for a regex with highlighting
- Find significant events e.g. exception
- Jump to a specific request
Command Line tools
- Tail a file with pretty printing and pause
- Run scripted reports on your files
- Run existing (customised) reports e.g. frequency reports
Here is the source control, currently the only substitute for the lacking documentation.
Wednesday, April 18, 2007
I just read James Strachan's Maven Car Analogy in Go Sonatype. My experience echoes his comments.
I'm a big Maven fan, and think it's enabled us to achieve so much more in a repeatable way, especially the two-step release process.
But I still can't say we have had a net saving of effort. We have had various issues with build plugins, lived with surefire issues for 6 months until the recent release, and patched around the hopeless maven-xdoclet-plugin. We lived (for a while) with the bugs, quirks and constant refresh/rebuild cycle of the Maven Eclipse integration, M2Eclipse. Eventually we dumped it for "mvn eclipse:eclipse".
We even built an inhouse plugin for Agitar, which was complete, easy to use and required little configuration, especially compared with the vanilla Ant tasks they provide which were only usable with trivial projects. But then we stopped using Agitar.
However, the benefits are less about time and more about quality and process. I think Maven 2 has enabled us to refactor at the project and module level much more easily than with Ant. It's allowed us to split development into various shared framework components that have their own release cycle and can be used by others.
And it has so much potential to improve. I'm hopeful that the release and scm plugins will support branching on release etc. In general the various plugins are constantly becoming more functional.
Back to the analogy, it really feels like Ant doesn't have much more to give, apart from being stable and predictable. The Ant car never breaks down, gets great mileage and everyone knows how to drive it.
The Maven 2 Car makes life dangerous and exciting, but also gives hope that once they finally get the maven-antigravity-plugin working, the commute to work will be so much better.
at 10:06 pm
Thursday, December 21, 2006
Warne confirms retirement
Am I the only one thinking we should cryogenically freeze Shane Warne right now?
We can save him for a time when human cloning is accepted, and therefore ensure eons of Australian cricketing success.
But the added benefit is that he still has a couple of series in him. If things turn dire after the upcoming retirement of half the Australian team, we can thaw him out for a few morale boosting series victories.
Sunday, September 24, 2006
Toomas Hendrik Ilves has been elected the next President of Estonia. This is great for Estonia, as Arnold Rüütel was a bumbling fool.
Political games were used in a failed attempt by two parties to keep him in place, even though public opinion and, more importantly, the majority of the parliament were against him.
Rüütel's term will end in October 2006. On 7 June 2006, he ended speculation about his possible candidature, saying that he would be a candidate for re-election. Rüütel's candidature has raised some concerns, as he turned 78 in May 2006 and has made some 'slips' (attributed to his age), such as congratulating people on 'Victory Day' when the event was Independence Day (February 24, 2005); Estonians have their Victory Day in June. His declarations and speeches are also regarded as hard to comprehend by the public, and Rüütel's appearances are often later 'deciphered' by his adviser Eero Raun.
In late August, the parliament failed to elect a president. The election of Toomas Hendrik Ilves by the parliament was blocked by Rüütel's supporters, who boycotted the vote (Rüütel said that he would only stand for election if the vote was decided by the electoral college, which occurs only if the parliament fails to elect a president) and thus prevented him from obtaining the necessary two-thirds of votes in parliament. Rüütel was supported by the People's Union and the Centre Party. Throughout the presidential election campaign, Rüütel has been criticised for not having participated in the Riigikogu round and not taking part in debates. The electoral college met to choose a president on September 23.
Sunday, September 17, 2006
Some of the interesting talks.
HAML - Interesting DSL for XHTML instead of RHTML. Best features would be clean output and almost no redundant text in the template. It would definitely highlight any useful markup, and they have good feedback from designers. The presenter swearing like a trooper and starting a second beer while presenting was impressive.
JRuby - Obviously very important to those stuck in Java land. But seriously, IMHO it's going to eventually be the premier runtime for Ruby, and I hope that Sun really gets behind it. Spring Remoted services being used in your Rails controllers looks so much more attractive than a big JSF setup.
Kathy Sierra Keynote - Really interesting talk about creating passionate users.
DHH's talk on vendoritis, and the general theme of "I don't owe you shit!", and "F$%^ You!". Got a lot of claps.
There were some dodgy talks, but overall the standard was great.
Two weeks ago was Google's London Test Automation Conference.
I have to say it was the highest standard for any conference I have been to. There was only one room, i.e. no selecting from multiple streams. But Allen Hutchison and his team selected some interesting talks. That is probably one of the benefits of having the organisers select things they are really interested in, rather than the competing goals of paid vendor talks and unrelated talks with little common theme that you get at most conferences.
The questions from the audience were very lively and interesting. One of the consistent things that popped up was the varying ways of achieving the same goals. The presenters of many of the talks had interesting solutions to the problems they faced, while audience members had different solutions to similar problems. One of the best examples was a talk on literate functional testing of Web Applications. The demonstrated solution was a jMock-based fluent API Literate, where the domain language was clicks, HTML option lists etc., while others used business domain languages and FIT-style testing for similar purposes.
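For anyone who has not seen the style, a fluent test API whose vocabulary is clicks and option lists reads roughly like the Python sketch below. This is not the API shown in the talk; the `FakePage` and `User` classes and every method name are invented for illustration.

```python
class FakePage:
    """Stand-in for a web page: text fields, one option list and a click log."""

    def __init__(self):
        self.fields = {}
        self.options = {"country": ["Australia", "Estonia"]}
        self.clicked = []


class User:
    """Each method returns self, so a test reads as one sentence."""

    def __init__(self, page):
        self.page = page
        self._text = None
        self._option = None

    def types(self, text):
        self._text = text
        return self

    def into(self, field):
        self.page.fields[field] = self._text
        return self

    def selects(self, option):
        self._option = option
        return self

    def from_list(self, name):
        if self._option not in self.page.options[name]:
            raise ValueError(f"{self._option!r} is not in option list {name!r}")
        self.page.fields[name] = self._option
        return self

    def clicks(self, button):
        self.page.clicked.append(button)
        return self


page = FakePage()
User(page).types("shane").into("username").selects("Estonia").from_list("country").clicks("Submit")
```

A business-domain or FIT-style alternative would hide these UI steps behind something like "register a user from Estonia", which is the contrast the audience discussion was about.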