Thursday, 30 December 2010

Postprocessing scanned crayon images

I have some scans of crayon drawings, but the colours are washed out (well, the colours are pretty light on paper too):

First, some due diligence: Google for solutions. (Actually, first fire up GIMP and try a few ideas. But let's pretend I hit the books before hitting the lab.)

Lit review:

  1. google'd: post processing scanned crayon drawings - no help, but kids drawing reenacted was entertaining.

  2. google'd: scanning crayon drawings - better, results in a Yahoo Answers entry... which suggests GIMP. Also a couple of references warning about crayon wax sticking to scanners (I didn't have this problem).

  3. searched Flickr for examples in the hope of finding a discussion in the comments. Photos tagged with crayon art was the best of the Flickr searches. Several people, including Steve Brandon, used a camera rather than a scanner.

Ok, that's enough searching. Let's play with some balances. Here are screenshots of the settings (from GIMP's colour menu) and the changed versions of the image.


take 1

take 2

I bumped up blue similarly.

take 3

Applied these settings after inverting the image.

take 4

take 5

I expanded this settings box out to make fine control easier.

In the end I went with the first take, as the least abstract of the bunch.
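For anyone who would rather script this than click through dialogs, here is a rough base-R sketch of two simple adjustments: a per-channel levels stretch and an inversion. It is not what the GIMP settings above do (those screenshots hold the real values); the stretch is a swapped-in, simpler technique, and the 2x2 image and the `stretch` helper are made up for illustration.

```r
# A made-up 2x2 RGB image with values in [0, 1]; washed-out colours sit near 1.
img <- array(c(0.80, 0.90, 0.85, 0.95,   # red channel
               0.82, 0.88, 0.90, 0.96,   # green channel
               0.70, 0.92, 0.75, 0.97),  # blue channel
             dim = c(2, 2, 3))

# Hypothetical helper: stretch one channel so that its darkest value maps
# to 0 and its lightest value maps to 1.
stretch <- function(ch) {
  rng <- range(ch)
  if (rng[1] == rng[2]) return(ch)  # flat channel: nothing to stretch
  (ch - rng[1]) / (rng[2] - rng[1])
}

# Apply the stretch to each of the three channels independently.
stretched <- img
for (k in 1:3) stretched[, , k] <- stretch(img[, , k])

# Inverting an image is 1 - value on every channel.
inverted <- 1 - img
```

Stretching each channel separately also shifts the colour balance, so on a real scan the result may look as abstract as some of the takes above.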

Tuesday, 28 December 2010

data: list of top sites from alexa

Alexa has a free list of the top 1m websites:



A few curiosities:

  • while most entries are just domains, 10007 have path information:


  • Two of the entries with path info contain commas:


    which causes weirdness when using R's read.csv() command.

    Script I used to find where the ranks diverged from the indexes (before I found the CSV had unescaped commas):

    # the data from the CSV file is in scores
    onem = seq(1, 1000002)
    # first few row indexes where the rank no longer matches the index
    head(onem[scores$rank != onem])

Oh, and here are the two extra rows that read.csv silently created:

> scores[scores$domain == "",]
       rank domain
490728    1
936300    2

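One way to sidestep the unescaped commas, sketched here rather than tested against the real file: read the CSV as plain lines and split each line at the first comma only, so any later commas stay inside the entry. The three sample lines are invented for illustration, and the file name in the comment is an assumption.

```r
# Made-up sample standing in for the file's lines. For the real data this
# would be something like readLines("top-1m.csv") (file name is an assumption).
lines <- c("1,example.com",
           "2,example.org/path,with,commas",
           "3,example.net")

# Split at the FIRST comma only: everything before it is the rank,
# everything after it (further commas included) is the entry.
rank   <- as.integer(sub(",.*$", "", lines))
domain <- sub("^[^,]*,", "", lines)

scores <- data.frame(rank = rank, domain = domain, stringsAsFactors = FALSE)

# Ranks should now line up with the row indexes, with no spurious rows.
mismatches <- which(scores$rank != seq_len(nrow(scores)))

# Count the entries that carry path information (anything containing a slash).
with_path <- sum(grepl("/", scores$domain, fixed = TRUE))
```

With this approach `mismatches` should come back empty, and `with_path` gives the count of entries that are more than a bare domain.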
Wednesday, 1 December 2010

The Hive Brisbane, 2010-11-30 - interesting people with exciting projects

I went to a Hive event yesterday, featuring Richard Slatter from We Are Hunted.

I enjoyed the event. The talk was good, but the best bit of the evening was talking to some interesting people about their exciting projects:

  • The speaker, Richard Slatter, about We Are Hunted (music charts based on online chatter) and the advantages of RERO (release early, release often).

  • Alice and Leo from Davinway Marketing, who are applying agile methodologies to marketing (which is an idea that appeals to me).

  • Mike Boyd, part of The Hive's Brisbane team, who's working on Cupstart, a project that will let you "Order your coffee online using Cupstart and collect it as you arrive". Oh, he also has a survey.