District 9

Posted by Kelly McCauley on Dec 26, 2009

My wife got me District 9 as a gift. Watched it last night. The best movie that I have seen this year (not that I have seen that many). Very engaging plot, believable acting, and good special effects.

Rant about Ant

Posted by Kelly McCauley on Dec 18, 2009

Ant, as in the Java-based build tool, is finally getting on my nerves. I'm not upset with how Ant functions or performs. It does its intended job well. My problem is with the xml configuration syntax. After a while, my eyes start to bleed while trying to read xml (or any xml or sgml markup), and I'm getting tired of writing it. Java apps and libraries are the worst at making me write xml configuration files by hand. The trend now seems to be moving away from using xml for configuration, but there is still a long way to go.

Ant's xml configuration syntax is not intended for scripting build behavior. Its intent is to define an execution tree and to define tasks to perform at each execution point. Some scripting-like behavior, such as conditionals, selective execution, and macro definitions, has been bolted on. It feels awkward and is sometimes extremely verbose.

Ant takes the stance that if you have complex tasks to perform, you should create a task, which is a Java class that extends the Task class. What that means to me is that I need a build script to build the complex build logic stuff that I need to run in another build script. It's the requirement of having to compile the Java code before it can be used that I find cumbersome about this process. For a very long time, this was the only way (at least when talking to a Java developer).

From my experience, scriptable build tools provide the flexibility that I need for complex build systems. (See Rake, Module::Build, and GNU Make. Remember that each has its own quirks, just like ant; use the right tool for the right job, so no bitching please.)

Groovy has helped ease some of the pain. Its ant task lets you embed groovy script directly into the build xml configuration file. The groovy task gives you access to the ant properties and ant tasks. This is great for short snippets of logic, like programmatically setting properties. The groovy task can also execute external groovy scripts, which is nice for reusable chunks of code.

Gradle takes it a step further. Gradle is a Groovy-based build tool that, in addition to its own internal tasks, also uses Ant's API, which gives you access to ant tasks and ant properties. As of this writing, version 0.80 is mostly stable but still has some very rough edges, and it suffers from the JVM start-up overhead as well as the Groovy overhead at each and every invocation. A shell backed by a long-running daemon process could help with that. I'm hopeful that development will continue to improve the tool.

I guess that's enough venting for now. I'll continue to use ant since it's the de facto standard for Java build tools. It does what it is supposed to do and I'll just have to live with the xml configuration for a while longer.

P.S. I have used maven, but it lacks the flexibility that I need for some projects. It's just not my cup of tea.

Only for the geeks: Scala is/ain't Java

Posted by Kelly McCauley on Nov 23, 2009

Finally a bit of code that makes me happy:

def makeClosure(bindMe: Int) = (x: Int) => x + bindMe

A closure that binds at the time the closure is created.

Dead blog

Posted by Kelly McCauley on Nov 19, 2009

Dead blog? Quiet for two years would be a YES. Between work and a 3 year old son, my free time is pretty slim. I haven’t programmed in Ruby in two years and I have forgotten a lot. I’ve been coding Java at work and I’m spending a little free time learning Scala. From my first attempts at Scala, I think I like the language pretty well. And I believe it is a better language than Groovy.

Getting a grip on archiving mail threads

Posted by Kelly McCauley on Dec 21, 2007

I’m subscribed to both the Ruby Talk and Ruby on Rails mailing lists. Both are high volume. I typically don’t have enough time to read all that is going on, but I do like to have the emails around so I can search for a specific topic.

I like to keep my high volume mailing lists’ threads archived by month. This means that the topic thread head’s Date header determines where the entire thread is archived, even if a thread child’s Date header falls in a different month. For low volume lists, this can be done by hand using any mail client. For high volume lists, doing it by hand is tedious and prone to mistakes. Computers are for this type of task. It is time to work hard at being lazy…
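To make the month rule concrete, here is a minimal sketch. The helper name archive_dir_for and the sample dates are made up for illustration; only the directory naming pattern (list name plus YYYYMM) comes from this post.

```ruby
require 'date'

# Hypothetical helper: name the monthly archive Maildir for a whole thread,
# based only on the Date of the thread's head.
def archive_dir_for(maildir, head_date)
  "#{maildir}.#{head_date.strftime('%Y%m')}"
end

head_date  = Date.new(2006, 3, 30)  # the thread starts at the end of March
reply_date = Date.new(2006, 4, 2)   # a reply arrives in April

# The reply is filed under the head's month, not its own.
thread_archive = archive_dir_for('.ruby.talk', head_date)
puts "reply dated #{reply_date} is filed in #{thread_archive}"
# prints "reply dated 2006-04-02 is filed in .ruby.talk.200603"
```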

Here’s what I did to tackle this problem. My time was limited: I only had a couple of hours to create something to do the above for my two high volume lists. I had two Maildirs containing the Ruby Talk (~/.maildir/.ruby.talk) and Ruby on Rails (~/.maildir/.ruby.rails) mailing lists. Each contained more than 50,000 emails stored in individual files in the list’s /cur directory.

So my ~/.maildir is organized like the following:

.ruby.rails/
.ruby.rails.200603/
.ruby.rails.200604/
.ruby.rails.200605/
.ruby.rails.200606/
.ruby.rails.200607/
.ruby.rails.200608/
.ruby.rails.200609/
.ruby.rails.200610/
.ruby.rails.200611/
.ruby.talk/
.ruby.talk.200603/
.ruby.talk.200604/
.ruby.talk.200605/
.ruby.talk.200606/
.ruby.talk.200607/
.ruby.talk.200608/
.ruby.talk.200609/
.ruby.talk.200610/
.ruby.talk.200611/

The requirements were:

  • Archive threads into the appropriate archive directories (should correctly archive 99.9% of the time).
  • Keep track of thread heads and their associated archive location so subsequent runs catch thread children dated after the previous run.
  • Shouldn’t consume excessive amounts of memory.
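The second requirement boils down to a lookup table from a thread head's Message-ID to its archive directory. Here is a rough sketch of that idea, using plain strings rather than TMail objects; the Message-IDs and the method name archive_for_reply are invented for the example.

```ruby
# Hypothetical in-memory index: thread head Message-ID => archive directory.
thread_heads = {
  '<head-1@example.org>' => '.ruby.talk.200603'
}

# Given the Message-IDs from a reply's References header (oldest first),
# return the archive directory of the first one that is a known thread head.
# Returns nil when none match, i.e. the reply is an orphan.
def archive_for_reply(thread_heads, reference_ids)
  head_id = reference_ids.find { |id| thread_heads.key?(id) }
  head_id && thread_heads[head_id]
end

known  = archive_for_reply(thread_heads, ['<head-1@example.org>', '<reply-7@example.org>'])
orphan = archive_for_reply(thread_heads, ['<unknown@example.org>'])

p known   #=> ".ruby.talk.200603"
p orphan  #=> nil
```

Only the index has to stay in memory between emails, not the emails themselves, which is what keeps the memory requirement manageable.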

Since I intended to be the sole user of this program and the scope of functionality was so narrow, I decided to write a self-contained script to flesh out the logic and behavior. This meant that testing by hand was OK for me (if this were developed for someone else, I would not choose this path). In future development iterations, I will break out the functionality into classes and modules along with real test specs.

The next decision I had to make was how to process email headers. Since TMail is being maintained again, I decided to use it instead of parsing the email headers myself.

The following is the heavily commented script that I created. The most current source can be found at http://svn.drotner.org/repos/unix_tools/trunk/bin/mail_sort.rb

#!/usr/bin/env ruby
# Author: Kelly McCauley
# Copyright 2007 Kelly McCauley
# Source: http://svn.drotner.org/repos/unix_tools/trunk/bin/mail_sort.rb
# License: version 0.1 is Public Domain

require 'rubygems'
require 'optparse'        # Parses commandline options
require 'tmail'           # Handles the email parsing
require 'date'            # Ruby's date library
require 'fileutils'       # File and directory manipulation library

$VERBOSE = true

@version = '0.1'
@debug = 0
@quiet = false

@days_ago = 30              # Default "Sort and archive mail up to @days_ago".
@src_mail_dir = nil         # Maildir to sort and archive.
@thread_heads = {}          # Maps a thread head's Message-ID to
                            # its associated archive directory.
@thread_head_index = nil    # Location of a saved version of @thread_heads
                            # from a previous run.
@total_orphans = 0          # Count of thread children that have no parents.
@total_emails = 0           # Total emails read.
@total_emails_archived = 0  # Total emails that were moved to an archive
                            # directory.


#
# Methods
#

# Prints out the given msgs and opts to STDERR and then exits
def error_exit(opts, *msgs)
  msgs.each {|m| $stderr.print m}
  $stderr.puts opts
  exit(1)
end


# Loads a saved @thread_heads from a previous run into memory.
def load_thread_head_index(index_file)
  if File.file?(index_file)
    File.open(index_file) do |file|
      file.each_line do |line|
        key, year, sum, mon = line.chomp.split(/\t/)
        @thread_heads[key.to_sym] = [year.to_sym, sum.to_i, mon.to_sym]
      end
    end
  end
end

# Dumps @thread_heads that are less than 365 days ago to a file.
#
# I didn't serialize it to YAML because I didn't want the extra processing
# overhead or memory consumption.  I didn't Marshal it since I didn't want
# the saved file to be tied to a particular version of Marshal.
def dump_thread_head_index(index_file)
  File.open(index_file, 'w') do |file|
    @thread_heads.each do |key,value|
      next if value[1] < @th_index_cutoff_sum
      file << "#{key.to_s}\t#{value.map{|x| x.to_s}.join("\t")}\n"
      @th_index_dump_count += 1
    end
  end
end


# Adds the given email to the @thread_heads lookup table.
def add_thread_head(email)
  unless @thread_heads.key?(email['message-id'].id.to_sym)
    $stderr.puts "th subject: '#{email['subject'].to_s}'" if @debug > 2
    @thread_heads[email['message-id'].id.to_sym] = [
      email['date'].date.year.to_s.to_sym,
      (
        email['date'].date.year.to_s +
        sprintf('%02d', email['date'].date.mon) +
        sprintf('%02d', email['date'].date.day)
      ).to_i,
      sprintf('%02d', email['date'].date.mon).to_sym,
    ]
  end
end

# Creates the archive maildir
def create_archive_maildir(root_archive_dir)
  sub_dirs = []
  sub_dirs << File.join(root_archive_dir, 'cur')
  sub_dirs << File.join(root_archive_dir, 'new')
  sub_dirs << File.join(root_archive_dir, 'tmp')
  options = {}
  options[:noop] = true if @debug > 2
  options[:verbose] = true if @debug > 1

  sub_dirs.each do |dir|
    unless File.directory?(dir)
      FileUtils.mkdir_p(dir, options)
    end
  end
  return sub_dirs
end


# Archives the given file to the given archive directory
def archive_email(root_archive_dir, filename)
  archive_dir = create_archive_maildir(root_archive_dir).shift
  options = {}
  options[:noop] = true if @debug > 2
  options[:verbose] = true if @debug > 1

  if @debug > 0
    FileUtils.cp(filename, archive_dir, options)
  else
    FileUtils.mv(filename, archive_dir)
  end

  @total_emails_archived += 1

end


# Archives the thread child email into the appropriate maildir
def archive_thread_child(thread_head, src_mail_dir, filename)
  $stderr.puts "tc #{filename}:  #{@thread_heads[thread_head][1]} <= #{@cutoff_sum}" if @debug > 2
  if (@thread_heads[thread_head][1] <= @cutoff_sum)
    $stderr.puts "tc filename: #{filename}" if @debug > 2
    root_archive_dir = "#{File.expand_path(src_mail_dir)}.#{@thread_heads[thread_head].first.to_s}#{@thread_heads[thread_head].last.to_s}"
    archive_email(root_archive_dir, filename)
  end
end


# Archives the thread head email into the appropriate maildir
def archive_thread_head(email, src_mail_dir, filename)
  # Determine this email's date sum.
  email_sum = (
    email['date'].date.year.to_s +
    sprintf('%02d', email['date'].date.mon) +
    sprintf('%02d', email['date'].date.day)
  ).to_i
  $stderr.puts "th #{filename}:  #{email_sum} <= #{@cutoff_sum}" if @debug > 2

  # Is the email before the cutoff date?
  if email_sum <= @cutoff_sum
    # Yes.
    $stderr.puts "th filename: #{filename}" if @debug > 2
    root_archive_dir = "#{File.expand_path(src_mail_dir)}.#{email['date'].date.year}#{sprintf('%02d', email['date'].date.mon)}"

    # Archive it.
    archive_email(root_archive_dir, filename)
  end
end


#
# Handle the commandline arguments
#

opts = OptionParser.new do |opts|
  opts.banner = "Usage: #{$0} [OPTIONS] MAILDIR"
  opts.separator("")
  opts.separator("OPTIONS")

  opts.on(
    '-D','--days-ago NUMBER',
    'Sort and archive mail up to --days-ago'
  ) do |days|
    @days_ago = days.to_i   # OptionParser yields a String; Date arithmetic needs a number
  end

  opts.on(
    '-i','--thread-head-index FILE',
    'Specify the thread head index file'
  ) do |file|
    @thread_head_index = file
  end

  opts.on_tail(
    '-q','--quiet',
    'Turns off all output including error output'
  ) do |q|
    @quiet = true
  end

  opts.on_tail(
    '-d','--debug',
    'Turns on debugging output'
  ) do |debug|
    @debug += 1
  end

  # help
  opts.on_tail(
    '-h', '--help', 'Shows this message'
  ) do ||
    error_exit(opts)
  end

  # version
  opts.on_tail(
    '-V', '--version',
    'Shows the version and copyright of mail_sort'
  ) do ||
    error_exit(opts, "#{$0} version #{@version}\n")
  end

end

opts.parse!(ARGV)

# Make sure that the source Maildir is given and that the directory exists.
@src_mail_dir = ARGV.shift
error_exit(
  opts,
  "ERROR: failed to specify a MAILDIR\n"
) unless @src_mail_dir

error_exit(
  opts,
  "ERROR: MAILDIR does not exist: #{@src_mail_dir}\n"
) unless File.directory?(@src_mail_dir)


#
# Determine the cut-off dates.  Used in simple numerical comparison of dates.
#

# The cut-off date for determining if thread heads are targeted for archival.
@cutoff = Date.today - @days_ago
@cutoff_sum = (
  @cutoff.year.to_s +
  sprintf('%02d', @cutoff.mon) +
  sprintf('%02d', @cutoff.day)
).to_i

# The cut-off date for storing thread heads in @thread_heads.
thi = Date.today - 365
@th_index_cutoff_sum = (
  thi.year.to_s +
  sprintf('%02d', thi.mon) +
  sprintf('%02d', thi.day)
).to_i
@th_index_dump_count = 0

# Compose the location of the thread head index file
if @thread_head_index.nil?
  @thread_head_index = "#{File.expand_path(@src_mail_dir)}.mail_sort.idx"
end


# Pre-run debugging
if @debug > 0
  $stderr.puts "@debug: '#{@debug}'"
  $stderr.puts "@src_mail_dir: '#{@src_mail_dir}'"
  $stderr.puts "@thread_head_index: '#{@thread_head_index}'"
  $stderr.puts "@days_ago: '#{@days_ago}'"
  $stderr.puts "@cutoff: '#{@cutoff.to_s}'"
  $stderr.puts "@cutoff_sum: '#{@cutoff_sum.to_s}'"
  $stderr.puts "@th_index_cutoff_sum: '#{@th_index_cutoff_sum}'"
end


#
# Do the run.
#

# Load the thread head index if it exists.
load_thread_head_index(@thread_head_index)

# The location of the Maildir's cur directory.
src_mail_dir_cur = File.join(File.expand_path(@src_mail_dir),'cur')

# Iterate through each file in the Maildir's cur directory.
Dir.foreach(src_mail_dir_cur) do |filename|
  # Skip . and ..
  next if filename == '.'
  next if filename == '..'

  filename = File.join(src_mail_dir_cur, filename)

  # Skip any directories.
  next unless File.file?(filename)

  $stderr.puts "filename: #{filename}" if @debug > 2

  # Parse the file into an email.
  email = TMail::Mail.parse(IO.read(filename))

  if email['references'].nil? && email['in-reply-to'].nil?
    # This email is a thread head

    if email['message-id'].id.nil?
      # This email is a malformed email.
      $stderr.puts "No message-id for #{filename}" unless @quiet

    else
      # Add this email as a thread head.
      add_thread_head(email)

      # Archive this email.
      archive_thread_head(email, @src_mail_dir, filename)

    end

  else
    # This email is a thread child
    thread_head = nil

    # Determine the thread's head (Simple case first since it is the most
    # common)
    if !email['references'].nil? && !email['references'].ids.empty?
      # This email has a References header and it is not empty
      thread_head = email['references'].ids.first.to_sym

    elsif !email['in-reply-to'].nil? && !email['in-reply-to'].empty?
      # This email only has an In-Reply-To header which is not empty
      thread_head = email['in-reply-to'].to_s.to_sym

    end

    # Lookup the thread head in @thread_heads.
    if @thread_heads.key?(thread_head)
      # Found it, so archive this email in the thread head's archive directory.
      archive_thread_child(thread_head, @src_mail_dir, filename)

    else
      # Possibly an orphaned child.  See if any of its other references are
      # known thread heads.
      thread_head = nil

      if email['references'].nil? && !email['in-reply-to'].empty?
        # No References header so use the In-Reply-To header.
        ref = email['in-reply-to'].to_s.to_sym
        thread_head = ref if @thread_heads.key?(ref)

      elsif !email['references'].nil? && !email['references'].empty?
        # Use References header.  Iterate through each of the references and
        # use the first that matches as the thread's head.
        email['references'].ids.each do |ref|
          ref = ref.to_s.to_sym
          if @thread_heads.key?(ref)
            thread_head = ref
            break
          end
        end

      end

      # Do we now have the thread's head?
      if thread_head
        # Yes, so archive this email in the thread head's archive directory.
        archive_thread_child(thread_head, @src_mail_dir, filename)

      else
        # No.  We have an orphan.
        $stderr.puts "th orphan" if @debug > 2
        @total_orphans += 1

        # Archive it as a thread head.
        add_thread_head(email)
        archive_thread_head(email, @src_mail_dir, filename)

      end

    end

  end

  @total_emails += 1

end

# The run is done, so save @thread_heads.
dump_thread_head_index(@thread_head_index)

# Post-run debugging.
if @debug > 0
  $stderr.puts "@thread_heads.length: #{@thread_heads.length}"
  $stderr.puts "@total_orphans: #{@total_orphans}"
  $stderr.puts "@total_emails: #{@total_emails}"
  $stderr.puts "@total_emails_archived: #{@total_emails_archived}"
  $stderr.puts "@th_index_dump_count: #{@th_index_dump_count}"
end

Invoking it is as simple as ./mail_sort.rb -h.
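One detail worth pulling out of the script is the "date sum" used for cut-off comparisons: year, zero-padded month, and zero-padded day are concatenated into a single integer, so comparing dates becomes comparing integers. The script builds this with sprintf; the sketch below uses strftime instead, which should give the same number. The method name date_sum and the sample dates are assumptions for illustration.

```ruby
require 'date'

# Collapse a date into a YYYYMMDD integer so dates can be compared with
# plain integer comparison.  (strftime stands in for the script's
# sprintf-based concatenation.)
def date_sum(date)
  date.strftime('%Y%m%d').to_i
end

cutoff_sum = date_sum(Date.new(2007, 11, 21))
old_sum    = date_sum(Date.new(2007, 9, 5))
new_sum    = date_sum(Date.new(2007, 12, 1))

p old_sum                #=> 20070905
p old_sum <= cutoff_sum  #=> true  (on or before the cut-off)
p new_sum <= cutoff_sum  #=> false (after the cut-off)
```

The zero padding matters: without it, September 5 would become 200795 and sort incorrectly against 20071121.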

Ruby Method of the Day - Array.reject!

Posted by Kelly McCauley on Nov 21, 2007

Signature

array.reject! {|element| block}  #=> array or nil

array.reject! {|element| block} does the exact same thing as Array.delete_if except that it returns nil if no changes were made to array.

Examples

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

b = a.clone                 #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b.reject! {true}            #=> []
b                           #=> []

b = a.clone                 #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b.reject! {false}           #=> nil
b                           #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

b = a.clone                 #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b.reject! {|n| n == 3}      #=> [1, 2, 4, 5, 6, 7, 8, 9, 10]
b                           #=> [1, 2, 4, 5, 6, 7, 8, 9, 10]

b = a.clone                 #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
b.reject! {|n| n % 2 == 0}  #=> [1, 3, 5, 7, 9]
b                           #=> [1, 3, 5, 7, 9]
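The difference from delete_if only shows up when nothing is removed. Here is a small side-by-side sketch; the method name drop_evens! is made up for the example.

```ruby
numbers = [1, 2, 3, 4]

# Neither block matches anything, so no elements are removed.
unchanged_reject = numbers.dup.reject!   { |n| n > 10 }
unchanged_delete = numbers.dup.delete_if { |n| n > 10 }

p unchanged_reject  #=> nil
p unchanged_delete  #=> [1, 2, 3, 4]

# The nil return makes reject! usable as a "did anything change?" check.
def drop_evens!(list)
  list.reject! { |n| n % 2 == 0 } ? :changed : :unchanged
end

p drop_evens!([1, 2, 3])  #=> :changed
p drop_evens!([1, 3, 5])  #=> :unchanged
```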

Documentation Reference

Ruby version 1.8.6

www.ruby-doc.org : Array.reject!

Ruby Method of the Day - Holiday Break

Posted by Kelly McCauley on Nov 20, 2007

I’m taking a few weeks off from writing rmotds so I can catch up on some other little pet projects. I’ll start them back up on 2008/01/01.

Ruby Method of the Day - Array.reject

Posted by Kelly McCauley on Nov 20, 2007

Signature

array.reject {|element| block}    #=> new_array

array.reject {|element| block} iterates over array’s elements and returns new_array containing every element of array for which the block returns either nil or false. The original array is left unchanged.

Examples

a = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

a.reject {|n| nil}                        #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a.reject {|n| false}                      #=> [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
a.reject {|n| true}                       #=> []
a.reject {|n| ''}                         #=> []
a.reject {|n| 0}                          #=> []

a.reject {|n| n == 3}                     #=> [1, 2, 4, 5, 6, 7, 8, 9, 10]
a.reject {|n| n % 2 == 0 }                            #=> [1, 3, 5, 7, 9]
a.reject {|n| true if (n % 3 == 0) || (n % 5 == 0) }  #=> [1, 2, 4, 7, 8]
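One way to think about reject is as the mirror image of select: given the same block, the two split the array into complementary halves, and neither touches the original. A short sketch (the variable names are just for illustration):

```ruby
a = [1, 2, 3, 4, 5, 6]
is_even = lambda { |n| n % 2 == 0 }

evens = a.select(&is_even)  # keeps elements where the block is true
odds  = a.reject(&is_even)  # keeps elements where the block is false or nil

p evens  #=> [2, 4, 6]
p odds   #=> [1, 3, 5]
p a      #=> [1, 2, 3, 4, 5, 6]
```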

Documentation Reference

Ruby version 1.8.6

www.ruby-doc.org : Array.reject

Ruby Method of the Day - Array.last

Posted by Kelly McCauley on Nov 19, 2007

Signature

array.last            #=> object or nil
array.last(number)    #=> new_array

array.last returns the last element of array or it returns nil if array is empty. array.last(number) returns the last number elements of array or it returns an empty array if array is empty.

Examples

a = ["a", "b", "c", "d", "e", "f"]

a.last        #=> "f"
[].last       #=> nil

a.last(0)     #=> []
a.last(1)     #=> ["f"]
a.last(4)     #=> ["c", "d", "e", "f"]
a.last(99)    #=> ["a", "b", "c", "d", "e", "f"]
[].last(10)   #=> []
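It may help to contrast last with pop: both hand back the final element, but only pop removes it from the array. A quick sketch with made-up variable names:

```ruby
letters = ["a", "b", "c"]

peeked = letters.last     # reads the final element
after_last = letters.dup  # snapshot: last did not change letters

popped = letters.pop      # removes and returns the final element

p peeked      #=> "c"
p after_last  #=> ["a", "b", "c"]
p popped      #=> "c"
p letters     #=> ["a", "b"]
```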

Documentation Reference

Ruby version 1.8.6

www.ruby-doc.org : Array.last

Ruby Method of the Day - Array.first

Posted by Kelly McCauley on Nov 16, 2007

Signature

array.first           #=> object or nil
array.first(number)   #=> new_array

array.first returns the first element of array or it returns nil if array is empty. array.first(number) returns the first number elements of array or it returns an empty array if array is empty.

Examples

a = ["a", "b", "c", "d", "e", "f"]

a.first           #=> "a"
[].first          #=> nil

a.first(0)        #=> []
a.first(1)        #=> ["a"]
a.first(99)       #=> ["a", "b", "c", "d", "e", "f"]
[].first(10)      #=> []
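For comparison, first(number) gives the same results as slicing from index 0 with a length, and the no-argument form matches indexing with 0. A small sketch, with names chosen for the example:

```ruby
letters = ["a", "b", "c", "d"]

p letters.first(2)   #=> ["a", "b"]
p letters[0, 2]      #=> ["a", "b"]

# Asking for more elements than exist just returns what is there.
p letters.first(99)  #=> ["a", "b", "c", "d"]
p letters[0, 99]     #=> ["a", "b", "c", "d"]

# The no-argument form matches plain indexing, including on an empty array.
p letters.first      #=> "a"
p [].first           #=> nil
p [][0]              #=> nil
```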

Documentation Reference

Ruby version 1.8.6

www.ruby-doc.org : Array.first
