Being lazy, or how not to expand control files

Paul Williams tumble-users@lists.brouhaha.com
Wed Dec 17 14:28:02 2003


I have recently been using control files with tumble and wondering how 
the language could be expanded to cover some things I would like to do, 
such as producing proper blank pages (rather than copying a blank TIFF 
across) and compositing images.

This evening I took a step back from the problem and wondered whether 
it would be simpler to do nothing at all to tumble. After all, without 
help from the control file, tumble does one clearly defined job, 
concatenating images into a PDF, and it does it very well. Its most 
obvious advantage over my previous scanning toolset is support for 
JPEGs.

So, how else could I do the things that the control file offers? For 
the past two hours I've been looking at the Perl module PDF::API2. It 
appears to be very powerful. It supports not only the creation of new 
PDFs (which Thomas Merz's PDFlib has done for ages, for free) but also 
the modification of existing ones (which PDFlib's free version doesn't 
do).

Now I'm quite happy with hybrid solutions, and I can't get enough of 
hacking Perl, but getting started with PDF::API2 has been hard going. It 
has bagloads of subclasses, but there is no clear roadmap showing how 
they relate to each other, and the single-page beginnings of a tutorial 
that exist on the web contain just one example (and _that_ doesn't work!)

However, two hours on, I have produced my first scanned document by 
modifying a file that tumble generated: I replaced blank-pages-as-images 
with real blank pages, added the Document Info section and even replaced 
a few scanned pages with pages containing proper graphics and text. In 
the process, my 300-page scanned document has shrunk from 7.6 MiB to 
5.7 MiB.
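As a rough illustration of the kind of post-processing described above, 
here is a minimal Perl sketch using PDF::API2. It is not the script used 
for the document above. It assumes A4 paper for the blank pages, and 
the helper name rebuild_pdf, the page numbers and the file names are 
hypothetical. Replacing scanned pages with new graphics and text is not 
shown.

```perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical helper: rebuild a tumble-generated PDF into a new
# document, substituting genuinely empty pages for the scanned "blank"
# images and attaching a Document Info dictionary.
#   $src   - a PDF::API2 object (e.g. from PDF::API2->open)
#   $blank - hash ref whose keys are 1-based page numbers to replace
#   %info  - Document Info entries such as Title and Author
sub rebuild_pdf {
    my ($src, $blank, %info) = @_;
    my $out = PDF::API2->new;

    for my $n (1 .. $src->pages) {
        if ($blank->{$n}) {
            # A real blank page: a page object with no content at all.
            # Assumes A4; adjust to match the scanned page size.
            my $page = $out->page;
            $page->mediabox('A4');
        }
        else {
            # Copy the scanned page across unchanged (appended at the end).
            $out->importpage($src, $n);
        }
    }

    # Set the Document Info dictionary (Title, Author, Subject, ...).
    $out->info(%info);
    return $out;
}

# Usage, with hypothetical file names and page numbers:
# my $src = PDF::API2->open('scan.pdf');    # output of tumble
# my $out = rebuild_pdf($src, { 2 => 1, 17 => 1 },
#                       Title => 'My scanned book', Author => 'Paul');
# $out->saveas('scan-fixed.pdf');
```

Building a fresh document with importpage, rather than editing the 
tumble output in place, avoids having to delete pages from the original.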

I think I'll have to write up this experiment as a web page to really 
show how it's done, but I'd be happy to send some examples of using 
PDF::API2 to anyone who'd like to experiment.

As a final thought, the control file, or perhaps some simplified form of 
it, could still be used to provide non-programmers with a way of 
specifying Document Info and Outlines, and tumble could generate a Perl 
script using PDF::API2 to actually do the work.

-- 
Paul