Being lazy, or how not to expand control files
Paul Williams
tumble-users@lists.brouhaha.com
Wed Dec 17 14:28:02 2003
I have recently been using control files with tumble and wondering how
the language could be expanded to cover some things I would like to do,
such as producing proper blank pages (rather than copying a blank
TIFF across) and compositing images.
This evening I have taken a step back from the problem and actually
wondered whether it would be simpler to do nothing at all with tumble.
After all, without help from the control file, tumble does one
clearly-defined job, concatenating various images into a PDF, and it
does it very well. The most obvious advantage it has over my previous
scanning toolset is support for JPEGs.
So, how else could I do the things that the control file offers? For
the past two hours I've been looking at the Perl module PDF::API2. It
appears to be very powerful, supporting not only the creation of new
PDFs (which Thomas Merz's PDFlib has done for ages, for free), but also
the modification of existing ones (which PDFlib's free version doesn't
do).
Now I'm quite happy with hybrid solutions, and I can't get enough of
hacking Perl, but starting to use PDF::API2 has been complicated. It has
bagloads of subclasses, but there isn't a clear roadmap to show how they
relate to each other, and the single-page beginnings of a tutorial that
exists on the web has one example (and _that_ doesn't work!).
However, two hours on, and I have produced my first scanned document by
modifying a file that tumble generated to replace blank-pages-as-images
with real blank pages, add the Document Info section and even replace a
few scanned pages with pages containing proper graphics and text. In the
process, my 300-page scanned document has dropped from 7.6 MiB to 5.7 MiB.
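To give a flavour of the approach, here is a minimal sketch of the
blank-page and Document Info steps. It is not the exact script used for
the document above: the file names, the list of blank page numbers, the
US Letter page size and the Title/Author values are all made up for
illustration. It uses the long-standing PDF::API2 method names (open,
pages, page, importpage, mediabox, info, saveas); newer releases also
offer differently named equivalents, so check the documentation for the
version you have installed.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use PDF::API2;

# Hypothetical names and page numbers, for illustration only.
my $in_file  = 'scanned.pdf';          # PDF as generated by tumble
my $out_file = 'scanned-clean.pdf';    # rewritten copy
my %is_blank = map { $_ => 1 } (2, 4); # pages that are scans of blank paper

my $old = PDF::API2->open($in_file);
my $new = PDF::API2->new();

for my $n (1 .. $old->pages) {
    if ($is_blank{$n}) {
        # Append a real empty page instead of the image of one.
        my $page = $new->page();
        $page->mediabox(0, 0, 612, 792);   # assumed US Letter, in points
    }
    else {
        # Copy the scanned page across unchanged, appending it at the end.
        $new->importpage($old, $n);
    }
}

# Fill in the Document Info section (placeholder values).
$new->info(
    'Title'  => 'Example Manual',
    'Author' => 'Unknown',
);

$new->saveas($out_file);
```

Building a fresh PDF and importing the wanted pages keeps the original
file untouched, and sidesteps the question of how to delete pages from
an existing document.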
I think I'll have to write up this experiment as a web page to really
show how it's done, but I'd be happy to send some examples of using
PDF::API2 to anyone who'd like to experiment.
As a final thought, the control file, or perhaps some simplified form of
it, could still be used to provide non-programmers with a way of
specifying Document Info and Outlines, and tumble could generate a Perl
script using PDF::API2 to actually do the work.
--
Paul