Or: How to convert multipage TIFF to PDF in PHP.

Let's say you have a fax with multiple pages that has been stored as a TIFF, and you want to convert it to PDF using PHP for a digital document flow. In this article I will show you a tiff2pdf function for PHP, because in my experience this cannot be done reliably with ImageMagick alone.

Requirements

  • php5 (I use php5-cli for running php from the command line)
  • Imagick (native PHP extension for ImageMagick available through PECL)
  • ps2pdfwr (gs-common)

In Ubuntu, this would translate to:

sudo aptitude update \
 && sudo aptitude install make php5-cli php5-gd php5-dev php-pear gs-common ghostscript \
 && sudo aptitude remove php5-imagick \
 && sudo apt-get install libmagick9-dev \
 && sudo pecl install imagick

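Why use PECL to install Imagick and not apt, you say? Because currently, the Imagick package in the Ubuntu Gutsy repositories contains a nasty bug. After pecl install, remember to enable the extension by adding extension=imagick.so to your php.ini.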
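Before running any conversion, a small sanity check can confirm that the requirements above are met. This is only a sketch. The function name check_tiff2pdf_requirements() is hypothetical, and the default path /usr/bin/ps2pdfwr is an assumption that matches a standard Ubuntu install.

<?php
// Hypothetical helper: returns a list of missing requirements (empty when all is well)
function check_tiff2pdf_requirements($cmd_ps2pdf = "/usr/bin/ps2pdfwr") {
    $missing = array();
    // The Imagick extension must be loaded (enable it in php.ini after "pecl install imagick")
    if (!extension_loaded("imagick")) {
        $missing[] = "imagick extension";
    }
    // The Ghostscript wrapper script must exist and be executable (assumed path)
    if (!is_executable($cmd_ps2pdf)) {
        $missing[] = "ps2pdfwr at ".$cmd_ps2pdf;
    }
    return $missing;
}

$missing = check_tiff2pdf_requirements();
echo count($missing) ? "Missing: ".implode(", ", $missing)."\n" : "All requirements met\n";
?>

Run it with php5-cli. An empty list means both the extension and the converter were found.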

Function

You can just copy & paste this and check out the example below, or read the comments if you want to understand how it works.

<?php
function tiff2pdf($file_tif, $file_pdf){
    // Initialize
    $errors     = array();
    $cmd_ps2pdf = "/usr/bin/ps2pdfwr";

    // Optional: load the imagick extension dynamically if it isn't loaded yet
    // (dl() is only available in some SAPIs, such as the CLI)
    // if (!extension_loaded("imagick")) dl('imagick.so');

    // Initial error handling (check the raw paths, not shell-escaped ones)
    if (!file_exists($file_tif)) $errors[] = "Original TIFF file: ".$file_tif." does not exist";
    if (!is_executable($cmd_ps2pdf)) $errors[] = "Ghostscript PostScript to PDF converter not found at: ".$cmd_ps2pdf;
    if (!extension_loaded("imagick")) $errors[] = "Imagick extension not installed or not loaded";

    // Only continue if there aren't any errors
    if (!count($errors)) {
        // Determine the temporary .ps filepath: same directory and base name as the .pdf
        $info    = pathinfo($file_pdf);
        $file_ps = $info['dirname']."/".$info['filename'].".ps";

        try {
            // Open the original .tif(f); Imagick loads every page as a separate image
            $document = new Imagick($file_tif);

            // Use Imagick to write all pages to 1 .ps file (adjoin = true)
            if (!$document->writeImages($file_ps, true)) {
                $errors[] = "Unable to use Imagick to write multiple pages to 1 .ps file: ".$file_ps;
            }
            $document->clear();
        } catch (ImagickException $e) {
            $errors[] = "Imagick error: ".$e->getMessage();
        }

        if (!count($errors)) {
            // Use ghostscript to convert .ps -> .pdf; escape the paths only for the shell
            exec($cmd_ps2pdf." -sPAPERSIZE=a4 ".escapeshellarg($file_ps)." ".escapeshellarg($file_pdf), $o, $r);

            if ($r) {
                $errors[] = "Unable to use ghostscript to convert .ps(".$file_ps.") -> .pdf(".$file_pdf."). Check rights. ";
            }
        }

        // Remove the temporary .ps file
        if (file_exists($file_ps)) unlink($file_ps);
    }

    // return array with errors, or true with success.
    if (!count($errors)) {
        return true;
    } else {
        return $errors;
    }
}
?>
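The intermediate .ps file is only needed during the conversion. You might want a guarantee that it is removed even when Imagick throws an exception. In that case, the work can be wrapped in a helper like the hypothetical with_temp_file() below. This is a sketch. It relies on finally, which requires PHP 5.5 or later, so not every php5 install will have it.

<?php
// Hypothetical helper: runs $work($tmp_path) and always deletes $tmp_path afterwards.
// Requires PHP >= 5.5 for "finally".
function with_temp_file($tmp_path, $work) {
    try {
        return call_user_func($work, $tmp_path);
    } finally {
        // Runs both on a normal return and when $work throws
        if (file_exists($tmp_path)) {
            unlink($tmp_path);
        }
    }
}

// Usage sketch (the Imagick and ps2pdfwr calls are replaced by a stand-in write)
$tmp  = sys_get_temp_dir()."/tiff2pdf_example.ps";
$size = with_temp_file($tmp, function ($path) {
    file_put_contents($path, "%!PS\n"); // stand-in for $document->writeImages($path, true)
    return filesize($path);
});
?>

In real use, the callback would write the .ps with Imagick and then call ps2pdfwr. When it returns or throws, the file is gone.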

Example

This is how you could call the function

<?php
// converts /dir/fax.tif to /dir/fax.pdf
if (($return = tiff2pdf("/dir/fax.tif", "/dir/fax.pdf")) !== true) {
    // error
    echo "Error:\n";
    print_r($return);
} else {
    // success
    echo "success!\n";
}
?>
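To convert a whole directory of faxes, you could loop over the TIFF files and call tiff2pdf() for each one. This sketch assumes each PDF should sit next to its TIFF with the same base name. pdf_path_for() and convert_directory() are hypothetical helper names.

<?php
// Hypothetical helper: /dir/fax.tif -> /dir/fax.pdf
function pdf_path_for($file_tif) {
    $info = pathinfo($file_tif);
    return $info['dirname']."/".$info['filename'].".pdf";
}

// Hypothetical helper: converts every .tif/.tiff in $dir and returns tiff2pdf()'s result per file
function convert_directory($dir) {
    // glob() may return false instead of an empty array on some systems
    $files = array_merge(
        glob($dir."/*.tif") ?: array(),
        glob($dir."/*.tiff") ?: array()
    );
    $results = array();
    foreach ($files as $file_tif) {
        $results[$file_tif] = tiff2pdf($file_tif, pdf_path_for($file_tif));
    }
    return $results;
}
?>

convert_directory("/dir") returns an array keyed by TIFF path. Each value is either true or the error array returned by tiff2pdf(). Keep in mind that glob() is case-sensitive on Linux, so files named FAX.TIF would need an extra pattern.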

Read on for More Background Info

People usually want a quick solution, which is why I split up my article and put all the background information here. So, for the curious:

Approach

Every time I've tried to convert any format directly to PDF with ImageMagick alone, I've ended up with nothing but distorted files.

There's little documentation about doing this in PHP, but the key to my approach is using two steps:

  • tiff2ps: Convert TIFF to PostScript using Imagick
  • ps2pdf: Convert PostScript to PDF using Ghostscript

Imagick (The tiff2ps Step)

Imagick is a native PHP extension to create and modify images using the ImageMagick API. It's about twice as fast as making system calls to the ImageMagick commands, and in this case I use Imagick to create the intermediate .ps (PostScript) file.
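Imagick writes the intermediate file itself, so it's worth checking that your ImageMagick build lists PostScript among its formats. Imagick::queryFormats() takes a pattern and returns the matching format names. The wrapper imagick_can_handle() below has a hypothetical name, and returning false when the extension is missing is a choice made for this sketch.

<?php
// Hypothetical helper: true if the local ImageMagick build lists $format (e.g. "PS", "TIFF")
function imagick_can_handle($format) {
    if (!class_exists("Imagick")) {
        return false; // extension not loaded
    }
    // queryFormats() returns the format names matching the given pattern
    return in_array(strtoupper($format), Imagick::queryFormats(strtoupper($format)));
}

foreach (array("TIFF", "PS") as $format) {
    echo $format.": ".(imagick_can_handle($format) ? "yes" : "no")."\n";
}
?>

Note that a listed format is only known to ImageMagick. The list does not say whether it can be both read and written.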

About Imagick's Syntax change

Imagick recently changed quite a bit. I was used to simply calling:

<?php
$image = imagick_readimage("/dir/file1");
imagick_writeimage($image, "/dir/file2");
?>

But nowadays, Imagick has become object oriented and the correct syntax is:

<?php
$image = new Imagick("/dir/file1");
$image->writeImage("/dir/file2");
?>

I greatly approve of this change, as it offers a lot of flexibility:

<?php
// Make a thumbnail of all JPG files in a directory
$images = new Imagick(glob('images/*.jpg'));
foreach($images as $image) {
    // Providing 0 forces thumbnailImage to maintain aspect ratio
    $image->thumbnailImage(1024,0);
}
$images->writeImages();

// from: https://nl3.php.net/imagick
?>

That said, it does force you to recode your existing scripts.
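If you have many scripts that use the old procedural calls, a thin shim can buy time while you port them. This is a hypothetical sketch that only covers the two calls shown above. The old extension had many more functions and its own error-handling conventions, and this shim does not emulate them.

<?php
// Hypothetical shim: maps the old procedural calls onto the object-oriented Imagick API.
// The functions are only defined when they don't already exist.
if (!function_exists("imagick_readimage")) {
    function imagick_readimage($filename) {
        return new Imagick($filename);
    }
}
if (!function_exists("imagick_writeimage")) {
    function imagick_writeimage($image, $filename) {
        return $image->writeImage($filename);
    }
}
?>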

About Handling Multipage Documents

As you probably know, a single TIFF can hold multiple pages. When you load a multipage TIFF, Imagick stores every page as a separate image. So the code above, which used glob to thumbnail multiple files in one directory, could just as well be used to loop over the pages of our document, like so:

<?php
// Saving every page of a TIFF separately as a JPG thumbnail
$images = new Imagick("/dir/file1.tif");
foreach($images as $i=>$image) {
    // Providing 0 forces thumbnailImage to maintain aspect ratio
    $image->thumbnailImage(1024,0);
    $image->writeImage("/dir/file1_page".$i.".jpg");
}

$images->clear();
?>
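Every page becomes a separate image, so you can also count the pages of a fax before converting it, for example to log them. In the sketch below, pingImage() reads the image attributes without decoding the pixel data, and getNumberImages() returns how many images (pages) were loaded. tiff_page_count() is a hypothetical name, and returning null on failure is a choice made for this sketch.

<?php
// Hypothetical helper: number of pages in a (multipage) TIFF, or null on failure
function tiff_page_count($file_tif) {
    if (!class_exists("Imagick") || !is_readable($file_tif)) {
        return null;
    }
    $image = new Imagick();
    try {
        // pingImage() loads the image attributes without decoding the pixels
        $image->pingImage($file_tif);
    } catch (ImagickException $e) {
        return null;
    }
    $pages = $image->getNumberImages();
    $image->clear();
    return $pages;
}

$pages = tiff_page_count("/dir/file1.tif");
echo $pages === null ? "Could not read file\n" : $pages." page(s)\n";
?>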

This also explains a problem I ran into. When I tried to save the TIFF as PS, I first simply used:

<?php
$image = new Imagick("/dir/file1.tif");
$image->writeImage("/dir/file2.ps");
?>

The above resulted in only the first page of my TIFF being saved in the PostScript file.

The problem was solved by using writeImages with its $adjoin parameter set to true, which writes all pages into a single file:

<?php
$image = new Imagick("/dir/file1.tif");
$image->writeImages("/dir/file2.ps", true);
?>

One letter (plus the adjoin flag) can make a big difference.

Ghostscript (The ps2pdf Step)

Ghostscript is a suite of software based on an interpreter for Adobe Systems' PostScript and Portable Document Format (PDF) page description languages.

Unfortunately, to convert the .ps to .pdf we still have to make one system call to ps2pdfwr, which is a Ghostscript command included in the gs-common package.
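The system call itself is just a command string, so it helps to build it carefully. On Linux, escapeshellarg() wraps each argument in single quotes and escapes any quote inside it, so paths with spaces or quotes can't break the command. The escaped strings are only meant for the shell. Passing them to file_exists() or Imagick would fail because of the added quotes. build_ps2pdf_command() and run_ps2pdf() are hypothetical names. -sPAPERSIZE=a4 is the same Ghostscript option the function above uses.

<?php
// Hypothetical helper: builds the ps2pdfwr command line with shell-escaped arguments
function build_ps2pdf_command($cmd_ps2pdf, $file_ps, $file_pdf, $paper = "a4") {
    return $cmd_ps2pdf
        ." ".escapeshellarg("-sPAPERSIZE=".$paper)
        ." ".escapeshellarg($file_ps)
        ." ".escapeshellarg($file_pdf);
}

// Hypothetical helper: runs the conversion; returns true, or Ghostscript's output lines on failure
function run_ps2pdf($cmd_ps2pdf, $file_ps, $file_pdf) {
    $output = array();
    // 2>&1 captures Ghostscript's error messages along with its normal output
    exec(build_ps2pdf_command($cmd_ps2pdf, $file_ps, $file_pdf)." 2>&1", $output, $status);
    return $status === 0 ? true : $output;
}

echo build_ps2pdf_command("/usr/bin/ps2pdfwr", "/dir/my fax.ps", "/dir/my fax.pdf")."\n";
?>

Returning Ghostscript's output on failure gives you more to go on than a bare non-zero exit code when a conversion fails.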