If you're simply looking for a way to convert a one-off document, upload it to Google Docs or Scribd and let them take care of the conversion for you. Be mindful of your privacy settings so you don't accidentally share your document with the whole world.
For command-line conversion of documents, here's how to get going on Ubuntu:
OpenOffice provides the core conversion facilities. It does a pretty respectable job of converting most document types, including spreadsheets and presentations.
You need to actually run an instance of OpenOffice in order to send it the request to convert the document. Installing the headless version means you can run it on a server without a windowing system, which you need to be able to do if you want to run a massive document conversion farm on EC2.
apt-get install openoffice.org-headless
apt-get install openoffice.org-java-common
apt-get install openoffice.org-writer
One simple way to get a PDF out of OpenOffice is to print the document through CUPS-PDF.
apt-get install cups-pdf
Use OpenOffice to print the doc to a PDF file using CUPS-PDF. Note that the output path can be found in /etc/cups/cups-pdf.conf (I'm using cups-pdf v2.5.0).
Here's a script to do the conversion and open the output PDF in evince.
#!/bin/bash
#
# Prints a file to PDF using OpenOffice and
# CUPS-PDF. Opens output file with evince doc
# viewer.
##
CUPS_HOME="$HOME/PDF"
file=$(basename "$1")
prefix=${file%.[^.]*}
outfile="${CUPS_HOME}/${prefix}.pdf"
echo "Printing ${prefix}.pdf"
soffice -norestore \
-nofirststartwizard \
-nologo \
-headless \
-pt PDF "$1"
echo Sleeping 10 seconds
sleep 10
echo Opening $outfile
evince "$outfile"
Note that in the -pt PDF option, "PDF" is the printer device
name of the CUPS-PDF printer device. It may differ on your
platform. Check /etc/cups/printers.
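The fixed 10 second sleep in the script is a guess: small documents finish sooner and big ones may take longer. Here's a rough sketch of polling for the output file instead. wait_for_file is a name I made up for illustration, and the 30 second default timeout is arbitrary. It only checks that the file exists and is non-empty, which doesn't guarantee CUPS-PDF has finished writing it.

```shell
#!/bin/bash
# wait_for_file PATH [TIMEOUT_SECONDS]
# Hypothetical helper: polls once per second until PATH exists and is
# non-empty. Returns 0 when the file shows up, 1 if the timeout passes.
wait_for_file() {
  local path="$1"
  local timeout="${2:-30}"   # arbitrary default, adjust to taste
  local waited=0
  while [ ! -s "$path" ]; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}

# Possible use in the script above, replacing the sleep:
#   wait_for_file "$outfile" 60 && evince "$outfile"
```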
The printing method works okay, but it doesn't detect the orientation of your document, so you may notice presentations are rendering in portrait mode instead of landscape mode in the output PDF.
unoconv is a Python utility that talks to an OpenOffice process via an Uno bridge.
apt-get install unoconv
Because unoconv requires an OpenOffice instance, it's best to have a process running before doing a lot of document conversion. unoconv can start one for you:
unoconv --listener > /dev/null 2>&1 &
Verify the process started:
$ ps -ef | grep soffice
Look for something like this (line breaks inserted for legibility):
soffice.bin -nologo -nodefault
-accept=socket,host=localhost,port=2002;urp;StarOffice.ComponentContext
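If you'd rather have a script do the check than read ps output by eye, one option is to test whether anything is accepting connections on the listener port. This is a sketch, assuming bash (the /dev/tcp redirection is a bash feature and some builds disable it) and the port 2002 shown in the output above. port_open is a made-up helper name. It only tells you that something is listening on the port, not that it is OpenOffice.

```shell
#!/bin/bash
# port_open HOST PORT
# Hypothetical helper: succeeds if a TCP connection to HOST:PORT can be
# opened. Relies on bash's built-in /dev/tcp redirection.
port_open() {
  local host="$1" port="$2"
  # The subshell opens the connection on fd 3; the fd is closed again
  # when the subshell exits. Connection errors are silenced.
  (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null
}

# Possible use before a batch of conversions:
#   port_open localhost 2002 || echo "No listener on port 2002"
```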
Now you can bust out a script like this:
#!/bin/bash
#
# Converts a document to a PDF file using
# unoconv and opens it with the evince viewer.
##
file=$(basename "$1")
dir=$(dirname "$1")
prefix=${file%.[^.]*}
outfile="${prefix}.pdf"
echo "Generating $outfile"
unoconv -f pdf "$1"
echo Opening $outfile
evince "${dir}/$outfile"
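This option is much nicer than the printing approach: there's no printer to configure and no sleeping while you wait for the output to show up.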
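With a listener already running, converting a whole directory is a short loop. The sketch below assumes the inputs are .doc files and that unoconv writes each PDF next to its input, as the script above expects. pdf_path_for and convert_dir are names I invented for the example.

```shell
#!/bin/bash
# pdf_path_for INPUT
# Hypothetical helper: prints the input path with its last extension
# replaced by .pdf, which is where the PDF is expected to land.
pdf_path_for() {
  local input="$1"
  local dir file
  dir=$(dirname -- "$input")
  file=$(basename -- "$input")
  printf '%s/%s.pdf\n' "$dir" "${file%.*}"
}

# convert_dir DIR
# Hypothetical helper: converts every .doc file in DIR, skipping files
# that already have a PDF beside them.
convert_dir() {
  local dir="$1" doc
  for doc in "$dir"/*.doc; do
    [ -e "$doc" ] || continue   # the glob matched nothing
    if [ -e "$(pdf_path_for "$doc")" ]; then
      echo "Skipping $doc (PDF exists)"
      continue
    fi
    echo "Converting $doc"
    unoconv -f pdf "$doc"
  done
}

# Possible use:
#   convert_dir "$HOME/Documents"
```

Skipping files that already have a PDF makes the loop safe to re-run after a partial failure.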
Java folks should check out the JODConverter project, which provides similar functionality.
Document thumbnails
Want to grab a thumbnail of the first page of your converted PDF? First install ImageMagick:
apt-get install imagemagick
Then run:
convert 'example.pdf[0]' -thumbnail '120x120!' -gravity center thumbnail.jpg
This will generate a fixed 120x120 thumbnail in JPG format of the first page of your PDF. The quotes keep the shell from interpreting the brackets and the exclamation mark. Note the array-like syntax: example.pdf[0]. This constrains the thumbnail generation to the first page. If omitted, a thumbnail of every page is generated.
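The ! in 120x120! forces the exact size and ignores the page's aspect ratio, so a portrait page comes out squashed. If you'd rather keep the proportions, a common ImageMagick idiom is to fit the page inside the box with -thumbnail and then pad it out with -extent. Here's a sketch wrapped in a function. make_thumbnail is a made-up name, and the white padding is my choice, not anything the tool requires.

```shell
#!/bin/bash
# make_thumbnail PDF OUT [SIZE]
# Hypothetical helper: renders the first page of PDF into a SIZE x SIZE
# image, keeping the aspect ratio and padding the rest with white.
make_thumbnail() {
  local pdf="$1" out="$2" size="${3:-120}"
  convert "${pdf}[0]" \
    -thumbnail "${size}x${size}" \
    -background white \
    -gravity center \
    -extent "${size}x${size}" \
    "$out"
}

# Possible use:
#   make_thumbnail example.pdf thumbnail.jpg
```

Here -gravity center is what centers the scaled page inside the padded canvas.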
Indexing document content
Want to add document search? Run:
unoconv -f txt sample.doc
This generates a text version of the document that you can index with Lucene or the indexer of your choice.
Read documents in the browser
You can generate an HTML version of your document to display on the web:
unoconv -f html sample.doc
For presentations and some other document formats, separate HTML files are generated for each page. A little messy...
You could get fancy like Scribd and render the document using Flash.
apt-get install swftools
Check out pdf2swf. Put in a little work to build navigation controls for your document and you've got yourself a pretty nice viewer.