The Docvert service is designed to run on a separate server, not on your normal web server.
Because the docvert process has a number of requirements not usually found (or advisable) on a dedicated web server, this architecture makes sense. However, on a development box, or a server you have root on and don't mind doing very strange things to, it's also possible to run the docvert service on the same machine as your Drupal and Apache. Don't do that though.
Ideally, you should install Docvert somewhere your web server can talk to it. The following instructions are for the docvert server, NOT your web host. Your docvert server will need a hostname (or IP) so that the Drupal site can find it.
As of 2011-07 there are two very different versions of Docvert; either should work. Both require the LibreOffice/OpenOffice back end on the target server. One (version 4) is PHP/Apache based; the other (version 5) requires Python and can run without Apache.
On Debian,
sudo apt-get install libreoffice docvert
Note that in recent versions you need to list libreoffice explicitly, because the docvert package doesn't always pull in everything it really needs on a new server.
This should set you up with all the required libraries, including the full OpenOffice/LibreOffice suite. (This is the extra stuff you don't want on a real web server.)
See the docvert docs (installed in /usr/share/docvert/docs) for troubleshooting and more instructions, but the short next step is:
Enable docvert by uncommenting the alias in /etc/apache2/conf.d/docvert
sudo sed -i 's/#Alias/Alias/' /etc/apache2/conf.d/docvert
The web service needs to be able to write temp files, and on some servers the www-data user was not allowed to write to its own home directory. On Ubuntu Natty server, the following steps fixed that:
sudo apache2ctl stop
sudo usermod --home /tmp www-data
sudo apache2ctl start
This service works by receiving large binary files from web forms. You probably want to adjust your php.ini settings to increase upload_max_filesize and post_max_size. Conversion can be an intensive process, so you should also check that max_execution_time is sufficient. Then check the configuration and reload Apache:
sudo apache2ctl configtest && sudo apache2ctl graceful
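To see which upload limits are currently in force, a small helper like the following can list the three php.ini settings mentioned above. This is only a sketch: the function name show_php_limits is made up for illustration, and the location of php.ini depends on your PHP packaging.

```shell
# Print the active (uncommented) php.ini lines that matter for large uploads.
# Usage: show_php_limits /path/to/php.ini
show_php_limits() {
    ini_file="$1"
    # Lines starting with ';' are comments in php.ini and are skipped
    # because the pattern is anchored at the start of the line.
    grep -E '^[[:space:]]*(upload_max_filesize|post_max_size|max_execution_time)[[:space:]]*=' "$ini_file"
}
```

Call it with the path to the php.ini that Apache actually loads (on Debian-style systems of this era that is often something like /etc/php5/apache2/php.ini, but that path is an assumption). If a setting is missing from the output, PHP is using its built-in default for it.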
Apache should now be publishing the service under the path /docvert on your server. This is the docvert server path that Drupal needs in the module configuration: http://yourdocvertserver/docvert/
Visit that web front end and you should get a result like http://holloway.co.nz/docvert/screenshots.html
Docvert 5 may not work out of the box as of 2011-07. Some features are missing from this version, including dependency checking and an auto-load script, so the instructions here will only get you so far...
First: Version 5 of Docvert does not check its dependencies thoroughly. On a new machine, I found I had to explicitly add:
sudo apt-get install -y libreoffice python-lxml python-imaging
For the release dated March 2011, see http://holloway.co.nz/docvert/download.html
Unpack that package somewhere (e.g. /usr/share/docvert should be fine):
wget http://holloway.co.nz/docvert/docvert-5.tar.gz
tar -xzf docvert-5.tar.gz
sudo rsync -av holloway-docvert-*/* /usr/share/docvert/
cd /usr/share/docvert; sudo python docvert-web.py &
It should start up its own web server on port 8080 (by default), so the docvert service endpoint (and Web UI) will now be available at http://yourserver:8080/. This is the docvert server path that Drupal needs to know.
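Before pointing Drupal at that address, it can help to confirm from the Drupal host that the port really accepts connections. The helper below is a rough sketch, not part of Docvert; port_open is a hypothetical name, and it assumes bash is installed because it relies on bash's /dev/tcp redirection.

```shell
# Return 0 if a TCP connection to host:port can be opened, non-zero otherwise.
# The probe runs in an explicit bash because /dev/tcp is a bash feature.
# Usage: port_open HOST PORT
port_open() {
    bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}
```

For example, port_open yourserver 8080 && echo "docvert port reachable". Note that a firewall which silently drops packets can make the probe hang until the TCP connect times out, and a successful connection only shows that something is listening, not that it is Docvert.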
Docvert expects you to start an internal daemon to manage office application calls. This was managed for you in Version 4, but in 5.0 you have to do it yourself. The command is something like:
sudo /usr/bin/soffice -headless -norestore -nologo -nofirststartwizard -accept="socket,port=2002;urp;" &
... but the distribution should probably include the old /etc/init.d/docvert-converter script instead.
TODO - add a startup script to make this service always available on your machine, or it will not be there after the next restart.
Eventually, I did get a working result from Docvert 5, but have been unable to get it ALL going since some OS upgrades. :-(
For troubleshooting, it's probably best to look at http://holloway.co.nz/docvert/ or the docs that come with the package, usually found at /usr/share/docvert/doc
Debug the server directly first: if it's not working through the supplied Web UI, you won't have any luck with the Drupal module.
If the Web UI isn't working, next try running it from the command line directly on the machine. There is a docvert-cli.py script that is intended to be called directly, or you can call the internal pyodconverter.py script like so:
/usr/share/docvert/core/lib/pyodconverter/pyodconverter.py /usr/share/docvert/doc/sample/sample-document.doc /tmp/sample.zip
For any of dozens of reasons, your network may not co-operate with port 8080. You may have to make sure the port is open in the firewall, or something else may already be running there. The Python script does not (yet) provide an option to change the port, so the code must be tweaked. If you are not already running a web server on that machine and want the Python service to just take over port 80 (HTTP), the following patch should do that for you:
sudo sed -i s/port=8080/port=80/ /usr/share/docvert/docvert-web.py
You should probably only do this on a dedicated appliance.
By default, the service binds only to 'localhost'. That's OK when testing directly on the machine, but if we want to talk to it from the Drupal server, that's not good enough.
sudo sed -i s/host=\'localhost\'/host=\'0.0.0.0\'/ /usr/share/docvert/docvert-web.py
This makes it listen on all network interfaces, so it answers requests no matter what name or address you use for the server. You should probably only do this on a dedicated appliance.
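Both in-place edits to docvert-web.py overwrite the stock script, so it may be worth keeping a backup. The sketch below applies the same two substitutions but always works from a saved .orig copy, which also means it can be re-run with different values. The function name patch_docvert_web is hypothetical, and it assumes the script contains the literal strings port=8080 and host='localhost' that the sed commands above rely on.

```shell
# Rewrite the port and bind address in docvert-web.py, keeping a pristine copy.
# Usage: patch_docvert_web FILE PORT HOST
patch_docvert_web() {
    target="$1"   # e.g. /usr/share/docvert/docvert-web.py
    port="$2"     # e.g. 80
    host="$3"     # e.g. 0.0.0.0
    # Save the untouched script once; later runs reuse that backup.
    [ -f "$target.orig" ] || cp -p "$target" "$target.orig" || return 1
    # Always patch from the backup so repeated runs start from the defaults.
    sed -e "s/port=8080/port=$port/" \
        -e "s/host='localhost'/host='$host'/" \
        "$target.orig" > "$target"
}
```

For example, run as root: patch_docvert_web /usr/share/docvert/docvert-web.py 80 0.0.0.0. To undo the change, copy the .orig file back over the script and restart the service.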
This seems to keep re-appearing on the original Version 4. Try running
/usr/share/docvert/core/lib/pyodconverter/pyodconverter.py /usr/share/docvert/doc/sample/sample-document.doc /tmp/sample.zip
to see if the command line works at all. Despite the messages, it seems this is really caused by permissions, in this case "file:///tmp/docvert-547854/" being a URL that is not writable.
This means we need to run:
sudo /etc/init.d/docvert-converter start
Seen on the Python server when using one of the sample pipelines or the self-test. Fix: unknown.