The Docvert Server

The Docvert service is designed to run on a separate server, not on your normal web server.

Because the docvert process has a number of requirements not usually found (or advisable) on a dedicated web server, this architecture makes sense. However, on a development box, or a server you have root on and don't mind doing very strange things to, it's also possible to run the docvert service on the same machine as your Drupal and Apache. Don't do that though.

Ideally, you should install Docvert somewhere your web server can talk to it. The following instructions are for the docvert server, NOT your web host. Your docvert server will need a hostname (or IP) so that the Drupal site can find it.

As of 2011-07 there are two very different versions of Docvert, and either should work. Both require the LibreOffice/OpenOffice back-end on the target server. One is PHP/Apache based; the other requires Python and can run without Apache.

Docvert version 4. PHP-based Apache Web Service.

On Debian,

Install the packages

sudo apt-get install libreoffice docvert

Note that in recent versions you need to list libreoffice explicitly, because the docvert package doesn't always pull in everything it really needs on a new server.

This should set you up with all the required libraries, including the full OpenOffice/LibreOffice suite. (This is the extra stuff you don't want on a real webserver)

See the docvert docs (installed in /usr/share/docvert/docs) for troubleshooting and more instructions. In short, the next steps are:

Enable the service

Enable docvert by uncommenting the alias in /etc/apache2/conf.d/docvert

sudo sed -i 's/#Alias/Alias/' /etc/apache2/conf.d/docvert
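
If you would rather keep a backup and see what changed, the same step can be sketched as a small shell function. The name enable_docvert_alias is made up for this example, and it assumes the packaged file contains a commented-out line starting with #Alias, as the sed command above implies.

```shell
# Sketch: uncomment the Alias directive in the Docvert Apache config.
# enable_docvert_alias is a hypothetical helper name, not part of Docvert.
enable_docvert_alias() {
    conf="${1:-/etc/apache2/conf.d/docvert}"
    [ -f "$conf" ] || { echo "No such file: $conf" >&2; return 1; }
    # Strip the '#' only where it comments out an Alias directive at line start.
    # The original file is kept next to it with a .bak suffix.
    sed -i.bak 's/^[[:space:]]*#[[:space:]]*Alias/Alias/' "$conf"
    # Show the active Alias lines so you can confirm the result.
    grep -n '^Alias' "$conf"
}
```

Called with no argument (as root) it edits /etc/apache2/conf.d/docvert; pass a path to try it on a copy first.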

Add permissions to the web server user

The web service needs to be able to write temp files, and on some servers the www-data user is not allowed to write to its own home directory. On an Ubuntu Natty server, the following steps fixed that:

sudo apache2ctl stop
sudo usermod --home /tmp www-data
sudo apache2ctl start

Check PHP settings

This service works by receiving large binary files from web forms. You probably want to adjust your php.ini settings to increase the upload_max_filesize and post_max_size. It can be an intensive process, so you should also check the max_execution_time is sufficient.
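
To see the current values before changing anything, something like the following may help. It is only a sketch: show_php_limits is a made-up helper name, and because the php.ini location varies between systems the path is passed in rather than assumed.

```shell
# Sketch: list the three active (uncommented) settings from a given php.ini.
# show_php_limits is a hypothetical helper name.
show_php_limits() {
    ini="$1"
    grep -E '^[[:space:]]*(upload_max_filesize|post_max_size|max_execution_time)[[:space:]]*=' "$ini"
}
```

For example, show_php_limits /etc/php5/apache2/php.ini, where that path is an assumption about a Debian-style PHP 5 layout. Check where your own php.ini lives.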

Restart Apache

sudo apache2ctl configtest
sudo apache2ctl graceful

Results?

The service should now be published under the path /docvert on your server. This is the docvert server path that Drupal needs to know in the module configs, e.g. http://yourdocvertserver/docvert/

Visit that web front end and you should get a result like http://holloway.co.nz/docvert/screenshots.html
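
A quick check from the Drupal machine's command line can also be sketched with curl. The helper name check_docvert is invented here, and treating anything other than HTTP 200 as a failure is an assumption about how the front end answers.

```shell
# Sketch: request the Docvert front end and report the HTTP status.
# check_docvert is a hypothetical helper name.
check_docvert() {
    url="$1"
    # -w prints only the status code; curl exits non-zero if it cannot connect.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "$url") || {
        echo "No response from $url" >&2
        return 1
    }
    echo "$url returned HTTP $code"
    [ "$code" = "200" ]
}
```

For example, check_docvert http://yourdocvertserver/docvert/ should succeed once the alias is enabled and Apache has been restarted.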

Docvert version 5. Python rewrite - stand-alone Web Service

This may not work out of the box. As of 2011-07 some features were missing from version 5, including dependency checking and an auto-load script. The instructions here will only get you so far...

Fetch the latest release

First: Version 5 of Docvert does not check its dependencies thoroughly. On a new machine, I found I had to install these explicitly:

sudo apt-get install -y libreoffice python-lxml python-imaging

The release dated March 2011 is available from the download page at http://holloway.co.nz/docvert/download.html

Unpack that package somewhere (e.g. /usr/share/docvert should be fine)

wget http://holloway.co.nz/docvert/docvert-5.tar.gz
tar -xzf docvert-5.tar.gz 
sudo rsync -av holloway-docvert-*/* /usr/share/docvert/

Start the stand-alone web service

cd /usr/share/docvert;
sudo python docvert-web.py &

It should start up its own web server on port 8080 (by default), so the docvert service endpoint (and Web UI) will now be available at http://yourserver:8080/. This is the docvert server path that Drupal needs to know.

A missing step

Docvert expects you to start an internal daemon to manage office application calls. This was managed in Version 4, but in 5.0 you have to do it yourself. The command is something like

sudo /usr/bin/soffice -headless -norestore -nologo -nofirststartwizard -accept="socket,port=2002;urp;" &

... but the distribution should probably include the old /etc/init.d/docvert-converter script instead.

TODO - add a startup script to make this service always available on your machine, or it will not be there after the next restart.
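
Until a proper init script exists, a wrapper along these lines could be called at boot. This is a rough sketch, not something shipped with Docvert: the function names and the DRY_RUN switch are invented here, and the commands are simply the two from the steps above.

```shell
#!/bin/sh
# Rough sketch of a start-up wrapper for Docvert 5. Run as root.
# Paths, flags and the port are the ones used in the steps above;
# the function names and DRY_RUN are hypothetical.
DOCVERT_DIR="${DOCVERT_DIR:-/usr/share/docvert}"
SOFFICE="${SOFFICE:-/usr/bin/soffice}"

start_office() {
    # Headless office listener on port 2002.
    set -- "$SOFFICE" -headless -norestore -nologo -nofirststartwizard \
        "-accept=socket,port=2002;urp;"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@" &
    fi
}

start_web() {
    # The stand-alone web service, started from its own directory as above.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cd $DOCVERT_DIR && python docvert-web.py"
    else
        ( cd "$DOCVERT_DIR" && exec python docvert-web.py ) &
    fi
}

start_docvert() {
    start_office
    start_web
}
```

Set DRY_RUN=1 to print the commands instead of running them. A real script would finish by calling start_docvert.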

But the above may not work

Eventually I did get a working result from Docvert 5, but I have been unable to get all of it going again since some OS upgrades. :-(

Troubleshooting

For troubleshooting, it's probably best to look at http://holloway.co.nz/docvert/ or the docs that come with the package, usually found at /usr/share/docvert/doc

Debug the server directly first. If it's not working through the supplied Web UI, you won't have any luck with the Drupal module.

If the Web UI isn't working, next try running the conversion from the command line directly on the machine. There is a docvert-cli.py script that is intended to be called directly, or you can call the internal pyodconverter.py script like so:

/usr/share/docvert/core/lib/pyodconverter/pyodconverter.py /usr/share/docvert/doc/sample/sample-document.doc /tmp/sample.zip

Port 8080 no good?

For any of dozens of reasons, your network may not co-operate with port 8080. You may have to make sure the port is open in the firewall, or you may have other things already running there. The python script does not (yet) provide an option to change the port, so the code must be tweaked. The following patch should do that for you, if you are not already running a webserver on that machine and you want the Python service to just take over port 80 (HTTP).

sudo sed -i s/port=8080/port=80/ /usr/share/docvert/docvert-web.py

You should probably only do this on a dedicated appliance.

Not responding from outside?

By default, the service binds only to 'localhost'. That's OK when testing directly on the machine, but it's not good enough if we want to talk to it from the Drupal server.

sudo sed -i "s/host='localhost'/host='0.0.0.0'/" /usr/share/docvert/docvert-web.py

This makes the service listen on every network interface, so it answers requests no matter what name or address you use for the server. You should probably only do this on a dedicated appliance.
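
The host and port patches above can also be applied in one go, with a backup, using a sketch like the following. The helper name patch_docvert_bind is invented for this example, and it assumes docvert-web.py still contains the literal strings host='localhost' and port=8080 that the sed commands above look for.

```shell
# Sketch: change the bind address and port in docvert-web.py, keeping a .bak copy.
# patch_docvert_bind is a hypothetical helper name.
patch_docvert_bind() {
    file="$1"; host="$2"; port="$3"
    [ -f "$file" ] || { echo "No such file: $file" >&2; return 1; }
    # Refuse values that would break the sed expressions below.
    case "$host" in ''|*[!A-Za-z0-9.:-]*) echo "Bad host: $host" >&2; return 1;; esac
    case "$port" in ''|*[!0-9]*) echo "Port must be a number" >&2; return 1;; esac
    sed -i.bak \
        -e "s/port=8080/port=$port/" \
        -e "s/host='localhost'/host='$host'/" \
        "$file"
    # Show the lines that now carry the settings.
    grep -n -E 'host=|port=' "$file"
}
```

From a root shell, patch_docvert_bind /usr/share/docvert/docvert-web.py 0.0.0.0 80 would apply both changes. Restart the python service afterwards.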

"URL seems to be an unsupported one"

This seems to keep re-appearing on the original Version 4. Try running

/usr/share/docvert/core/lib/pyodconverter/pyodconverter.py /usr/share/docvert/doc/sample/sample-document.doc /tmp/sample.zip

to see if the command line works at all. Despite the message, the problem seems to be caused by permissions, in this case "file:///tmp/docvert-547854/" being a URL that is not writable.

"Failed to connect to OpenOffice.org on port 2002"

This means the converter daemon needs to be started:

sudo /etc/init.d/docvert-converter start

"ImportError('type core.pipeline_type.convertimages.core.pipeline_type is unknown',)"

Seen on the python server when using one of the sample pipelines or the self-test. No fix is known.