
Introduction

html2db is a small utility that converts HTML to Docbook SGML or XML. It uses TidyLib to parse the HTML. For information on Docbook, visit the official Docbook site.

Demo

The following are some of the documents converted with the tool. Note: I made some manual changes to the LDP documents.

Binaries

These binaries are provided for your convenience. Use them at your own risk.

Instructions for compiling

You need the Tidy sources to compile html2db.

Usage

    Usage: ./html2db [option] <html file>
    Options:
        -dbsgml         Docbook SGML output
        -dbxml          Docbook XML output
        -help           Print Help Message

This produces output in Docbook SGML or XML. The Docbook public identifiers are hardcoded for now. I plan to add options in the future.
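
For scripted conversions, the command line above can be wrapped in a few lines of Python. This is only a rough sketch: the helper names `build_command` and `convert` are made up for illustration, and it assumes that html2db prints the converted document on standard output and exits with a non-zero status on failure, neither of which is stated here. Only the `-dbsgml` and `-dbxml` flags come from the usage text.

```python
import subprocess

# Flags copied from the usage text; everything else in this file is illustrative.
FORMAT_FLAGS = {"sgml": "-dbsgml", "xml": "-dbxml"}


def build_command(html_file, fmt="xml", binary="./html2db"):
    """Return the argument list for one html2db run (hypothetical helper)."""
    if fmt not in FORMAT_FLAGS:
        raise ValueError(f"unknown format {fmt!r}; expected 'sgml' or 'xml'")
    return [binary, FORMAT_FLAGS[fmt], str(html_file)]


def convert(html_file, fmt="xml", binary="./html2db"):
    """Run html2db and return whatever it writes to standard output.

    Assumption: the converted document goes to stdout and a failed run
    exits with a non-zero status. Adjust this if the tool behaves differently.
    """
    result = subprocess.run(
        build_command(html_file, fmt, binary),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout
```

For example, `build_command("guide.html", "sgml")` returns `['./html2db', '-dbsgml', 'guide.html']`, which is the same invocation you would type by hand.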

The output is far from perfect, but it does most of the dirty work. I will prepare a web page explaining the transformations. It will take some time :-) For the impatient, look in the source.

For more information on Docbook, visit the official Docbook site.

Testing

If you want to test the tool, send me an email to let me know, and follow these instructions.