Introduction
html2db is a small utility that converts HTML to Docbook SGML/XML. It uses TidyLib to parse the HTML. For information on Docbook, visit the Docbook official site.
Demo
The following are some of the documents converted using the tool.
- NCURSES Introduction in HTML, converted to Docbook SGML
- I converted some LDP documents to Docbook. They are here.
- VB6 To Tcl Cheat-Sheet, converted to Docbook at the author's request.
Binaries
These binaries are provided for your convenience. Use them at your own risk.
- A Linux binary compiled with gcc on Mandrake 8.2 for the i386 architecture.
- Windows binary, coming soon.
Instructions for compiling
You need the tidy sources to compile this.
- Download the latest tidy source from here (5 July 2005 version)
- Unzip it
  tar zxvf tidy_src.tgz
- Download this patch and apply it
  patch -p0 < newpatch
- Compile tidy
  cd build/gmake
  make
- Download the html2db source from here
- Unzip it
  tar zxvf html2db.tar.gz
- Edit the Makefile. Change TIDYDIR to point to the tidy source directory
- Compile it
  make
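The build steps above can be collected into one shell function. This is only a sketch, not part of the html2db distribution: `build_html2db` is a hypothetical name, and it assumes the three downloads (tidy_src.tgz, newpatch, html2db.tar.gz) are already in the current directory, that you pass the directories the two tarballs unpack into, and that the patch applies from the top of the tidy source tree. Instead of editing the Makefile by hand, it passes TIDYDIR on the make command line, which under GNU make overrides a plain `TIDYDIR = ...` assignment in the Makefile.

```shell
#!/bin/sh
# Hypothetical helper replaying the compile steps in order.
# Assumptions (not from the html2db docs): the downloads sit in the
# current directory, $1 is where tidy_src.tgz unpacks, $2 is where
# html2db.tar.gz unpacks, and newpatch applies from the tidy top dir.
build_html2db() {
    if [ $# -ne 2 ]; then
        echo "usage: build_html2db TIDY_DIR HTML2DB_DIR" >&2
        return 2
    fi
    top=$PWD
    tidy_dir=$top/$1
    html2db_dir=$top/$2

    # Unpack the tidy sources, apply the patch, then build tidy.
    # Subshells keep the caller's working directory unchanged.
    tar zxvf tidy_src.tgz || return 1
    ( cd "$tidy_dir" && patch -p0 < "$top/newpatch" ) || return 1
    ( cd "$tidy_dir/build/gmake" && make ) || return 1

    # Unpack html2db and build it against the patched tidy tree.
    # TIDYDIR=... on the command line replaces the manual Makefile edit.
    tar zxvf html2db.tar.gz || return 1
    ( cd "$html2db_dir" && make TIDYDIR="$tidy_dir" ) || return 1
}

# Example (directory names are guesses; check what tar actually created):
# build_html2db tidy html2db
```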
Usage
Usage: ./html2db [option] <html file>
Options:
-dbsgml Docbook SGML output
-dbxml Docbook XML output
-help Print Help Message
This produces the output in Docbook SGML or XML.
The Docbook public identifiers are hardcoded for now. In the future, I will add options to change them.
The output is far from perfect, but it does most of the dirty work. I will prepare a web page explaining the transformations. It will take some time :-) For the impatient, look in the source.
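A small wrapper can pick the right option from the desired output format. This sketch assumes html2db writes the converted document to standard output and its diagnostics to standard error, which this page does not state explicitly; `to_docbook` and the `HTML2DB` variable are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper around "./html2db [option] <html file>".
# Assumption: html2db prints the Docbook document on standard output.
HTML2DB=${HTML2DB:-./html2db}

# to_docbook sgml|xml FILE.html : writes FILE.sgml or FILE.xml
to_docbook() {
    if [ $# -ne 2 ]; then
        echo "usage: to_docbook sgml|xml FILE.html" >&2
        return 2
    fi
    case $1 in
        sgml) opt=-dbsgml ;;
        xml)  opt=-dbxml ;;
        *)    echo "to_docbook: unknown format '$1'" >&2; return 2 ;;
    esac
    # intro.html -> intro.sgml or intro.xml, next to the input
    out=${2%.html}.$1
    "$HTML2DB" "$opt" "$2" > "$out"
}

# Example: to_docbook xml intro.html
```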
For more information on Docbook, visit the Docbook official site.
Testing
If you want to test the tool, send me an email to let me know, and follow these instructions.
- Download the binary
- Run html2db, which runs Tidy on the input
  ./html2db <html file> 2> t.err
  This puts the errors in the file t.err.
- Then validate the SGML
  nsgmls -s <sgml file> 2> s.err
  This puts the errors in the file s.err.
- Send me both error files along with the HTML source and anything interesting you found. If you are sending a lot of files, use a naming convention such as 1.html, t1.err, s1.err, and zip them.
- DO NOT send jade output. It's not necessary.
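When sending many files, the testing steps and the suggested naming convention can be scripted. This is a sketch under assumptions: html2db prints the SGML on standard output (saved here as N.sgml so nsgmls has a file to validate), and nsgmls and zip are installed. `make_report` and the `HTML2DB`, `NSGMLS` and `ZIP` variables are hypothetical names. Only the HTML sources and the two error files go into the archive, so no jade output is included.

```shell
#!/bin/sh
# Hypothetical batch version of the testing steps, using the naming
# convention N.html -> tN.err (html2db/Tidy) and sN.err (nsgmls).
# Assumptions: html2db prints the SGML on standard output; nsgmls and
# zip are installed. Override HTML2DB, NSGMLS or ZIP to use other paths.
HTML2DB=${HTML2DB:-./html2db}
NSGMLS=${NSGMLS:-nsgmls}
ZIP=${ZIP:-zip}

# make_report ARCHIVE N... : process N.html for each number N, then zip
# the HTML sources and both error files into ARCHIVE.
make_report() {
    archive=$1
    shift
    files=
    for n in "$@"; do
        # Keep going on failure: the error files are what we want to collect
        "$HTML2DB" "$n.html" > "$n.sgml" 2> "t$n.err" || :
        # -s suppresses nsgmls's normal output, so only errors reach sN.err
        "$NSGMLS" -s "$n.sgml" 2> "s$n.err" || :
        files="$files $n.html t$n.err s$n.err"
    done
    # $files is left unquoted on purpose so it splits into separate names
    "$ZIP" "$archive" $files
}

# Example: make_report html2db-test.zip 1 2 3
```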
