Djvulibre mac
7/30/2023

Using the UnknownConverterPlugin to make unsupported document formats searchable

Devised for Greenstone version: 2.88|3.09
Modified for Greenstone version: 2.87|3.10

This is an advanced tutorial: it assumes not only that you are familiar with most of what was covered in the preceding tutorials, but also that you are comfortable downloading and installing software from the web, and that you have a little familiarity with image editing software.

The UnknownConverterPlugin builds on the idea of the UnknownPlugin, in that it can be configured to handle documents of unknown format and file extension. It can also be made to handle documents with known file extensions in a custom manner. The UnknownConverterPlugin extends the UnknownPlugin's abilities by letting you launch a tool installed on your own PC that can be run from the commandline to convert from the "unknown" file format to text, html or gif/jpg/png images, or to a folder of these.

If you know how to launch this tool from the commandline to do the conversion, you configure the UnknownConverterPlugin by supplying three things: the file format (file extension) of the documents it should process, the expected output file format (text, html or paged images), and the conversion command that the UnknownConverterPlugin should launch. In place of the input file and the output file or folder, you provide placeholders in the command to run.

Once configured, the UnknownConverterPlugin is used during building to process documents that match the specified file format. It launches the commandline conversion tool with the command provided, and the expected output files can then be processed by Greenstone in the usual manner.

An example scenario would be a collection containing djvu files, for which Greenstone provides no custom plugin. However, there is a free commandline tool available that can convert from djvu to one of the text-based formats that Greenstone can process, text or html. So in this case, you could try using the UnknownConverterPlugin with the commandline tool on the djvu files that you've gathered. The result should be that the djvu files in your Greenstone collection are now searchable.

Working with DjVu documents in Greenstone

DjVu (pronounced like the French phrase déjà vu) is a document format suited for archiving digital documents. DjVuLibre, which provides open source tools for processing DjVu documents, describes DjVu as

"a web-centric format and software platform for distributing documents and images. DjVu can advantageously replace PDF, PS, TIFF, JPEG, and GIF for distributing scanned documents, digital documents, or high-resolution pictures. DjVu content downloads faster, displays and renders faster, looks nicer on a screen, and consume less client resources than competing formats. DjVu images display instantly and can be smoothly zoomed and panned with no lengthy re-rendering. DjVu is used by hundreds of academic, commercial, governmental, and non-commercial web sites around the world."

In this part of the tutorial we'll see how to get Greenstone not just to include a collection's DjVu documents, but to make them searchable too.

There are several tools out there to convert a DjVu document into text or HTML. For instance, Linux users can install the ocrodjvu package and use its djvu2hocr tool to extract the text content in HTML format. Bien, a Greenstone user on the mailing list, has recommended it as being of possible use to Greenstone users, as it's a front-end to OCR programs.
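
The placeholder idea described above, where the conversion command is written once and the input and output file names are filled in for each document, can be sketched in a few lines of Python. This is only an illustration of the general technique, not Greenstone's own implementation. The placeholder tokens `{input}` and `{output}` and the helper names are hypothetical, invented for this sketch, and DjVuLibre's `djvutxt` tool is assumed as the converter.

```python
import shlex
from pathlib import Path

# Hypothetical placeholder tokens, chosen for this sketch only; they are not
# taken from Greenstone's documentation.
INPUT_PLACEHOLDER = "{input}"
OUTPUT_PLACEHOLDER = "{output}"


def build_command(template: str, input_file: str, output_file: str) -> list:
    """Split a command template into arguments, then fill in the placeholders."""
    # Splitting first means a file name containing spaces stays one argument.
    args = shlex.split(template)
    return [
        arg.replace(INPUT_PLACEHOLDER, input_file).replace(OUTPUT_PLACEHOLDER, output_file)
        for arg in args
    ]


def output_path_for(input_file: str, output_extension: str) -> str:
    """Derive the expected output file name by swapping the file extension."""
    return str(Path(input_file).with_suffix("." + output_extension))


if __name__ == "__main__":
    # Assumption: djvutxt (from DjVuLibre) is the converter; any commandline
    # tool that takes an input file and an output file would fit the pattern.
    template = "djvutxt {input} {output}"
    source = "report.djvu"
    target = output_path_for(source, "txt")
    print(build_command(template, source, target))
```

The argument list this produces could be handed to `subprocess.run` to perform the actual conversion. Substituting after the template has been split, rather than before, is a deliberate choice: it avoids quoting problems when a document's file name contains spaces.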