Recipe 20.5 Converting HTML to ASCII

20.5.1 Problem

You want to convert an HTML file into formatted, plain ASCII. For example, you want to mail a web document to someone.

20.5.2 Solution

If you have an external formatter such as lynx installed, run it and capture its output:

$ascii = `lynx -dump $filename`;
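The backtick call hands $filename to the shell, so a name containing spaces or characters such as ; or | would be misread. Here is a minimal sketch of a variant that bypasses the shell. It assumes Perl 5.8 or later for the list form of a piped open, and lynx_dump is a hypothetical helper name:

```perl
use strict;
use warnings;

# Run "lynx -dump FILE" without a shell, so the filename reaches lynx
# unchanged. lynx_dump is a hypothetical name chosen for this sketch.
sub lynx_dump {
    my ($filename) = @_;
    open(my $pipe, '-|', 'lynx', '-dump', $filename)
        or die "Can't run lynx: $!";
    my $ascii = do { local $/; <$pipe> };   # slurp all of lynx's output
    close($pipe) or die "lynx failed (wait status $?)\n";
    return defined $ascii ? $ascii : '';
}
```

Usage is the same as before, for example $ascii = lynx_dump($filename).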

If you want to do the conversion within your program, and you can live without the things HTML::FormatText doesn't yet handle well (tables and frames), use that module:

use HTML::FormatText 3;
$ascii = HTML::FormatText->format_file(
  $filename,
  leftmargin => 0, rightmargin => 50
);
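As a usage illustration, the format_file call can be wrapped in a small command-line filter. This is a sketch only: print_as_ascii is a hypothetical helper, the 50-column width simply mirrors the rightmargin above, and the module version number is left off:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::FormatText;

# Print each named HTML file as plain text wrapped at $width columns.
# print_as_ascii is a hypothetical helper, not part of HTML::FormatText.
sub print_as_ascii {
    my ($width, @files) = @_;
    for my $filename (@files) {
        my $ascii = HTML::FormatText->format_file(
            $filename,
            leftmargin => 0, rightmargin => $width,
        );
        print $ascii;
    }
}

# Treat the command-line arguments as the files to convert.
print_as_ascii(50, @ARGV);
```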

20.5.3 Discussion

These examples both assume the HTML is in a file. If your HTML is in a variable, you need to write it to a file for lynx to read. With HTML::FormatText, use the format_string( ) method:

use HTML::FormatText 3;
$ascii = HTML::FormatText->format_string(
  $html,    # the HTML text itself, not a filename
  leftmargin => 0, rightmargin => 50
);
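lynx, by contrast, needs the HTML in a file. One way to handle that is sketched below with the core File::Temp module. The helper name html_string_to_ascii_via_lynx is hypothetical, and the .html suffix is an assumption meant to help lynx recognize the content as HTML:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the HTML string to a temporary .html file, then run lynx on it.
# html_string_to_ascii_via_lynx is a hypothetical name for this sketch.
sub html_string_to_ascii_via_lynx {
    my ($html) = @_;
    my ($tmp_fh, $tmp_name) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$tmp_fh} $html;
    close($tmp_fh) or die "Can't write $tmp_name: $!";

    # List-form pipe open: no shell is involved.
    open(my $pipe, '-|', 'lynx', '-dump', $tmp_name)
        or die "Can't run lynx: $!";
    my $ascii = do { local $/; <$pipe> };
    close($pipe) or die "lynx failed (wait status $?)\n";
    return defined $ascii ? $ascii : '';
}
```

With UNLINK => 1, File::Temp removes the temporary file when the program exits.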

If you use Netscape, saving the page with its "Save As" option and the type set to "Text" handles tables better than either approach above.
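HTML::FormatText also has an object interface that works on a parse tree from HTML::TreeBuilder, the other module listed below. The following is a minimal sketch of that route. html_file_to_ascii is a hypothetical wrapper name. For HTML already in a string, HTML::TreeBuilder->new_from_content can take the place of new_from_file:

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Parse the file into a tree, then let a formatter object render it.
# html_file_to_ascii is a hypothetical wrapper, not a module function.
sub html_file_to_ascii {
    my ($filename) = @_;
    my $tree      = HTML::TreeBuilder->new_from_file($filename);
    my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);
    my $ascii     = $formatter->format($tree);
    $tree->delete;    # release the parse tree explicitly
    return $ascii;
}
```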

20.5.4 See Also

The documentation for the CPAN modules HTML::TreeBuilder and HTML::FormatText; your system's lynx(1) manpage; Recipe 20.6.
