
[Chapter 17] 17.4 The HTML Module




- eof



When the parse or parse_file method is called, it parses the incoming HTML with a few internal methods.

In HTML::Parser, these methods are defined, but empty. Additional HTML parsing classes (included in the

HTML modules or ones you write yourself) override these methods for their own purposes. For example:

    package HTML::MyParser;
    require HTML::Parser;
    @ISA = qw(HTML::Parser);   # inherit from HTML::Parser, not from this package

    sub start {
        # your subroutine defined here
    }

The following list shows the internal methods contained in HTML::Parser:

- comment
- declaration
- end
- start
- text
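As a rough illustration of overriding these methods, the sketch below counts start tags and collects text. The package name TagCounter and the hash keys tag_count and seen_text are made up for this example. It assumes the subclassing interface described above, in which start receives the tag name, an attribute hash, the attribute order, and the original text, and text receives the text itself.

```perl
use strict;
use warnings;

# TagCounter is a hypothetical subclass name used only for this sketch.
package TagCounter;
use HTML::Parser ();
our @ISA = ('HTML::Parser');

# Called for each start tag; $tag is the tag name.
sub start {
    my ($self, $tag, $attr, $attrseq, $origtext) = @_;
    $self->{tag_count}{$tag}++;
}

# Called for plain text; the text may arrive in several pieces.
sub text {
    my ($self, $text) = @_;
    $self->{seen_text} .= $text;
}

package main;

my $p = TagCounter->new;
$p->parse('<html><body><p>Hello</p><p>World</p></body></html>');
$p->eof;    # signal the end of the input

for my $tag (sort keys %{ $p->{tag_count} }) {
    print "$tag: $p->{tag_count}{$tag}\n";
}
print "text: $p->{seen_text}\n";
```

Keeping the counters on the parser object itself avoids global variables, so several parsers can run side by side.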



17.4.2 HTML::Element

The HTML::Element module provides methods for dealing with nodes in an HTML syntax tree. You can get or

set the contents of each node, traverse the tree, and delete a node.

HTML::Element objects are used to represent elements of HTML. These elements include start and end tags,

attributes, contained plain text, and other nested elements.

The constructor for this class requires the name of the tag for its first argument. You may optionally specify initial

attributes and values as hash elements in the constructor. For example:

$h = HTML::Element->new('a', 'href' => 'http://www.oreilly.com');

The new element is created for the anchor tag, <a>, which links to the URL through its href attribute.

The following methods are provided for objects of the HTML::Element class:

- as_HTML
- attr
- content
- delete
- delete_content
- dump
- endtag
- extract_links
- implicit
- insert_element
- is_empty
- is_inside
- parent
- pos
- push_content
- starttag
- tag
- traverse
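A few of these methods can be tried on the anchor element built earlier. This is a minimal sketch: the link text and the title value are invented for illustration, and the exact markup printed by as_HTML may vary between versions of the module.

```perl
use strict;
use warnings;
use HTML::Element;

# Build the anchor element from the constructor example.
my $h = HTML::Element->new('a', href => 'http://www.oreilly.com');

# Add plain text inside the element (illustrative link text).
$h->push_content('Perl books');

print $h->tag, "\n";                   # the tag name
print $h->attr('href'), "\n";          # read an attribute
$h->attr('title', 'Publisher site');   # set an attribute (illustrative value)

# Render the element and its content as an HTML string.
print $h->as_HTML, "\n";
```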



17.4.3 HTML::TreeBuilder

The HTML::TreeBuilder class provides a parser that creates an HTML syntax tree. Each node of the tree is an

HTML::Element object. This class inherits from both HTML::Parser and HTML::Element, so methods from both of

those classes can be used on its objects.

The methods provided by HTML::TreeBuilder control how the parsing is performed. Each method takes a

boolean argument that turns the corresponding behavior on or off. Here are the methods:

- implicit_tags
- ignore_unknown
- ignore_text
- warn
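The following sketch shows one way these pieces might fit together: the parsing options are set first, a small document is parsed with the inherited parse and eof methods, and the inherited extract_links method then walks the resulting tree. The HTML string is invented for the example, and the option values shown are not necessarily the defaults.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

my $tree = HTML::TreeBuilder->new;

# Set the parsing options before any input is parsed.
$tree->implicit_tags(1);    # supply missing tags such as <html> and <body>
$tree->ignore_unknown(1);   # skip tags the parser does not recognize
$tree->warn(0);             # do not report syntax problems

# parse and eof are inherited from HTML::Parser.
$tree->parse('<p>See <a href="http://www.oreilly.com">the catalog</a>.</p>');
$tree->eof;

# extract_links is inherited from HTML::Element; each entry is an array
# reference whose first two items are the link and the element holding it.
my @links;
for my $entry (@{ $tree->extract_links('a') }) {
    my ($link, $element) = @$entry;
    push @links, $link;
}
print "$_\n" for @links;

$tree->delete;    # release the tree when finished
```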



17.4.4 HTML::FormatPS

The HTML::FormatPS module converts an HTML parse tree into PostScript. The formatter object is created with

the new constructor, which can take parameters that assign PostScript attributes. For example:

$formatter = new HTML::FormatPS('papersize' => 'Letter');

You can now give parsed HTML to the formatter and produce PostScript output for printing. HTML::FormatPS

does not handle table or form elements at this time.

The method for this class is format. It takes a reference to an HTML::TreeBuilder object, representing a

parsed HTML document. It returns a scalar containing the document formatted in PostScript. The following

example shows how to use this module to print a file in PostScript:

    use HTML::TreeBuilder;
    use HTML::FormatPS;

    # 'somefile' is a placeholder for the path to an HTML file
    $html = HTML::TreeBuilder->new;
    $html->parse_file('somefile');
    $formatter = new HTML::FormatPS;
    print $formatter->format($html);

The following list describes the attributes that can be set in the constructor:

PaperSize

Possible values are A3, A4, A5, B4, B5, Letter, Legal, Executive, Tabloid, Statement, Folio, 10x14, and

Quarto. The default is A4.

PaperWidth

Width of the paper in points.






PaperHeight

Height of the paper in points.

LeftMargin

Left margin in points.

RightMargin

Right margin in points.

HorizontalMargin

Left and right margin. Default is 4 cm.

TopMargin

Top margin in points.

BottomMargin

Bottom margin in points.

VerticalMargin

Top and bottom margin. Default is 2 cm.

PageNo

Boolean value to display page numbers. Default is 0 (off).

FontFamily

Font family to use on the page. Possible values are Courier, Helvetica, and Times. Default is Times.

FontScale

Scale factor for the font.

Leading

Space between lines, as a factor of the font size. Default is 0.1.



17.4.5 HTML::FormatText

The HTML::FormatText module takes a parsed HTML file and outputs a plain-text version of it. Character

attributes such as bold or italic fonts and font sizes are not rendered.

This module is similar to FormatPS in that the constructor takes attributes for formatting, and the format

method produces the output. A formatter object can be constructed like this:

$formatter = new HTML::FormatText (leftmargin => 10, rightmargin => 80);

The constructor can take two parameters: leftmargin and rightmargin. The value for the margins is given

in column numbers. The aliases lm and rm can also be used.

The format method takes an HTML::TreeBuilder object and returns a scalar containing the formatted text. You

can print it with:

print $formatter->format($html);
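Putting the steps together, a complete run might look like the sketch below. The HTML string and the margin values are chosen only for illustration, and the exact layout of the output depends on the version of the module.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use HTML::FormatText;

# Build a parse tree from a small, made-up document.
my $html = HTML::TreeBuilder->new;
$html->parse('<h1>Title</h1><p>Some <b>bold</b> text.</p>');
$html->eof;

# Margins are given in column numbers.
my $formatter = HTML::FormatText->new(leftmargin => 0, rightmargin => 50);

# The bold markup is dropped; only the words remain.
my $text = $formatter->format($html);
print $text;

$html->delete;    # release the tree when finished
```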






[Chapter 17] 17.5 The URI Module



Chapter 17

The LWP Library



17.5 The URI Module

The URI module contains functions and modules to specify and convert URIs. (URLs are a type of URI.)

There are three URI modules: URL, Escape, and Heuristic. Of primary importance to many LWP

applications is the URI::URL class, which creates the objects used by LWP::UserAgent to determine

protocols, server locations, and resource names.

The URI::Escape module replaces unsafe characters in URL strings with their appropriate escape

sequences. URI::Heuristic provides convenience methods for creating proper URLs out of short strings

and incomplete addresses.



17.5.1 URI::Escape

This module escapes or unescapes "unsafe" characters within a URL string. Unsafe characters in URLs

are described by RFC 1738. Before you form URI::URL objects and use that class's methods, you should

make sure your strings are properly escaped. This module does not create its own objects; it exports the

following functions:

- uri_escape
- uri_unescape
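As a quick sketch of the two functions, the example below escapes a file name containing spaces and an ampersand, then reverses the operation. The file name is invented for the example.

```perl
use strict;
use warnings;
use URI::Escape;    # exports uri_escape and uri_unescape

my $raw     = 'my file & notes.html';
my $escaped = uri_escape($raw);

# Spaces become %20 and the ampersand becomes %26.
print "$escaped\n";

# uri_unescape turns the escape sequences back into characters.
my $restored = uri_unescape($escaped);
print "$restored\n";
```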



17.5.2 URI::URL

This module creates URL objects that store all the elements of a URL. These objects are used by the

request method of LWP::UserAgent for server addresses, port numbers, file names, protocol, and

many of the other elements that can be loaded into a URL.

The new constructor is used to make a URI::URL object:

$url = new URI::URL($url_string [, $base_url])

This method creates a new URI::URL object with the URL given as the first parameter. An optional base

URL can be specified as the second parameter and is useful for generating an absolute URL from a

relative URL.

The following list describes the methods for the URI::URL class:

- abs
- as_string
- base
- crack
- default_port
- eparams
- epath
- eq
- equery
- frag
- full_path
- host
- netloc
- params
- password
- path
- port
- query
- rel
- scheme
- strict
- user
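To see the base URL parameter and a few of these methods in action, the sketch below resolves a relative URL against a base and then reads back the parts of the result. Both URL strings are invented for the example.

```perl
use strict;
use warnings;
use URI::URL;

# A relative URL plus the base it should be resolved against
# (both strings are illustrative).
my $url = URI::URL->new('../images/logo.gif',
                        'http://www.oreilly.com/catalog/perl/index.html');

# abs returns a new URI::URL object holding the absolute URL.
my $abs = $url->abs;

print $abs->as_string, "\n";
print 'scheme: ', $abs->scheme, "\n";
print 'host:   ', $abs->host,   "\n";
print 'port:   ', $abs->port,   "\n";
print 'path:   ', $abs->path,   "\n";
```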


