HTMLParser (AppPerfect Scripting API)

com.gargoylesoftware.htmlunit.html
Class HTMLParser

java.lang.Object
  com.gargoylesoftware.htmlunit.html.HTMLParser

public final class HTMLParser
extends java.lang.Object

SAX parser implementation that uses the NekoHTML HTMLConfiguration to parse HTML into an HtmlUnit-specific DOM (HU-DOM) tree.

Note that the parser currently does not handle CDATA or comment sections, so these do not appear in the resulting DOM tree.

Version:: $Revision: 1.3 $
Author:: Christian Sell, David K. Taylor, Chris Erskine, Ahmed Ashour

Method Summary
`static IElementFactory`	`getFactory(java.lang.String tagName)`
`static boolean`	`getIgnoreOutsideContent()` Gets the state of the flag to ignore content outside the BODY and HTML tags.
`static HtmlPage`	`parse(WebResponse webResponse, WebWindow webWindow)` Parses the HTML content from the given WebResponse into an object tree representation.
`static void`	`parseFragment(DomNode parent, java.lang.String source)` Parses the HTML content from the given string into an object tree representation.
`static void`	`setIgnoreOutsideContent(boolean ignoreOutsideContent)` Set the flag to control validation of the HTML content that is outside of the BODY and HTML tags.

Methods inherited from class java.lang.Object
`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`

Method Detail

getFactory

public static IElementFactory getFactory(java.lang.String tagName)

Parameters:: tagName - an HTML element tag name
Returns:: a factory for creating HtmlElements representing the given tag
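
The lookup described above, from a tag name to a factory for that tag, can be illustrated with a small registry. This is only a sketch of the general pattern: `ElementFactorySketch` and `FactoryRegistrySketch` are hypothetical stand-ins rather than HtmlUnit types, and the lower-casing of the tag name and the fallback for unregistered tags are assumptions of the sketch. How HTMLParser actually resolves a factory is not stated on this page.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in; NOT the HtmlUnit IElementFactory interface.
interface ElementFactorySketch {
    // Returns a description string instead of a real HtmlElement.
    String create(String tagName);
}

// Hypothetical registry illustrating a tag-name-to-factory lookup.
final class FactoryRegistrySketch {
    private static final Map<String, ElementFactorySketch> FACTORIES = new HashMap<>();

    // Assumed fallback for tags that have no dedicated factory.
    private static final ElementFactorySketch UNKNOWN = tag -> "UnknownElement<" + tag + ">";

    static {
        FACTORIES.put("a", tag -> "Anchor<" + tag + ">");
        FACTORIES.put("form", tag -> "Form<" + tag + ">");
    }

    private FactoryRegistrySketch() {
    }

    // Looks up the factory by lower-cased tag name (an assumption of this sketch).
    static ElementFactorySketch getFactory(String tagName) {
        ElementFactorySketch factory = FACTORIES.get(tagName.toLowerCase(Locale.ROOT));
        return factory != null ? factory : UNKNOWN;
    }

    public static void main(String[] args) {
        System.out.println(getFactory("A").create("a"));         // prints Anchor<a>
        System.out.println(getFactory("blink").create("blink")); // prints UnknownElement<blink>
    }
}
```

Returning a fallback factory rather than null keeps callers free of null checks; whether the real method does the same would need to be confirmed against the HtmlUnit source.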

getIgnoreOutsideContent

public static boolean getIgnoreOutsideContent()

Gets the state of the flag to ignore content outside the BODY and HTML tags.

Returns:: the current state of the flag

parse

public static HtmlPage parse(WebResponse webResponse,
                             WebWindow webWindow)
                      throws java.io.IOException

Parses the HTML content from the given WebResponse into an object tree representation.

Parameters:: webResponse - the response data; webWindow - the web window into which the page is to be loaded
Returns:: the page object which forms the root of the DOM tree, or null if the <HTML> tag is missing
Throws:: java.io.IOException - if an IO error occurs

parseFragment

public static void parseFragment(DomNode parent,
                                 java.lang.String source)
                          throws org.xml.sax.SAXException,
                                 java.io.IOException

Parses the HTML content from the given string into an object tree representation.

Parameters:: parent - the parent for the new nodes; source - the (X)HTML to be parsed
Throws:: org.xml.sax.SAXException - if a SAX error occurs; java.io.IOException - if an IO error occurs

setIgnoreOutsideContent

public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)

Set the flag to control validation of the HTML content that is outside of the BODY and HTML tags. This flag is false by default to maintain compatibility with current NekoHTML defaults.

Parameters:: ignoreOutsideContent - boolean flag to set
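
Because the getter and setter are static, the flag is shared by every caller in the same class loader. One way to change it for a single operation is to save the current value, set the new one, and restore the old value in a `finally` block. The sketch below shows that pattern against `ParserFlagSketch`, a hypothetical stand-in that only mirrors the two accessor signatures and the documented default of false; it is not the HtmlUnit class, and `withIgnoreOutsideContent` is a helper invented for this illustration.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the static flag on HTMLParser; NOT the HtmlUnit class.
final class ParserFlagSketch {
    // The page states that the flag is false by default.
    private static boolean ignoreOutsideContent = false;

    private ParserFlagSketch() {
    }

    static boolean getIgnoreOutsideContent() {
        return ignoreOutsideContent;
    }

    static void setIgnoreOutsideContent(boolean value) {
        ignoreOutsideContent = value;
    }

    // Hypothetical helper: runs the task with the flag set to `value`,
    // then restores the previous value even if the task throws.
    static <T> T withIgnoreOutsideContent(boolean value, Supplier<T> task) {
        boolean previous = getIgnoreOutsideContent();
        setIgnoreOutsideContent(value);
        try {
            return task.get();
        } finally {
            setIgnoreOutsideContent(previous);
        }
    }

    public static void main(String[] args) {
        Boolean seenInside = withIgnoreOutsideContent(true, ParserFlagSketch::getIgnoreOutsideContent);
        System.out.println("inside: " + seenInside);                // prints inside: true
        System.out.println("after: " + getIgnoreOutsideContent());  // prints after: false
    }
}
```

With the real class, the task would typically be a call to `parse`, which throws `java.io.IOException`, so the helper would need a functional interface that permits checked exceptions. The save-and-restore pattern does not make the flag safe to change from several threads at once.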
