com.gargoylesoftware.htmlunit.html
Class HTMLParser

java.lang.Object
  extended by com.gargoylesoftware.htmlunit.html.HTMLParser

public final class HTMLParser
extends java.lang.Object

SAX parser implementation that uses the NekoHTML HTMLConfiguration to parse HTML into an HtmlUnit-specific DOM (HU-DOM) tree.

Note that the parser currently does not handle CDATA or comment sections; these do not appear in the resulting DOM tree.
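
The general idea of building a tree from SAX events can be sketched with the JDK alone. The example below is an illustration under stated assumptions, not HtmlUnit's implementation: it uses the JDK's XML SAX parser in place of NekoHTML, so it only accepts well-formed XHTML, and the Node and TreeBuilder classes are hypothetical stand-ins for the HU-DOM classes. It shows one way comments can end up absent from a tree: a plain SAX content handler has no comment callback (comments are reported only through org.xml.sax.ext.LexicalHandler), so a builder that listens to element and character events never sees them.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxTreeSketch {

    /** Hypothetical minimal tree node; HtmlUnit's real node classes are richer. */
    static final class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        final StringBuilder text = new StringBuilder();

        Node(String name) {
            this.name = name;
        }
    }

    /** Builds a Node tree from SAX element and character events only. */
    static final class TreeBuilder extends DefaultHandler {
        final Node root = new Node("#document");
        private final Deque<Node> stack = new ArrayDeque<>();

        @Override
        public void startDocument() {
            stack.push(root);
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // Attach the new element to the node currently being built.
            Node child = new Node(qName);
            stack.peek().children.add(child);
            stack.push(child);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            stack.pop();
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            stack.peek().text.append(ch, start, length);
        }
    }

    /** Parses well-formed XHTML with the JDK SAX parser (a stand-in for NekoHTML). */
    static Node parse(String xhtml) throws Exception {
        TreeBuilder builder = new TreeBuilder();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xhtml)), builder);
        return builder.root;
    }

    public static void main(String[] args) throws Exception {
        Node doc = parse("<html><body><!-- note --><p>Hi</p></body></html>");
        Node body = doc.children.get(0).children.get(0);
        System.out.println(body.children.size());      // prints 1: the comment produced no node
        System.out.println(body.children.get(0).name); // prints p
    }
}
```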

Version:
$Revision: 1.3 $
Author:
Christian Sell, David K. Taylor, Chris Erskine, Ahmed Ashour

Method Summary
static IElementFactory getFactory(java.lang.String tagName)
 Returns a factory for creating HtmlElements representing the given tag.
static boolean getIgnoreOutsideContent()
 Gets the state of the flag that controls whether content outside the BODY and HTML tags is ignored.
static HtmlPage parse(WebResponse webResponse, WebWindow webWindow)
 Parses the HTML content from the given WebResponse into an object tree representation.
static void parseFragment(DomNode parent, java.lang.String source)
 Parses the HTML content from the given string into an object tree representation.
static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
 Sets the flag that controls validation of the HTML content outside the BODY and HTML tags.
 
Methods inherited from class java.lang.Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

getFactory

public static IElementFactory getFactory(java.lang.String tagName)
Parameters:
tagName - an HTML element tag name
Returns:
a factory for creating HtmlElements representing the given tag

getIgnoreOutsideContent

public static boolean getIgnoreOutsideContent()
Gets the state of the flag that controls whether content outside the BODY and HTML tags is ignored.

Returns:
the current state of the flag

parse

public static HtmlPage parse(WebResponse webResponse,
                             WebWindow webWindow)
                      throws java.io.IOException
Parses the HTML content from the given WebResponse into an object tree representation.

Parameters:
webResponse - the response data
webWindow - the web window into which the page is to be loaded
Returns:
the page object which forms the root of the DOM tree, or null if the <HTML> tag is missing
Throws:
java.io.IOException - if an IO error occurs

parseFragment

public static void parseFragment(DomNode parent,
                                 java.lang.String source)
                          throws org.xml.sax.SAXException,
                                 java.io.IOException
Parses the HTML content from the given string into an object tree representation.

Parameters:
parent - the parent for the new nodes
source - the (X)HTML to be parsed
Throws:
org.xml.sax.SAXException - if a SAX error occurs
java.io.IOException - if an IO error occurs

setIgnoreOutsideContent

public static void setIgnoreOutsideContent(boolean ignoreOutsideContent)
Sets the flag that controls validation of the HTML content outside the BODY and HTML tags. This flag is false by default to maintain compatibility with current NekoHTML defaults.

Parameters:
ignoreOutsideContent - the new value of the flag
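
Because the flag is static, changing it affects every later parse in the same JVM. One common way to limit that effect is to save the current value, set the new one, and restore the old value in a finally block. The sketch below illustrates that pattern with the JDK only. ParserSettings is a hypothetical stand-in that mirrors the getter and setter documented above, and withIgnoreOutsideContent is an illustrative helper name, not part of HtmlUnit.

```java
import java.util.function.Supplier;

public class GlobalFlagSketch {

    /** Hypothetical stand-in for a class exposing a static boolean flag, false by default. */
    static final class ParserSettings {
        private static boolean ignoreOutsideContent = false;

        static boolean getIgnoreOutsideContent() {
            return ignoreOutsideContent;
        }

        static void setIgnoreOutsideContent(boolean value) {
            ignoreOutsideContent = value;
        }
    }

    /** Runs the action with the flag set to the given value, then restores the previous value. */
    static <T> T withIgnoreOutsideContent(boolean value, Supplier<T> action) {
        boolean previous = ParserSettings.getIgnoreOutsideContent();
        ParserSettings.setIgnoreOutsideContent(value);
        try {
            return action.get();
        } finally {
            // Restore even if the action throws, so later callers see the old setting.
            ParserSettings.setIgnoreOutsideContent(previous);
        }
    }

    public static void main(String[] args) {
        Boolean during = withIgnoreOutsideContent(true, ParserSettings::getIgnoreOutsideContent);
        System.out.println(during);                                   // prints true
        System.out.println(ParserSettings.getIgnoreOutsideContent()); // prints false
    }
}
```

This sketch is not thread-safe: two threads toggling the same static flag concurrently can observe each other's values, so code that parses on several threads would need external synchronization or a single setting chosen at startup.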


Copyright © 2003-2016 AppPerfect Corporation. All Rights Reserved.