Get The Available XPaths Of An HTML Page?
I've taken and adapted this code showing how to retrieve the XPath expressions of an XML document. I would like to do the same with an HTML page to retrieve its available XPaths.
Solution 1:
As far as I can see, HtmlAgilityPack has a class structure very similar to XmlDocument, so I believe you can easily adapt the current solution to work with HtmlDocument, something like this:
Imports System.Collections.Generic
Imports HtmlAgilityPack

' Returns the distinct element paths found in the document, e.g. "/html/body/div".
Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    Dim XPath As String = String.Empty

    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' Child

    Return XPathList
End Function

' Appends the path of Node to the list (if not already present),
' then recurses into its element children.
Private Sub GetXPaths(ByVal Node As HtmlNode,
                      ByVal XPathList As List(Of String),
                      Optional ByVal XPath As String = Nothing)
    XPath &= "/" & Node.Name

    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If

    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next ' Child
End Sub
This worked fine when I tested it with HTML that is XML-compliant, but I can't guarantee how well it will work against malformed HTML documents.
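For comparison, HtmlAgilityPack nodes also expose an XPath property, which should give a position-indexed path for each individual node rather than the de-duplicated name-only paths built above. The following is only a minimal sketch of that alternative, assuming the HtmlAgilityPack package is referenced; the module name and the sample markup are hypothetical and exist purely for illustration.

```vbnet
Imports System
Imports HtmlAgilityPack

Module XPathDemo

    Sub Main()
        ' Hypothetical sample markup (an assumption, not from the question).
        Dim Html As String =
            "<html><body><div><p>One</p><p>Two</p></div></body></html>"

        Dim Document As New HtmlDocument()
        Document.LoadHtml(Html)

        ' Walk every descendant of the document root and print the
        ' library-generated path of each element node. Text and comment
        ' nodes are skipped, mirroring the NodeType check used above.
        For Each Node As HtmlNode In Document.DocumentNode.Descendants()
            If Node.NodeType = HtmlNodeType.Element Then
                Console.WriteLine(Node.XPath)
            End If
        Next ' Node
    End Sub

End Module
```

Because these paths carry positional indexes, the two sibling p elements in the sample would be listed separately, whereas GetXPaths above reports a repeated path only once. Which form is more useful depends on whether you want one expression per node or one per distinct structure.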