
Get The Available XPaths Of An HTML Page?

I've taken and adapted this code, which retrieves the XPath expressions of an XML document. I would like to do the same with an HTML page to retrieve its available XPaths (

Solution 1:

As far as I can see, HtmlAgilityPack has a class structure very similar to that of XmlDocument, so I believe you can easily adapt the current solution to work with HtmlDocument, something like this:

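Imports System.Collections.Generic
Imports HtmlAgilityPack

' Returns every distinct element path (for example /html/body/div) found in the document.
Public Function GetXPaths(ByVal Document As HtmlDocument) As List(Of String)
    Dim XPathList As New List(Of String)
    For Each Child As HtmlNode In Document.DocumentNode.ChildNodes
        ' Skip text, comment, and other non-element nodes.
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, String.Empty)
        End If
    Next Child
    Return XPathList
End Function

' Recursive helper: appends this node's name to the parent path, records the
' result once, then descends into the element children.
' The list is a reference type, so ByVal is enough for additions to be visible to the caller.
Private Sub GetXPaths(ByVal Node As HtmlNode,
                  ByVal XPathList As List(Of String),
                  Optional ByVal XPath As String = Nothing)
    XPath &= "/" & Node.Name
    ' Repeated siblings (several <p> under one <div>) yield the same path, so store it only once.
    If Not XPathList.Contains(XPath) Then
        XPathList.Add(XPath)
    End If
    For Each Child As HtmlNode In Node.ChildNodes
        If Child.NodeType = HtmlNodeType.Element Then
            GetXPaths(Child, XPathList, XPath)
        End If
    Next Child
End Sub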

This worked fine when I tested it with HTML that is also well-formed XML, but I can't guarantee how well it will work against malformed HTML documents.
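Given that caveat about malformed input, it may help to look at what the parser reports before trusting the resulting paths. The following is only a rough sketch, not part of the tested solution above: the module name `XPathDemo` and the sample markup (with a deliberately unclosed `<p>`) are made up for illustration. It relies on HtmlAgilityPack's `HtmlDocument.ParseErrors` collection and on the `HtmlNode.XPath` property, which returns the position-indexed path of a single node rather than the de-duplicated name-only paths built by `GetXPaths`.

```vb
Imports System
Imports HtmlAgilityPack

Module XPathDemo ' hypothetical module name, for illustration only
    Sub Main()
        ' Sample markup is an assumption; the first <p> is intentionally left unclosed.
        Dim Html As String = "<html><body><div><p>one<p>two</div></body></html>"

        Dim Document As New HtmlDocument()
        Document.LoadHtml(Html)

        ' List whatever problems the parser recorded while reading the markup.
        For Each ParseError As HtmlParseError In Document.ParseErrors
            Console.WriteLine("Line " & ParseError.Line & ": " & ParseError.Reason)
        Next ParseError

        ' Print the indexed XPath that HtmlAgilityPack computes for each element node.
        For Each Node As HtmlNode In Document.DocumentNode.Descendants()
            If Node.NodeType = HtmlNodeType.Element Then
                Console.WriteLine(Node.XPath)
            End If
        Next Node
    End Sub
End Module
```

If the error list is not empty, the tree HtmlAgilityPack built may differ from what a browser would produce, so the paths collected from it are worth checking by hand.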

