Fixing StackOverflowException in the HtmlAgilityPack library

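I recently tried HtmlAgilityPack for parsing HTML, and while testing it my program threw a StackOverflowException. As MSDN explains, starting with .NET Framework 2.0 a StackOverflowException cannot be caught with a try-catch block, and by default the process is terminated.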


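Digging into the cause, I found that when the HTML structure is very complex (deeply nested), HtmlAgilityPack's recursion goes so deep that it exhausts the thread's stack, which produces the StackOverflowException. A Google search turned up the following solution.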
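Why nesting depth is what matters can be sketched with a recursive serializer in the style of a WriteTo method. Element, RecursionDepthDemo, BuildNested and Write are hypothetical names for illustration, not HtmlAgilityPack's API; the point is only that every nesting level adds one call frame, so a document nested N levels deep keeps N frames on the stack at the same time.

```csharp
using System;
using System.Text;

// Hypothetical element type used only for illustration; it is not HtmlAgilityPack's HtmlNode.
public sealed class Element
{
    public string Name = "div";
    public Element Child; // a single child is enough to model nesting depth
}

public static class RecursionDepthDemo
{
    // Builds <div><div>...</div></div> nested `depth` levels deep, using a loop (no recursion).
    public static Element BuildNested(int depth)
    {
        var root = new Element();
        var current = root;
        for (int i = 1; i < depth; i++)
        {
            current.Child = new Element();
            current = current.Child;
        }
        return root;
    }

    // Recursive serializer in the spirit of a WriteTo method: every nesting level
    // adds one call frame, and `maxLevel` records the deepest level reached.
    public static void Write(Element node, StringBuilder sb, int level, ref int maxLevel)
    {
        maxLevel = Math.Max(maxLevel, level);
        sb.Append('<').Append(node.Name).Append('>');
        if (node.Child != null)
            Write(node.Child, sb, level + 1, ref maxLevel);
        sb.Append("</").Append(node.Name).Append('>');
    }
}
```

Writing a 500-level document starting at level 1 leaves maxLevel at 500: the call stack is exactly as deep as the markup. A thread's stack is finite (typically 1 MB by default on Windows), so with enough nesting the same code overflows it.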

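First, add the following class to the library. It calls the Win32 VirtualQuery function through P/Invoke and uses pointers, so it only works on Windows and the project must be compiled with unsafe code enabled (/unsafe, or AllowUnsafeBlocks in the project file):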

using System;
using System.Runtime.InteropServices;

public static class StackChecker
{
    // Returns true if more than `bytes` bytes of stack remain on the current thread
    // beyond STACK_RESERVED_SPACE. Windows only (kernel32!VirtualQuery); needs /unsafe.
    public static unsafe bool HasSufficientStack(long bytes)
    {
        var stackInfo = new MEMORY_BASIC_INFORMATION();

        // We subtract one page for our request. VirtualQuery rounds UP to the next page.
        // Unfortunately, the stack grows down. If we're on the first page (last page in the
        // VirtualAlloc), we'll be moved to the next page, which is off the stack! Note this
        // doesn't work right for IA64 due to bigger pages.
        // Pointer arithmetic keeps the full address; a (uint) cast would truncate it in a 64-bit process.
        byte* currentAddr = (byte*)&stackInfo - 4096;

        // Query for the current stack allocation information.
        VirtualQuery((IntPtr)currentAddr, ref stackInfo,
            new UIntPtr((uint)sizeof(MEMORY_BASIC_INFORMATION)));

        // If the current address minus the base (remember: the stack grows downward in the
        // address space) is greater than the number of bytes requested plus the reserved
        // space at the end, the request has succeeded.
        long remaining = currentAddr - (byte*)stackInfo.AllocationBase.ToPointer();
        return remaining > bytes + STACK_RESERVED_SPACE;
    }

    // We are conservative here. We assume that the platform needs a whole 16 pages to
    // respond to stack overflow (using an x86/x64 page-size, not IA64). That's 64KB,
    // which means that for very small stacks (e.g. 128KB) we'll fail a lot of stack checks
    // incorrectly.
    private const long STACK_RESERVED_SPACE = 4096 * 16;

    // SIZE_T parameters and return value map to UIntPtr.
    [DllImport("kernel32.dll")]
    private static extern UIntPtr VirtualQuery(
        IntPtr lpAddress,
        ref MEMORY_BASIC_INFORMATION lpBuffer,
        UIntPtr dwLength);

    // Pointer-sized fields (IntPtr/UIntPtr) give the correct layout in both 32-bit and 64-bit processes.
    [StructLayout(LayoutKind.Sequential)]
    private struct MEMORY_BASIC_INFORMATION
    {
        internal IntPtr BaseAddress;
        internal IntPtr AllocationBase;
        internal uint AllocationProtect;
        internal UIntPtr RegionSize;
        internal uint State;
        internal uint Protect;
        internal uint Type;
    }
}
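The final comparison in HasSufficientStack can be checked by hand. The sketch below restates it as a pure function over plain numbers so it runs on any platform; StackMath and the addresses used are hypothetical, chosen only to make the arithmetic concrete.

```csharp
using System;

// Platform-independent restatement of the comparison at the end of
// StackChecker.HasSufficientStack, with addresses passed in as plain numbers.
public static class StackMath
{
    // Same value as STACK_RESERVED_SPACE: 16 pages of 4 KB = 65,536 bytes kept in reserve.
    public const long StackReservedSpace = 4096 * 16;

    // The stack grows downward, so AllocationBase is the lowest address of the stack
    // reservation; the distance from the current address down to it is the space left.
    public static long RemainingBytes(long currentAddr, long allocationBase) =>
        currentAddr - allocationBase;

    // True only if the remaining space exceeds the request plus the reserve.
    public static bool HasSufficientStack(long currentAddr, long allocationBase, long bytes) =>
        RemainingBytes(currentAddr, allocationBase) > bytes + StackReservedSpace;
}
```

With a hypothetical current address of 0x0012F000 and an AllocationBase of 0x00030000, 0xFF000 = 1,044,480 bytes remain, comfortably more than the 4,096 + 65,536 = 69,632 bytes that a request of 4 * 1024 requires. At 0x10000 (65,536) bytes above the base the same check returns false, because the remaining space does not even cover the reserve plus the 4 KB request.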


Then, in the methods that recurse heavily (such as HtmlNode.WriteTo(TextWriter outText) and HtmlNode.WriteTo(XmlWriter writer)), add the following check before the method recurses:

if (!StackChecker.HasSufficientStack(4 * 1024))
    throw new Exception("The document is too complex to parse");
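The guard above relies on P/Invoke and unsafe code, so it only runs on Windows. As a different technique (not the one this post uses), .NET Framework 4.0 and later provide RuntimeHelpers.EnsureSufficientExecutionStack(), which performs a similar remaining-stack check and throws InsufficientExecutionStackException. The sketch below places that call where the post places StackChecker.HasSufficientStack; GuardedNode, GuardedWriter, BuildNested and TryWrite are hypothetical names for illustration, and how much stack the runtime insists on keeping free is its own implementation detail.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Text;

// Hypothetical node type for illustration only; not HtmlAgilityPack's HtmlNode.
public sealed class GuardedNode
{
    public string Name = "div";
    public GuardedNode Child;
}

public static class GuardedWriter
{
    // Builds a chain of nodes `depth` levels deep without recursing.
    public static GuardedNode BuildNested(int depth)
    {
        var root = new GuardedNode();
        var current = root;
        for (int i = 1; i < depth; i++)
        {
            current.Child = new GuardedNode();
            current = current.Child;
        }
        return root;
    }

    // The guard sits at the top of the recursive method, before it recurses.
    // EnsureSufficientExecutionStack throws InsufficientExecutionStackException
    // when too little stack is left.
    public static void WriteTo(GuardedNode node, StringBuilder sb)
    {
        RuntimeHelpers.EnsureSufficientExecutionStack();
        sb.Append('<').Append(node.Name).Append('>');
        if (node.Child != null)
            WriteTo(node.Child, sb);
        sb.Append("</").Append(node.Name).Append('>');
    }

    // Unlike StackOverflowException, the guard's exception can be caught, so an overly
    // deep document is reported as a failure instead of terminating the process.
    public static bool TryWrite(GuardedNode root, out string html)
    {
        var sb = new StringBuilder();
        try
        {
            WriteTo(root, sb);
            html = sb.ToString();
            return true;
        }
        catch (InsufficientExecutionStackException)
        {
            html = null;
            return false;
        }
    }
}
```

A three-level document serializes normally, while a document nested a million levels deep makes TryWrite return false instead of crashing the process.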


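That's it: a document that is nested too deeply now raises an ordinary exception that the caller can catch, instead of terminating the process.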
