HTML Agility Pack


This entry is part 1 of 5 in the series HTML Agility Pack

Alternate Solutions

Before I begin, I would like to point out that you might be able to scrape websites with Microsoft’s Power Automate for Desktop without writing any code. Check out the YouTube video by Leila Gharani called Web Scraping Made EASY With Power Automate Desktop – For FREE & ZERO Coding. If you are familiar with the R language, have a look at the YouTube video called Web Scrape Text from ANY Website – Web Scraping in R (Part 1). The presenter uses a browser extension called SelectorGadget along with two R libraries: rvest and dplyr. Another option is Python, where you would use the BeautifulSoup library.

The HTML Agility Pack is a very popular library: it has been downloaded more than 9 million times. What is it? HTML Agility Pack is a free and open source HTML parser for .NET. It is a .NET code library that allows you to parse “out of the web” HTML files, meaning real-world HTML that is often malformed. The input is an HTML file and the output is typically a subset of that file, such as the nodes you select. It is an agile HTML parser that builds a read/write DOM and supports plain XPath or XSLT (you don’t actually have to understand XPath or XSLT to use it, don’t worry…).
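
To make the "read/write DOM" and XPath ideas concrete, here is a minimal sketch rather than a definitive recipe. It assumes a console project using C# 9 or later (for top-level statements) with the HtmlAgilityPack package already installed. The sample markup and the example.com URLs are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

// Hypothetical sample markup; in practice this would come from a file or a website.
string html = "<html><body><h1>Links</h1>" +
              "<a href='https://example.com/a'>First</a>" +
              "<a href='https://example.com/b'>Second</a>" +
              "</body></html>";

// Parse the string into an in-memory DOM.
var doc = new HtmlDocument();
doc.LoadHtml(html);

// Read: SelectNodes takes an XPath expression and returns null when nothing matches.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        string href = link.GetAttributeValue("href", "");
        Console.WriteLine(link.InnerText + " -> " + href);

        // Write: the DOM is editable, so attributes can be added or changed.
        link.SetAttributeValue("rel", "nofollow");
    }
}

// Serialize the modified document back to HTML.
Console.WriteLine(doc.DocumentNode.OuterHtml);
```

The null check matters because SelectNodes returns null, not an empty collection, when the XPath expression matches nothing.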

For example, you could extract the data from every HTML table in an HTML file or on a website, with or without the surrounding HTML markup, and write the result to a CSV file.
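
The table-to-CSV idea might look roughly like the sketch below. It is only an illustration under stated assumptions: a C# 9 or later console project with HtmlAgilityPack installed, a made-up sample table, and a hypothetical helper named CsvEscape that is not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical sample table; a real page would be loaded from a file or URL.
string html = @"<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Widget, large</td><td>4.50</td></tr>
  <tr><td>Gadget &amp; Co</td><td>12.00</td></tr>
</table>";

var doc = new HtmlDocument();
doc.LoadHtml(html);

var csvLines = new List<string>();

// Every row of every table in the document; null when there are none.
HtmlNodeCollection rows = doc.DocumentNode.SelectNodes("//table//tr");
if (rows != null)
{
    foreach (HtmlNode row in rows)
    {
        // Header and data cells that are direct children of this row.
        HtmlNodeCollection cells = row.SelectNodes("./th|./td");
        if (cells == null) continue;

        // InnerText drops the markup; use InnerHtml instead to keep it.
        IEnumerable<string> fields = cells.Select(
            c => CsvEscape(HtmlEntity.DeEntitize(c.InnerText).Trim()));
        csvLines.Add(string.Join(",", fields));
    }
}

foreach (string line in csvLines)
    Console.WriteLine(line);

// Hypothetical helper: quote a field if it contains a comma, quote, or line break.
static string CsvEscape(string field)
{
    if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0) return field;
    return "\"" + field.Replace("\"", "\"\"") + "\"";
}
```

To work from a live page instead of a string, the library's HtmlWeb class can fetch and parse a URL (for example, new HtmlWeb().Load(url)), and the collected lines could be saved with File.WriteAllLines.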

Installation

In Visual Studio, you can install this library with the NuGet Package Manager. Open the Package Manager Console from the menus Tools, NuGet Package Manager, Package Manager Console. It is a command-line interface inside Visual Studio. To install the HTML Agility Pack, simply type the following at the PM> prompt:

Install-Package HtmlAgilityPack -Version 1.8.8

You can find the latest version on the HTML Agility Pack download page: clicking the red button called NuGet Download takes you to the NuGet download page. Omitting the -Version argument from the command above installs the latest stable release.
