Web scraping application doesn't work anymore when moved into a Docker container
```csharp
public static HtmlNode GetHtml(string link)
{
    _scrapingBrowser.IgnoreCookies = true;
    _scrapingBrowser.Timeout = TimeSpan.FromMinutes(15);
    _scrapingBrowser.Headers["User-Agent"] = "Mozilla/4.0 (Compatible; Windows NT 5.1; MSIE 6.0)" +
        " (compatible; MSIE 6.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)";
    _scrapingBrowser.Encoding = System.Text.Encoding.UTF8;
    WebPage _webPage = _scrapingBrowser.NavigateToPage(new Uri(link));
    return _webPage.Html;
}
```
This is the code I'm using to grab the data off the web page. It works fine when I run it on my desktop, but it stops working when I run it in a Docker container. I think it has to do with the encoding, but I'm not sure.
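One thing worth ruling out on the encoding front: desktop .NET Framework ships every Windows code page, but .NET Core (which the Linux Docker image runs) only bundles the Unicode encodings. If the page, or any code path, asks for a legacy charset such as windows-1252, `Encoding.GetEncoding` will throw on .NET Core unless the code-pages provider is registered first. A minimal sketch (this assumes the `System.Text.Encoding.CodePages` NuGet package is referenced in the project):

```csharp
using System;
using System.Text;

class EncodingCheck
{
    static void Main()
    {
        // .NET Core only bundles Unicode encodings (UTF-8, UTF-16, ...).
        // Legacy Windows code pages must be registered explicitly; without
        // this line, Encoding.GetEncoding("windows-1252") throws on
        // Linux/.NET Core while working fine on desktop .NET Framework.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);

        Encoding enc = Encoding.GetEncoding("windows-1252");
        Console.WriteLine(enc.CodePage); // 1252
    }
}
```

Registering the provider once at startup (e.g. at the top of `Main`) is enough; it is a no-op if the encoding is never requested, so it's a cheap thing to try.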
```dockerfile
FROM mcr.microsoft.com/dotnet/runtime:3.1 AS base
WORKDIR /app

FROM mcr.microsoft.com/dotnet/sdk:3.1 AS build
WORKDIR /src
COPY NuGet.Config ./
COPY ["OnlineFindsBot.csproj", "."]
RUN dotnet restore "./OnlineFindsBot.csproj"
COPY . .
WORKDIR "/src/."
RUN dotnet build "OnlineFindsBot.csproj" -c Release -o /app/build

FROM build AS publish
RUN dotnet publish "OnlineFindsBot.csproj" -c Release -o /app/publish

FROM base AS final
WORKDIR /app
COPY --from=publish /app/publish .
COPY config.json ./
ENTRYPOINT ["dotnet", "OnlineFindsBot.dll"]
```
Here is the Dockerfile I'm using.
The `html.Content` isn't readable at all when running in the Docker container; it's just a stream of unreadable characters.
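Unreadable output like that is often a compression problem rather than a text-encoding one: the server returned a gzip (or Brotli) body and nothing on the client decompressed it. A quick offline way to see the symptom and confirm the fix is to compress some HTML yourself, print it as text, and then decompress it (a sketch; in the real response, the telltale sign to look for is the gzip magic bytes `0x1f 0x8b` at the start of the "garbage"):

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

class GzipDemo
{
    static void Main()
    {
        string html = "<html><body>Hello</body></html>";

        // Compress the page the way a web server with gzip enabled would.
        byte[] compressed;
        using (var ms = new MemoryStream())
        {
            using (var gz = new GZipStream(ms, CompressionMode.Compress))
            {
                byte[] raw = Encoding.UTF8.GetBytes(html);
                gz.Write(raw, 0, raw.Length);
            }
            compressed = ms.ToArray();
        }

        // Reading compressed bytes as text yields "a bunch of characters",
        // the same symptom as an undecompressed HTTP response body.
        Console.WriteLine(Encoding.UTF8.GetString(compressed));

        // gzip data always starts with the magic bytes 0x1f 0x8b.
        Console.WriteLine(compressed[0] == 0x1f && compressed[1] == 0x8b); // True

        // Decompressing recovers the original markup.
        using (var input = new MemoryStream(compressed))
        using (var gz = new GZipStream(input, CompressionMode.Decompress))
        using (var reader = new StreamReader(gz, Encoding.UTF8))
        {
            Console.WriteLine(reader.ReadToEnd() == html); // True
        }
    }
}
```

If the bytes coming back in the container do start with `0x1f 0x8b`, the fix is on the HTTP side: either stop sending an `Accept-Encoding: gzip` header, or fetch with an `HttpClient` whose `HttpClientHandler.AutomaticDecompression` includes `DecompressionMethods.GZip` so the body is decompressed before it is parsed.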
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow