Accessing RSS feed with URL is working but UrlResource is giving 403 Forbidden error

I am trying to fetch an RSS feed, but it fails with an error when the feed is accessed through the org.springframework.core.io.UrlResource class. I am using Spring Boot 2.6.3, the latest version. Sample code and a link to the GitHub repo are included below.


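import java.net.URL;

import com.rometools.rome.feed.synd.SyndEntry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.UrlResource;
import org.springframework.integration.config.EnableIntegration;
import org.springframework.integration.dsl.IntegrationFlow;
import org.springframework.integration.dsl.IntegrationFlows;
import org.springframework.integration.dsl.MessageChannels;
import org.springframework.integration.feed.dsl.Feed;
import org.springframework.integration.metadata.MetadataStore;
import org.springframework.integration.metadata.PropertiesPersistingMetadataStore;
import org.springframework.messaging.MessageChannel;

@EnableIntegration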
@Configuration
public class IntegrationRssFetch {

  @Value("https://www.reutersagency.com/feed/?post_type=reuters-best")
  private UrlResource urlResource;

  @Value("https://www.reutersagency.com/feed/?post_type=reuters-best")
  private URL url;

  @Bean
  public MetadataStore metadataStore() {
    PropertiesPersistingMetadataStore metadataStore = new PropertiesPersistingMetadataStore();
    metadataStore.setBaseDirectory("src/main/resources");
    return metadataStore;
  }

  @Bean
  public MessageChannel rssOutputChannel() {
    return MessageChannels.direct("rss_feed_flow").get();
  }

  @Bean
  public IntegrationFlow feedFlow() {
    return IntegrationFlows
        .from(Feed.inboundAdapter(this.urlResource, "feedTest")
                .metadataStore(metadataStore()),
            e -> e.poller(p -> p.fixedDelay(100)))
        .channel("rss_feed_flow")
        .get();
  }

  @Bean
  public IntegrationFlow rssReadFlow() {
    return IntegrationFlows
        .from("rss_feed_flow")
        .handle(message -> {
          SyndEntry entry = (SyndEntry) message.getPayload();
          System.out.println(entry.getTitle());
        })
        .get();
  }

}

The following is the stack trace of the error:

2022-02-23 15:46:51.181 ERROR 14352 --- [   scheduling-1] o.s.integration.handler.LoggingHandler   : org.springframework.messaging.MessagingException: Failed to retrieve feed for 'FeedEntryMessageSource{feedUrl=null, feedResource=URL [https://www.reutersagency.com/feed/?post_type=reuters-best], metadataKey='feedTest', lastTime=-1}'; nested exception is java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.reutersagency.com/feed/?post_type=reuters-best
    at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:234)
    at org.springframework.integration.feed.inbound.FeedEntryMessageSource.populateEntryList(FeedEntryMessageSource.java:201)
    at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:176)
    at org.springframework.integration.feed.inbound.FeedEntryMessageSource.doReceive(FeedEntryMessageSource.java:57)
    at org.springframework.integration.endpoint.AbstractMessageSource.receive(AbstractMessageSource.java:142)
    at org.springframework.integration.endpoint.SourcePollingChannelAdapter.receiveMessage(SourcePollingChannelAdapter.java:212)
    at org.springframework.integration.endpoint.AbstractPollingEndpoint.doPoll(AbstractPollingEndpoint.java:444)
    at org.springframework.integration.endpoint.AbstractPollingEndpoint.pollForMessage(AbstractPollingEndpoint.java:413)
    at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$4(AbstractPollingEndpoint.java:348)
    at org.springframework.integration.util.ErrorHandlingTaskExecutor.lambda$execute$0(ErrorHandlingTaskExecutor.java:57)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:50)
    at org.springframework.integration.util.ErrorHandlingTaskExecutor.execute(ErrorHandlingTaskExecutor.java:55)
    at org.springframework.integration.endpoint.AbstractPollingEndpoint.lambda$createPoller$5(AbstractPollingEndpoint.java:341)
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54)
    at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Server returned HTTP response code: 403 for URL: https://www.reutersagency.com/feed/?post_type=reuters-best
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1894)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1492)
    at sun.net.www.protocol.https.HttpsURLConnectionImpl.getInputStream(HttpsURLConnectionImpl.java:263)
    at org.springframework.core.io.UrlResource.getInputStream(UrlResource.java:186)
    at org.springframework.integration.feed.inbound.FeedEntryMessageSource.getFeed(FeedEntryMessageSource.java:224)
    ... 21 more

Link to the GitHub repo: https://github.com/pinkeshsagar-harptec/code-sample/tree/main/rssfeedissue



Solution 1:[1]

It looks like www.reutersagency.com rejects some User-Agent HTTP header values. For example, it returns 403 for my default Java/1.8.0_251, yet it accepts Java/17.0.1, Java/11, Java/8, and Java/20. Values such as Java/1.8, Java/1.7, or Java/1.6 are still rejected.

So I suggest upgrading to a newer Java anyway. It looks like Java 8 is no longer supported by that RSS server.
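
To check the User-Agent behaviour yourself, here is a minimal sketch using only the JDK's HttpURLConnection. The class name FeedUserAgentCheck, the helper open(...), and the 5-second timeouts are illustrative assumptions and not part of the original project. The two User-Agent strings are taken from the values mentioned above. The status codes the server returns may differ from what is described here if its filtering rules change.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical helper class, not part of Spring or of the sample repo.
class FeedUserAgentCheck {

    // Prepares a connection with an explicit User-Agent header.
    // Nothing is sent over the network until the response is requested.
    static HttpURLConnection open(String feedUrl, String userAgent) throws IOException {
        HttpURLConnection connection = (HttpURLConnection) new URL(feedUrl).openConnection();
        connection.setRequestProperty("User-Agent", userAgent);
        connection.setConnectTimeout(5000); // assumed timeout, adjust as needed
        connection.setReadTimeout(5000);    // assumed timeout, adjust as needed
        return connection;
    }

    public static void main(String[] args) throws IOException {
        String feedUrl = "https://www.reutersagency.com/feed/?post_type=reuters-best";
        // One value reported as rejected and one reported as accepted.
        String[] userAgents = {"Java/1.8.0_251", "Java/17.0.1"};
        for (String userAgent : userAgents) {
            HttpURLConnection connection = open(feedUrl, userAgent);
            try {
                // getResponseCode() sends the request and returns the HTTP status.
                System.out.println(userAgent + " -> HTTP " + connection.getResponseCode());
            } finally {
                connection.disconnect();
            }
        }
    }
}
```

Running main prints one status line per User-Agent value, which makes it easy to see which values the server accepts before changing the Java version used by the application.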

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Artem Bilan