Java: How to Normalise a URL and Remove the Fragment
How can I normalise a URL in Java to remove the fragment, i.e. turn https://www.website.com#something into https://www.website.com?
This is possible with the URL.Normalize code, although in this specific use case I've only got a full absolute URL, which needs to remain intact.
I'd like to be able to modify this code slightly to remove the fragment from the URL:
// The website below is just an example. In reality, this URL is unknown and could be anything, with or without a fragment depending on the use case.
URL absUrl = new URL("https://www.website.com#something");
My thinking so far is that this is only possible by breaking the URL down into protocol + domain + path and then joining it all back together. That does appear to work, but there must be a more elegant way of doing this.
Solution 1:[1]
Fragment removal is fairly simple using the conversion methods toURI and toURL. So to convert a URL to a URI:
URL url = /*what have you*/ …
URI u = url.toURI();
To remove any fragment from the URI:
if (u.getFragment() != null) {
    // Remake with same parts, less the fragment:
    u = new URI(u.getScheme(), u.getSchemeSpecificPart(), /*fragment*/ null);
}
In reconstructing a URI from its parts like that, it’s important to use the decoded getters (as shown), not the corresponding raw ones. For authority on this usage, see e.g. the Identity section of the API.
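The difference between the decoded and raw getters can be illustrated with a small sketch. The class name, method names and the example URL are hypothetical; the point is only that the multi-argument URI constructors always quote the percent character, so feeding them an already-encoded (raw) part encodes it a second time.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class RawVsDecoded {
    // Rebuilds the URI without its fragment from the DECODED scheme-specific part.
    static URI rebuildDecoded(URI u) throws URISyntaxException {
        return new URI(u.getScheme(), u.getSchemeSpecificPart(), null);
    }

    // Same rebuild from the RAW part: existing escapes get escaped again.
    static URI rebuildRaw(URI u) throws URISyntaxException {
        return new URI(u.getScheme(), u.getRawSchemeSpecificPart(), null);
    }

    public static void main(String[] args) throws URISyntaxException {
        // Hypothetical example URL with an encoded space in the path.
        URI u = URI.create("https://www.website.com/a%20b#something");
        System.out.println(rebuildDecoded(u)); // https://www.website.com/a%20b
        System.out.println(rebuildRaw(u));     // https://www.website.com/a%2520b
    }
}
```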
To convert the result back to a URL:
url = u.toURL();
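Put together, the steps above might look like the following minimal sketch. The class and method names (FragmentStripper, stripFragment) are illustrative and not part of the original answer; the example URL is the one from the question.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class FragmentStripper {
    // Returns an equivalent URL with any fragment removed.
    static URL stripFragment(URL url) throws URISyntaxException, MalformedURLException {
        URI u = url.toURI();
        if (u.getFragment() != null) {
            // Remake with the same (decoded) parts, less the fragment.
            u = new URI(u.getScheme(), u.getSchemeSpecificPart(), null);
        }
        return u.toURL();
    }

    public static void main(String[] args) throws Exception {
        URL absUrl = URI.create("https://www.website.com#something").toURL();
        System.out.println(stripFragment(absUrl)); // https://www.website.com
    }
}
```

A URL without a fragment passes through unchanged, so the method can be applied to input that may or may not carry one.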
Solution 2:[2]
java.net.URL exposes the fragment through getRef() but offers no method to remove it. You can, however, convert a URL into a URI and back to drop the fragment. I did it like this:
URL url;
...
if (url.toString().contains("#")) {
    try {
        // Rebuild without the fragment (last argument), keeping the port and query.
        URI uri = new URI(url.getProtocol(), null, url.getHost(), url.getPort(),
                url.getPath(), url.getQuery(), null);
        String file = "";
        if (uri.getPath() != null) {
            file += uri.getPath();
        }
        if (uri.getQuery() != null) {
            // The "?" separator is not part of getQuery(), so add it back.
            file += "?" + uri.getQuery();
        }
        url = new URL(uri.getScheme(), uri.getHost(), uri.getPort(), file);
    } catch (URISyntaxException e) {
        ...
    } catch (MalformedURLException e) {
        ...
    }
}
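As a self-contained sketch of this URL to URI to URL round trip, the version below wraps the idea in a method and makes sure the port and query string survive. The names UrlRoundTrip and withoutFragment and the example address are hypothetical, and the seven-argument URI constructor is used here as an assumption about how to carry the port and query through; user info, if any, is dropped.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class UrlRoundTrip {
    static URL withoutFragment(URL url) throws URISyntaxException, MalformedURLException {
        if (url.getRef() == null) {
            return url; // no fragment, nothing to do
        }
        // Final argument is the fragment: pass null to drop it.
        URI uri = new URI(url.getProtocol(), null, url.getHost(), url.getPort(),
                url.getPath(), url.getQuery(), null);
        String file = uri.getPath() == null ? "" : uri.getPath();
        if (uri.getQuery() != null) {
            file += "?" + uri.getQuery();
        }
        return new URL(uri.getScheme(), uri.getHost(), uri.getPort(), file);
    }

    public static void main(String[] args) throws Exception {
        URL url = URI.create("http://example.com:8080/a/b?x=1&y=2#frag").toURL();
        System.out.println(withoutFragment(url)); // http://example.com:8080/a/b?x=1&y=2
    }
}
```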
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Michael Allan |
Solution 2 | Dave Moten |