What is the Deep Web and why you’re probably already using it!
For most people, any mention of the Deep Web or the Dark Web conjures up images of hackers and a shadowy underworld. Yet the Deep Web is huge compared to the Surface Web, which is the part that search engines index. In fact, it is estimated that the Deep Web is 500 times bigger than the Surface Web!
This blog article explains what the Surface, Deep and Dark Web actually are, why you probably use the Deep Web every day without realising it, and even how parts of the Dark Web can be an incredible force for good.
So What Actually Is The Deep Web?
The deep web is content that is not indexed by search engines. The stuff that they can index is often referred to as the Surface Web. Deep web content can be roughly divided into the following bins:
- Data that needs to be accessed by a search interface
- Results of database queries
- Password protected data
- Pages not linked to by any other page
- Stuff that might require a Captcha image to view
- Content that resides outside of http and https protocols
Let’s take each one of these in turn and provide a bit more detail.
Data behind a search interface
This is quite a simple one. Essentially, if a web page is not linked from any other page on the site, and the only way to reach it is by typing something into a search box, then that page is part of the Deep Web. This is because a search engine can crawl links but it won’t “type” things into a search box to find information.
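To make the crawling point concrete, here is a minimal Python sketch of the link-discovery step of a crawler. The HTML snippet, the page paths and the `LinkCollector` class are all made up for illustration, and real search engine crawlers are far more sophisticated than this.

```python
from html.parser import HTMLParser

# Hypothetical page: two ordinary links plus a search box.
PAGE = """
<a href="/about">About us</a>
<a href="/contact">Contact</a>
<form action="/search" method="get">
  <input type="text" name="q">
  <input type="submit" value="Search">
</form>
"""


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag, as a very simple crawler would."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags are followed; form fields are ignored.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def discover_links(html):
    """Return the list of link targets found in the given HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


found = discover_links(PAGE)
print(found)  # ['/about', '/contact']
```

The search form is on the page, but nothing in the link list points at its results. Any page that can only be reached by submitting that form is never discovered by a crawler that works this way.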
Results of database queries
The best way to explain this one is by way of example. On sites powered by our Kontrolit Content Management System we keep a version history of our pages, saving a new version every time an edit is made. The reason for this is to allow admins to track changes over time and, more importantly, restore an older version in case a mistake was made (I’ve personally used this lots!). I’ve done a quick check on our kontrolit.net site and can see that up to 10th June 2014 we have made 87 edits to our home page.
Each version could be restored at any time by an administrator, including the very first version from December 2008. Search engines are generally only going to have a record of, at most, one or two changes. In addition, sites like the Wayback Machine may take snapshots of some changes. However, the vast majority of our version history is only viewable by administrators of the site. Hence the historic versions of our home page form part of the Deep Web.
If you add up the version history for all the pages on our kontrolit.net site you can see that we actually have much more Deep Web content than Surface Web content.
Password Protected Data
A good example of this data is your online bank statements. For example, if you type “Tim Howes Bank Statements” into Google you are not (hopefully) going to see my bank information. But when I view my bank statements I am essentially just viewing a web page. Of course, to do this I have had to enter a correct username and password. So my online bank statements form part of the Deep Web.
Our customers that run a members area on their sites will likewise produce web pages that require their site visitors to log in to view them. This might be because the information is sensitive or because their members have paid to see it.
Other examples of common password protected data are Intranets and Extranets. We run an Intranet here at Kontrolit HQ which allows us to organise common information in one central place. We can also access this remotely, provided we log in with the correct username and password. All of our company Intranet therefore forms part of the Deep Web.
Pages not linked by any other page
Another simple one. This is a page on a website which is not linked to by any other page. Search engines can’t crawl this information because they have no way of getting to the page. For example, on our Kontrolit sites customers have a form builder where they can create their own custom forms. So we might want to create a customer feedback form. We could create this form and publish it without linking to it, so our normal site visitors won’t find the form anywhere. We could then email a link to this form to our customers, asking them for feedback. They can click on the link to find the form but it remains hidden from search engines. Thus our feedback form is part of the Deep Web.
Stuff that requires a Captcha image to view
Again a fairly simple one. If you need to complete a Captcha (to prove you’re a human) then the content behind it is going to be invisible to search engines.
Adding to our form builder example above, our customers have the option of specifying a unique page to redirect their visitors to when a visitor has filled out the form. This page is usually just a quick thank you page; however, it could be a page with a download on it or more information. Everything on this page then forms part of the Deep Web, as you need to complete a Captcha to view it.
Content outside the normal http or https protocol
Most web content will start with either http or https. For example https://www.kontrolit.net. Content that resides on something different, for example sop:// or sometimes even ftp://, will not get indexed by search engines and thus forms part of the Deep Web.
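As a rough illustration, here is how a crawler might decide whether an address is ordinary web content by looking only at its protocol. This is a sketch under my own assumptions: the `is_web_url` helper and the ftp and sop addresses are hypothetical, and a real search engine will apply many more rules than this.

```python
from urllib.parse import urlsplit

# The two protocols that ordinary web pages are served over.
WEB_SCHEMES = {"http", "https"}


def is_web_url(url):
    """Return True if the address uses http or https."""
    return urlsplit(url).scheme in WEB_SCHEMES


examples = [
    "https://www.kontrolit.net",
    "ftp://files.example.com/report.pdf",  # hypothetical address
    "sop://broker.example.com:3912/1234",  # hypothetical address
]

for url in examples:
    print(url, is_web_url(url))
```

Only the first address passes the check; the other two would be skipped by a crawler that follows http and https links alone.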
So How Big Is The Deep Web?
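It is estimated that less than 0.03% of the internet is part of the surface web. That’s not a mistype, I really do mean 0.03%. Another commonly quoted estimate is that the deep web is about 500 times the size of the surface web. The two figures don’t line up exactly, but both make the same point: the part you can search is a tiny fraction of the whole.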
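A percentage and a “times bigger” figure are two ways of expressing the same kind of ratio, and a few lines of Python show how to convert between them. This is only a sketch of the arithmetic: the function names are my own, and the inputs are the estimates quoted above rather than new measurements.

```python
def share_from_ratio(ratio):
    """Surface web as a fraction of the whole, if the deep web is
    `ratio` times the size of the surface web."""
    return 1 / (1 + ratio)


def ratio_from_share(share):
    """How many times bigger the deep web is than the surface web,
    if the surface web is `share` of the whole (0.0003 means 0.03%)."""
    return (1 - share) / share


print(f"{share_from_ratio(500):.2%}")       # 0.20%
print(f"{ratio_from_share(0.0003):,.0f}")   # 3,332
```

So “500 times bigger” corresponds to a surface web of about 0.2% of the total, while 0.03% corresponds to a deep web over 3,000 times bigger. The quoted estimates are best read as orders of magnitude rather than precise measurements.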
You can see why the surface web is actually often described as just the tip of the iceberg. We saw from our earlier examples of the Kontrolit version history and company intranet that it is quite easy for this size difference to actually occur.
So how does the Dark Web fit into all of this?
The Dark Web is a subset of the Deep Web. Yes, this does contain a shadowy underworld of hackers, drugs, guns and so on. But it is much more than that. The Dark Web is run on the principle of everything being anonymous. From the actual sites to the visitors who use them, everything is geared around protecting privacy.
Most sites on the Dark Web can’t actually be accessed with a normal web browser such as Google Chrome or Mozilla Firefox. Instead, a special browser called Tor Browser is required. When you open it, your session is automatically routed through a tangled web of computers, so the sites you visit see a different IP address and it becomes very, very difficult to work out who you actually are. This browser also lets you access sites with a .onion domain extension.
Within the Dark Web you can find file sharing services and email services which are anonymous to use. Cryptocurrencies such as Bitcoin have also become the dominant form of payment for these services, as this means that payments can be anonymous too.
So how is this Dark Web good?
There are many good reasons for visitors wanting to be anonymous. The clearest example of this is to consider someone who lives in a country that is not democratic or where the Government censors the public. For those people who want to discuss what’s going on or organise meetings, the Dark Web provides them with a platform to do this.
This is one of the principal aims of the Tor Project, which started life at the US Naval Research Laboratory: to “make the web safe for whistleblowers”. Indeed the project is still part funded by the United States Government along with the Swedish Government, amongst other organisations.
The Deep Web is, in the main, just a natural consequence of how things are structured, and there are lots of valid reasons for content items like Bank Statements, Intranets and document version histories being part of the Deep Web.
The Dark Web is a subset of the Deep Web and you need special software like the Tor browser to access information here. The main principle of the Dark Web is that you’re anonymous. This can be a force for good or bad.
If you have content on your website that should be on the surface web to help attract new visitors, make sure that:
- It does not need a login to view it
- It can be reached by a hyperlink or sequence of hyperlinks from your home page
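The second check can be sketched in a few lines of Python. Assuming you can list which pages each page on your site links to, a breadth-first search from the home page shows which pages a link-following crawler could reach. The site map below is invented for illustration, and `reachable_from` is a hypothetical helper rather than part of any particular tool.

```python
from collections import deque

# Hypothetical site: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/products", "/contact"],
    "/products": ["/products/cms", "/"],
    "/products/cms": [],
    "/contact": [],
    "/feedback-form": [],  # published, but nothing links to it
}


def reachable_from(start, links):
    """Breadth-first search over hyperlinks, starting at `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reachable = reachable_from("/", SITE_LINKS)
unlinked = sorted(set(SITE_LINKS) - reachable)
print(unlinked)  # ['/feedback-form']
```

Any page that appears in `unlinked` cannot be reached by following links from the home page, so it is likely to stay in the Deep Web until you link to it from somewhere.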
A report about a raid on over 400 dark web sites engaging in illegal activity, thanks to the collaborative efforts of European and US law enforcement agencies.