Hello, friends and strangers! It’s been quite a while since I’ve blogged, and I hope you will forgive me. I’ve been rather busy working on my Master’s degree in IT, but now… at long last… all is done, and I’ve walked the aisle. Now that my brain is a little fuller, and my calendar is a little less complex, I’m hoping to get back on the blogging horse.
Today’s topic is something I’ve been doing some independent research on, and found rather interesting. Hopefully you’ll find it informative, or at least mildly amusing. The Internet and the World Wide Web are often spoken of as one single thing – but in actuality, the Web is made up of three different types, or levels.
Chances are, you are reading this blog on what is called the Clearweb. This is what we commonly consider the Internet that we know and love. The Clearweb is also called the Openweb or Surfaceweb. The most notable characteristic of the Clearweb is that it is Indexed. That’s a fancy way of saying that Google (or Bing, or any number of other search engines) routinely reads through the Clearweb and catalogs it, so you can find information more easily. It does this with a Web Spider (huh… a Spider on the web? Go figure!). A Spider (or Crawler) goes through web pages and indexes them based on key words and metadata (embedded key words) within each page. If you fire up a new tab right now and search for “Hagrid’s Rock Cakes”, you will get a list of hits, or web pages that match your search description. Google uses a complex algorithm to decide which results show up at the top, ranking pages by how relevant and reputable it judges them to be (paid ads also appear near the top, but they are labeled as ads). Within minutes, you are reading up on recipes for Hagrid’s cakes. Though we are the most familiar with web pages on the Clearweb, a commonly cited estimate is that these pages make up only about 4% of the Internet.
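For the curious, here is a minimal sketch of the indexing idea described above, written in Python. It is a toy and not how Google actually works: the two pages, their example.com addresses, and the function names `build_index` and `search` are all made up for illustration. It builds a simple word-to-page lookup (often called an inverted index) and then answers a search by finding the pages that contain every word in the query.

```python
import re
from collections import defaultdict

# Hypothetical pages a spider might have fetched (made-up URLs and text).
PAGES = {
    "https://example.com/rock-cakes": "Hagrid's rock cakes recipe with raisins",
    "https://example.com/geology": "Types of rock found in Pennsylvania",
}

def tokenize(text):
    """Split text into lowercase key words."""
    return re.findall(r"[a-z0-9']+", text.lower())

def build_index(pages):
    """Map each key word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index

def search(index, query):
    """Return the URLs that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

INDEX = build_index(PAGES)
print(search(INDEX, "Hagrid's Rock Cakes"))
```

A real search engine adds ranking, link analysis, and a great deal more on top of this, but the basic lookup works the same way: a page that was never read by the spider never lands in the index, so no search can find it.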
Examples of the Clearweb are pretty obvious: Google and its search results, entertainment websites such as Starwars dot com, church websites such as Liberti church dot org, and the Pennsylvania Department of Motor Vehicles website. All of these sites will turn up from a basic Google search, as they are indexed.
The second Web I want to discuss is the Deepweb. This web is not Indexed, though it is readily available from a normal Web Browser if you know where to look and have permission to be there. And that is the key here – because these pages are not indexed, you can’t find them directly from a search engine. These pages are still hosted on the main Public Internet, but their contents aren’t there for everyone to see from a search engine. Most pages on the Deepweb are kept out of the index on purpose, so as to remain private and secure. The fancy security word for this is Confidentiality – the information on a Deepweb page is only shared with those who SHOULD have access to it, whereas the information on a Clearweb page is indexed and searchable by anyone. A commonly cited estimate is that Web Pages on the Deepweb make up approximately 96% of the pages on the Internet. That’s a lot that you don’t see – but it’s there. If the Internet is an iceberg, the Deepweb is what’s below the surface. Typically, there is a public facing page that acts as a login portal to an un-indexed Deepweb.
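One common way a site asks spiders to stay out of certain areas is a small text file called robots.txt. The sketch below uses Python's standard `urllib.robotparser` module to show how a well-behaved crawler would read such a file. The rules, the example.com addresses, the "ExampleBot" name, and the `crawler_may_fetch` helper are assumptions made up for illustration. Keep in mind that robots.txt is only a polite request; it is the login wall that actually keeps private pages private.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: every crawler is asked to stay out of /members/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /members/
"""

def crawler_may_fetch(robots_txt, url, user_agent="ExampleBot"):
    """Return True if a polite crawler is allowed to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

# The public login page may be crawled; the pages behind it may not.
print(crawler_may_fetch(ROBOTS_TXT, "https://example.com/login"))
print(crawler_may_fetch(ROBOTS_TXT, "https://example.com/members/inbox"))
```

This matches the login portal pattern described above: the front door is indexed, and everything behind it stays out of the search results.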
Examples of the Deepweb include government web pages that require a login, company Intranet pages, and private or subscription web pages. For example, if you log into your company’s Outlook Webmail page, the main login page may be indexed and found from a search engine – but the actual page where you view your mail is private and not indexed. It is on the Deepweb. Likewise, when you visit your bank’s website, you typically hit their main Indexed page. But once you log in, you enter a private Deepweb that is not indexed. The same goes for Facebook. When you log in, you are entering the realm of the private Deepweb that is hidden from public view – but only if you lock your profile down and choose not to share your page with everyone. People may search for your name followed by Facebook, and see your public profile – but they won’t necessarily see the photos you took of your kids, unless you allow that. You’re cloaking that information on the Deepweb.
The Darkweb (also called Darknet) is a small portion of the Deepweb. Like the rest of the Deepweb, it is not indexed. In addition, web sites hosted on the Darkweb are encrypted and are not available using a normal web browser. These sites must be accessed using special software, such as the Tor Browser. Tor, short for The Onion Router, is a program that wraps your traffic in several layers of encryption and bounces it through a chain of relays, in an attempt to provide confidentiality and anonymous access. The figure I have seen cited is that this small portion makes up about 6% of the entire Internet (counted within the Deepweb’s 96%), though estimates vary widely and are hard to verify. The Darkweb is of particular interest to us, as parents, as much of the really bad stuff on the Internet is hosted here. If the Internet were a large flat rock, the Darkweb would be the gruesome underside of that rock. As parents, our goal should be to keep our children off the Darkweb as much as possible. Because you need special software such as Tor to even access the Darkweb, it should be rather obvious that we don’t allow little Johnny to install any software he wants onto a computer. The Darkweb is also a real threat to a company’s information, which is one reason many companies lock down their employee resources (i.e. computers) and don’t allow employees to install their own software.
As for examples of the Darkweb: it is host to many DNMs (Darknet Marketplaces) that buy and sell things so horrific, I won’t speak of them. They are the Internet Voldemort – and must not be named. Trade is made possible on DNMs through the use of Cryptocurrencies such as Bitcoin, with the goal of ensuring a truly anonymous buying and selling environment. It is worth noting here that using Tor on the Darkweb is not a guarantee of anonymity and privacy. The takedown of DNMs such as the Silk Road is proof that the Government is highly interested in what happens on the Darkweb, and can indeed find out who and where you are. Truth be told, some legitimate companies such as Facebook host a .onion web page on the Darkweb. They do this to cater to privacy advocates and proponents of Internet Freedom. Some in this camp argue that it’s more dangerous to surf the Clearweb than it is to surf the Darkweb, and they have a valid point in light of all the malware, adware, metadata, tracking cookies, and so on.
Remember the old quote by Nietzsche? “And if you gaze long into an abyss, the abyss also gazes into you.” A large risk of connecting to the Darkweb is that you are now connected to the filthy underbelly of the Internet – and while you are connected, you have in essence created a bridge between your computer and the underbelly. Using the Nietzsche reference, you can see them, and it stands to reason that they could see you. Don’t ever connect to the Darkweb from a computer unless you are completely comfortable with it becoming compromised, infected, wiped, locked with Ransomware, and so on. Better yet, just don’t connect to the Darkweb. In the Information Security realm, much of the world is framed by risk. If the benefit far outweighs the potential risk, it may be a good idea. For a casual user of the Darkweb, it rarely does. I highly recommend that – as a parent – you do everything within your power to keep your children off the Darkweb. Prevent your kids from being able to install software such as Tor. Also consider locking down your computer’s BIOS so that kids cannot boot from a bootable CD or USB drive, thus circumventing your computer’s security settings. These “live” discs and drives can load a version of Linux such as Tails, which is designed for confidentiality and secrecy. You don’t want your kids operating in such an environment, so follow the direction most companies have already gone, and lock the computers down.
On The Value of Metadata
I’m going to put on my privacy advocate hat for a moment, and indulge the argument that on the Clearweb we are all big, fat fruit bushes, ripe for the picking. Our surfing habits are being watched and farmed out to other companies for a profit – often without our knowledge. Many privacy advocates would argue that we should have the choice, and that the information that makes up our online footprint should be ours to share as we wish.
This idealistic approach is – in this day and age – not realistic. When you choose to sign up for Facebook, you accept an agreement that gives away some of your privacy. Facebook makes money by farming your metadata, and you get to share pictures of lolcats, or rant about politics, or click a thumbs up when your niece announces she is pregnant. It’s an exchange of sorts. Assume that everything you do on the Clearweb is being gathered, analyzed, and sold. There is no guaranteed confidentiality here. As a consumer, your goal is to strike a balance between what you receive and what you provide. If you find enough value in a service, you will be more willing to provide sellable metadata.
Probably the best example of that balance is the streaming music service, Pandora. When you sign up for Pandora, you provide some basic information. For example, you name a band or musician that you like (metadata). Pandora then lets you listen to music by that artist, or music similar to it. In exchange for your clicking a thumbs up for a song you like, Pandora gathers more metadata, and rewards you by providing a more accurate customer experience – it will then shape your musical experience towards songs that you like. If you hear a song you really don’t like, clicking the thumbs down button gives Pandora more metadata, and as a result, it avoids playing songs similar to that one. Pandora provides you with a “free” customized music streaming service, and you provide them with metadata about your favorite musical styles. It’s the modern Internet equivalent of a Plover Bird and Crocodile symbiotic relationship.
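To make the thumbs up and thumbs down exchange a little more concrete, here is a toy sketch in Python. To be clear, this is not Pandora's actual method, which I have no inside knowledge of. The tag names and the helpers `update_profile` and `score` are invented for illustration. Each thumbs up nudges your taste profile towards a song's traits, each thumbs down nudges it away, and new songs are then scored against that profile.

```python
from collections import Counter

def update_profile(profile, song_tags, liked):
    """Raise (thumbs up) or lower (thumbs down) the weight of each tag."""
    delta = 1 if liked else -1
    for tag in song_tags:
        profile[tag] += delta

def score(profile, song_tags):
    """Higher scores mean the song looks more like what the listener likes."""
    return sum(profile[tag] for tag in song_tags)

# Hypothetical listener: one thumbs up, one thumbs down.
PROFILE = Counter()
update_profile(PROFILE, ["folk", "acoustic"], liked=True)
update_profile(PROFILE, ["metal", "loud"], liked=False)

# Score two made-up candidate songs against the profile.
print(score(PROFILE, ["folk", "banjo"]))
print(score(PROFILE, ["metal", "loud"]))
```

Every click is one more data point about you, which is exactly the trade: the more metadata you hand over, the better the service can guess what you want to hear next.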