That Bar at the Top of Your Browser: a Peek Under the Hood of the Internet.
There’s probably nothing in the modern world that more people use without understanding than the internet. Even something as simple as entering an address into a browser’s address bar is much more complicated than it first appears. So what really happens when you type an address, say https://www.holbertonschool.com, into the address bar and press Enter? Let’s take a look.
To really get a picture of what happens we’ll have to look at eight concepts:
- DNS request
- TCP/IP
- Firewall
- HTTPS/SSL
- Load-balancer
- Web server
- Application server
- Database
They’re not all equally important, but each plays a role in getting you to a website. Let’s start with what you type in, in this case https://www.holbertonschool.com. That’s the website’s name plus some identifying info (the https://) that tells the browser which protocol to use. We’ll talk about protocols later. For now let’s look at the name part, because really it’s more accurate to call it a nickname. Every website lives at an identifying address made up of numbers, called an internet protocol (IP) address. For the website we’re looking at, the IP address is 34.234.197.104, and if you type that into the address bar instead of the website’s name, your browser contacts that same server directly (though some sites are set up to answer only to their name).
Neat, but inconvenient unless you like memorizing a string of up to twelve digits for every website you want to visit. Words are much easier to remember, so we give websites nicknames. The computer still only uses IP addresses, though, so something has to figure out which IP address each nickname corresponds to. If you wanted, you could keep a list of all the websites on your computer, but that would be a huge list, and you’d have to update it every time a new website pops up. This is where DNS comes in.
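To make the "protocol plus nickname" idea concrete, here is a minimal Python sketch that splits an address into its parts and checks whether the host is a domain name or a raw IP address. The function name `describe_address` is made up for this illustration; `urlsplit` and `ipaddress` come from Python’s standard library.

```python
from urllib.parse import urlsplit
import ipaddress

def describe_address(text: str) -> dict:
    """Split what was typed into the address bar into protocol and host."""
    parts = urlsplit(text)
    host = parts.hostname
    try:
        ipaddress.ip_address(host)  # raises ValueError if host is not an IP
        kind = "IP address"
    except ValueError:
        kind = "domain name"
    return {"protocol": parts.scheme, "host": host, "kind": kind}

print(describe_address("https://www.holbertonschool.com"))
print(describe_address("https://34.234.197.104"))
```

The first call reports a domain name that still has to be translated; the second is already in the form computers use.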
DNS stands for domain name system, and it’s what we use to figure out which nickname goes with which IP address. What we’ve been calling a nickname is actually called the website’s domain name. You can think of the DNS as a big phonebook of domain names and IP addresses: when you type in a domain name, your browser first talks to a DNS server (usually a machine somewhere else), and that server gives your browser the IP address. This process is known as a DNS request, literally because you request an IP address from the DNS. This is still a bit of an oversimplification. For instance, if you go to a certain website a lot, your browser might save its IP address in a temporary spot called a cache so it can skip talking to the DNS. But the important part to keep in mind is that you send a request to the DNS and the DNS sends a response back with the IP address. Sending and receiving requests and responses would be simple if it weren’t computers doing it; you could probably just send a letter. But as before, computers speak their own language, so protocols were put in place to formalize how they exchange messages. DNS requests themselves usually travel over a lightweight protocol called UDP, but the protocol that carries most of what comes next is called TCP.
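The request-then-cache behavior described above can be sketched in a few lines of Python. This is only an illustration: the `_cache` dictionary and the `resolve` function are hypothetical stand-ins for what a browser does internally, while `socket.gethostbyname` is the standard-library call that asks the operating system to perform a real lookup.

```python
import socket

# Hypothetical in-memory cache, standing in for the browser's DNS cache.
_cache: dict[str, str] = {}

def resolve(domain: str, lookup=socket.gethostbyname) -> str:
    """Return the IP address for a domain, asking DNS only on a cache miss."""
    if domain not in _cache:
        _cache[domain] = lookup(domain)  # the actual DNS request
    return _cache[domain]
```

Calling `resolve("www.holbertonschool.com")` with the default `lookup` performs a real DNS request, so it needs a network connection. A second call with the same name is answered from the cache without contacting the DNS at all.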
TCP, or transmission control protocol, is used for sending and receiving data all over the internet, probably several times just for you to read this article right now. Words, pictures, articles, videos: when you break down the internet, it’s really just sending and receiving data, and TCP is a quick and, more importantly, reliable way to transmit that data. That’s because TCP doesn’t just regulate how the data is sent and received; it also makes sure no piece of it is lost in transit. These pieces of data are called packets, and TCP is used to make sure they all arrive where they should and get put back together in the right order.
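Here is a toy Python simulation of the idea: a message is cut into numbered packets, the packets arrive out of order, and the receiver uses the numbers to rebuild the message and to notice if anything is missing. It is a simplification under loud assumptions. Real TCP numbers bytes rather than whole packets and uses acknowledgements and retransmission, none of which is modeled here, and the names `to_packets` and `reassemble` are invented for this sketch.

```python
import random

def to_packets(data: bytes, size: int = 8) -> list[tuple[int, bytes]]:
    """Cut data into (sequence number, chunk) pairs."""
    return [(seq, data[i:i + size])
            for seq, i in enumerate(range(0, len(data), size))]

def reassemble(packets: list[tuple[int, bytes]], expected_count: int) -> bytes:
    """Rebuild the original data, failing loudly if a packet is missing."""
    received = dict(packets)  # duplicate packets collapse into one entry
    missing = [seq for seq in range(expected_count) if seq not in received]
    if missing:
        raise ValueError(f"packets lost in transit: {missing}")
    return b"".join(received[seq] for seq in range(expected_count))

message = b"Hello from the internet!"
packets = to_packets(message)
random.shuffle(packets)  # packets may arrive in any order
assert reassemble(packets, expected_count=3) == message
```

In real TCP a missing packet is not an error the user sees: the receiver simply never acknowledges it, and the sender sends it again.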
So you typed in https://www.holbertonschool.com and asked the DNS for the IP address, and now you’re well on your way to the website. But first you have to pass security. Most websites use two main security measures: a firewall and an SSL certificate. Generally you hit the firewall first, so that’s what we’ll look at first too. In building architecture or automobile engineering, a firewall is just a fireproof wall: if there’s a fire in one location, you don’t want it to spread. An internet firewall is the same idea applied to networks. Websites don’t want to let anything bad through, or to let anything through in the wrong spot, so they put up a firewall that only allows packets to pass through certain openings (ports). A firewall can also block certain users or servers from sending bad information, which keeps the fire from spreading even through open ports. The firewall is an invaluable security resource for keeping any website running smoothly.
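The two firewall rules described above (only certain openings are allowed, and known bad senders are blocked everywhere) might look like this as a Python sketch. Everything here is hypothetical: the port list, the blocked address, and the `allow_packet` function are made up for illustration, and real firewalls are configured with dedicated tools rather than written like this. Ports 80 and 443 are the conventional ports for HTTP and HTTPS.

```python
# Hypothetical rules: only web traffic gets in, and one sender is banned.
ALLOWED_PORTS = {80, 443}           # HTTP and HTTPS
BLOCKED_SOURCES = {"203.0.113.7"}   # example address from a documentation range

def allow_packet(source_ip: str, dest_port: int) -> bool:
    """Decide whether a packet may pass the firewall."""
    if source_ip in BLOCKED_SOURCES:
        return False                # blocked even on an open port
    return dest_port in ALLOWED_PORTS

print(allow_packet("198.51.100.20", 443))  # ordinary visitor over HTTPS
print(allow_packet("198.51.100.20", 22))   # closed port
print(allow_packet("203.0.113.7", 443))    # banned sender
```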
The second security measure is the SSL certificate. The website we’re trying to get to is https://www.holbertonschool.com. You may notice it starts with https, whereas some websites only have http. HTTP is hypertext transfer protocol; it’s how browsers and websites format the data they exchange (think of what’s inside those packets from earlier). But what if you’re transmitting sensitive data, maybe banking details? With plain HTTP the data is sent exactly as written, so any computer that can see the packets along the way can also read your banking details. To account for that we use HTTPS, which is like HTTP except the S stands for secure. The difference is that data sent over HTTPS is encrypted, so even if someone sees it partway to its destination they can’t read it. You can think of it like locking the packets of info inside a box. The box has two keys: a public key that anyone can use to lock stuff into the box but that can’t open it, and a private key that only the website has, which opens the box. With this scheme you can safely communicate with the website without fear of anyone eavesdropping. This system of using a lockbox with two different keys is called asymmetric key encryption, and in HTTPS the protocol that handles it is called SSL, or secure sockets layer (its modern versions go by the name TLS, transport layer security).
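The lockbox with two keys can be demonstrated with a toy version of RSA, one well-known asymmetric scheme. This is strictly a classroom sketch and not how to encrypt anything real: the primes are tiny, each byte is locked separately, and the names `lock` and `unlock` are invented here. Real HTTPS connections also do more than this; roughly speaking, the key pair is used to set up a connection safely, and a faster shared key then protects the actual traffic.

```python
# Toy key pair built from two tiny primes. Real keys use enormous primes.
p, q = 61, 53
n = p * q                  # shared by both keys
phi = (p - 1) * (q - 1)
e = 17                     # public exponent: anyone may use it to lock the box
d = pow(e, -1, phi)        # private exponent: only the website can unlock
                           # (three-argument pow with -1 needs Python 3.8+)

def lock(message: str) -> list[int]:
    """Encrypt each byte with the public key (e, n)."""
    return [pow(byte, e, n) for byte in message.encode("utf-8")]

def unlock(ciphertext: list[int]) -> str:
    """Decrypt each number with the private key (d, n)."""
    return bytes(pow(c, d, n) for c in ciphertext).decode("utf-8")

secret = lock("card number 1234")
print(secret)           # what an eavesdropper would see
print(unlock(secret))   # what the website reads
```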
So we passed security, we’re finally at the website?
Well, almost. We’re safely past security and talking to the IP address, but the website itself is hosted on a web server. A web server is a special computer, either real hardware or virtual, set up to host a website (send and receive packets of information over the internet). But a web server can only handle so many packets, and therefore so many users, at once. We want everyone who might want to access the website to be able to, so we should set up more than one web server. That’s a good fix, but how do people know which web server to go to? Remember, they got the address from the DNS, and the DNS is a phonebook: it gives the same IP address for the same domain. If we have more than one server, we have to devise some way to decide which one each user should use. Thankfully we have a load balancer for this exact purpose!
A load balancer is a special type of server. Instead of hosting a website, it directs each person trying to access the website to a particular web server, so that everyone who wants to can get to the website and none of the web servers stop working because too many people are trying to use them. Once we pass the security measures, instead of going straight to the website we take a quick detour through a load balancer to make sure we go to the right web server.
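One simple way a load balancer can choose a server is to take turns, a strategy usually called round-robin. The Python sketch below shows that idea only; the article doesn’t say which strategy any particular website uses, and the class name and the server names `web-01` to `web-03` are made up.

```python
import itertools

class RoundRobinBalancer:
    """Hand out servers in turn so no single one takes every visitor."""

    def __init__(self, servers: list[str]):
        self._servers = itertools.cycle(servers)  # endless repeating loop

    def pick(self) -> str:
        return next(self._servers)

balancer = RoundRobinBalancer(["web-01", "web-02", "web-03"])
for visitor in ["Ana", "Ben", "Caro", "Dev"]:
    print(visitor, "->", balancer.pick())
```

The fourth visitor wraps around to the first server again. Other strategies, such as sending each visitor to whichever server is currently least busy, follow the same shape with a different `pick`.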
Ok, so now, past all that stuff and the load balancer, we’re actually at the website, right?
Yes! Well, basically. We’re at the website, sure, past the balancer and on the correct server. But remember, the web server is set up to send and receive those packets of information, and it has to get that information from somewhere. A web server by itself could host a website, but it would probably be a boring one. If the web server gets and stores that information somewhere else, we can do much more with it than we could with just the web server. So instead of storing information on the web server, most websites store it in a dedicated database. It could be all sorts of info: data about users, or, on a cooking website, the recipes. Anything to make the website work. But before you can see the website, something has to fetch that information so it can be sent to you, the user. That’s what an application server is used for.
If a web server sends and receives data over the internet, an application server sends and receives data between a database and another server. The application server sits between the web server (the piece talking to your computer) and the database (the piece that has the data to make the website run) and lets them communicate, so that you aren’t just accessing the website, but can actually use it.
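The three roles can be sketched together in Python, borrowing the cooking website example. This is a toy under stated assumptions: each "server" is just a function, the database is an in-memory SQLite database from the standard library, and the table, recipe, and function names are all invented for the illustration. In a real deployment these pieces usually run as separate programs, often on separate machines.

```python
import sqlite3

def make_database() -> sqlite3.Connection:
    """The database: where the website's information actually lives."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE recipes (name TEXT PRIMARY KEY, steps TEXT)")
    db.execute("INSERT INTO recipes VALUES (?, ?)",
               ("pancakes", "Mix, pour, flip."))
    db.commit()
    return db

def application_server(db: sqlite3.Connection, recipe_name: str):
    """The go-between: fetches what the web server asks for."""
    row = db.execute("SELECT steps FROM recipes WHERE name = ?",
                     (recipe_name,)).fetchone()
    return row[0] if row else None

def web_server(db: sqlite3.Connection, path: str) -> str:
    """The piece talking to your computer: turns a request into a response."""
    steps = application_server(db, path.strip("/"))
    if steps is None:
        return "404 Not Found"
    return f"200 OK: {steps}"

db = make_database()
print(web_server(db, "/pancakes"))
print(web_server(db, "/waffles"))
```

Notice that `web_server` never touches the database directly. It only knows how to ask the application server, which is the separation the paragraph above describes.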
So? Now?
Yes. Actually yes, now. You typed in a website, asked a DNS and got the IP address, and got through the firewall and the SSL security layer. You reached the load balancer, which sent you to the right web server. Then that web server communicated with an application server, which used a database to get the info back to the web server, which gave you the website you were looking for. Phew, not so bad. All you had to do was type it in and hit Enter; everything else happened behind the scenes.