
Understanding HTTP protocol

January 13, 2020 5:53am - 3 min read

HTTP is a client-server protocol for fetching resources such as HTML documents. It can also be used to fetch parts of documents to update web pages on demand. It is the foundation of any data exchange on the web.

Clients and servers communicate by exchanging individual messages. The messages sent by the client, usually a web browser, are called requests, and the messages sent by the server in answer are called responses.
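To make the request and response messages concrete, here is a minimal sketch of what they look like on the wire in HTTP/1.1: a start line, header lines, a blank line, then an optional body. The helper names `build_request` and `parse_response` are made up for this illustration, and the parser ignores many details that a real client has to handle (chunked bodies, repeated headers and so on).

```python
def build_request(method, path, host):
    """Build the bytes of a simple HTTP/1.1 request with no body."""
    lines = [
        f"{method} {path} HTTP/1.1",  # request line: method, target, version
        f"Host: {host}",              # Host is mandatory in HTTP/1.1
        "Connection: close",
        "",                           # blank line ends the header section
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_response(raw):
    """Split raw response bytes into status, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, status, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", "example.com"))
    sample = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\nhello"
    print(parse_response(sample))
```

In practice you would rarely write this by hand, since HTTP libraries build and parse these messages for you.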

HTTP is an extensible protocol that has evolved over time since it was first designed in the early 1990s. It is an application layer protocol that is sent over TCP, or over a TLS-encrypted TCP connection.

Components of HTTP-based systems

There are two main parties:

  • User-agent (Most of the time the user-agent is a web browser, but it can be anything, for example a robot that crawls the web to populate and maintain a search engine index)
  • Server

Between the client and the server there are numerous entities, collectively called proxies, which perform different operations and act as gateways or caches. In reality, more systems sit between a browser and the server handling the request, such as routers and modems.

Client: the user-agent

The browser is always the entity initiating the request. It is never the server (though some mechanisms have been added over the years to simulate server-initiated messages).

The web server

On the opposite side of the communication channel is the server, which serves the document requested by the client.
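The exchange described above, where the client initiates and the server answers, can be sketched with Python's standard library alone. The handler class `HelloHandler` and the helper `fetch_once` are names invented for this example; the server listens on a free local port chosen by the operating system, so nothing here touches the public internet.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy server side: answers every GET with a small HTML document."""

    def do_GET(self):
        body = b"<h1>hello</h1>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def fetch_once(path="/"):
    """Start a local server, send one request as a client, return the answer."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)  # port 0 = any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        # The client opens the TCP connection and sends the request.
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()
        result = (response.status, response.getheader("Content-Type"), response.read())
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once("/index.html"))
```

Note that the server never speaks first: it only writes to the connection after a request has arrived.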

Proxies

Between the web browser and the server, numerous computers and machines relay the HTTP messages. Most of these operate at the transport, network or physical level and can have a significant impact on performance. Those operating at the application layer are generally called proxies. A proxy can alter the messages it receives, or pass them on to the server without modification.

Proxies may perform numerous functions:

  • caching (the cache can be public or private, like the browser cache)
  • filtering (like an antivirus scan or parental controls)
  • load balancing (to allow multiple servers to serve the different requests)
  • authentication (to control access to different resources)
  • logging (allowing the storage of historical information)
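As a rough illustration of two of the functions listed above, caching and logging, here is a toy in-memory sketch. The class `CachingProxy` and its `fetch_from_origin` callback are hypothetical names for this example. A real proxy works on actual HTTP messages and honours headers such as Cache-Control, which this sketch leaves out entirely.

```python
class CachingProxy:
    """Toy proxy: logs every request and caches GET responses by URL."""

    def __init__(self, fetch_from_origin):
        # fetch_from_origin(method, url) stands in for the real origin server.
        self._fetch = fetch_from_origin
        self._cache = {}
        self.log = []

    def handle(self, method, url):
        self.log.append((method, url))  # logging: keep a history of requests
        if method == "GET" and url in self._cache:
            return self._cache[url]     # cache hit: the origin is not contacted
        response = self._fetch(method, url)
        if method == "GET":
            self._cache[url] = response  # only GET responses are cached here
        return response


if __name__ == "__main__":
    def origin(method, url):
        print("origin contacted for", method, url)
        return f"body of {url}"

    proxy = CachingProxy(origin)
    proxy.handle("GET", "/a")  # goes to the origin
    proxy.handle("GET", "/a")  # served from the cache
    print(proxy.log)
```

The second GET never reaches the origin, which is the whole point of a cache sitting between the client and the server.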

HTTP is stateless, but not sessionless

HTTP is a stateless protocol: each request is handled on its own, and the connection between the browser and the server may be closed once the transaction ends. Because of this, neither the client nor the server retains information between different requests across web pages.

But some web applications may have to track the user’s progress from page to page, for example when a web server is required to customize the content of a web page for a user. Solutions for these cases include:

  • HTTP cookies,
  • server-side sessions,
  • hidden variables (when the current page contains a form), and
  • URL rewriting using URI-encoded parameters, e.g., /index.php?session_id=some_unique_session_code.
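Three of the techniques listed above (cookies, server-side sessions and URL rewriting) can be sketched with the standard library. The in-memory `SESSIONS` store and the function names are assumptions made for this example; a real application would normally rely on its web framework's session support and a persistent store.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qs, urlsplit

SESSIONS = {}  # hypothetical server-side store: session id -> user data


def start_session(user_data):
    """Create a server-side session and return a Set-Cookie header value."""
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = dict(user_data)
    cookie = SimpleCookie()
    cookie["session_id"] = session_id
    cookie["session_id"]["path"] = "/"
    cookie["session_id"]["httponly"] = True
    return cookie["session_id"].OutputString()


def load_session(cookie_header):
    """Look up the session named by a request's Cookie header, if any."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("session_id")
    if morsel is None:
        return None
    return SESSIONS.get(morsel.value)


def session_id_from_url(url):
    """URL-rewriting variant: read the session id from the query string."""
    values = parse_qs(urlsplit(url).query).get("session_id")
    return values[0] if values else None


if __name__ == "__main__":
    set_cookie = start_session({"user": "alice"})
    print("Set-Cookie:", set_cookie)
    # A browser sends back only the name=value pair on later requests.
    cookie_header = set_cookie.split(";")[0]
    print(load_session(cookie_header))
    print(session_id_from_url("/index.php?session_id=some_unique_session_code"))
```

The protocol itself stays stateless here: the state lives in the server's store, and the cookie or URL parameter is just the key the client presents with each request.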

HTTP/1 is stateless, while HTTP/2 is stateful. Later additions intended for HTTP/1, such as cookies, added state, but those additions are not part of the “core” HTTP/1 specification. This is why HTTP/1 is said to be a stateless protocol even though in practice it often is not. HTTP/2, on the other hand, was designed with stateful components built in.

Year    HTTP version
1991    0.9
1996    1.0
1997    1.1
2015    2.0
2018    3.0

HTTP/3 is the proposed successor to HTTP/2. It is already in use on the web and uses UDP instead of TCP as the underlying transport protocol. Like HTTP/2, it does not obsolete previous major versions of the protocol. Support for HTTP/3 was added to Cloudflare and Google Chrome (Canary build) in September 2019. Support in Firefox Nightly arrived in November 2019.

Source: https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

Last updated on: January 13, 2020 5:53am