Understanding HTTP

Post by DNR »

HTTP is a request/response standard between a client and a server. The client is the end user's software; the server is the web site. The client making an HTTP request - using a web browser, spider, or other end-user tool - is referred to as the user agent.
The client establishes a Transmission Control Protocol (TCP) connection to a particular port on a host (port 80 by default). An HTTP server listening on that port waits for the client to send a request message. Upon receiving the request, the server sends back a status line, such as "HTTP/1.1 200 OK", and a message of its own, the body of which is perhaps the requested file, an error message, or some other information.

[Image: screenshot of a terminal in Ubuntu showing a telnet session that contains an HTTP request/response made to the main page of the English Wikipedia. Part of the response has been removed.]
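To make the message format concrete, here is a minimal Python sketch that builds the text of a request and splits a status line into its parts. The function names build_request and parse_status_line are made up for this illustration, and nothing here touches the network.

```python
def build_request(method, path, host, version="HTTP/1.1"):
    """Return the raw text of a minimal request: request line, Host header, blank line."""
    return f"{method} {path} {version}\r\nHost: {host}\r\n\r\n"


def parse_status_line(line):
    """Split a status line such as 'HTTP/1.1 200 OK' into (version, code, reason)."""
    version, code, reason = line.strip().split(" ", 2)
    return version, int(code), reason


print(repr(build_request("GET", "/", "www.somewhere.com")))
print(parse_status_line("HTTP/1.1 200 OK"))
```

The blank line at the end of the request (the final "\r\n") is what tells the server that the headers are finished.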

----

Request methods

HTTP/1.1 defines eight methods (sometimes referred to as "verbs") indicating the desired action to be performed on the identified resource.

HEAD
Asks for the response identical to the one that would correspond to a GET request, but without the response body. This is useful for retrieving meta-information written in response headers, without having to transport the entire content.
GET
Requests a representation of the specified resource. By far the most common method used on the Web today. Should not be used for operations that cause side-effects (using it for actions in web applications is a common misuse).
POST
Submits data to be processed (e.g. from an HTML form) to the identified resource. The data is included in the body of the request. This may result in the creation of a new resource, the update of existing resources, or both.
PUT
Uploads a representation of the specified resource.
DELETE
Deletes the specified resource.
TRACE
Echoes back the received request, so that a client can see what intermediate servers are adding or changing in the request.
OPTIONS
Returns the HTTP methods that the server supports for the specified URL. This can be used to check the functionality of a web server by requesting '*' instead of a specific resource.
CONNECT
Converts the request connection to a transparent TCP/IP tunnel, usually to facilitate SSL-encrypted communication (HTTPS) through an unencrypted HTTP proxy.
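
The note under GET about side effects can be turned into a quick check. This is only a sketch: the set of "safe" (read-only) methods below is taken from the HTTP specification rather than from this post, and is_safe is a hypothetical helper name.

```python
# Methods the HTTP specification treats as "safe": they are meant to be
# read-only from the client's point of view.
SAFE_METHODS = {"GET", "HEAD", "OPTIONS", "TRACE"}


def is_safe(method):
    """Return True if the method should not change anything on the server."""
    return method.upper() in SAFE_METHODS


for verb in ("GET", "POST", "DELETE", "HEAD"):
    print(verb, "safe" if is_safe(verb) else "may change server state")
```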
----
Status codes
200 OK
301 Moved permanently
302 Found
303 See Other
403 Forbidden
404 Not Found
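
The first digit of a status code tells you its class. As a small sketch, Python's standard http module already knows the reason phrases for the codes listed above; describe is a hypothetical helper, and the class labels are my own wording.

```python
from http import HTTPStatus


def describe(code):
    """Return 'code phrase (class)' for a status code known to Python's http module."""
    classes = {2: "success", 3: "redirection", 4: "client error"}
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({classes.get(code // 100, 'other')})"


for code in (200, 301, 302, 303, 403, 404):
    print(describe(code))
```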
-----

Request Headers

If-Modified-Since: This contains the date of the last time your web browser fetched this web page. Basically it says to the web server "only send me the page if it's been changed since this date". Deleting this header will force a web page to fully reload even if it's already in your browser's cache (with IE you'll also need to delete the "Last-Modified" header).
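
Roughly, the server-side decision behind If-Modified-Since looks like the sketch below. It assumes the server compares the header against the page's last modification date; needs_full_response is a made-up name. When the answer is False, a real server replies "304 Not Modified" with no body.

```python
from email.utils import parsedate_to_datetime


def needs_full_response(if_modified_since, last_modified):
    """Return True if the page changed after the date the client sent.

    Both arguments are HTTP date strings; if_modified_since may be None
    when the header was not sent (for example because it was deleted).
    """
    if if_modified_since is None:
        return True  # no cached copy claimed, so send the whole page
    return parsedate_to_datetime(last_modified) > parsedate_to_datetime(if_modified_since)


print(needs_full_response("Mon, 19 Oct 2009 03:22:17 GMT", "Wed, 11 Jul 2007 14:10:28 GMT"))
print(needs_full_response(None, "Wed, 11 Jul 2007 14:10:28 GMT"))
```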

Referer: This contains the URL of the web page you previously came from. For security reasons web browsers are supposed to give you the option not to send this information, but few browsers do. It's usually safe to delete, although it can have an odd effect on things like free web counters, which sometimes use it to track which web page they're keeping count of. If enabled for outgoing messages, you can set a default Referer filter that just sends a page a spoofed URL here - Sam Spade is one such tool that can do that.

User-Agent: This contains information about your web browser and usually your operating system as well. Normally, it's just informational - you can delete this header or even send your own information here if you like.

Host: Contains the host name of the web page you're contacting (as in "www.somewhere.com"). It is commonly used by many web servers for "virtual hosting", where the same web server is used for several different sites. Generally leave this one alone.
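
A minimal sketch of how virtual hosting might use this header: the server looks the host name up in a table and serves the matching site. The SITES table, the directory paths, and pick_site are all hypothetical; real web servers do this through their own configuration.

```python
# Hypothetical mapping from host name to the directory holding that site's files.
SITES = {
    "www.somewhere.com": "/var/www/somewhere",
    "www.elsewhere.com": "/var/www/elsewhere",
}
DEFAULT_SITE = "/var/www/default"


def pick_site(host_header):
    """Choose a site from the Host header, ignoring case and any ':port' suffix."""
    if not host_header:
        return DEFAULT_SITE
    hostname = host_header.split(":", 1)[0].strip().lower()
    return SITES.get(hostname, DEFAULT_SITE)


print(pick_site("www.somewhere.com"))
print(pick_site("WWW.Elsewhere.com:8080"))
print(pick_site(None))
```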

Accept: Contains a list of file and image types your browser understands. Another one you usually need not modify.

Accept-Language: Contains a list of preferred languages - intended for allowing multi-lingual web pages to automatically choose your language (though seldom really used for this). This is another header that the original HTTP specs said should be under user control. Sometimes you may not wish everyone to know your native language.

Accept-Charset: The character sets your browser understands. Generally best left alone.

Cookie: This is perhaps one of the most infamous and misunderstood of all headers. Cookies contain a line of information originally sent by a web server to your browser. All your browser ever does with them is send them back to the server unchanged. Most often they contain some sort of user identification the server uses to tell you apart from other visitors, but the server has to know this information already through other means to create the cookie in the first place. By altering a cookie, you can really "mess with a server's head", so to speak, and confuse it as to who you really are. Depending on the situation, this can be a good or bad thing.

You can use a cookie filter to stop all cookies from being sent, or by including a URL match, send cookies only to certain trusted sites.

Pragma: no-cache - Often sent by web browsers when a page is reloaded. It's intended to tell any remote proxy to send a fresh version of the page (instead of a copy it may have stored).

-----
Reply Headers

Server: Contains the name and version of the web server. Simply informational and not used by your browser. Sort of the reverse of the "User-Agent" header. Note: This can be spoofed by the webmaster to confuse script kiddies into wasting time looking for the wrong exploits or the wrong web hacking tool.

Cache-Control: Affects how pages are stored in caches. A value of "private" indicates the response is meant for a single user and should not be stored by shared caches such as proxies (your browser may still keep it), while "max-age" gives the number of seconds the page may be kept before it is considered stale.
Pragma: no-cache - In a reply, this indicates that your web browser should not store the page in its cache. Often used for temporary web pages like search engine results.
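
A Cache-Control value is just a comma-separated list of directives, some with an "=value" part. The sketch below splits one into a dictionary; parse_cache_control is a hypothetical helper, and it deliberately ignores the rarer case of quoted values that themselves contain commas.

```python
def parse_cache_control(value):
    """Turn 'private, max-age=3600' into {'private': None, 'max-age': '3600'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without '=' (such as 'private') get None as their value.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


print(parse_cache_control("private, max-age=3600"))
```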

Expires: Another header used by your browser's cache - contains a date when the web page's contents will have probably changed. It usually is only speculative.

Date: The web server's idea of the current date and time.

Last-Modified: The date and time the web page was last modified. Also used by your browser's cache. For earlier versions of Internet Explorer, you may have to delete this header along with "If-Modified-Since" to "force" the browser to always reload a web page.

Content-Type: Contains the type of data the server is sending, for example "text/html" is used for a web page and "image/gif" for a .GIF file.

Accept-Ranges: Part of HTTP 1.1 - "Bytes" indicates a server supports retrieving arbitrary sections of a file. Used by some download utilities to "resume" an interrupted file transfer.
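
When a server advertises "bytes", a download tool can resume by sending a Range request header such as "Range: bytes=1000-". That header comes from the HTTP/1.1 spec and is not shown elsewhere in this post. Here is a sketch of how a server might interpret the simplest form; parse_byte_range is a made-up name and multi-range requests are not handled.

```python
def parse_byte_range(header, size):
    """Parse a single 'bytes=start-end' value into an inclusive (start, end) pair.

    Returns None if the value is not one simple range that fits a file of
    'size' bytes.
    """
    unit, _, spec = header.partition("=")
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None
    first, _, last = spec.strip().partition("-")
    try:
        if first == "":
            # 'bytes=-500' means the last 500 bytes of the file.
            start, end = max(size - int(last), 0), size - 1
        else:
            start = int(first)
            # 'bytes=1000-' means from byte 1000 to the end of the file.
            end = int(last) if last else size - 1
    except ValueError:
        return None
    end = min(end, size - 1)
    if start < 0 or start > end:
        return None
    return start, end


data = b"0123456789"
start, end = parse_byte_range("bytes=7-", len(data))
print(data[start:end + 1])
```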

ETag: A little like a checksum (but only very little). It contains a string that is supposed to change every time the web page is updated. The string has no real meaning other than that. Again something to be used by your browser's cache.

Connection: Another HTTP 1.1 header - "close" indicates the connection will be closed after this response, so no more data will be sent on it (as was always the case in HTTP 1.0). HTTP 1.1 supports the idea of "persistent connections" where the same connection is "reused" to send multiple items.

Content-Location: The URL where the data came from - not always present.

Location: This is used by web servers to redirect you to a different URL (for example in a 301 or 302 redirect reply).

Content-Length: The length in bytes of the web page or file being sent.

Set-Cookie: A request to your browser to store the information contained in this header and send it back to this server the next time you visit the web page. Delete this, and your browser will never receive any cookies. In the Set-Cookie filter, the URL match can be used to selectively accept cookies. Also you can make cookies "session only" so they're not permanently stored by your browser. Often this will work better at places where killing the cookie altogether will cause a site to complain.
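
To see the round trip, here is a small sketch using Python's standard http.cookies module: it parses a Set-Cookie value and builds the Cookie header a browser would send back. The helper name cookie_header_from_set_cookie is hypothetical, and the sample value combines the lang cookie from this post with a path attribute for illustration.

```python
from http.cookies import SimpleCookie


def cookie_header_from_set_cookie(set_cookie_value):
    """Parse a Set-Cookie value and return the Cookie header value to send back."""
    jar = SimpleCookie()
    jar.load(set_cookie_value)
    # Only name=value pairs go back to the server; attributes such as
    # 'expires' and 'path' stay in the browser.
    return "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())


sample = "lang=english; expires=Mon, 19 Oct 2009 03:22:17 GMT; path=/"
jar = SimpleCookie()
jar.load(sample)
print(jar["lang"].value, "|", jar["lang"]["expires"], "|", jar["lang"]["path"])
print("Cookie:", cookie_header_from_set_cookie(sample))
```

A "session only" cookie is simply one stored without the expires attribute, so the browser drops it when it closes.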

For fun, check out suck-o's Set-Cookie:
"Set-Cookie lang=english
expires=Mon, 19 Oct 2009 03:22:17 GMT
Vary Accept-Encoding
Server Uncle Poopy's v1.272
Date Sun, 19 Oct 2008 03:22:17 GMT
Content-Type text/html "

Note the spoofed server type!
--------------------------------------------------------------------------------

In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair. In HTTP/1.1 a keep-alive mechanism was introduced, where a connection could be reused for more than one request.

Such persistent connections reduce lag perceptibly, because the client does not need to re-negotiate the TCP connection after the first request has been sent.

Version 1.1 of the protocol made bandwidth optimization improvements to HTTP/1.0. For example, HTTP/1.1 introduced chunked transfer encoding to allow content on persistent connections to be streamed, rather than buffered. HTTP pipelining further reduces lag time, allowing clients to send multiple requests before the response to the first one has been received. Another improvement to the protocol was byte serving, which is when a server transmits just the portion of a resource explicitly requested by a client.
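
Chunked transfer encoding is easy to see in code. Each chunk is a hexadecimal length on its own line, then that many bytes, then CRLF, and a zero-length chunk ends the body. The sketch below reassembles such a body; decode_chunked is a hypothetical helper that ignores trailers and does no error recovery.

```python
def decode_chunked(data):
    """Reassemble a chunked-encoded body (bytes) into the original payload."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size_text = data[pos:line_end].split(b";", 1)[0].strip()
        size = int(size_text.decode("ascii"), 16)
        if size == 0:
            break  # a zero-length chunk marks the end of the body
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)


print(decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"))
```

Because each chunk carries its own length, the server can start sending before it knows the total size, which is why no Content-Length is needed.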
---
HTTPS: The https: URI scheme is syntactically identical to the http: scheme used for normal HTTP connections, but it signals the browser to use an added encryption layer of SSL/TLS to protect the traffic. SSL is especially suited for HTTP since it can provide some protection even if only one side of the communication is authenticated. This is the case with HTTP transactions over the Internet, where typically only the server is authenticated (by the client examining the server's certificate).
-------------------------------------------------------------------------------

Using Telnet is a great way to learn about HTTP requests.

Here is a simple HEAD request to microsoft.com via telnet.

$ telnet microsoft.com 80
Trying 207.46.232.182...
Connected to microsoft.com.
Escape character is '^]'.
HEAD / HTTP/1.0

HTTP/1.1 301 Moved Permanently
Connection: close
Date: Mon, 19 Oct 2009 03:22:17 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
Location: http://www.microsoft.com
Content-Length: 31
Content-Type: text/html
Set-Cookie: ASPSESSIONIDSCAQCSBR=FMPJMMPAMGNBFELIPABIHHMN; path=/
Cache-control: private

Connection closed by foreign host.
The command above was simple: HEAD / HTTP/1.0 followed by two line feeds (press Enter twice).

The 80 specified in the telnet command is the port that you are hitting when you type http://microsoft.com/ in a browser. If another port is used you will see it after a colon. ex: http://DNR.com:8080/ hits the server running on port 8080.
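
The same rule can be checked with Python's standard URL parser. This is a sketch: host_and_port and the DEFAULT_PORTS table are my own names, and the 443 default for https is an addition not covered above.

```python
from urllib.parse import urlsplit

# Default ports per scheme; 80 is the HTTP default discussed above.
DEFAULT_PORTS = {"http": 80, "https": 443}


def host_and_port(url):
    """Return (hostname, port) for a URL, falling back to the scheme's default port."""
    parts = urlsplit(url)
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(parts.scheme)
    return parts.hostname, port


print(host_and_port("http://microsoft.com/"))
print(host_and_port("http://DNR.com:8080/"))
```

Note that urlsplit lowercases the host name, since host names are case-insensitive.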

When doing GET commands you usually end up sending headers with your command. You should always send the Host header. It is required in HTTP/1.1, and although HTTP/1.0 does not require it, many servers host multiple sites, so you'll want to send it anyway.

Here's an example of a GET against suck-o's home page.

$ telnet suck-o.com 80
Trying 62.75.148.170...
Connected to suck-o.com.
Escape character is '^]'.
GET / HTTP/1.1
Host: suck-o.com

HTTP/1.1 200 OK
Date: Mon, 19 Oct 2009 03:22:17 GMT
Server: Uncle Poopy's v1.272
MS-Author-Via: DAV
Last-Modified: Wed, 11 Jul 2007 14:10:28 GMT
ETag: "19cf7aa-68d-4694e4d4"
Accept-Ranges: bytes
Content-Length: 1677
Content-Type: text/html

<!DOCTYPE html PUBLIC "-//W3C//DTD
The rest of the page would follow as plain text, of course. Why did it return the entire page? Because we used a GET instead of a HEAD command.
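
If you'd rather script it than type into telnet, the sessions above can be approximated with a raw socket. This is a minimal sketch, not a full HTTP client: fetch and split_response are made-up names, it sends HTTP/1.0 so the server closes the connection when it is done, and repeated headers (such as several Set-Cookie lines) overwrite each other in the dictionary.

```python
import socket


def split_response(raw):
    """Split raw response bytes into (status_line, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers, body


def fetch(host, method="HEAD", path="/", port=80):
    """Send one HTTP/1.0 request and read until the server closes the connection."""
    request = f"{method} {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return split_response(b"".join(chunks))


# Example (needs network access):
#   status, headers, body = fetch("suck-o.com", "HEAD")
#   print(status, len(body))   # a HEAD reply has headers but an empty body
```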

---

DNR
Last edited by DNR on 19 Oct 2008, 12:26, edited 1 time in total.


Post by TheKingOfHearts »

awesome article DNR!

i enjoyed reading it and learned a bit more about HTTP


Post by Uner »

Thumbs up. :D


Post by ayu »

Nice DNR Oo I'll read this later for sure =o (have to study atm)
"The best place to hide a tree, is in a forest"
