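URL stands for “Uniform Resource Locator”. It is the basic way of addressing on the World Wide Web. Let’s take a look at the URL of this page, http://www.cpp.edu/~jcclark/classes/bio190/urls.html, and break it down into its component parts: the protocol, the server, the directory path, and the file name.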
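The same breakdown can be done mechanically. Here is a minimal Python sketch using the standard library's `urllib.parse.urlsplit`, assuming the page's URL is the one given in the citation at the end of this page. It separates the protocol, the server, and the path, then splits the path into directory and file name.

```python
from urllib.parse import urlsplit

url = "http://www.cpp.edu/~jcclark/classes/bio190/urls.html"
parts = urlsplit(url)

print(parts.scheme)  # protocol: http
print(parts.netloc)  # server: www.cpp.edu
print(parts.path)    # directory path plus file name: /~jcclark/classes/bio190/urls.html

# The last "/" separates the directory path from the file name.
directory, _, filename = parts.path.rpartition("/")
print(directory)     # /~jcclark/classes/bio190
print(filename)      # urls.html
```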
The first part of the URL (here, http://) specifies what kind of Internet resource will be delivered, and thus the method (“protocol”) for delivering it.
|http://||Hypertext Transfer Protocol is the method for delivering web pages. Most of the URLs you see will be this.|
|ftp://||File Transfer Protocol - although http can deliver any kind of file (not just a web page), this older protocol is a bit faster, and there are still many computers on the Internet set up as “FTP sites” with large numbers of files to download. You can learn more about FTP from the BIO 256 assignment.|
|gopher://||Gopher was the forerunner to the World Wide Web. It is organized hierarchically rather than with hyperlinks, and is virtually obsolete.|
|news:// (or nntp://)||Network News Transfer Protocol is the method of delivering Usenet, or NetNews (for more information, see the BIO 256 assignment). Some web browsers (but not recent Netscape) require a separate program to handle Usenet.|
|telnet://||Telnet is a protocol for connecting to a remote computer as a terminal. If you’ve used the bibliographic terminals in the library, that’s what telnet is like. Some library catalogs are accessible only by telnet. Telnet requires a separate program, but a telnet program is standard in Windows 98 and Windows NT.|
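The point of the protocol part is that it tells the browser which method, and sometimes which separate program, to use. The sketch below illustrates that idea with a hypothetical lookup table built from the list above; `HANDLERS` and `handler_for` are made-up names, and real browsers are configured quite differently.

```python
from urllib.parse import urlsplit

# Hypothetical mapping based on the protocol table above (not a real browser setting).
HANDLERS = {
    "http": "browser (web page)",
    "ftp": "browser or FTP program",
    "gopher": "browser (Gopher menu)",
    "news": "newsreader",
    "nntp": "newsreader",
    "telnet": "separate telnet program",
}

def handler_for(url):
    """Pick a handler using only the protocol (scheme) part of the URL."""
    scheme = urlsplit(url).scheme  # urlsplit already lowercases the scheme
    return HANDLERS.get(scheme, "unknown protocol")

print(handler_for("telnet://library.example.edu"))  # hypothetical catalog host
print(handler_for("news:comp.infosystems.www"))
```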
The server (in this URL, www.cpp.edu) is the computer that answers the request (the computer you are using to make the request is called the client). On the Internet, servers can have two different kinds of addresses. One, called an IP (Internet Protocol) address, consists of four one-byte numbers (each from 0 to 255), usually written separated by periods like this: 126.96.36.199. IP addresses are the actual addresses of the machines, but because they are hard to remember, there are also DNS (Domain Name System) addresses, usually called domain names. DNS addresses are translated into IP addresses by a computer called a domain name server.
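To make “four one-byte numbers” concrete, here is a small sketch using Python's standard `ipaddress` module on the example address above. It shows the four bytes, the same address as one 32-bit number, and that a part larger than 255 is rejected.

```python
import ipaddress

addr = ipaddress.IPv4Address("126.96.36.199")
raw = addr.packed   # the address as the network sees it: 4 bytes

print(len(raw))     # 4
print(list(raw))    # [126, 96, 36, 199]
print(int(addr))    # the same four bytes read as a single 32-bit number

# 256 does not fit in one byte, so this is not a valid IP address.
try:
    ipaddress.IPv4Address("126.96.36.256")
except ValueError as err:
    print("not a valid IP address:", err)
```

Translating a domain name into an IP address requires asking a name server over the network (in Python, for example, `socket.gethostbyname("www.cpp.edu")`), so that step is left out of this offline sketch.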
Every DNS address consists of parts separated by periods. The rightmost part of a name is the “top-level domain”, and the parts to the left of it specify smaller and smaller chunks of the Internet. If we look at www.cpp.edu, the “edu” (short for “educational institution”) tells us that we are talking about a university in the United States, “cpp” tells us that the university is Cal Poly Pomona, and “www” tells us that we are using the specific computer that handles World Wide Web traffic.
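Reading a domain name from right to left, from the largest chunk of the Internet to the smallest, is easy to sketch in Python. `domain_parts` is a hypothetical helper name; it also lowercases the name, since DNS names are not case-sensitive.

```python
from typing import List

def domain_parts(hostname: str) -> List[str]:
    """Return the parts of a domain name from the top-level domain down."""
    return list(reversed(hostname.lower().split(".")))

labels = domain_parts("www.cpp.edu")
print(labels)     # ['edu', 'cpp', 'www']
print(labels[0])  # top-level domain: edu
```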
The most important thing to a web user about a DNS name is that you can’t reliably change it and still connect to the webserver. You can learn a lot about a server, though, through its name, especially the top-level domain. Here are some domain names and what they mean.
|.edu||Colleges and universities in the United States. Community colleges and K-12 schools are in the .us domain (see below).|
|.com||Originally businesses, primarily in the U.S. or with U.S. offices; now, anyone can be a .com (“commercial”)|
|.org||Originally used by (but not restricted to) non-profit organizations, primarily in the U.S.; now, anyone can be a .org|
|.gov||Federal and state governments, some local governments (most local governments seem to be in the .us domain).|
|.net||Originally, organizations that were major players in the hardware of the Internet; now, anyone can be a .net|
|These domains were part of the original Internet when it was entirely in the United States. As other countries joined, country domain names were added.|
|.ca.us||The .us means United States, the .ca California. K-12 public schools and local governments are here, rather than .edu or .gov.|
|.ca||This .ca, as a top-level domain, is Canada, not California|
|.mx||Our other NAFTA neighbor, México.|
|.ac.uk||The .uk says United Kingdom (Britain), the .ac tells us it is a British university (“academic”). A commercial site in the U.K. would be “.co.uk”.|
|.tv||This is Tuvalu, an island nation in the South Pacific, which sells names in its domain to people who want to convey “television”, another meaning of the abbreviation. Other countries making use of this angle are Moldova (.md), Niue (.nu), Tonga (.to), and Western Samoa (.ws).|
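Because some endings span two parts (.ac.uk, .co.uk, .ca.us), looking up what a domain means works best by trying the longest ending first. The sketch below assumes a small, hypothetical table copied from a few rows of the list above; it is nowhere near a complete list of domains.

```python
# A few entries from the table above; hypothetical and far from complete.
DOMAIN_MEANINGS = {
    "edu": "U.S. college or university",
    "com": "commercial (now open to anyone)",
    "org": "originally non-profit (now open to anyone)",
    "gov": "U.S. government",
    "net": "originally Internet hardware organizations (now open to anyone)",
    "us": "United States",
    "ca.us": "California: K-12 schools and local governments",
    "ca": "Canada",
    "mx": "Mexico",
    "uk": "United Kingdom",
    "ac.uk": "British university",
    "co.uk": "British commercial site",
    "tv": "Tuvalu",
}

def describe_domain(hostname):
    """Match the longest known ending, so 'ac.uk' wins over plain 'uk'."""
    labels = hostname.lower().split(".")
    for start in range(len(labels)):
        suffix = ".".join(labels[start:])  # shortest suffix is tried last
        if suffix in DOMAIN_MEANINGS:
            return DOMAIN_MEANINGS[suffix]
    return "not in this table"

print(describe_domain("www.cpp.edu"))
print(describe_domain("www.ox.ac.uk"))
```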
The World Wide Web began on computers running the Unix operating system. Unix organizes files in directories, much like the directories of MS-DOS and Windows or the folders of Macintosh. The way Unix specifies directories was the basis for the way DOS specifies them (except that DOS switched from the forward slash “/” to the backslash “\”). “/~jcclark/classes/bio190/” tells us (or the web server) to look in the user directory of jcclark (the tilde “~” is a Unix convention for a user’s home directory, the main directory you are in when you log in) for a directory called “classes”, and then to look in “classes” for a directory called “bio190”.
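Python's `pathlib.PurePosixPath` splits a path with Unix rules, which makes the step-by-step lookup described above visible. This is only a sketch of how the path is read; how a particular server maps `~jcclark` onto its own disk is server configuration and is not shown here.

```python
from pathlib import PurePosixPath

path = PurePosixPath("/~jcclark/classes/bio190/urls.html")
print(path.parts)   # ('/', '~jcclark', 'classes', 'bio190', 'urls.html')
print(path.parent)  # /~jcclark/classes/bio190
print(path.name)    # urls.html

# Walk the directories in the order the server looks into them.
for step, name in enumerate(path.parts[1:-1], start=1):
    print(f"step {step}: look in directory {name}")
```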
There is one very important feature of Unix that is “missing” from DOS, Windows, and Macintosh that every Web user needs to be aware of: Unix file and directory names are Case-Sensitive (caPITaLizaTIon MaTTerS). The directory names “bio190” and “BIO190” would be the same in DOS, Windows, or Mac, but in Unix they are different names, so if you tried to get to this page using BIO190 instead of bio190, it wouldn’t work. When you are recording a URL, respect capitalization completely. DNS names are not case-sensitive, so cpp.edu and CPP.Edu are identical. Not all web servers run Unix, but there’s no point in taking chances: keep the capitalization. And if a URL that should work doesn’t, try changing the capitalization of the directory or file names.
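The rule “server names ignore case, paths do not” can be written down as a comparison. The sketch below is a hypothetical helper, `same_location`, that assumes a Unix-style server: it relies on `urlsplit` lowercasing the protocol and the `hostname` attribute, and compares the path exactly. It ignores anything after a “?” or “#”.

```python
from urllib.parse import urlsplit

def same_location(url_a, url_b):
    """True if both URLs name the same file on a case-sensitive (Unix) server."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    # scheme and hostname come back lowercased; path keeps its capitalization
    return (a.scheme, a.hostname, a.port, a.path) == (b.scheme, b.hostname, b.port, b.path)

page = "http://www.cpp.edu/~jcclark/classes/bio190/"
print(same_location(page, "http://WWW.CPP.Edu/~jcclark/classes/bio190/"))  # True
print(same_location(page, "http://www.cpp.edu/~jcclark/classes/BIO190/"))  # False
```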
Another feature of directory and file names most often comes up with files created on a Macintosh or in Windows 95. Names on either system can contain spaces, but URLs can’t, so each space is replaced with the code %20 (20 is the hexadecimal code for the space character).
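Python's `urllib.parse.quote` and `unquote` perform this replacement in both directions. The file name below is a hypothetical example.

```python
from urllib.parse import quote, unquote

name = "lab report.html"   # hypothetical file name containing a space
encoded = quote(name)
print(encoded)             # lab%20report.html
print(unquote(encoded))    # lab report.html

# Why "20": the space character has code 0x20 (32 in decimal).
print("%{:02X}".format(ord(" ")))  # %20
```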
Sometimes a URL ends in a directory name: http://www.cpp.edu/~jcclark/. If the directory contains an index file (the name differs with different servers: index.html, welcome.html, homepage.html), that file will be displayed. If not, a listing of the files in that directory may be displayed, depending on how the server is set up. To display any file that is not an index file, its name must be specified. Like directory names, file names in Unix are case-sensitive. Most web files end in .html or .htm. Files ending in .cgi, .pl, .asp, or .php, or residing in a directory called cgi-bin, are programs that create web pages or do other activities in response to user input (for example, filling out a form). The http protocol allows files other than web pages to be transferred, and your browser will handle them in different ways, depending on how it’s set up. There are graphics files (.jpg, .gif), sound files (.mp3, .au, .aif, .mid, .ra), videos (.mov, .mpg), and much more.
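The rules of thumb in the paragraph above can be collected into one small function. This is a sketch only: `guess_kind` and its extension tables are hypothetical and copied from the paragraph, while a real server and browser decide by their own configuration. The extension match is case-sensitive, in keeping with Unix file names.

```python
from pathlib import PurePosixPath

# Extension lists taken from the paragraph above (hypothetical, not exhaustive).
PROGRAM_EXTS = {".cgi", ".pl", ".asp", ".php"}
KINDS = {
    ".html": "web page", ".htm": "web page",
    ".jpg": "graphics", ".gif": "graphics",
    ".mp3": "sound", ".au": "sound", ".aif": "sound", ".mid": "sound", ".ra": "sound",
    ".mov": "video", ".mpg": "video",
}

def guess_kind(url_path):
    """Guess what kind of file a URL path points to, from its name alone."""
    if url_path.endswith("/"):
        return "directory: index file or file listing"
    p = PurePosixPath(url_path)
    if p.suffix in PROGRAM_EXTS or "cgi-bin" in p.parts:
        return "program"
    return KINDS.get(p.suffix, "other file")

print(guess_kind("/~jcclark/"))
print(guess_kind("/~jcclark/classes/bio190/urls.html"))
print(guess_kind("/cgi-bin/search"))
```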
Citation: Clark, Curtis. 2001. BIO 190 - Understanding URLs. California State Polytechnic University, Pomona, http://www.cpp.edu/~jcclark/classes/bio190/urls.html.