BIO 190—Scientific Communication I

Understanding URLs

URL stands for “Uniform Resource Locator”. It is the basic way of addressing on the World Wide Web. Let’s take a look at the URL of this page and break it down into its component parts:

http://www.cpp.edu/~jcclark/classes/bio190/urls.html

http://                      Protocol
www.cpp.edu                  Server
/~jcclark/classes/bio190/    Directories
urls.html                    File name
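The same breakdown can be reproduced in code. This is a minimal sketch using Python's standard urllib.parse and posixpath modules; the variable names are chosen here for illustration and simply mirror the four labels above.

```python
from urllib.parse import urlsplit
import posixpath

url = "http://www.cpp.edu/~jcclark/classes/bio190/urls.html"
parts = urlsplit(url)

protocol = parts.scheme   # the part before "://"
server = parts.netloc     # the DNS name of the server

# The path holds both the directories and the file name;
# posixpath.split cuts it at the last "/".
directories, file_name = posixpath.split(parts.path)

print("Protocol:   ", protocol)
print("Server:     ", server)
print("Directories:", directories + "/")
print("File name:  ", file_name)
```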

Protocol

This specifies what kind of Internet resource will be delivered, and thus the method (“protocol”) for delivering it.

http:// Hypertext Transfer Protocol is the method for delivering web pages. Most of the URLs you see will be this.
ftp:// File Transfer Protocol - although http can deliver any kind of file (not just a web page), this older protocol is a bit faster, and there are still many computers on the Internet set up as “FTP sites” with large numbers of files to download. You can learn more about FTP from the BIO 256 assignment.
gopher:// Gopher was the forerunner to the World Wide Web. It is organized hierarchically rather than with hyperlinks, and is virtually obsolete.
news:// (or nntp://) Network News Transfer Protocol (NNTP) is the method of delivering Usenet, or NetNews (for more information, see the BIO 256 assignment). Some web browsers (but not recent Netscape) require a separate program to handle Usenet.
telnet:// Telnet is a protocol for connecting to a remote computer as a terminal. If you’ve used the bibliographic terminals in the library, that’s what telnet is like. Some library catalogs are accessible only by telnet. Telnet requires a separate program, but a telnet program is standard in Windows 98 and Windows NT.



Server

The server is the computer that fulfills the request (the computer you are using to make the request is called the client). On the Internet, servers can have two different kinds of addresses. One, called an IP (Internet Protocol) address, consists of four one-byte numbers (each from 0 to 255), often written with periods like this: 134.71.184.20. IP addresses are the actual addresses of the machines, but because they are hard to remember, there are also DNS (Domain Name System) addresses. DNS addresses are translated into IP addresses by a computer called a domain name server.
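The structure of the dotted address can be checked with Python's standard ipaddress module. The sketch below makes no network connection; it only shows that the example address packs into four bytes, one per number.

```python
import ipaddress

address = ipaddress.IPv4Address("134.71.184.20")

raw_bytes = address.packed    # the address as raw bytes
numbers = list(raw_bytes)     # each byte as a number from 0 to 255

print(len(raw_bytes))         # how many bytes the address occupies
print(numbers)                # the same numbers that appear between the periods
```

Translating a DNS name into an IP address requires asking a name server over the network, so that step is left out of this example.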

Every DNS address consists of parts separated by periods. The rightmost part of a name is the “top-level domain” and the parts to the left of that specify smaller and smaller chunks of the Internet. If we look at www.cpp.edu, the “edu” (short for “educational institution”) tells us that we are talking about a university in the United States. “cpp” tells us that the university is Cal Poly Pomona. “www” tells us that we are using the specific computer that handles World Wide Web traffic.
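Reading a DNS name from right to left can be sketched in a few lines of Python. The function name split_hostname is hypothetical, chosen for this example only.

```python
def split_hostname(hostname):
    """Return the parts of a DNS name, most general first.

    The first item is the top-level domain; each later item names
    a smaller chunk of the Internet.
    """
    labels = hostname.lower().split(".")
    return list(reversed(labels))


labels = split_hostname("www.cpp.edu")
top_level_domain = labels[0]

print(labels)
print("Top-level domain:", top_level_domain)
```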

The most important thing to a web user about a DNS name is that you can’t reliably change it and still connect to the webserver. You can learn a lot about a server, though, through its name, especially the top-level domain. Here are some domain names and what they mean.

.edu Colleges and universities in the United States. Community colleges and K-12 schools are in the .us domain (see below).
.com Originally businesses, primarily in the U.S. or with U.S. offices; now, anyone can be a .com (“commercial”)
.org Originally used by (but not restricted to) non-profit organizations, primarily in the U.S.; now, anyone can be a .org
.gov Federal and state governments, some local governments (most local governments seem to be in the .us domain).
.net Originally, organizations that were major players in the hardware of the Internet; now, anyone can be a .net
.mil U.S. military
These domains were part of the original Internet when it was entirely in the United States. As other countries joined, country domain names were added.
.ca.us The .us means United States, the .ca California. K-12 public schools and local governments are here, rather than .edu or .gov.
.ca This .ca, as a top-level domain, is Canada, not California
.mx Our other NAFTA neighbor, México.
.ac.uk The .uk says United Kingdom (Britain), the .ac tells us it is a British university (“academic”). A commercial site in the U.K. would be “.co.uk”.
.tv This is Tuvalu, an island nation in the South Pacific that sells its domain to people who want to convey “television”, another meaning of the abbreviation. Other countries making use of this angle are Moldova (.md), Niue (.nu), Tonga (.to), and Western Samoa (.ws).



Directories

The World Wide Web began on computers running the Unix operating system. Unix organizes files in directories, much like the directories of MS-DOS and Windows or the folders of Macintosh. The way directories are specified in Unix was the basis for DOS specification (except that DOS switched from the forward slash “/” to the backslash “\”). “/~jcclark/classes/bio190/” tells us (or the web server) to look in the user directory of jcclark (the tilde “~” is a Unix convention meaning your main directory when you log in) for a directory called “classes”, and then to look in “classes” for a directory called “bio190”.

There is one very important feature of Unix that is “missing” from DOS, Windows, and Macintosh that every Web user needs to be aware of: Unix file and directory names are Case-Sensitive (caPITaLizaTIon MaTTerS). The directory names “bio190” and “BIO190” would be the same in DOS, Windows, or Mac, but in Unix they are different names, so that if you tried to get to this page using BIO190 instead of bio190, it wouldn’t work. When you are recording a URL, respect capitalization completely. DNS names are not case-sensitive, so that csupomona.edu and CSUPomona.Edu are identical. Also, not all web servers run Unix. But there’s no point in taking chances: keep the capitalization. Also, if you get a URL that should work but doesn’t, try changing capitalization of the directory or file names.
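The capitalization rule can be expressed as a comparison: ignore case in the server name, but respect it in the directories and file name. The following is a simplified sketch; the helper same_resource is hypothetical, and it assumes a case-sensitive (Unix-style) server and ignores details such as port numbers.

```python
from urllib.parse import urlsplit


def same_resource(url_a, url_b):
    """Compare two URLs the way a case-sensitive server would see them."""
    a, b = urlsplit(url_a), urlsplit(url_b)
    return (
        a.scheme.lower() == b.scheme.lower()       # protocol: case is ignored
        and a.netloc.lower() == b.netloc.lower()   # DNS name: case is ignored
        and a.path == b.path                       # directories and file: case matters
    )


original = "http://www.cpp.edu/~jcclark/classes/bio190/urls.html"
shouted_server = "http://WWW.CPP.EDU/~jcclark/classes/bio190/urls.html"
shouted_directory = "http://www.cpp.edu/~jcclark/classes/BIO190/urls.html"

print(same_resource(original, shouted_server))     # only the server name differs in case
print(same_resource(original, shouted_directory))  # a directory name differs in case
```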

Another interesting feature of directory and file names most often comes up with files created with Macintosh or Windows 95. Either of these can contain a space, but URLs can’t contain spaces, so the spaces are replaced with the code %20.
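Python's standard urllib.parse module can perform this replacement in both directions. In the sketch below, the file name "lab report 1.html" is a made-up example, not a file on the course server.

```python
from urllib.parse import quote, unquote

file_name = "lab report 1.html"   # hypothetical file name containing spaces

encoded = quote(file_name)        # each space becomes %20
decoded = unquote(encoded)        # %20 is turned back into a space

print(encoded)
print(decoded)
```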



File name

Sometimes a URL ends in a directory name: http://www.cpp.edu/~jcclark/. If the directory contains an index file (the name differs with different servers: index.html, welcome.html, homepage.html), that file will be displayed. If not, a listing of the files in that directory will be displayed. To display any file that is not an index file, its name must be specified. Like directory names, file names in Unix are case-sensitive. Most web files end in .html or .htm. Files ending in .cgi, .pl, .asp, or .php, or residing in a directory called cgi-bin, are programs that create web pages or do other activities in response to user input (for example, filling out a form). The http protocol allows files other than web files to be transferred. Your browser will handle these in different ways, depending on how it’s set up. There are graphics files (.jpg, .gif), sound files (.mp3, .au, .aif, .mid, .ra), videos (.mov, .mpg), and much more.
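Whether a URL names a file or stops at a directory can be read from its last component. This is a minimal sketch with a hypothetical helper, file_part; it assumes that a directory URL is written with a trailing slash, as in the example above.

```python
import posixpath
from urllib.parse import urlsplit


def file_part(url):
    """Return (file_name, extension) for a URL.

    Both are empty strings when the path ends in "/", meaning the
    server will look for an index file or list the directory.
    """
    path = urlsplit(url).path
    file_name = posixpath.basename(path)            # "" when the path ends in "/"
    extension = posixpath.splitext(file_name)[1]    # includes the leading "."
    return file_name, extension


directory_url = "http://www.cpp.edu/~jcclark/"
page_url = "http://www.cpp.edu/~jcclark/classes/bio190/urls.html"

print(file_part(directory_url))
print(file_part(page_url))
```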


Citation: Clark, Curtis. 2001. BIO 190 - Understanding URLs. California State Polytechnic University, Pomona, http://www.cpp.edu/~jcclark/classes/bio190/urls.html.

 

These are official class materials of BIO 190 as taught at California State Polytechnic University, Pomona, by Curtis Clark. They are subject to change without notice to anyone but students currently enrolled in the class.

Summer Quarter, 2001
© 2001 by Curtis Clark
jcclark@csupomona.edu