Building a World Wide Web Site


Gary Kessler
September 1996

An edited version of this paper appeared with the title
"Building a World Wide Web Site" in LAN Magazine, December 1996.



The Internet is possibly the most exciting development in networking in this decade, at least in terms of how a network service affects non-technical end users. And without doubt, the World Wide Web (WWW) is the most exciting development on the Internet in the last several years, and one of the two main applications that bring everyday people to the Net in ever-increasing numbers (the other application being e-mail).

The WWW introduced the ability for Internet documents to incorporate text, images, graphics, audio, video, and animation, as well as links to other documents at other sites. The impact on network bandwidth has been incredible; WWW packets account for more than 40% of all Internet traffic today. This use of bandwidth is felt not only by the network service providers but also by customers, whose access links and servers are directly affected.

One of the most important impacts of the Web is the fact that literally anyone can become an Internet publisher. How pervasive has the Web become? Just look at the growth in the number of hosts named www: 17,000 in July 1995, 76,000 in January 1996, and 212,000 by July 1996. And not all WWW hosts are named www! In addition, a growing number of Internet service providers (ISPs) offer Web page hosting as part of their basic service for customers.

This article will discuss the technical and conceptual steps in setting up a Web server on your own network (or within an intranet); namely, configuring the server software, getting connected to the network, determining performance bottlenecks, and security. In addition, some of the more global design aspects of setting up a site will be discussed, such as focusing the site, planning the layout, and maintenance. As it happens, these two processes must be done somewhat in parallel since decisions about the type of server hardware, software, and access method will be driven by your desired capabilities, and the things you are able to do at the site will be driven by the capabilities of the server and communications facility.

BACKGROUND TERMS AND CONCEPTS

There are a number of Web-related issues that are beyond the scope of this article but that are critical to understand if you are going to set up a site. To begin with, WWW documents are written in the Hypertext Markup Language (HTML), a simple set of text-formatting commands embedded within the document. HTML will not be discussed further in this article; a large number of books have been written on this subject and an excellent primer on HTML basics is available on the Internet from the National Center for Supercomputing Applications (NCSA) at http://www.ncsa.uiuc.edu/demoweb/html-primer.html.

To view a WWW document (called a page), users must have a local WWW client, called a browser. There are many WWW browsers available that operate on nearly all hardware and software platforms, and nearly all commercial TCP/IP software packages come with a browser. There are also proprietary browsers used by some of the commercial online services (such as AOL), but nearly all such browsers are yielding to "standard" browsers such as Netscape Navigator or Microsoft Internet Explorer. These two companies, in fact, have emerged as the dominant players today in the browser market.

Web pages are stored on servers. HTML documents, which are nothing more than ASCII text-based files, and other WWW files are accessed via an exchange of Hypertext Transfer Protocol (HTTP) messages between the server and the client (browser). A Web server, then, is actually nothing more than a TCP/IP host running an HTTP application over TCP. Note that a separate TCP virtual circuit must be established for every Web file that is downloaded. Thus, if your Web page has text and four embedded graphics files, five TCP connections must be established to download the five files. Note further that the multiple connections may be made simultaneously or sequentially depending on the software configuration, capability of the server, and current traffic load.
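The exchange described above can be sketched in a few lines of Python. This is only an illustration, not the code of any particular browser or server: the function names are hypothetical, the request is a bare HTTP/1.0 GET (the Host header is included as a common courtesy rather than a requirement), and it assumes one TCP connection per file, as stated above.

```python
import socket


def build_get_request(host: str, path: str) -> bytes:
    """Return the bytes of a minimal HTTP/1.0 GET request for one file."""
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "",  # blank line ends the request headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def connections_needed(embedded_files: int) -> int:
    """One TCP connection for the HTML page plus one per embedded file."""
    return 1 + embedded_files


def fetch_file(host: str, path: str, port: int = 80, timeout: float = 10.0) -> bytes:
    """Open one TCP connection, request one file, and read until the server closes."""
    with socket.create_connection((host, port), timeout=timeout) as conn:
        conn.sendall(build_get_request(host, path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # server closed the connection: the transfer is done
                break
            chunks.append(data)
    return b"".join(chunks)
```

For the page in the example above, connections_needed(4) gives five connections; a client would call something like fetch_file() once per file, either one after another or in parallel.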


TABLE 1. Sample URL Definitions.

PROTOCOL         URL FORMAT / EXAMPLE
--------------   ----------------------------------------------------
File reference   file://hostname/path/filename
                 file://altamont.hill.com/winword/papers/website.doc
FTP              ftp://username:password@hostname:port/path/filename
                 ftp://ftp.isi.edu/mbone/faq.txt
Gopher           gopher://host:port/gopher-path
                 gopher://hepnrc.hep.net:70/11/networks/bonding
HTTP             http://host:port/directory/filename?searchpart
                 http://www.tvnet.com/cgi-bin/imagemap/menu?267,241
Telnet           telnet://username:password@host:port/
                 telnet://quake@geophys.washington.edu:79/
E-mail           mailto:e-mail_address
                 mailto:gkessler@bbn.com


Finally, with the use of the File Transfer Protocol (FTP), Telnet, Gopher, HTTP, and other protocols over the Web, it became increasingly cumbersome to write down a host address, user name, password, protocol port number, directory path, file name, search string, and other relevant information in a clear fashion. Uniform Resource Locators, or URLs, were designed to provide such a shorthand, as shown in Table 1.
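To see how much information a URL packs into one string, the following Python sketch pulls a URL apart into the pieces listed above. It relies on the standard library's urlsplit(); the describe_url() helper and its dictionary keys are merely illustrative names.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Break a URL into the components that it abbreviates."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "username": parts.username,  # None when the URL carries no user name
        "password": parts.password,  # None when the URL carries no password
        "host": parts.hostname,
        "port": parts.port,          # None means "use the protocol's default port"
        "path": parts.path,
        "search": parts.query,
    }


if __name__ == "__main__":
    for url in (
        "gopher://hepnrc.hep.net:70/11/networks/bonding",
        "http://www.tvnet.com/cgi-bin/imagemap/menu?267,241",
        "telnet://quake@geophys.washington.edu:79/",
    ):
        print(describe_url(url))
```

Applied to the HTTP example, the helper reports the host www.tvnet.com, the path /cgi-bin/imagemap/menu, and the search part 267,241, with no explicit port.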

SELECTING THE WEB SERVER AND COMMUNICATIONS FACILITY

There are Web server products on the market for nearly every type of hardware and every operating system, providing users with significant choice in the selection of hardware and software. All of the products have slightly different capabilities, of course, and the main criteria are cost, adherence to Web standards, processing power, security, and additional features, such as the capability of the server to communicate with local databases. The largest body of server software is available for the various dialects of UNIX, with Windows NT not far behind; but WWW server software is also available for Windows 95, Windows 3.1, NetWare, OS/2 Warp, and MacOS.

Before selecting the server, there are a number of issues that the user must address. The most important one is to match the capabilities of the server hardware, server software, and communications facility to the expected use of the site. This means backing up even one more step and determining why you are building a server, who the target audience is, and what kinds of applications you will be running.

In general, you want the Web server to be a pretty powerful system to avoid common bottlenecks. First, you want the fastest processor that is reasonable; a small Web site may be able to get by with a 486 machine while a large Web site may need multiple Alpha workstations. Second, use software that matches the sophistication of your site. Shareware Web server software may suffice for your intended application but a commercial package may employ a better TCP/IP and HTTP implementation, not to mention better security. Third, load up on memory and disk space. This is particularly important because more memory will allow more simultaneous TCP connections which, in turn, will support more simultaneous users and/or faster downloads. Be sure that you use fast memory and disk controllers. Consider also the Web-based applications that you intend to provide; if your site has only text and simple graphics, you may not need as powerful a processor as if you are providing real-time audio and video.

The communications facility will, of course, play a big role as well. For general Internet users, there are three primary ways to gain access to the Internet: via a dial-up host account with an online provider such as AOL, CompuServe, or Netcom; a dial-up IP connection using SLIP or PPP; or a dedicated connection of some sort. If you want to host a Web server, the only effective option is for the server to be directly connected to the Internet 24 hours a day, 7 days a week. That means that either the Web server is physically a part of your own local network or the server is part of your provider's network; in either case, it is connected to the Internet on a full-time basis and is always accessible.

One technical aspect of this connection is the speed and responsiveness of the communications facility. In the late 1980s, the Internet backbone consisted of what were then considered to be high-speed links -- 56 kbps! The backbone has since been upgraded to T1 (1.5 Mbps), T3 (45 Mbps), and OC-3 (155.52 Mbps); and OC-12 (622 Mbps) is expected in early 1997 [AUTHOR'S NOTE: MCI's upgrade was completed in November 1996]. So, what's the right access speed to your premises?

The answer lies in several factors: what is available, what you can afford, and what your application requires. Just a few years ago, a "dedicated Internet connection" meant a leased line; since leased lines are billed on a mileage basis, this effectively limited users to a very local ISP. Today, frame relay, Switched Multimegabit Data Service (SMDS), and Asynchronous Transfer Mode (ATM) access allow users to be literally hundreds or thousands of miles from their ISP and still have "dedicated" access. Dedicated access at 56 kbps is about the lowest commonly used speed and can support Internet access by multiple users on a LAN. Faster speeds, of course, are what everyone wants; users today justify high-speed Internet access because it lessens response time rather than the traditional justification of high volume. A T1 local loop, however, may cost up to ten times that of a 56 kbps local loop.

But the application will also drive the solution. If you are running a site that contains simple text and some basic company information, 56 kbps may be quite adequate. If you are selling an online product, however, faster speeds will almost undoubtedly be warranted, especially if customers will be downloading large files or engaging in some real-time application with your server. In some instances, the communications facility operates at an adequate speed but having a single Web server is insufficient!
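A little arithmetic helps put the access speeds discussed above in perspective. The Python sketch below estimates the best-case time to move a file across a link; the 100,000-byte page size is an assumed figure chosen only for illustration, and the calculation ignores protocol overhead, server delays, and competing traffic, so real transfers will be slower.

```python
# Nominal line rates in bits per second, using the figures quoted in this article.
LINK_SPEEDS_BPS = {
    "56 kbps": 56_000,
    "T1": 1_500_000,
    "T3": 45_000_000,
}


def transfer_seconds(size_bytes: int, link_bps: float) -> float:
    """Best-case time to send size_bytes over a link running at link_bps."""
    return size_bytes * 8 / link_bps


if __name__ == "__main__":
    page_bytes = 100_000  # assumed size of a page plus its graphics
    for name, bps in LINK_SPEEDS_BPS.items():
        print(f"{name}: {transfer_seconds(page_bytes, bps):.2f} seconds")
```

Under these assumptions, a single user waits roughly 14 seconds for the page over a 56 kbps line and about half a second over a T1; with several simultaneous users sharing the line, the slower link becomes a bottleneck very quickly.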

Another related, but not-so-technical, issue is that of the server's domain name. Every host on the Internet has a numeric IP address; almost all also have a name. The domain portion of the name identifies the owner (or apparent owner) of the subnet where the Web server resides. End user organizations access the Internet through an ISP. For routing and addressing efficiencies, most ISPs are assigned a block of addresses. If a user has a dedicated connection to the Internet through their ISP, they will usually have the option of obtaining a domain name and will be assigned an address within their ISP's address block.

If your company operates a Web server, you probably want the server to have your domain name. In some industries, this is not just a matter of snob appeal anymore, but a very real one of credibility. My family's Internet access, for example, is through a local ISP called Together Networks, and my personal home page can be found at http://www.together.net/~kessler/gck.html. This is fine for me as a person since most people don't expect individuals to necessarily have their own domain name (yet!).

But how much credibility would a network provider, equipment manufacturer, or service organization have without their own domain? BBN's home page, for example, is at http://www.bbn.com, much more "professional" than if it were at an address such as http://www.websrus.vt.us/~bbn/home.html.

Your ISP should be able to help you register a domain name; if they are unable to help you, think twice before doing business with them. If they won't allow you to register a domain name because they won't do the routing or manage the domain name system (DNS) database, then you should probably find another ISP. As an aside, domain name registration in the com, edu, gov, net, and org domains costs $50 per year.

It is well beyond the scope of this article to discuss all of the issues related to finding an ISP. There are, however, between 1500 and 2000 ISPs in the U.S. The so-called "second tier" ISPs -- those directly connected to the Internet backbone network or the Network Access Points (NAPs) -- are national providers such as AT&T, BBN Planet, CompuServe, MCI, Netcom, and Sprint. The "third tier" ISPs connect to the backbone via a second tier provider.

This distinction is important. A second tier network has more control over its network than one of its customers (e.g., the third tier provider), but that may or may not translate into better service for you. One might expect a higher level of technical expertise at a large ISP than at a small one, but that assumption may not be valid; a local ISP may be more responsive to your needs and provide you with a higher level of service than a bigger provider, particularly if you are a small company.

Remember that if you have dedicated access to the Internet, your ISP provides a line to your router. Most of the technical work is on your network; your ISP is primarily a conduit. But do be sure that they will provide DNS support and/or a backup DNS to your local system, and a backup mail relay system. Access to NETNEWS may or may not also be an important criterion.

There are many places on the Internet to get lists of ISPs. Two places to start are the Internet Society (http://www.isoc.org/~bgreene/nsp-index.html) and Yahoo! (http://www.yahoo.com/Business_and_Economy/Companies/Internet_Services/).

CONFIGURING THE SERVER SOFTWARE

Most Web server software installs easily, but some rudimentary configuration is necessary before operation is possible. After acquiring the software, configuration files must be created (or edited); some server software gives you just a few options while some other software has many configuration files. Usually, default entries can be used for most of the parameters. Parameters that almost always need to be configured include:

As stated above, the number of configurable parameters varies widely. The NCSA HTTPd server, for example, has many configuration files with many configurable parameters (a portion of one such file is shown in Table 2), while NetManage's Personal Web Server for Windows 95 has only four configurable parameters.


TABLE 2. A Portion of the HTTPd Configuration Files (Linux).
# This is the main server configuration file. It is best to 
# leave the directives in this file in the order they are in, or
# things may not go the way you'd like. See URL http://hoohoo.ncsa.uiuc.edu/
# for instructions.

# ServerType is either inetd, or standalone.
ServerType standalone

Port 80

# User/Group: The name (or #number) of the user/group to run httpd as.
User nobody 
Group #-1

ServerAdmin webmaster@websrus.vt.us
ServerRoot /usr/local/httpd
ErrorLog logs/error_log
RefererLog logs/referer_log
ServerName www.websrus.vt.us
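Configuration files of this style are easy to inspect with a short script, which can be handy for checking that the key parameters are set before starting the server. The Python sketch below is only an illustration of the "directive value" layout shown in Table 2; it is not part of NCSA HTTPd, and the parse_directives() name is hypothetical.

```python
def parse_directives(text: str) -> dict:
    """Collect 'Directive value' pairs, skipping blank lines and # comments."""
    directives = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split(None, 1)  # directive name, then the rest of the line
        name = fields[0]
        value = fields[1].strip() if len(fields) > 1 else ""
        directives[name] = value
    return directives


if __name__ == "__main__":
    sample = """\
# ServerType is either inetd, or standalone.
ServerType standalone

Port 80
User nobody
Group #-1
ServerName www.websrus.vt.us
"""
    for name, value in parse_directives(sample).items():
        print(f"{name:12} {value}")
```

Note that only lines beginning with # are treated as comments, so a value such as the #-1 given for Group is preserved.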



==============   ------------   ---------------    -------------   ------------
| Why build  |   | Assemble |   |   Create    |    | Implement |   | Maintain |
|  a site?   |-->|  design  |-->| (& revisit) |-+->|   ideas!  |-->|  Update  |
|            |   |   team   |   |   content   | |  -------------   | Respond  |
|    What    |   ------------   --------------- ^                  ------------
|capabilities|                                  |
|   do you   |   -------------   ---------   ---+------
|    want?   |   |  Select   |   |  Get  |   |  Get   |
|            |-->|  server   |-->|  ISP  |-->| domain |
==============   | h/w & s/w |   | conn. |   |  name  |
                 -------------   ---------   ----------

FIGURE 1. Steps in building a Web site.



GOALS OF YOUR WEB SITE

The first step in designing a Web site is to identify one or more goals of the site. If a site is being used for marketing purposes (which will be true for any commercial organization), it is imperative that you articulate why you are building a site.

This is actually harder than it may appear. Every page at the site should have a purpose. Are you trying to sell a product, promote your organization or company, provide a public service, sway public opinion, or what? Answers to these questions will start to dictate the layout and framework of your site, as well as the content, which may range from the serious to the frivolous. Answers to these questions may change over time.

Who is the target audience? Answering this question will help you understand the likely sophistication and Internet access method of the expected user population. As much as possible, orient your site to the capabilities of the audience.

Determine the organizational resources that will be made available. This includes personnel (e.g., the Webmaster), communications facilities and network resources, and computer systems (e.g., the Web server).

Finally, providing services or accepting money over the network requires special attention. If you are using your site to sell products, then care must be taken to ensure that the server and the network can keep up with demand. Web-based commerce is particularly intriguing; just about anything that can be purchased can be purchased over the Web, ranging from clothes and books to vacations and tickets to sporting events. Web-based commerce may be very lucrative; revenues from Web-based transactions reached nearly $500 million in 1995 and other studies suggest that these revenues will reach $6.6 billion by 2000, $46 billion by 1998, and $1 trillion by 2001, indicating that no one really knows what the future holds!

It is important, however, that the value of your site not be judged solely on the direct revenue that it brings in. In many industries today, a Web site is as much a part of basic business communication as a telephone and fax machine. If you must cost-justify your presence on the Web, try to estimate the amount of revenue not realized due to your absence. Look at your competition. If they're on the Web, you'd better be there, too; if they're not, you have a tremendous opportunity.

SITE DESIGN PLAN

The second step in designing the site is to create a site design plan. In some cases, you may want to go to a third party that specializes in the creation and storage of Web pages. But if you do it within your own organization, create a project team that comprises someone from every corporate entity that has an interest in the server, including network operations, marketing, art designers, etc.

Identify who (or what group) will design the overall layout, as well as who will actually design the individual pages and supply content. Identify who has the technical skills to write HTML, Java scripts, PERL code, CGI forms, surveys, etc. Determine where the site will be physically hosted. Go to other Web sites on the Internet and learn from what you like and don't like. Determine how your site can extend -- or define -- your company's "marketing" image.

As you design the page, focus on content, organization, and usefulness. Don't overdo the graphics, audio, video, frames, CGI, Java, animation, and other high-tech jazz; while they may be cute the first one or two times someone visits the site, after that they're potentially annoying and slow down access. In some cases, some of these features are actually security risks. If these features are important to your site's content, then by all means use them; but if they're just there for flash, then consider carefully their necessity. Along these lines, consider carefully before using browser-specific capabilities; always ask "is this feature really necessary?"

Finally, be sure to identify how corporate policies relate to your site. Some policies may affect content; e.g., a "privacy policy" may prohibit use of Web pages containing any personal employee information. Other policies or procedures may affect external access, such as use of a corporate firewall.

BUILDING YOUR SITE

The third step in building your site is to ensure that it is usable and worth returning to. Provide sufficient information at the very top page so that users know who you are and what you do, and what they can find at the site. Make navigation as easy as possible; provide a site map and local search engine, if applicable.

You also should provide users with a reason to want to return to your site. You might want to sponsor a contest and give away something periodically. You may want to include a periodic newsletter, column, humor, advice, industry insights, etc. You might want to host a discussion list, BBS, or information archive on some topic to demonstrate your commitment to that issue.

It is important that you keep the site dynamic. Realize, however, that you may not be able to stop once you start; if you post a monthly newsletter and let it languish, it may raise questions as to what's up with your organization.

Finally, be mindful of security and build in the appropriate precautions. Not only is it safer for you, but customers appreciate it as well.

TEST AND MAINTAIN THE SITE

The fourth step in running a site is to continually test and maintain the server. Before information goes online, examine the content. Is it readable? Is it grammatically correct? Are there spelling errors? Does it look OK? Is it logical?

Also, be sure to test all links to ensure accuracy, and keep the links up-to-date; it is surprising how many sites have invalid links and pointers to their own server.
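Much of this link testing can be automated. As a starting point, the Python sketch below lists every link and embedded file referenced by a page, resolved against the page's own address, so that each one can then be requested and checked. It uses the standard library's HTML parser; the class and function names are hypothetical, and the step of actually fetching each URL is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather the targets of HREF and SRC attributes found in a page."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Turn relative references into full URLs.
                self.links.append(urljoin(self.base_url, value))


def collect_links(html: str, base_url: str) -> list:
    """Return every link and embedded-file URL in the page, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

Running collect_links() over each page on the server yields a list of URLs to test; any that cannot be retrieved, including pointers back to your own server, are candidates for repair.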

In addition, keep the content current and accurate. Although it is chic to include a date on your Web pages, only do so if you change your pages often and/or the date is relevant.

Provide an e-mail link to the webmaster at your site and solicit input from your viewers. And listen carefully to the suggestions.

Finally, continually monitor your hardware, software, and communications facilities to ensure that they are meeting your needs. Responsive, fast, available servers are a must to successfully provide a Web-based product or service.

ADVERTISE YOURSELF

The fifth step is to get advertised. Publicize your site by registering with the popular search tools, such as Yahoo! (http://www.yahoo.com), Lycos (http://www.lycos.com), and WebCrawler (http://www.webcrawler.com). Note also that while most such registration databases limit you to two or three categories, you can register every page if need be since each page is a different URL!

Intelligent agents such as AltaVista (http://altavista.digital.com) search all "visible" Web pages and do a text search. To ensure that these search engines find your pages under the categories that you choose, include keywords in your HTML files using a META tag of the form <META NAME="keywords" CONTENT="comma-separated list of keywords">.
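Before registering a page, it is worth confirming that the keywords are really there and read the way you intended. The Python sketch below extracts the contents of a tag such as <META NAME="keywords" CONTENT="..."> from a page, roughly as an indexing program might; it is an illustration only, the names are hypothetical, and no claim is made about how any particular search engine processes the tag.

```python
from html.parser import HTMLParser


class KeywordReader(HTMLParser):
    """Collect the keywords listed in a page's META keywords tag."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "keywords":
            return
        content = attributes.get("content") or ""
        # Keywords are separated by commas; drop surrounding spaces and empties.
        self.keywords.extend(k.strip() for k in content.split(",") if k.strip())


def read_keywords(html: str) -> list:
    """Return the keywords declared in the page, in the order listed."""
    reader = KeywordReader()
    reader.feed(html)
    return reader.keywords
```

A page with no such tag simply yields an empty list, which is a quick way to spot pages that were overlooked.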

There are other ways to advertise your site. If individuals are members of Internet discussion lists and/or USENET newsgroups, have them add your site's URL to their signature line. Put the Web site address on all business cards, literature, and other materials.

Finally, consider carefully before advertising on the Web. Some organizations report tremendous success with paid advertising while others have found it to be a waste of money, making this no different from print ads.

CONCLUSION

This article has discussed both the technical and non-technical aspects of putting a Web server together. Both aspects are important and must go hand-in-hand.

Installing, designing, and running a Web site is actually a fun process. What makes it hard is not the technology but the nature of bringing together many parts of an organization in the design and implementation; but that also adds to the fun. Set reasonable expectations and goals for the site and be familiar with the technology so that you know what the Web site can and cannot do.

And finally, plan for success; almost everyone underestimates the capabilities and potential of the Web.

ABOUT THE AUTHOR: At the time this article was published, Gary Kessler was a Senior Engineer at BBN Systems and Technologies (Cambridge, MA), working out of his home in Colchester, VT. He is the co-author of ISDN, 4th ed. (McGraw-Hill). His e-mail address is kumquat@sover.net.