Artronic Development (ARDE)

Web Site Secrets

David W. Eaton
602-953-0336
dwe@arde.com
Artronic Development
4848 E. Cactus Rd. - Suite 505-224
Scottsdale, Arizona 85254
http://www.arde.com
April 1997

Introduction

My InterWorks presentation last year, "Crafting and Maintaining a Useful Web Site," provided some general guidance to assist with planning and implementing a Web site. This year, in addition to updating some of those suggestions, I will take a closer look at some of the specific techniques used on the award-winning InterWorks Web site (now retired) that you might want to consider for your own. Specifically, I will discuss:

  1. How the Web is changing
  2. Tricks to try on your site
  3. Automation and interactive features
  4. Converting content for Web viewing
  5. What is next?

How the Web is changing

Not surprisingly, the Web has changed a lot since last year. Below I will highlight just a few of the developments.

HTML Specification

The HTML 3.2 Reference Specification was endorsed as a W3C Recommendation by the World Wide Web Consortium in January 1997. This is intended to be a replacement for HTML 2.0 (RFC 1866) and offers a number of new features.

InterWorks Web site upgrade

Hopefully by now you have visited the InterWorks Web site and seen the extensive face-lift it got last fall: new (but still lean) graphics, more content, better navigational techniques, and more ties to the Interex Web site. This was a collaborative effort by a number of staff members and volunteers, working in part from suggestions submitted by readers like you. All of us who were involved hope you have found the results to your liking. Some of the items in this presentation are the result of knowledge gained during that site upgrade.

Browsers then and now

In April 1996 I reported that of the browsers accessing the InterWorks Web site, Netscape was used 66% of the time (and was increasing) followed by NCSA Mosaic at 13% and Lynx at just under 3% (both were decreasing). All others were well behind these.

So with all the press coverage about "browser wars" and who's "winning", I took another look at the InterWorks access logs over the six months from August 1996 through the end of February 1997. This time, Netscape's share had dropped to an average of 63% of the accesses (and was holding steady), followed by MSIE at 7.5% (and growing), then NCSA Mosaic at 1.6% and Lynx at just over 1.3% (both still decreasing). All others were below a full percent each, though use of NetManage Chameleon recently began growing, averaging 2.4% in the last two months of this period. The remaining 26% of our accesses were scattered across more than 50 other browsers and robots. Access logs from other sites I checked were similar, though some sites with a heavier share of PC accesses than InterWorks gets were starting to see MSIE use as high as 20%.

Here are some updates on issues I discussed last year (as well as some new ones):

Continue to watch your server logs to discover what browsers are used to access your site, then plan accordingly. Although Mosaic has entered a hibernation stage, there are other projects ramping up to provide non-commercial Web browsers. One recent such effort has been named Mnemonic. Some other freely available browsers you might want to look into include Grail (written in Python), SurfIt (Tcl/Tk), E-scape (the "official" GNU browser project), Arena (now being done by Yggdrasil), Amaya (from INRIA in France), and Chimera. Perhaps next year there will be more to report.

Drawing attention to your pages

As novices and the advertising community have increased their participation on the Web, there has been an increase in attempts to draw attention to pages with animated GIFs and with the Netscape extension tag <BLINK> (which some at Netscape itself admit they regret creating). So-called "JavaScript" is beginning to follow closely behind. (It was known as "LiveScript" until "Java" became a buzzword; despite the name, it has little relationship to the Java language and none to Java byte code.) Only a few applications of these features seem appropriate as yet. Most simply say "look at me, I'm new to the Web" or "I don't care what you are really trying to read, look at this now".

While I don't fault Web advertising as a means of generating revenue that keeps other content freely available, I suggest that you employ less intrusive methods in the pages on your site. This will help keep readers from turning off animation or graphics altogether and thus missing the entire message you are trying to convey. It may also keep them from avoiding your site in the future.

Remember, by the time someone is looking at your Web page, they have some reason to be there. Perhaps they followed a link from a search engine or from some other related page. Perhaps an associate suggested that they visit your page. In any event, it is different from the typical print-media need to draw attention to a page as someone flips through a magazine or newspaper.

Appropriate registration of your page with search engines and related lists will do more to ensure people will find and visit your page than blinking graphics will. Appropriate content that is kept updated and is easy to navigate will ensure they keep returning to your site.

Tricks to try on your site

These are some of the tricks used on the InterWorks Web site to improve its content integrity, availability, and ease of navigation. You may want to use some similar techniques on your site.

Highlighting new or updated information

Many sites have adopted a means of drawing attention to information which is new or recently changed. On the InterWorks site, I have been using the notations

  <IMG SRC="/Images/new_small.gif" WIDTH=26 HEIGHT=15 ALT="[New]"><!-- ddmmyy -->
  and
  <IMG SRC="/Images/updated.gif" WIDTH=43 HEIGHT=15 ALT="[Updated]"><!-- ddmmyy -->
respectively.

However, leaving pages flagged in such a manner months after the change is posted may mean your readers will stop relying on these aids and miss something you want them to see. Be certain to review pages looking for such notations on a regular basis. To make it even easier to locate those that are obsolete, consider adding a comment to indicate when the notation was posted or when it should be taken down. That way, a relatively simple search can show what pages need attention.
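Because each flag carries a <!-- ddmmyy --> comment, part of that periodic review can be automated. The following is a minimal present-day Python sketch, assuming the flags appear exactly in the form shown above (the ALT text immediately followed by the comment) and that anything older than a chosen age (60 days here, an arbitrary assumption) deserves a look. The function names are hypothetical.

```python
import re
from datetime import date, datetime
from pathlib import Path

# Matches ALT="[New]"> or ALT="[Updated]"> followed by a <!-- ddmmyy --> comment,
# as in the notation above. Pages written differently would need another pattern.
FLAG_RE = re.compile(r'ALT="\[(New|Updated)\]"><!--\s*(\d{6})\s*-->', re.IGNORECASE)


def stale_flags(html, today, max_age_days=60):
    """Return (kind, posted_date) for every flag older than max_age_days."""
    stale = []
    for kind, stamp in FLAG_RE.findall(html):
        # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
        posted = datetime.strptime(stamp, "%d%m%y").date()
        if (today - posted).days > max_age_days:
            stale.append((kind, posted))
    return stale


def scan_site(root, today, max_age_days=60):
    """Print each page under root that still carries an old flag."""
    for path in sorted(Path(root).rglob("*.html")):
        text = path.read_text(errors="replace")
        for kind, posted in stale_flags(text, today, max_age_days):
            print(f"{path}: [{kind}] posted {posted.isoformat()}")


if __name__ == "__main__":
    page = '<IMG SRC="/Images/new_small.gif" WIDTH=26 HEIGHT=15 ALT="[New]"><!-- 150197 -->'
    print(stale_flags(page, date(1997, 4, 1)))
```

Pointing scan_site() at the document root once a week turns "review pages on a regular basis" into reading a short list.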

The same technique can be used for other date-related information, such as tracking the vendor postings on the InterWorks site. There, it lets InterWorks tell when a vendor's advertising period has passed, so the vendor can either be contacted about renewing or have its page removed.

While I am on the subject of dates, it is still a good idea to keep an indication of the date a page was last changed on the page itself. Not only does this serve to let the person reading your page know how current the information is, it also becomes useful to flag outdated information which is cached by some of the search facilities. In some cases, once you detect the old information, you can request their robot to re-visit your site and capture new information.
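If pages are generated or touched up by script, the "last changed" date can be filled in automatically. A minimal Python sketch, assuming the file's modification time is an acceptable proxy for the last real content change (it is not if files are merely copied or touched); it writes the date in the same style as the "(Updated 20 OCT 99)" footer on this page. The function names are hypothetical.

```python
import os
from datetime import date

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def updated_stamp(day):
    """Format a date like the footer of this page, e.g. (Updated 20 OCT 99)."""
    # A fixed month table avoids depending on the locale, as strftime("%b") would.
    return f"(Updated {day.day:02d} {MONTHS[day.month - 1]} {day.year % 100:02d})"


def stamp_for_file(path):
    """Build the stamp from the file's last-modification time."""
    return updated_stamp(date.fromtimestamp(os.path.getmtime(path)))
```

The two-digit year simply mirrors the existing footer; a four-digit year would avoid the sort of "year 2000" work mentioned elsewhere in this paper.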

Controlling what search sites report

Well, in many cases you can't really "control" it, but there are some things you can do to try to help. For starters, there is the "Standard for Robot Exclusion" page at http://info.webcrawler.com/mak/projects/robots/norobots.html. This describes how the "robots.txt" file can be used to tell search robots not to visit certain files at all (assuming the robot adheres to the conventions).
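For illustration, here is what a small "robots.txt" might contain and how a robot that follows the convention would interpret it. This sketch uses the urllib.robotparser module from today's Python standard library (which postdates this paper); the host name and directory names are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt asking every robot to stay out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /Committee/
Disallow: /Drafts/
"""

robots = RobotFileParser()
robots.parse(ROBOTS_TXT.splitlines())

for url in ("http://www.example.com/index.html",
            "http://www.example.com/Committee/abstracts.html"):
    # can_fetch() answers: may this user agent retrieve this URL?
    print(url, robots.can_fetch("ExampleBot", url))
```

Keep in mind that the file is only advisory: a robot that ignores the convention can still fetch these pages, and anyone who reads the file learns which directories you consider sensitive.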

Some robots now honor certain "META" tags. These can allow you to specify a summary or a set of keywords to be used by the search engine for that page. In addition, the META tag may be used to indicate that a page shouldn't be indexed or that links on the page shouldn't be followed by the robot. This can be particularly useful if you don't have access to the site's "robots.txt" file discussed above.
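The page-level equivalent is a tag such as <META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW"> placed in the page's HEAD. To show how a cooperating robot might read it, here is a small Python sketch using the standard html.parser module; the class and function names are hypothetical, and real robots differ in which directives they honor.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <META NAME="ROBOTS" CONTENT="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attrs = dict(attrs)
        if (attrs.get("name") or "").lower() != "robots":
            return
        for token in (attrs.get("content") or "").split(","):
            if token.strip():
                self.directives.add(token.strip().lower())


def robots_directives(html):
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


page = """<HTML><HEAD>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
<META NAME="description" CONTENT="Abstracts under committee review">
</HEAD><BODY>...</BODY></HTML>"""

found = robots_directives(page)
print("may index page:", "noindex" not in found)
print("may follow links:", "nofollow" not in found)
```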

Finally, you can place pages behind a password scheme. Since the robot won't have the password, it will not be able to visit those pages.

On the InterWorks site, I make use of a combination of these techniques for various circumstances.

Pages for internal or limited use only

One of the buzz words of the past year has been the "Intranet". There is now talk of the "Extranet", which most of us saw as the Internet in the first place. What is really behind it all is who should have access to what information. Certainly, if you plan to place company proprietary information on a Web server to make it available to employees, that Web server should be behind a firewall so the general public cannot gain access.

However, if you have information targeted at a particular group of readers that you would prefer not be seen by others on the same side of the firewall, what can you do? On the InterWorks server, this is addressed by using the user ID and password protection that Web servers provide.

Special sub-directories are set up with unique access passwords on each. The IDs and passwords are shared with the appropriate committee members so they may access the pages required. A single ID can be a member of multiple groups (a concept you have seen before) so that they only need to remember one access mechanism to get to all the pages they need. Of course, you can create a single generic ID to pass to all members of the same group or committee if you prefer.
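The server does the actual password checking, but the group idea is easy to picture. Below is a Python sketch of the membership check only, assuming a group file in the "group: user user ..." layout, similar to the group files read by NCSA httpd and Apache, plus a table of which group guards which directory. The user, group, and directory names are all hypothetical.

```python
GROUP_FILE = """\
# group: members
program: alice bob
vendors: bob carol
"""

# Which group guards which protected sub-directory (hypothetical paths).
REQUIRED_GROUP = {
    "/Committee/Abstracts/": "program",
    "/Vendors/Private/": "vendors",
}


def parse_groups(text):
    """Parse 'group: user user ...' lines into {group: set_of_users}."""
    groups = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, members = line.partition(":")
        groups[name.strip()] = set(members.split())
    return groups


def may_enter(user, directory, groups):
    """True if directory is unprotected or user belongs to its group."""
    group = REQUIRED_GROUP.get(directory)
    if group is None:
        return True  # not a protected directory
    return user in groups.get(group, set())


groups = parse_groups(GROUP_FILE)
print(may_enter("bob", "/Vendors/Private/", groups))
print(may_enter("alice", "/Vendors/Private/", groups))
```

Here "bob" belongs to both groups, so one ID and password opens both directories, which is the "one access mechanism" benefit described above.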

This scheme was used to provide all members of the InterWorks 97 conference committee access to the submitted abstracts. In this way, they could review all the abstracts and select which ones were most appropriate for each track without the submissions being made available to the world at large. Automation (described elsewhere in this presentation) provided even more capability to the committee. On a broader scale, this technique was used to protect a complete copy of the InterWorks Web site (held on a different server) during the major overhaul that was done last fall. Those participating in the review could watch the progress and make comments at any time, while the site remained hidden from general access and from robots. Changes are still made first on this alternate site so they can be tested before being moved to the public server.

Checking your HTML syntax

If you don't have a method for checking that your pages are created using proper HTML syntax, visit "weblint: Quality Assurance for Web Pages" at http://www.cre.canon.co.uk/~neilb/weblint/. Version 1.014 of this tool was discussed last year, but at this writing it was up to 1.017 -- and the enhancements make it much nicer and more comprehensive. Its Web site gives the primary ftp sites, but it should be on all the CPAN mirrors as well.
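weblint itself is a Perl program with many checks. Purely to illustrate one of the simplest, the following Python sketch (not weblint, and far less thorough) reports paired tags that are closed out of order or never closed. The set of paired tags is a short, assumed subset.

```python
from html.parser import HTMLParser

# Tags this sketch insists on closing explicitly (an assumed, partial list).
PAIRED = {"a", "b", "i", "em", "strong", "h1", "h2", "h3", "h4", "h5", "h6",
          "ul", "ol", "table", "form"}


class PairChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []      # (tag, line) of currently open paired tags
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in PAIRED:
            self.stack.append((tag, self.getpos()[0]))

    def handle_endtag(self, tag):
        if tag not in PAIRED:
            return
        if self.stack and self.stack[-1][0] == tag:
            self.stack.pop()
        else:
            self.problems.append(f"line {self.getpos()[0]}: unexpected </{tag}>")

    def close(self):
        super().close()
        # Anything still open at the end of the page was never closed.
        for tag, line in self.stack:
            self.problems.append(f"line {line}: <{tag}> never closed")


def check(html):
    checker = PairChecker()
    checker.feed(html)
    checker.close()
    return checker.problems
```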

Formatted Web pages for e-mail or print

Although most browsers provide a means for saving or printing the displayed Web page as a formatted text file, I continue to get requests from people asking "please forward a copy of page XXX so I can send it to an e-mail list." A quick way to do this (rather than wait for a full-function browser to start up) is to use the -dump switch in Lynx and redirect the output to a file. The result is a flat text rendering of the page which can be included in or attached to your e-mail message and read by the recipient.

If you need the complete source version of a remote page, use the -source switch in Lynx instead.

Think maintenance

A primary advantage of the Web over print media is that it can be changed easily. If a printed article contains an error, magazines often print a correction in a future issue and hope that readers see both copies. On the Web, however, the original post can (and should) be changed to reflect the correct content. Significant changes may be highlighted on the site's "What's New" pages to draw attention to them. Unfortunately, I have seen sites that post "errata pages" and link to them, but never correct the original page!

Perhaps even more important is the concept of gradually building Web pages as new content becomes available. Information may be provided as soon as it is known, then updated or changed as time moves on and plans become firm. Anticipating the changes that will come can make this aspect of Web site maintenance a bit easier.

For example, each year a section of the InterWorks site is dedicated to conference information. When the conference is first announced, not all the details are available, but it is known that there will be certain features. At the first posting, the dates and location are known and there is a pretty good idea of what audience the conference will target, so this information can be posted early and people can begin to make their plans. Later, information about keynote addresses, special features, and costs can be added, followed by specific abstracts that have been selected early. One of the last items posted might be the detailed schedule. However, as many of you noticed this year, an automated schedule generator allowed InterWorks to post this schedule, then let the conference committee and staff reliably and easily update and re-post it themselves so you could have access to the latest information.

Using a standard conference area template from year to year, it becomes easier for readers to home in on the information they need and easier for InterWorks to maintain and update the pages reliably. Consistent page naming makes it easier for readers to predict where related information from prior years may be found and move directly to the pages they desire.

Another section which requires periodic maintenance is the InterWorks vendor directory and its related pages. Vendors who have booths at the conferences are highlighted with unique "flags" which contain the year. The graphic for vendors displaying at a future conference is green, while the one for the current year is yellow and those for past years are cyan. All graphic flags are small and have the same dimensions. This allows the flags to exist on the same line as the text without forcing extra vertical space for most browsers and allows the graphics to be swapped from year to year to change the color without page modifications. (All that is required is a name change of the graphic file and the appropriate color will be displayed everywhere.) All the indicators have been built through the year 1999 (leaving InterWorks with its own "year 2000" work to do for the Web when that time comes.)
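The yearly swap can itself be scripted. A minimal Python sketch, assuming each year's flag exists in three colored variants with hypothetical names such as vend97_green.gif, vend97_yellow.gif, and vend97_cyan.gif, and that the pages always reference the plain name vend97.gif:

```python
import shutil
from pathlib import Path


def flag_color(flag_year, current_year):
    """Future conferences are green, the current one yellow, past ones cyan."""
    if flag_year > current_year:
        return "green"
    if flag_year == current_year:
        return "yellow"
    return "cyan"


def refresh_flags(image_dir, years, current_year):
    """Copy the right colored variant onto the name the pages reference."""
    image_dir = Path(image_dir)
    for year in years:
        yy = f"{year % 100:02d}"
        variant = image_dir / f"vend{yy}_{flag_color(year, current_year)}.gif"
        # Copy rather than rename, so every variant is still there next year.
        shutil.copyfile(variant, image_dir / f"vend{yy}.gif")
```

Run once a year, for example as refresh_flags("/www/Images", range(1995, 2000), 1997) with a hypothetical image directory; because only the image files change, no page needs editing.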

Automation and interactive features

This section of the presentation will discuss some of the automation used on the InterWorks Web site. Perhaps it will give you some ideas for variations that would improve your site.

Forms and feedback

Many Web sites have a variety of forms through which readers may provide feedback or other information. The InterWorks site began with the usual simple forms that let readers send questions and comments to the webmaster if they had problems reading a particular page. These were soon followed by the on-line membership form, our "Thank You" form, and a few others. As more complex questions began to arrive, InterWorks added an advocacy form. Its pull-down menu fields made it easier to identify who could best resolve each question and to route it to that person more quickly. More recently, a special form was added to allow membership records to be updated easily, without the need to submit a complete new membership form.

All of these forms resulted in a simple e-mail message to the appropriate staff or volunteers. These people would then deal with the e-mail as needed. It is simple, but effective.
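A form handler of this kind can stay very small. The sketch below, in Python with the standard email package, shows the routing idea behind the advocacy form: the value chosen in a pull-down field picks the recipient. The field name "topic", the topic values, and the example.org addresses are placeholders, not the actual InterWorks configuration.

```python
from email.message import EmailMessage

# Pull-down value -> person best able to answer (placeholder addresses).
ROUTES = {
    "membership": "membership@example.org",
    "conference": "conference@example.org",
    "advocacy": "advocacy@example.org",
}
FALLBACK = "webmaster@example.org"


def build_message(form):
    """Turn submitted form fields (a dict of strings) into a routed e-mail."""
    topic = form.get("topic", "")
    if topic not in ROUTES:
        topic = "general"  # never copy unexpected input into the headers
    msg = EmailMessage()
    msg["From"] = "www@example.org"
    msg["To"] = ROUTES.get(topic, FALLBACK)
    msg["Subject"] = f"Web form: {topic}"
    msg.set_content("\n".join(f"{k}: {v}" for k, v in sorted(form.items())))
    return msg
```

Handing the result to a local mail server would then be one call, smtplib.SMTP("localhost").send_message(msg), assuming such a server is available.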

Saving data and generating pages

Faxing copies of paper abstract submissions and forwarding multiple e-mailed abstract submissions to the conference committee members for their review worked in 1996, but it was clear there had to be a better way.

This year, the submissions were still automatically e-mailed to a few conference workers, but the master was saved on-line. A Web interface was created that allowed the conference committee members to create a summary of all abstracts submitted up to the current time. This summary listed such items as the abstract title, author, time required, track, target audience, and level of content. Selecting the title allowed them to view the entire abstract and make changes as necessary. In this way, they could mark abstracts as "approved", change track assignments, and even schedule the day, time, and room where the presentation would be held. These changes could then be viewed by other committee members on-line immediately.
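Here is one way such a summary page could be produced. The sketch assumes, for illustration only, that each saved submission is one JSON record per line holding the fields listed above plus an id and a status; the "edit" link target is a hypothetical script name.

```python
import html
import json
from urllib.parse import quote

COLUMNS = ["title", "author", "minutes", "track", "audience", "level", "status"]


def load_abstracts(lines):
    """Read one JSON record per non-blank line (an assumed storage format)."""
    return [json.loads(line) for line in lines if line.strip()]


def summary_table(abstracts):
    """Build an HTML summary sorted by track, then title."""
    rows = ["<TABLE BORDER=1>",
            "<TR>" + "".join(f"<TH>{c.title()}</TH>" for c in COLUMNS) + "</TR>"]
    for rec in sorted(abstracts, key=lambda r: (r["track"], r["title"])):
        cells = [html.escape(str(rec.get(c, ""))) for c in COLUMNS]
        # Selecting the title opens the full abstract for viewing and editing.
        cells[0] = f'<A HREF="edit?id={quote(str(rec["id"]))}">{cells[0]}</A>'
        rows.append("<TR>" + "".join(f"<TD>{c}</TD>" for c in cells) + "</TR>")
    rows.append("</TABLE>")
    return "\n".join(rows)
```

Because the page is rebuilt from the stored records on each request, a change saved by one committee member shows up for the others the next time they ask for the summary.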

Further extensions to this interface allowed automatic creation and saving of the detailed schedule and track breakout that have been available to you on-line. Thus, any authorized committee member could create an updated schedule and post it for you as it changed, without the webmaster being a bottleneck in the process as had happened in past years. (Hopefully, you found this of value.)
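The schedule step can reuse the same stored records. A minimal sketch, assuming each approved record also carries day, time, and room fields written as ISO dates and 24-hour HH:MM times (so that plain string sorting is chronological); the field names are illustrative.

```python
from collections import defaultdict


def build_schedule(abstracts):
    """Group approved abstracts by (day, time) and list each room's session."""
    slots = defaultdict(list)
    for rec in abstracts:
        if rec.get("status") == "approved" and rec.get("day") and rec.get("time"):
            slots[(rec["day"], rec["time"])].append((rec.get("room", "TBA"), rec["title"]))
    lines = []
    for day, time in sorted(slots):
        lines.append(f"{day} {time}")
        for room, title in sorted(slots[(day, time)]):
            lines.append(f"  {room}: {title}")
    return "\n".join(lines)
```

Saving the returned text to the public schedule page whenever a committee member requests it is what takes the webmaster out of the loop.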

Similar data-storage techniques are being employed in other areas of the InterWorks Web site as well. For example, the replies submitted to the Worldwide Advocacy survey were stored until they were needed for analysis.

Think generality

Planning for maintenance is accepted practice for software. Thus, programs and scripts used for Web page automation (such as generated page content) and interaction (such as forms) should be, and usually are, created with the expectation that corrections and changes will be required.

What is sometimes overlooked is the possibility of reusing the script for a similar or related function -- preferably with no changes. Try to anticipate what is likely to change and place that information into a configuration file which is read by your script. In this way, the processing logic can remain unchanged and yet be used to process input from several forms. Data arriving from "hidden" fields on the form can indicate which configuration file needs to be read.
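A sketch of the idea in Python, using the standard configparser module: the form carries a hidden field such as <INPUT TYPE="hidden" NAME="config" VALUE="survey">, and that value selects which configuration the unchanged processing logic uses. The field name, the configuration keys, and the address are hypothetical, and the configuration text is inline here only to keep the example self-contained (a real script would read it from a file with ConfigParser.read()).

```python
import configparser

# One hypothetical configuration; another form would get its own text or file.
SURVEY_CONF = """\
[form]
title = Worldwide Advocacy Survey
mail_to = survey@example.org
required = name, email, country
"""


def load_config(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["form"]


# Only configurations listed here can be selected, so a tampered hidden
# field cannot point the script at an arbitrary file.
CONFIGS = {"survey": load_config(SURVEY_CONF)}


def process(fields):
    """Return (recipient, missing_required_fields) for one submission."""
    conf = CONFIGS[fields["config"]]  # KeyError for unknown configurations
    required = [name.strip() for name in conf["required"].split(",")]
    missing = [name for name in required if not fields.get(name, "").strip()]
    return conf["mail_to"], missing
```

Supporting another form then means adding one more configuration, not editing process().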

Though much improved for 1997, the same script with a different configuration file would still process the 1996 version of the InterWorks abstract submission form. (As it happens, this means this year's new features would be available for the 1996 conference processing, but we won't be back there again to need it.) The same configuration file technique was used on the Worldwide Advocacy Survey form posted on both the Interex and InterWorks Web sites earlier this year. The benefits will be seen as these scripts are reused for future conference submissions and surveys.

Image maps

Some of the changes to image map implementations were discussed above. This can be a very effective way to help readers of your site navigate to the information they need.

Most pages on the InterWorks site contain two image maps:

  1. the header logo may be used to return to the welcome page, move to the Interex site, get some general information about InterWorks, or view the site's Table of Contents.
  2. the footer graphic allows you to move directly to any of the major portions of the site quickly.
Both were created to provide maximum information using minimum bandwidth. As was recommended above, both client-side and server-side implementations are provided so that the majority of readers can use the maps.
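With a server-side map, the browser sends the click coordinates (as in footer.map?120,14) and the server looks them up in a map file; a client-side map carries the same rectangles in <AREA SHAPE="rect" COORDS="..." HREF="..."> tags so the browser can do the lookup itself. Below is a Python sketch of the server-side lookup, assuming an NCSA-style map file containing only "rect" and "default" lines; the URLs and coordinates are made up, and circles and polygons are left out.

```python
MAP_FILE = """\
# method  URL           upper-left  lower-right
default   /index.html
rect      /Conference/  0,0         99,29
rect      /Vendors/     100,0       199,29
"""


def parse_map(text):
    """Return ([(left, top, right, bottom, url), ...], default_url)."""
    rects, default = [], None
    for line in text.splitlines():
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        if parts[0] == "default":
            default = parts[1]
        elif parts[0] == "rect":
            x1, y1 = (int(v) for v in parts[2].split(","))
            x2, y2 = (int(v) for v in parts[3].split(","))
            rects.append((min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2), parts[1]))
    return rects, default


def handle_click(query, map_text=MAP_FILE):
    """Map an ISMAP query string such as '120,14' to a URL; first match wins."""
    x, y = (int(v) for v in query.split(","))
    rects, default = parse_map(map_text)
    for left, top, right, bottom, url in rects:
        if left <= x <= right and top <= y <= bottom:
            return url
    return default
```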

Some other uses for maps on the InterWorks site include:

Converting content for Web viewing

As you might suspect, content for the InterWorks Web site is submitted in a wide variety of formats. Perhaps the most common is a simple text e-mail message requesting updates to a local chapter or SIG page. These are small and easy to implement by hand. However, have you ever wondered what happens with other submissions? Have you ever wanted to submit content but didn't know how to do that? Perhaps this section will help. As more applications become able to save their results directly in HTML, such translations as are described here may become obsolete. For now they are necessary.

Templates

On the Web site, there was a fairly lengthy discussion of suggested formats and preferred templates for you to use for submission. I won't repeat that information here; it has been retired since the initial writing of this paper.

MS Internet Assistant

Microsoft offers an add-on called Internet Assistant free of charge on its Web and ftp sites; newer versions of its products may have it bundled. There are versions of Internet Assistant for several applications, including MS Word, MS Excel, and MS PowerPoint. Once it is loaded, you simply do a "save as", specifying HTML format. In my experience, it does a good job on text but a fairly poor job on graphics. As might be expected, the results are tailored toward the MS Internet Explorer browser.

WordPerfect

The version of WordPerfect I purchased for my Linux machine is able to save its output in HTML format on its own. It seems to do a fair job, and I assume similar versions are available on other platforms.

Cyberleaf

Interleaf's Cyberleaf is a comprehensive tool available for several UNIX platforms (including HP-UX) as well as MS Windows 95 and NT. It will convert from native Interleaf format as well as FrameMaker MIF, MS Word RTF, and ASCII. Features in the tool allow you to specify special constructs if Interleaf didn't anticipate your needs. For example, you might have a specially named component at the end of a document drop multiple lines into the generated HTML to provide your own unique page closure.

Since Interleaf's authoring tool relies on document structure and varying component names, it is not surprising to find the same techniques employed by Cyberleaf. In both cases, the best operation of the tool is obtained when different component types are used for different structures such as headings, paragraphs, bullets, etc.

If you will be using similar documents over and over, it will be worth your while to take the time to map the various components to their desired HTML constructs. If your conversion is a one-time activity, try creating a special "web" name (such as "transient") so it will be easier to delete all remnants when you are done. This is particularly important if the document contains a lot of graphics since the "delete document" operation does not delete the generated graphics. (You may hunt these down and delete them manually if you need to, or you may delete the entire structure they call a "web".)

I have found that Cyberleaf tends to do a better job on graphics than MS Internet Assistant does. However, I have had even better luck copying the graphic out of the document, pasting it into an application such as "paint", saving that as a TIFF file, then converting that for use on the Web. For the final step, I often use "xv" on a UNIX machine.

Other tools

There are a number of other tools freely available on the Internet and more showing up on PC software store shelves every day. The quality and capabilities vary -- investigate, test, and ask others who have used the tools. Unfortunately, there is not enough time to do a thorough comparison of them here.

Editing the output from translators

In nearly all cases, I have found it necessary to modify the output of the translators (and HTML editors) manually before posting. Usually this is due to browser-specific constructs used by the translators which would cause problems for other browsers. In some instances, it is because the translators do not support particular HTML constructs I want to use. In still others, the set-up is just too difficult or time consuming to bother with, and it is easier to do the final tweaking by hand.
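Some of that hand editing is repetitive enough to script. As one example, the Python sketch below removes the markup of two browser-specific tags (Netscape's <BLINK> and Internet Explorer's <MARQUEE>) while keeping the text they enclose. The tag list is an assumption to adjust for whatever your translator emits, and a regular expression is only adequate for simple tags like these.

```python
import re

# Browser-specific tags to drop; adjust this list for your translator's output.
VENDOR_TAGS = ("blink", "marquee")
VENDOR_RE = re.compile(r"</?(?:%s)\b[^>]*>" % "|".join(VENDOR_TAGS), re.IGNORECASE)


def strip_vendor_tags(html):
    """Remove opening and closing vendor tags, keeping the enclosed text."""
    return VENDOR_RE.sub("", html)
```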

Experiences with Netscape Gold

It is not a translator, but quite a few people are starting to use versions of "Netscape Gold" as their HTML editor. As with other specialized HTML editors, it can prevent some of the HTML errors frequently made by those new to HTML. For example, it will ensure that both the opening and closing tags exist for paired tags such as headings. Unlike some editors, Netscape Gold can read in HTML pages that it didn't create. If you are not familiar with the HTML specifications, this or some other dedicated editor may help you create better pages.

However, I do not use it! Some of my reasons include:

I will leave it to you to make your own choices -- I chose to go back to my original HTML version of this document and make the three edits by hand (then add this section). I scrapped the version which came out of Netscape Gold. Perhaps future versions will correct these problems, but I have had similar experiences with some other dedicated HTML editors.

What is next?

If I could predict that ... well, you get the idea. I'd like to think more browsers will implement HTML 3.2 features, support the lossless Portable Network Graphics (PNG) format, and be available on more platforms. However, I suspect that by next year there will be some other hot new topic to discuss instead.


Where is this paper?

This presentation was prepared by Artronic Development for the 1997 InterWorks Conference. An outline and an abstract are available for your convenience. This is an HTML document located at URL "http://www.arde.com/Papers/WebSecrets/paper.html".

To learn more about the Web, please refer to our other papers which are also on this Web site.


(Dave Eaton has been creating and maintaining Web sites since mid-1994 and performs the "webmaster" function for the InterWorks and other Web sites.)


(Updated 20 OCT 99)