World Wide Web document standards and universal access

Sue Steele
Student: 11258381
Subject: LAR5007
Assignment 1A
Due: 4 October 1997
Co-ordinator: Tom Denison
Dept. of Librarianship Archives & Records
Monash University

Introduction

The World-Wide Web (WWW) began as a proposal to use hypertext to manage information at the CERN high-energy physics laboratory (Berners-Lee 1989). The proposal described the information environment as a web. The hypertext web was to link textual information on a variety of different machines at several locations, using the Internet TCP/IP protocols. It was to provide linkages within and between documents. The proposal recognised the wider application of the web to "multimedia documents which include graphics, speech and video" (Berners-Lee 1989), but that was not part of the initial development plan. The aim was to develop a system which was readily accessible over the Internet. It was to be flexible and extensible, able to deal with a variety of platforms and data-types.

WWW is enormously popular. Millions of people are developing WWW content and many more are using it. Users' expectations increase as each new web-feature is unveiled. From the outset WWW used standards as its base. Where there was no existing standard a new one was developed. This process is an evolving one. The standards process is slow, and competing commercial interests often try to capitalise on this by developing and implementing their own 'standards'. One outcome of this is reduced accessibility.

HTML standards

The Internet Engineering Task Force (IETF) is the Internet's standards development body. IETF working groups develop standards which are freely available via the Internet. The standards are published as Requests for Comment (RFCs). IETF-endorsed standards are open and platform-independent. This means that any products following the standards are able to communicate with similar or complementary products over the Internet, regardless of the brand of hardware or software in use at each end.

The World Wide Web Consortium (W3C) was formed in October 1994. W3C's stated aim is to "develop common protocols to enhance the Interoperability and lead the evolution of the World Wide Web" (Khudairi). The consortium operates out of the Massachusetts Institute of Technology (MIT) Computer Lab and is jointly hosted by MIT, France's National Institute for Research in Computer Science and Control (INRIA) and Keio University in Japan. W3C has over 160 member institutions drawn from industry, government and education. The consortium develops draft and recommended standards on all aspects of WWW. The IETF HTML Working Group was disbanded in September 1996 and carriage of WWW standards was handed to W3C.

Standards ratification is a slow process. This is particularly obvious in the case of HTML, where new draft standards are often well developed before a previous version is recommended. HTML 3.2 became the current recommended version in January 1997. The current draft version is HTML 4.0. Drafts are constantly changing, and some never become recommendations. However, companies such as Netscape and Microsoft generally incorporate parts of the current draft into their browser range, and web-developers are keen to use any available tools and tricks.

HTML is defined by a Document Type Definition (DTD) of ISO Standard 8879:1986 Information Processing - Text and Office Systems - Standard Generalized Markup Language (SGML). SGML defines only the markup itself, leaving interpretation to typesetters and/or viewers. Strict HTML is a simple but true SGML DTD.

Various web-browsers may choose to interpret some HTML tags in different ways. If the document and the browser conform to an appropriate HTML DTD, this shouldn't cause any major problems, aside from the possible philosophical problem of a page author not being able to specify exactly how a page will render on a user's screen. HTML DTDs are backwards compatible. This means that earlier versions of HTML can be displayed in new browsers. It also means that earlier browsers can display later versions of HTML; they simply ignore tags they cannot recognise.
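
The "ignore what you cannot recognise" behaviour can be illustrated with a short sketch. It is written in Python using the standard html.parser module and is only a rough model, not a description of how any real browser is built. The KNOWN_TAGS set, the OldBrowser class and the sample page are hypothetical, chosen purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical set of tags that an older browser understands.
KNOWN_TAGS = {"html", "head", "title", "body", "h1", "p", "a", "em"}


class OldBrowser(HTMLParser):
    """Collects the text an older browser would still display."""

    def __init__(self):
        super().__init__()
        self.text = []
        self.ignored = []

    def handle_starttag(self, tag, attrs):
        # Unrecognised tags are skipped; the text inside them is still shown.
        if tag not in KNOWN_TAGS:
            self.ignored.append(tag)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


def render(markup):
    """Return the displayed text and the list of tags that were ignored."""
    browser = OldBrowser()
    browser.feed(markup)
    browser.close()
    return " ".join(browser.text), browser.ignored


page = "<p>Welcome to <blink>our new</blink> site. <marquee>Sale now on</marquee></p>"
shown, skipped = render(page)
print(shown)    # the text survives without the proprietary effects
print(skipped)  # the tags this model did not recognise
```

In this model the reader loses the blinking and scrolling effects but not the words, which is the sense in which later markup degrades gracefully in earlier browsers.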

In an effort to push existing standards boundaries and to attract brand loyalty, companies such as Netscape and Microsoft create web-browsers which extend prevailing HTML standards. The earliest examples of this are Netscape's infamous <blink> tag in version 1.1 and tables in version 1.2. Microsoft's Internet Explorer countered with its equally infamous <marquee> tag and with some proprietary <font> declarations. Later on Netscape introduced frames in version 2.0 and layering in version 4.0. When Netscape began supporting JavaScript, a scripting language which can enhance the presentation of web pages, Microsoft introduced JScript instead. Some of these extensions went on to become part of HTML 3.2 or 4.0; some didn't. Browser-specific extensions are not always platform independent. The extended markup often works only in the proprietary browser, and leads to poor results for those with other brands of browser. In the worst cases, the pages are totally unreadable in other browsers.

There is no doubt that companies such as Netscape and Microsoft can speed up HTML developments. However, they do so at a cost. The cost is borne by web users who are unable or unwilling to change to a browser which supports the latest Netscapism or Microsoftism, and by web-developers who may need to produce different versions of their web sites for different browsers. Netscape and Microsoft are W3C members and act to influence standards development through it. This often leads to conflict within W3C about whose proposed enhancements should prevail (Messmer 1996; Messmer and Silwa 1996). Both companies are quick to incorporate parts of draft HTML specifications into their browsers. For example, HTML style sheets are partially implemented in Internet Explorer 3.0 and above and Netscape 4.0 and above.

They also try to influence standards development by adding extra features to their browsers, hoping that popularity will force these features to become part of the standard. Market dominance appears to be the goal behind some of their proposed enhancements, for example Microsoft's attempts to have ActiveX become a de facto standard, and Microsoft and Netscape's consortial bids to have their font technologies become the standard. W3C, and in particular its host institutions and leadership, plays a significant role in ensuring the continued openness of recommendations and standards.

A web user generally has access to one web browser. Their view of WWW is that displayed by their browser. They may have no idea that other browsers provide a completely different view. Berghel (1996) notes that "WYSINWEOS" (what you see isn't necessarily what exists on the server), and that this problem is caused in part by the mixture of "standards" in place. Browsers can be a mixture of features from the current standard, parts of the draft standards and vendor-specific extensions. Many web-page developers are keen to utilise new features. The standards-track process takes far longer than browser vendors and web-page developers are prepared to wait.

Several sites have developed test pages so that web users and web developers can try out the features of particular browsers against a known set of HTML tags and media types. Although Berghel was writing in early 1996, the sites he lists are still available and have been updated to reflect current standards and drafts. What is really needed is an easy way for developers to view their HTML on a range of browsers and platforms.

Knight (1996) summarises the problems inherent in using non-standard HTML markup and vendor-specific tags. He also points out that some of the problems relating to non-standard markup and subsequent browser display are caused by web-authors effectively mixing and matching HTML versions, and not correctly specifying which DTD they are using. Using proprietary tags may limit the number of people who can view a site to those with that brand of browser. This results in many sites stating which versions of which browser are required for optimum viewing. In theory this is acceptable, but in practice it may require a user to have several browsers installed if they wish to regularly view certain sites which optimise for competing browsers.
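
Knight's point about declaring the DTD can be made concrete. An HTML document states which DTD it claims to follow in a DOCTYPE declaration on its first line. The following Python sketch, using the standard html.parser module, reports which version a page declares. It is a minimal illustration, not a validator: it does not check that the markup actually conforms to the declared DTD. The KNOWN_DTDS table lists only two published public identifiers, and the function and class names are hypothetical.

```python
from html.parser import HTMLParser

# Public identifiers of two published HTML DTDs (deliberately not exhaustive).
KNOWN_DTDS = {
    "-//IETF//DTD HTML 2.0//EN": "HTML 2.0",
    "-//W3C//DTD HTML 3.2 Final//EN": "HTML 3.2",
}


class DoctypeFinder(HTMLParser):
    """Records the first DOCTYPE declaration found in a document."""

    def __init__(self):
        super().__init__()
        self.declaration = None

    def handle_decl(self, decl):
        # decl holds the text between "<!" and ">".
        if self.declaration is None and decl.upper().startswith("DOCTYPE"):
            self.declaration = decl


def declared_version(markup):
    """Return the HTML version a page claims to follow, if any."""
    finder = DoctypeFinder()
    finder.feed(markup)
    finder.close()
    if finder.declaration is None:
        return "no DOCTYPE declared"
    for public_id, version in KNOWN_DTDS.items():
        if public_id in finder.declaration:
            return version
    return "unrecognised DTD"


with_dtd = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN"><html><body><p>Hi</p></body></html>'
without_dtd = "<html><body><p>Hi</p></body></html>"
print(declared_version(with_dtd))
print(declared_version(without_dtd))
```

A page with no declaration leaves the browser, and any checking tool, to guess which version of HTML the author intended.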

Netscape's introduction of frames in its version 2.0 serves as a good example of the way proprietary extensions can affect users, standards and other browser developers (Notess 1996). Frames provided a new way of presenting web-pages, allowing the screen to be split into several smaller frames. One of the major benefits of frames was the ability to keep a menu bar on the browser screen at all times.

Some of the drawbacks of frames were:

Most of these problems were rectified when Microsoft incorporated frames into Internet Explorer version 3 and Netscape fixed some of the navigational problems in its version 3. Frames were not included in the HTML 3.2 recommendation, but they form part of the HTML 4.0 draft. The problem with non-frames browsers still remains on badly designed sites.

Some other electronic document formats

HTML is the most common document format in use on WWW. HTML still lacks a comprehensive mathematical markup scheme and printed-document control. Some of these issues may be addressed by HTML 4.0, but there will still be some documents that are not well served by HTML. Wusteman (1997) has written a comprehensive overview of the most common document formats in use, particularly in relation to electronic serials. Each format has its own benefits and drawbacks.

Some SGML developments

SGML usage is on the increase. Although a web browser cannot directly display any SGML DTD other than HTML, a number of projects are successfully converting SGML to HTML for web-use. This has some advantages for the document-producer. SGML marked-up text can be used for a number of purposes. For example, the same text could be sent to be typeset and also be converted to HTML for online delivery. It has some advantages for the web-user as well. A web-document produced this way would be consistently formatted and readily accessible in almost all web-browsers.

The Humanities Text Initiative (HTI) at the University of Michigan (Powell and Kerr 1997) is a large SGML-based project with almost 2 million pages of encoded text. The texts are made available via WWW in SGML format for those with SGML viewers. Text is also converted, on the fly, to HTML as required. This type of approach, based on a large project and considerable experience with both SGML and HTML, is an excellent one in terms of access and delivery. The SGML texts are searchable via web-forms interfaces and appropriate gateway scripts; they are also almost universally accessible because of the multiple delivery formats.
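
The idea of on-the-fly conversion can be sketched in a few lines of Python. This is an illustration of the general technique only and makes no claim about the tools the HTI actually uses. It assumes the source text is fully tagged (every element explicitly opened and closed), so that the standard xml.etree.ElementTree parser can read it; real SGML permits tag omission and other features that this parser does not handle. The element names and the TAG_MAP table are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical mapping from source element names to HTML tags.
TAG_MAP = {"div": "div", "head": "h2", "p": "p", "hi": "em"}


def to_html(element):
    """Recursively convert one source element to an HTML string."""
    tag = TAG_MAP.get(element.tag)
    parts = [escape(element.text or "")]
    for child in element:
        parts.append(to_html(child))
        # Text that follows a child element belongs to the parent.
        parts.append(escape(child.tail or ""))
    inner = "".join(parts)
    # Elements with no mapping contribute their content but no markup.
    return f"<{tag}>{inner}</{tag}>" if tag else inner


source = "<div><head>Sonnet</head><p>A <hi>short</hi> <note>sample</note> text.</p></div>"
print(to_html(ET.fromstring(source)))
```

Because the mapping lives in one table, the same encoded text could be sent to a different table for a different delivery format, which is the advantage of keeping the richer SGML markup as the master copy.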

This project has developed guidelines for electronic text encoding and interchange (the TEI Guidelines), and 14 separate SGML DTDs relevant for particular types of texts. The HTI offers advice and assistance to other organisations, and makes some of its tools available to others. This facilitates text encoding on a broader scale.

The ability to rapidly convert from SGML to HTML was also used at the University of Melbourne (Morton 1997) to produce their Undergraduate Handbook. The data is input by staff from many faculties and departments. Use of an SGML editor and an appropriate DTD allowed a more accurate handbook and rapid output of the end-products, both the print and HTML versions. Initial data was gathered by using the HTML versions of the previous year's handbooks. The total one-off cost of this project (development, conversion and software costs) was significant, but the end result was a timely web version of the handbook for no extra effort.

SGML has been around for a long time. HTML is its most successful implementation to date. The interest in WWW and HTML, and in HTML's deficiencies, has aroused new interest in SGML itself and in ways to better utilise SGML within the WWW context. If a web browser were able to interpret more than the HTML DTD, for example, it could become SGML capable. This would open up all sorts of possibilities for encoding and display of text, mathematical formulae etc., and would allow for the use of more than one display style for an individual document, depending on a user's choice.

Web browsers are already interpreting one SGML DTD (HTML). It would not take that much more for them to interpret more than one DTD and an SGML/HTML style sheet (Sperberg-McQueen and Goldstein 1995). These proposals to extend web browsers to incorporate more SGML features were originally put forward at the Second International World-Wide Web Conference in 1994. By December 1996 cascading style sheets (CSS) had become a W3C recommendation, and were partially implemented in Netscape and Microsoft web browsers. Extensible Markup Language (XML), also foreshadowed by Sperberg-McQueen and Goldstein, is a proposed method of allowing web-browsers to accept a range of SGML DTDs. Microsoft has already begun incorporating XML in its web browser.

Accessibility issues

PDF is extremely popular with web-publishers and with many web users because of its ability to exactly mimic the printed page. However, the PDF format is not readily accessible to visually impaired users. Adobe, the developer of PDF, has taken steps to rectify this situation. McQuarrie (1997) discusses these issues and the options being developed by Adobe. There is an on-the-fly conversion to HTML for any Internet-based PDF document available from access.adobe.com. A version of Adobe Access for stand-alone workstations is in the beta phase. This would allow conversion of local PDF documents to HTML or plain text. These steps are necessary because PDF cannot be read by most Windows screen readers, whereas HTML generally can be read. Extensions to PDF to allow a more logical breakdown of the document into text suitable for screen readers may eventually be available.

CSS currently offer the most flexibility for web-designers and have a lot of promise for access issues. The nature of CSS means that an author can apply more than one style to a document. For example, there could be a standard web-browser style sheet, a printer-oriented style sheet and one to be used by a speech synthesiser. Raman (1997) has developed a CSS speech specification. Using this specification, style sheets can be attached to web documents and a visually impaired user with an appropriate browser/speech-synthesiser can hear the document "read" in a way which makes it far more meaningful.

Structured documents such as HTML have greatly aided disabled access - particularly for the visually impaired. HTML can be rendered in braille as well as by speech synthesisers, for example. Making web documents readily accessible by almost all users does not require much extra effort in most cases. Many of the things which make HTML accessible to the disabled make it more generally accessible to users of any type of browser, and are often considered good HTML style by experienced web authors. Some very simple "tricks" such as using ALT attributes with inline images can make a huge difference to the accessibility of a web page. Some sites insist on this practice. Other sites do not seem to realise its importance. AUS Standards (New South Wales Attorney General's Department 1997) is a good guide to maximising accessibility. Its suggestions are practical, realistic and don't affect the overall design of a web page.
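
Checking for ALT text is simple enough to automate. The Python sketch below, again using the standard html.parser module, lists inline images that carry no ALT attribute at all. It is a minimal illustration under stated assumptions: the AltChecker class and the sample page are hypothetical, and the check says nothing about whether the ALT text that is present is actually useful to a reader.

```python
from html.parser import HTMLParser


class AltChecker(HTMLParser):
    """Collects the sources of inline images that have no ALT attribute."""

    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attributes = dict(attrs)
            if "alt" not in attributes:
                self.missing.append(attributes.get("src", "(no src)"))


def images_without_alt(markup):
    """Return the SRC values of images lacking an ALT attribute."""
    checker = AltChecker()
    checker.feed(markup)
    checker.close()
    return checker.missing


page = '<p><img src="logo.gif" alt="Monash University logo"> <img src="spacer.gif"></p>'
print(images_without_alt(page))
```

A site that insists on ALT text could run a check of this kind over every page before publication.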

Conclusion

Online information is still a relatively new concept. The current generation of users and developers is uncomfortable with a medium where the final output is not controlled at the developer's end. The desire to totally control the look and feel of an online information product can make that product less accessible to many users. Ours is still a paper-based society. We like to see screen-based information that exactly resembles a printed page, even when there may be a better alternative.

The current WWW standards and tools are insufficient for many purposes. The proposed standards incorporating more SGML features provide more flexibility. If they are recommended within a reasonable time-frame and adopted by the major vendors, developers and users will benefit. They can provide finely tuned control over the look and feel of web pages whilst maximising accessibility. Our best hope for universal access is that the needs of the majority and the needs of the access-impaired are converging.

"If we design systems that are truly ubiquitous and nomadic; that we can use whether we are walking down the hall, driving the car, sitting at our workstation, or sitting in a meeting; that we can use when we're under stress or distracted; and that can make it easy for us to locate and use new services -- we will have created systems which are Accessible to almost anyone with a physical or sensory disability. We will also have gone a long way to creating systems that are usable by a large percentage of the population who currently find systems aversive or difficult to learn." (Vanderheiden 1997)

References

Berghel, Hal. (1996) HTML compliance and the return of the test pattern. Communications of the ACM. 39(2): 19-22.

Berners-Lee, Tim (1989). Information Management: A Proposal. URL: http://www.w3.org/History/1989/proposal.html

Khudairi, Sally. World Wide Web Consortium [W3C] Backgrounder. URL: http://www.w3.org/Press/Backgrounder.html

Knight, Jon. (1996) From the Trenches - HTML: which version? Ariadne. Issue 1. URL: http://www.ariadne.ac.uk/issue1/knight

McQuarrie, Liz. (1997) PDF and Adobe(R) Acrobat(R) viewers for the visually disabled. URL: http://www.adobe.com/prodindex/acrobat/accesswhitepaper.html

Messmer, Ellen. (1996) The whole Web in its hands - Part 1. Network World. 13(30): 1,14. July 22.

Messmer, Ellen. Silwa, Carol. (1996) Market forces may crush Web group: Part 2. Network World. 13(31): 1,12. July 29.

Morton, David. (1997) Using SGML to produce the University of Melbourne Undergraduate Handbook. Information technology - the enabler : CAUSE in Australasia '97 : conference proceedings, 13-16 April, 1997, Carlton Crest Hotel, Melbourne, Australia. 405-414.

Notess, Greg R. (1996) Negotiating Netscape's frames. Online. 20(5): 65-68.

Powell, Christina Kelleher. Kerr, Nigel. (1997) SGML creation and delivery: The Humanities Text initiative. D-Lib magazine. July/August 1997. URL: http://sunsite.anu.edu.au/mirrors/dlib/dlib/july97/humanities/07powell.html

New South Wales Attorney General's Department. (1997) AUS standards URL: http://www.agd.nsw.gov.au/standards.html

Raman, T.V. (1997) Cascaded speech style sheets. Sixth International World Wide Web Conference. Santa Clara California. April 7-12 1997 URL: http://www6.nttlabs.com/HyperNews/get/PAPER14.html

Sperberg-McQueen, C.M. Goldstein, Robert F. (1995) HTML to the max: a manifesto for adding SGML intelligence to the World-Wide Web. Computer Networks and ISDN Systems. 28:3-11.

Vanderheiden, Gregg C. (1997) Anywhere, anytime (+anyone) access to the next-generation WWW. Sixth International World Wide Web Conference. Santa Clara California. April 7-12 1997 URL: http://www6.nttlabs.com/HyperNews/get/PAPER253.html

Wusteman, Judith. (1997) Formats for the electronic library. Ariadne. Issue 8. URL: http://www.ariadne.ac.uk/issue8/electronic-formats/


This essay was written as part of the requirements for the subject LAR5007 Electronic Publishing.

Links checked on October 2 1997.