Jean C. Bonney
Director, External Research
The Information Utility, the Information Highway, the Internet, the
Infobahn, the Information Economy --- the sound bytes of the 1990s. To
make these concepts reality, a robust technology infrastructure is
necessary. In 1990, Digital's research organization saw this need and set
out to develop an experimental test bed that would examine assumptions and
provide a basis for a technology edge in the '90s. The resulting project
was Sequoia 2000, a three-year research collaboration between Digital, five
campuses of the University of California, and several other industry and
government organizations. The Sequoia 2000 vision is
Petabytes (i.e., quadrillions of bytes) of data in a
distributed archive, transparently managed, and logically
viewed over a high-speed network with isochronous
capabilities easily accessed by end users via a host of
tools --- in other words, a big, fast, easy-to-use system.
Although the vision is still not reality today, our more than three years
of participation in Sequoia 2000 research gave us the knowledge base we
sought.
After a rigorous process of proposal development and review by experts at
Digital and the University of California, Sequoia 2000 began in June 1991.
The focus of the research was a high-speed, broadband network spanning
University of California campuses from Berkeley to Santa Barbara, Los
Angeles, and San Diego; a massive database; storage; a visualization
system; and electronic collaboration. Driving the research requirements
were earth scientists. The computing needs of these scientists push the
state of the art. Current computing technologies lack the capabilities
earth scientists need to assimilate and interpret the vast quantities of
information collected from satellites. Once the data are collected and
organized, there is the challenge of massive simulations, simulations that
forecast world climate ten or even one hundred years from now. These were
exactly the kinds of challenges the computer scientists needed.
Among the major results of three years of work on Sequoia 2000 was a set of
product requirements for large data applications. These requirements have
been validated through discussions with customers in financial, healthcare,
and communications industries and in government. The requirements include
- A computing environment built on an object relational
database, i.e., a data-centric computing system
- A database that handles a wide variety of non-traditional
objects such as text, audio, video, graphics, and images
- Support for a variety of traditional databases and file
systems
- The ability to perform necessary operations from
computing environments that are intuitive and have the
same look and feel; the interface to the environment
should be generic, very high level, and easily tailored
to the user application
- High-speed data migration between secondary and tertiary
storage with the ability to handle very large data
transfers
- Network bandwidth capable of handling image transmission
across networks in an acceptable time frame with quality
guarantees for the data
- High-quality remote visualization of any relevant data
regardless of format; the user must be able to manipulate
the visual data interactively
- Reliable, guaranteed delivery of data from tertiary
storage to the desktop
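The first two requirements above can be made concrete with a small sketch. This is only a hypothetical illustration using Python's standard-library sqlite3 module, not the Sequoia 2000 or POSTGRES software: it shows a non-traditional object (raw image bytes) stored alongside ordinary queryable attributes in one data-centric store.

```python
import sqlite3

# Hypothetical illustration only: one table holding both traditional
# relational attributes and a non-traditional object (an image blob).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE satellite_image (
        id        INTEGER PRIMARY KEY,
        region    TEXT,   -- traditional attribute
        captured  TEXT,   -- ISO date string
        pixels    BLOB    -- non-traditional object: raw image data
    )
""")

fake_pixels = bytes(range(16))  # stand-in for real satellite imagery
conn.execute(
    "INSERT INTO satellite_image (region, captured, pixels) VALUES (?, ?, ?)",
    ("Sierra Nevada", "1994-06-01", fake_pixels),
)

# A relational predicate and object retrieval in a single query.
row = conn.execute(
    "SELECT region, length(pixels) FROM satellite_image "
    "WHERE captured >= '1994-01-01'"
).fetchone()
print(row)  # ('Sierra Nevada', 16)
```

A true object-relational system such as POSTGRES or Illustra goes well beyond this, allowing user-defined types and functions so that queries can operate on the contents of such objects, not merely store and fetch them.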
Sequoia 2000 was also a catalyst for maturing the POSTGRES research
database software to the point where it was ready for commercialization.
The commercial version, Illustra, is available on Alpha platforms and is
enjoying success in the banking industry and in geographic information
system (GIS) applications, as well as in other government applications with
massive data requirements. Illustra is also making inroads into the
Internet where it is used by on-line services.
Yet another major result of Sequoia 2000 was a grant from the National
Aeronautics and Space Administration (NASA) to develop an alternate
architecture for the Earth Observing System Data and Information System
(EOSDIS). EOSDIS will process the petabytes of real-time data from the
Earth Observing System (EOS) satellites to be launched at the end of the
decade. The alternate information architecture proposed by the University
of California faculty was the Sequoia 2000 architecture. It will have a
major influence on the EOSDIS project.
For the earth scientists, gains were made in simulation speeds and in
access to large stores of organized data. These scientists used some of
Digital's first Alpha workstation farms and software prototypes for their
climate simulations. An eight-processor Alpha workstation farm provided a
two-to-one price/performance advantage over the powerful,
multimillion-dollar CRAY C90 machine. In another earth science
application, scientists using Alpha and hierarchical storage systems could
simulate two years' worth of climate data over the weekend without operator
intervention; formerly, two months' worth of data took one day to simulate
and required considerable operator intervention. Thus, many more
simulations could be processed in a fixed time, and "time to discovery"
decreased considerably.
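The throughput gain described above can be checked with simple arithmetic. The sketch below assumes a "weekend" means roughly two days of wall-clock time, an assumption not stated in the text:

```python
# Rough check of the simulation throughput figures quoted above.
# Assumption (ours, not the text's): a "weekend" is about 2 days.

old_rate = 2 / 1    # months of simulated climate per day, before
                    # (2 months' worth of data in 1 day)
new_rate = 24 / 2   # months per day on Alpha with hierarchical storage
                    # (2 years = 24 months over a ~2-day weekend)

speedup = new_rate / old_rate
print(f"old: {old_rate} months/day, new: {new_rate} months/day, "
      f"speedup: {speedup:.0f}x")  # speedup: 6x
```

Under this assumption throughput improves roughly sixfold, before even counting the eliminated operator intervention.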
Now that we can look at Sequoia 2000 in retrospect, would we do such a
project again? The answer is a resounding "yes" from all of us involved.
It was a complex project that included 12 University of California faculty
members, 25 graduate students, and 20 staff. Another 8 faculty members and
students provided additional expertise. Four of Digital's engineers worked
on site, and a variety of support personnel from other industry sponsors
participated, including SAIC, the California Department of Water Resources,
Hewlett-Packard, Metrum, United States Geological Survey (USGS), Hughes
Application Information Services, and the Army Corps of Engineers.
But as is the case with such ambitious projects, there were unanticipated
and difficult lessons for all to learn. To experiment with real-life test
beds means considerably more than writing a rigorous set of hypotheses in a
proposal. Michael Stonebraker, in his paper, notes a number of challenges
we faced and the lessons learned. One of the issues that kept surfacing
was the "grease and glue" for the infrastructure, that is, the
interoperability of the pieces of software and hardware that composed the
end-to-end system. This remains a challenge that needs research if we are going
to achieve the promised goals of internetworking. Another of the sticky
points was that of scalability. On the one hand, it is difficult to build
a very large networked system from scratch. On the other hand, as we
slowly built the mass storage system to the point of minimal critical mass,
we found that the current off-the-shelf technologies for mass storage were
not ready to be put to use for our purposes. So yes, we believe the project
was worthwhile with some caveats. We gained critical knowledge about the
technology, but we also came a long way in learning the art of directing
and leading the type of project that is necessary to assist the Information
Technology industry in its quest for the ubiquitous distributed information
system.
How else are we going to get insight into the critical issues of building
and reliably operating a robust information infrastructure without building
a large test bed with real end users whose needs push the state of the art
at each point along the way? We believe that large projects similar to
Sequoia are crucial. The papers that follow attest to the important
knowledge gained. We have focused specifically on the end-to-end system
--- from the scientists' desktops to the mass storage system, the challenge
of building and using a large data repository, the timely and fast movement
of very large objects over the network, and browsing and visualizing data
from networked sources.