The Web has become the universal and global delivery
mechanism for external data. In many ways, the Web is
the mother of all data warehouses. The immense resources
of the Web, with all of its complexity and dynamics,
are largely untapped. Valuable information about external
business factors is readily available on the Web and
is becoming more so each day.

Web farming is not surfing the Web haphazardly,
wandering from one intriguing item to another. Nor is
it a one-time search of the Web. On a continuous and
systematic basis, a Web farming system must deliver,
to the right people at the right time, information highly
relevant to the enterprise. In effect, a Web farming
system acts as the eyes and ears of the enterprise,
focusing externally to be aware of important changes
in the business environment.

The objective of Web farming is to refine Web content
systematically. Refining this content involves four
processes: discovering, acquiring, structuring, and
disseminating. The specific objectives of Web farming
are:
- To discover Web content that is highly relevant
to the business
- To acquire that content so that it is properly
validated within a historical context
- To structure the content into a useful form that is
compatible with the data warehouse
- To disseminate the content to the proper people
so that it has a direct, positive impact on specific
business processes
- To manage the preceding steps systematically as part
of the production operations of a data center

Web farming develops the practical management and
technical skills needed to implement effective business
intelligence systems within your company. Its four-stage
methodology minimizes the risk of an unsuccessful
implementation and maximizes the benefits to your
business.

In Web farming, the discipline of refining Web content
is called Information Refining. It consists of four
processes: discovery, acquisition, structuring, and
dissemination.

Discovery is the exploration of available Web resources
to find the items that relate to specific topics.
Discovery involves considerable "detective" work that
goes far beyond searching generic directory services
(such as Yahoo) or indexing services (such as AltaVista).
Furthermore, discovery must be a continuous process
because data sources are continually appearing on (and
disappearing from) the Web. A business analyst is the
central figure in this activity and needs advanced
search and indexing tools to be productive.
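
To make the continuous nature of discovery concrete, here is a minimal Python sketch. It assumes that candidate pages have already been fetched, scores each one by naive keyword overlap with a set of lowercase topic words, and compares two discovery runs to see which sources appeared and which disappeared. The names `Source`, `relevance`, `discover`, and `source_changes`, and the 0.5 threshold, are hypothetical; keyword overlap merely stands in for the advanced search and indexing tools an analyst would actually use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """A candidate Web page that has already been fetched (hypothetical type)."""
    url: str
    text: str


def relevance(source: Source, topics: set[str]) -> float:
    """Fraction of the (lowercase) topic words that appear in the page text."""
    if not topics:
        return 0.0
    words = set(source.text.lower().split())
    return len(topics & words) / len(topics)


def discover(candidates: list[Source], topics: set[str], threshold: float = 0.5) -> set[str]:
    """Keep the URLs of candidates whose relevance meets the (assumed) threshold."""
    return {s.url for s in candidates if relevance(s, topics) >= threshold}


def source_changes(previous: set[str], current: set[str]) -> tuple[set[str], set[str]]:
    """Compare two discovery runs: (sources that appeared, sources that disappeared)."""
    return current - previous, previous - current
```

Running `discover` on a schedule and passing successive results to `source_changes` is one simple way to notice sources that have appeared or vanished between runs.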

Acquisition is the collection and maintenance of
content identified by its source. The main goal of
acquisition is to preserve the historical context so
that you can analyze content in light of past changes.
Acquisition requires a secure server platform with
large storage capacity.

Structuring is the analysis, validation, and
transformation of content into a more useful format
and a more meaningful structure. The formats can be
Web pages, spreadsheets, word processing documents,
or database tables. When content is destined for a
data warehouse, its structure must be compatible with
the star-schema design and with the warehouse's key
identifier values.
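
The sketch below illustrates the last point under some assumptions: a raw record scraped from a competitor's price list (a hypothetical example) is validated and mapped onto a fact row whose `date_key` and `product_key` match the warehouse's dimension tables. The lookup table `PRODUCT_KEYS`, the YYYYMMDD date-key convention, and all field names are illustrative, not part of any particular warehouse design.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical dimension lookup; in practice this would come from the product dimension table.
PRODUCT_KEYS = {"widget-a": 101, "widget-b": 102}


@dataclass
class PriceFact:
    """A fact row keyed to (assumed) date and product dimensions."""
    date_key: int          # YYYYMMDD, e.g. 20240115
    product_key: int
    competitor_price: float


def date_key(d: date) -> int:
    """Encode a date as a YYYYMMDD integer key (an assumed convention)."""
    return d.year * 10000 + d.month * 100 + d.day


def structure(raw: dict[str, str]) -> PriceFact:
    """Validate a raw scraped record and map it onto warehouse key identifiers."""
    sku = raw["sku"].strip().lower()
    if sku not in PRODUCT_KEYS:
        raise ValueError(f"unknown product identifier: {sku!r}")
    price = float(raw["price"].replace("$", "").replace(",", ""))
    if price < 0:
        raise ValueError("price must be non-negative")
    return PriceFact(date_key(date.fromisoformat(raw["date"])), PRODUCT_KEYS[sku], price)
```

Rejecting records whose identifiers do not resolve to an existing dimension key keeps orphaned rows out of the fact table.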

Dissemination is the packaging and delivery of
information to the appropriate consumers, either directly
or through a data warehouse. It requires a range of
mechanisms, from predetermined schedules to ad hoc
queries. Newer technologies such as information
brokering and preference matching may also be
desirable.
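
Preference matching can be as simple or as sophisticated as needed. As a minimal sketch, assuming each refined item carries topic tags and each consumer has a profile of interests, the hypothetical `match` function below routes every item to the consumers whose interests it overlaps. The `Subscriber` and `Item` types and the `min_overlap` parameter are illustrative; scheduled delivery and ad hoc queries are not shown.

```python
from dataclasses import dataclass, field


@dataclass
class Subscriber:
    """A consumer's (hypothetical) preference profile."""
    name: str
    interests: set[str]
    min_overlap: int = 1  # shared tags an item needs before it is delivered


@dataclass
class Item:
    """A refined piece of content tagged with topics."""
    title: str
    tags: set[str] = field(default_factory=set)


def match(items: list[Item], subscribers: list[Subscriber]) -> dict[str, list[str]]:
    """Route each item to every subscriber whose interests overlap its tags enough."""
    deliveries: dict[str, list[str]] = {s.name: [] for s in subscribers}
    for item in items:
        for s in subscribers:
            if len(item.tags & s.interests) >= s.min_overlap:
                deliveries[s.name].append(item.title)
    return deliveries
```

Raising `min_overlap` for a subscriber makes delivery to that person more selective.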

The processes have a bidirectional flow. The
left-to-right flow, from discovery through
dissemination, refines the content of information,
which becomes more structured and validated. The
right-to-left flow refines the control of the
processes, which become more selective and
discriminating.
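
The right-to-left control flow can be pictured as a feedback loop. In the following Python sketch, consumers flag each delivered item as useful or not, and the hypothetical `adjust_threshold` function tightens or relaxes an upstream relevance threshold accordingly. The 0.5 and 0.8 cut-offs and the step size are arbitrary assumptions chosen for illustration.

```python
def adjust_threshold(threshold: float, feedback: list[bool], step: float = 0.05) -> float:
    """Hypothetical right-to-left control signal.

    feedback holds one flag per delivered item (True = useful to the consumer).
    If fewer than half of the deliveries were useful, the upstream relevance
    threshold is raised so earlier processes become more selective; if more than
    80% were useful, it is relaxed slightly. The result is clamped to [0.0, 1.0].
    """
    if not feedback:
        return threshold  # no feedback, no change
    useful_ratio = sum(feedback) / len(feedback)
    if useful_ratio < 0.5:
        threshold += step
    elif useful_ratio > 0.8:
        threshold -= step
    return min(1.0, max(0.0, threshold))
```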