Like many others, I run a site. Well, several of them, ranging from
mostly static ones, like business card sites, to highly dynamic ones
(blogs and portals). Some sites fall in between (directories, shops,
news feeds and so on), but what I need in every case is to know when
the site is running smoothly, and when it's in trouble.
To make sure a site is up, in many cases it is sufficient to check
that a single page loads and returns what is expected. It may be a
site status page, an index page, or any other page that is expected
to always load with a known HTTP response and, possibly, with known
content as well.
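Such a single-page check can be sketched in a few lines of Python.
This is only a minimal illustration using the standard library; the
function name check_page, its parameters and the returned dictionary
are my own assumptions, not part of any particular monitoring tool.

```python
import time
import urllib.error
import urllib.request


def check_page(url, expected_status=200, expected_text=None, timeout=10.0,
               opener=urllib.request.urlopen):
    """Load one page; report status, elapsed time and whether it looks healthy."""
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions but still carry a status code.
        status, body = err.code, ""
    except OSError as err:
        # URLError, timeouts and refused connections are all OSError subclasses.
        return {"ok": False, "status": None,
                "seconds": time.monotonic() - started, "error": str(err)}
    seconds = time.monotonic() - started
    ok = status == expected_status and (expected_text is None or expected_text in body)
    return {"ok": ok, "status": status, "seconds": seconds, "error": None}
```

A call such as check_page("https://example.com/", expected_text="Welcome")
(the URL and the string are placeholders) returns "ok" as True only
when both the status code and the known content match.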
I can run handmade scripts to make sure the site responds and the
data is returned as expected, but as the number of sites grows,
manual operation becomes tedious and hardly acceptable. I need
automated means to watch the state of my presence on the Net.
It is important to check not only that the page exists and returns
data, but also that the response time is within the expected range.
Nowadays on the Net, if a site doesn't load quickly enough, potential
visitors may lose interest in using the site, which can result in
real losses for its owner.
So, basically, I need to know not only how quickly the page should
load, but also how to tell a heavy traffic condition from site
failure or overload.
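One simple way to express that distinction is to sort every
measurement into one of three states. The sketch below is only an
illustration: the function name classify and both thresholds (2 and
10 seconds) are assumptions that would have to be tuned per site.

```python
def classify(seconds, responded, warn_after=2.0, fail_after=10.0):
    """Map one measurement to 'ok', 'slow' or 'down'.

    seconds   -- how long the page took to load
    responded -- False if the request failed or returned a wrong answer
    """
    if not responded or seconds >= fail_after:
        return "down"   # no usable answer at all: treat as failure
    if seconds >= warn_after:
        return "slow"   # answered, but late: likely heavy traffic or overload
    return "ok"
```

A run of "slow" results suggests heavy traffic, while "down" calls
for an immediate alert.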
To handle possible downtime, the site administrator should be
notified as soon as possible when it happens. However, it is hardly
reasonable to load the site too frequently. Apart from creating
unnecessary load on the server and wasting traffic, it can also have
other unexpected consequences, such as search engine penalties, since
search engines can notice the pattern of periodic page loading.
Checking the page once a minute can be quite acceptable even for a
complex, database-driven, heavily used site. Perhaps the best solution
would be to create a lightweight monitor, such as a PING monitor, and
make the HTTP(S) monitor depend on it, raising alerts primarily on
PING failures.
I certainly do not wish to attack my own sites with too frequent
requests. It's much better to just ask for a ping response often, and
to ask for actual pages much more rarely. Otherwise, I can cause site
overload myself, which is hardly what I want.
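The idea of a cheap probe that runs often and a full page load that
runs rarely can be sketched as follows. Note the assumptions: a real
ICMP ping needs raw sockets and usually extra privileges, so this
sketch swaps in a plain TCP connection attempt as the lightweight
probe; the names tcp_probe and run_tick and the 1-in-5 ratio are
hypothetical.

```python
import socket


def tcp_probe(host, port=80, timeout=3.0):
    """Cheap reachability test (stand-in for ping): can a TCP connection be opened?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_tick(tick, probe, load_page, page_every=5):
    """Run the cheap probe on every tick; load the real page only every Nth tick."""
    if not probe():
        return "probe failed"   # alert right away and skip the expensive request
    if tick % page_every == 0:
        return "page ok" if load_page() else "page failed"
    return "probe ok"
```

With one tick per minute, the host is probed every minute while the
page itself is requested only once in five minutes.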
Also, it is not always enough just to make sure the page loads
quickly enough and returns the proper HTTP response code. There may
be a need to send not only a GET request, perhaps with a query string,
but also a POST request with certain data (to test Web form
functionality). The page may be protected against public access, so
HTTP authentication may be required. We could also be expecting a
definite kind of content, i.e. we can expect the page loaded from the
site to contain a known string. A simple example: many modern CMSes
(content management systems) intercept 404, 403 and other HTTP error
codes and return a page instead, explaining what kind of error has
occurred and why. Thus, unless we analyze the actual page content
returned, we could be unable to tell that the page we tried to load
doesn't exist.
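Both needs, sending form data with credentials and inspecting the
returned content, can be sketched with the Python standard library.
The helper names, the form fields and the error marker strings below
are assumptions made for illustration; real markers depend on the
CMS in use.

```python
import base64
import urllib.parse
import urllib.request


def build_form_request(url, fields, username=None, password=None):
    """Prepare a POST request with form data and optional HTTP Basic auth."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    request = urllib.request.Request(url, data=data, method="POST")
    request.add_header("Content-Type", "application/x-www-form-urlencoded")
    if username is not None:
        token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
        request.add_header("Authorization", "Basic " + token.decode("ascii"))
    return request


def looks_like_error_page(body, markers=("Page not found", "Access denied")):
    """Detect an error page that the CMS served as an ordinary page."""
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in markers)
```

The prepared request can then be passed to urllib.request.urlopen,
and the decoded response body to looks_like_error_page.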
Indeed, I need to be sure not only that the main pages of my sites
open fine; I could also be interested in the accessibility of the
contact forms on my sites. The feedback may be very important, and I
must be sure the visitors can access the pages, express their
opinion, order my products and so on.
Besides, it is not rare for a page to be available only from
specific locations (i.e., IP ranges), so that a proxy server is
required to access the site. A typical proxy may be of the HTTP/HTTPS
or the SOCKS type. Thus the monitoring program must be able to make
use of a proxy to access the site in question (there are further
complications, such as proxy latency, which adds to the measured
response time, and stale proxy caches, but these can't be detected by
the monitor alone).
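For an HTTP/HTTPS proxy, routing the check through it can be sketched
with Python's urllib. The function name opener_via_proxy and the
proxy address in the usage note are assumptions. The standard library
does not handle SOCKS proxies, so those would need a third-party
package.

```python
import urllib.request


def opener_via_proxy(proxy_url):
    """Build an opener that sends HTTP and HTTPS requests through one proxy."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)
```

For example, opener_via_proxy("http://127.0.0.1:3128").open(url,
timeout=10) would load the page through a proxy listening locally on
port 3128 (a placeholder address).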
There may be many resources beyond public ones I need to watch. For
example, if I work at a company, it might be required to test whether
all its intranet services behave well. Sites are widely used as a
common way to access data, so there may be intranet sites to monitor
as well. If this has to be done externally, proxy and other gateway
services may be of use.
In conclusion, a monitoring tool chosen to watch Web site pages
should be able to
- access one or more pages at programmable time intervals, perhaps
randomly chosen, so as not to create a pattern that looks like
intentional page loading to inflate Web stats
- make use of a lighter monitor such as PING to serve as an
early-warning alarm before actually trying to load the page
- handle both GET and POST request types, with possible data
supplied to either
- measure response time as well as examine the actual data returned
- look for the expected HTTP response code and specific data in the
server output
- make use of proxy servers to access the site being monitored
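The first requirement, randomly chosen intervals, can be illustrated
with a small Python sketch. The function name next_delay and the
default values (a 60-second base with 25% jitter) are assumptions,
not settings of any particular tool.

```python
import random


def next_delay(base_seconds=60.0, jitter=0.25, rng=random):
    """Pick the wait before the next check, spread evenly around base_seconds."""
    spread = base_seconds * jitter
    return base_seconds + rng.uniform(-spread, spread)
```

With the defaults, each wait falls somewhere between 45 and 75
seconds, so the requests do not arrive at a fixed rhythm.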
See more details on: Web Monitoring Software