Friday, April 22, 2011

Tera-WURFL

Tera-WURFL is a PHP & MySQL based library that uses the Wireless Universal Resource File (WURFL) to detect the capabilities of mobile devices. The WURFL website nicely defines the WURFL as follows:
The WURFL is an “ambitious” configuration file that contains info about all known Wireless devices on earth. Of course, new devices are created and released at all times. While this configuration file is bound to be out of date one day after each update, chances are that the WURFL lists all of the WAP devices you can purchase in the nearest shops.
Example applications of Tera-WURFL include:
  • Redirecting mobile visitors to the mobile version of your site
  • Detecting the type of streaming video that a visiting mobile device supports
  • Detecting iPhone, iPod and iPad visitors and delivering them an iPhone-friendly version of your site
  • Using it in conjunction with WALL4PHP or HAWHAW to create a website in an abstract language which is delivered to the visiting user in CHTML, XHTML, XHTML-MP, or WML, based on the mobile browser’s support
  • Detecting ringtone download support and supported ringtone formats
  • Detecting the screen resolution of the device and resizing images and wallpapers/backgrounds to an appropriate size for the device.
  • Detecting support for Javascript, Java/J2ME and Flash Lite
Tera-WURFL takes some functionality from the original PHP Tools Library by Andrea Trasatti and a prerelease version of the Java WURFL Evolution by Luca Passani. It serves as a drop-in replacement for the original PHP Tools, requiring only minor changes to your existing code. Since Tera-WURFL uses a database backend (MySQL 4, MySQL 5; MSSQL support is experimental), the real-world performance increase over the existing PHP Tools implementation is extremely high: normally between 5x and 10x faster!
The author of Tera-WURFL is Steve Kamerman, a professional PHP Programmer, MySQL DBA, Flash/Flex Actionscript Developer, Linux Administrator, IT Manager and part-time American Soldier. This project was originally sponsored by Tera Technologies and was developed as an internal project used for delivering content to customers of the mobile ringtone and image creation site Tera-Tones.com.

What Makes Tera-WURFL Different?

There are many good mobile device detection systems out there, some are free, and some are paid; however, they each have a specific focus. The official WURFL API (available in PHP and Java, with .NET coming soon) is a creation of one of the founders of WURFL, Luca Passani. Luca is the driving force behind mobile device detection and is highly regarded in the mobile development community. His APIs are focused on accurate detection of mobile device capabilities. Tera-WURFL’s focus is high-performance detection of mobile devices. Here’s what that means to you.

Tera-WURFL Project Priorities

  1. High Performance: Tera-WURFL uses a MySQL backend (MSSQL is experimental) to store the WURFL data and cache the results of device detections. This database can be shared between many installations of Tera-WURFL, so all your sites can benefit from sharing the same cache. Non-cached lookups on my test system average about 250 devices per second, with cached detections over 1000 per second.
  2. Accurate Detection of Mobile Devices: By using over a dozen UserAgentMatchers, specifically tailored to their own group of mobile devices, Tera-WURFL is constantly tweaked and tuned to provide more accurate results, detecting over 99% of visiting devices.
  3. Fast Detection of Desktop vs. Mobile Devices: Although detection of mobile devices is good among most of the libraries, they are not very good at detecting desktop browsers. As a result, your desktop users may be detected as mobile devices and sent to the wrong site. Tera-WURFL includes a feature called the SimpleDesktop Matching Engine that differentiates between desktop and mobile browsers extremely quickly. Additionally, instead of tens of thousands of unique desktop user agents piling up in your cache, SimpleDesktop uses just one entry in the cache to represent all desktop browsers.
  4. Usability: Since PHP has such a large base in the small- to medium-sized website arena, there are a large number of new developers. Tera-WURFL has been designed to be easy to use and administer. Once the initial configuration is finished, you can maintain the system completely from its Web Administration Page, including the ability to update the WURFL data via the Internet.

Constant Performance and Accuracy Improvements

Using our in-house analysis and regression testing software built for Tera-WURFL, we are able to quickly perform a deep analysis on the performance and accuracy of both the UserAgentMatchers and the core. This data is then evaluated to identify potential bottlenecks in the system. We are also able to track the internal match confidence that Tera-WURFL has with each device detection and construct aggregate visualizations to determine if there are more new user agents on the Internet that are slipping by the detection system.
(Charts: UserAgentMatcher distribution, detection rate, and performance impact for Tera-WURFL 2.1.0 and 2.1.1.)

Requirements

  • Hardware: Architecture Independent
  • Software
    • Web Server: Any web server that supports PHP and the MySQLi extension; here are some I’ve tested:
      • Apache 2.x
      • IIS 6/7 (for Windows users I recommend WampServer or XAMPP)
      • lighttpd
    • PHP
      • PHP 5.x with the following required modules:
        • MySQLi (MySQL improved extension)
        • ZipArchive (included with PHP >= 5.2.0; for earlier versions of PHP it is available from PECL)
    • Database: One of the following database servers:
      • MySQL >= 4.1
      • MySQL 5.x
      • Microsoft SQL Server 2005/2008 (EXPERIMENTAL)

How does it work?

When a web browser (mobile or non-mobile) visits your site, it sends a User Agent along with the request for your page. The user agent contains information about the type of device and browser that is being used; unfortunately, this information is very limited and oftentimes is not representative of the actual device. The WURFL Project collects these user agents and puts them into an XML file, commonly referred to as the WURFL File. This file also contains detailed information about each device, e.g. the screen resolution, audio playback capabilities, streaming video capabilities, J2ME support and so on. This data is constantly updated by WURFL contributors from around the world via the WURFL Device Database. Tera-WURFL takes the data from this WURFL file and puts it into a MySQL database (MSSQL support is experimental) for faster access, then determines which device is the most similar to the one that’s requesting your content. The library then returns the capabilities associated with that device to your scripts via a PHP associative array. Currently, the WURFL contains 29 groups of capabilities with a total of 531 capabilities.
Here’s the logical flow of a typical request:

Device Requests a Page

Someone requests one of your pages from their mobile device. Their User Agent is passed to the Tera-WURFL library for evaluation.

Request is Evaluated

Tera-WURFL takes the requestor’s user agent and puts it through a filter to determine which UserAgentMatcher to use on it. Each UserAgentMatcher is specifically designed to best match a device from its own group of similar devices, using the Reduction in String technique and/or the Levenshtein distance algorithm.
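To illustrate the distance measure involved, here is a toy sketch using PHP’s built-in levenshtein() function to pick the closest known user agent to a visiting one. The user agents and matching logic are simplified stand-ins, not Tera-WURFL’s actual matcher code:
<?php
// Toy illustration: pick the known user agent with the smallest
// Levenshtein distance to the visiting user agent
$visitor = 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3 like Mac OS X)';
$known = array(
    'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_2 like Mac OS X)',
    'Mozilla/5.0 (iPad; U; CPU OS 4_3 like Mac OS X)',
);
$best = null;
$bestDistance = PHP_INT_MAX;
foreach ($known as $ua) {
    $distance = levenshtein($visitor, $ua);
    if ($distance < $bestDistance) {
        $bestDistance = $distance;
        $best = $ua;
    }
}
echo "Closest match (distance $bestDistance): $best\n";
?>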

Capabilities Array is Built

Each device in the WURFL file and WURFL database falls back onto another device. For example, the iPhone 3GS has only a handful of capabilities of its own; it falls back onto the iPhone 3G, which adds to those capabilities and falls back onto the original iPhone, which in turn falls back onto a generic device that contains the default capabilities. Through this method of inheritance, the device entries remain very small in size. In our example, once the User Agent has been matched, the capabilities from this device are stored in the capabilities array, then the next device in its fallback tree (its parent device) is looked up, and its capabilities are added, all the way up to the most generic device.
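As a rough sketch of this inheritance (with made-up device entries, not Tera-WURFL’s actual internals), the merge up the fallback tree might look like this:
<?php
// Illustrative only: walk a hypothetical fallback chain from the
// matched device up to the generic root, then merge capabilities
// so that more specific devices override their parents
function buildCapabilities($deviceID, $devices) {
    $chain = array();
    while ($deviceID !== null) {
        $chain[] = $devices[$deviceID]['capabilities'];
        $deviceID = $devices[$deviceID]['fall_back'];
    }
    $capabilities = array();
    // start from the generic device and apply overrides downwards
    foreach (array_reverse($chain) as $caps) {
        $capabilities = array_merge($capabilities, $caps);
    }
    return $capabilities;
}
$devices = array(
    'generic' => array('fall_back' => null,
        'capabilities' => array('is_wireless_device' => false, 'resolution_width' => 90)),
    'apple_iphone_ver1' => array('fall_back' => 'generic',
        'capabilities' => array('is_wireless_device' => true, 'resolution_width' => 320)),
);
print_r(buildCapabilities('apple_iphone_ver1', $devices));
?>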

Results are Cached

The capabilities array is now cached with the User Agent so the next time the device visits the site it will be detected extremely quickly.

Capabilities are Available to the Server

The process is finished and the capabilities are now available for use in your scripts. One common use, for example, is to redirect mobile devices to a mobile version of the site:
<?php
require_once './TeraWurfl.php';
$wurflObj = new TeraWurfl();
$wurflObj->getDeviceCapabilitiesFromAgent();
 
// see if this client is on a wireless device
if($wurflObj->getDeviceCapability("is_wireless_device")){
 header("Location: http://yourwebsite.mobi/");
 exit; // stop the script; nothing after the redirect should run
}
?>

Show a Picture of the Device (optional)

If you have version 2.1.2 or higher, you can show an image of the device (assuming one is available). See the Device Image page for the details.
Here’s a usage example:
<?php
require_once 'TeraWurfl.php';
require_once 'TeraWurflUtils/TeraWurflDeviceImage.php';
$wurflObj = new TeraWurfl();
$wurflObj->getDeviceCapabilitiesFromAgent();
$image = new TeraWurflDeviceImage($wurflObj);
/**
 * The location of the device images as they are accessed on the Internet
 * (ex. '/device_pix/', 'http://www.mydomain.com/pictures/')
 * The filename of the image will be appended to this base URL
 */
$image->setBaseURL('/device_pix/');
/**
 * The location of the device images on the local filesystem
 */
$image->setImagesDirectory('/var/www/device_pix/');
/**
 * Get the source URL of the image (ex. '/device_pix/blackberry8310_ver1.gif')
 */
$image_src = $image->getImage();
if($image_src){
 // If an image exists, show it
 $image_html = sprintf('<img src="%s" border="0"/>',$image_src);
 echo $image_html;
}else{
 // If an image is not available, show a message
 echo "No image available";
}
?>

Using SOAP with PHP

SOAP, the Simple Object Access Protocol, is the powerhouse of web services. It’s a highly adaptable, object-oriented protocol that exists in over 80 implementations on every popular platform, including AppleScript, JavaScript, and Cocoa. It provides a flexible communication layer between applications, regardless of platform and location. As long as they both speak SOAP, a PHP-based web application can ask a C++ database application on another continent to look up the price of a book and have the answer right away. Another Internet Developer article shows how to use SOAP with AppleScript and Perl.
SOAP was created collaboratively as an open protocol. Early in its development, XML-RPC was spun off, and now enjoys its own popularity as a simpler alternative to SOAP. Both encode messages as XML, and both use HTTP to transport those messages. SOAP, however, can use other transport protocols, offers a number of high-end features, and is developing rapidly. (For more about SOAP and web services, try XML.com’s helpful demystification.)
A SOAP transaction begins with an application making a call to a remote procedure. The SOAP client script then encodes the procedure request as an XML payload and sends it over the transport protocol to a server script. The server parses the request and passes it to a local method, which returns a response. The response is encoded as XML by the server and returned as a response to the client, which parses the response and passes the result to the original function.
There are a number of different implementations of SOAP under PHP. It’s a shifting landscape: new ones appear, and old ones aren’t maintained or simply vanish. As of this writing, the most viable PHP implementation of SOAP seems to be Dietrich Ayala’s SOAPx4, also known as NuSOAP. This implementation is the most commonly used and appears to be the most fully developed and actively maintained, and it shows every sign of continuing to be a robust and popular solution. It’s not complete—a number of features, including full documentation, are still in the works—but it’s still a highly viable and easy-to-use SOAP solution.

Installation

First, you need to get PHP up and running on your Mac. This is easy to do: check out our tutorial to get yourself set up. If you want to send SOAP messages over HTTPS, you’ll need to include the cURL module in your PHP build.
The next step is to install NuSOAP. Download the package from the developer’s site. Unzip it to get a folder of documentation, as well as the file nusoap.php, which contains the actual PHP classes that we’ll need. To use them, place nusoap.php in your PHP path and include it in the scripts you write.
The base class is nusoap_base. By using it and its subclasses, anything is possible. As an example, I’ll build a simple SOAP server script and client script, and then dissect the XML transaction they send.

A SOAP Server

Here is a simple server, written in PHP, that takes an ISBN (International Standard Book Number) as input, performs a lookup in an imaginary database, and returns the price of the corresponding book. In it, I use the soap_server class, and four methods of that class: the soap_server constructor, register, fault, and service:
<?php
// include the SOAP classes
require_once('nusoap.php');
// function to get price from database
function lookup($ISBN) {
    global $server;
    $query = "select price from books where isbn = " . $ISBN;
    if (!mysql_connect("localhost", "username", "passwd")) {
        $error = "Database connection error";
    } elseif (!mysql_select_db("books")) {
        $error = "Database not found";
    } elseif (!($result = mysql_query($query))) {
        $error = mysql_error();
    } else {
        $price = mysql_result($result, 0, 0);
        // if the lookup fails, return an error
        if ($price == 0) {
            $error = "Price lookup error";
        }
    }
    // if any trap set an error, return a SOAP fault to the client
    if (isset($error)) {
        return $server->fault('soap:Server',
            'http://mydomain.com/booklookupscript.php', $error);
    }
    return $price;
}
// create the server object
$server = new soap_server;
// register the lookup service
$server->register('lookup');
// send the result as a SOAP response over HTTP
$server->service($HTTP_RAW_POST_DATA);
?>
The first method I use is the soap_server constructor, which creates the server object that will be doing all the work for me. I assign that object to $server. Next is register, which tells the server what to do (in this case, to call the lookup() function). The method’s one parameter is the name of the function. There are other optional parameters that can be used to define the namespace and the SOAPAction information as specified in the SOAP specification, but those aren’t necessary for this example. The general syntax of the register method is:
register(name, in, out, namespace, SOAPAction, style)
The first parameter is the only mandatory one. in and out are arrays of input and output values; namespace and SOAPAction are used in accordance with the SOAP spec. Finally, style is used to indicate whether the data being sent is literal XML data (the default, and what I use in these examples) or RPC serialized application data.
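For instance, registering the lookup function with explicit input and output signatures and a namespace might look like the following; the parameter types and the urn:booklookup namespace here are illustrative, not something the example above requires:
// hypothetical: register lookup with typed in/out parameters
$server->register(
    'lookup',
    array('ISBN' => 'xsd:string'),   // input parameters
    array('price' => 'xsd:float'),   // output parameters
    'urn:booklookup'                 // method namespace
);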
So, the function is executed, and the returned value is passed to the server object. Then the service method returns a SOAP response to the client that initiated the request. The argument to the service method is $HTTP_RAW_POST_DATA.

Dealing with Errors

Because databases are not perfect, the script has a series of steps to catch errors. The lookup function contains three traps for different kinds of MySQL database errors. Each trap assigns an error identification string to the variable $error. Additionally, the function tests the $price variable to ensure that it’s not set to zero, which would indicate a defective entry in the database.
If any one of these traps finds an error, NuSOAP’s fault method is called, which stops normal processing and returns the error information to the client as a SOAP fault. The syntax of the fault method is:
fault(faultcode, faultactor, faultstring, faultdetail)
The first two arguments are required; the latter two are optional. For the faultcode argument, a machine-readable fault code must be provided, as described in the SOAP spec. There are four predefined fault codes in the specification: VersionMismatch, MustUnderstand, Client, and Server. These must be given as qualified names in the namespace by prefixing them with SOAP-ENV:. A VersionMismatch error indicates incompatible namespaces. A MustUnderstand error is used when the server comes across a mandatory header entry that it doesn’t understand. Client is used when the error lies in the message that was received from the client. And Server indicates a problem encountered during processing on the server, unaffiliated with the SOAP message per se. This latter code is what I used in the script when there’s a problem with the database lookup.
The faultactor argument should contain the URI where the problem originated. This is more important for transactions where numerous intermediaries are involved. In this example, I use the URI of the server script. (Note: the NuSOAP documentation implies that the faultactor element should be set to either “client” or “server.” The SOAP specification, however, says it should be a URI.)
faultstring and faultdetail are set aside for explaining the fault in human-readable language. faultstring should be a brief message indicating the nature of the problem, while faultdetail can go into more detail—it can even contain an array with specific itemized information about the fault. In my example, I pass the $error string to faultstring, and omit faultdetail.

A SOAP Client

Now I’ll write a client for an existing SOAP server, so you can see it in action. I’ll use XMethods’ Barnes & Noble Price Quote server, which acts a lot like the example server, above. It takes an ISBN as input and returns price data from Barnes & Noble.
The client script will need to send a request containing an ISBN and then parse the response. In this script, I use the soapclient class, its constructor, and call, which handles making a request and parsing the response all in one. The only method available on the server is getPrice, which takes only one parameter, a string called isbn. It returns a floating-point variable called return.
<?php
// include the SOAP classes
require_once('nusoap.php');
// define parameter array (ISBN number)
$param = array('isbn'=>'0385503954');
// define path to server application
$serverpath ='http://services.xmethods.net:80/soap/servlet/rpcrouter';
//define method namespace
$namespace="urn:xmethods-BNPriceCheck";
// create client object
$client = new soapclient($serverpath);
// make the call
$price = $client->call('getPrice',$param,$namespace);
// if a fault occurred, output error info
if ($client->fault) {
        print "Error: ". $client->faultstring;
        }
else if ($price == -1) {
        print "The book is not in the database.";
} else {
        // otherwise output the result
        print "The price of book number ". $param['isbn'] ." is $". $price;
        }
// kill object
unset($client);
?>
The soapclient constructor takes a server URL as its argument. Having thus initialized the server object, I pass to the call method the name of the function I want (getPrice), the necessary parameters (the array containing the ISBN string to look up), and the required method namespace: urn:xmethods-BNPriceCheck.
The parameters for soapclient’s call method are: function name, parameter array, and three optional ones: namespace, SOAPAction, and an array of headers. The definition for the server will specify which, if any, of the optional parameters are necessary. The Barnes & Noble Price Quote server requires a method namespace definition (urn:xmethods-BNPriceCheck) but no SOAPAction or SOAP headers. Information about what this server offers and what it requires was gleaned from the server’s listing on XMethods’ index of SOAP servers. (This particular server happens to be hosted by XMethods, but the index lists a wide variety of servers, regardless of host.)
The call method of the client performs the SOAP transaction and returns the content of the server’s response to the $price variable. The script then checks the client’s fault flag, which is set if there was an error in the transaction. If a fault occurred, the script outputs the error information. If there isn’t an error, it checks to see if the price returned is -1, which indicates that the requested book was not found. Otherwise, the price data is printed.

A Closer Look at the Transaction

The actual XML message sent by the client to the server looks something like this:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<SOAP-ENV:Body>
<ns1:getPrice xmlns:ns1="urn:xmethods-BNPriceCheck"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<isbn xsi:type="xsd:string">0385503954</isbn>
</ns1:getPrice>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
The Envelope tag contains pointers to the global namespace definitions. It also includes pointers to the SOAP envelope schema hosted on xmlsoap.org and to the W3C’s XML schema definition. These tell the server where it’s getting definitions for the various XML tags that it’s using. The XMLSchema class (which is, as of this writing, only experimental) can be used to work with aspects of the XML schema.
The schema definition is set automatically by NuSOAP to http://www.w3.org/2001/XMLSchema. If you wish to change this, you must set the $XMLSchemaVersion global variable:
$XMLSchemaVersion = 'http://www.my.org/MYSchema/';
Detailed discussion of the ins and outs of the W3C’s XML schema can be found in O’Reilly’s new book on the subject.
Within the Envelope tag is the Body tag, which contains the body of the message. Its attributes are determined by the parameters of the function call. The name of the remote method, the method namespace, and the actual content of the message—the ISBN string—are set by the client script. NuSOAP automatically detects the variable type and incorporates the type namespace (xsd:string) in the isbn tag. If a SOAPAction had been set in the script, that would appear as a SOAPAction HTTP header.
The encoding style is set by default to http://schemas.xmlsoap.org/soap/encoding/. This is pre-set by NuSOAP as the SOAP-ENC element of the public array called namespaces. To change it, simply include a line in your script like:
$namespaces['SOAP-ENC'] = 'http://my.special.encoding';
The same technique can be used to change other namespace values, if necessary. The keys of the namespaces array are SOAP-ENV, xsd, xsi, SOAP-ENC, and si, corresponding to the namespace URIs for the envelope schema, the XML schema definition (equal to $XMLSchemaVersion), the XML schema instance, the encoding style, and the SOAP interoperability test URI, respectively. The default settings for these should not need to be changed under ordinary circumstances.
The server’s XML response to the request looks like this:
<SOAP-ENV:Envelope
xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:xsi="http://www.w3.org/1999/XMLSchema-instance"
xmlns:xsd="http://www.w3.org/1999/XMLSchema">
<SOAP-ENV:Body>
<ns1:getPriceResponse xmlns:ns1="urn:xmethods-BNPriceCheck"
SOAP-ENV:encodingStyle="http://schemas.xmlsoap.org/soap/encoding/">
<return xsi:type="xsd:float">14.65</return>
</ns1:getPriceResponse>
</SOAP-ENV:Body>
</SOAP-ENV:Envelope>
The envelope is pretty much the same as that of the request, though you’ll notice that the server uses an older XML schema than the client. The body is also similar: the method namespace and the encoding style are the same. The ns1 package tag has Response appended to its name now: <ns1:getPriceResponse>. And where the request had an element called isbn, here the core of the response is called return, and the data type is specified as float. PHP is weakly typed, so NuSOAP assigns variable types automatically.

Conclusion

NuSOAP makes working with SOAP very easy by automatically handling the complexity, although it also provides a fair amount of access to the flexibility and nuance underneath. The call method of the soapclient class and the register method of the soap_server class do a lot of work that many other SOAP implementations make you do by hand. NuSOAP offers some access to the underlayer now, and will allow more as development proceeds.
To learn more about the details of working with SOAP, refer to the SOAP specification and the API documentation that comes with NuSOAP. If you encounter a specific question about how NuSOAP handles SOAP transactions, it can be helpful to look at the nusoap.php file, which is clearly organized by class and decently commented. Going to the source, as it were, should answer most questions.

Top 10 Best SEO Web Tools

Just starting to understand how important SEO is to your overall web strategy? Well, fear not, I have put together a list of 10 of my favorite free web-based SEO tools that will help you SEO your site like a pro. Why pay beaucoup bucks for something you can do yourself with the right tools? So, without further ado, here are my very favorite SEO tools for optimizing real estate blogs and websites:
  1. We Build Web Pages: This tool allows you to compare your site against the top 10 listed sites in Google for a particular keyword or phrase. You simply enter your keyword and the system will begin comparing 9 different metrics for the top 10 rankings. This allows you to quickly compare the Yahoo rank, MSN rank, Google pages indexed, AltaVista pages indexed, backlinks to page, backlinks to domain, all-in-anchor rank, age of URL, and whether or not the phrase is on the page. Check out the rest of their SEO tools here.
  2. Zippy: Zippy is a new meta search engine that queries other major engines and returns results in a format most suited for Webmasters and SEOs. The site was launched in September by the seasoned SEO Dave Naylor, and provides some valuable tools for site optimization.
  3. URL Trends: UrlTrends allows you to see at a glance any of the information you want about all of the selected URLs. Whether you only want to see the PageRank of the URLs, or everything they monitor, you can now do so very easily.
  4. NUAH: Super cool site. It crawls your entire site so you can see how many pages you have, their respective PageRanks, keywords and descriptions. It also has good sitemap generators for both Yahoo! and Google.
  5. Widexl: I love this tool. I mostly use the Meta Tag Analyzer and the Link Popularity tools, but each of their tools is powerful and useful. Widexl is a web developer’s best friend, offering a suite of free hosted tools to assist in SEO.
  6. iWebTool: I don’t think an RSS Pieces post would be complete without mentioning the backlinks and PageRank predictor from iWebTool. However, iWebTool has a ton of other very useful SEO tools that deserve mentioning as well, such as a Google Banned Checker, Link Popularity, Keyword Suggestion, Search Engine Position, etc.
  7. Ezer: Need to check your PageRank across all the data centers? This is a very handy tool. You can even get an idea of when they are about to update PR with this tool.
  8. Link Harvester: Are you looking to do some in-depth link analysis? If so, the Link Harvester is certainly for you. This tool is much more advanced than most link analysis software currently on the market. Link Harvester:
    • quickly finds almost every single site that is linking to you
    • reports how many pages of your web site are indexed
    • reports how many links are pointing to any specific page
    • provides links to the Wayback Machine and WhoIs source next to each domain
    • reports the total number of inbound links, home page links, and deep link ratio
    • quickly grabs the number of .gov, .edu, and .ac.uk inbound links
    • indicates if a site links to your site more than 5 times by bolding that particular link in the list of results.
  9. HitTail: This tool is endlessly useful. I, personally, find that most people don’t understand the long tail of search. HitTail reveals in real time the least utilized, most promising keywords hidden in the Long Tail of your natural search results. It presents these terms to you as suggestions that, when acted on, can boost the natural search results of your site.
  10. Firefox SEO extension: When Aaron Wall of SEOBook says something is cool, we take note. I have been using this very cool SEO extension for Firefox (my favorite browser) for quite some time now and love it. I can click on the tool to display meta tags for any site I visit without having to switch to code view and scroll all over creation for them. It does much more than just that but that is one of my favorite features.

10 Promising Free Web Analytics Tools

Web analytics is the process of gathering and analyzing your web content’s data in order to glean meaningful information about how your site is being utilized by your users. There are plenty of Web analytics applications out there, and you probably already know the big guns such as Google Analytics, Crazy Egg, and remote-site services such as Alexa and Compete.
We go off the trodden path and explore a few lesser-known Web analytics options. In this article, you’ll find 10 excellent and free tools and applications to help you gather and analyze data about your web content.

1. Piwik

Piwik is an open-source Web analytics application developed using PHP and MySQL. It has a “plugins” system that allows for utmost extensibility and customization. Install only the plugins you need or go overboard and install them all – the choice is up to you. The plugins system, as you can imagine, also opens up possibilities for you to create your own custom extensions. This thing’s lightweight – the download’s only 1.9MB.

2. FireStats

FireStats is a simple and straight-forward Web analytics application written in PHP/MySQL. It supports numerous platforms and set-ups including C# sites, Django sites, Drupal, Joomla!, WordPress, and several others. Are you a resourceful developer who needs moar cowbell? FireStats has an excellent API that will assist you in creating your own custom apps or publishing platform components (imagine: displaying the top 10 most downloaded files in your WordPress site) based on your FireStats data.

3. Snoop

Snoop is a desktop-based application that runs on the Mac OS X and Windows XP/Vista platforms. It sits nicely on your system status bar/system tray, notifying you with audible sounds whenever something happens. Another outstanding Snoop feature is the Name Tags option which allows you to “tag” visitors for easier identification. So when Joe over at the accounting department visits your site, you’ll instantly know.

4. Yahoo! Web Analytics

Yahoo! Web analytics is Yahoo!’s alternative to the dominant Google Analytics. It’s an enterprise-level, robust web-based third-party solution which makes accessing data easy especially for multiple-user groups. It’s got all the things you’d expect from a comprehensive Web analytics tool such as pretty graphs, custom-designed (and printable) reports, and real-time data tracking.

5. BBClone

If you’re looking for a simple, server-side web application that doesn’t rely on third-party services to monitor your data, check out BBClone – a PHP-based server application that gives you a detailed overview of website traffic and visitor data. It supports language localization for 32 languages like English, Chinese, German, and Japanese. It easily integrates with popular publishing platforms like Drupal, WordPress, and Textpattern. Since it’s logfile-based, it doesn’t require you to use a server-side relational database.

6. Woopra

Woopra is a Web analytics application written in Java. It’s split into two parts which includes a desktop application for data analysis/exploration and a web service to monitor website statistics. Woopra has a robust user interface, an intuitive management system that allows you to run it on multiple sites and domains, and even a chat feature so that you can gather non-numerical information by talking to your site users. Woopra is currently in beta and requires you to request for a private beta registration.

7. JAWStats

JAWStats is a server-based Web analytics application that runs with the popular AWStats (in fact, if you’re on a shared hosting plan – AWStats is probably already installed). JAWStats does two things to extend AWStats – it improves performance by reducing server resource usage and improves the user interface a little bit. With that said, you can’t go wrong with just using AWStats either if you’re happy with it.

8. 4Q

A large part of Web analytics deals with number-crunching and numerical data. Raw numbers tell only part of the story, and it’s often helpful to perform analytics by way of interacting with actual users. 4Q developer Avinash Kaushik puts it perfectly: “Web analytics is good at the ‘What’. It is not good at the ‘Why’.” 4Q is a simple surveying application focused on improving your traditional numerical Web analytics by supplementing it with actual user feedback. Check out this YouTube video on how easy it is to set up 4Q.

9. MochiBot

MochiBot is a free Web analytics/tracking tool especially designed for Flash assets. With MochiBot, you can see who’s sharing your Flash content, how many times people view your content, as well as helping you track where your Flash content is to prevent piracy and content theft. Installing MochiBot is a breeze; you simply copy a few lines of ActionScript code in the .FLA files you want to monitor.

10. Grape Web Statistics

Grape Web Statistics is a simple, open-source application geared towards web developers. It has a clean and usable interface and has an Extensions API to extend and customize your installation. It uses PHP for the backend and you can run it on any operating system that runs PHP.

Zend Framework

Zend Framework is an open-source, object-oriented web application framework for PHP 5. Zend Framework is often called a ‘component library’, because it has many loosely coupled components that you can use more or less independently. But Zend Framework also provides an advanced Model-View-Controller (MVC) implementation that can be used to establish a basic structure for your Zend Framework applications. A full list of Zend Framework components along with short descriptions may be found in the components overview. This QuickStart will introduce you to some of Zend Framework’s most commonly used components, including Zend_Controller, Zend_Layout, Zend_Config, Zend_Db, Zend_Db_Table, Zend_Registry, along with a few view helpers.
Using these components, we will build a simple database-driven guest book application within minutes. The complete source code for this application is available in the following archives:

Model-View-Controller

So what exactly is this MVC pattern everyone keeps talking about, and why should you care? MVC is much more than just a three-letter acronym (TLA) that you can whip out anytime you want to sound smart; it has become something of a standard in the design of modern web applications. And for good reason. Most web application code falls under one of the following three categories: presentation, business logic, and data access. The MVC pattern models this separation of concerns well. The end result is that your presentation code can be consolidated in one part of your application with your business logic in another and your data access code in yet another. Many developers have found this well-defined separation indispensable for keeping their code organized, especially when more than one developer is working on the same application.
Let’s break down the pattern and take a look at the individual pieces:
(Figure: the Model-View-Controller pattern.)
  • Model – This is the part of your application that defines its basic functionality behind a set of abstractions. Data access routines and some business logic can be defined in the model.
  • View – Views define exactly what is presented to the user. Usually controllers pass data to each view to render in some format. Views will often collect data from the user, as well. This is where you’re likely to find HTML markup in your MVC applications.
  • Controller – Controllers bind the whole pattern together. They manipulate models, decide which view to display based on the user’s request and other factors, pass along the data that each view will need, or hand off control to another controller entirely. Most MVC experts recommend keeping controllers as skinny as possible; a small controller sketch follows below.
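As a minimal sketch of what a Zend Framework 1 controller looks like, here is a guest-book style action controller; the class and model names follow the quickstart’s guest book conventions but are used here only for illustration:
<?php
// A minimal Zend Framework 1 action controller: it asks the model
// for data and hands it to the view for rendering
class GuestbookController extends Zend_Controller_Action
{
    public function indexAction()
    {
        // model: data access hidden behind a mapper abstraction
        $guestbook = new Application_Model_GuestbookMapper();
        // view: entries will be rendered by the index view script
        $this->view->entries = $guestbook->fetchAll();
    }
}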
source : http://framework.zend.com/manual/en/learning.quickstart.intro.html
download : http://framework.zend.com/download/overview

Web service

A web service is a method of communication between two electronic devices.
The W3C defines a “web service” as “a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically Web Services Description Language WSDL). Other systems interact with the web service in a manner prescribed by its description using SOAP messages, typically conveyed using HTTP with an XML serialization in conjunction with other Web-related standards.”[1]
The W3C also states, “We can identify two major classes of Web services, REST-compliant Web services, in which the primary purpose of the service is to manipulate XML representations of Web resources using a uniform set of “stateless” operations; and arbitrary Web services, in which the service may expose an arbitrary set of operations.”[2]

(Figure: Web services architecture.)

Big web services

“Big web services” use Extensible Markup Language (XML) messages that follow the SOAP standard and have been popular with traditional enterprise. In such systems, there is often a machine-readable description of the operations offered by the service written in the Web Services Description Language (WSDL). The latter is not a requirement of a SOAP endpoint, but it is a prerequisite for automated client-side code generation in many Java and .NET SOAP frameworks (frameworks such as Spring, Apache Axis2 and Apache CXF being notable exceptions). Some industry organizations, such as the WS-I, mandate both SOAP and WSDL in their definition of a web service.

Web API


(Figure: Web services in a service-oriented architecture.)
Web API is a development in web services (in a movement called Web 2.0) where emphasis has been moving away from SOAP based services towards Representational State Transfer (REST) based communications.[3] REST services do not require XML, SOAP, or WSDL service-API definitions.
Web APIs allow the combination of multiple web services into new applications known as mashups.[4]
When used in the context of web development, Web API is typically a defined set of Hypertext Transfer Protocol (HTTP) request messages along with a definition of the structure of response messages, usually expressed in an Extensible Markup Language (XML) or JavaScript Object Notation (JSON) format.
When running composite web services, each sub-service can be considered autonomous. The user has no control over these services, and the web services themselves are not reliable; the service provider may remove, change or update their services without giving notice to users. Reliability and fault tolerance are not well supported; faults may happen during execution. Exception handling in the context of web services is still an open research issue, though it can be handled by responding with an error object to the client.

Remote procedure calls


(Figure: Architectural elements involved in XML-RPC.)
Main article: Remote procedure call
RPC web services present a distributed function (or method) call interface that is familiar to many developers. Typically, the basic unit of RPC web services is the WSDL operation.
The first web services tools were focused on RPC, and as a result this style is widely deployed and supported. However, it is sometimes criticized for not being loosely coupled, because it was often implemented by mapping services directly to language-specific functions or method calls. Many vendors felt this approach to be a dead end, and pushed for RPC to be disallowed in the WS-I Basic Profile.
Other approaches with nearly the same functionality as RPC are the Object Management Group’s (OMG) Common Object Request Broker Architecture (CORBA), Microsoft’s Distributed Component Object Model (DCOM) and Sun Microsystems’ Java Remote Method Invocation (RMI).

Service-oriented architecture

Web services can also be used to implement an architecture according to service-oriented architecture (SOA) concepts, where the basic unit of communication is a message, rather than an operation. This is often referred to as “message-oriented” services.
SOA web services are supported by most major software vendors and industry analysts. Unlike RPC web services, loose coupling is more likely, because the focus is on the “contract” that WSDL provides, rather than the underlying implementation details.
Middleware analysts use enterprise service buses which combine message-oriented processing and web services to create an event-driven SOA. One example of an open-source ESB is Mule, another one is Open ESB.

(Figure: Representation of concepts defined by WSDL 1.1 and WSDL 2.0 documents.)

Representational state transfer (REST)

REST attempts to describe architectures which use HTTP or similar protocols by constraining the interface to a set of well-known, standard operations (like GET, POST, PUT, DELETE for HTTP). Here, the focus is on interacting with stateful resources, rather than messages or operations.
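As a rough illustration in PHP (with a hypothetical book resource; a real RESTful service involves much more, such as URI routing), a server can constrain itself to the standard operations by dispatching on the HTTP method:
<?php
// Minimal sketch of a RESTful endpoint dispatching on the request
// method; the resource and responses here are hypothetical
switch ($_SERVER['REQUEST_METHOD']) {
    case 'GET':    // read the current state of the resource
        header('Content-Type: application/json');
        echo json_encode(array('isbn' => '0385503954', 'price' => 14.65));
        break;
    case 'PUT':    // replace the resource with the request body
        $body = file_get_contents('php://input');
        // ... validate and store $body ...
        break;
    case 'DELETE': // remove the resource
        // ... delete the stored record ...
        break;
    default:
        header('HTTP/1.1 405 Method Not Allowed');
}
?>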
An architecture based on REST (one that is ‘RESTful’) can use WSDL to describe SOAP messaging over HTTP, can be implemented as an abstraction purely on top of SOAP (e.g., WS-Transfer), or can be created without using SOAP at all.
WSDL version 2.0 offers support for binding to all the HTTP request methods (not only GET and POST as in version 1.1) so it enables a better implementation of RESTful Web services.[5] However, support for this specification is still poor in software development kits, which often offer tools only for WSDL 1.1.

Automated design methodologies

Automated tools can aid in the creation of a web service. For services using WSDL it is possible to either automatically generate WSDL for existing classes (a bottom-up strategy) or to generate a class skeleton given existing WSDL (a top-down strategy).
  • A developer using a bottom-up method writes implementing classes first (in some programming language), and then uses a WSDL-generating tool to expose methods from these classes as a web service [1]. This is often the simpler approach.
  • A developer using a top-down method writes the WSDL document first and then uses a code-generating tool to produce the class skeleton, to be completed as necessary. This way is generally considered more difficult but can produce cleaner designs [2].

Criticisms

Critics of non-RESTful web services often complain that they are too complex[6] and based upon large software vendors or integrators, rather than typical open source implementations. There are open source implementations like Apache Axis and Apache CXF.
One key concern of the REST web service developers is that the SOAP WS toolkits make it easy to define new interfaces for remote interaction, often relying on introspection to extract the WSDL, since a minor change on the server (even an upgrade of the SOAP stack) can result in different WSDL and a different service interface.[7] The client-side classes that can be generated from WSDL and XSD descriptions of the service are often similarly tied to a particular version of the SOAP endpoint and can break if the endpoint changes or the client-side SOAP stack is upgraded. Well-designed SOAP endpoints (with handwritten XSD and WSDL) do not suffer from this but there is still the problem that a custom interface for every service requires a custom client for every service.
There are also concerns about performance due to web services’ use of XML as a message format and SOAP/HTTP in enveloping and transport, such as the analysis published by N. A. B. Gray of the University of Wollongong in 2005.[8]

AJAX

Ajax (pronounced /ˈeɪdʒæks/; shorthand for Asynchronous JavaScript and XML)[1] is a group of interrelated web development methods used on the client-side to create interactive web applications. With Ajax, web applications can retrieve data from the server asynchronously in the background without interfering with the display and behavior of the existing page. Data is usually retrieved using the XMLHttpRequest object. Despite the name, the use of XML is not needed, and the requests need not be asynchronous.[2]
Like DHTML and LAMP, Ajax is not one technology, but a group of technologies. Ajax uses a combination of HTML and CSS to mark up and style information. The DOM is accessed with JavaScript to dynamically display, and to allow the user to interact with the information presented. JavaScript and the XMLHttpRequest object provide a method for exchanging data asynchronously between browser and server to avoid full page reloads.

History

In the 1990s, most web sites were based on complete HTML pages; each user action required that the page be re-loaded from the server (or a new page loaded). This process is inefficient, as reflected by the user experience: all page content disappears then reappears, etc. Each time a page is reloaded due to a partial change, all of the content must be re-sent instead of only the changed information. This can place additional load on the server and use excessive bandwidth.
Asynchronous loading of content first became practical when Java applets were introduced in the first version of the Java language in 1995. These allow compiled client-side code to load data asynchronously from the web server after a web page is loaded.[3] In 1996, Internet Explorer introduced the iframe element to HTML, which also enabled asynchronous loading.[4] In 1999, Microsoft created the XMLHTTP ActiveX control in Internet Explorer 5, which was later adopted by Mozilla, Safari, Opera and other browsers as the XMLHttpRequest JavaScript object.[4][5] Microsoft has adopted the native XMLHttpRequest model as of Internet Explorer 7, though the ActiveX version is still supported. The utility of background HTTP requests to the server and asynchronous web technologies remained fairly obscure until it started appearing in full scale online applications such as Outlook Web Access (2000)[6] and Oddpost (2002), and later, Google made a wide deployment of Ajax with Gmail (2004) and Google Maps (2005).[7]
The term Ajax was coined on February 18, 2005 by Jesse James Garrett in an article entitled Ajax: A New Approach to Web Applications.[1]
On April 5, 2006 the World Wide Web Consortium (W3C) released the first draft specification for the XMLHttpRequest object in an attempt to create an official web standard.[7]

Technologies

The term Ajax has come to represent a broad group of web technologies that can be used to implement a web application that communicates with a server in the background, without interfering with the current state of the page. In the article that coined the term Ajax,[1] Jesse James Garrett explained that the following technologies are incorporated:
  • standards-based presentation using XHTML and CSS
  • dynamic display and interaction using the Document Object Model (DOM)
  • data interchange and manipulation using XML and XSLT
  • asynchronous data retrieval using XMLHttpRequest
  • JavaScript binding everything together
Since then, however, there have been a number of developments in the technologies used in an Ajax application, and the definition of the term Ajax. In particular, it has been noted that JavaScript is not the only client-side scripting language that can be used for implementing an Ajax application; other languages such as VBScript are also capable of the required functionality.[2][8] (However, JavaScript is the most popular language for Ajax programming due to its inclusion in and compatibility with the majority of modern web browsers.) Also, XML is not required for data interchange and therefore XSLT is not required for the manipulation of data. JavaScript Object Notation (JSON) is often used as an alternative format for data interchange,[9] although other formats such as preformatted HTML or plain text can also be used.[10]
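On the server side, the PHP end of such an exchange can be as small as emitting JSON for the XMLHttpRequest to consume. A minimal sketch, in which the parameter name and the completion data are made up for illustration:
<?php
// Hypothetical Ajax endpoint: return search completions as JSON
header('Content-Type: application/json');
$term = isset($_GET['term']) ? $_GET['term'] : '';
// in a real application these would come from a database lookup
$completions = array('php', 'php cli', 'php soap');
echo json_encode(array('term' => $term, 'completions' => $completions));
?>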

Drawbacks

  • Pages dynamically created using successive Ajax requests do not automatically register themselves with the browser’s history engine, so clicking the browser’s “back” button may not return the browser to an earlier state of the Ajax-enabled page, but may instead return to the last full page visited before it. Workarounds include the use of invisible iframes to trigger changes in the browser’s history and changing the URL fragment identifier (the part of a URL after the ‘#’) when Ajax is run and monitoring it for changes.[11][12]
  • Dynamic web page updates also make it difficult to bookmark a particular state of the application. Solutions to this problem exist, many of which use the URL fragment identifier (the part of a URL after the ‘#’) to keep track of, and allow returning to, the application in a given state.[11][12]
  • Depending on the nature of the Ajax application, dynamic page updates may interfere disruptively with user interactions, especially if working on an unstable internet connection. For instance, editing a search field may trigger a query to the server for search completions, but the user may not know that a search completion popup is forthcoming, and if the internet connection is slow, the popup list may show up at an inconvenient time, when the user has already proceeded to do something else.
  • Because most web crawlers do not execute JavaScript code,[13] publicly indexable web applications should provide an alternative means of accessing the content that would normally be retrieved with Ajax, thereby allowing search engines to index it.
  • Any user whose browser does not support JavaScript or XMLHttpRequest, or simply has this functionality disabled, will not be able to properly use pages which depend on Ajax. Similarly, devices such as mobile phones, PDAs, and screen readers may not have support for the required technologies. Screen readers that are able to use Ajax may still not be able to properly read the dynamically generated content.[14] The only way to let the user carry out functionality is to fall back to non-JavaScript methods. This can be achieved by making sure links and forms can be resolved properly and do not rely solely on Ajax.[15]
  • The same origin policy prevents some Ajax techniques from being used across domains,[7] although the W3C has a draft of the XMLHttpRequest object that would enable this functionality.[16] Methods exist to sidestep this security feature by using a special Cross Domain Communications channel embedded as an iframe within a page,[17] or by the use of JSONP.
  • Ajax-powered interfaces may dramatically increase the number of user-generated requests to web servers and their back-ends (databases, or other). This can lead to longer response times and/or additional hardware needs.
  • The asynchronous, callback-style programming required can lead to complex code that is hard to maintain or debug.[18]
  • Ajax cannot easily be read by screen-reading technologies, such as JAWS, without hints built into the corresponding HTML based on WAI-ARIA standards. Screen-reading technologies are used by individuals with an impairment that hinders or prevents the ability to read the content on a screen.[19]

PHP CLI

What is PHP CLI? 
PHP CLI is short for PHP Command Line Interface. As the name implies, it is a way of using PHP on the system command line; in other words, it is a way of running PHP scripts that aren’t served by a web server (such as the Apache web server or Microsoft IIS). People usually treat PHP as a web development, server-side tool. However, PHP CLI brings all the advantages of PHP to shell scripting, allowing you to create either server-side supporting scripts or system applications, even with a GUI!
PHP CLI is available on all popular operating systems: Linux, Windows, OS X, Solaris. Popular Linux distributions (such as Ubuntu, Debian, Fedora Core, Suse, etc.) allow you to install PHP CLI from the package manager (e.g. Synaptic or similar) with a couple of mouse clicks. This makes installation hassle-free, and you can start using it within seconds!
The PHP CLI SAPI was first released in PHP 4.2.0 as experimental, but as of PHP 4.3.0 (including PHP 5) it is fully supported and enabled by default. PHP CLI is just a new SAPI (Server Application Programming Interface) type focused on developing shell (or desktop) applications with PHP. It’s worth mentioning that PHP CLI and PHP CGI are different SAPIs, although they do share many of the same behaviours.
If you have a standard installation of PHP for the Apache web server, then there is a very high chance that you already have PHP CLI installed on your system. Your chances are even higher if your system is running Linux. If you are unlucky enough not to have it by default, then you need to recompile PHP with the --enable-cli flag or reinstall from a package that does have it. If you are running Windows, then you probably need to add the PHP executable to your system path.
The simplest PHP CLI script on Linux would look like this: 
#!/usr/bin/php -q
<?php echo "Hello world of PHP CLI!"; ?>
Windows user would need to amend the first line with appropriate windows style path to php.exe:
#!C:\php\php.exe -q
<?php echo "Hello world of PHP CLI!"; ?>
Why use PHP CLI? One would want to use the PHP CLI SAPI simply because there are several advantages to being able to run PHP code from the command line, such as:
  • no need to learn another language such as Perl, Bash or Awk
  • running scheduled (CRON) tasks written in php
  • making GUI applications with PHP and GTK
  • reuse of your existing components
  • write very robust scripts for your system by using PHP5 multithreading capabilities
  • access to the system STDIN, STDOUT and STDERR streams from PHP (see the small example below)
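As a tiny sketch of that last point, a CLI filter script can read the standard streams directly; the uppercasing here is just an arbitrary demonstration:
#!/usr/bin/php -q
<?php
// read lines from STDIN and write them, uppercased, to STDOUT
while (($line = fgets(STDIN)) !== false) {
    fwrite(STDOUT, strtoupper($line));
}
?>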
Command Line Interface and CRON: Running PHP scripts from CRON allows you to perform scheduled tasks; for example, checking whether links on your links page are still alive, running scheduled billing, or maintaining customer accounts (emailing customers that their account has to be renewed in the next X days, backing up files and folders, creating health checkers that ping or retrieve your remote servers or web pages regularly to make sure they are working and email you if they are down or slow, starting other programs), or anything else that you can imagine. CRON is available on Linux and other Unix-like systems; if you are a Windows user, you may use the AT command to schedule execution times. A sample crontab entry is shown below.
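For instance, a crontab entry like the following (the script path and name are hypothetical) would run a PHP billing script every day at 02:30:
30 2 * * * /usr/bin/php -q /home/user/billing.php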
Command Line Interface and GTK: YES! You can build GUI (graphical user interface) applications for your Windows or Linux box with PHP! All you need is the PHP Command Line Interface and GTK in a pack. This will allow you to create really portable GUI applications and won’t require you to learn anything else. Just PHP. GTK is available both on Linux and Windows.
Command Line Interface and Your Components: I bet you have already written good, reusable classes or functions that you use in every project; for example, a database abstraction layer or just a DB connection function, or maybe a logging script or a date calculation class/library. With the PHP Command Line Interface you can reuse them on the command line without reinventing the wheel by redoing the same work in Perl or any other command-line-enabled language.
Command Line Interface and PHP5 Multithreading: If you have a bottleneck in a database or network connection, then you can speed up your script by up to 1000% just by implementing PHP5 “multithreading”. For example, you may spend 10 seconds just to establish an HTTP connection when you fopen() a remote HTTP page and just 1 second to retrieve the content. If you need to fetch 1000 pages one by one, then you will spend 10*1000 + 1*1000 = 11000 seconds (just over 3 hours)! If you run 100 requests in parallel, then you will spend (10*1000 + 1*1000)/100 = 110 seconds (less than 2 minutes!). Obviously, you will need a powerful enough CPU, enough memory and network bandwidth.
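A caveat and a sketch: PHP has no built-in threads, so “multithreading” here usually means running several requests in parallel, for example with the curl_multi family of functions. A minimal sketch, with placeholder URLs:
<?php
// fetch several pages in parallel with curl_multi
$urls = array('http://example.com/a', 'http://example.com/b');
$mh = curl_multi_init();
$handles = array();
foreach ($urls as $url) {
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_multi_add_handle($mh, $ch);
    $handles[$url] = $ch;
}
// drive all transfers until every one has completed
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);
foreach ($handles as $url => $ch) {
    echo $url . ": " . strlen(curl_multi_getcontent($ch)) . " bytes\n";
    curl_multi_remove_handle($mh, $ch);
}
curl_multi_close($mh);
?>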
source : http://www.php-cli.com/

cron

Cron is a time-based job scheduler in Unix-like computer operating systems. The name cron comes from the word “chronos”, Greek for “time”. Cron enables users to schedule jobs (commands or shell scripts) to run periodically at certain times or dates. It is commonly used to automate system maintenance or administration, though its general-purpose nature means that it can be used for other purposes, such as connecting to the Internet and downloading email.

Overview

Cron is driven by a crontab (cron table) file, a configuration file that specifies shell commands to run periodically on a given schedule. The crontab files are where the lists of jobs and other instructions to the cron daemon are kept. Users can have their own individual crontab files, and there is often a system-wide crontab file (usually in /etc or a subdirectory of /etc) that only system administrators can edit.
Each line of a crontab file represents a job and is composed of a CRON expression followed by a shell command to execute. Some implementations of cron, such as the popular fourth BSD edition written by Paul Vixie and included in many Linux distributions, add a sixth field: an account username under which the specified job will be run (subject to user existence and permissions). This is only allowed in the system crontabs, not in the others, which are each assigned to a single user to configure.
For “day of the week” (field 5), both 0 and 7 are considered Sunday, though some versions of Unix, such as AIX, do not list “7” as acceptable in the man page. While normally the job is executed when the time/date specification fields all match the current time and date, there is one exception: if both “day of month” (field 3) and “day of week” (field 5) are restricted (not “*”), the job runs when either of them matches the current day.
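For example, the following entry (command path hypothetical) fires at noon on the 1st of every month and at noon every Monday, because both day of month and day of week are restricted:
0 12 1 * 1  /home/user/report.sh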

Examples

The following will clear the Apache error log at one minute past midnight (00:01 every day, regardless of the day of the month or the day of the week), assuming that the default shell for the cron user is Bourne shell compliant.
1 0 * * *  echo -n "" > /www/apache/logs/error_log
The following would run /home/user/test.pl every two hours, i.e. at midnight, 2am, 4am, 6am and so on:
0 */2 * * *  /home/user/test.pl

Predefined scheduling definitions

There are several special predefined values which can be used in place of a full CRON expression.
Entry                    Description         Equivalent to
@yearly (or @annually)   Run once a year     0 0 1 1 *
@monthly                 Run once a month    0 0 1 * *
@weekly                  Run once a week     0 0 * * 0
@daily (or @midnight)    Run once a day      0 0 * * *
@hourly                  Run once an hour    0 * * * *
@reboot                  Run at startup      (none)
*     *     *     *     *    command to be executed
-     -     -     -     -
|     |     |     |     |
|     |     |     |     +----- day of week (0 - 6) (Sunday = 0)
|     |     |     +----------- month (1 - 12)
|     |     +----------------- day of month (1 - 31)
|     +----------------------- hour (0 - 23)
+----------------------------- minute (0 - 59)
@reboot configures a job to run once when the daemon is started. Since cron is typically never restarted, this usually corresponds to the machine being booted. This behavior is enforced in some variations of cron, such as that provided in Debian[3], so that simply restarting the daemon does not re-run @reboot jobs.
@reboot can be useful if there is a need to start up a server or daemon under a particular user, and the user does not have access to configure init to start the program.
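For example, an entry like this (script path hypothetical) would start a PHP daemon at boot:
@reboot /usr/bin/php /home/user/start_daemon.php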

Cron permissions

The following two files play an important role:
  • /etc/cron.allow – If this file exists, your username must be listed in it in order for you to be allowed to use cron jobs.
  • /etc/cron.deny – If the cron.allow file does not exist but the /etc/cron.deny file does, then you must not be listed in it in order to use cron jobs.
Please note that if neither of these files exists then, depending on site-dependent configuration parameters, either only the superuser will be allowed to use cron jobs, or all users will be able to.
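For example, a /etc/cron.allow containing just the two lines below (usernames hypothetical) would let only alice and bob schedule cron jobs:
alice
bob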

Timezone handling

Most cron implementations simply interpret crontab entries in the system time zone under which the cron daemon itself runs. This can be a source of dispute on a large multiuser machine with users in several time zones, especially if the system default time zone includes potentially confusing DST (daylight saving time). Thus, a cron implementation may special-case any “TZ=<timezone>” environment-variable-setting lines in user crontabs, interpreting subsequent crontab entries relative to that time zone.[4]
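In implementations that support this, a user crontab might look like the following (time zone and command hypothetical), running the job at 09:00 Paris time regardless of the daemon's own time zone:
TZ=Europe/Paris
0 9 * * * /usr/bin/php /home/user/morning_report.php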

History

Early versions

The cron in Version 7 Unix, written by Brian Kernighan, was a system service (later called a daemon) invoked from /etc/inittab when the operating system entered multi-user mode. Its algorithm was straightforward (a sketch follows the list):
  1. Read /usr/etc/crontab
  2. Determine if any commands are to be run at the current date and time and if so, run them as the Superuser, root.
  3. Sleep for one minute
  4. Repeat from step 1.
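Purely for illustration, a PHP rendering of that loop might look like this (the original was of course written in C; this simplified matcher supports only plain numbers and “*” in the five time fields):
<?php
// Illustrative sketch of the Version 7 cron algorithm; not the original code.
function entry_matches(array $fields, array $now) {
    // field order: minute, hour, day of month, month, day of week
    $current = array($now['minutes'], $now['hours'], $now['mday'],
                     $now['mon'], $now['wday']);
    foreach ($fields as $i => $field) {
        if ($field !== '*' && (int)$field !== $current[$i]) {
            return false;
        }
    }
    return true;
}
while (true) {
    // 1. Read the crontab (one "min hour dom mon dow command" entry per line).
    foreach (file('/usr/etc/crontab') as $line) {
        $parts = preg_split('/\s+/', trim($line), 6);
        if (count($parts) < 6) { continue; }
        $command = array_pop($parts);
        // 2. Run every entry whose five fields match the current time.
        if (entry_matches($parts, getdate())) {
            exec($command . ' > /dev/null 2>&1 &'); // run in the background
        }
    }
    sleep(60); // 3. sleep for one minute, then 4. repeat
}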
This version of cron was basic and robust but it also consumed resources whether it found any work to do or not. In an experiment at Purdue University in the late 1970s to extend cron’s service to all 100 users on a time-shared VAX, it was found to place too much load on the system.

Multi-user capability

The next version of cron, with the release of Unix System V, was created to extend the capabilities of cron to all users of a Unix system, not just the superuser. Though this may seem trivial today with most Unix and Unix-like systems having powerful processors and small numbers of users, at the time it required a new approach on a 1 MIPS system having roughly 100 user accounts.
In the August 1977 issue of the Communications of the ACM, W. R. Franta and Kurt Maly published an article entitled “An efficient data structure for the simulation event set”, describing an event queue data structure for discrete event-driven simulation systems that demonstrated “performance superior to that of commonly used simple linked list algorithms”, good behavior given non-uniform time distributions, and worst-case complexity O(√n), where n is the number of events in the queue.
A graduate student, Robert Brown, reviewing this article, recognized the parallel between cron and discrete event simulators, and created an implementation of the Franta-Maly event list manager (ELM) for experimentation. Discrete event simulators run in “virtual time”, peeling events off the event queue as quickly as possible and advancing their notion of “now” to the scheduled time of the next event. By running the event simulator in “real time” instead of virtual time, a version of cron was created that spent most of its time sleeping, waiting for the moment in time when the task at the head of the event list was to be executed.
The following school year brought new students into the graduate program, including Keith Williamson, who joined the systems staff in the Computer Science department. As a “warm up task” Brown asked him to flesh out the prototype cron into a production service, and this multi-user cron went into use at Purdue in late 1979. This version of cron wholly replaced the /etc/cron that was in use on the Computer Science department’s VAX 11/780 running 32/V.
The algorithm used by this cron is as follows (a sketch of the main loop appears after the list):
  1. On start-up, look for a file named .crontab in the home directories of all account holders.
  2. For each crontab file found, determine the next time in the future that each command is to be run.
  3. Place those commands on the Franta-Maly event list with their corresponding time and their “five field” time specifier.
  4. Enter main loop:
    1. Examine the task entry at the head of the queue, compute how far in the future it is to be run.
    2. Sleep for that period of time.
    3. On awakening and after verifying the correct time, execute the task at the head of the queue (in background) with the privileges of the user who created it.
    4. Determine the next time in the future to run this command and place it back on the event list at that time value.
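Purely illustrative, here is a PHP sketch of steps 4.1 to 4.4 of that main loop (next_run_time() is a hypothetical helper, and SplPriorityQueue stands in for the Franta-Maly event list):
<?php
// Illustrative sketch of the event-list main loop; not the original code.
// SplPriorityQueue is a max-heap, so we negate timestamps to get a min-heap;
// next_run_time() is a hypothetical helper that parses the five-field spec.
$queue = new SplPriorityQueue();
// ... initial population from the users' crontab files omitted ...
while (true) {
    $task = $queue->top();                    // 4.1 task at the head of the queue
    $delay = $task['when'] - time();
    if ($delay > 0) {
        sleep($delay);                        // 4.2 sleep until it is due
    }
    if (time() >= $task['when']) {            // 4.3 verify the time, then run it
        $queue->extract();
        exec($task['command'] . ' &');        //     in the background
        $task['when'] = next_run_time($task); // 4.4 reschedule it
        $queue->insert($task, -$task['when']);
    }
}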
Additionally, the daemon would respond to SIGHUP signals to rescan modified crontab files and would schedule special “wake up events” on the hour and half hour to look for modified crontab files. Much detail is omitted here concerning the inaccuracies of computer time-of-day tracking, Unix alarm scheduling, explicit time-of-day changes, and process management, all of which account for the majority of the lines of code in this cron. This cron also captured the output of stdout and stderr and e-mailed any output to the crontab owner.
The resources consumed by this cron scale only with the amount of work it is given and do not inherently increase over time with the exception of periodically checking for changes.
Williamson completed his studies and departed the University with a Master of Science in Computer Science, joined AT&T Bell Labs in Murray Hill, New Jersey, and took this cron with him. At Bell Labs, he and others incorporated the Unix at command into cron, moved the crontab files out of users' home directories (which were not host-specific) and into a common host-specific spool directory, and of necessity added the crontab command to allow users to copy their crontabs to that spool directory.
This version of cron later appeared largely unchanged in Unix System V and in BSD and their derivatives, in the Solaris Operating System from Sun Microsystems, IRIX from Silicon Graphics, HP-UX from Hewlett-Packard, and IBM AIX. Technically, the original license for these implementations should rest with the Purdue Research Foundation, which funded the work, but this took place at a time when little concern was given to such matters.

Modern versions

With the advent of the GNU Project and Linux, new crons appeared. The most prevalent of these is the Vixie cron, originally coded by Paul Vixie in 1987. Version 3 of Vixie cron was released in late 1993. Version 4.1 was renamed to ISC Cron and was released in January 2004. Version 3, with some minor bugfixes, is used in most distributions of Linux and BSDs.
In 2007, Red Hat forked vixie-cron 4.1 into the cronie project, which incorporated anacron 2.3 in 2009.
Other popular implementations include anacron and fcron. However, anacron is not an independent cron program; it relies on another cron program to call it in order to run.
A webcron solution schedules recurring tasks to run wherever a conventional cron implementation is unavailable, such as in a restricted web-hosting environment.
source : http://en.wikipedia.org/wiki/Cron