How to develop WhatWeb 0.4 plugins
----------------------------------

by Andrew Horton aka urbanadventurer.
MorningStar Security http://www.morningstarsecurity.com/
Revision 1.1, 29th March 2010.


Contents
=================================================

1. Introduction to WhatWeb
2. Introduction to WhatWeb plugins
     General aims of a plugin
     Methods to identify systems
     Important files and folders
     Anatomy of a plugin
3. Research background information
4. Collect samples
     Website Showcases
     Using Search Engines
     Forums for website development with the cms
5. Analyze samples
     Read the source of a couple of samples
     Collect HTML and HTTP headers from samples
     Remove incorrectly identified samples
     Examine the samples with WhatWeb
     Remove more incorrectly identified samples with the whatweb report
     Use find-common-stuff to automatically identify common strings in the samples
     Analyse HTTP headers and cookies
     Read more HTML source
6. Review of unique patterns identified
7. Write the plugin
8. Closing notes
9. Resources


1. Introduction to WhatWeb
=================================================

WhatWeb lets you identify content management systems (CMS), blogging platforms, stats/analytics packages, JavaScript libraries, servers and more. When you visit a website in your browser, the transaction includes many unseen hints about how the webserver is set up and what software is delivering the webpage. Some of these hints are obvious, e.g. "Powered by XYZ", and others are more subtle. WhatWeb recognises these hints and reports what it finds.

WhatWeb has many plugins and needs community support to develop more. Plugins can identify systems that have had their obvious identifying hints removed, because they also look for subtle clues. For example, a WordPress site might remove its meta generator tag, but the WordPress plugin also looks for "wp-content", which is less easy to disguise. Plugins are flexible and can return any datatype; for example, plugins can return version numbers, email addresses, account IDs and more.

There are both passive and aggressive plugins. Passive plugins use information on the page, in cookies and in the URL to identify the system. A passive request is as lightweight as a simple GET / HTTP/1.1 request, so it is suitable for large scale scanning of websites. Aggressive plugins guess URLs and request more files.


2. Introduction to WhatWeb Plugins
=================================================

Plugins are easy to write. You don't need to know Ruby to make them, but it helps.


General aims of a plugin
------------------------

Most plugins have a primary aim, which is to identify a type of system based on signatures. The system could be a:

* Content Management System
* JavaScript Library
* HTTP Server
* Application Framework

Some plugins do not aim to identify a specific type of system. Instead they give information that can be used to identify unanticipated systems, or that applies to all types of websites. These plugins are:

* Title
* MD5 hash
* Meta generator tag name
* Uncommon HTTP headers


Methods to identify systems
---------------------------

There are 4 main methods to identify a CMS or web application. They are:

1. Matching patterns in the HTTP headers and HTML of a simple webpage request
2. Testing for URLs and identifying patterns in the HTML
3. Testing for URLs and recognising the MD5 hash of the HTML
4. Testing for URLs and simply noting that they exist or return an HTTP status 200 code

WhatWeb supports all 4 methods, but the 1st method is the most useful in large scale scanning. It is also the most efficient: it relies on knowledge of signatures rather than on extra network bandwidth and time. Support for the first method is the most developed within WhatWeb and is discussed in detail in this document. Future development of WhatWeb will add more user friendly support for methods 2 through 4, which come under the purview of aggressive plugins.

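
The first method can be pictured with a minimal Ruby sketch. It assumes the response to a single request has already been fetched and is held in a hash. The `passive_matches` helper, the signature hashes and the sample response are hypothetical illustrations, not WhatWeb's own API; only the "wp-content" clue is taken from the introduction.

```ruby
# Hypothetical sketch of method 1: match patterns against one response.
# The names below are illustrative and are not WhatWeb's real API.

# A canned response standing in for a single "GET /" request.
response = {
  headers: { "server" => "Apache", "x-powered-by" => "PHP/5.2.6" },
  body: '<link rel="stylesheet" href="/wp-content/themes/default/style.css">'
}

# Each signature says where to look and which pattern to look for.
signatures = [
  { name: "wp-content path", where: :body,    regexp: /wp-content/ },
  { name: "generator tag",   where: :body,    regexp: /<meta name="generator" content="WordPress/ },
  { name: "PHP header",      where: :headers, regexp: /^x-powered-by: PHP/i }
]

# Return the names of all signatures that match the response.
def passive_matches(response, signatures)
  # Flatten the headers into "name: value" lines so a regexp can search them.
  header_text = response[:headers].map { |k, v| "#{k}: #{v}" }.join("\n")
  signatures.select { |sig|
    text = sig[:where] == :headers ? header_text : response[:body]
    text =~ sig[:regexp]
  }.map { |sig| sig[:name] }
end

matched = passive_matches(response, signatures)
puts matched.inspect
```

Every pattern is checked against the one response already in hand, so adding more signatures costs no extra requests. The sample page has no generator tag, yet it is still matched through the "wp-content" path.
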
Important files and folders
---------------------------

The folders that matter for plugins are:

* disabled-plugins/
* plugin-development/
* plugin-development/tests/
* plugins/

All .rb files in the plugins/ folder are loaded by WhatWeb. To disable a plugin, move it into the disabled-plugins/ folder.

The plugin-development folder contains some tools that are useful in developing plugins. The tools are:

* find-common-stuff - This searches for common strings among a set of HTML files
* wget-list - This downloads a list of example websites

The plugin-development/tests folder contains example webpages of CMSs to study. The wget-list tool will create two files for each example webpage: a .html file and a .meta file.


Anatomy of a plugin
-------------------

This is a typical plugin. It identifies the Drupal framework. It has been split into sections and given line numbers so that each part can be discussed.

->-----------------------------------------------------------------------------------------------------------
 1 Plugin.define "Drupal" do
 2 author "Andrew Horton"
 3 version "0.1"
 4 description "Drupal is an opensource CMS written in PHP. Homepage: http://www.drupal.org"
-<-----------------------------------------------------------------------------------------------------------

Line 1 has the name. This name can be referred to on the commandline in a case insensitive way. For example, the following works:

$ ./whatweb -pdrupal www.example.com

Line 2 has the author. Just fill in your name between the double quotes.

Line 3 contains the version number. It's up to you what number to choose.

Line 4 contains the description. This should describe what the plugin identifies in a way that anyone can understand. It can span many lines but must start and end with double quotes.

Note that the author, version and description follow the format:

    field-name field-content

On the left is the name of the variable and on the right, separated by a space, is the value.
This type of declaration isn't an ordinary Ruby variable assignment. It's specific to the plugins and only works for certain variable names. The variable names that can be declared in a plugin in this manner are:

* author
* version
* description
* examples
* matches

->-----------------------------------------------------------------------------------------------------------
 5 # hard to identify
 6 #Powered by Drupal, an open source content management system
 7 #
 8 #
 9 # @import "/misc/drupal.css";
10 # Set-Cookie: SESS6bdd09d4debccdc3a0f49becc449e8d5=2sq674vjn6vig48e3podh3j8e2; expires=Fri, 11 Dec 2009 15:37:52 GMT; path=/; domain=.moby.com
11 # Set-Cookie: SESS9795bcd4ea70e3f846e84f29f9491636=57eafcca6400d894772a136fb5889b92; expires=Fri, 11-Dec-2009 15:38:25 GMT; path=/; domain=.save-your-future.com
12
13
14 examples %w| amnesty.org/ appel.nasa.gov/ beta.worldbank.org/ entergy.pewclimate.org/ labs.divx.com/ lindenlab.com/ littlestarprints.com moby.com/ myplay.com/ sequelnaturals.com/ teen.secondlife.com/ www.artwaves.de www.asys.com.br/ www.atomicbop.net www.cristal.com.pe/?adulto=si www.dutchbutnotfromholland.eu/ www.elespectador.com/ www.ensembles.com.ph/ www.foxsearchlight.com/index.php www.freshbrain.org/ www.icsalabs.com/ www.johnnycashonline.com/ www.journalismcenter.org/ www.jovenscriativos.com.br/ www.koalafoundation.org.au/ www.la2day.com/ www.moove.be www.mtv.co.uk/channel/flux www.mulinobianco.it/ www.multiways.com/ www.nowpublic.com/ www.pravda.lt/ www.realismssoftware.com/ www.save-your-future.com www.shock.com.co/ www.sosojuicy.com/ www.spreadfirefox.com/ www.tidningenresultat.se www.ubuntu.com/ www.universitytowers.net/ www.warnerbrosrecords.com |
15
-<-----------------------------------------------------------------------------------------------------------

Lines 5 through 11 are comments. Each comment line must begin with a # character, which is the standard Ruby way to comment code.

Line 14 is a list of example websites.
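
In Ruby terms, the list on line 14 is an array of strings. The sketch below shows what a shortened version of it evaluates to, and one way a missing scheme could be filled in with http://. The `with_scheme` helper and the https://www.example.com/ entry are hypothetical illustrations and are not WhatWeb's own code.

```ruby
# A shortened example list in the same style as line 14.
# %w| ... | builds an array of strings split on whitespace.
examples = %w| moby.com/ www.ubuntu.com/ https://www.example.com/ |

# Hypothetical helper (not part of WhatWeb): prepend http:// when the
# entry does not already start with http:// or https://.
def with_scheme(url)
  url =~ %r{\Ahttps?://} ? url : "http://#{url}"
end

targets = examples.map { |url| with_scheme(url) }
puts targets
```
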
The %w| prefix before the examples means an array of elements separated by whitespace. The individual examples are URLs. If they are missing the http:// or https:// then http:// is assumed. If you prefer, you can list the examples like this:

    examples %w|
    http://www.example.com
    http://www.example2.com
    http://www.site.com/blah/
    |

->-----------------------------------------------------------------------------------------------------------
16 matches [
17 {:name=>"/misc/drupal.js",
18 :probability=>100,
19 :regexp=>/