THE IMPORTANCE OF BUG TESTING - Editorial by dethy
_________________________________

1. Software Development Stages
   i.   What defines alpha?
   ii.  What defines beta?
   iii. What defines stable?
2. Why bug test?
   i.   Importance to Client
   ii.  Importance to Programmer
3. Development Goals
   i.   Software testing vendor's goals
   ii.  Public's goal as bug testers
4. Software Testing Strategies
   i.   Functional Prototypes
   ii.  Designing Test Sets
   iii. Defect Testing
   iv.  Acceptance Testing
   v.   Structural Prototypes
   vi.  Signs to observe
5. Bug Discovery?
   i.   Alerting the Vendor
   ii.  Alerting Clients
6. Final Note
_________________________________

1. Software Development Stages

The following whitepaper discusses 'The Importance of Bug Testing' with respect to client and vendor environments. Various responsibilities fall on either side of product development, and it is necessary to understand the reasons behind practising secure coding and ethical loyalty.

In the real world, systems (hardware or software) often go through two stages of release testing:

 * Alpha (in-house)
 * Beta (external)

What defines alpha?

The term 'alpha', the first letter of the Greek alphabet, was adopted in the early 1960s as computer terminology for a checkpoint in the product cycle. It was first used at IBM and went on to become a standard throughout the computer industry.

Alpha is the first phase of testing of a product/system and is part of the software development process. The alpha stage includes unit testing, component testing, and system testing, although originally it was devoted to feasibility and manufacturability evaluation done before any commitment to design and development.

What defines beta?

The term 'beta', the second letter of the Greek alphabet, was likewise adopted at IBM to categorise product development. In software development, a beta test is the second phase of software testing, in which a sampling of the intended audience tries the product out.
Beta testing can be considered "pre-release testing" of a product/system, in which code flaws and logic errors may still be present throughout the program. In recent years, however, beta versions of software have been distributed to a wide audience on the World Wide Web, partly to give the program a "real-world" test and partly to improve its functionality through clients voicing their approval, dissatisfaction, or comments about the product/system. A product may stay in the beta stage for several years and could be considered stable, but it is the vendor's choice to keep the product in beta until further rare bugs have been ironed out.

An advisory may be released after testing a beta release of a product. As a general rule of thumb, most advisories come from testing more stable applications, but in some cases an advisory for a beta release is necessary. Sometimes it is the only way to get vendors to patch the security risks that a discovered vulnerability involves; moreover, beta releases may stay in that phase for months or years, as discussed above. Releasing advisories for alpha products makes little sense. Alpha releases are known to be highly unstable and should only be run with caution; most bugs are found at this stage, so publicly releasing bug alerts at this point is of little value.

What defines stable?

Stable is the final outcome of the developed product. Once the beta product has been judged fit for the task, with most of the known bugs fixed or patched, and the product fulfils all of its functional requirements, it is termed 'stable'. Product testing has worked out most (if not all) flaws, and as much code optimisation as possible has been implemented in the final application.
This is the accepted version of the product, capable of handling data correctly as needed. This phase aims to satisfy customers/clients in their needs as consumers and users of the product.

2. Why bug test?

It is often debated why programs must be tested at all. If everyone were ethical we might not have to, but the world is not entirely full of ethical people who ensure that only correct data is fed into a system; that is why safe practices need to be developed, and the only way to achieve this is through bug testing. Bug testing matters to two parties, the client and the programmer, each with different needs and wants:

 * Importance to Client
 * Importance to Programmer

From the client's perspective, a stable program that is guaranteed to perform its desired task is a reflection not only of the program but also of the company itself. Poor products reflect badly on the company, which is why a solid, well-tested product needs to be ensured through bug testing before manufacturing takes place.

Management is not always aware of product flaws; company directors assume that every function works smoothly without defects. Experience shows, however, that no product/system can be deemed completely secure without controversy. Bugs will always exist in a program; whether they are found is another question.

On the other hand, bugs and code flaws are much easier to spot in open source software, and active security checks by the public help create a much more stable and operable program. This is one of the reasons why Microsoft (c) products fail consistently when it comes to testing: their products are not open source, so it is much harder to create a secure and flexible program without the aid of the programming community to help optimise the code.
For the client, the purchaser of the software, this is without doubt key to performing daily tasks successfully. If the program were vulnerable to overflows, lacked input checks, or even lacked encryption, it would quickly become known for its instability, and product sales would drop dramatically. Customers would buy alternative products that perform the same task and have been carefully checked by multiple tests, as described in the testing section of this document.

There is a high level of ethics involved when a programmer is contracted to develop a program. The programmer is at the top of the chain of responsibility for testing and coding a proficient software application. He/she is responsible for ensuring that all functions of the program work, and work efficiently; code optimisation should be at its peak, with security functions in check. Better programs are known to have been thoroughly tested, with all sorts of data sets properly dealt with from within the program. Operating systems like Linux are tested every day by programmers and hackers alike. Yes, security problems do exist in this environment, but most have been patched or fixed, pushing Linux towards being one of the most stable systems currently around.

Sloppy programmers will not care about ethics, and will simply code the program to function minimally with all of its client-side requirements implemented. Some programmers deem financial security more important than ethical security, so be careful whom you contract to fulfil your programming requirements.

3. Development Goals

Programmers should adopt goals that ensure software quality assurance, but the customer also has a responsibility to communicate with the programmer once a bug has been found.

Software testing vendor's goals

The primary goal of a programmer is to complete a working program that serves the client-side requirements.
Once this stage has been reached, the more advanced and less well-known methods should be put into practice, adding functionality such as:

 * security features
 * help support
 * contact addresses

Added security features are a must and make code quality evident within a program. The use of secure functions and methodologies/implementations should become apparent at this stage; this is where the gap between sloppy and aware programmers shows. All programs should aim for a high level of code quality by using the secure function calls available in their programming language, which helps create a more reliable and flexible program. Of course, one of the only certain ways to determine a program's reliability is through testing. Testing depends on rapid feedback and on the evolving nature of the program under test, and this is where clients/customers come into the picture.

Public's goal as bug testers

Although programmers bear the most responsibility for code reliability, clients and customers alike need to be prepared to communicate with software engineers if a bug or flaw is observed in a program. If the observed output differs from the expected output, it is time to get in contact by means of a bug discussion list, e-mail, phone, or whatever else is available, but be sure to advise the correct people. If the bug could lead to increased privileges, it is especially important to inform the product vendors before the public knows about it. This gives the vendors time to write patches/advisories for their clients before the flaw can be used to cause harm with their products.

Testing software is always a step in the right direction. Effective bug testing by customers/clients will force programmers to improve code quality and security in future products; that is why we should appreciate and thank the software task forces out there that make software vulnerabilities public. One such bug advocacy forum is BUGTRAQ, http://www.securityfocus.com.
When reporting a bug, always make sure you can reproduce it, and always include a detailed description of *exactly* how the bug was found and the type of system on which you tested the software application. The more information the better, but be sure not to obscure or obfuscate the description; get down as many basic facts as possible. In particular, segmentation faults generally cause core dumps (a memory image of the process, written when it is terminated by any of a variety of errors), which hold vast amounts of information to help the programmer locate where the bug occurred. Remember: full disclosure is bliss.

4. Software Testing Strategies

Developing a program or system effectively needs to be thoroughly thought out before any code is actually written. One of the most important methods of establishing functional requirements is a prototype such as a storyboard: a sequence of screens showing the end user a typical scenario of using the program/system.

Functional prototypes

This is one of the most useful methods for making sure the programmer understands just what a program is intended to do. A functional prototype is a very limited version of the final program; it gives some idea of the appearance of the final product, but with a lot of functions missing. Showing a simple storyboard to a client or bug tester is worthwhile, as they will be able to comment on whether the observed output from running the program matches the expected output. This also forces the programmer to think through many of the details of what the program is meant to do.

Designing Test Sets

Creating workable and effective sets of tests is intellectually challenging. Testing can almost never be exhaustive, and some programming flaws may remain undetected even after very stringent testing.
In the real 'commercial' world, a significant source of program defects is people running tests without checking the results carefully: programmers actually run the tests but do not take enough care in reviewing the results to notice that the tests revealed unexpected flaws. Tests must be convincing and must demonstrate that the program performs successfully. In a commercial setting, many methodologies are used to produce a designed set of tests.

One of the first things to test is the main function of the program. This means deciding on a set of tests that enables you (the programmer) to see whether the code achieves its desired outcome. All conditions in the program need to be checked, such as:

 * case statements, loops, and if-then-else structures
 * boundary conditions [Ex. for the pseudocode IF $i<100 THEN .., make sure that the values 99, 100, and 101 for $i are properly dealt with]
 * all parts of the code must be exercised [Ex. by designing a rigorous set of tests]

Inputs that the program should treat in the same way can be grouped into classes, a technique known as 'equivalence partitioning'; testing one representative from each class, rather than every possible value, is standard economical practice. Part of the code may work in one scenario but not another, however, so the choice of classes needs to be checked carefully.

The first thing a programmer needs to understand is that testing can demonstrate the presence of bugs, but not their absence. Semantic errors fall into this category, that is, errors in the logic of the program that the compiler or interpreter cannot help you with.

Testing falls into two broad categories:

 * defect testing
 * acceptance testing

Defect Testing

This type of testing tries to detect all the defects the program may have.
All parts of the program should be tested, and if the programmer feels that one part of the code may not properly deal with unexpected input, more rigorous tests should be performed on that area. One key point to remember is that "nobody knows a program better than the programmer himself": the programmer will know which areas of the program are most likely to be defective, so a designed set of tests should be run on them before a beta release is produced.

Stemming from defect testing is 'regression testing'. Regression testing is the process of testing changes to a program to make sure that the existing functionality still works with the newly implemented changes. Regression testing is a normal part of the program development process and, in the commercial world, is performed by code testing specialists. Test department coders develop test scenarios and exercises that test new units of code after they have been written. These test cases form what becomes the test bucket. Before a new version of a software product is released, the old test cases are run against the new version to make sure that all the old capabilities still work. They might not, because changing or adding code to a program can easily introduce errors into code that was not intended to change. Repeated regression testing is a must!

Acceptance Testing

Alongside defect testing is acceptance testing. This means running an agreed set of tests with agreed expected outputs. These tests should demonstrate that the code performs an agreed task well enough for both the programmer and the client to be convinced. In the commercial world, acceptance tests are part of the contract, defining what the customer insists on before any payment for the software is made.
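The boundary and equivalence-class checks described under Designing Test Sets, together with the idea of a regression test bucket, can be sketched in C. This is a minimal illustration, not a prescribed framework: `within_limit()` is a hypothetical stand-in for the pseudocode `IF $i<100 THEN ..`, and the class representatives (-5, 50, 1000) are arbitrary choices.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical unit under test: the pseudocode "IF $i<100 THEN ..". */
int within_limit(int i)
{
    return i < 100;
}

/* One entry in the test bucket: an input and the output we expect. */
struct test_case {
    const char *name;
    int input;
    int expected;
};

/* Boundary values on both sides of the 100 "wall", plus one
 * representative from each (assumed) equivalence class of inputs. */
static const struct test_case test_bucket[] = {
    { "just below boundary",  99,   1 },
    { "on boundary",          100,  0 },
    { "just above boundary",  101,  0 },
    { "negative class",       -5,   1 },
    { "typical class",        50,   1 },
    { "large class",          1000, 0 },
};

/* Run every case, compare the actual result with the expected one,
 * and report each mismatch instead of silently ignoring it.
 * Returns the number of failures (0 means the bucket passed). */
int run_test_bucket(void)
{
    size_t n = sizeof test_bucket / sizeof test_bucket[0];
    int failures = 0;

    for (size_t k = 0; k < n; k++) {
        int actual = within_limit(test_bucket[k].input);
        if (actual != test_bucket[k].expected) {
            fprintf(stderr, "FAIL %s: input %d, expected %d, got %d\n",
                    test_bucket[k].name, test_bucket[k].input,
                    test_bucket[k].expected, actual);
            failures++;
        }
    }
    return failures;
}
```

A test driver could call `assert(run_test_bucket() == 0);` after every build. Keeping the cases in one table makes it easy to add a new case whenever a bug is found, so the same bucket doubles as a simple regression suite.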
Structural Prototyping

Prototyping of this kind is relatively simple. A structural prototype is a stripped-down version of a program that shows, in skeleton form, the structure of the complete version. All major parts of the code are written, but routines and subprograms are written only as stubs, that is, comments/statements within the program that show the programmer that the actual routine has been called or executed.

Code that is easy to maintain, easy for the programmer and other developers to interpret, and easy to extend has three classic characteristics:

 * understandability
 * adaptability
 * cohesion

Understandability means that programs that are easier to understand are considered better designed than ones that do the same task but are harder to understand. A key to developing stable code is a good functional prototype that allows the general idea of the program to be observed before coding begins. It is also worth noting that better code is clear and neatly presented, that is, spaced out where necessary, with comments throughout the program to let the reader understand what is going on internally.

Adaptability means how easy it is to modify areas of the code to perform alternate tasks. This is directly linked to understandability: the better the code is understood, the easier it is to adapt.

Cohesion means that a routine or subprogram does one clear task, apparent to the reader and the programmer. A well-defined task should give a clear indication of what the program is intended to do; this includes well-chosen names for variables, constants, headers, etc. As small as this concept may seem, it allows any coder to pick up the source, quickly scan through it, and understand what the program is about.
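A structural prototype in C might look like the following sketch. The program (a tiny command loop for a hypothetical mail daemon), the routine names `read_command()`, `handle_command()` and `close_session()`, and the `stub_calls` counter are all invented for illustration; each stub only announces that it was called, as described above.

```c
#include <assert.h>
#include <stdio.h>

/* Structural prototype of a hypothetical mail-handling program.
 * The control flow is real; every routine is a stub that only
 * announces it was called and returns a placeholder result. */

static int stub_calls = 0;   /* lets a test confirm the skeleton ran */

int read_command(char *buf, size_t size)
{
    assert(buf != NULL && size > 0);
    /* STUB: the real routine would read a line from the client. */
    printf("stub: read_command() called\n");
    stub_calls++;
    snprintf(buf, size, "QUIT");   /* placeholder command */
    return 1;                      /* 1 = a command was read */
}

int handle_command(const char *cmd)
{
    /* STUB: the real routine would parse and dispatch the command. */
    printf("stub: handle_command(\"%s\") called\n", cmd);
    stub_calls++;
    return 0;                      /* 0 = end the session */
}

void close_session(void)
{
    /* STUB: the real routine would release connection resources. */
    printf("stub: close_session() called\n");
    stub_calls++;
}

/* The skeleton itself: the main structure of the program, with every
 * routine in place even though none is implemented yet.
 * Returns how many stubs were invoked. */
int run_session(void)
{
    char line[512];
    int keep_going = 1;

    while (keep_going && read_command(line, sizeof line))
        keep_going = handle_command(line);

    close_session();
    return stub_calls;
}
```

Calling `run_session()` once prints one line per stub and returns 3, showing that the skeleton's control flow reaches every routine. Each stub can later be replaced by a real implementation without changing the overall structure.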
Signs to observe

Whether you are checking the source for bugs or testing the binary/executable for the presence of flaws, all of the above tests need to be considered and exercised. Bugs most commonly present themselves at boundary conditions. When designing a set of tests, it cannot be stressed enough that boundaries need to be checked on either side of their 'walls'.

Another class of flaw that should be checked before a beta release of a product is the recent problem of format string bugs, where user-supplied input containing format specifiers such as %s is passed directly as the format argument of a function like printf(). The programmer must use capable input routines/parameters to deal correctly with user-supplied input, ensuring that all possible scenarios have been considered before adopting the most suitable code to perform the given command. This includes the choice of functions themselves: avoid unchecked use of getenv(), strcpy(), and sprintf() wherever possible, in favour of more secure alternatives like strncpy() or snprintf(); the 'n' refers to the maximum number of bytes allowed to be copied into a buffer.

Avoid the common mistakes sloppy programmers make when taking user-supplied environment variables from the terminal or environment. Establish your own method of setting or checking the environment that is not susceptible to malformed data, which could otherwise lead to unexpected outcomes such as spawning a shell: a definite security risk, and one that is often observed in many UNIX environments. (Early versions of ZGV [a console graphics viewer] were victims of getenv("HOME") problems of this nature.)

Acceptance testing can also be turned into a way to expose flaws: supply the proper kind of data to a particular input command, but in excessive quantity. An acceptance test that sends 256 bytes to a 512-byte buffer will pass, but sending 1024 bytes to the same buffer will cause an overflow.
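The advice above can be illustrated with a short C sketch. The helper names `bounded_copy()`, `log_user_text()` and `safe_home()` are hypothetical. The sketch uses snprintf() rather than strncpy(), because strncpy() does not NUL-terminate the destination when the source is too long; it also rejects over-long input instead of silently truncating it, which is one design choice, not the only option.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Copy src into dst (capacity dstsize bytes) without overflowing it.
 * snprintf() always NUL-terminates when dstsize > 0 and returns the
 * length it wanted to write, so a result >= dstsize means the input
 * was too long. Returns 0 on success, -1 if the input did not fit. */
int bounded_copy(char *dst, size_t dstsize, const char *src)
{
    assert(dst != NULL && src != NULL && dstsize > 0);
    int n = snprintf(dst, dstsize, "%s", src);
    if (n < 0 || (size_t)n >= dstsize)
        return -1;   /* too long: reject rather than silently truncate */
    return 0;
}

/* Print user-supplied text safely. Passing the text itself as the
 * format (printf(msg)) is a format string bug: any %s or %n inside
 * msg would be interpreted by printf. Use a fixed format instead. */
void log_user_text(const char *msg)
{
    printf("%s\n", msg);   /* never printf(msg) */
}

/* Fetch $HOME into a caller-supplied buffer, treating the environment
 * as untrusted input: a missing, empty or over-long value is rejected
 * instead of being copied blindly with strcpy(). */
int safe_home(char *dst, size_t dstsize)
{
    const char *home = getenv("HOME");
    if (home == NULL || home[0] == '\0')
        return -1;
    return bounded_copy(dst, dstsize, home);
}
```

With a 512-byte buffer, `bounded_copy()` accepts the 256-byte acceptance-test input from the example above and rejects the 1024-byte input instead of overflowing the buffer.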
Sometimes, when a program appears to have slowed down in terms of speed or data processing, the cause may be a heap or stack overflow triggered by corrupted data being entered. It is at this stage that vital tests need to be conducted by the bug tester for the presence of bugs.

Let's take a real-life example of a program in which I found a flaw not long ago: the WinSMTPD mailer/pop3d daemon, versions 1.06f and 2.X.

After acceptance testing this program, everything worked well. All the desired tasks of the program were fulfilled, and the SMTP/POP3 servers performed their tasks efficiently. Now here is where defect testing comes into play.

To start an SMTP transaction, the client first sends a 'HELO %s' command, where "%s" stands for the client's hostname. WinSMTPD only allows a fixed buffer of 170 bytes before the expected output becomes unexpected. Sending 150 bytes after the HELO command made the program pause noticeably before continuing to function as normal. This suggests one of two possibilities:

 1. The program has been coded poorly in terms of speed, OR
 2. The program does not handle boundary cases where excessive data is entered.

As it turns out, WinSMTPD was vulnerable to a stack overflow when 170+ bytes were sent to the HELO field. The unexpected output from the program was:

    WINSMTP caused a general protection fault
    in module WINSMTP.EXE at 0003:00002359.
    Registers:
    EAX=461e0001 CS=42e7 EIP=00002359 EFLGS=00000246
    EBX=00807fe0 SS=4207 ESP=00007e36 EBP=00004141
    ECX=00010283 DS=4207 ESI=0000544c FS=05c7
    EDX=58600000 ES=461e EDI=00001547 GS=0000
    Bytes at CS:EIP:
    cb 49 73 49 63 6f 6e 69 63 00 00 58 4c 6f 63 00
    Stack dump:
    41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141
    41414141 41414141 41414141 41414141 41414141 41414141 41414141 41414141

Obviously this isn't what the programmer had in mind for an SMTP transaction.
The 41414141 that appears on the stack is the hexadecimal value of the ASCII characters "AAAA" (0x41 is 'A'), which I had filled the buffer with. From this general protection fault, we as bug testers and programmers can ascertain that in this 16-bit program (judging by the leading zeros in the registers) the input has overwritten the base pointer (EBP=00004141), with the saved instruction pointer (EIP) lying just beyond it. As ethical programmers/bug testers, that is all we need to know to fix or patch this bug. An unethical hacker, however, could load the stack with malicious data and effectively have arbitrary code executed from the stack, and anything is possible from there. This is why it is important to test for bugs, and especially to check the boundaries and the data that the client/user is allowed to input.

Although I approve of people writing 'proof of concept' exploits to expose the existence of a bug in a program, as I am a firm believer in full disclosure and an advocate of open source, it is neither ethical nor advisable to run these scripts without the direct consent of the people whose systems you are exploiting. (POC exploits are necessary in whitehat security firms to prove and demonstrate a code flaw.)

Data sets and tests applied to a program/system effectively become system calls executed by active processes. These include different kinds of programs (e.g. programs that run as daemons and those that do not), programs that vary widely in size and complexity, and programs with different purposes. Spawns or fork()s by applications should also be tested for the case where the maximum process limit has been exhausted by various resource-depleting exploits; this too needs to be prepared for when writing a heavily used program. Normal data for such tests can be "synthetic" or "live".
Synthetic traces are collected in production environments by running a prepared script, often called a driver program; the program options are chosen solely for the purpose of exercising the program (acceptance testing), and not to meet any real user's requests. Live traces of normal data are collected during normal usage of a production computer system (covering the manual specifics of code testing, such as boundary testing). Both methods are commonly used when testing software applications en masse.

5. Bug Discovery?

So, you think you've found a bug? Then read on; here's what to do next.

Alerting the vendor

If the client or user has stumbled on a logic error or security vulnerability in the tested (beta/stable) product, it is necessary to report the bug to the vendor immediately. This was discussed in the 'Development Goals' section, but what a practical report looks like was not. The bug report should include most, if not all, of the following information, generally in brief form:

 * bug synopsis (a brief paragraph explaining the vulnerability)
 * description (the sequential steps taken to reproduce the bug)
 * attachments (any relevant materials, such as core dumps and message logs)
 * environment (system specifications and conditions used to test the bug)
 * contact info (how the vendor can contact you with further comments/queries)

Alerting clients

If the vendor accepts the reported bug as a risk or vulnerability that could lead to such things as network/software penetration, increased privileges, or excessive system resource usage, the vendor should then issue its own advisory publicly, through mailing lists, the vendor's website, and/or e-mail. It is then the responsibility of the programmer/manufacturer to provide reliable advice so that clients can patch their software/system and the vulnerability no longer exists.
After such an event has occurred, the advisory should include the following information:

 * Date (date of advisory release)
 * Affected systems (a list of the environments/settings in which the bug may occur)
 * Description (similar to the client's description, but with more technical inside information)
 * Patch (URL of the patch, or a description of how to correct the bug)
 * Contact (how clients can contact the vendor for more information: phone, e-mail, URL)

This line of communication creates a much friendlier atmosphere between users and vendors, which in turn helps software development become a more stable and reliable community, one that excels in safe security practices.

6. Final Note

I made a generic resource kit named reskit.tgz earlier this year. It consists of 7 skeletal template scripts written in Perl for various kinds of testing of network services in a Linux/Unix environment, such as malformed HTTP 'GET' requests, multiple-thread connections, random data streaming, an ICMP error generator, etc. It is mainly intended as a research and development kit to help spot bugs more easily, particularly in server/router applications/software; feel free to expand on the scripts. They can be downloaded in tarball form from: http://dethy.synnergy.net/reskit.tar

Comments

Main editorial by dethy [ dethy@synnergy.net | www.synnergy.net ]