23. dec. 2004

ONJava.com: The Hidden Gems of Jakarta Commons

The Hidden Gems of Jakarta Commons
If you are not familiar with the Jakarta Commons, you have likely reinvented a few wheels.

Before you write any more generic frameworks or utilities, grok the Commons. It will save you serious time. Too many people write a StringUtils class that duplicates methods already available in the Commons.
This article gives some examples and tips.

21. dec. 2004

TheServerSide.com - Struts Live Chapter: Nested POJOs

The new chapter will include a solution to one of the most frequently heard complaints about Struts: that it doesn't provide a simple way to meaningfully nest POJO graphs in ActionForms. The provided example code solves this problem and includes an integrated solution for automating conversion, formatting, and validation.

Read the article in PDF here: Struts Live Chapter: Nested POJOs.

20. dec. 2004

Yet Another Servlet Filter You Can't Live Without - Cache control

The Cubicle Jockey: Yet Another Servlet Filter You Can't Live Without
At work, we were looking for ways to reduce bandwidth usage on our secure applications. We already have the compression filter, various object caches, etc-- everything you read about. Then we started looking into caveats with using SSL and found something surprising-- HTTP 1.1 SSL usage prevents ANY content from being cached on the user's drive. "So you mean those massive JavaScript and Stylesheet files are getting pulled down with each secure page?" Yes, images too-- multiple times on a single page even.

The solution? Yet another Servlet Filter you can't live without that will be bound to your static content such as Javascript, CSS, and image files. The code for the filter is only a couple lines, but the goal is to write out the 'Cache-Control' header and the 'Expires' header.

long durationSeconds = 28800; // 8 hours
response.setDateHeader("Expires",
    System.currentTimeMillis()
    + (durationSeconds * 1000));
response.setHeader("Cache-Control",
    "public, max-age=" + durationSeconds);
chain.doFilter(request, response);
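To bind such a filter to static content, a web.xml mapping along these lines could be used (the filter class name and URL patterns here are illustrative, not from the original post):

```xml
<filter>
    <filter-name>cacheControlFilter</filter-name>
    <filter-class>com.example.CacheControlFilter</filter-class>
</filter>
<filter-mapping>
    <filter-name>cacheControlFilter</filter-name>
    <url-pattern>*.js</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>cacheControlFilter</filter-name>
    <url-pattern>*.css</url-pattern>
</filter-mapping>
<filter-mapping>
    <filter-name>cacheControlFilter</filter-name>
    <url-pattern>*.gif</url-pattern>
</filter-mapping>
```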


I couldn't believe how much bandwidth we are now saving with our SSL applications.

Read more here.

Write custom appenders for log4j

Write custom appenders for log4j.
Extend log4j to support lightweight over-the-network logging.

The Apache Software Foundation's log4j logging library is one of the better logging systems around. It's both easier to use and more flexible than Java's built-in logging system. This article shows you how to extend log4j with a custom "appender," the part of the system that actually writes the logging messages. This article also provides a simple example of Java's socket APIs and multithreading capabilities.
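The article targets log4j, which is not part of the JDK; as a dependency-free sketch of the same idea, here is a custom java.util.logging Handler, the standard library's analogue of a log4j appender (a log4j version would extend AppenderSkeleton and override append() instead). The class and logger names are made up:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

// A minimal custom Handler: java.util.logging's analogue of a log4j appender.
// This one just collects formatted messages in memory; a networked version
// would write them to a socket instead.
public class MemoryHandlerDemo extends Handler {
    private final List<String> messages = new ArrayList<String>();

    public void publish(LogRecord record) {   // called for every log event
        if (isLoggable(record)) {
            messages.add(record.getLevel() + ": " + record.getMessage());
        }
    }
    public void flush() {}                    // nothing buffered externally
    public void close() {}                    // no resources to release

    public List<String> getMessages() { return messages; }

    public static List<String> demo() {
        Logger logger = Logger.getLogger("demo");
        logger.setUseParentHandlers(false);   // don't also log to the console
        MemoryHandlerDemo handler = new MemoryHandlerDemo();
        logger.addHandler(handler);
        logger.info("hello appender");
        return handler.getMessages();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```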

1. dec. 2004

Unit Testing with Mock Objects : What are Mock Objects?

Unit Testing with Mock Objects : Shine Technologies.
This page and its children discuss unit testing with the Mock Objects pattern. In particular, they cover what Mock Objects are, what tools the author (Ben Teese) has used, and important lessons he has learned along the way.


The pages cover:

  • What Are Mock Objects?
  • DynaMock - A Mock Object Testing Framework
  • Experiences With Mock Objects
  • Conclusions on Unit Testing with Mock Objects

26. nov. 2004

Usable GUI Design: A Quick Guide

In Usable GUI Design: A Quick Guide, the author presents five points for writing good GUIs.
It's supplemented with good examples.

The points:

  1. The user is not using your application

     The most basic point in all computer UI design is that the user does not want to use your application. They want to get their work done as quickly and easily as possible, and the application is simply a tool aiding that.

  2. Fitts's Law

     This is the most basic and well known of UI design laws. It states that the larger and nearer to the mouse pointer an on-screen object is, the easier it is to click on. That's common sense, yet it is often completely ignored in UI design.

  3. Unnecessary interference

     When a user is working, their attention is on the work they are doing. Every time they have to move their attention away from their work to the application, it takes time for them to get back to where they were in their work. Therefore, you should minimise the amount of distraction and interference your application gives the user. Every application has an item that is its key focus (in a text editor, it's the text; in a web browser, it's the web page), so make that central to your interface.

  4. Use the power of the computer

     Computers are powerful things these days, with billions of processor cycles per second and hundreds of gigabytes of storage available. Humans, however, haven't changed that much in hundreds of years. We still get tired, bored or distracted and have a limited amount of mental energy available at any one time. It would seem a good idea, therefore, to shift as much work as possible off the poor, worn out human and on to the untiring, super fast computer in front of them.

  5. Make items easy to distinguish and find

     This point is pretty simple: items on the screen that do different things should be easy to see and differentiate from each other.
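Fitts's law can even be quantified: one common formulation gives an index of difficulty ID = log2(2D/W), where D is the distance to the target and W is its width, so doubling a button's size cuts the difficulty by one bit. A tiny sketch (the device-dependent constants of the full movement-time model are omitted):

```java
// Index of difficulty from Fitts's law: ID = log2(2D / W).
// Larger and closer targets have a lower ID and are faster to hit.
public class Fitts {
    public static double indexOfDifficulty(double distance, double width) {
        return Math.log(2 * distance / width) / Math.log(2);
    }
    public static void main(String[] args) {
        // A target 8 units away and 2 units wide: log2(8) = 3 bits.
        System.out.println(indexOfDifficulty(8, 2));
    }
}
```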

New Community: community.java.net - Portlet

New Community: community.java.net - Portlet.
If J2EE based portals, JSR 168 or WSRP mean anything to you, you have come to the right place. This is a gathering of developers and technical experts working on Portals and related technologies. Here you will find open source projects, articles, tips, news, product announcements, blogs and FAQs. This community is also dedicated to creating a repository of open source and free JSR 168 compliant portlets that can be used on any J2EE portal server available in the market today. This is a great place to obtain portlets, learn, discuss, share knowledge and publicize your work.

25. nov. 2004

AOP: Aspect Oriented Programming - Introduction

AOP (Aspect-Oriented Programming) enables clean modularization of crosscutting concerns, such as error checking and handling, synchronization, context-sensitive behavior, performance optimizations, monitoring and logging, debugging support, and multi-object protocols.


When Object-Oriented (OO) programming entered the mainstream of software development, it had a dramatic effect on how software was developed. Developers could visualize systems as groups of entities and the interaction between those entities, which allowed them to tackle larger, more complicated systems and develop them in less time than ever before. The only problem with OO programming is that it is essentially static, and a change in requirements can have a profound impact on development timelines.

Aspect-Oriented Programming (AOP) complements OO programming by allowing the developer to dynamically modify the static OO model to create a system that can grow to meet new requirements. Just as objects in the real world can change their states during their lifecycles, an application can adopt new characteristics as it develops.
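As a rough, dependency-free illustration of the idea (not the syntax of any particular AOP framework), a JDK dynamic proxy can factor a crosscutting logging concern out of a business class without touching it; all names below are invented:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

// A crosscutting logging concern applied without modifying the target class,
// using a JDK dynamic proxy. Real AOP frameworks (AspectJ, etc.) generalize this.
public class LoggingAspectDemo {
    public interface Greeter { String greet(String name); }

    static class SimpleGreeter implements Greeter {
        public String greet(String name) { return "Hello, " + name; }
    }

    @SuppressWarnings("unchecked")
    public static <T> T withLogging(final T target, Class<T> iface, final StringBuilder log) {
        return (T) Proxy.newProxyInstance(
            iface.getClassLoader(), new Class<?>[] { iface },
            new InvocationHandler() {
                public Object invoke(Object proxy, Method m, Object[] args) throws Throwable {
                    log.append("before ").append(m.getName()).append("; ");
                    Object result = m.invoke(target, args);  // proceed to the real call
                    log.append("after ").append(m.getName());
                    return result;
                }
            });
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Greeter g = withLogging(new SimpleGreeter(), Greeter.class, log);
        System.out.println(g.greet("world")); // Hello, world
        System.out.println(log);              // before greet; after greet
    }
}
```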


24. nov. 2004

Animation: Totally Gridbag - Fun animation about java gridbags

madbean.com: Totally Gridbag
A fun day of coding with gridbags.

This is a funny animation about a programmer wrestling with GridBagLayout.

Windows Tips & tricks: Defragmenting Your Pagefile

WindowsDevCenter.com: Defragmenting Your Pagefile: "Defragmenting your hard drive regularly is an important part of general system maintenance for Windows XP machines, and the Disk Defragmenter tool in Computer Management lets you do this easily. (You can also get to the Disk Defragmenter tool by choosing Control Panel-->Performance and Maintenance-->Rearrange items on your hard disk to make programs run faster.) But what about defragmenting your pagefile?"
This article tells you how.

The way to do it is to remove your pagefile, defragment your hard disk, and then add the pagefile again.

Link: Effective Java - Tips & tricks - What about equals()?

Effective Java: Joshua Bloch's book "Effective Java Programming Language Guide" is full of Java tips and tricks.


I really recommend this book.

My only quibble is that I would implement equals() differently.

Instead of:

public boolean equals(Object o) {
    if (!(o instanceof MyObject)) {
        return false;
    }
    MyObject other = (MyObject) o;
    return (value == null ? other.value == null
                          : value.equals(other.value));
}


Use:

public boolean equals(Object o) {
    if ((o == null) || (this.getClass() != o.getClass())) {
        return false;
    }
    MyObject other = (MyObject) o;
    return (value == null ? other.value == null
                          : value.equals(other.value));
}


The reason to use getClass() instead of instanceof is the problem you run into with inheritance.
equals() must be symmetric, which means that if a.equals(b) is true, then b.equals(a) must also be true.

Example:

// In class A...
public boolean equals(Object o) {
    if (!(o instanceof A)) {
        return false;
    }
    ...
}


// In class B extends A...
public boolean equals(Object o) {
    if (!(o instanceof B)) {
        return false;
    }
    ...
}


In this case this would happen:

A a = new A();
B b = new B();

a.equals(b); <- would return true
b.equals(a); <- would return false


If you use getClass() you will instead get:

a.equals(b); <- would return false
b.equals(a); <- would return false

Which means symmetry isn't broken!



I've made some tests showing that getClass() takes about 7% longer to evaluate than instanceof.
But that's a price I'm willing to pay to be sure I don't run into inheritance problems.
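For the curious, the asymmetry is easy to reproduce; here is a minimal self-contained demo (class names A/B/C/D are illustrative):

```java
// Demonstrates why instanceof-based equals() breaks symmetry under inheritance,
// while getClass()-based equals() does not.
public class EqualsSymmetryDemo {
    static class A {                                  // instanceof version
        public boolean equals(Object o) { return o instanceof A; }
        public int hashCode() { return 1; }
    }
    static class B extends A {
        public boolean equals(Object o) { return o instanceof B; }
        public int hashCode() { return 1; }
    }
    static class C {                                  // getClass() version
        public boolean equals(Object o) {
            return o != null && getClass() == o.getClass();
        }
        public int hashCode() { return 1; }
    }
    static class D extends C { }                      // inherits C.equals()

    public static void main(String[] args) {
        A a = new A();
        B b = new B();
        System.out.println(a.equals(b)); // true
        System.out.println(b.equals(a)); // false  <- symmetry broken

        C c = new C();
        D d = new D();
        System.out.println(c.equals(d)); // false
        System.out.println(d.equals(c)); // false  <- symmetric
    }
}
```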

Link: Sikkerhedsforum.dk :: Security debate (danish website)

Security debate in danish.
Sikkerhedsforum.dk :: Seriøs debat om sikkerhed.

Download: Check your computer for security-leaks (remote-exploit.org)

At remote-exploit.org you can download a CD with the Auditor security collection, a bootable Linux distribution containing many programs for foot-printing, analysis, scanning, wireless auditing, brute-forcing, and cracking.

Download the CD here: http://www.remote-exploit.org/content/mirrors.html

22. nov. 2004

Trace SQL: SQL Logging and tracing.

IronGrid - home_page: "IronEye SQL is the first step in troubleshooting JDBC performance problems. An ideal introduction to performance testing, this intuitive tool monitors the time it takes for SQL queries to pass between a Java application and the database without changing any source code."

MacDevCenter.com: Write a Webserver in 100 Lines of Code or Less

Write a Webserver in 100 Lines of Code or Less by Jonathan Johnson -- REAL Software programmer and tester Jonathan Johnson shows you the power and simplicity of developing with REALbasic by walking you through the building of a working webserver. After this tutorial, you'll not only have practical knowledge of REALbasic, but you'll have a cool little server too.


(The Response header: HTTP_VERSION + Space + STATUS + Space + MESSAGE + CRLF + CRLF.
Ex.: Write "HTTP/1.1 " + format( statusCode, "000" ) + " " + _
response + chr(13) + chr(10) + chr(13) + chr(10))
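The same status-line layout sketched in Java (the helper name is made up):

```java
// Builds the response status line described above:
// HTTP_VERSION + space + STATUS + space + MESSAGE + CRLF + CRLF.
public class StatusLine {
    public static String statusLine(int code, String message) {
        return "HTTP/1.1 " + code + " " + message + "\r\n\r\n";
    }
    public static void main(String[] args) {
        System.out.print(statusLine(200, "OK"));
    }
}
```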

18. nov. 2004

Blog of Anders Holmbech Brandt

My blog: "Blog of Anders Holmbech Brandt".

Anders is a freelance consultant and is currently hired through ProData to work for TDC Kabel TV.

XHTML/stylesheet slideshow generation, like PowerPoint

S5: An Introduction.
S5 generates PowerPoint-like slideshows from XHTML and stylesheets.

Works in all browsers.

12. nov. 2004

Download: IzPack, Java Installer

IzPack is a Java installer.

All you need is a JRE and you're up and running.

Download: MSN Messenger 7.0.0332

WindowsBeta.Net - Download: MSN Messenger 7.0.0332.

(This is only a beta version.)

AntiVirus: New version from Grisoft Freeweb.

New version of the free antivirus from Grisoft.
Grisoft AVG Free Edition version 7.

DivX codec - New alpha release of DivX Pro Plasma Coded

DivX codec - New alpha release of DivX Pro Plasma Coded

The new codec compresses video 8 to 34 percent better than the DivX Pro 5.2.1 codec.

Link: Java Programming Notes

A collection of useful Java material:
Java Programming Notes.

Example: how to read from a file (with source code).

Link: JDocs.com - Your javadocs super-center

Huge collection of APIs, fully indexed and searchable.
JDocs.com - Your javadocs super-center.

10. nov. 2004

Create intelligent Web spiders

URL: Create intelligent Web spiders (full article)

How to use Java network objects and HTML objects

Summary
Have you ever wanted to create your own database of Websites that meet specific criteria? Web spiders, sometimes referred to as Web crawlers, are programs that follow Web links from one site to another, examining content and recording locations. Commercial search sites use Web spiders to populate their databases; researchers can use spiders to find relevant information. Creating your own spider allows you to control the search for content, domains, and Webpage characteristics, such as text density and embedded multimedia content. This article shows you how to create your own powerful Web spider in Java using Java HTML and network classes. (2,300 words; November 1, 2004)


By Mark O. Pendergast
This article demonstrates how to create an intelligent Web spider based on standard Java network objects. The heart of this spider is a recursive routine that can perform depth-first Web searches based on keyword/phrase criteria and Webpage characteristics. Search progress displays graphically using a JTree structure. I address issues such as resolving relative URLs, avoiding reference loops, and monitoring memory/stack usage. In addition, I demonstrate the proper use of Java network objects used in accessing and parsing remote Webpages.


Spider demonstration program

The demonstration program consists of the user interface class SpiderControl; the Web-searching class Spider; the two classes used to build a JTree showing the results, UrlTreeNode and UrlNodeRenderer; and two classes to help verify integer input into the user interface, IntegerVerifier and VerifierListener. See Resources for a link to the full source code and documentation.




The SpiderControl interface is composed of three tabs, one to set the search parameters, another to display the resulting search tree (JTree), and a third to display error and status messages (see Figure 1).

Figure 1. Search parameters tab.

Search parameters include the maximum number of sites to visit, the search's maximum depth (links to links to links), a list of keywords/phrases, the root-level domains to search, and the starting Website or portal. Once the user has entered the search parameters and pressed the Start button, the Web search will start, and the second tab (Figure 2) displays to show the search's progress.

Figure 2. Search tree.
An instance of the Spider class running in a separate thread conducts the Web search. Separate threads are used so that the SpiderControl module can continually update the search tree's display and process the Stop Search button. As the Spider runs, it continually adds nodes (UrlTreeNode) to the JTree displayed in the second tab. Search tree nodes that contain keywords and phrases appear in blue (UrlNodeRenderer).




When the search completes, the user can view the site's vital statistics and view the site itself in an external Web browser (the program defaults to Internet Explorer, located in the Program Files folder). The vital statistics include the keywords present, total text characters, total images, and total links.


The Spider class

The Spider class is responsible for searching the Web given a starting point (portal), a list of keywords and domains, and limits on the search's depth and size. Spider inherits Thread so it can run in a separate thread. This allows the SpiderControl module to continually update the search tree's display and process the Stop Search button.




The constructor method is passed the search parameters along with a reference to an empty JTree and an empty JTextArea. The JTree is used to create a hierarchical record of the sites visited as the search progresses. This provides visual feedback to the user and helps the Spider track where it has been to prevent circular searches. The JTextArea posts error and progress messages.




The constructor stores its parameters in class variables and initializes the JTree to render nodes using the UrlNodeRenderer class. The search will not start until SpiderControl calls the run() method.



The run() method starts execution in a separate thread. It first determines whether the portal site is a Web reference (starting with http, ftp, or www) or a local file reference. It then ensures the portal site has the proper notation, resets the run statistics, and calls searchWeb() to begin the search:







public void run()
{
   DefaultTreeModel treeModel = (DefaultTreeModel)searchTree.getModel(); // Get our model
   DefaultMutableTreeNode root = (DefaultMutableTreeNode)treeModel.getRoot();
   String urllc = startSite.toLowerCase();
   if(!urllc.startsWith("http://") && !urllc.startsWith("ftp://") &&
      !urllc.startsWith("www."))
   {
      startSite = "file:///"+startSite; // Note: you must have 3 slashes!
   }
   else if(urllc.startsWith("www.")) // http missing?
   {
      startSite = "http://"+startSite; // Tack on http://
   }

   startSite = startSite.replace('\\', '/'); // Fix bad slashes

   sitesFound = 0;
   sitesSearched = 0;
   updateStats();
   searchWeb(root,startSite); // Search the Web
   messageArea.append("Done!\n\n");
}








searchWeb() is a recursive method that accepts as parameters a parent node in the search tree and a Web address to search. searchWeb() first verifies that the given Website has not already been visited and that depth and site limits have not been exceeded. searchWeb() then yields to allow the SpiderControl thread to run (updating the screen and checking for Stop Search button presses). If all is in order, searchWeb() continues; if not, it returns.




Before searchWeb() begins reading and parsing the Website, it first verifies that the site is of the proper type and domain by creating a URL object based on the Website. The URL's protocol is checked to ensure it is either an HTTP address or a file address (no need to search "mailto:" and other protocols). Then the file extension (if present) is checked to ensure that it is an HTML file (no need to parse pdf or gif files). Once that is done, the domain is checked against the list specified by the user with the isDomainOk() method:






...
URL url = new URL(urlstr); // Create the URL object from a string

String protocol = url.getProtocol(); // Ask the URL for its protocol
if(!protocol.equalsIgnoreCase("http") && !protocol.equalsIgnoreCase("file"))
{
   messageArea.append(" Skipping : "+urlstr+" not a http site\n\n");
   return;
}

String path = url.getPath(); // Ask the URL for its path
int lastdot = path.lastIndexOf("."); // Check for file extension
if(lastdot > 0)
{
   String extension = path.substring(lastdot); // Just the file extension
   if(!extension.equalsIgnoreCase(".html") && !extension.equalsIgnoreCase(".htm"))
      return; // Skip everything but html files
}

if(!isDomainOk(url))
{
   messageArea.append(" Skipping : "+urlstr+" not in domain list\n\n");
   return;
}









At this point, searchWeb() is fairly certain it has a URL worth searching, so it creates a new node for the search tree, adds it to the tree, opens an input stream, and parses the file. The following sections provide more details on parsing HTML files, resolving relative URLs, and controlling recursion.


Parsing HTML files

There are two ways to parse (pick apart) an HTML file to look for the A HREF = tags—a hard way and an easy way.



If you choose the hard way, you would create your own parsing algorithm using Java's StreamTokenizer class. With this technique, you'd have to specify the word and white-space characters for the StreamTokenizer object, then pick off the < and > symbols to find the tags, the attributes, and separate the text between tags. A lot of work.




The easy way is to use the built-in ParserDelegator class, a subclass of the HTMLEditorKit.Parser abstract class. These classes are not well documented in the Java documentation. Using ParserDelegator is a three-step process. First, create an InputStreamReader object from your URL; then, create an instance of a ParserCallback object; finally, create an instance of the ParserDelegator object and call its one public method parse():







UrlTreeNode newnode = new UrlTreeNode(url); // Create the data node
InputStream in = url.openStream(); // Ask the URL object to create an input stream
InputStreamReader isr = new InputStreamReader(in); // Convert the stream to a reader
DefaultMutableTreeNode treenode = addNode(parentnode, newnode);
SpiderParserCallback cb = new SpiderParserCallback(treenode); // Create a callback object
ParserDelegator pd = new ParserDelegator(); // Create the delegator
pd.parse(isr,cb,true); // Parse the stream
isr.close(); // Close the stream








parse() is passed an InputStreamReader, an instance of a ParserCallback object, and a flag specifying whether the CharSet tags should be ignored. parse() then reads and decodes the HTML file, calling methods in the ParserCallback object each time it has completely decoded a tag or HTML element.




In the demonstration code, I implemented my ParserCallback as an inner class of Spider. Doing so allows ParserCallback to access Spider's methods and variables. Classes based on ParserCallback can override the following methods:



  • handleStartTag(): Called when a starting HTML tag is encountered, e.g., <A>
  • handleEndTag(): Called when an ending HTML tag is encountered, e.g., </A>
  • handleSimpleTag(): Called when an HTML tag that has no matching end tag is encountered
  • handleText(): Called when text between tags is encountered




In the demonstration program, I overrode the handleSimpleTag(), handleStartTag(), handleEndTag(), and handleText() methods.



I overrode handleSimpleTag() so that my code can process HTML BASE and IMG tags. BASE tags tell what URL to use when resolving relative URL references. If no BASE tag is present, then the current URL is used to resolve relative references. handleSimpleTag() is passed three parameters, an HTML.Tag object, a MutableAttributeSet that contains all the tag's attributes, and relative position within the file. My code checks the tag to see if it is a BASE object instance; if it is, then the HREF attribute is retrieved and stored in the page's data node. This attribute is used later when resolving URL addresses to linked Websites. Each time an IMG tag is encountered, that page's image count is updated.




I overrode handleStartTag() so that the program can process HTML A and TITLE tags. The method tests to see if the t parameter is in fact an A tag, if it is, then the HREF attribute is retrieved.




fixHref() is called to clean up sloppy references (changes back slashes to forward slashes, adds missing final slashes). The link's URL is resolved by creating a URL object instance using the base URL and the one referenced. Then, a recursive call to searchWeb() processes this link. If the method encounters a TITLE tag, it clears the variable storing the last text encountered so that the title's end tag is assured of having the proper value (sometimes, a Webpage will have title tags with no title between them).



I overrode handleEndTag() so the HTML TITLE end tags can be processed. This end tag indicates that the previous text (stored in lastText) is the page's title text. This text is then stored in the page's data node. Since adding the title information to the data node will change the display of the data node in the tree, the nodeChanged() method must be called so the tree can adjust its layout.




I overrode handleText() so that the HTML page's text can be checked for any of the keywords or phrases being searched. handleText() is passed an array of characters and the position of the characters within the file. handleText() first converts the character array to a String object, converting to all uppercase in the process. Then each keyword/phrase in the search list is checked against the String object using the indexOf() method. If indexOf() returns a non-negative result, then the keyword/phrase is present in the page's text. If the keyword/phrase is present, the match is recorded in the node's match list and run statistics are updated:







/**
 * Inner class used to handle HTML parser callbacks
 */
public class SpiderParserCallback extends HTMLEditorKit.ParserCallback {
   /** URL node being parsed */
   private UrlTreeNode node;
   /** Tree node */
   private DefaultMutableTreeNode treenode;
   /** Contents of last text element */
   private String lastText = "";

   /**
    * Creates a new instance of SpiderParserCallback
    * @param atreenode search tree node that is being parsed
    */
   public SpiderParserCallback(DefaultMutableTreeNode atreenode) {
      treenode = atreenode;
      node = (UrlTreeNode)treenode.getUserObject();
   }

   /**
    * Handle HTML tags that don't have a start and end tag
    * @param t HTML tag
    * @param a HTML attributes
    * @param pos Position within file
    */
   public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos)
   {
      if(t.equals(HTML.Tag.IMG))
      {
         node.addImages(1);
         return;
      }
      if(t.equals(HTML.Tag.BASE))
      {
         Object value = a.getAttribute(HTML.Attribute.HREF);
         if(value != null)
            node.setBase(fixHref(value.toString()));
      }
   }

   /**
    * Take care of start tags
    * @param t HTML tag
    * @param a HTML attributes
    * @param pos Position within file
    */
   public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos)
   {
      if(t.equals(HTML.Tag.TITLE))
      {
         lastText="";
         return;
      }
      if(t.equals(HTML.Tag.A))
      {
         Object value = a.getAttribute(HTML.Attribute.HREF);
         if(value != null)
         {
            node.addLinks(1);
            String href = value.toString();
            href = fixHref(href);
            try {
               URL referencedURL = new URL(node.getBase(),href);
               searchWeb(treenode, referencedURL.getProtocol()+"://"+referencedURL.getHost()+referencedURL.getPath());
            }
            catch (MalformedURLException e)
            {
               messageArea.append(" Bad URL encountered : "+href+"\n\n");
               return;
            }
         }
      }
   }

   /**
    * Take care of end tags
    * @param t HTML tag
    * @param pos Position within file
    */
   public void handleEndTag(HTML.Tag t, int pos)
   {
      if(t.equals(HTML.Tag.TITLE) && lastText != null)
      {
         node.setTitle(lastText.trim());
         DefaultTreeModel tm = (DefaultTreeModel)searchTree.getModel();
         tm.nodeChanged(treenode);
      }
   }

   /**
    * Take care of text between tags; check against keyword list for matches; if a
    * match is found, set the node match status to true
    * @param data Text between tags
    * @param pos Position of text within Webpage
    */
   public void handleText(char[] data, int pos)
   {
      lastText = new String(data);
      node.addChars(lastText.length());
      String text = lastText.toUpperCase();
      for(int i = 0; i < keywordList.length; i++)
      {
         if(text.indexOf(keywordList[i]) >= 0)
         {
            if(!node.isMatch())
            {
               sitesFound++;
               updateStats();
            }
            node.setMatch(keywordList[i]);
            return;
         }
      }
   }
}







Resolving and repairing URLS

When relative links to Webpages are encountered, you must build complete links based on their base URLs. Base URLs can be explicitly defined in a Webpage via the BASE tag or implicitly defined as the URL of the page holding the link. The Java URL object provides a constructor that handles the resolution for you, providing you give it links structured to its liking.



URL(URL context, String spec) accepts the link in the spec parameter and the base URL in the context parameter. If spec is a relative link, the constructor will create a URL object using context to build the complete reference. URL prefers all URL specifications to be in the strict (Unix) format. Using backslashes, which is allowed in the Microsoft Windows world, instead of forward slashes results in bad references. Also, if spec or context refers to a directory (containing index.html or default.html) instead of an HTML file, it must have a final slash. The method fixHref() checks for these sloppy references and fixes them:







public static String fixHref(String href)
{
   String newhref = href.replace('\\', '/'); // Fix sloppy Web references
   int lastdot = newhref.lastIndexOf('.');
   int lastslash = newhref.lastIndexOf('/');
   if(lastslash > lastdot)
   {
      if(newhref.charAt(newhref.length()-1) != '/')
         newhref = newhref+"/"; // Add missing /
   }

   return newhref;
}
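A quick, self-contained illustration of the resolution behavior described above (the URLs are made up):

```java
import java.net.MalformedURLException;
import java.net.URL;

// Shows how URL(URL context, String spec) resolves a relative link against
// a base URL, as described in the text above.
public class ResolveDemo {
    public static String resolve(String base, String href) throws MalformedURLException {
        URL context = new URL(base);
        return new URL(context, href).toString();
    }
    public static void main(String[] args) throws MalformedURLException {
        // Base ends in a slash, so the relative spec is appended to the directory.
        System.out.println(resolve("http://example.com/docs/", "page2.html"));
    }
}
```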







Controlling recursion

searchWeb() is initially called to search the starting Web address specified by the user. It then calls itself whenever an HTML link is found. This forms the basis of the "depth-first" search and can lead to two types of problems. First, and most critical, memory/stack overflow problems can result due to too many recursive calls. These will occur if there is a circular Web reference; that is, one Webpage links to another that links back to the first—a common occurrence in the World Wide Web. To prevent this, searchWeb() checks the search tree (via the urlHasBeenVisited() method) to see if the referenced page already exists. If it does, then the link is ignored. If you choose to implement a spider without a search tree, you still must maintain a list of sites visited (either in a Vector or array) so that you can determine if you are revisiting a site.
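A simplified sketch of such a visited check, assuming the tree stores each page's URL string as the node's user object (the real program stores UrlTreeNode objects instead; its urlHasBeenVisited() does the equivalent):

```java
import java.util.Enumeration;
import javax.swing.tree.DefaultMutableTreeNode;

// Walk every node already in the search tree and compare its stored URL
// against the candidate; a hit means the page was already visited and the
// link should be ignored to avoid a circular search.
public class VisitedCheckDemo {
    public static boolean urlHasBeenVisited(DefaultMutableTreeNode root, String url) {
        Enumeration<?> e = root.breadthFirstEnumeration();
        while (e.hasMoreElements()) {
            DefaultMutableTreeNode n = (DefaultMutableTreeNode) e.nextElement();
            if (url.equals(n.getUserObject())) {
                return true;    // already in the tree: skip it
            }
        }
        return false;
    }

    public static boolean demo(String candidate) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("http://a.example/");
        root.add(new DefaultMutableTreeNode("http://b.example/"));
        return urlHasBeenVisited(root, candidate);
    }

    public static void main(String[] args) {
        System.out.println(demo("http://b.example/")); // true
        System.out.println(demo("http://c.example/")); // false
    }
}
```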




The second problem with recursion results from the nature of a depth-first search and the World Wide Web's structure. Depending on the chosen portal, a depth-first search could result in hundreds of recursive calls before the original link on the original page is completely processed. This has two undesirable consequences: first, memory/stack overflow could occur, and second, the pages being searched may be too far removed from the original portal to give meaningful results. To control this, I added a "maximum search depth" setting to the spider. The user may select how deep the number of levels can go (links to links to links); as each link is entered, the current depth is checked via a call to the depthLimitExceeded() method. If the limit is exceeded, the link is ignored. This test merely checks the level of the node in the JTree.



The demonstration program also adds a site limit, specified by the user, that can stop the search after the specified number of URLs has been examined, thus ensuring the program will eventually stop! The site limit is controlled via a simple integer counter sitesSearched that is updated and checked after each call to searchWeb().
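A minimal sketch of the two guards, with the depth test taking the node's JTree level as an argument (field and method names mirror the article, but the code itself is simplified):

```java
import javax.swing.tree.DefaultMutableTreeNode;

// Sketch of the two recursion guards described above: a depth limit read off
// the node's level in the JTree, and a simple counter for total sites searched.
public class LimitsDemo {
    private final int maxDepth;
    private final int maxSites;
    private int sitesSearched = 0;

    public LimitsDemo(int maxDepth, int maxSites) {
        this.maxDepth = maxDepth;
        this.maxSites = maxSites;
    }
    // In the article this reads the level straight off the search-tree node.
    public boolean depthLimitExceeded(int nodeLevel) {
        return nodeLevel >= maxDepth;   // root node is level 0
    }
    // Incremented on each searchWeb() call; stops the search past the limit.
    public boolean siteLimitReached() {
        sitesSearched++;
        return sitesSearched > maxSites;
    }

    public static void main(String[] args) {
        DefaultMutableTreeNode root = new DefaultMutableTreeNode("root");
        DefaultMutableTreeNode child = new DefaultMutableTreeNode("child");
        root.add(child);
        LimitsDemo limits = new LimitsDemo(1, 2);
        System.out.println(limits.depthLimitExceeded(root.getLevel()));  // false
        System.out.println(limits.depthLimitExceeded(child.getLevel())); // true
    }
}
```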


UrlTreeNode and UrlNodeRenderer


UrlTreeNode and UrlNodeRenderer are classes for creating custom tree nodes to add to the JTree in the SpiderControl user interface. UrlTreeNode contains the statistics and URL information for each searched Website. The UrlTreeNode is stored in the JTree as the "user object" attribute of the standard DefaultMutableTreeNode objects. The data includes the ability to track keywords present in the node, the node's URL, the node's base URL, the number of links, images, and text characters, and whether the node matches the search criteria.




UrlTreeNodeRenderer extends the DefaultTreeCellRenderer class. UrlTreeNodeRenderer causes the node to display in blue text if the node contains matching keywords. UrlTreeNodeRenderer also incorporates a custom icon for the JTree nodes. Custom display is achieved by overriding the getTreeCellRendererComponent() method (see below). This method creates a Component object to display in the tree. Most of Component's attributes are set by the superclass; UrlTreeNodeRenderer changes the text color (foreground) and icons:







public Component getTreeCellRendererComponent(
        JTree tree,
        Object value,
        boolean sel,
        boolean expanded,
        boolean leaf,
        int row,
        boolean hasFocus) {

    super.getTreeCellRendererComponent(
        tree, value, sel,
        expanded, leaf, row,
        hasFocus);

    UrlTreeNode node = (UrlTreeNode) ((DefaultMutableTreeNode) value).getUserObject();

    if (node.isMatch())      // Set color
        setForeground(Color.blue);
    else
        setForeground(Color.black);

    if (icon != null) {      // Set a custom icon
        setOpenIcon(icon);
        setClosedIcon(icon);
        setLeafIcon(icon);
    }

    return this;
}







Summary

This article has shown you how to create a Web spider and a user interface to control it. The user interface employs a JTree to track the spider's progress and record the sites visited. Alternatively, you could use a Vector to record the sites visited and display the progress using a simple counter. Other enhancements could include an interface to a database to record keywords and sites, adding the ability to search through multiple portals, screening sites with too much or too little text content, and giving the search engine synonym-search capabilities.




The Spider class shown in this article uses recursive calls to a search procedure. Alternatively, a separate thread with a new spider could be launched for each link encountered. This has the benefit of allowing connections to remote URLs to occur concurrently, speeding execution. However, note that some JTree objects, namely DefaultMutableTreeNode, are not thread-safe, and programmers must perform their own synchronization.






















Create intelligent Web spiders



About the author


Dr. Mark Pendergast is an associate professor at Florida Gulf Coast University. Pendergast received an M.S. and Ph.D. from the University of Arizona and a B.S.E. in electrical computer engineering from the University of Michigan. He has worked as an engineer for Control Data Corporation, Harris Corporation, and Ventana Corporation, and has taught at the University of Florida and Washington State University. His work has appeared in books and journals, and he has presented at numerous conferences. He is a member of the ACM and IEEE computer societies.






Using and Programming Generics in J2SE 5.0

URL: Using and Programming Generics in J2SE 5.0 and
Full article


The Need for Generics



The motivation for adding generics to the Java programming language stems from the fact that a collection carries no information about its element type: the programmer must keep track of what type of elements each collection contains and insert casts all over the place. Using generics, a collection is no longer treated as a list of Object references; instead, you can differentiate between a collection of references to Integers and a collection of references to Bytes. A collection with a generic type has a type parameter that specifies the element type to be stored in the collection.

As an example, consider the following segment of code that creates a linked list and adds an element to the list:


LinkedList list = new LinkedList();
list.add(new Integer(1));
Integer num = (Integer) list.get(0);


As you can see, when an element is extracted from the list, it must be cast. The cast is safe in that it is checked at runtime, but if you cast to a type that is different from, and not a supertype of, the extracted element's type, a ClassCastException is thrown at runtime.




Using generic types, the previous segment of code can be written as follows:

LinkedList<Integer> list = new LinkedList<Integer>();
list.add(new Integer(1));
Integer num = list.get(0);




Here we say that LinkedList is a generic class that takes a type parameter, Integer in this case.


As you can see, you no longer need to cast to an Integer since the get() method would return a reference to an object of a specific type (Integer in this case). If you were to assign an extracted element to a different type, the error would be at compile-time instead of run-time. This early static checking increases the type safety of the Java language.


To reduce the clutter, the above example can be rewritten as follows...using autoboxing:

LinkedList<Integer> list = new LinkedList<Integer>();
list.add(1);
int num = list.get(0);


As a complete example, consider the following class, Ex1, which creates a collection of two Strings and one Integer, and then prints out the collection:

Ex1.java


import java.util.*;

public class Ex1 {

    private void testCollection() {
        List list = new ArrayList();
        list.add(new String("Hello world!"));
        list.add(new String("Good bye!"));
        list.add(new Integer(95));
        printCollection(list);
    }

    private void printCollection(Collection c) {
        Iterator i = c.iterator();
        while (i.hasNext()) {
            String item = (String) i.next();
            System.out.println("Item: " + item);
        }
    }

    public static void main(String argv[]) {
        Ex1 e = new Ex1();
        e.testCollection();
    }
}



Again, an explicit cast is required in the printCollection() method. This class compiles fine, but throws a ClassCastException at runtime as it attempts to cast an Integer to a String:



Item: Hello world!
Item: Good bye!
Exception in thread "main" java.lang.ClassCastException: java.lang.Integer
at Ex1.printCollection(Ex1.java:16)
at Ex1.testCollection(Ex1.java:10)
at Ex1.main(Ex1.java:23)


Using Generics



Using generics, the Ex1 class above can be written as follows:


Ex2.java

import java.util.*;

public class Ex2 {

    private void testCollection() {
        List<String> list = new ArrayList<String>();
        list.add(new String("Hello world!"));
        list.add(new String("Good bye!"));
        list.add(new Integer(95)); // compile-time error
        printCollection(list);
    }

    private void printCollection(Collection<String> c) {
        Iterator<String> i = c.iterator();
        while (i.hasNext()) {
            System.out.println("Item: " + i.next());
        }
    }

    public static void main(String argv[]) {
        Ex2 e = new Ex2();
        e.testCollection();
    }
}



Now, if you try to compile this code, a compile-time error informs you that you cannot add an Integer to a collection of Strings. Generics thus enable more compile-time type checking, so mismatch errors are caught at compile time rather than at runtime.
... See more in the full article ...

Dear Manager, They Need a Build Machine

URL: Dear Manager, They Need a Build Machine and Full article


Dear Manager,



When I talk with the developers on your team, they tell me they need a dedicated build machine. I'm partly to blame. See, I've been showing them a free build scheduler that, after just a few minutes of configuration, will continuously build your software with no human intervention. That's good for them, and it's even better for you. Let me tell you why.



If you build and test your software once an hour, no problem is more than an hour old.

This makes it easier to find and fix problems, which saves time and money, and lets your team concentrate on adding new stuff, not fixing old stuff.

The continuous build process feeds you valuable and timely information, letting you manage the development more tightly.

Here's how it works: every hour, for example, and only if new work has been checked in, the build scheduler checks out a fresh copy of your project from version control and attempts to build the project. The build process includes compiling all of your project's source files, running an arbitrary number of tests, and generating any other build artifacts, such as project metrics. If, for any reason, the build process is unsuccessful, the build scheduler can notify you via email, your cell phone, RSS, or a visual device such as a lava lamp. You can also use your web browser to view the current status of the build, or any prior build.

Hermes 1.7 Released: Java GUI for JMS

URL: Hermes 1.7 Released: Java GUI for JMS




Hermes is an application that allows you to interact with JMS providers. Hermes works with any JMS-enabled transport, making it easy to browse and search queues, topics, and JNDI. Messages can then be copied around, deleted, or saved as files for later resending. A plugin framework that uses the provider's native API gives monitoring of queue depth and other statistics for WebSphereMQ, WebLogic, JBossMQ, ArjunaMQ, JORAM, OpenJMS, and WebMethods Enterprise.

New Features

  • A JNDI browser letting you browse and then create sessions from the ConnectionFactory instances bound and adding administered queues or topics to the session.
  • Plugins for WebSphereMQ, JBossMQ, WebLogicMQ, ArjunaMQ, WebMethods Enterprise, OpenJMS and ObjectWeb JORAM.
  • Search for messages on queues using a regular expression - dood where's my message?
  • Automatically discover queues/topics from supported providers and JNDI.
  • Queue/topic statistics from supported providers.
  • Built in renderer for FIX messages.
  • Display your messages in hexadecimal.
  • Manage provider class paths in their own class loaders, allowing you to use different versions of the same provider in Hermes and to avoid cross-provider dependency incompatibilities.

For more information see http://www.hermesjms.com

View the Release Notes


Filter collections

URL: Filter collections

Filter collections


A simple generic mechanism for filtering collections



Summary
This article describes a simple mechanism for filtering collections based on a variable number of criteria. This mechanism could prove useful in a search mask that offers many search criteria that the user can either select or ignore. (700 words October 18, 2004)
By David Rappoport







Often, you must iterate through a collection of objects and filter them based on a number of criteria. The JDK supplies a useful mechanism for sorting collections, namely the Comparator interface. However, the JDK lacks a mechanism for filtering collections.



This article describes a simple mechanism consisting of only one class and one interface that allows you to filter collections quickly and neatly. When searching a collection, the described mechanism offers the same functionality as a SQL SELECT statement. Its underlying concept is its separation of responsibilities between iterating through the collection and filtering the objects in the collection.



The approach presented here has the following benefits:



  1. Reuse of a central filtering component produces cleaner code
  2. Reuse of common filtering components generates less error-prone code
  3. Separating the iteration logic from the filtering logic allows you to add or remove filters at will without affecting any other code
  4. Possible performance gains with large collections and multiple criteria


The problem


Imagine a search mask where a user can choose among numerous different criteria to search for cars. Approaching this task simply, the developer must iterate through the collection multiple times. In each iteration, he must execute certain logic on each object in the collection to decide whether it fits the criteria. Usually, the result of this process is messy code that is both hard to read and maintain.


The solution

We define a class called CollectionFilter and an interface called FilterCriteria.



FilterCriteria defines only one method: public boolean passes(Object o). In this method, an object in the collection must pass a certain test. If it passes the test, the method returns true, otherwise, false.




CollectionFilter now takes any number of FilterCriteria as input. You then call the public void filter(Collection) method, which applies all FilterCriteria to the supplied collection and removes any object in the collection that doesn't pass all FilterCriteria.




The CollectionFilter class also defines a public Collection filterCopy(Collection) method, which completes the same task as the filter(Collection) method, but on a copy of the original collection, leaving the original unchanged.
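Putting the pieces together, a minimal CollectionFilter might look like the following sketch. The public method names (addFilterCriteria, filter, filterCopy) and the passes() contract come from the article; the internals are one plausible implementation, not the author's code:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Iterator;
import java.util.List;

// The article's interface: one test per criterion.
interface FilterCriteria {
    boolean passes(Object o);
}

// A minimal sketch of CollectionFilter; internals are assumptions.
class CollectionFilter {
    private final List<FilterCriteria> criteria = new ArrayList<FilterCriteria>();

    public void addFilterCriteria(FilterCriteria fc) {
        criteria.add(fc);
    }

    // Remove, in place, every object that fails at least one criterion
    // (the AND semantics the article describes).
    public void filter(Collection<?> c) {
        for (Iterator<?> it = c.iterator(); it.hasNext();) {
            Object o = it.next();
            for (FilterCriteria fc : criteria) {
                if (!fc.passes(o)) {
                    it.remove();
                    break;
                }
            }
        }
    }

    // Same test, but on a copy, leaving the original collection untouched.
    public Collection<?> filterCopy(Collection<?> c) {
        List<Object> copy = new ArrayList<Object>(c);
        filter(copy);
        return copy;
    }
}
```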



That's it!



As you may have noticed, this solution borrows some ideas from the Chain of Responsibility design pattern and applies them to a collection.



The following class diagram illustrates the classes and interfaces and how they relate to each other.



Simple example

Let's look at an example: Class Car has three attributes: String color, double maxSpeed, boolean fourWheelDrive.




Your application allows searching for cars based on these criteria. The user can enter the color she prefers. She can also provide the maximum speed she wants the car to have and also whether the car should support four-wheel drive.



We now create three filter classes, one for each criteria the user can choose.



  1. Write the FilterCriteria implementations:






    class ColorFilterCriteria implements FilterCriteria {
        private String color;

        public ColorFilterCriteria(String color) {
            this.color = color;
        }

        public boolean passes(Object o) {
            return ((Car) o).getColor().equals(color);
        }
    }

    class MaxSpeedFilterCriteria implements FilterCriteria {
        private int maxSpeed;

        public MaxSpeedFilterCriteria(int maxSpeed) {
            this.maxSpeed = maxSpeed;
        }

        public boolean passes(Object o) {
            return ((Car) o).getMaxSpeed() >= maxSpeed;
        }
    }

    class FourWheelDriveFilterCriteria implements FilterCriteria {
        private boolean fourWheelDriveRequired;
        private boolean fourWheelDriveAllowed;

        public FourWheelDriveFilterCriteria(boolean required, boolean allowed) {
            this.fourWheelDriveRequired = required;
            this.fourWheelDriveAllowed = allowed;
        }

        public boolean passes(Object o) {
            if (fourWheelDriveRequired)
                return ((Car) o).isFourWheelDrive();
            if (fourWheelDriveAllowed)
                return true;
            return !((Car) o).isFourWheelDrive();
        }
    }







  2. Then add these FilterCriteria to a CollectionFilter:






    CollectionFilter collectionFilter = new CollectionFilter();
    collectionFilter.addFilterCriteria(new ColorFilterCriteria(color));
    collectionFilter.addFilterCriteria(new MaxSpeedFilterCriteria(maxSpeed));
    collectionFilter.addFilterCriteria(new FourWheelDriveFilterCriteria(fourWheelDriveRequired, fourWheelDriveAllowed));







  3. Now filter:






    collectionFilter.filter(carCollection);







Technicalities

As you may have realized, similar to the compare(Object o1, Object o2) method in the Comparator interface, the passes(Object o) method in the FilterCriteria interface takes an Object as input. This means you must cast the object to the type you want to work with and ensure your collection contains only objects of that type. If this is not certain, you can use instanceof to test whether each object is of that type.
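For example, the color filter hardened with such an instanceof test might look like this (Car is reduced here to the single attribute the filter needs; the class name SafeColorFilterCriteria is illustrative):

```java
interface FilterCriteria {
    boolean passes(Object o);
}

// Car cut down to the one attribute this filter needs; illustrative only.
class Car {
    private final String color;
    Car(String color) { this.color = color; }
    String getColor() { return color; }
}

// Anything that is not a Car simply fails the test instead of
// throwing a ClassCastException.
class SafeColorFilterCriteria implements FilterCriteria {
    private final String color;

    SafeColorFilterCriteria(String color) { this.color = color; }

    public boolean passes(Object o) {
        return o instanceof Car && ((Car) o).getColor().equals(color);
    }
}
```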




Sometimes, you might prefer not to define a separate class for each FilterCriteria. The use of an anonymous inner class suggests itself in such cases.
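For instance, a one-off speed criterion can be declared inline. The interface matches the article's; Car is cut down to one attribute, and the 200.0 threshold is an arbitrary illustration:

```java
interface FilterCriteria {
    boolean passes(Object o);
}

class Car {
    private final double maxSpeed;
    Car(double maxSpeed) { this.maxSpeed = maxSpeed; }
    double getMaxSpeed() { return maxSpeed; }
}

public class AnonymousCriteriaDemo {
    // An anonymous inner class in place of a separate named
    // FilterCriteria implementation.
    static final FilterCriteria FAST = new FilterCriteria() {
        public boolean passes(Object o) {
            return ((Car) o).getMaxSpeed() >= 200.0;
        }
    };
}
```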



To keep the solution simple, I refrained from adding OR functionality to this filter. In other words, every time you add a FilterCriteria to your CollectionFilter, this can be compared to an AND in a SQL statement, since you're adding another condition. However, you can easily add OR-like functionality within one FilterCriteria. For example:






class EitherOrColorFilterCriteria implements FilterCriteria {
    private String color1;
    private String color2;

    public EitherOrColorFilterCriteria(String color1, String color2) {
        this.color1 = color1;
        this.color2 = color2;
    }

    public boolean passes(Object o) {
        return ((Car) o).getColor().equals(color1) || ((Car) o).getColor().equals(color2);
    }
}





Conclusion

As you have seen, it is simple to filter collections based on numerous criteria. Each FilterCriteria object is responsible only for the single piece of filtering logic it represents; the CollectionFilter then combines all filters to produce the desired result. Similar solutions are conceivable for other kinds of collection manipulation besides removal. The solution combines the Chain of Responsibility and Iterator design patterns: the CollectionFilter iterates over the collection, and for each object, the FilterCriteria objects form a chain of responsibility in which each filter decides whether the object passes on to the next.