Java | The Darker Side of Programming

Improved method for removing duplicate white space

March 23, 2011 syzygy 1 comment

On the principle that constant refactoring is a good thing, I revisited my method for removing duplicate white space from Strings / StringBuffers. The result was extremely positive, a much cleaner and more streamlined method.

private StringBuffer rmDuplicateWS(StringBuffer sb)
{
char ws = ' ';
// trim the leading whitespace (the length check guards against empty or all-space input)

while(sb.length() > 0 && sb.charAt(0) == ws)
{
sb.deleteCharAt(0);
}
// now trim the trailing whitespace

while(sb.length() > 0 && sb.charAt(sb.length() - 1) == ws)
{
sb.deleteCharAt(sb.length() - 1);
}
// loop until we reach the last char, deleting duplicate ws instances

int currentPos = 0;
while(currentPos < sb.length() - 1)
{
if((sb.charAt(currentPos) == ws) && (sb.charAt(currentPos + 1) == ws) )
{sb.deleteCharAt(currentPos);}
else
{currentPos++;}
}

return sb;
}
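
For comparison, here is a minimal single-pass sketch of the same idea. It is not the method above: it assumes that building a fresh StringBuilder is acceptable instead of deleting in place (each deleteCharAt shifts the rest of the buffer along, which adds up on long inputs). The class and method names are invented for illustration.

```java
// Hypothetical helper; the names are illustrative and not from the original post.
class WhitespaceCollapser {

    /** Returns a new StringBuilder with leading/trailing spaces dropped and runs of spaces collapsed to one. */
    static StringBuilder collapse(CharSequence in) {
        StringBuilder out = new StringBuilder(in.length());
        boolean pendingSpace = false; // a space has been seen but not yet written
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == ' ') {
                // only remember the space if some text has already been written
                pendingSpace = out.length() > 0;
            } else {
                if (pendingSpace) {
                    out.append(' ');
                    pendingSpace = false;
                }
                out.append(c);
            }
        }
        // a pending space left over at the end is trailing whitespace and is dropped
        return out;
    }

    public static void main(String[] args) {
        System.out.println("[" + collapse("  a   b  c ") + "]");
    }
}
```

Like the original, this only treats the plain space character as whitespace; tabs and newlines pass through untouched.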

Categories: Java Tags: Java, refactoring, String Manipulation, StringBuffer, whitespace

Implementing a split (or a pseudo-split) for StringBuffers/Builders

March 15, 2011 syzygy Leave a comment

One of the features regrettably absent from the StringBuilder/StringBuffer families is the handy String method split(String regex) (q.v. java.lang.String.split()), which produces an array of Strings tokenised around, and consuming, the supplied regex. The cranky way of doing this with a StringBuffer is to convert your StringBuffer to a String, split that into an array of Strings, then convert the array of Strings back to a StringBuffer array or List, which to me rather defeats the object of working with StringBuffers in the first place. I have a marked preference for working with Lists as opposed to arrays, but I do realise that there are those of the other faith who have valid reasons for their heretical idolatry (j/k), so I’ll provide methods for both outcomes.

Given that we have a method for providing a List of token positions from a supplied StringBuffer (q.v. Locating token positions[..]), the way is clear for us to implement the new split method. I have a somewhat improved version of that position-finding method, which I supply as an appendage to this post; the refactored method has been renamed getTokenPositions as opposed to the earlier findTokens.

/** Method to split an inbound StringBuffer by (consumed) tokens and produce a List
* @param sbx - the StringBuffer to split
* @param sbTok - a StringBuffer representation of token(s) to use to split
* @return List of StringBuffers split out
*/
public List<StringBuffer> split(StringBuffer sbx, StringBuffer sbTok)
    {
    int tokSz = sbTok.length();
    List<StringBuffer> lix = new ArrayList<StringBuffer>();
    List<Integer> lPos = getTokenPositions(sbx, sbTok );
    if( lPos == null || lPos.isEmpty()) // no split?  send the original sb back
    {
        lix.add(sbx);
        return lix;
    }

    int start = 0;
    for (int pos : lPos) {
        if (pos < start)
        {
            continue; // overlaps a token which has already been consumed
        }
        if (pos > 0) // a token at index 0 produces no leading piece
        {
            lix.add(new StringBuffer(sbx.subSequence(start, pos)));
        }
        start = pos + tokSz; // step over the consumed token
    }
    // whatever follows the last token
    lix.add(new StringBuffer(sbx.subSequence(start, sbx.length())));

    return lix;
    }

To produce an array of StringBuffers, you merely need to change the method’s return type

public StringBuffer[] split(StringBuffer sbx, StringBuffer sbTok)

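and modify the code where the returns occur (2 places) to read as follows. The typed toArray overload is needed here, since the no-argument version returns an Object[] which cannot be cast to StringBuffer[] at runtime:

 return (StringBuffer[]) lix.toArray(new StringBuffer[lix.size()]);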
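
As a small self-contained illustration of the List-to-array step: the no-argument toArray() hands back an Object[], whereas the overload that takes a typed array returns an array of that type. The class and method names below are made up for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; not part of the helper described in the post.
class ToArrayDemo {

    /** Converts a List of StringBuffers to a StringBuffer[] using the typed toArray overload. */
    static StringBuffer[] asArray(List<StringBuffer> parts) {
        // Passing a typed array makes toArray return StringBuffer[] rather than Object[]
        return parts.toArray(new StringBuffer[parts.size()]);
    }

    public static void main(String[] args) {
        List<StringBuffer> parts = new ArrayList<StringBuffer>();
        parts.add(new StringBuffer("alpha"));
        parts.add(new StringBuffer("beta"));
        StringBuffer[] arr = asArray(parts);
        System.out.println(arr.length + " elements, first = " + arr[0]);
    }
}
```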

Modified method for providing a List of token positions

I mentioned earlier I had a somewhat improved version of the findTokens method. The code for this (+ the comparator-helper List construction methods) follows:

 public List<Integer> getTokenPositions(StringBuffer sbx, StringBuffer tok )
    {
    List<Integer> liOut = new ArrayList<Integer>();
    if (tok.length() == 0) { return liOut; } // nothing to look for
    List<Character> liTok = charListFromSb(tok);
    int sz = tok.length() - 1;
    int finish = sbx.length() - sz;
    char firstTok = tok.charAt(0);
    char lastTok = tok.charAt(sz);
        for (int i = 0; i < finish; i++) {
            // cheap first/last character check before the full comparison
            if ( (sbx.charAt(i) == firstTok)   && (sbx.charAt(i + sz) == lastTok) )
            {
            List<Character> comp =  charListFromSb(sbx, i, i+ sz);
            if (comp.equals(liTok))
              {
                liOut.add(i);
              }
            }
        }
    return liOut;
    }

 public List<Character> charListFromSb(StringBuffer sbx)
    {
        List<Character> liOut = new ArrayList<Character>();
        int iEnd = sbx.length();
        for (int i = 0; i < iEnd; i++) {
            liOut.add(sbx.charAt(i));
        }

    return liOut;
    }
 public List<Character> charListFromSb(StringBuffer sbx, int start, int finish)
    {
       
        List<Character> liOut = new ArrayList<Character>();
        for (int i = start; i <= finish; i++) {
            boolean add = liOut.add(sbx.charAt(i));
        }
    return liOut;
    }
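
If you don’t need the intermediate List of positions, a split can also be sketched directly on top of StringBuffer.indexOf(String, int), which is part of the standard API. This is an alternative to the method above rather than a drop-in replacement: it takes the token as a String, and (unlike my version) it keeps an empty leading piece when the buffer starts with the token. The class name is invented for the example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical alternative; the class and method names are illustrative only.
class IndexOfSplit {

    /** Splits sbx on every occurrence of tok, consuming the token. Empty pieces are kept. */
    static List<StringBuffer> split(StringBuffer sbx, String tok) {
        List<StringBuffer> out = new ArrayList<StringBuffer>();
        if (tok.isEmpty()) { // nothing sensible to split on, send the original back
            out.add(sbx);
            return out;
        }
        int start = 0;
        int pos = sbx.indexOf(tok, start);
        while (pos >= 0) {
            out.add(new StringBuffer(sbx.subSequence(start, pos)));
            start = pos + tok.length(); // step over the consumed token
            pos = sbx.indexOf(tok, start);
        }
        // whatever follows the last token (possibly empty)
        out.add(new StringBuffer(sbx.subSequence(start, sbx.length())));
        return out;
    }

    public static void main(String[] args) {
        for (StringBuffer piece : split(new StringBuffer("a,b,,c"), ",")) {
            System.out.println("[" + piece + "]");
        }
    }
}
```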

Categories: Java Tags: Java, refactoring, String Manipulation, String splitting, StringBuffer

Working with XPath

February 16, 2011 syzygy 1 comment

It has always struck me that XPath is a nice tool for XML forensics. It can be exposed very easily and quickly, and, from a programmatic point of view, can be used to open up and expose the intricacies of often very complex xml documents with a minimum of effort.

Here’s a (slightly simplified refactored and cleaned) version of my basic setup class for examining documents with XPath.

import java.io.IOException;
import java.io.InputStream;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.namespace.QName; //not actually used in this vn. but can be handy to have around...
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class XPathBase {

private DocumentBuilderFactory domFactory;
private DocumentBuilder builder;
private Document doc;
private String xmlFile;
private XPath xPath;
private String resourceRoot = "";
private InputStream inputStream ;
private javax.xml.xpath.XPathExpression expression;
private DocumentBuilderFactory dbFactory = DocumentBuilderFactory.newInstance();

/**
* Constructor takes 2 args, file to examine, and the resource location
**/
public XPathBase(String xFile, String resRoot) {
resourceRoot = resRoot;
xFile = resourceRoot + xFile;
this.xmlFile = xFile;
setDomObjects();
}

public InputStream getAsStream(String file)
{
inputStream = this.getClass().getResourceAsStream(file);
return inputStream;
}

public void setDomObjects()
{
try {
domFactory = DocumentBuilderFactory.newInstance();
domFactory.setNamespaceAware(true);
builder = domFactory.newDocumentBuilder();
doc = builder.parse(getAsStream(xmlFile));
xPath = XPathFactory.newInstance().newXPath();
} catch (SAXException ex) {
Logger.getLogger(XPathBase.class.getName()).log(Level.SEVERE, null, ex);
} catch (IOException ex) {
Logger.getLogger(XPathBase.class.getName()).log(Level.SEVERE, null, ex);
} catch (ParserConfigurationException ex) {
Logger.getLogger(XPathBase.class.getName()).log(Level.SEVERE, null, ex);
}

}

public Document getDoc() {
return doc;
}

public InputStream getInputStream() {
return inputStream;
}

public XPath getxPath() {
return xPath;
}

public String getXmlFile() {
return xmlFile;
}

// [..] Getters & setters for other objects declared private may follow (i.e. add what you need although typically you will only need getters for the XPath, the Document, and the InputStream)

// Not really part of this class and would be (ordinarily) implemented elsewhere
public void readXPath(String evalStr)
{
try {
XPathExpression expr = xPath.compile(evalStr);
Object result = expr.evaluate(doc, XPathConstants.NODESET);
NodeList nodes = (NodeList) result;

for (int i = 0; i < nodes.getLength(); i++) {
if(nodes.item(i).getNodeValue() != null)
{
 System.out.println(nodes.item(i).getNodeValue());
}
}

} catch (XPathExpressionException ex) {
Logger.getLogger(XPathBase.class.getName()).log(Level.SEVERE, null, ex);
}

}

}

As I mentioned in the source code, the readXPath method at the tail is not really a part of this class; it is provided for illustrative purposes and allows us to quickly begin to get under the bonnet.

Let’s set up a piece of trivial xml for examination

<?xml version="1.0" encoding="UTF-8"?>

<root>
<people>
<person ptype = "author" century = "16th">William Shakespeare</person>
<person>Bill Smith</person>
<person ptype = "jockey">A P McCoy</person>
</people>
</root>

Assuming you had a file called test.xml in a resources folder you would action this as follows:


XPathBase xpb = new XPathBase("test.xml","resources/");
xpb.readXPath("//people/person/text()"); // the person elements have no child elements, so select their text nodes directly
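
To try this out without a resources folder, here is a minimal self-contained sketch that parses the same sample XML from a String and collects the node values for an expression. The class name, the inlined XML constant and the choice to wrap checked exceptions in an IllegalStateException are all my assumptions for the example; it uses the expression //people/person/text() to pick up the text of each person element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; it stands in for XPathBase plus readXPath in one place.
class XPathTextDemo {

    static final String XML =
          "<root><people>"
        + "<person ptype=\"author\" century=\"16th\">William Shakespeare</person>"
        + "<person>Bill Smith</person>"
        + "<person ptype=\"jockey\">A P McCoy</person>"
        + "</people></root>";

    /** Evaluates the expression against the XML and returns the non-null node values, in document order. */
    static List<String> values(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.compile(expression).evaluate(doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<String>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String value = nodes.item(i).getNodeValue();
                if (value != null) { // element nodes have no node value, text nodes do
                    out.add(value);
                }
            }
            return out;
        } catch (Exception ex) {
            // parser, I/O and XPath failures are all wrapped to keep the sketch short
            throw new IllegalStateException("could not evaluate " + expression, ex);
        }
    }

    public static void main(String[] args) {
        for (String name : values(XML, "//people/person/text()")) {
            System.out.println(name);
        }
    }
}
```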

The readXPath method really belongs in another class which is more concerned with the handling of XPath expressions and manipulation than the nuts and bolts of xml document setup and preparation. The following code is an example of how this might begin to look (it won’t look anything like this in the long run, but it will give you an idea of how refactoring and reshaping a class can pay big dividends).


import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import org.w3c.dom.NodeList;

public class XPathAnalyser {
    private String expression = "";
    private String file = "test.xml";
    private String resourceRoot = "resources/";
    private XPathBase xpb;
    private XPathExpression expr;

    public XPathAnalyser(String expr) {
        expression = expr;
        xpb = new XPathBase(file,resourceRoot);
    }
    public XPathAnalyser(String expr, String xFile, String resRoot) {
        expression = expr;
        file = xFile;
        resourceRoot = resRoot;
        xpb = new XPathBase(file,resourceRoot);
    }

public void readXPath(String evalStr)
    {
        try {
            XPathExpression expr = xpb.getxPath().compile(evalStr);
            Object result = expr.evaluate(xpb.getDoc(), XPathConstants.NODESET);
            NodeList nodes = (NodeList) result;

            for (int i = 0; i < nodes.getLength(); i++) {
                if(nodes.item(i).getNodeValue() != null)
                {
                         System.out.println(nodes.item(i).getNodeValue());
                }
            }

        } catch (XPathExpressionException ex) {
            Logger.getLogger(XPathAnalyser.class.getName()).log(Level.SEVERE, null, ex);
        }

	}

}

This will work but it has a few drawbacks, notably that the class is dependent on the previous implementation of XPathBase, and strong dependencies are a bad thing. A better implementation would take the XPath setup class in as a class object in its own right and introspect accordingly. This would allow full separation of context from implementation, the use of different document models, etc. We’ll live with this limitation for the moment while we begin to construct the XPathAnalyser in a more decomposed and useful shape. Most of the refactoring is about moving to a more rigorous setter/getter paradigm.
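
As a rough idea of what loosening that dependency could look like (this is a sketch of one option, not where the class ends up), the analyser below is handed an already-built org.w3c.dom.Document and knows nothing about how it was loaded. The class name DocumentAnalyser, its count method and the static parse helper are all hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical sketch: the analyser depends only on a DOM Document, not on a setup class.
class DocumentAnalyser {

    private final Document doc;
    private final XPath xPath = XPathFactory.newInstance().newXPath();

    DocumentAnalyser(Document doc) {
        this.doc = doc;
    }

    /** Returns how many nodes the expression selects in the supplied document. */
    int count(String expression) {
        try {
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException ex) {
            throw new IllegalArgumentException("bad XPath expression: " + expression, ex);
        }
    }

    /** Convenience loader for the demo only; any other way of building a Document would do. */
    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception ex) {
            throw new IllegalStateException("could not parse XML", ex);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<root><people><person>A</person><person>B</person></people></root>");
        System.out.println(new DocumentAnalyser(doc).count("//person"));
    }
}
```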

We can improve even as basic a method as readXPath with a bit of root and branch surgery. Making the compiled expression and the resultant NodeList private class variables gives us a lot more traction on the problem. Note the overloaded init method which is invoked without args in the constructor and has an overload which allows a fresh expression to be safely supplied.

public class XPathAnalyser {
    private String expression = "";
    private String file = "test.xml";
    private String resourceRoot = "resources/";
    private XPathBase xpb;
    private XPathExpression expr;
    private NodeList nodes;

    public XPathAnalyser(String expr) {
        expression = expr;
        xpb = new XPathBase(file,resourceRoot);
        init();
    }
    public XPathAnalyser(String exprStr, String xFile, String resRoot) {
        expression = exprStr;
        file = xFile;
        resourceRoot = resRoot;
        xpb = new XPathBase(file,resourceRoot);
        init();
    }

    public void init()
    {
          setExpression();
            setNodeList();
}
    public void init(String exprString)
    {
    expression = exprString;
    init();
    }

    public void setExpression()
    {
        try {
            expr = xpb.getxPath().compile(expression);
        } catch (XPathExpressionException ex) {
            Logger.getLogger(XPathAnalyser.class.getName()).log(Level.SEVERE, null, ex);
        }
    }

public void setNodeList()
    {
        try {
            nodes = (NodeList) expr.evaluate(xpb.getDoc(), XPathConstants.NODESET);
        } catch (XPathExpressionException ex) {
            Logger.getLogger(XPathAnalyser.class.getName()).log(Level.SEVERE, null, ex);
        }
}

public void readXPath()
    {
        try {
      
            for (int i = 0; i < nodes.getLength(); i++) {
                if(nodes.item(i).getNodeValue() != null)
                {
                System.out.println(nodes.item(i).getNodeValue());
                }
            }

        } catch (Exception ex) {
            Logger.getLogger(XPathAnalyser.class.getName()).log(Level.SEVERE, null, ex);
        }

	}

    public int getNodeListSize() {
        return nodes.getLength();
    }

    public NodeList getNodeList() {
        return nodes;
    }

}

The simple getters getNodeList() & getNodeListSize() are useful since we are now in a position to work with the object in a much more amenable fashion. We can dig a little deeper by adding a reader to the class to examine attributes.

// requires: import org.w3c.dom.NamedNodeMap;
public boolean containsAttributes()
{
    int k = getNodeListSize();
    for (int i = 0; i < k; i++) {
        if (nodes.item(i).hasAttributes())
        {
            return true;
        }
    }
    return false;
}

public void readAttributes()
{
    if (!containsAttributes())
    {return;}

    for (int i = 0; i < getNodeListSize(); i++) {
        NamedNodeMap nnm = nodes.item(i).getAttributes();
        if (nnm == null) {continue;} // only element nodes carry attributes
        for (int j = 0; j < nnm.getLength(); j++) {
            // getNodeName() is the name getNamedItem() looks attributes up by
            String attr = nnm.item(j).getNodeName();
            System.out.println(nnm.getNamedItem(attr));
        }
    }
}

This can obviously be extended and modified as necessary but the crux of the matter is that once you can construct the XPath and produce a viable nodelist, there’s very little you can’t do in the way of parsing and dissecting an XML document.
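
By way of a standalone illustration of the attribute reading, the sketch below selects the person elements from the sample XML and collects each element’s attributes into a sorted map. The class name, the inlined XML and the use of a TreeMap (chosen so that the printed order is predictable, since DOM attribute order is not guaranteed) are assumptions made for the example.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of XPathAnalyser.
class AttributeReader {

    static final String XML =
          "<root><people>"
        + "<person ptype=\"author\" century=\"16th\">William Shakespeare</person>"
        + "<person>Bill Smith</person>"
        + "<person ptype=\"jockey\">A P McCoy</person>"
        + "</people></root>";

    /** Returns one name-to-value map per node selected by the expression (empty if it has no attributes). */
    static List<Map<String, String>> attributesOf(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
            XPath xPath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xPath.evaluate(expression, doc, XPathConstants.NODESET);
            List<Map<String, String>> out = new ArrayList<Map<String, String>>();
            for (int i = 0; i < nodes.getLength(); i++) {
                Map<String, String> attrs = new TreeMap<String, String>();
                NamedNodeMap nnm = nodes.item(i).getAttributes(); // null for non-element nodes
                if (nnm != null) {
                    for (int j = 0; j < nnm.getLength(); j++) {
                        Node attr = nnm.item(j);
                        attrs.put(attr.getNodeName(), attr.getNodeValue());
                    }
                }
                out.add(attrs);
            }
            return out;
        } catch (Exception ex) {
            throw new IllegalStateException("could not read attributes", ex);
        }
    }

    public static void main(String[] args) {
        for (Map<String, String> attrs : attributesOf(XML, "//people/person")) {
            System.out.println(attrs);
        }
    }
}
```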

Categories: Java Tags: Java, refactoring, XML, xpath

Java fullscreen graphics – reporting basic graphics capabilities

December 26, 2010 syzygy Leave a comment

I’ve been toying with the Java full screen graphics capability recently, just to get a handle on how it hangs together since I’m becoming somewhat tired with writing windowed apps. It’s admittedly complex but like most complex things can be broken down and composed into separated tasks and components.

As part of the experimentation process I did what I usually do when presented with a complex problem, which is to find a logical start point and work my way around from there. The logical start point in this case is java.awt.GraphicsEnvironment, which typically describes the graphical environment of the machine you are working on (although it can be used to describe a remote machine). Extending out from there, it is a fairly seamless process to define the graphics environment you are confronted with, e.g.

import java.awt.DisplayMode;
import java.awt.GraphicsConfiguration;
import java.awt.GraphicsDevice;
import java.awt.GraphicsEnvironment;
import java.awt.Rectangle;

public class GraphicsReportManager {
    protected GraphicsEnvironment env = GraphicsEnvironment.getLocalGraphicsEnvironment();
    protected GraphicsDevice gd = env.getDefaultScreenDevice();
    protected boolean fullScreen = gd.isFullScreenSupported();
    protected Rectangle r = env.getMaximumWindowBounds();
    protected double ht = r.getHeight();
    protected double wt = r.getWidth();
    protected GraphicsDevice[] gdArr = env.getScreenDevices();
    protected long fastMem = gd.getAvailableAcceleratedMemory();
    protected GraphicsConfiguration[] gc = gd.getConfigurations();
    protected DisplayMode[] dm = gd.getDisplayModes();
    

    public void printGraphicsReport()
    {
        System.out.println("Current Full Screen support: " + fullScreen);
        System.out.println("Dimensions: " +  ht +  " * " + wt );
        if(fastMem < 0)  // not all systems will report fast mem and will return a negative int showing state unknown
            {System.out.println("Available accelerated memory undefined");}
        else
            { System.out.println("Available accelerated memory: " + fastMem); }

        int len = dm.length;
        System.out.println("DisplayModes = " + len + " & are: ");
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < len; i++) {
            sb.setLength(0);
            sb.append(dm[i].getHeight()).append(" * ").append(dm[i].getWidth());
            sb.append(" ");
            sb.append(dm[i].getRefreshRate()).append("hz");
            sb.append(" Depth: ");
            sb.append(dm[i].getBitDepth());
            System.out.println(sb.toString());

        }       
    }
    }

This will give us the basics of what we need to know, but obviously there is room for further definition of graphics capabilities and I will be building these up along the way as the need arises.

Getting font details back initially from GraphicsEnvironment looks to be a time-expensive operation (it feels slow and if it feels slow it probably is). However the reality is you may need this information, so you could supplement the class with the following methods:

public void printAllAvailableFonts()
    {
    String[] sFonts = env.getAvailableFontFamilyNames();
    int len = sFonts.length;
        for (int i = 0; i < len; i++) {
            String string = sFonts[i];
            System.out.println(string);
        }
}
// requires: import java.util.Locale;
public void printAllAvailableFonts(Locale locale)
    {
    String[] sFonts = env.getAvailableFontFamilyNames(locale);
    int len = sFonts.length;
        for (int i = 0; i < len; i++) {
            String string = sFonts[i];
            System.out.println(string);
        }
}

The overloaded method is potentially handy if you need to see only fonts for a particular language locale or the current locale. Constructing a Locale object is easy enough. You can do this to get the current Locale (building one from the user.language and user.country system properties also works, but the Locale constructor throws a NullPointerException if either property is missing)

 Locale loc = Locale.getDefault();

or you can just pass the constructor the language and country codes you’re interested in e.g.:

Locale loc2 = new Locale("en","US");
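
For completeness, here is a tiny sketch of a few standard ways of getting hold of a Locale without touching system properties: Locale.getDefault() for the JVM’s current default, the predefined constants such as Locale.US, and Locale.forLanguageTag (available from Java 7). The class and the describe helper are invented for the example.

```java
import java.util.Locale;

// Hypothetical demo class showing standard java.util.Locale factory routes.
class LocaleDemo {

    /** Formats a Locale as language_COUNTRY, e.g. en_US (the country part may be empty). */
    static String describe(Locale loc) {
        return loc.getLanguage() + "_" + loc.getCountry();
    }

    public static void main(String[] args) {
        System.out.println("default: " + describe(Locale.getDefault()));        // whatever the JVM is set to
        System.out.println("constant: " + describe(Locale.US));                 // predefined constant
        System.out.println("tag: " + describe(Locale.forLanguageTag("en-GB"))); // IETF language tag
    }
}
```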

Footnote (later)

I think I am probably going to abandon this experiment in Java, much as I would like to do it this way, and write the mini-app I had in mind properly in C++. The graphics stuff in Java still really isn’t good enough & is the wrong side of unpalatably messy; double-buffering as a strategy to reduce screen flicker is a real killer (for a revealing take on this see this c2 wiki link).

Categories: Java Tags: GraphicsEnvironment, Java, java graphics

Extending and improving the token location method

December 16, 2010 syzygy Leave a comment

I have been busily building up the methods which surround the token location method I outlined in yesterday’s post since I aim to build a robust StringBuffer/Builder helper class which is as flexible and intuitive as the locator methods expressed in the standard Java String class.

Obviously one thing I will want to do with the method outlined previously is to build a stripped down version which only iterates the String to search once for handling short token matches. And I’ll obviously need to determine the criteria for deciding under what circumstances to use which of the two eventual implementations of this method I arrive at. The NetBeans profiling tool is an obvious win here for examining assets such as memory and heap usage, and I have a nice little wrap-around timer utility which I can inject into my class for assessing relative speeds of differently constructed search parameters passed as arguments. I’ll have a look at it over the weekend and once that’s out of the way and the appropriate method is being called by a delegator method, I’ll optimise some of the syntax candy which I am already beginning to surround this method with. None of the methods are particularly pretty or built at this stage for speed, they’re built for facility of implementation and can be used as is pretty much out of the box.

The obvious initial methods to sugar up are ones which can use the generated List object e.g. the trivial (and inefficient) countTokens method & its overload which follows:


/** Syntax candy to count the number of incidences of a token in a given char sequence */

public int countTokens(StringBuffer sb, StringBuffer sbx)
    {
            List<Integer> l = findTokens(sb, sbx);
            return l.size();
     }

/** Syntax candy ph to count the number of incidences of a token (expressed as items) in a given List */

public int countTokens(List<Integer> l)
    {
            return l.size();
     }

Next up there’s a standard boolean check to see whether there are any matched tokens:


public boolean containsMatch(StringBuffer sb, StringBuffer sbx)
    {
    if (countTokens(sb, sbx) < 1 )
        {
        return false;
        }
    return true;
    }

OK that’s the rough & ready syntax candy out of the way, now let’s look at how we can leverage the information we have back in the List. Examining large strings (& this is particularly the case with large strings arriving from or in markup language formats such as HTML/XML etc) it’s often the case that you need to know about the position of either a single char relative to the token’s position, or alternatively another token altogether. The char based implementations for forward and reverse location are relatively simple. They both take the String to search, the offset character index point and the char which needs to be located as arguments, and look like this:


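public int findPreviousMatch(StringBuffer sb, int startPt, char toLocate)
    {
    int loc = -1;
    int ctr = Math.min(startPt, sb.length()); // never start beyond the end
    while (ctr > 0) // stop before stepping off the front of the buffer
    {
            ctr--;
            if (sb.charAt(ctr) == toLocate)
            {return ctr;}
    }
    return loc;
    }

public int findNextMatch(StringBuffer sb, int startPt, char toLocate)
    {
    int loc = -1;
    int ctr = Math.max(startPt, -1); // never start before the beginning
    int len = sb.length();
    while (ctr < len - 1) // stop before stepping off the end of the buffer
    {
            ctr++;
            if (sb.charAt(ctr) == toLocate)
            {return ctr;}
    }
    return loc;
    }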
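
It is worth knowing that StringBuffer already ships with indexOf(String, int) and lastIndexOf(String, int), so the same strictly-before / strictly-after searches can be sketched as thin wrappers around those. The class and method names here are invented, and converting the char to a String for each call is a convenience of the sketch rather than a recommendation.

```java
// Hypothetical wrappers around the standard StringBuffer search methods.
class CharLocator {

    /** Index of the last occurrence of toLocate strictly before startPt, or -1. */
    static int previous(StringBuffer sb, int startPt, char toLocate) {
        if (startPt <= 0) {
            return -1; // nothing lies before the first character
        }
        return sb.lastIndexOf(String.valueOf(toLocate), startPt - 1);
    }

    /** Index of the first occurrence of toLocate strictly after startPt, or -1. */
    static int next(StringBuffer sb, int startPt, char toLocate) {
        return sb.indexOf(String.valueOf(toLocate), startPt + 1);
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("<a><b>");
        System.out.println(previous(sb, 3, '>') + " " + next(sb, 3, '>'));
    }
}
```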

We need to do the same thing for tokens. Arguments are an int indicating the starting point to search around, the StringBuffer to search (sb) and the token to search for (sbx) expressed again as a StringBuffer.


public int findTokenPriorToOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    int loc = -1;
    if (start > sb.length())
    {start = sb.length();} // move start to be the sb.length() if start > the size of inc string

    int pos = start;
    List<Integer> l = findTokens(sb, sbx);
    if(l.size() == 0){return loc;} // no match

    // only 1 item and bigger than start? return -1
    if ((l.size() == 1) && ((Integer) l.get(0) > start) )
    {
       return loc;
    }
    // only 1 item and less than start? return token startpoint
    if ((l.size() == 1) && ((Integer) l.get(0) < start) )
    {
       return (Integer) l.get(0);
    }

    Iterator it = l.iterator();
    while(it.hasNext())
    {
    int val = (Integer) it.next();
    if (val > start){return loc;}
    if (val < start)
    {
        loc = val;
    }

    }

    return loc;
}
public int findTokenAfterOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    int loc = -1;
    if (start > sb.length())
    {        return  -1; } // won't be found.... ret -1

    int pos = start;
    List<Integer> l = findTokens(sb, sbx);
    if(l.size() == 0){       return loc; }  // no match

    // only 1 item and less than start? return -1
    if ((l.size() == 1) && ((Integer) l.get(0) < start) )
    {
       return loc;
    }
    // only 1 item and > start? return token startpoint
    if ((l.size() == 1) && ((Integer) l.get(0) > start) )
    {
       return (Integer) l.get(0);
    }
    Iterator it = l.iterator();
   

 while(it.hasNext())
    {
    int val = (Integer) it.next();
       if (val > start){return val;}
    }
    // fallthrough
    return loc;
}

Categories: Java Tags: Java, Programming, refactoring, String Manipulation, StringBuffer

Locating token positions in StringBuffers

December 15, 2010 syzygy Leave a comment

This might be useful to someone along the way; I certainly have practical uses for it of my own for fast exact matching & location of long tokens in StringBuffers. It’s probably up for a certain amount of optimisation since it scans the searchable base StringBuffer object twice. On the other hand, in certain situations this is probably more efficient than trying to do it all in one bite: it builds two List objects, a preliminary one which contains possible matches (positions where both the first and last characters of the token line up), and from this list of possible matches it then refines its search to provide a definitive List of exact matches. The longer the search token, the more efficient this ultimately is. The method could also be expanded or enhanced to do regex style pattern matching and case insensitive matching.

/**
* Method for searching the StringBuffer sb to identify the int locations of instances of the contents of StringBuffer sbx
*
* Returns a list of Integer positions of all occurrences of an exact match of the passed StringBuffer sbx in
*  StringBuffer sb
*/
public List<Integer> findTokens(StringBuffer sb, StringBuffer sbx)
    {
            int ctr = 0;
            int len = sb.length();
            int k = sbx.length();
            char tokenStart = sbx.charAt(0);
            char tokenEnd = sbx.charAt(k - 1);

            List possibles = new ArrayList();
            for (int i = 0; i < (len - (k - 1)); i++) {
            if((sb.charAt(i) == tokenStart) && (sb.charAt(i + (k - 1)) == tokenEnd))
            {
                possibles.add(i);
            }
            }

            List definites = new ArrayList();
            Iterator it = possibles.iterator();
            while (it.hasNext())
            {
            int start = (Integer) it.next();
            boolean OK = true;
            int tokCtr = 0;
            for (int i = start; i < start + (k - 1); i++) {
                if(sb.charAt(i) != sbx.charAt(tokCtr))
                {OK = false;} // probably ought to break/label here if you want to make it bleed (I don't, need the trace!)

                tokCtr++;
                }
               if(OK) // don't add if not ok!
               {
                    definites.add(start);
                }
            }
            return definites;
     }
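
On the case-insensitive point, one possible route (a sketch, not something I have profiled) is to lean on java.util.regex: Pattern.quote turns the token into a literal pattern, and Matcher works directly on any CharSequence, StringBuffers included. The class and method names are made up; note that Pattern.CASE_INSENSITIVE on its own only folds ASCII case. Restarting the search one character after each hit means overlapping matches are reported, as they are by findTokens.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical case-insensitive variant of the token finder.
class RegexTokenFinder {

    /** Returns the start index of every (possibly overlapping) case-insensitive match of token in text. */
    static List<Integer> findTokensIgnoreCase(CharSequence text, CharSequence token) {
        List<Integer> positions = new ArrayList<Integer>();
        if (token.length() == 0) {
            return positions; // an empty token matches nothing useful
        }
        // quote() makes regex metacharacters in the token match literally
        Pattern p = Pattern.compile(Pattern.quote(token.toString()), Pattern.CASE_INSENSITIVE);
        Matcher m = p.matcher(text);
        int from = 0;
        while (from <= text.length() && m.find(from)) {
            positions.add(m.start());
            from = m.start() + 1; // restart just past this hit so overlaps are found
        }
        return positions;
    }

    public static void main(String[] args) {
        System.out.println(findTokensIgnoreCase(new StringBuffer("abABab"), new StringBuffer("ab")));
    }
}
```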

Categories: Java Tags: Java, String Manipulation, StringBuffer

Inspecting HashMaps

November 7, 2010 syzygy Leave a comment

One query which seems to turn up frequently in the blog stat reports is how to iterate or print a HashMap so that its contents are visible. The bottom line is that HashMap itself doesn’t directly support an iterator. However the keySet() method of HashMap returns a Set object which can be iterated without any problem.

public void printHashMap(HashMap ha)
{
Set keys = ha.keySet();
Iterator it = keys.iterator();

String sx = "";

while (it.hasNext()) {
          Object key = it.next();
        String sKey = String.valueOf(key); // HashMap permits a null key and null values
        Object val = ha.get(key);
        String kVal = String.valueOf(val);
        sx += sKey + " - " + kVal + "\n";

    }
    System.out.println(sx);

}
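
An alternative worth sketching is to iterate the Set returned by entrySet() instead, which hands you each key together with its value and so avoids the extra get() per key. The class and method names are invented for the example, and the output format simply mirrors the method above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; works for any Map, not just HashMap.
class MapPrinter {

    /** Builds one "key - value" line per entry, in the map's own iteration order. */
    static String describe(Map<?, ?> map) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<?, ?> entry : map.entrySet()) {
            // append(Object) copes with null keys and values by writing "null"
            sb.append(entry.getKey()).append(" - ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap is used here only so the demo output has a predictable order
        Map<String, Integer> stock = new LinkedHashMap<String, Integer>();
        stock.put("apples", 3);
        stock.put("pears", null);
        System.out.print(describe(stock));
    }
}
```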

Obviously once you’ve got the trick of handing off the iteration of a HashMap to the Set returned by keySet() life gets altogether more comfortable; no longer is that HashMap a ferociously daunting object lurking in memory, a monster whose contents you can’t be certain of at any time, but it becomes an altogether more amenable and house-trained creature.

I’ll get round to a few more of these over the coming weeks, in between writing some more semantic parsing code which I’ll also be blogging about.

Categories: Java Tags: HashMaps, Iteration, Java

JRuby using ScriptEngine, invoking methods, passing args

March 4, 2010 syzygy Leave a comment

Doing the same job with JRuby was, if anything, even easier than getting Jython up and running. Download jar, classpath it, run up a couple of units to test POC and then a quick and dirty Hello World:

   ScriptEngine engine = new ScriptEngineManager().getEngineByName("jruby");
    String f = "puts 'JRuby says hello world' " ;
        try {
            engine.eval(f);
        } catch (ScriptException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }

No indentation issues to concern myself with, and because JRuby in most cases is agnostic about double or single quotes for String representations, no escaping either. Now I want to play with some of my mixins and classes to see what happens. First up in my list of concerns is that most of my Ruby classes were written back in the day when mastodons walked the earth, when the command line interface was at the cutting edge of GUI design, and gems were rare and precious commodities. By a flickering candle’s light I crafted a cunning set of .rb modules to give me access to a number of utilities, one of which was a toolset for doing some basic math stuff which ruby lacked at the time. Now it’s probably all plug and play and not only will it do the math for you, it’ll probably make you tea and hand around the biscuits whilst doing so. Let’s consider the following power method which I’ve isolated from its friends and loved ones and put into solitary in a new file called testpow.rb.

# raise a number to power
def power( a, pow )
ctr = 1
b = a
  while ctr < pow
    b = a * b
    ctr += 1
  end
return b
end

We have two issues to deal with here. We need to call the method and also pass arguments to the method via the ScriptEngine. The way this is done is with the Invocable interface (the relevant JavaDoc is here).

Now all we need to do to make use of this method is something not a million light years removed from this:

ScriptEngine engine = new ScriptEngineManager().getEngineByName("jruby");
String pathToRubyScripts = "X:/anon/productionScripts/"; // obviously make this appropriate to your .rb file locations
    FileReader fr = null;
    Invocable inv = null;
        try {
            fr = new FileReader(pathToRubyScripts + "testpow.rb");
        } catch (FileNotFoundException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
        try {
              engine.eval(fr);
               inv = (Invocable) engine;
            try {
                long lg = (Long) inv.invokeFunction("power", 10,3); // invokeFunction(String name, Object... args)
                System.out.println("Answer: " + lg);
            } catch (NoSuchMethodException ex) {
                Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
            }
        } catch (ScriptException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }

Quietly I’m impressed with just how easy it is to implement scripting in Java.

Categories: Java Tags: Invocable, Java, JRuby, ScriptEngine

Embedding Jython script in Java with ScriptEngine…

March 3, 2010 syzygy 1 comment

In an idle hour I did some more playing about with the Java ScriptEngine (see the earlier post on !Reinventing the Wheel) and decided to see just how easy it was to introduce another ScriptEngine into the mix. It was a toss-up between JRuby and Jython, but on balance I went with Jython because it has certain benefits for work I frequently get engaged in relating to middleware, although I am altogether less familiar and comfortable with Jython/Python than JRuby/Ruby.

It was spectacularly easy, in fact depressingly so, not a technical challenge in sight. Download and classpath jar. Register Jython with ScriptEngine, run up a JUnit to test it’s available etc recycling the stringEval method outlined in the earlier post to test POC.
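The availability check mentioned above could look something like the following. This is only a minimal sketch: the class name EngineCheck and the helper requireEngine are made up for illustration, and it relies on the fact that getEngineByName returns null rather than throwing when no engine is registered under the requested name.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineFactory;
import javax.script.ScriptEngineManager;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original post.
public class EngineCheck {

    // Collect every name under which a script engine is currently registered.
    public static List<String> availableEngineNames() {
        List<String> names = new ArrayList<>();
        for (ScriptEngineFactory factory : new ScriptEngineManager().getEngineFactories()) {
            names.addAll(factory.getNames());
        }
        return names;
    }

    // Turn the silent null from getEngineByName into an explicit, descriptive failure.
    public static ScriptEngine requireEngine(String name) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(name);
        if (engine == null) {
            throw new IllegalStateException("No script engine registered as '" + name
                    + "'; visible engines: " + availableEngineNames());
        }
        return engine;
    }

    public static void main(String[] args) {
        System.out.println("Visible engines: " + availableEngineNames());
        try {
            ScriptEngine engine = requireEngine("jython");
            System.out.println("Found: " + engine.getFactory().getEngineName());
        } catch (IllegalStateException ex) {
            // Typically means the Jython jar is not on the classpath.
            System.out.println(ex.getMessage());
        }
    }
}
```

Failing loudly here is a design choice: otherwise a missing jar only shows up later as a NullPointerException on the first eval call.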

OK next up we want Java to be able to execute a Jython script proper. Here’s a simple Jython script to get the current directory (scarfed & modified from WebsphereTools):

import sys
import java.io as javaio
currentdir = javaio.File (".")
print "Current directory : " + currentdir.getCanonicalPath()

Worked first time in standalone mode from Jython itself.

Next up is to see whether we can make it execute this script when embedded as a String in Java. Again depressingly easy.

static final private ScriptEngineManager sm = new ScriptEngineManager();
static final private ScriptEngine sEngine = sm.getEngineByName("jython");
static final String lsep = System.getProperty("line.separator");

public static void main(String[] args)
{
    String a = "import sys" + lsep;
    a += "import java.io as javaio" + lsep;
    a += "currentdir = javaio.File (\".\")" + lsep;
    a += "print \"Current directory : \" + currentdir.getCanonicalPath()" + lsep;
    Object anon = runScript(a, "jython");
}

public static Object runScript(String script, String engine) {
    Object res = "";
    if (engine.equals("jython")) {
        try {
            res = sEngine.eval(script);
        } catch (ScriptException ex) {
            Logger.getLogger(Main.class.getName()).log(Level.SEVERE, null, ex);
        }
    }
    return res;
}

Since I’m using NetBeans (can I say how much I love NetBeans again?) the answer this will print for me is:

Current directory : C:\Users\XXX\Documents\NetBeansProjects\TestEngineFactory (where XXX = my user base dir, names changed to protect the innocent for all the fairly obvious reasons).

The only real gotchas when doing something like this are to remember to ensure quotes are escaped correctly, that lines have a line separator, and that you don’t trim or otherwise format incoming Jython script Strings when reading existing Jython scripts, since Jython (being an implementation of Python) has those dull indentation rules etc.
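To make the last of those gotchas concrete, here is a minimal sketch of reading a script without disturbing its indentation. The class name ScriptLoader and the method readVerbatim are hypothetical; the only real point is that each line is appended exactly as read, with no trim(), and that every line gets a separator back.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical helper class, not part of the original post.
public class ScriptLoader {

    // Read the whole script, keeping each line's leading whitespace and
    // terminating every line with '\n' (readLine() strips the original terminator).
    public static String readVerbatim(Reader source) {
        try (BufferedReader reader = new BufferedReader(source)) {
            StringBuilder script = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                script.append(line).append('\n'); // deliberately no trim()
            }
            return script.toString();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        // A StringReader stands in for a FileReader so the sketch runs anywhere.
        String script = readVerbatim(new StringReader("if True:\r\n    print \"indented\""));
        System.out.print(script);
    }
}
```

In real use you would pass a FileReader pointing at the .py file and hand the returned String to the engine's eval method.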

Think I’ll stop blogging now and see how easy it is to do the same job with JRuby to access my arsenal of Ruby classes and mixins. I’ll let you know how I got on with an update later in the week.

Categories: Java Tags: Java, JRuby, Jython, Python, ScriptEngine, String Manipulation

HashMap->ArrayList breakdown and reconstitution of documents in Java

March 2, 2010 syzygy Leave a comment

Given the general utility of HashMaps with ArrayLists as values, as illustrated in the earlier post HashMap key -> ArrayList, it’s fairly obvious that we can in most cases (actually all, but I’m open to persuasion that somewhere an exception may exist) represent a document, such as a short story by Lord Dunsany, a play by William Shakespeare, an XML or SOAP document, in fact any document in which repetitious tokens occur or are likely to occur, as a set of keys indicating the tokens, and an ArrayList of their relative positions in the document.

Let’s as an example take a block of text such as that found in Churchill’s speech of November 10, 1942 in the wake of Alexander and Montgomery repelling Rommel’s thrust at Alexandria: “Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.” It’s short, sweet, and contains the criteria we need for a good example, notably multiple repetitive tokens. For the purposes of this examination we’ll totally neuter the punctuation and capitalisation so that we just have tokens: now this is not the end it is not even the beginning of the end but it is perhaps the end of the beginning. At this juncture it is still recognisable as Churchill’s text. Let’s turn this into a reproducible HashMap in which each unique word is a key to a value set which is an ArrayList containing the positions of the word in the text which we supplied it with. I won’t bore you with repeating the addArrayListItem method (it’s in the post immediately below this one…) and the printOutHashMap method is somewhat self-evident but included for completeness.

public void someBoringMethodSig()
{
    String inString = "now this is not the end it is not even the beginning of the end ";
    inString += "but it is perhaps the end of the beginning";

    HashMap ha = new HashMap();

    String[] tok = inString.split(" ");
    int k = tok.length;

    for (int i = 0; i < k; i++)
    {
        addArrayListItem(ha, tok[i], i);
    }

    printOutHashMap(ha);
    // assemble(ha);
}

public void printOutHashMap(HashMap hamp)
{
    Collection c = hamp.values();
    Collection b = hamp.keySet();
    Iterator itr = c.iterator();
    Iterator wd = b.iterator();
    System.out.println("HASHMAP CONTENTS");
    System.out.println("================");
    while (itr.hasNext()) {
        System.out.print("Key: >>" + wd.next() + "<< Values: ");
        ArrayList x = (ArrayList) itr.next();
        Iterator lit = x.iterator();
        while (lit.hasNext())
        {
            System.out.print(lit.next() + " ");
        }
        System.out.println("");
    }
}

This will give us an output which looks very much like

HASHMAP CONTENTS
================
Key: >>not<< Values: 3 8
Key: >>of<< Values: 12 21
Key: >>but<< Values: 15
Key: >>is<< Values: 2 7 17
Key: >>beginning<< Values: 11 23
Key: >>it<< Values: 6 16
Key: >>now<< Values: 0
Key: >>even<< Values: 9
Key: >>the<< Values: 4 10 13 19 22
Key: >>perhaps<< Values: 18
Key: >>this<< Values: 1
Key: >>end<< Values: 5 14 20
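Since addArrayListItem isn't repeated in this post, here is a self-contained sketch of the same indexing idea using generics. It is not the original method: the class name PositionIndex and the method build are invented for illustration, and a LinkedHashMap is assumed purely so that keys print in first-seen order rather than in HashMap's arbitrary order.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the HashMap + addArrayListItem combination.
public class PositionIndex {

    // Map each distinct token to the list of positions at which it occurs.
    public static Map<String, List<Integer>> build(String text) {
        Map<String, List<Integer>> index = new LinkedHashMap<>();
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return index; // nothing to index
        }
        String[] tokens = trimmed.split("\\s+");
        for (int i = 0; i < tokens.length; i++) {
            // Create the position list on first sight of a token, then append.
            index.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return index;
    }

    public static void main(String[] args) {
        String text = "now this is not the end it is not even the beginning of the end "
                + "but it is perhaps the end of the beginning";
        build(text).forEach((word, positions) ->
                System.out.println("Key: >>" + word + "<< Values: " + positions));
    }
}
```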

We can obviously reassemble this by recourse to the values, either in forward order (starting at 0) or in reverse order (finding the max int value and reassembling in a decremental loop). The implications of this are significant when one considers the potential applications of this sort of document treatment from a number of perspectives, notably IT security, improved data transmissibility by reduction, cryptography (where one would not transmit the keys but a serializable numeric indicator to a separately transmitted (or held) common dictionary), etc, etc.
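As one possible reading of the common-dictionary idea, the sketch below replaces each word with its index in a shared word list, so that only the numbers need to travel. Everything in it (the class DictionaryCodec, the encode and decode methods, the choice of a plain List as the dictionary) is an assumption made for illustration, and it offers no security in itself.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of sending numeric indicators instead of words.
public class DictionaryCodec {

    // Replace each token with its index in the dictionary.
    // Unseen tokens are appended to the dictionary (the list is modified in place).
    public static int[] encode(String text, List<String> dictionary) {
        Map<String, Integer> ids = new HashMap<>();
        for (int i = 0; i < dictionary.size(); i++) {
            ids.put(dictionary.get(i), i);
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return new int[0];
        }
        String[] tokens = trimmed.split("\\s+");
        int[] codes = new int[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            Integer id = ids.get(tokens[i]);
            if (id == null) {
                id = dictionary.size();
                dictionary.add(tokens[i]);
                ids.put(tokens[i], id);
            }
            codes[i] = id;
        }
        return codes;
    }

    // Look each code up in the same dictionary to recover the text.
    public static String decode(int[] codes, List<String> dictionary) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < codes.length; i++) {
            if (i > 0) {
                out.append(' ');
            }
            out.append(dictionary.get(codes[i]));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<String> dictionary = new ArrayList<>();
        String text = "now this is not the end it is not even the beginning of the end "
                + "but it is perhaps the end of the beginning";
        int[] codes = encode(text, dictionary);
        System.out.println(Arrays.toString(codes));
        System.out.println(decode(codes, dictionary));
    }
}
```

Both ends must hold the same dictionary in the same order for the round trip to work.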

Generally the reassembly process looks something like the following:


public void assemble(HashMap hamp)
{
    Collection c = hamp.values();
    Collection b = hamp.keySet();

    int posCtr = 0;

    int max = 0;

    Iterator xAll = hamp.values().iterator();
    while (xAll.hasNext()) {
        ArrayList xxx = (ArrayList) xAll.next();
        max += xxx.size();
    }

    String stOut = "";
    boolean unfinished = true;

    while (unfinished)
    {
        Iterator itr = c.iterator();
        Iterator wd = b.iterator();
        start:
        while (unfinished) {

            String tmp = (String) wd.next(); // next key
            ArrayList x = (ArrayList) itr.next(); // next ArrayList
            Iterator lit = x.iterator(); // next ArrayList iterator

            while (lit.hasNext())
            {
                //System.out.println(lit.next());
                if ((Integer.valueOf((Integer) lit.next()) == posCtr)) // if it matches the positional counter then this is the word we want...
                {
                    stOut = stOut + " " + tmp; // add the word in tmp to the output String
                    posCtr++;

                    wd = b.iterator(); //don't forget to reset the iterators to the beginning!
                    itr = c.iterator();
                    if (posCtr == max) // last word to deal with....
                    {
                        System.out.println("FINISHED");
                        unfinished = false;
                    }
                    break start; // otherwise goto the start point, rinse, repeat (irl this would be handled more elegantly, illustrative code here!)
                }
            }
        }
    }
    System.out.println("ASSEMBLED: " + stOut);
}

This can obviously be improved in a number of ways, particularly moving from a String-based paradigm to one in which char[] array representations replace Strings in the HashMap.
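A different improvement from the char[] one, sketched here under the assumption that the positions run from 0 to n-1 with no gaps or duplicates, is to drop the repeated scanning altogether: size an array to the total number of positions and write each word straight into its slots. The class name Reassembler and its assemble method are hypothetical and use a generic Map rather than the raw HashMap above.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-pass alternative to the labelled-loop reassembly.
public class Reassembler {

    // Rebuild the text by placing every word at each of its recorded positions.
    // Assumes positions cover 0..n-1 exactly once; a gap would leave a null slot.
    public static String assemble(Map<String, List<Integer>> index) {
        int total = 0;
        for (List<Integer> positions : index.values()) {
            total += positions.size();
        }
        String[] slots = new String[total];
        for (Map.Entry<String, List<Integer>> entry : index.entrySet()) {
            for (int position : entry.getValue()) {
                slots[position] = entry.getKey();
            }
        }
        return String.join(" ", slots);
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> index = new HashMap<>();
        index.put("the", Arrays.asList(0, 2, 5));
        index.put("end", Arrays.asList(1));
        index.put("of", Arrays.asList(4));
        index.put("beginning", Arrays.asList(3, 6));
        System.out.println("ASSEMBLED: " + assemble(index));
    }
}
```

Each position is visited once, so the work grows linearly with the length of the document instead of rescanning the map for every word.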

Categories: Java Tags: ArrayList, Document assembly, Document decomposition, HashMaps, Java, String Manipulation
