
Posts Tagged ‘Programming’

Extending and improving the token location method

December 16, 2010

I have been busily building up the methods that surround the token location method I outlined in yesterday's post. My aim is a robust StringBuffer/StringBuilder helper class that is as flexible and intuitive as the locator methods in the standard Java String class.

Obviously one thing I will want to do with the method outlined previously is to build a stripped-down version that iterates over the String being searched only once, for handling short token matches. I'll also need to work out the criteria for deciding when to use which of the two eventual implementations. The NetBeans profiler is an obvious win for examining memory and heap usage, and I have a nice little wrap-around timer utility that I can inject into my class to compare the relative speeds of different search arguments. I'll look at that over the weekend; once it's out of the way and a delegator method is calling the appropriate implementation, I'll optimise some of the syntax candy I am already beginning to surround this method with. None of these methods are particularly pretty or built for speed at this stage. They're built for ease of implementation and can be used pretty much as-is, out of the box.

The obvious initial methods to sugar up are ones that can use the generated List object, e.g. the trivial (and inefficient) countTokens method and its overload, which follow:


/** Syntax candy to count the number of incidences of a token in a given char sequence.
 *  Relies on findTokens() from the previous post, which returns the token start positions. */
public int countTokens(StringBuffer sb, StringBuffer sbx)
    {
    List<Integer> l = findTokens(sb, sbx);
    return l.size();
    }

/** Syntax candy to count the number of incidences of a token (expressed as items) in a given List */
public int countTokens(List<Integer> l)
    {
    return l.size();
    }

Next up there’s a standard boolean check to see whether there are any matched tokens:


public boolean containsMatch(StringBuffer sb, StringBuffer sbx)
    {
    return countTokens(sb, sbx) > 0; // true if at least one match was found
    }
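To try the syntax candy without the previous post's code to hand, here is a self-contained sketch. Its findTokens is a hypothetical stand-in built on StringBuffer.indexOf(String, int), assuming the original returns token start positions in ascending order; the class name TokenCounter is likewise made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: findTokens() is a hypothetical stand-in for the method from the
// previous post, built on StringBuffer.indexOf(String, int).
class TokenCounter {

    // Returns the ascending start index of every occurrence of sbx in sb.
    // Searching resumes one char after each hit, so overlapping matches count.
    static List<Integer> findTokens(StringBuffer sb, StringBuffer sbx) {
        List<Integer> l = new ArrayList<Integer>();
        if (sbx.length() == 0) {
            return l; // an empty token is treated as matching nothing
        }
        String token = sbx.toString();
        int pos = sb.indexOf(token);
        while (pos != -1) {
            l.add(pos);
            pos = sb.indexOf(token, pos + 1);
        }
        return l;
    }

    static int countTokens(StringBuffer sb, StringBuffer sbx) {
        return findTokens(sb, sbx).size();
    }

    static boolean containsMatch(StringBuffer sb, StringBuffer sbx) {
        return countTokens(sb, sbx) > 0;
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("abcabcab");
        System.out.println(countTokens(text, new StringBuffer("ab")));   // 3
        System.out.println(containsMatch(text, new StringBuffer("xy"))); // false
    }
}
```

Because countTokens(sb, sbx) builds the whole List only to take its size, code that already holds the List from findTokens can call the countTokens(List) overload and avoid a second scan.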

OK, that's the rough-and-ready syntax candy out of the way; now let's look at how we can leverage the information we get back in the List. When examining large strings (particularly those arriving from or in markup formats such as HTML or XML), you often need to know the position of either a single char relative to the token's position, or of another token altogether. The char-based implementations for forward and reverse location are relatively simple. Both take the StringBuffer to search, the offset character index and the char to locate as arguments, and look like this:


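public int findPreviousMatch(StringBuffer sb, int startPt, char toLocate)
    {
    // start at the char immediately before startPt (clamped to the end of sb)
    // and walk backwards; the loop bound stops us before index -1
    for (int ctr = Math.min(startPt, sb.length()) - 1; ctr >= 0; ctr--)
        {
        if (sb.charAt(ctr) == toLocate)
            {return ctr;}
        }
    return -1; // not found
    }

public int findNextMatch(StringBuffer sb, int startPt, char toLocate)
    {
    int len = sb.length();
    // start at the char immediately after startPt and walk forwards;
    // the loop bound stops us before index len
    for (int ctr = Math.max(startPt + 1, 0); ctr < len; ctr++)
        {
        if (sb.charAt(ctr) == toLocate)
            {return ctr;}
        }
    return -1; // not found
    }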
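A quick way to sanity-check the two char locators is to run them against a small piece of markup. The sketch below (the class name CharLocator is made up) uses bounds-safe versions that search strictly before or after startPt, and compares them with the JDK's own StringBuffer.lastIndexOf(String, int) and indexOf(String, int), which cover similar ground for single-char strings.

```java
class CharLocator {

    // Scans backwards from the char just before startPt; -1 if not found.
    static int findPreviousMatch(StringBuffer sb, int startPt, char toLocate) {
        for (int ctr = Math.min(startPt, sb.length()) - 1; ctr >= 0; ctr--) {
            if (sb.charAt(ctr) == toLocate) {
                return ctr;
            }
        }
        return -1;
    }

    // Scans forwards from the char just after startPt; -1 if not found.
    static int findNextMatch(StringBuffer sb, int startPt, char toLocate) {
        for (int ctr = Math.max(startPt + 1, 0); ctr < sb.length(); ctr++) {
            if (sb.charAt(ctr) == toLocate) {
                return ctr;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("<p>hello</p>");
        int offset = 4; // the 'e' in "hello"
        System.out.println(findPreviousMatch(sb, offset, '<')); // 0
        System.out.println(findNextMatch(sb, offset, '>'));     // 11
        // JDK equivalents: lastIndexOf searches backwards from (and including) its index
        System.out.println(sb.lastIndexOf("<", offset - 1));    // 0
        System.out.println(sb.indexOf(">", offset + 1));        // 11
    }
}
```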

We need to do the same thing for tokens. The arguments are an int indicating the starting point to search around, the StringBuffer to search (sb) and the token to search for (sbx), again expressed as a StringBuffer.


public int findTokenPriorToOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    int loc = -1;
    if (start > sb.length())
        {start = sb.length();} // clamp start to the length of the incoming string

    List<Integer> l = findTokens(sb, sbx); // token start positions, ascending
    for (int val : l)
        {
        if (val >= start) {break;} // reached the offset: stop looking
        loc = val;                 // latest token start before the offset so far
        }
    return loc; // -1 if no token starts before start
    }

public int findTokenAfterOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    if (start > sb.length())
        {return -1;} // offset is beyond the string: nothing can follow it

    List<Integer> l = findTokens(sb, sbx); // token start positions, ascending
    for (int val : l)
        {
        if (val > start) {return val;} // first token start after the offset
        }
    return -1; // no token starts after start
    }
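To see the two token locators working on markup, here is a self-contained sketch. As before, findTokens is a hypothetical indexOf-based stand-in for the previous post's method, and the class name TokenOffsets is invented; the example asks which opening <div> tags surround the "b" in a small HTML fragment.

```java
import java.util.ArrayList;
import java.util.List;

class TokenOffsets {

    // Hypothetical stand-in for findTokens(): ascending start indices of sbx in sb.
    static List<Integer> findTokens(StringBuffer sb, StringBuffer sbx) {
        List<Integer> l = new ArrayList<Integer>();
        if (sbx.length() == 0) {
            return l;
        }
        String token = sbx.toString();
        for (int pos = sb.indexOf(token); pos != -1; pos = sb.indexOf(token, pos + 1)) {
            l.add(pos);
        }
        return l;
    }

    // Start of the last token beginning strictly before start, or -1.
    static int findTokenPriorToOffset(int start, StringBuffer sb, StringBuffer sbx) {
        int loc = -1;
        for (int val : findTokens(sb, sbx)) {
            if (val >= start) {
                break;
            }
            loc = val;
        }
        return loc;
    }

    // Start of the first token beginning strictly after start, or -1.
    static int findTokenAfterOffset(int start, StringBuffer sb, StringBuffer sbx) {
        for (int val : findTokens(sb, sbx)) {
            if (val > start) {
                return val;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuffer html = new StringBuffer("<div>a</div><div>b</div>");
        StringBuffer open = new StringBuffer("<div>");
        int offset = html.indexOf("b"); // 17
        System.out.println(findTokenPriorToOffset(offset, html, open)); // 12
        System.out.println(findTokenAfterOffset(offset, html, open));   // -1
    }
}
```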

Fast (StringBuffer-centric) location of substrings

February 20, 2010

Over the past few days I've extended and refactored my StringBuffer-oriented fast String search classes a little. I have to say that the performance improvement over regexps when dissecting large String objects (and we're talking gigabytes of Strings here) is paying significant dividends. Constant refactoring is also a paradigm I've paid much attention to over the years. There's no such thing in my book as "perfect code"; there is only code that shows itself to be in less need of improvement than other code. One piece of code high on my hit list was the locateStringContents method, which was ripe for extension to locate substrings and not just single chars. The first thing that struck me on revisiting it was that I had done something I ought not to have done: I was calling a method unnecessarily in a for loop condition, so sb.length() was re-evaluated on every iteration (here's the code down to the offending line):

public int[] locateStringContents(StringBuffer sb, StringBuffer sbCheckable)   {
    List l = new ArrayList();
    for (int i = 0; i < sb.length(); i++)
[..]

So I fixed that, and it now looks like this. (I had compounded the error with a similar call to sbCheckable.length() in an inner loop, so I hoisted that at the same time so that it is only called once.)

public int[] locateStringContents(StringBuffer sb, StringBuffer sbCheckable)    {
    List l = new ArrayList();
    int iCheckSize = sbCheckable.length();
    int iSbSize = sb.length();
    for (int i = 0; i < iSbSize; i++)
[..]

These look like small, superficial changes, but when iterated over several hundred million times the time savings can be significant, and it was important to pick them up whilst optimising my code to make it bleed. I'm now systematically doing similar clean-ups on a number of methods that contain similar lapses. It's all too easily done, particularly when working to tight timescales, and revisiting code and classes that are substantial consumers of processor resources can often pay big dividends.

After the cleanup I turned my attention to fast substring location. I initially approached this linearly, with a nested looping construct that matched character for character, but the result was frankly horrible to look at and also less efficient than my subsequent approach. That approach was to deal with the problem pragmatically and replace the jungle of nested loops and conditional evaluations with a boolean method, containsMatch, which receives a slice of the String under examination, exactly as long as the sequence being sought and starting at each candidate position, and compares it char by char with the comparator sequence.


/**
 * Method to expose all int start positions of a substring within a greater String,
 * returning an int array containing their locations
 **/
public int[] locateSubstrings(StringBuffer sb, StringBuffer sbCheckable)
    {
    List<Integer> l = new ArrayList<Integer>();
    int iCheckSize = sbCheckable.length();
    if (iCheckSize == 0) { return new int[0]; } // an empty pattern has no meaningful matches
    int iLastStart = sb.length() - iCheckSize; // last index a match can start at, so no overruns

    for (int i = 0; i <= iLastStart; i++) // <= so a match ending on the final char is found
        {
        if (containsMatch(sb.subSequence(i, i + iCheckSize), sbCheckable))
            {
            l.add(i);
            }
        }
    return toIntArray(l);
    }

/**
 * Method to determine whether a sliced CharSequence cs, derived from a String to evaluate,
 * is equivalent to a comparative StringBuffer
 **/
public boolean containsMatch(CharSequence cs, StringBuffer sCheckable)
    {
    int csLength = cs.length();
    if (csLength != sCheckable.length()) { return false; } // different lengths can't match
    for (int i = 0; i < csLength; i++)
        {
        if (cs.charAt(i) != sCheckable.charAt(i)) { return false; } // fail if any one of the chars does not match
        }
    return true; // didn't fail ergo must be a match
    }

public int[] toIntArray(List<Integer> integerList)
    {
    int iSize = integerList.size();
    int[] intArray = new int[iSize];
    for (int i = 0; i < iSize; i++)
        {
        intArray[i] = integerList.get(i); // auto-unboxes Integer to int
        }
    return intArray;
    }
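One variation that may be worth profiling (a suggestion, not part of the code above): StringBuffer.subSequence() returns a new String for every candidate position, so comparing chars in place avoids one allocation per position. The class and method names below are hypothetical; the search bounds and empty-pattern handling match locateSubstrings.

```java
import java.util.ArrayList;
import java.util.List;

class SubstringLocator {

    // Same results as a slice-and-compare search, but compares chars in place
    // instead of creating a temporary String with subSequence() at each index.
    static int[] locateSubstringsInPlace(StringBuffer sb, StringBuffer sbCheckable) {
        List<Integer> l = new ArrayList<Integer>();
        int iCheckSize = sbCheckable.length();
        if (iCheckSize == 0) {
            return new int[0]; // an empty pattern is treated as having no matches
        }
        int iLastStart = sb.length() - iCheckSize; // last index a match can start at
        for (int i = 0; i <= iLastStart; i++) {
            int j = 0;
            while (j < iCheckSize && sb.charAt(i + j) == sbCheckable.charAt(j)) {
                j++;
            }
            if (j == iCheckSize) {
                l.add(i); // every char matched: record the start position
            }
        }
        int[] out = new int[l.size()];
        for (int k = 0; k < out.length; k++) {
            out[k] = l.get(k);
        }
        return out;
    }

    public static void main(String[] args) {
        int[] hits = locateSubstringsInPlace(new StringBuffer("abcabcab"), new StringBuffer("ab"));
        System.out.println(java.util.Arrays.toString(hits)); // [0, 3, 6]
    }
}
```

The final hit at index 6 ends on the last char of the buffer, which is exactly the case an exclusive loop bound would miss.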

Refactoring simple code

September 25, 2009

In this essay I'll outline some of the approaches, thought processes and strategies I use when refactoring code, continuing with some of the very simple themes from Fun with String vectors… Part 1. Let's say we now want to join two String vectors (usually derived from String arrays). It's something I often have to do when handling blocks of multiline text:

public  Vector<String> joinVectors(Vector<String> v1, Vector<String> v2)
{
Vector<String> vOutput = getPreparedVector();
vOutput.addAll(v1);
vOutput.addAll(v2);
return vOutput;
}

Easy enough. But I think you'll agree that a lot of this vector activity is more concerned with the vector or array being handled than with the String replacement class, and it probably belongs in a class of its own. So at this point we might seriously think about refactoring and moving the Vector code into a StringVector class. The upshot is that we might also want to do similar things for all the other collection classes we use to operate on Strings. It also occurs to me that much of this activity is not pertinent only to Strings but applies equally well to the other non-primitive types we might want to work on, e.g. Integers, Longs, etc. We will probably want to supply an interface to ensure conformity across our proposed StringVector class and, say, an IntegerVector or a LongVector class.

The first, immediately obvious candidate for refactoring from the earlier post is the method with the signature public Vector getPreparedVector(), a neutral enough name.

The next is vectorToStringArray(Vector v). Here semantics might appear to be an issue, because we will want this method's name to apply to all the other Objects we may need classes for (we would be deeply confused calling IntegerVector.vectorToStringArray()), so first off we could neutralise it by renaming it vectorToArray. However, it should be apparent that vectorToArray(Vector v) is less a function of a Collection class than of a generic array utility class, which we might putatively name ArrayMaster, where it would live alongside similar methods producing different types of arrays, for example an Integer array. In an ideal world this would present no problem, but Java doesn't allow overloading on return type alone: two methods with the same name and parameter types but different return types will not compile, and the compiler (or your IDE) will tell you that the method is already defined in the class. So the simple thing is to have one uniquely named method per Object type, i.e. vectorToIntegerArray, vectorToLongArray etc.

The other thing we might like to consider is a single method with the signature public Object[] vectorToArray(Vector v). However, this won't fly. Casting an Object[] to a String[] compiles, but the cast is checked at run time against the array object's actual type, and an array created as Object[] is not a String[] even if every element happens to be a String, so the cast throws a ClassCastException. (The rules for this can be found here: http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#20232). So it's back to the drawing board with separately named methods vectorToStringArray, vectorToIntegerArray, vectorToLongArray etc.
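The run-time nature of the failure is easy to demonstrate. In the sketch below (class and method names are illustrative), Vector.toArray() hands back an array whose runtime type is Object[], so the downcast to String[] compiles but throws; toArray(T[]) allocates an array of the type passed in, so no cast is needed.

```java
import java.util.Vector;

class ArrayCastDemo {

    // Returns true if casting the Object[] from toArray() to String[] fails.
    static boolean downcastFails(Vector<String> v) {
        Object[] objs = v.toArray(); // runtime type is Object[], whatever it holds
        try {
            String[] strs = (String[]) objs; // compiles; checked at run time
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    // toArray(T[]) returns an array whose runtime type is that of its argument.
    static String[] typedCopy(Vector<String> v) {
        return v.toArray(new String[v.size()]);
    }

    public static void main(String[] args) {
        Vector<String> v = new Vector<String>();
        v.add("a");
        v.add("b");
        System.out.println(downcastFails(v));            // true
        System.out.println(typedCopy(v).getClass() == String[].class); // true
    }
}
```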

We can, however, make these array conversion utilities simpler by refactoring the code a tad and taking advantage of Vector's toArray(T[] a) method, which returns an array of the same runtime type as the one passed in, e.g. from:

/**
* Will convert a passed vector to a string array
* @param v - the vector for conversion
* @return String array of the converted vector
*/
public String[] vectorToStringArray( Vector<String> v )
{
int count = v.size();
String[] outArray = new String[count];
v.copyInto(outArray);
return outArray;
}

to:

/**
* Will convert a passed vector to a string array
* @param v - the vector for conversion
* @return String array of the converted vector
*/
public String[] vectorToStringArray(Vector<String> v)
{
return v.toArray(new String[v.size()]); // no cast needed: the result is already a String[]
}
// and our new Integer component.....
/**
* Will convert a passed vector to an Integer array
* @param v - the vector for conversion
* @return Integer array of the converted vector
*/
public Integer[] vectorToIntegerArray(Vector<Integer> v)
{
return v.toArray(new Integer[v.size()]);
}

[..]

Next up we can look at public Vector getVectorFromStringArray(String[] inArray). The name is frankly horrible, and we immediately rename it arrayToVector. While this is ostensibly type-agnostic, we will want to type-check the Vector per Object type and add one implementation per ObjectVector class, i.e. a method producing String Vectors in StringVector, Integer Vectors in IntegerVector, etc.

So we now have two obvious methods to define in our proposed VectorInterface interface, getPreparedVector and arrayToVector, which should look something like this:

import java.util.Vector;

public interface VectorInterface {
    public Vector getPreparedVector();
    public Vector arrayToVector(Object[] xArray);
}

Our new StringVector class should look something like this (I’ve removed comments, etc, for concision):

import java.util.Vector;

public class StringVector implements VectorInterface {

public  Vector<String> getPreparedVector()
{
Vector<String> vOutput = new Vector<String>();
return vOutput;
}

public  Vector<String> arrayToVector(Object[] sArray)
{
Vector<String> v = getPreparedVector();
for (int i = 0; i<sArray.length; i++)
{
v.add(sArray[i].toString());
}
return v;
}
/** Method to join two String vectors
* @param v1 vector to add to
* @param v2 vector to be added on to v1
* @return the joined vectors
*/
public  Vector<String> joinVectors(Vector<String> v1, Vector<String> v2)
{
Vector<String> vOutput = getPreparedVector();
vOutput.addAll(v1);
vOutput.addAll(v2);
return vOutput;
}

}

And our IntegerVector class will look pretty much the same, with Integer substituted for String as appropriate, and an Integer conversion of

v.add(sArray[i].toString()); 

to read

v.add(Integer.parseInt(sArray[i].toString()));
//or (if we can be sure of the contents of the Array)
v.add((Integer) sArray[i]);
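Putting those pieces together, an IntegerVector might look like the sketch below. It repeats the VectorInterface shown earlier so that it compiles on its own, and it takes the Integer.parseInt() route on each element's toString(), so it accepts both Integer objects and numeric Strings; a non-numeric element would throw a NumberFormatException.

```java
import java.util.Vector;

// Same shape as the VectorInterface shown earlier, repeated so this compiles alone.
interface VectorInterface {
    Vector getPreparedVector();
    Vector arrayToVector(Object[] xArray);
}

class IntegerVector implements VectorInterface {

    public Vector<Integer> getPreparedVector() {
        return new Vector<Integer>();
    }

    // Converts each element via its String form, so "42" and Integer 42 both work.
    public Vector<Integer> arrayToVector(Object[] iArray) {
        Vector<Integer> v = getPreparedVector();
        for (int i = 0; i < iArray.length; i++) {
            v.add(Integer.parseInt(iArray[i].toString()));
        }
        return v;
    }

    // Returns a new Vector holding v1's elements followed by v2's.
    public Vector<Integer> joinVectors(Vector<Integer> v1, Vector<Integer> v2) {
        Vector<Integer> vOutput = getPreparedVector();
        vOutput.addAll(v1);
        vOutput.addAll(v2);
        return vOutput;
    }
}
```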

A footnote about String equality testing

September 19, 2009

It occurs to me, looking back at the string equality testing functions I wrote about in an earlier post, that efficiency aside there is an overwhelming number of good reasons for preferring String.equals() to == as a test of equality in a generalised function in a class of String manipulation routines.

The crux of the matter is this: .equals() will always tell you the truth about whether the contents of two Strings are equal, since it compares them character by character. It is therefore inherently less efficient than ==, which compares object references rather than contents: == will only return true if the Strings being compared are the same object. Two Strings with identical contents can easily be different objects (one built at run time and one taken from a literal, say), in which case == returns false.
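A short demonstration of the difference (class and method names are illustrative): a String built with new String() has the same contents as a literal but is a different object, so == reports false where equals() reports true. intern() returns the canonical pooled instance, which is why it is reference-equal to the literal.

```java
class EqualityDemo {

    static boolean sameReference(String a, String b) {
        return a == b;      // compares object identity only
    }

    static boolean sameContents(String a, String b) {
        return a.equals(b); // compares the characters
    }

    public static void main(String[] args) {
        String literal = "hello";
        String built = new String("hello"); // a distinct object with the same chars
        System.out.println(sameReference(literal, built));          // false
        System.out.println(sameContents(literal, built));           // true
        System.out.println(sameReference(literal, built.intern())); // true
    }
}
```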

Implementing Text to Speech quickly with FreeTTS

September 19, 2009

I had a quick play with the FreeTTS Text to Speech API yesterday. I needed a voice alert for a little Swing app I was running up. Outwardly this looked like hard work, perhaps more than I cared for, but I persisted and came up with this as a quick first shot. I still have to do some work on it to resolve threading issues, but that's just a matter of time and, like Marcel Proust, I will probably get round to it… someday anyway. I will say that the TTS voices are horrible and need more work (for which the FreeTTS guys are not responsible) than the API itself, which works pretty much out of the box. Boiled down, the API does what it says on the label.

You can get FreeTTS from SourceForge.

Unzip it, head to the unzipped bin directory and run JSAPI.exe to make the jars, then put the jars where you need them and make them available to your application. Also put the speech.properties file somewhere your app can find it, such as your {jre}/lib. You will probably end up using the kevin voice. I have it hardcoded, but you can easily extend the code to take a String with the name of the voice you want, or hardcode your own in place of nasty kevin. I've included the showVoices routine to list the available voices, just in case… But kevin works, almost. All you need to do is instantiate TTSReader and call consume, passing it the String you want spoken.


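import com.sun.speech.freetts.Voice;
import com.sun.speech.freetts.VoiceManager;

public class TTSReader {

    /** Lists the voices FreeTTS can find, with their domains */
    public void showVoices() {
        VoiceManager voiceManager = VoiceManager.getInstance();
        Voice[] v = voiceManager.getVoices();
        for (int i = 0; i < v.length; i++) { // was i > v.length, so the loop never ran
            System.out.println(v[i].getName() + "  : " + v[i].getDomain());
        }
    }

    /** Speaks the input using the hardcoded "kevin" voice */
    public void consume(String input) {
        VoiceManager voiceManager = VoiceManager.getInstance();
        Voice myVoice = voiceManager.getVoice("kevin");
        if (myVoice == null) {
            // voice not found: check the FreeTTS voice jars are on the classpath
            System.err.println("Voice 'kevin' not found");
            return;
        }
        myVoice.allocate();
        myVoice.speak(input);
        myVoice.deallocate();
    }
}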
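On the threading front, one option (an assumption on my part, not something FreeTTS itself provides) is to hand speech off to a single background thread so a Swing app's event dispatch thread is never blocked while a voice is talking. The hypothetical SpeechQueue below accepts any Consumer<String>, so new TTSReader()::consume can be plugged in; it needs Java 8 or later for the lambda and method reference.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical helper: queues text for a speaker on one background thread.
class SpeechQueue {
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Consumer<String> speaker;

    SpeechQueue(Consumer<String> speaker) {
        this.speaker = speaker; // e.g. new TTSReader()::consume
    }

    // Returns immediately; queued alerts are spoken one at a time, in order.
    void say(final String text) {
        worker.execute(() -> speaker.accept(text));
    }

    // Lets anything already queued finish, then stops the worker thread.
    void shutdown() {
        worker.shutdown();
        try {
            worker.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }
}
```

A single-thread executor is used rather than a new thread per alert so that two alerts never talk over each other.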

Categories: Java

Netbeans <3

September 13, 2009

I have a confession to make. I never really came to terms with Eclipse, for a number of reasons, most of which will be immediately apparent to anyone who has fought a long-running insurgency campaign against an uncomfortable and frequently idiosyncratic IDE. In fact I never really liked IDEs very much at all, preferring the simplicity of text editors where you can get right under the bonnet and tinker with the mechanics of your code. NetBeans, in its latest incarnation 6.7 RC1, changed that.

Finally I have found a Java IDE that, even when it breaks (and mine blew up just the other day after a borked update), can be fixed without spending the rest of the week on it: five minutes of checking through the logs provided the solution, a disastrously big .zip file splatted onto my Vista desktop (my bad). Move the aforementioned .zip file, take out a lock file or two, run the installer again and I was back in business, all before breakfast. Not only is it robust (at least if you have anything approaching a clue about how to backfix stuff), but it has very few irritating features relative to the many that are handy and save time and effort. My advice: try it. You won't hate it and you may even like it…

Categories: Java

Fun with String vectors… Part 1

September 13, 2009

Working extensively with Strings inevitably entails working with arrays. Lots of them. In this article I will look at how some of the pain can be taken out of this in our putative replacement class for String.

First off, one thing that makes working with String arrays easier is converting your array to a Vector, ArrayList or one of the other Collection classes. Your mileage will vary depending on a number of criteria, not least what you intend to do and whether you need thread safety (Vector's methods are synchronized; ArrayList's are not), so due consideration of the right Collection class to use is important. I have opted for Vector in this illustration because it has broad utility.

Arrays inherently are messy to work with and serial abuse of them inevitably results in an unhealthy proliferation of for loops, often nested to the point of Lovecraftian insanity.

Let’s therefore pretend for the purposes of simplicity & clarity in this article that we only have the Vector class with which to work. First off we’ll need to add

import java.util.Vector;

to our class header.

On demand empty String Vector

Next we'll need a method that creates an empty String Vector on demand:

public  Vector<String> getPreparedVector()
{
Vector<String> vOutput = new Vector<String>();
return vOutput;
}

Converting String[] to a Vector

Next up we need a method to convert a String array to a Vector, since several String methods we frequently have recourse to, such as split(), produce their output as an array. This we can do as follows:

/**
* @param inArray - the array to convert to a Vector
* @return the populated Vector from incoming String array
*/
public  Vector<String> getVectorFromStringArray(String[] inArray)
{
// the Vector ends up with the same elements, in the same order, as inArray
Vector<String> v = getPreparedVector();
for (int i = 0; i < inArray.length; i++)
{
v.add(inArray[i]);
}
return v;
}

Vector to String[]

And of course we will likely need a method to reverse the process, i.e. to convert the Vector back to an array...

/**
* Will convert a passed vector to a string array
* @param v - the vector for conversion
* @return String array of the converted vector
*/
public String[] vectorToStringArray( Vector<String> v )
{
int count = v.size();
String[] outArray = new String[count];
v.copyInto(outArray); // copies the elements into the pre-sized array
return outArray;
}

Vector is responsible for very little of the code in the above method, and you can probably already see why it is a better candidate for hardcore work than an array: it has functionality that obviates the need for intricate loops. Most of the effort here goes into accommodating the somewhat Byzantine complexities of arrays.
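For what it's worth, the JDK can do both conversions without a hand-written loop. In this minimal sketch (class name hypothetical), Arrays.asList() wraps the array as a fixed-size List and the Vector(Collection) constructor copies it, so growing the Vector afterwards leaves the original array untouched; toArray(T[]) goes the other way.

```java
import java.util.Arrays;
import java.util.Vector;

class VectorConversions {

    // Copies the array's elements into a new, growable Vector.
    static Vector<String> fromArray(String[] inArray) {
        return new Vector<String>(Arrays.asList(inArray));
    }

    // Copies the Vector's elements into a new String[] of exactly the right size.
    static String[] toArray(Vector<String> v) {
        return v.toArray(new String[v.size()]);
    }

    public static void main(String[] args) {
        Vector<String> v = fromArray("one,two,three".split(","));
        v.add("four");
        System.out.println(Arrays.toString(toArray(v))); // [one, two, three, four]
    }
}
```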

Growing a vector

One immediate advantage which a Vector has over an array is that the number of elements which it contains can be increased or decreased dynamically. The API documentation for Vector makes this point explicitly:

The Vector class implements a growable array of objects. Like an array, it contains components that can be accessed using an integer index. However, the size of a Vector can grow or shrink as needed to accommodate adding and removing items after the Vector has been created.

Each vector tries to optimize storage management by maintaining a capacity and a capacityIncrement. The capacity is always at least as large as the vector size; it is usually larger because as components are added to the vector, the vector’s storage increases in chunks the size of capacityIncrement. An application can increase the capacity of a vector before inserting a large number of components; this reduces the amount of incremental reallocation.
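To see capacity and capacityIncrement at work, the sketch below (class and method names are made up) starts a Vector with room for two elements and an increment of three, adds a third element, pre-sizes it with ensureCapacity() and then shrinks it with trimToSize(). size() counts elements, while capacity() reports allocated slots.

```java
import java.util.Vector;

class VectorCapacityDemo {

    // Returns {size after three adds, capacity then,
    //          capacity after ensureCapacity(100), capacity after trimToSize()}.
    static int[] capacitySteps() {
        Vector<String> v = new Vector<String>(2, 3); // initial capacity 2, increment 3
        v.add("a");
        v.add("b");
        v.add("c");               // third add overflows: capacity grows 2 -> 5
        int sizeAfterAdds = v.size();
        int grown = v.capacity();
        v.ensureCapacity(100);    // pre-size ahead of a bulk insert
        int ensured = v.capacity();
        v.trimToSize();           // shrink storage to the element count
        int trimmed = v.capacity();
        return new int[] {sizeAfterAdds, grown, ensured, trimmed};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(capacitySteps())); // [3, 5, 100, 3]
    }
}
```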

Inevitably, we will want to add and remove elements from our Vector; a royal pain with an Array, but a moment’s work in a Vector…

Vector<String> v = getPreparedVector();
v.add("Hello");
v.add("World");
v.add("A superfluous element");
v.add("etc");
v.remove(3); // removes "etc", the element at index 3

[..]