Posts Tagged ‘StringBuffer’

Improved method for removing duplicate white space

March 23, 2011 1 comment

On the principle that constant refactoring is a good thing, I revisited my method for removing duplicate white space from Strings / StringBuffers. The result was extremely positive: a much cleaner, more streamlined method.

private StringBuffer rmDuplicateWS(StringBuffer sb)
{
    char ws = ' ';
    // trim the leading whitespace (the length() guard protects an all-space buffer)
    while (sb.length() > 0 && sb.charAt(0) == ws)
    {
        sb.deleteCharAt(0);
    }
    // now trim the trailing whitespace
    while (sb.length() > 0 && sb.charAt(sb.length() - 1) == ws)
    {
        sb.deleteCharAt(sb.length() - 1);
    }
    // loop until we reach the end, deleting duplicate ws instances
    int currentPos = 0;
    while (currentPos < sb.length() - 1)
    {
        if ((sb.charAt(currentPos) == ws) && (sb.charAt(currentPos + 1) == ws))
        {
            sb.deleteCharAt(currentPos);
        }
        else
        {
            currentPos++;
        }
    }
    return sb;
}
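As a quick sanity check, here is the method above wrapped in a small standalone class (the class name and `main` harness are mine, not part of the original):

```java
public class RmDupWsDemo {

    // Static copy of the whitespace-collapsing method, for standalone testing
    static StringBuffer rmDuplicateWS(StringBuffer sb) {
        final char ws = ' ';
        // trim leading spaces, guarding against an all-space buffer
        while (sb.length() > 0 && sb.charAt(0) == ws) {
            sb.deleteCharAt(0);
        }
        // trim trailing spaces
        while (sb.length() > 0 && sb.charAt(sb.length() - 1) == ws) {
            sb.deleteCharAt(sb.length() - 1);
        }
        // collapse any run of interior spaces to a single space
        int pos = 0;
        while (pos < sb.length() - 1) {
            if (sb.charAt(pos) == ws && sb.charAt(pos + 1) == ws) {
                sb.deleteCharAt(pos);
            } else {
                pos++;
            }
        }
        return sb;
    }

    public static void main(String[] args) {
        // brackets make the trimmed boundaries visible
        System.out.println("[" + rmDuplicateWS(new StringBuffer("  far   too    much  space  ")) + "]");
    }
}
```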

Implementing a split (or a pseudo-split) for StringBuffers/Builders

March 15, 2011 Leave a comment

One of the functionalities regrettably absent from the StringBuilder/StringBuffer families is the nice inbuilt String method split(String regexp) (q.v. java.lang.String.split()), which produces a tokenised array of Strings, splitting around and consuming the supplied regexp token. The cranky way of doing this with a StringBuffer is to convert your lightweight StringBuffer to a String, split that into an array of Strings, then convert the array of Strings back to a StringBuffer array or List, which to me looks somewhat like defeating the object of the entire exercise of working with StringBuffers. I have a marked preference for working with Lists as opposed to arrays, but I do realise that there are those of the other faith who have valid reasons for their heretical idolatry (j/k), so I’ll provide methods for both outcomes.

Given that we have a method for providing a List of token positions from a supplied StringBuffer (q.v. Locating token positions[..]) (and I have a somewhat improved method for doing this which I will anyway supply as an appendage to this post – the refactored method has been renamed getTokenPositions as opposed to the earlier findTokens) the way is clear for us to implement the new split method.

/** Method to split an inbound StringBuffer by (consumed) tokens and produce a List
* @param sbx the StringBuffer to split
* @param sbTok a StringBuffer representation of the token(s) to use to split
* @return List of StringBuffers split out
*/
public List<StringBuffer> split(StringBuffer sbx, StringBuffer sbTok)
    {
    int tokSz = sbTok.length();
    List<StringBuffer> lix = new ArrayList<StringBuffer>();
    List<Integer> lPos = getTokenPositions(sbx, sbTok);
    if (lPos == null || lPos.isEmpty()) // no split? send the original sb back
    {
        lix.add(sbx);
        return lix;
    }

    int start = 0;
    for (int i = 0; i < lPos.size(); i++)
    {
        int pos = lPos.get(i);
        if (pos > start) // skip the empty slice a leading token would produce
        {
            lix.add(new StringBuffer(sbx.subSequence(start, pos)));
        }
        start = pos + tokSz; // resume after the consumed token
    }
    if (start < sbx.length()) // and the remainder after the last token
    {
        lix.add(new StringBuffer(sbx.subSequence(start, sbx.length())));
    }

    return lix;
    }

To produce an Array of StringBuffers, you merely need to change the return method signature

public StringBuffer[] split(StringBuffer sbx, StringBuffer sbTok)

and modify the code where the returns occur (2 places) to read:

 return lix.toArray(new StringBuffer[0]);
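One caveat on that conversion: the no-argument `toArray()` returns `Object[]`, and casting that directly to `StringBuffer[]` fails at runtime with a `ClassCastException`, so the typed `toArray(T[])` overload is the one we want. A minimal illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class ToArrayDemo {
    public static void main(String[] args) {
        List<StringBuffer> lix = new ArrayList<StringBuffer>();
        lix.add(new StringBuffer("ab"));
        lix.add(new StringBuffer("cd"));

        // The typed overload allocates (or fills) a StringBuffer[] for us
        StringBuffer[] arr = lix.toArray(new StringBuffer[0]);
        System.out.println(arr.length + " buffers, first = " + arr[0]);
    }
}
```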

Modified method for providing a List of token positions

I mentioned earlier I had a somewhat improved version of the findTokens method. The code for this (+ the comparator-helper List construction methods) follows:

public List<Integer> getTokenPositions(StringBuffer sbx, StringBuffer tok)
    {
    List<Character> liTok = charListFromSb(tok);
    List<Integer> liOut = new ArrayList<Integer>();
    int sz = tok.length() - 1;
    int finish = sbx.length() - sz;
    char firstTok = tok.charAt(0);
    char lastTok = tok.charAt(sz);
    for (int i = 0; i < finish; i++) {
        if ((sbx.charAt(i) == firstTok) && (sbx.charAt(i + sz) == lastTok))
        {
            List<Character> comp = charListFromSb(sbx, i, i + sz);
            if (comp.equals(liTok))
            {
                liOut.add(i);
            }
        }
    }
    return liOut;
    }

public List<Character> charListFromSb(StringBuffer sbx)
    {
    return charListFromSb(sbx, 0, sbx.length() - 1);
    }

public List<Character> charListFromSb(StringBuffer sbx, int start, int finish)
    {
    List<Character> liOut = new ArrayList<Character>();
    for (int i = start; i <= finish; i++) { // note: finish is inclusive
        liOut.add(sbx.charAt(i));
    }
    return liOut;
    }
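To see the pieces working together, here is a condensed standalone version of the position finder (class name and harness mine), including an overlapping-match case:

```java
import java.util.ArrayList;
import java.util.List;

public class TokenPosDemo {

    // Condensed form of getTokenPositions: first/last chars act as a cheap filter,
    // then the full candidate slice is compared char by char
    static List<Integer> getTokenPositions(StringBuffer sbx, StringBuffer tok) {
        List<Integer> out = new ArrayList<Integer>();
        int sz = tok.length() - 1;
        int finish = sbx.length() - sz;
        for (int i = 0; i < finish; i++) {
            if (sbx.charAt(i) == tok.charAt(0) && sbx.charAt(i + sz) == tok.charAt(sz)) {
                boolean match = true;
                for (int j = 0; j <= sz; j++) {
                    if (sbx.charAt(i + j) != tok.charAt(j)) { match = false; break; }
                }
                if (match) { out.add(i); }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(getTokenPositions(new StringBuffer("ab,cd,ef"), new StringBuffer(",")));
        System.out.println(getTokenPositions(new StringBuffer("aaaa"), new StringBuffer("aa")));
    }
}
```

Note that overlapping matches are all reported ("aa" in "aaaa" matches at 0, 1 and 2), which is worth bearing in mind before feeding the positions to split.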

Extending and improving the token location method

December 16, 2010 Leave a comment

I have been busily building up the methods which surround the token location method I outlined in yesterday’s post since I aim to build a robust StringBuffer/Builder helper class which is as flexible and intuitive as the locator methods expressed in the standard Java String class.

Obviously one thing I will want to do with the method outlined previously is build a stripped-down version, which only iterates the String to search once, for handling short token matches. And I’ll obviously need to determine the criteria for deciding under what circumstances to use which of the two eventual implementations of this method I arrive at. The NetBeans profiling tool is an obvious win here for examining assets such as memory and heap usage, and I have a nice little wrap-around timer utility which I can inject into my class for assessing the relative speeds of differently constructed search parameters passed as arguments. I’ll have a look at it over the weekend, and once that’s out of the way and the appropriate method is being called by a delegator method, I’ll optimise some of the syntax candy which I am already beginning to surround this method with. None of the methods are particularly pretty or built at this stage for speed; they’re built for facility of implementation and can be used pretty much as is, out of the box.

The obvious initial methods to sugar up are ones which can use the generated List object e.g. the trivial (and inefficient) countTokens method & its overload which follows:


/** Syntax candy to count the number of incidences of a token in a given char sequence */

public int countTokens(StringBuffer sb, StringBuffer sbx)
    {
            List<Integer> l = findTokens(sb, sbx);
            return l.size();
     }

/** Syntax candy overload to count the number of incidences of a token (expressed as items) in a given List */

public int countTokens(List<Integer> l)
    {
            return l.size();
     }

Next up there’s a standard boolean check to see whether there are any matched tokens:


public boolean containsMatch(StringBuffer sb, StringBuffer sbx)
    {
    return countTokens(sb, sbx) > 0;
    }

OK, that’s the rough & ready syntax candy out of the way; now let’s look at how we can leverage the information we have back in the List. When examining large strings (and this is particularly so with large strings arriving from or in markup language formats such as HTML/XML), you often need to know the position, relative to the token’s position, of either a single char or another token altogether. The char-based implementations for forward and reverse location are relatively simple. They both take the String to search, the offset character index point and the char which needs to be located as arguments, and look like this:


public int findPreviousMatch(StringBuffer sb, int startPt, char toLocate)
    {
    int ctr = startPt;
    while (ctr > 0) // stop before ctr-- can take us to -1
    {
        ctr--;
        if (sb.charAt(ctr) == toLocate)
        {return ctr;}
    }
    return -1;
    }

public int findNextMatch(StringBuffer sb, int startPt, char toLocate)
    {
    int ctr = startPt;
    int len = sb.length();
    while (ctr < len - 1) // stop before ctr++ can overrun the buffer
    {
        ctr++;
        if (sb.charAt(ctr) == toLocate)
        {return ctr;}
    }
    return -1;
    }
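A quick standalone check of the two locators (class name and harness mine); note that both deliberately start searching one position away from the offset, and return -1 when nothing is found:

```java
public class CharLocateDemo {

    // Search backwards from startPt - 1 for toLocate; -1 if absent
    static int findPreviousMatch(StringBuffer sb, int startPt, char toLocate) {
        for (int i = startPt - 1; i >= 0; i--) {
            if (sb.charAt(i) == toLocate) { return i; }
        }
        return -1;
    }

    // Search forwards from startPt + 1 for toLocate; -1 if absent
    static int findNextMatch(StringBuffer sb, int startPt, char toLocate) {
        for (int i = startPt + 1; i < sb.length(); i++) {
            if (sb.charAt(i) == toLocate) { return i; }
        }
        return -1;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("<p>text</p>");
        System.out.println(findPreviousMatch(sb, 7, '<')); // prints 0
        System.out.println(findNextMatch(sb, 2, '<'));     // prints 7
    }
}
```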

We need to do the same thing for tokens. Arguments are an int indicating the starting point to search around, the StringBuffer to search (sb) and the token to search for (sbx) expressed again as a StringBuffer.


public int findTokenPriorToOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    int loc = -1;
    if (start > sb.length())
    {start = sb.length();} // clamp start to sb.length() if start > the size of the incoming string

    List<Integer> l = findTokens(sb, sbx);
    if(l.isEmpty()){return loc;} // no match

    // only 1 item and bigger than start? return -1
    if ((l.size() == 1) && (l.get(0) > start))
    {
       return loc;
    }
    // only 1 item and less than start? return token startpoint
    if ((l.size() == 1) && (l.get(0) < start))
    {
       return l.get(0);
    }

    for (int val : l)
    {
        if (val > start){return loc;} // positions are ascending, so nothing nearer remains
        if (val < start)
        {
            loc = val;
        }
    }

    return loc;
}
public int findTokenAfterOffset(int start, StringBuffer sb, StringBuffer sbx)
    {
    int loc = -1;
    if (start > sb.length())
    {return -1;} // won't be found.... ret -1

    List<Integer> l = findTokens(sb, sbx);
    if(l.isEmpty()){return loc;} // no match

    // only 1 item and less than start? return -1
    if ((l.size() == 1) && (l.get(0) < start))
    {
       return loc;
    }
    // only 1 item and > start? return token startpoint
    if ((l.size() == 1) && (l.get(0) > start))
    {
       return l.get(0);
    }

    for (int val : l)
    {
        if (val > start){return val;}
    }
    // fallthrough
    return loc;
}
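Here is a String-backed sketch of the same prior/after logic, using an `indexOf` loop in place of findTokens so the example stays self-contained (all names here are mine):

```java
import java.util.ArrayList;
import java.util.List;

public class TokenOffsetDemo {

    // Stand-in for findTokens: ascending start positions of tok in s
    static List<Integer> findTokens(String s, String tok) {
        List<Integer> l = new ArrayList<Integer>();
        int i = s.indexOf(tok);
        while (i >= 0) {
            l.add(i);
            i = s.indexOf(tok, i + 1);
        }
        return l;
    }

    // Last token position strictly before start, or -1
    static int findTokenPriorToOffset(int start, String s, String tok) {
        int loc = -1;
        for (int val : findTokens(s, tok)) {
            if (val > start) { return loc; }
            if (val < start) { loc = val; }
        }
        return loc;
    }

    // First token position strictly after start, or -1
    static int findTokenAfterOffset(int start, String s, String tok) {
        for (int val : findTokens(s, tok)) {
            if (val > start) { return val; }
        }
        return -1;
    }

    public static void main(String[] args) {
        String s = "<td>a</td><td>b</td>";
        System.out.println(findTokenPriorToOffset(12, s, "<td>")); // prints 10
        System.out.println(findTokenAfterOffset(2, s, "<td>"));    // prints 10
    }
}
```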

Locating token positions in StringBuffers

December 15, 2010 Leave a comment

This might be useful to someone along the way; I certainly have practical uses of my own for fast exact matching & location of long tokens in StringBuffers. It’s probably up for a certain amount of optimisation, since it double-scans the searchable base StringBuffer object. On the other hand, in certain situations this is probably more efficient than trying to do it all in one bite, since it builds two List objects: a preliminary one which contains possible matches, and from this list of possible matches it then refines its search to provide a definitive List of exact matches. The longer the search token, the more efficient this ultimately is. The method could also be expanded or enhanced to do regex-style pattern matching and case-insensitive matching.

/**
* Method for searching the StringBuffer sb to identify the int locations of instances of the contents of StringBuffer sbx
*
* Returns a list of Integer positions of all occurrences of an exact match of the passed StringBuffer sbx in
*  StringBuffer sb
*/
public List<Integer> findTokens(StringBuffer sb, StringBuffer sbx)
    {
            int len = sb.length();
            int k = sbx.length();
            char tokenStart = sbx.charAt(0);
            char tokenEnd = sbx.charAt(k - 1);

            List<Integer> possibles = new ArrayList<Integer>();
            for (int i = 0; i < (len - (k - 1)); i++) {
            if((sb.charAt(i) == tokenStart) && (sb.charAt(i + (k - 1)) == tokenEnd))
            {
                possibles.add(i);
            }
            }

            List<Integer> definites = new ArrayList<Integer>();
            for (int start : possibles)
            {
            boolean OK = true;
            int tokCtr = 0;
            // the last char was already matched in the first pass, so stop at k - 1
            for (int i = start; i < start + (k - 1); i++) {
                if(sb.charAt(i) != sbx.charAt(tokCtr))
                {OK = false;} // probably ought to break/label here if you want to make it bleed (I don't, need the trace!)

                tokCtr++;
                }
               if(OK) // don't add if not ok!
               {
                    definites.add(start);
                }
            }
            return definites;
     }

HashMaps and how to bend them to your will

February 26, 2010 Leave a comment

When processing large volumes of String data, the ability to use HashMaps intelligently and creatively can be a very useful skillset to have in your armoury. Let’s consider a hypothetical situation where you want to understand the range and extent of a given author’s vocabulary (actually, it’s not so hypothetical, since I have a tool for doing probabilistic analysis of documents which utilises a number of methods and strategies very similar to those which I am about to expose).

By way of illustration, let’s start with an abridgement of the first few paragraphs of the wonderful short story by Lord Dunsany, How Nuth Would Have Practiced His Art Upon the Gnoles, a cautionary tale for the imprudent if ever there were. 

Despite the advertisements of rival firms, it is probable that every
tradesman knows that nobody in business at the present time has a
position equal to that of Mr. Nuth. To those outside the magic circle
of business, his name is scarcely known; he does not need to
advertise, he is consummate. He is superior even to modern
competition, and, whatever claims they boast, his rivals know it. His
terms are moderate, so much cash down when when the goods are
delivered, so much in blackmail afterwards. [..]

It must not be thought that I am a friend of Nuth’s; on the contrary
such politics as I have are on the side of Property; and he needs no
words from me, for his position is almost unique in trade, being among
the very few that do not need to advertise.

Given that Gutenberg source material is derived from volunteers using scanning technologies of varying degrees of reliability, and that when dealing with text of this nature (or in fact of any sort) there are inevitably going to be a number of hurdles to deal with, we will need to accommodate the preponderance of these before parsing the text into our target HashMap. Duplicate white space, a problem I have identified and pointed up in an earlier article, is potentially one; typographical errors will undoubtedly occur, line separation will need to be addressed, punctuation resolved, and, in the case of your source text being a play, dramatic instructions and character identification issues will need handling. I say that we will need to handle the greater majority of the issues up front, although the construction of a free-text document parser and the implementation of its results is by and large a somewhat iterative discovery process, where the text will inevitably yield up some unique problems of its own. What we want to end up with, prior to throwing the text into the HashMap, is a homogeneous collection of largely punctuation-free (exceptions being contractions like won’t and hasn’t and hyphenated words such as co-operative) single-space-delimited words which looks something not unlike the abridged representation below:

Despite the advertisements of rival firms it is probable that every
tradesman knows that nobody in business at the present time has a
position equal to that of Mr Nuth [..]  do not need to advertise

In this instance, even with the punctuation and line wraps resolved, we will still at some point have to deal with issues relating to proper nouns, capitalisation of standard words, etc.

We could quickly count the words in the text by recycling the vowel-count method explained in an earlier article, substituting the single whitespace char ' ' for the char[] array of vowels, so we’d immediately know how many tokens we are going to have to parse. But we don’t really need to, nor care, at this juncture, since we are going to String.split the text anyway and can get the word count back from the resultant array size.


HashMap ha = new HashMap();
String[] splitWords = theDoc.split(" "); // break the doc into an array based on cleansed ws

int len = splitWords.length; // and find out how many words we have

Now we need to iterate through our array of words. The rules of this game are simple. The keys of the hashmap will be unique words, the values will be the number we have thus far encountered as we walk though the array. If the word is NOT already in the hashmap, we have to create a new key for the word and set the value to 1. If, however, the word is already in the hashmap, we just increment the current value by 1. To paraphrase those pesky meerkats on TV, simples.

OK, let’s look at how we might do the iteration, carrying on from where we left off with the code snippet above…

for (int i = 0; i < len; i++ )
    {
    String word = splitWords[i].trim(); 

    if(!ha.containsKey(word)) // if the word isn't already in the hashmap....
    {
    ha.put(word, 1); // add the word to the HashMap as a key and set the value to 1...
    }
    else // it must be in the hashmap
    {
    Integer objectInteger = (Integer) ha.get(word); // ha.get returns Object so we cast to Integer
    int counter = objectInteger.intValue(); // and thence to an int
    counter += 1; // increment the value
    ha.put(word, counter); // and put the new value back in the hashmap by key
    }

    }

The only real nasty with HashMaps is that they have a tendency, by dint of their generic nature, to produce output with methods like get and put as Objects which you will need to cast appropriately. The upside to this is that you don’t have to overly concern yourself with what type of Objects you are putting in them so long as you deal with them appropriately.
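If you are on Java 5 or later, declaring the map with type parameters sidesteps the casting entirely. Here is a genericised sketch of the same counting loop (class and method names are mine):

```java
import java.util.HashMap;

public class WordCountDemo {

    // Count occurrences of each space-delimited word in doc
    static HashMap<String, Integer> countWords(String doc) {
        HashMap<String, Integer> ha = new HashMap<String, Integer>();
        for (String word : doc.split(" ")) {
            Integer current = ha.get(word); // no cast needed: get() returns Integer
            if (current == null) {
                ha.put(word, 1);            // first sighting of this word
            } else {
                ha.put(word, current + 1);  // bump the existing count
            }
        }
        return ha;
    }

    public static void main(String[] args) {
        System.out.println(countWords("to be or not to be"));
    }
}
```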

Interrogating your HashMap to see what it looks like when you’re done is a snip if your hashmap is fully populated with both keys and values (which yours will be, since you pump the value for each word when you create the key). Here’s a quick example that dumps a hashmap to screen.

public void printPopulatedHash(HashMap hamp)
{
 System.out.println("HASHMAP CONTENTS");
 System.out.println("================");
Iterator itr = hamp.entrySet().iterator(); // entrySet keeps each key paired with its value
while(itr.hasNext()){
Map.Entry entry = (Map.Entry) itr.next();
System.out.println("Key: " + entry.getKey() + " Value: " + entry.getValue()); }

}

We could, of course, have obviated the need to split the text into an array and processed the corpus as a StringBuffer. This, for all the well-worn reasons, works a lot faster.

This is a fairly straightforward process with only a little array arithmetic involved, and I’ll address how you’d do it now, since the grist of the article, getting stuff into and out of HashMaps, has been covered. It’ll parse the Gutenberg Hamlet as a text file to a HashMap count of unique words in under 16 milliseconds on my clunky laptop. Benchmarking this against the String[] split version earlier, the StringBuffer method comes in at +/- 15% faster, which really is not significant if you’re only processing a single document, but if you have a lot of them to do, it all adds up….

public HashMap doSbHashing(String theDoc)
{
HashMap ha = new HashMap();
StringBuffer sb = new StringBuffer(theDoc.trim());
sb.append(' '); // stick a char whitespace on the end so it will handle the last word without an unnecessary if condition;
int len = sb.length();
int ctr = 0;
boolean finished = false;
StringBuffer sbx = new StringBuffer(30); // not many words are longer than this, unless you're dealing in chemicals, in which case fix accordingly

while (!finished)
{
       
    if ((sb.charAt(ctr) != ' '))
    {sbx.append(sb.charAt(ctr));} // add a char into the temp stringbuffer
    else
    {

    String sWord = sbx.toString(); // it must be a word because we hit whitespace
   
    if(!ha.containsKey(sWord)) // if there's NOT a mapping for the word as a key
    {
    ha.put(sWord, 1);
    sbx.delete(0, sbx.length());
    }
    else //the word is in the map already
    {
    
    Integer objectInteger = (Integer) ha.get(sWord);
    int counter = objectInteger.intValue();
    counter += 1;
    ha.put(sWord, counter);
    sbx.delete(0, sbx.length());
    }

    }
    ctr++;
    if (ctr == len)
    {finished = true;}
}
return ha;
}

You’ll note that the expression

ha.put(sWord, 1);
sbx.delete(0, sbx.length());

has a definite correspondence with its partner in crime

ha.put(sWord, counter);
sbx.delete(0, sbx.length());

and it’s certainly one that needs to be flagged as a contender for becoming a method in its own right. Moreover, in spotting this correspondence between the two snippets of code, we have the first inkling that we might also start to think in terms of extending the HashMap class as well, quite possibly in a number of directions. Certainly the “put into hash/zero-length corresponding tmp variable” pairing is a first-class candidate for method re-engineering. It’s a very common occurrence when working with char[] arrays in combination with collection objects of all sorts.
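As a sketch of that refactoring (the class and method names, and the use of generics, are mine), the put-and-reset pairing might become a single helper:

```java
import java.util.HashMap;

public class HashPutHelper {

    // Factored-out "bump the count, then zero-length the temp buffer" step
    static void putAndReset(HashMap<String, Integer> ha, StringBuffer sbx) {
        String word = sbx.toString();
        Integer current = ha.get(word);
        ha.put(word, current == null ? 1 : current + 1); // covers both the new-key and increment cases
        sbx.delete(0, sbx.length()); // reset the temp variable for the next word
    }

    public static void main(String[] args) {
        HashMap<String, Integer> ha = new HashMap<String, Integer>();
        StringBuffer tmp = new StringBuffer("word");
        putAndReset(ha, tmp);
        System.out.println(ha + " / tmp length = " + tmp.length());
    }
}
```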

Fast (StringBuffer-centric) location of substrings

February 20, 2010 Leave a comment

Over the past few days I’ve extended and refactored my StringBuffer-oriented fast String search classes a little. I’ve got to say that the performance improvement over regexps in dissecting large String objects (and we’re talking gigs of Strings that I’m playing with here) is paying significant dividends. Also, constant refactoring is a paradigm I’ve paid much attention to over the years; there’s no such thing in my book as “perfect code”, there is only code which exposes itself as being in less need of improvement than other code. One piece of code that was high on my hit list was the locateStringContents method, which was ripe for extension to locate substrings and not just single chars. The first thing that struck me on revisiting the method in question was that I had done something which I ought not to have done, which was to call a method unnecessarily in a for loop (here’s the code down to the offending line):

public int[] locateStringContents(StringBuffer sb, StringBuffer sbCheckable)   {
    List l = new ArrayList();
    for (int i = 0; i < sb.length(); i++)
[..]

So I fixed that so that it now looks like this (I had compounded the error in an inner loop with a similar call to sbCheckable.length(), so I fixed that too, hoisting it to a point where it is only called once):

public int[] locateStringContents(StringBuffer sb, StringBuffer sbCheckable)    {
    List l = new ArrayList();
    int iCheckSize = sbCheckable.length();
    int iSbSize = sb.length();
    for (int i = 0; i < iSbSize; i++)
[..]

These look like small and superficial changes but when iterated over several hundred million times the time-savings can be significant and it was important to pick up whilst optimising my code to make it bleed. I’m now systematically doing similar clean-ups on a number of methods which contain similar lapses of coding. It’s all too easily done, particularly when working against tight timescales, and subsequent revisitation of code and classes which are substantial consumers of processor resources can often pay big dividends.

After the cleanup I turned my attention to what needed to be done to do some fast substring location. I approached this initially from a linear point of view, with a nested looping construct which matched character for character, but the end result was frankly horrible to look at as a piece of code, and moreover inefficient by comparison with my subsequent approach. That was to deal with the problem pragmatically and eradicate the complex jungle of nested loops and conditional evaluations with a boolean method, containsMatch, which receives a slice of the right amount of String to examine on matching the first char, and then evaluates the slice against the comparator sequence.


/**
*Method to expose all int start positions of a substring within a greater String, returning an int array containing their
* locations
**/
public int[] locateSubstrings(StringBuffer sb, StringBuffer sbCheckable)
{
List<Integer> l = new ArrayList<Integer>();
int iCheckSize = sbCheckable.length();
int iSbSize = sb.length() - iCheckSize; // the last index a full match can start at

for (int i = 0; i <= iSbSize; i++) // <= so we don't miss a match flush with the end
{
if (containsMatch(sb.subSequence(i, i + iCheckSize), sbCheckable))
{
l.add(i);
}
}
return toIntArray(l);

 }

/**
*Method to determine whether a sliced CharSequence cs, derived from a String to evaluate, is equivalent to a comparative StringBuffer
**/
public boolean containsMatch(CharSequence cs, StringBuffer sCheckable)
{
int csLength = cs.length();
for (int i = 0; i < csLength; i++)
{
if (cs.charAt(i) != sCheckable.charAt(i)) { return false;} // fail if any one of the chars does not match
}
return true; // didn't fail ergo must be a match
}

public int[] toIntArray(List<Integer> integerList) {
int iSize = integerList.size();
int[] intArray = new int[iSize];

for (int i = 0; i < iSize; i++) {
intArray[i] = integerList.get(i);
}
return intArray;
}
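Here is the slice-and-compare search condensed into a standalone class (class name and harness mine), with a check that a match flush against the end of the buffer is found too:

```java
import java.util.ArrayList;
import java.util.List;

public class SubstringLocateDemo {

    // Condensed standalone version of the slice-and-compare search
    static int[] locateSubstrings(StringBuffer sb, StringBuffer sbCheckable) {
        List<Integer> l = new ArrayList<Integer>();
        int iCheckSize = sbCheckable.length();
        int last = sb.length() - iCheckSize; // last index a full match can start at
        for (int i = 0; i <= last; i++) {
            if (containsMatch(sb.subSequence(i, i + iCheckSize), sbCheckable)) {
                l.add(i);
            }
        }
        int[] out = new int[l.size()];
        for (int i = 0; i < out.length; i++) { out[i] = l.get(i); }
        return out;
    }

    // True iff the slice cs matches sCheckable char for char
    static boolean containsMatch(CharSequence cs, StringBuffer sCheckable) {
        for (int i = 0; i < cs.length(); i++) {
            if (cs.charAt(i) != sCheckable.charAt(i)) { return false; }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] hits = locateSubstrings(new StringBuffer("the cat and the hat"), new StringBuffer("the"));
        System.out.println(java.util.Arrays.toString(hits)); // prints [0, 12]
    }
}
```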

Reducing the Necronomicon to numbers….

February 10, 2010 2 comments

Abdul Alhazred, in his crazed and infinite wisdom, besides being the author of the Necronomicon was also something of a part time numerologist. Between inscribing in blood (derived from virgins, a scarce commodity in the circles in which Abdul moved) the memorial pages which have driven many beyond the brink of sanity, he spent much time in contemplating how one might reductively turn language into numbers. One method which he overlooked was that proposed by Mr Donovan K. Loucks. It appeared as a comment to an earlier post on “Counting vowels… the faster AND dynamically reusable way…” and in essence what we need to do here is take a given String, remove non-vowels, i.e. consonants, punctuation and whitespace, replace the vowels with “+1” and suffix a “0” to the end so that the final number to be added will be a ten.

The replacement is easy enough using the locateStringContents method set out earlier, which we’ll bolt into a putative new class called LovecraftEvaluator, along with the boringly necessary toIntArray method, which appears more times on this blog than Cthulhu has yawned in the sleeping millennia. We don’t actually need to replace anything at all, given that we are just interested in the vowels, so we’ll just create a new StringBuffer, append a “+1” for each vowel, and append a “0” once this is complete. Then we have to calculate the sum of these numbers….


String test = "The Necronomicon does not make for light reading. At all.";
LovecraftEvaluator le = new LovecraftEvaluator();
String vowels = "aeiouAEIOU";
int[] positions = le.locateStringContents(new StringBuffer(test), new StringBuffer(vowels));

StringBuffer sOut = new StringBuffer();
for (int i = 0; i < positions.length; i++)
{
sOut.append("+1");
}
sOut.append(0);

If we print the contents of sOut we’ll get this:

+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+1+10

Now we’re onto phase 2, evaluating the String mathematically. Java still doesn’t have closures or lambdas, even though they’ve been on the request list since the early middle ages. Apparently they’re proposed for JDK 7 but I won’t be holding my breath; something we’ve taken for granted since year 1 in Ruby is still unimplemented in Java and likely to remain so in the near future.

Aside from the obvious occult usages of being able to reduce Strings to their numerical values, there is a serious method behind this which will allow you to calculate a mathematically expressed String effortlessly. I won’t bore you overmuch with details, but it’s easy enough to do by converting your StringBuffer to a char[] array, then walking the array and calculating the outcome as you go along. This method, with its helpers, should allow you to process most mathematically expressed Strings. Obviously if you’re working with floats etc. you would have to overload and extend the metaphor (handling decimal points being the main difference), but generally what’s needed is not much beyond the skeletal framework we have here. Typically your method for the LovecraftEvaluator class should look something like this:


public int sbMathEvaluator(StringBuffer sb)
    {
    int result = 0;
    char operator = '+'; // init to '+' so a leading bare number gets added to the zero total
    String tmp = "";

    char[] aEval = sb.toString().toCharArray();
    int last = aEval.length;

    for (int i = 0; i < last; i++) // each character we have....
    {
    if(!isInt(aEval[i])) // if it's not an int we'll change the value of the operator and go back to the top of the loop
    {
    operator = aEval[i];
    tmp = ""; // reset tmp
    continue;
    }

    // if we got here it's an int (at least 1 digit) & there may be more so don't calc just yet....
    tmp += Character.toString(aEval[i]);

    if(i + 1 == last){ // we really don't want an array exception here so this is the final calc on the last number
    result = calc(result, operator, toInt(tmp));
        break;}
    if(!isInt(aEval[i + 1])) // if the next char's an operator, recalculate
    {
    result = calc(result, operator, toInt(tmp));
    }

    }

    return result;
    }

The helpers you will need are as follows. Firstly the calc method which takes the running total, the operation to perform, and the value to apply to it as arguments. Although we’re just doing addition it doesn’t hurt to make it relatively generic. You could add other operators etc, but let’s keep it relatively simple for illustrative purposes…


public int calc(int result, char operator, int val)
    {
    if (operator == '+')
    {return result + val;}
    if (operator == '-')
    {return result - val;}
    if (operator == '*')
    {return result * val;}
    if (operator == '/')
    {return result / val;}
// no op possible? just throw back the result unmutated
    return result;
    }

And we’ll need a few little syntax helpers. Feel free to rip these and rep them as necessary, they’re for convenience mainly.


   public int toInt(String tmp)
    {
    return Integer.parseInt(tmp);
    }

    public int toInt(char c)
    {
    return Character.getNumericValue(c);
    }

    public boolean isInt(char c)
    {
    return Character.isDigit(c); // getNumericValue would also pass letters ('a' yields 10), which we don't want here
    }

public int[] toIntArray(List<Integer> integerList) {
int[] intArray = new int[integerList.size()];
for (int i = 0; i < integerList.size(); i++) {
intArray[i] = integerList.get(i);
}
return intArray;
}
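Pulling the evaluator and its helpers together into a standalone sketch (the class name and the use of Character.isDigit/Integer.parseInt in place of the helper methods are mine). Note the evaluation is strictly left to right, with no operator precedence:

```java
public class MathEvalDemo {

    // Walk the char[] form of the buffer, accumulating digits and applying
    // each pending operator when the number ends
    static int sbMathEvaluator(StringBuffer sb) {
        int result = 0;
        char operator = '+'; // so a leading bare number is added to the zero total
        String tmp = "";
        char[] aEval = sb.toString().toCharArray();
        int last = aEval.length;

        for (int i = 0; i < last; i++) {
            if (!Character.isDigit(aEval[i])) { // an operator: remember it, reset the digit buffer
                operator = aEval[i];
                tmp = "";
                continue;
            }
            tmp += aEval[i]; // accumulate multi-digit numbers
            if (i + 1 == last || !Character.isDigit(aEval[i + 1])) {
                result = calc(result, operator, Integer.parseInt(tmp)); // number complete: apply it
            }
        }
        return result;
    }

    static int calc(int result, char operator, int val) {
        switch (operator) {
            case '+': return result + val;
            case '-': return result - val;
            case '*': return result * val;
            case '/': return result / val;
            default:  return result; // no op possible? throw back the result unmutated
        }
    }

    public static void main(String[] args) {
        System.out.println(sbMathEvaluator(new StringBuffer("+1+1+10"))); // prints 12
        System.out.println(sbMathEvaluator(new StringBuffer("20-3*2")));  // left-to-right: prints 34
    }
}
```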

As an afterthought, if you're going to reduce something of the length of the Necronomicon to its numerical value, you'd probably be advised to add the earlier countStringContents method to the class also, and set the capacity of your target replacement StringBuffer to twice the vowel count plus a few, e.g.

String test = "The complete and unexpurgated contents of the Necronomicon.... etc";
LovecraftEvaluator le = new LovecraftEvaluator();
String vowels = "aeiouAEIOU";
int[] positions = le.locateStringContents(new StringBuffer(test), new StringBuffer(vowels));
int numVowels = le.countStringContents(test, vowels);
numVowels = (numVowels * 2) + 50; // two chars per "+1", plus a little headroom

StringBuffer sOut = new StringBuffer();
sOut.ensureCapacity(numVowels);
for (int i = 0; i < positions.length; i++)
{
sOut.append("+1");
}
sOut.append(0);