Locating token positions in StringBuffers
This might be useful to someone along the way; I certainly have practical uses for it myself, for fast exact matching and location of long tokens in StringBuffers. It is probably open to a certain amount of optimisation, since it makes two passes: one over the whole searchable base StringBuffer, and a second over the candidate positions found by the first. On the other hand, in certain situations this is probably more efficient than trying to do it all in one bite. It builds two List objects. The first is a preliminary list of possible matches: positions where both the first and the last character of the token match. From that list it refines the search into a definitive List of exact matches, confirmed character by character. The first pass costs at most two character comparisons per position, whatever the token's length, so the longer the search token, the more work the pre-filter can save (provided few candidates survive it). The method could also be expanded to do regex-style pattern matching and case-insensitive matching.
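For comparison, here is a minimal single-pass baseline built on the JDK's own StringBuffer.indexOf(String, int). It is handy for checking the two-pass method's results and for timing it against something simple. The class name IndexOfBaseline and the helper findWithIndexOf are hypothetical. The search steps one character past each hit so that, like the two-pass method, it reports overlapping occurrences; returning an empty list for an empty token is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

public class IndexOfBaseline {

    // Hypothetical helper: collects every start index of token in sb,
    // including overlapping occurrences, using the JDK's built-in search.
    static List<Integer> findWithIndexOf(StringBuffer sb, String token) {
        List<Integer> positions = new ArrayList<>();
        if (token.isEmpty()) {
            return positions; // assumption: an empty token has no positions
        }
        int from = 0;
        int pos;
        while ((pos = sb.indexOf(token, from)) != -1) {
            positions.add(pos);
            from = pos + 1; // step one char past the hit so overlaps are found
        }
        return positions;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("abcabcab");
        System.out.println(findWithIndexOf(sb, "abc")); // prints [0, 3]
        System.out.println(findWithIndexOf(sb, "ab"));  // prints [0, 3, 6]
    }
}
```

Running both methods over the same input and comparing the lists is a quick way to confirm they agree before worrying about which one is faster.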
// Needs, at the top of the enclosing class's source file:
import java.util.ArrayList;
import java.util.List;

/**
 * Searches the StringBuffer sb for every exact occurrence of the contents of StringBuffer sbx.
 *
 * Returns a List of the Integer start positions (0-based, ascending) of all exact matches of sbx
 * in sb. Overlapping occurrences are all reported, e.g. "aa" in "aaa" gives 0 and 1.
 * An empty sbx, or one longer than sb, yields an empty list.
 */
public List<Integer> findTokens(StringBuffer sb, StringBuffer sbx)
{
    List<Integer> definites = new ArrayList<>();
    int len = sb.length();
    int k = sbx.length();
    if (k == 0 || k > len) {
        return definites; // nothing to match; also avoids charAt(0) on an empty token
    }
    char tokenStart = sbx.charAt(0);
    char tokenEnd = sbx.charAt(k - 1);

    // Pass 1: collect candidate positions whose first and last characters match the token's.
    List<Integer> possibles = new ArrayList<>();
    for (int i = 0; i <= len - k; i++) {
        if (sb.charAt(i) == tokenStart && sb.charAt(i + (k - 1)) == tokenEnd)
        {
            possibles.add(i);
        }
    }

    // Pass 2: confirm each candidate character by character. The last character was
    // already checked in pass 1, so only offsets 0 .. k - 2 are compared here.
    for (int start : possibles)
    {
        boolean ok = true;
        for (int j = 0; j < k - 1; j++) {
            if (sb.charAt(start + j) != sbx.charAt(j))
            {
                ok = false; // a break here would be faster; left out so every comparison can be traced
            }
        }
        if (ok) // only confirmed matches are kept
        {
            definites.add(start);
        }
    }
    return definites;
}
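As suggested at the start, the same two-pass idea extends naturally to case-insensitive matching. The sketch below is a hypothetical variant: the names CaseInsensitiveTokens, findTokensIgnoreCase and eqIgnoreCase are not from the original. It compares characters one at a time, similar to String.regionMatches(true, ...), so it cannot handle case mappings that change length (such as the German ß). Unlike the version above, it stops at the first mismatch in pass 2.

```java
import java.util.ArrayList;
import java.util.List;

public class CaseInsensitiveTokens {

    // Hypothetical char comparison: equal as-is, or equal after upper-casing,
    // or equal after lower-casing those upper-case forms.
    static boolean eqIgnoreCase(char a, char b) {
        if (a == b) {
            return true;
        }
        char ua = Character.toUpperCase(a);
        char ub = Character.toUpperCase(b);
        return ua == ub || Character.toLowerCase(ua) == Character.toLowerCase(ub);
    }

    // Hypothetical case-insensitive variant of findTokens: same two passes,
    // but every character comparison goes through eqIgnoreCase.
    static List<Integer> findTokensIgnoreCase(StringBuffer sb, StringBuffer sbx) {
        List<Integer> definites = new ArrayList<>();
        int len = sb.length();
        int k = sbx.length();
        if (k == 0 || k > len) {
            return definites;
        }
        char tokenStart = sbx.charAt(0);
        char tokenEnd = sbx.charAt(k - 1);

        // Pass 1: candidates whose first and last chars match, ignoring case.
        List<Integer> possibles = new ArrayList<>();
        for (int i = 0; i <= len - k; i++) {
            if (eqIgnoreCase(sb.charAt(i), tokenStart)
                    && eqIgnoreCase(sb.charAt(i + k - 1), tokenEnd)) {
                possibles.add(i);
            }
        }

        // Pass 2: confirm the inner chars (first and last were checked in pass 1).
        for (int start : possibles) {
            boolean ok = true;
            for (int j = 1; j < k - 1; j++) {
                if (!eqIgnoreCase(sb.charAt(start + j), sbx.charAt(j))) {
                    ok = false;
                    break;
                }
            }
            if (ok) {
                definites.add(start);
            }
        }
        return definites;
    }

    public static void main(String[] args) {
        StringBuffer text = new StringBuffer("Token, TOKEN and token");
        System.out.println(findTokensIgnoreCase(text, new StringBuffer("token"))); // prints [0, 7, 17]
    }
}
```

For regex-style matching, java.util.regex.Pattern already accepts any CharSequence (StringBuffer included), so that route is probably better served by the standard library than by extending this method.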