Locating token positions in StringBuffers

This might be useful to someone along the way; I certainly have practical uses of my own for it, namely fast exact matching and location of long tokens in StringBuffers. It could probably be optimised, since it scans the base StringBuffer twice. On the other hand, in certain situations this is probably more efficient than trying to do it all in one bite, because it builds two List objects: a preliminary one holding possible matches (positions where both the first and last characters of the token line up), and a definitive one holding the exact matches that survive a full comparison. The longer the search token, the more efficient this ultimately is. The method could also be expanded to do regex-style pattern matching and case-insensitive matching.

import java.util.ArrayList;
import java.util.List;

/**
 * Searches StringBuffer sb for every exact occurrence of the token held in StringBuffer sbx.
 *
 * Returns a List of the Integer start positions of all exact matches of sbx in sb,
 * including overlapping ones. An empty token yields an empty list.
 */
public List<Integer> findTokens(StringBuffer sb, StringBuffer sbx)
{
    int len = sb.length();
    int k = sbx.length();
    if (k == 0) {
        return new ArrayList<Integer>(); // guard: charAt(0) would throw on an empty token
    }
    char tokenStart = sbx.charAt(0);
    char tokenEnd = sbx.charAt(k - 1);

    // Pass 1: collect positions where both the first and last token characters line up.
    List<Integer> possibles = new ArrayList<Integer>();
    for (int i = 0; i < (len - (k - 1)); i++) {
        if ((sb.charAt(i) == tokenStart) && (sb.charAt(i + (k - 1)) == tokenEnd)) {
            possibles.add(i);
        }
    }

    // Pass 2: confirm each candidate character by character.
    // The last character was already checked in pass 1, so the loop stops one short of it.
    List<Integer> definites = new ArrayList<Integer>();
    for (int start : possibles) {
        boolean ok = true;
        int tokCtr = 0;
        for (int i = start; i < start + (k - 1); i++) {
            if (sb.charAt(i) != sbx.charAt(tokCtr)) {
                ok = false; // probably ought to break here if you want it faster (I don't, I need the trace!)
            }
            tokCtr++;
        }
        if (ok) { // don't add if not ok!
            definites.add(start);
        }
    }
    return definites;
}
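
For comparison, here is a rough sketch of two alternatives, not a replacement for the method above. The first is a baseline built on the standard StringBuffer.indexOf(String, int) call, handy for cross-checking results; the second shows one way the case-insensitive matching mentioned earlier might be done, using String.regionMatches. The class name TokenSearch and the method names findTokensIndexOf and findTokensIgnoreCase are made up for this illustration, and treating an empty token as "no matches" is an assumption on my part.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named for illustration only.
class TokenSearch {

    /**
     * Baseline: repeated StringBuffer.indexOf calls.
     * Stepping forward by one character after each hit keeps overlapping matches.
     */
    static List<Integer> findTokensIndexOf(StringBuffer sb, String token) {
        List<Integer> hits = new ArrayList<Integer>();
        if (token.isEmpty()) {
            return hits; // assumption: an empty token matches nothing
        }
        int from = 0;
        int pos;
        while ((pos = sb.indexOf(token, from)) >= 0) {
            hits.add(pos);
            from = pos + 1;
        }
        return hits;
    }

    /**
     * Case-insensitive variant: tests every start position with
     * String.regionMatches(ignoreCase = true, ...).
     */
    static List<Integer> findTokensIgnoreCase(CharSequence text, String token) {
        List<Integer> hits = new ArrayList<Integer>();
        int k = token.length();
        if (k == 0) {
            return hits; // same assumption as above
        }
        String s = text.toString(); // one copy, so regionMatches can be used
        for (int i = 0; i + k <= s.length(); i++) {
            if (s.regionMatches(true, i, token, 0, k)) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        StringBuffer sb = new StringBuffer("abcABCabcab");
        System.out.println(findTokensIndexOf(sb, "abc"));    // prints [0, 6]
        System.out.println(findTokensIgnoreCase(sb, "abc")); // prints [0, 3, 6]
    }
}
```

The indexOf version should return the same positions as findTokens for any non-empty token, so it makes a reasonable sanity check if you change the two-pass code.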
