Archive

Posts Tagged ‘ArrayList’

HashMap->ArrayList breakdown and reconstitution of documents in Java

March 2, 2010 Leave a comment

Given the general utility of HashMaps with ArrayLists as values, as illustrated in the earlier post HashMap key -> ArrayList, it’s fairly obvious that we can in most cases (actually all, but I’m open to persuasion that somewhere an exception may exist) represent a document, such as a short story by Lord Dunsany, a play by William Shakespeare, an XML or Soap document, in fact any document in which repetitious tokens occur or are likely to occur, as a set of keys indicating the tokens, and an arraylist of their relative positions in the document.

Let’s as an example take a block of text such as that found in Churchill’s speech of November 10, 1942 in the wake of Alexander and Montgomery repelling Rommel’s thrust at Alexandria: “Now this is not the end. It is not even the beginning of the end. But it is, perhaps, the end of the beginning.” It’s short, sweet, and contains the criteria we need for a good example, notably multiple repetitive tokens. For the purposes of this examination we’ll totally neuter the punctuation and capitalisation so that we just have tokens:now this is not the end it is not even the beginning of the end but it is perhaps the end of the beginning. At this juncture it is still recognisable as Churchill’s text. Let’s turn this into a reproducible hashmap in which each unique word is a key to a value set which is an ArrayList containing instances of the positions of the word in the text which we supplied it with. I won’t bore you with repeating the addArrayListItem method (it’s in the post immediately below this one…) and the printOutHashMap method is somewhat self-evident but included for compeleteness.

public void someBoringMethodSig()
{
String inString = "now this is not the end it is not even the beginning of the end ";
inString += "but it is perhaps the end of the beginning";

HashMap ha = new HashMap();

String[] tok = inString.split(" ");
int k = tok.length;

 for (int i = 0; i< k; i++)
        {
        addArrayListItem(ha,tok[i],i );
        }

    printOutHashMap(ha);
   // assemble(ha);
    }
public void printOutHashMap(HashMap hamp)
{
Collection c = hamp.values();
Collection b = hamp.keySet();
Iterator itr = c.iterator();
Iterator wd = b.iterator();
    System.out.println("HASHMAP CONTENTS");
    System.out.println("================");
while(itr.hasNext()){
System.out.print("Key: >>" + wd.next() + "<< Values: " );//+ itr.next()); }
    ArrayList x = (ArrayList) itr.next();
    Iterator lit = x.iterator();
    while (lit.hasNext())
    {System.out.print(lit.next() + " ");

    }
    System.out.println("");
}

}

This will give us an output which looks very much like

HASHMAP CONTENTS
================
Key: >>not<< Values: 3 8
Key: >>of<< Values: 12 21
Key: >>but<< Values: 15
Key: >>is<< Values: 2 7 17
Key: >>beginning<< Values: 11 23
Key: >>it<< Values: 6 16
Key: >>now<< Values: 0
Key: >>even<< Values: 9
Key: >>the<< Values: 4 10 13 19 22
Key: >>perhaps<< Values: 18
Key: >>this<< Values: 1
Key: >>end<< Values: 5 14 20

We can obviously reassemble this by recourse to the values, either in forward (starting at 0) or in reverse order (finding the max int value) and reassembling in a decremental loop. The implications of this are significant when one considers the potential applications of this sort of document treatment from a number of perspectives, notably IT security, improved data transmissibility by reduction, cryptography (where one would not transmit the keys but a serializable numeric indicator to a separately transmitted (or held) common dictionary, etc, etc.

Generally the reassembly process looks something like the following :


public void assemble(HashMap hamp)
{
Collection c = hamp.values();
Collection b = hamp.keySet();

int posCtr = 0;

int max = 0;

Iterator xAll = hamp.values().iterator();
while(xAll.hasNext()){
  ArrayList xxx = (ArrayList) xAll.next();
  max += xxx.size();
  }

String stOut = "";
boolean unfinished = true;

while (unfinished)
{
Iterator itr = c.iterator();
Iterator wd = b.iterator();
start:
while(unfinished){

String tmp = (String) wd.next() ; // next key
ArrayList x = (ArrayList) itr.next(); // next ArrayList
Iterator lit = x.iterator(); // next ArrayList iterator

while (lit.hasNext())
{
//System.out.println(lit.next());
if((Integer.valueOf((Integer) lit.next()) == posCtr )) // if it matches the positional counter then this is the word we want...
{
stOut = stOut + " " + tmp ; // add the word in tmp to the output String
posCtr++;

wd = b.iterator(); //don't forget to reset the iterators to the beginning!
itr = c.iterator();
if(posCtr == max) // last word to deal with....
{
System.out.println("FINISHED");
unfinished = false;}
break start; // otherwise goto the start point, rinse, repeat (irl this would be handled more elegantly, illustrative code here!)

}

}

}
}
System.out.println("ASSEMBLED: " + stOut);
}

This can obviously be improved in a number of ways, particularly moving from a String-based paradigm to one in which char[] array representations replace Strings in the HashMap.

Advertisements

HashMap key -> ArrayList

March 1, 2010 Leave a comment

In the earlier and gentle introduction article on HashMaps, HashMaps and how to bend them to your will, I looked at how we might use a HashMap to do vocabulary count assessment of a passed String document. HashMaps are, as we have seen, ostensibly dumb key-value sets. I say ostensibly: the given key or value does not however have to be a single item as HashMap effects storage in both value and key as Object types. From a programmatic perspective this realisation has significant implications.

Let’s look at how we might use a keyed HashMap paired with values expressed as ArrayLists. The following methods will give you an idea of how to put int values in per key, the first method using a single value, the second handling an array of values. Obviously these methods are ripe for extension, exception handling and overloading but they are stripped down here from an illustrative point of view:


public HashMap<String, ArrayList> addArrayListItem(HashMap<String, ArrayList> hamp,String sKey,int value )
{
    if(!hamp.containsKey(sKey)) // no key found?
    {
    ArrayList lx = new ArrayList();
    lx.add(value);
    hamp.put(sKey, lx);
    }
    else // the key pre-exists so we just add the (new) value to the existing arraylist
    {
    ArrayList lx = hamp.get(sKey)    ;
    lx.add(value);
    hamp.put(sKey, lx);
    }
    return hamp;
}

public HashMap<String, ArrayList> addArrayListItems(HashMap<String, ArrayList> hamp,String sKey,int[] value )
{
    int iSize = value.length;
    ArrayList iList = (ArrayList) Arrays.asList(value);

    if(!hamp.containsKey(sKey)) // no key found? // create key and add list
    {
    ArrayList lx = new ArrayList();
    lx.addAll(iList);
    hamp.put(sKey, lx);
    }
    else // the key pre-exists so we just add the array of values to the existing list
    {
    ArrayList lx = hamp.get(sKey)    ;
    lx.addAll(iList);
    hamp.put(sKey, lx);
    }

    return hamp;
}

The getter method to retrieve by key is even easier (although again this method is open to extension, overloading, exception handling etc):

public ArrayList<int[]> getItems(HashMap<String, ArrayList> hamp,String sKey)
{
return hamp.get(sKey);
}

We can moreover use an ArrayList in an arbitrary, or, indeed, in a very carefully structured, fashion, e.g. to store not only integers but Strings, booleans, whatever, in order to achieve our purpose. Again, an untyped ArrayList is Object type agnostic as the following snippet of dummy code illustrates.

        ArrayList al = new ArrayList();
        int i = 1;
        String s = "Monty Python";
        boolean isVerySilly = true;
        al.add(i);
        al.add(s);
        al.add(isVerySilly);
        Iterator it = al.iterator();
        while (it.hasNext())
        {
            System.out.println(it.next());
        }

The realisation of this has similarly significant implications for the intelligent programmer. Exploiting the HashMap/ArrayList (or similar collection type) combination allows you to do some very powerful things with the data coming your way.