User Name
Password

Go Back   Planetarion Forums > Non Planetarion Discussions > Programming and Discussion
Register FAQ Members List Calendar Arcade Today's Posts

Reply
Thread Tools Display Modes
Unread 22 Nov 2002, 00:17   #1
Lord211
Guest
 
Posts: n/a
Hashing algorithms

I need 2 fairly quick and simple, and easy to write in java, hashing algorithms, to hash a string into an int, and im too lazy to sift through hundreds of google pages.

any help you could give would be appreciated
  Reply With Quote
Unread 22 Nov 2002, 02:01   #2
MT
/dev/zero
Retired Mod
 
MT's Avatar
 
Join Date: May 2000
Posts: 415
MT is an unknown quantity at this point
could i suggest a book on data algorithms?

from my current favourite book ..

this is for a CBHT, choose m to be a prime s.t. you would expect a load factor of about 0.6, c[i] being the ith char of the word

hash(word) = sigma(c[i]*i,0,length(word)) % m

.. its ahrd to do sigma signs, so that means the sum of (c[i]*i) from i = 0 to i < length(word)

or in java, as I'm a softy,

Code:
static int hash(String s) throws Exception {
  int hash = 0;
  if (s==null) throw new IllegalArgumentException();
  for (int i = 0;i < s.length();i++)
    hash+=(i*s.charAt(i));
  return hash % PRIME;
}
__________________
#linux : Home of Genius

<idimmu> ok i was chained to a desk with this oriental dude
MT is offline   Reply With Quote
Unread 22 Nov 2002, 16:49   #3
Lord211
Guest
 
Posts: n/a
ah magic, i assume i would substitute prime with my table size, as its for a hashing table.

cheers
  Reply With Quote
Unread 22 Nov 2002, 17:06   #4
Structural Integrity
Rawr rawr
 
Structural Integrity's Avatar
 
Join Date: Dec 2000
Location: Upside down
Posts: 5,300
Structural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriend
What is hashing actually for? I never really understood (I even had problems understanding the concept of hash-tables in mIRC, but that's besides the point).

Can you give examples in which situation one would like to use hashing, why and what you accomplish with it?
__________________
"Yay"
Structural Integrity is offline   Reply With Quote
Unread 22 Nov 2002, 20:47   #5
Nodrog
Registered User
 
Join Date: Jun 2000
Posts: 8,476
Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.
Quote:
Originally posted by Structural Integrity
What is hashing actually for? I never really understood (I even had problems understanding the concept of hash-tables in mIRC, but that's besides the point).

Can you give examples in which situation one would like to use hashing, why and what you accomplish with it?
A hashing algorithm 'converts' an obect into an integer, for storing in a hash table.

A hash table is a way to store data (like an array/list/tree/whatever), but a lot more efficient, as it has constant time on all operations. Most other data structures have certain operations for which they are slow (eg, finding an object in an unsorted array takes linear time, insertations into a binary tree take log(n) time, searching a binary tree takes log(n) time, etc etc etc).

A hash table however takes constant time for all operations. It is normally implemented using a large array (for example, you could use a hash table of size 1000 to store strings). Every string has an integer representation, which is calculated using a hashing algortithm (so the string "what" might be 997, while the string "lol" might be 445). This number is the strings 'key' in the array. The string "what" would be stored at the 997th position in the array (array[997]), while the string "lol" would be stored in array[445]. When you wanted to look up a string in the array, you would first find the strings key, and then look at that position in the array.



This is a horrible explanation, and it gets more complex because different strings may have the same key, and so forth. Do a search on google for a better one.

Last edited by Nodrog; 23 Nov 2002 at 00:02.
Nodrog is offline   Reply With Quote
Unread 22 Nov 2002, 20:50   #6
Lord211
Guest
 
Posts: n/a
they are used in places when you want a quick lookup table.
You can perform a hashing algorithm on a string, to get an index for it instead of scanning an array for it which aparently is much quicker.

eg, you could have an array of 1000 elements long, and you want to find a certain string which is held in one of these elements, it could be in the last one, so instead of scanning through the array to get it, you just perform a quick hash on it, and jump to its place.

Ofc there are problems such as one element already being at that place etc, but there are ways around this and it still proves much quicker than just a regular search algorithm.

the only practicle aplication i can think of where this is used is a compilers table of reserved words.

Hope this helps
  Reply With Quote
Unread 22 Nov 2002, 20:52   #7
Lord211
Guest
 
Posts: n/a
beat me to it
  Reply With Quote
Unread 22 Nov 2002, 21:32   #8
meglamaniac
Born Sinful
 
meglamaniac's Avatar
 
Join Date: Nov 2000
Location: Loughborough, UK
Posts: 4,059
meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.
SI - a very basic example of using hash tables is in the numerous mIRC lookup scripts (ie. !seen <nick>).

I made one myself about a year ago which was above average in that it would search for other matches to the requested nick's address.
The first time round it used a text file.
This was a very bad idea since every time someone spoke, joined, left, etc the text file had to be searched and updated, and then people looking up nicks caused it to be searched as well.
Bear in mind this was on a P200, and I quite often went in creators hour then - the machine would practically fall over (it did, on more than one occasion) simply because constantly searching and writing in a 3Mb text file is too much.
I then re-wrote the storage system (kept the "front end" of the script the same) to use a hash table. There was very little difference in how the information was stored (something along the lines of nick:address:$ctime:<various parameters depending on what it was the user last did>. The hash table was dumped to a file every 5 minutes (in case the machine crashed) and was dumped on exit. It was reloaded from file when mIRC opened.
The search time went from something like 15 seconds to complete a lookup with 1000 nicks in the text file to about 2 seconds to complete a lookup with 1000 nicks in the hash table.

This is probably not a great example, but its much easier to store data like this where you don't know the length of each field, never mind each line, or how many entries you have like this - and it's fast.
I know mIRC hash tables aren't exactly 'real' either, but there you go.

__________________
Worth dying for. Worth killing for. Worth going to hell for. Amen.
meglamaniac is offline   Reply With Quote
Unread 22 Nov 2002, 21:39   #9
Dudels
Guest
 
Posts: n/a
/me puts on the 'to-play-with' list and goes in search of code examples
  Reply With Quote
Unread 22 Nov 2002, 21:49   #10
wu_trax
Guest
 
Posts: n/a
i used a hashmap once to represent a (very small) db-relation-table, because i needed that relation quite often (the <option>-s in a select-field) and didnt want to access the db all the time.
(i didnt write anything on my own though, i used java.util.HashMap )
  Reply With Quote
Unread 22 Nov 2002, 23:25   #11
MT
/dev/zero
Retired Mod
 
MT's Avatar
 
Join Date: May 2000
Posts: 415
MT is an unknown quantity at this point
Two different things ..

hashcodes are useful in two ways - generating indexs to hash tables, and equality tests on certain objects. If its prohibitivly expensive to directly compare two objects, you can first thin the field by seeing if their hashcodes are the same. If they are, then do the expensive operation, otherwise dont.

hashtables are a time efficient way of retrieving data, comparing objects (the ubiquitous contains() method) and storing data. The constraints on a hash table are that for maximum efficiency, the load factor (the number of elements contained in the table/the number of buckets in the table) must be kept below 1, optimally (to allow for growth) below 0.6. If the load factor becomes high (say 1000), then essentially you degrade to O(n) time performance.

Btw nod, I believe you meant O(n) time for linked list traversal.
__________________
#linux : Home of Genius

<idimmu> ok i was chained to a desk with this oriental dude
MT is offline   Reply With Quote
Unread 22 Nov 2002, 23:38   #12
meglamaniac
Born Sinful
 
meglamaniac's Avatar
 
Join Date: Nov 2000
Location: Loughborough, UK
Posts: 4,059
meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.
My bad then (tut - what a bad example...).

Erm - one more thing.
If we now have hashing algorithms and hash tables, the what on earth do you call the type of hash used to secure passwods and other sensitive 1 way data (MD5 and others of thier ilk)?

__________________
Worth dying for. Worth killing for. Worth going to hell for. Amen.
meglamaniac is offline   Reply With Quote
Unread 22 Nov 2002, 23:58   #13
MT
/dev/zero
Retired Mod
 
MT's Avatar
 
Join Date: May 2000
Posts: 415
MT is an unknown quantity at this point
thats still a hash (a hash is just a routine that will convert one set of information into another set of information, and reliabley return the same info time after time (or similar)), hashtables and hashcodes are just yet another abstract data type that your lecturers will start to bang on about in a bit if you are unlucky.
__________________
#linux : Home of Genius

<idimmu> ok i was chained to a desk with this oriental dude
MT is offline   Reply With Quote
Unread 23 Nov 2002, 00:01   #14
meglamaniac
Born Sinful
 
meglamaniac's Avatar
 
Join Date: Nov 2000
Location: Loughborough, UK
Posts: 4,059
meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.meglamaniac has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.
They're allready banging on about set theory and bolean logic fs. It's not particularly hard, it's just that after an hour of either of them your brain has turned into cream cheese purely because it wanted to do something more exciting.

The irony is that either is not really necessary at this stage. Take Aberystwyth with it's excellent CS reputation - my mate is in his second year there and they had no maths modules at all in the first year.
The only reason we're doing all this maths at Loughborough is because some of the countries top Logic bods live in the CS department and like thier egos boosting.

*sigh*
__________________
Worth dying for. Worth killing for. Worth going to hell for. Amen.
meglamaniac is offline   Reply With Quote
Unread 23 Nov 2002, 02:28   #15
JetLinus
Friendly geek of GD :-/
 
JetLinus's Avatar
 
Join Date: Nov 2000
Location: On my metal roid
Posts: 923
JetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud ofJetLinus has much to be proud of
Quote:
Originally posted by meglamaniac
My bad then (tut - what a bad example...).

Erm - one more thing.
If we now have hashing algorithms and hash tables, the what on earth do you call the type of hash used to secure passwods and other sensitive 1 way data (MD5 and others of thier ilk)?

Ok, maybe it's just like this:
As someone said above, "what" might give you 997, while "lol" gives you 445.

Ok, now I got my password and the server has it, and I want to convince the server I got it right.
Let my password be "lol" for this example.
I compile a hash, the server does, and we both check... If the hashes are the same, chances are very high we got the same original string (ie. password).
The advantage of this method is, that we only had to transfer the hash-code. And apparently it's hard to guess the password just from its hash (one-way-function).
Well basically I think the server and client both add some random part to the pw and then apply MD5 to it (which should return pretty unique hashes), so that its no value if you get hold of the hashkey, as for the next login procedure you will need a new one...

Sorry if I didn't make myself clear...
__________________
[»] Entropy increases! :-/
JetLinus is offline   Reply With Quote
Unread 23 Nov 2002, 09:18   #16
queball
Ball
 
queball's Avatar
 
Join Date: Oct 2001
Posts: 4,410
queball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so littlequeball contributes so much and asks for so little
Quote:
Originally posted by MT
abstract data type
Do you just put in the word "abstract" because it makes the phrase complete?
queball is offline   Reply With Quote
Unread 23 Nov 2002, 12:12   #17
Structural Integrity
Rawr rawr
 
Structural Integrity's Avatar
 
Join Date: Dec 2000
Location: Upside down
Posts: 5,300
Structural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriendStructural Integrity needs a job and a girlfriend
So, when I understand correctly hash tables are primailly used to check the presence of, for example, a string. When you want to get the x'th entry it becomes less efficient than a simple array because you can have void entries in your hash-array.

When a duplicate hash index occurs it puts it on the same place in the array but with a sub index. (my little theory)
That explains why mIRC manual mentiones that a hash table becomes more efficient with more hash-array-fields (or something like that) because more strings can be assigned to an unique index that way and are not assigned a sub index.

\o/
__________________
"Yay"
Structural Integrity is offline   Reply With Quote
Unread 23 Nov 2002, 15:07   #18
Nodrog
Registered User
 
Join Date: Jun 2000
Posts: 8,476
Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.Nodrog has ascended to a higher existance and no longer needs rep points to prove the size of his e-penis.
Quote:
Originally posted by Structural Integrity
So, when I understand correctly hash tables are primailly used to check the presence of, for example, a string. When you want to get the x'th entry it becomes less efficient than a simple array because you can have void entries in your hash-array./
Yes, however a decent hash table implementation will dynamically resize the array if it gets too full. This is where the "load factor" comes into play - if a hash table has a load factor of 0.5, it will resize itself when it is 50% full.
Quote:
Originally posted by Structural Integrity
When a duplicate hash index occurs it puts it on the same place in the array but with a sub index. (my little theory)
Depends on the implelmentation. You can indeed make linked lists or whatever at indexes with multiple objects (this is called 'chaining'), but there's also other ways to do it. Probing is one of the major ones, and this is where all the shenanigans with prime numbers comes into play.

http://ciips.ee.uwa.edu.au/~morris/Y...sh_tables.html
Nodrog is offline   Reply With Quote
Reply



Forum Jump


All times are GMT +1. The time now is 21:30.


Powered by vBulletin® Version 3.8.1
Copyright ©2000 - 2024, Jelsoft Enterprises Ltd.
Copyright ©2002 - 2018