
sleuth
Enthusiast
Dec 11, 2000, 11:20 PM
Post #2 of 8
(60314 views)
|
Ok, even though no one participated in the quiz, since about 11 people at least looked at it, I'll give an answer. So, first step you must take is open the data base that is going to merge with the other data base you have, usually the smaller of the two. In this data base, I'll call "db-1A", contains all the countries in the world with there ISO country code in the left field, in this format, US|United States| GB|Great Britan| That's pretty simple. Now the data base (db-2) you want to merge this data with only contains various cities around the world with there country at the left, like so, London|Great Britan| Los Angeles|United States| Hong Kong|China| Now, You are sure that the country at the right in db-2 is spelled correctly so that it will successfully match the second field in db-1, even if there isn't a match for it, it WILL NOT effect the overall process, the only thing that will happen is the third field in db-2 that you are looking to place the ISO country code from db-1 will be blank. So no big catastrophe there. Your end result is the following, London|Great Britan|GB| Los Angeles|United States|US| You can either write the merged information back to db-2 which is the data that you added the third field to, or you could create a new data base with the information to protect the original from being lost, which is the wiser choice in case you code doesn't produce the results you expected. Ok, now for the code,
# First you open the data base of Iso codes to read the first field, which is the field desired to be carried over to db-2. open(ISO_Codes, "<db-1.db"); while(<ISO_Codes>) { # read the file line by line, and split the line into fields, the pipe "|" is our delimiter. ($CODE, $COUNTRY)=split(/\|/, $_); # read the first and second field into a hash. $code{$COUNTRY} = "$CODE"; } close(ISO_Codes); #close the data base Now that's the first part of your program, what it does is it prepares the rest of the program for the data it is going to shortly need. The hash is the important part of this program, and the real lesson here, you have field2 (the country name) in db-1, and also in field 2 of db-2, and you want field one (the country code) in db-1 to carry over to a third field in db-2, knowing that, you want to use the country as an anchor to identify the appropriate iso code for that country. That's the purpose of the hash. $code{$COUNTRY} = "$CODE"; is how you can call on that anchor to get the country code for any country in db-1. Like so, If you print "$code{'United States'}"; the out put will be: US Since you assigned the data "US" (which is field one in db-1) to the name "United States", you can get the "value" of the hash "key" which in this case, was United States. Since we went through db-1 line by line and assigned the information in field1 to the name in field2, this applies of course, to all the records in that data base. So Now that your clear on how that works, we'll continue on, Now you want to create a third field in db-2 with it's country code, so lets,
open(cities, "<db-2.db"); while(<cities>){ chomp; # I'll use the same scalar as above, but the casing is different, so $COUNTRY won't effect this loop. ($city, $country)=split(/\|/, $_); push(@new,"$city|$country|$code{$country}|\n"); }close(cities); open(finished, ">db-3.db"); print finished @new; close(finished); That just merged the information, I started out by using chomp; , why? because the last field in each line is the field $country, so $country has a \n (newline) assigned to the end of it, unless I use chomp;. chomp by default uses $_ when you don't specify anything, so in a while loop, using chomp; is like writing chomp($_); And this way $country doesn't contain any newlines and won't mess with our new format we are trying to make, since we needed to use push, to create the array @new, with our new data if a newline had been assigned to $country by default, you would get London|Great Britan |GB| Los Angeles|United States |US| Instead of London|Great Britan|GB| Los Angeles|United States|US| It's one of those gotchas that sometimes is forgotten, and suddenly you get these strange files your script has created where it looks like the data is all over the place. Now that we pushed our new data into the array @new, we can now either write it to a new file, like I chose, or write it back to db-2, I choose to create a new file, well, I said why already. One thing however, this method is case sensitive, and I specifically asked for case insensitive. To do that is quite simple however if your familiar with lc(); the built in perl function for converting scalars, hashes, etc into lower case format. So if we make sure that the name if the hash keys are all lower case when reading db-1, and make sure that $country is lower case in the while loop of db-2, you will have a case-insensitive way of matching. The final code with the case insensitive matching is as follows:
# First you open the data base of Iso codes to read the first field, which is the field desired to be carried over to db-2. open(ISO_Codes, "<db-1.db"); while(<ISO_Codes>) { # read the file line by line, and split the line into fields, the pipe "|" is our delimiter. ($CODE, $COUNTRY)=split(/\|/, $_); # read the first and second field into a hash. $COUNTRY = lc($COUNTRY); $code{$COUNTRY} = "$CODE"; } close(ISO_Codes); #close the data base open(cities, "<db-2.db"); while(<cities>){ chomp; # I'll use the same scalar as above, but the casing is different, so $COUNTRY won't effect this loop. ($city, $country)=split(/\|/, $_); $lcCountry = lc($country); push(@new,"$city|$country|$code{$lcCountry}|\n"); }close(cities); open(finished, ">db-3.db"); print finished @new; close(finished); I hope you have learned something, if you have found your self a bit lost due to any miss understandings about hashes (associative arrays), then please read: http://tlc.perlarchive.com/0012/02.shtml To learn all about hashes, they are very useful, the Administrator (Jasmine) of this site herself has written that. if you have any questions, just post or e-mail me. Sleuth
|