GeoLight というのはIP データベースのひとつであるGeoIPのツール・データベースのうちフリーなモジュール・データベース
GeoLite databasesのASN DatabaseをVoltDBに入れて使い勝手をよくできるか試してみた。
- ダウンロード
$ wget http://download.maxmind.com/download/geoip/database/asnum/GeoIPASNum2.zip $ unzip GeoIPASNum2.zip
create table ASN ( pidx bigint not null, ipstart bigint not null, ipend bigint not null, asninfo varchar(1024) ); PARTITION TABLE ASN ON COLUMN pidx; create index idx_asn_1 on ASN(ipstart); create index idx_asn_2 on ASN(ipend);
$ awk -F, '{ printf "\"%d\",\"%d\",\"%d\",%s\n",$1/256/256,$1,$2,$3 }' FPAT='([^,]+)|("[^"]+")' GeoIPASNum2.csv > loading.csv $ ~/voltdb/bin/csvloader -f loading.csv ASN
- ラフなキーの割にパーティション分散もわるくなさそう
$ ~/voltdb/bin/sqlcmd --query="exec @Statistics table 0" | grep ASN 1506056466013 0 test24 0 0 ASN PersistentTable 32396 2048 1044 1911 NULL 0 1506056466013 0 test24 1 1 ASN PersistentTable 32677 2048 1053 1936 NULL 0 1506056466012 0 test24 2 2 ASN PersistentTable 33973 2048 1094 2008 NULL 0 1506056466012 0 test24 3 3 ASN PersistentTable 34213 2048 1102 2009 NULL 0 1506056466013 0 test24 4 4 ASN PersistentTable 32722 2048 1054 1925 NULL 0 1506056466013 0 test24 5 5 ASN PersistentTable 31502 2048 1015 1851 NULL 0 1506056466012 0 test24 6 6 ASN PersistentTable 32939 2048 1061 1946 NULL 0 1506056466014 0 test24 7 7 ASN PersistentTable 32569 2048 1049 1921 NULL 0
- IP指定して検索できた*1
3> select asninfo from ASN where ipstart<=inet_aton('104.244.42.66') and ipend>=inet_aton('104.244.42.66'); ASNINFO --------------------- AS13414 Twitter Inc. (Returned 1 rows in 0.15s) 4> select asninfo from ASN where pidx = inet_aton('104.244.42.66')/256/256 and ipstart<=inet_aton('104.244.42.66') and ipend>=inet_aton('104.244.42.66'); ASNINFO --------------------- AS13414 Twitter Inc.
- ところがパーティションは効いてない?( RECEIVE FROM ALL PARTITIONSとある、sqlcmdだからか(@AdHocを発行してると思うので))
13> explain select asninfo from asn where pidx=?/256/256 and ipstart<=? and ipend>=?; EXECUTION_PLAN ------------------------------- RETURN RESULTS TO STORED PROCEDURE RECEIVE FROM ALL PARTITIONS SEND PARTITION RESULTS TO COORDINATOR INDEX SCAN of "ASN" using "IDX_ASN_1" reverse range-scan covering from (IPSTART <= ?1) to end, filter by (((column#0 = ((?0 / 256) / 256)) AND (column#2 >= ?2)) AND (NOT (column#1 IS NULL))) (Returned 1 rows in 0.01s) 14> explain select asninfo from asn where ipstart<=? and ipend>=?; EXECUTION_PLAN ------------------------------- RETURN RESULTS TO STORED PROCEDURE RECEIVE FROM ALL PARTITIONS SEND PARTITION RESULTS TO COORDINATOR INDEX SCAN of "ASN" using "IDX_ASN_1" reverse range-scan covering from (IPSTART <= ?0) to end, filter by ((column#2 >= ?1) AND (NOT (column#1 IS NULL))) (Returned 1 rows in 0.00s)
- procedureまで作っておいて初めて実用的かも
create procedure asn_ipv4 as select asninfo from asn where pidx=?/256/256 and ipstart<=? and ipend>=?;
Simple形式だとパラメータ3つ同じものを指定することになり格好悪い・面倒
使い勝手のためにはJavaで用意か
package app; import org.voltdb.*; import java.net.*; public class GetAsn extends VoltProcedure { final SQLStmt selectAsn = new SQLStmt("select asninfo from asn where pidx=? and ipstart<=? and ipend>=?"); // procedure entry public VoltTable[] run(String ipaddr) { // リターンするテーブル構造 final VoltTable ret = new VoltTable( new VoltTable.ColumnInfo("asn", VoltType.STRING)); // ipaddrを数値に変換 long ipval; try { InetAddress addr = InetAddress.getByName(ipaddr); byte[] b = addr.getAddress(); ipval = (((long) (b[0]&0xff))<<24) | (((long) (b[1]&0xff))<<16) | (((long) (b[2]&0xff))<<8) | ((long) (b[3]&0xff)); } catch (UnknownHostException e) { throw new VoltAbortException("Couldnt get InetAddress.\n" + e.toString()); } // do search voltQueueSQL(selectAsn, ipval/256/256, ipval, ipval); VoltTable selectResult = voltExecuteSQL()[0]; selectResult.resetRowPosition(); while(selectResult.advanceRow()){ String asninfo = selectResult.getString("asninfo"); ret.add(selectResult.cloneRow()); } return new VoltTable[] { ret }; } }
javaは符号なし変数が面倒、InetAddress.getByNameつかっちゃったからhostnameでもいけそう
$ sqlcmd SQL Command :: localhost:21212 1> load classes test.jar; Command succeeded. 2> exec GetAsn 183.79.249.124 ; asn -------------------- AS24572 Yahoo Japan 3> exec GetAsn www.yahoo.co.jp ; asn ------------------------------- AS23816 Yahoo Japan Corporation (Returned 1 rows in 0.01s)
他のツールと併用しない、単に辞書引きツールだけだったらsqliteでいいかな…
スピードのアドバンテージはまたの機会にはかる
*1:この例のIPは目に付いた通信先の多かったものを適当に選択