{"id":9061,"date":"2013-12-03T00:00:20","date_gmt":"2013-12-02T15:00:20","guid":{"rendered":"http:\/\/labs.gree.jp\/blog\/?p=9061"},"modified":"2013-12-18T15:07:05","modified_gmt":"2013-12-18T06:07:05","slug":"ruby-scripting-in-hive-query-language-2","status":"publish","type":"post","link":"https:\/\/labs.gree.jp\/blog\/2013\/12\/9061\/","title":{"rendered":"Ruby scripting in Hive Query Language"},"content":{"rendered":"<p>Hello. I'm lan, from the data infrastructure team in the Web Game business division.<\/p>\n
<p>Today, for day 3 of the Advent Calendar, I'd like to talk about Apache Hive, the data warehouse that runs on top of Hadoop. At GREE we use Hadoop for reporting, large-scale batch processing, and similar workloads. We currently run Apache Hive 0.12.0, because we want to pick up the latest work from the OSS community as early as we can.<\/p>\n
<p>We also maintain an in-house UDF library for reports and the like. Even so, there are still plenty of cases it does not handle well. To cover them, we did a very small bit of hacking and built a feature that lets you use Ruby scripting inside HQL. It may be easiest to picture it as something like Lua scripting in Redis.<\/p>\n
<p>This post introduces two UDFs, <strong>rb_exec<\/strong> and <strong>rb_inject<\/strong>.<br \/>\n<strong>[Update] Released on 2013-12-18:<\/strong> <a href=\"https:\/\/github.com\/gree\/hive-ruby-scripting\" 
title=\"https:\/\/github.com\/gree\/hive-ruby-scripting\">https:\/\/github.com\/gree\/hive-ruby-scripting<\/a><\/p>\n
<h1>A simple example<\/h1>\n
<p>As you all know, NTT docomo also started selling the iPhone on September 20. One day a colleague asked me how to check, from a service's access logs, how access from NTT docomo iPhone users had changed over time. It is not exact, but aggregating over time the access-log records whose IP address is one of the <a href=\"https:\/\/www.nttdocomo.co.jp\/service\/developer\/smart_phone\/spmode\/index.html\">NTT docomo smartphone IP addresses<\/a> and whose User-Agent is iPhone should give a rough picture.<\/p>\n
<p>However, you cannot decide whether an IP address falls inside a range such as <strong>1.66.96.0\/21<\/strong> with a simple string regular expression, and Hive does not seem to have a suitable built-in function. One option is to write a UDF along the lines of <strong>in_IP_range()<\/strong>. The coding itself is not hard, but writing the UDF in Java, verifying it with unit tests, building a JAR file, and deploying it to production Hive all add up to quite a lot of time.<\/p>\n
<p>In a case like this, the Ruby scripting function <strong>rb_exec<\/strong> lets you finish writing the query in a few minutes!<\/p>\n
<pre lang=\"ruby\">\nset rb.script = \nrequire \"ipaddr\"\n\n# CIDR ranges, one per line. Lines starting with # are skipped below.\n@docomo_ips = %Q[\n    1.66.96.0\/21  \n    1.66.104.0\/23  \n    1.72.0.0\/21  \n    # (some ranges omitted)  \n    183.75.128.0\/18  \n].split(\"\\n\").map { |s| s.strip }.reject { |s| s.empty? || s.start_with?(\"#\") }.map { |ip| IPAddr.new ip }\n\n# Returns \"true\" if ip_str falls inside any docomo range, else \"false\".\ndef is_docomo_ip(ip_str)  \n    ip = IPAddr.new ip_str  \n    @docomo_ips.detect { |i| i.include? ip } ? \"true\" : \"false\"  \nend  \n;\n\nselect access_date, count(distinct user_id) from access  \n    where access_date >= \"20130901\" and user_agent like '%iPhone%'  \n          and rb_exec('is_docomo_ip', ipaddr) = 'true'  \n    group by access_date \n;  \n<\/pre>\n
<p>As you can see, we define a Ruby method named <strong>is_docomo_ip<\/strong> and use it in the query to decide whether an address belongs to NTT docomo. Inside <strong>is_docomo_ip<\/strong>, we loop over the list of NTT docomo smartphone IP ranges and use the <a href=\"http:\/\/www.ruby-doc.org\/stdlib-1.9.3\/libdoc\/ipaddr\/rdoc\/IPAddr.html\">IPAddr<\/a> class to check whether the given IP address falls inside one of them. If the address is in an NTT docomo range the method returns <strong>true<\/strong> right away; otherwise it returns <strong>false<\/strong>. Thanks to Ruby's <a href=\"http:\/\/ruby-doc.org\/core-1.9.3\/Enumerable.html#method-i-detect\">detect<\/a>, the check fits on a single line.<\/p>\n
<p>Using this, I aggregated the access logs of one GREE service from September through the end of October. Access has been growing steadily since the launch date, September 20! With <strong>rb_exec<\/strong>, even unusual ad hoc queries like this are now easy to write.<br \/>\n<a href=\"http:\/\/labs.gree.jp\/blog\/wp-content\/uploads\/2013\/12\/0e981b0c-5665-11e3-8fd1-860e4d930219.png\"><img decoding=\"async\" src=\"http:\/\/labs.gree.jp\/blog\/wp-content\/uploads\/2013\/12\/0e981b0c-5665-11e3-8fd1-860e4d930219.png\" alt=\"0e981b0c-5665-11e3-8fd1-860e4d930219\" width=\"100%\" height=\"100%\" class=\"alignnone size-full wp-image-9085\" \/><\/a><\/p>\n
<h1>Implementation<\/h1>\n
<p>It is JRuby that makes it possible to run Ruby scripts inside Hive.<\/p>\n
<p>Why Ruby rather than, say, Python or Scala? Among the <a href=\"http:\/\/en.wikipedia.org\/wiki\/List_of_JVM_languages#High-profile_languages\">mainstream JVM languages<\/a>, I think Ruby currently strikes the best balance between performance and ease of writing (including learning cost). Personally, I love the methods of Ruby's Enumerable module. Don't you think it is cool to process Hive Array or Map data with Enumerable methods such as each, find, and map?<\/p>\n
<p>To call JRuby from Java we use <a href=\"https:\/\/github.com\/jruby\/jruby\/wiki\/RedBridge#wiki-JRuby_Embed_originally_known_as_Red_Bridge\">Red Bridge<\/a>. To improve performance further, we disable JRuby's <a href=\"https:\/\/github.com\/jruby\/jruby\/wiki\/RedBridge#wiki-Disabling_Sharing_Variables\">sharing variables<\/a> feature. The script defined in Ruby and the methods inside it are invoked in the more efficient <a href=\"https:\/\/github.com\/jruby\/jruby\/wiki\/RedBridgeExamples#wiki-Method_Call\"><code>parse once, call many times<\/code><\/a> style.<\/p>\n
<p>In brief, the implementation of <strong>rb_exec<\/strong> works as follows.<\/p>\n
<ol>\n<li>Extend <a href=\"https:\/\/github.com\/apache\/hive\/blob\/trunk\/ql\/src\/java\/org\/apache\/hadoop\/hive\/ql\/udf\/generic\/GenericUDF.java\" title=\"GenericUDF\">GenericUDF<\/a>.<\/li>\n<li>When the UDF is initialized, prepare the JRuby container and execution context.<\/li>\n<li>Inside the evaluate function, which does the UDF's actual work:\n<ol>\n<li>Receive the name of the requested JRuby method and its arguments from Hive, and perform type conversion and similar preparation.<\/li>\n<li>Pass the arguments to the requested JRuby method and run it.<\/li>\n<li>Return the JRuby method's return value to Hive as the result of evaluate.<\/li>\n<\/ol>\n<\/li>\n<\/ol>\n
<p>Starting with Hive 0.11.0, GenericUDF gets one chance during initialization to access the <a href=\"http:\/\/hive.apache.org\/docs\/r0.11.0\/api\/org\/apache\/hadoop\/hive\/ql\/exec\/MapredContext.html\">MapReduce job context<\/a> (for details, see <a href=\"https:\/\/issues.apache.org\/jira\/browse\/HIVE-1016\">JIRA:HIVE-1016<\/a>). As a result, a Ruby script defined in the Hive session with a set statement can also be read from inside the UDF. In earlier versions (hive-0.10.0-cdh4.4.0, for example), the Ruby script cannot be defined beforehand with a set statement and must instead be passed as one of the arguments to <strong>rb_exec<\/strong>.<\/p>\n
<p>One last problem remains: how to tell Hive the type of the Ruby method's result.<br \/>\nWhen you define a GenericUDF, you have to tell Hive the UDF's result type (an <strong>ObjectInspector<\/strong>) at initialization time. This happens in the initialize function shown below.<\/p>\n
<pre lang=\"java\">\n\/**  \n * Initialize this GenericUDF. This will be called once and only once per  \n * GenericUDF instance.  \n *  \n * @param arguments  \n *          The ObjectInspector for the arguments  \n * @throws UDFArgumentException  \n *           Thrown when arguments have wrong types, wrong length, etc.  \n * @return The ObjectInspector for the return value  \n *\/  \npublic abstract ObjectInspector initialize(ObjectInspector[] arguments)  \n    throws UDFArgumentException;  \n<\/pre>\n
<p>Ruby, however, is a dynamic language, so it is hard to determine a method's return type automatically from the script. One way around this is to always return the value as a String. (In fact, Hive's native <a href=\"https:\/\/issues.apache.org\/jira\/browse\/HIVE-471\"><code>reflect<\/code><\/a> function <a href=\"https:\/\/issues.apache.org\/jira\/browse\/HIVE-4025\">apparently always converted its result to a String before returning it, in versions up to 0.11<\/a>.) But it would be a shame to transform a Hive array with Ruby's map method only to get the result back as a String instead of an array.<br 
\/>\nIn the end I settled on an implementation in which <strong>the user gives Hive a hint about the UDF's result type<\/strong>. By default the result is returned as a String. For any other type, you pass a value of the same type as the result as the first argument, which serves as the hint to Hive. For example, for a result of type <strong>Map&lt;String,String&gt;<\/strong>, you write:<\/p>\n
<p><center><br \/>\n<strong>rb_exec(Map('k','v'), 'gen_map', args...)  <\/strong><br \/>\n<\/center><\/p>\n
<p>If you use Hive's Map constructor function to build a Map with a single key-value pair and pass it to <strong>rb_exec<\/strong>, <strong>rb_exec<\/strong> recognizes that the UDF's return type is the same <strong>Map&lt;String, String&gt;<\/strong>. Problem solved!<\/p>\n
<h1>Ruby scripting for UDAFs too<\/h1>\n
<p>Besides ordinary UDFs, Hive has another commonly used kind of function, the <strong>UDAF<\/strong> (user defined aggregate function). These are functions such as count and sum that compute a single result row from multiple input rows. We made Ruby scripting available for UDAFs as well as ordinary UDFs, through a function called <strong>rb_inject<\/strong>. (UDAF processing has several stages, such as processing input rows and merging partial results, so the Ruby scripting implementation is more complex than for UDFs, but I will skip that discussion this time.)<\/p>\n
<pre lang=\"sql\">  \n-- sum UDAF implemented w\/ rb_inject\n\nset rb.script =  \ndef sum(memo, arg)  \n    memo + arg  \nend  \n;\n\nselect  \n    user_id,  \n    rb_inject('sum', coin) as coin_all  \n    from user_coin  \n    group by user_id  \n;\n<\/pre>\n
<p>As you can see, it closely resembles Ruby's <a href=\"http:\/\/ruby-doc.org\/core-1.9.3\/Enumerable.html#method-i-inject\">inject<\/a> method. First, define a method named sum in the Ruby script. It is a simple addition that adds the current element to memo and returns the result; <strong>memo<\/strong> is either the previous return value or the initial value. Then, in the query, pass the <strong>rb_inject<\/strong> UDAF the name of the Ruby method to call and the argument to hand to it, here the <strong>coin<\/strong> column. With that, we have defined in Ruby the same aggregation as Hive's native sum function. In practice, <strong>rb_inject<\/strong> makes it very convenient to write ad hoc aggregations such as array union and intersection, or sampling after a group by.<\/p>\n
<h1>Performance<\/h1>\n
<p>What about the performance of processing written in Ruby? Most people would probably expect it to be slower than a native UDF implemented in Java, and they would be right: in general it takes longer than native Hive functions. Part of that is JRuby's own performance, and there is considerable overhead in converting between Hadoop Writable objects, standard Java objects, and JRuby objects. On the other hand, many Hive jobs are I\/O bound, so the impact on overall performance may not be large. I ran a quick comparison, and here are the results.<\/p>\n
<p>The JVM (v1.7.0_25) settings for the MapReduce tasks used in the test were as follows:<\/p>\n<ul>\n<li>-server<\/li>\n<li>-Xmx1500m<\/li>\n<li>-XX:+UseParallelOldGC<\/li>\n<li>-Djruby.compile.invokedynamic=true<\/li>\n<li>-Dfile.encoding=UTF-8<\/li>\n<li>-XX:+AggressiveOpts<\/li>\n<li>-XX:+UnlockDiagnosticVMOptions<\/li>\n<li>-XX:+UnlockExperimentalVMOptions<\/li>\n<li>-Djruby.ji.objectProxyCache=false<\/li>\n<\/ul>\n
<p>I ran several kinds of queries with both a native UDF and <strong>rb_exec<\/strong>, and compared the fastest of three runs for each:<\/p>\n
<table>\n<thead>\n<tr>\n<th><\/th>\n<th>Numeric computation<\/th>\n<th>String processing<\/th>\n<th>JSON lookup<\/th>\n<th>Array sort<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td>native UDF<\/td>\n<td>77s<\/td>\n<td>82s<\/td>\n<td>56s<\/td>\n<td>42s<\/td>\n<\/tr>\n<tr>\n<td>rb_exec<\/td>\n<td>80s<\/td>\n<td>122s<\/td>\n<td>114s<\/td>\n<td>44s<\/td>\n<\/tr>\n<tr>\n<td>Difference (relative to native UDF)<\/td>\n<td>-3.9%<\/td>\n<td>-48.8%<\/td>\n<td>-103.6%<\/td>\n<td>-4.8%<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n
<p>In some cases there is little difference, but in the bad cases the processing time roughly doubles. Still, I think that is an acceptable price for the flexibility in what you can write. And in fact, a query run with Ruby scripting is not always slower than one that uses native UDFs. Take the following two queries:<\/p>\n
<pre lang=\"sql\">  \n-- A meaningless query, used only for testing  \n-- use native UDF  \n\nselect count(v)  \n    from (  \n        select  \n    
        elapsed_time,  \n            abs(round( sin( log2( pi() * rand() * 2 * elapsed_time ) + elapsed_time ))) as v  \n            from access  \n) t;\n\n-- use rb_exec  \n\nset rb.script =  \n@r = Random.new  \ndef cal(x)  \n    Math.sin(Math.log2(Math::PI * @r.rand * 2 * x) + x).round.abs  \nend  \n;  \nselect count(v)  \nfrom (  \n    select elapsed_time, rb_exec('cal', elapsed_time) as v  \n        from log  \n) t;  \n<\/pre>\n
<p>When I actually benchmarked these against a table of about 200 million rows, Ruby scripting turned out to be faster! The <strong>rb_exec<\/strong> query finished in 74 seconds, while the native UDF query took 92 seconds. My guess is that the native version makes too many UDF calls for the arithmetic, and that the UDF invocations add considerable overhead.<\/p>\n
<h1>Summary<\/h1>\n
<p>This post introduced the Ruby scripting feature we added to Apache Hive. It has greatly improved the expressiveness and flexibility of our Hive queries. We do not intend it as a complete replacement for native functions, but we think it is useful in the following situations.<\/p>\n<ul>\n<li>Ad hoc queries<\/li>\n<li>Rapid prototyping<\/li>\n<li>Processing Hive complex-type data<\/li>\n<li>Making use of Ruby libraries<\/li>\n<\/ul>\n
<p>Tomorrow's post is by \u52dd\u53c8 \u5065\u592a, on the redundant WordPress setup that this blog itself runs on!<\/p>\n
<p><a href=\"https:\/\/github.com\/y-lan\">Yuyang Lan<\/a>; edited by \u6a4b\u672c\u6cf0\u4e00<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hello. I'm lan, from the data infrastructure team in the Web Game business division. Today, for day 3 of the Advent Calendar, I'd like to talk about Apache Hive, the data warehouse that runs on top of Hadoop [&hellip;]<\/p>\n","protected":false},"author":59,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[],"tags":[54,57,25],"class_list":["post-9061","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","tag-advent-calendar","tag-hadoop","tag-ruby"],"jetpack_featured_media_url":"","jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts\/9061","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/users\/59"}],"replies":[{"embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/comments?post=9061"}],"version-history":[{"count":3,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts\/9061\/revisions"}],"predecessor-version":[{"id":10019,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/posts\/9061\/revisions\/10019"}],"wp:attachment":[{"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/media?parent=9061"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/categories?post=9061"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/labs.gree.jp\/blog\/wp-json\/wp\/v2\/tags?post=9061"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}