Regex for access log in Hive SerDe

I want to extract (ip, requestUrl, timeStamp) from access logs and load them into a Hive database. One line from the access log looks like this:


66.249.68.6 - - [14/Jan/2012:06:25:03 -0800] "GET /example.com HTTP/1.1" 200 708 "-" "Mozilla/5.0 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)"

I tried the following regex and several variations without any success. (The loaded table contains only NULL values, which indicates that the regex doesn't match the input.)


CREATE TABLE access_log (
  remote_ip STRING,
  request_date STRING,
  method STRING,
  request STRING,
  protocol STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'
WITH SERDEPROPERTIES  (
"input.regex" = "([^ ]) . . [([^]] )] \"([^ ]) ([^ ]) ([^ \"])\" *",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s"
)
STORED AS TEXTFILE;

I am not very experienced with regex. Can anybody help me with this?

3 answers

#1


7  

I use Rubular to test my regexes. You can also use this expression:

([^ ]*) ([^ ]*) ([^ ]*) (?:-|\[([^\]]*)\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*)

For the sample line, you get the following capture groups:

1.  66.249.68.6
2.  -
3.  -
4.  14/Jan/2012:06:25:03 -0800
5.  "GET /example.com HTTP/1.1"
6.  200
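
If you want to check this outside Rubular, here is a rough Python sketch that runs the expression against the sample line from the question. Two things in it are my assumptions, not part of the answer above. First, as far as I know Hive's RegexSerDe only accepts a row when the regex matches the whole line, while Rubular is happy with a match on a prefix, so the sketch appends `.*` to swallow the rest of the line. Second, the `extract` helper and its dictionary keys are hypothetical names used only for illustration. Python's `re` and Java's regex engine behave the same for the constructs used here, but this is not a test of Hive itself.

```python
import re

# Sample access-log line copied from the question.
LINE = ('66.249.68.6 - - [14/Jan/2012:06:25:03 -0800] '
        '"GET /example.com HTTP/1.1" 200 708 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1;  http://www.google.com/bot.html)"')

# The expression from the answer, unchanged.
ANSWER_REGEX = r'([^ ]*) ([^ ]*) ([^ ]*) (?:-|\[([^\]]*)\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*)'

# Assumption: a trailing ".*" so that the pattern covers the entire line.
FULL_LINE_REGEX = ANSWER_REGEX + r'.*'


def extract(line):
    """Hypothetical helper: return ip, timestamp and request URL, or None."""
    m = re.fullmatch(FULL_LINE_REGEX, line)
    if m is None:
        return None
    ip, _ident, _user, timestamp, request, _status = m.groups()
    # The request group still carries its quotes: "GET /example.com HTTP/1.1"
    parts = request.strip('"').split(' ')
    request_url = parts[1] if len(parts) == 3 else None
    return {'ip': ip, 'timestamp': timestamp, 'request_url': request_url}


if __name__ == '__main__':
    # Prefix match (what Rubular shows): succeeds with six groups.
    print(re.match(ANSWER_REGEX, LINE).groups())
    # Whole-line match without the trailing ".*": fails.
    print(re.fullmatch(ANSWER_REGEX, LINE))
    # Whole-line match with the trailing ".*": succeeds.
    print(extract(LINE))
```

When you paste the expression into "input.regex", the backslashes most likely need to be doubled (`\\[` and `\\]`), because Hive processes escapes in the string literal before the regex engine sees it. Treat that as something to verify on your Hive version.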
