Handling UTF-8 in Perl and encoding URLs

First, load the Encode module:
use Encode;

  • GBK to UTF-8 (either of the following forms works):
    $line = encode("utf-8",decode("gbk",$line));

    $line = encode_utf8(decode("gbk",$line));

  • UTF-8 to GBK:
    $line = encode("gbk", decode("utf8", $line));

  • UTF-8 to GB2312:
    $line = encode("gb2312", decode("utf8", $line));

Tested and working. By: Neeao
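
The conversions above all follow the same pattern: decode the input bytes into Perl characters, then encode those characters into the target encoding. A minimal sketch of that pattern as a reusable function is shown here; the name `transcode` and the sample bytes are assumptions made for illustration, not part of the original post.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: re-encode a byte string from one encoding to another
# by going through Perl's internal character representation.
sub transcode {
    my ($bytes, $from, $to) = @_;
    # FB_CROAK makes Encode die on malformed or unmappable input instead of
    # silently substituting characters; LEAVE_SRC keeps the input untouched.
    my $check = Encode::FB_CROAK | Encode::LEAVE_SRC;
    my $chars = decode($from, $bytes, $check);
    return encode($to, $chars, $check);
}

my $gbk  = "\xB1\xB1\xBC\xAB";                 # GBK bytes for the two characters 北极
my $utf8 = transcode($gbk, "gbk", "UTF-8");

# Print the resulting bytes in hex for inspection.
print join(" ", map { sprintf "%02X", ord } split //, $utf8), "\n";
```

Using the strict check flags is a design choice: for one-off conversions the plain `encode(..., decode(...))` form is shorter, but it replaces bad input with substitution characters without reporting it.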


Encoding URLs in Perl
If a URL contains Chinese characters, they must be encoded before the URL can be processed further.

  • Encoding by substitution:

    URL encode (newlines are left unescaped):

    perl -pe 's/([^\w\-\.\@])/$1 eq "\n" ? "\n":sprintf("%%%2.2x",ord($1))/eg' keywords.list
  • Decoding by substitution:

    URL decode:

    perl -pe 's/%([0-9A-Fa-f]{2})/pack("C", hex($1))/eg' keywords.list
  • Encoding with the URI::URL module:

    use URI::URL;
    my $str = "http://www.google.com/lxmxn's blog&b=Hello,perl";
    my $url = URI::URL->new( $str );
    print $url;
  • Encoding with the URI::Escape module:

    use URI::Escape;
    my $str='北极神话';
    print uri_escape($str);

    # Result (when the script is saved as GBK): %B1%B1%BC%AB%C9%F1%BB%B0
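
The escaped output depends on which bytes the string holds, so it can help to fix the encoding explicitly before escaping. The following is a minimal sketch that uses only the core Encode module and always escapes UTF-8 bytes; the function names `url_encode` and `url_decode` are hypothetical, and the set of unescaped characters (letters, digits, `-`, `.`, `_`, `~`) is an assumption based on the RFC 3986 unreserved set rather than the character class used in the one-liner above.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: percent-encode a character string as UTF-8 bytes,
# leaving only the RFC 3986 unreserved characters unescaped.
sub url_encode {
    my ($chars) = @_;
    my $bytes = encode("UTF-8", $chars);
    $bytes =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
    return $bytes;
}

# Hypothetical helper: reverse url_encode, turning %XX escapes back into
# bytes and then decoding those bytes as UTF-8.
sub url_decode {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return decode("UTF-8", $bytes);
}

my $text    = "\x{5317}\x{6781}\x{795E}\x{8BDD}";   # 北极神话 as characters
my $escaped = url_encode($text);
print "$escaped\n";   # %E5%8C%97%E6%9E%81%E7%A5%9E%E8%AF%9D
```

If URI::Escape is installed, its `uri_escape_utf8` function serves the same purpose as `url_encode` here, and `uri_unescape` reverses the escaping at the byte level.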