Five common PHP curl examples (with the PHP curl extension enabled)

Reader submission 1012 2022-07-29

1. Fetching a file that has no access control


<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://localhost/mytest/phpinfo.php");
curl_setopt($ch, CURLOPT_HEADER, false);
// Return the response as a string; if this line is commented out,
// curl_exec() prints the response directly instead.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$result = curl_exec($ch);
curl_close($ch);
?>
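The example above does not check whether the request succeeded. As a minimal sketch, the same request can be wrapped in a function that reports failures through curl_errno() and curl_error(). The function name fetch_url and the 10-second timeout are assumptions made for illustration, not part of the original example.

```php
<?php
/**
 * Fetch a URL and return the response body as a string.
 * Throws RuntimeException when curl reports a failure.
 * (fetch_url is a hypothetical helper name.)
 */
function fetch_url(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, array(
        CURLOPT_URL            => $url,
        CURLOPT_HEADER         => false,
        CURLOPT_RETURNTRANSFER => true,            // return the body instead of printing it
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds, // limit for the connection phase
        CURLOPT_TIMEOUT        => $timeoutSeconds, // limit for the whole transfer
    ));

    $body = curl_exec($ch);
    if ($body === false) {
        // Read the error details while the handle still exists.
        $code    = curl_errno($ch);
        $message = curl_error($ch);
        throw new RuntimeException("curl error {$code}: {$message}");
    }
    return $body;
}

// Usage (requires a reachable server, so it is left commented out):
// echo fetch_url("http://localhost/mytest/phpinfo.php");
```

Because curl_exec() returns false on failure, the strict comparison `=== false` is needed; an empty response body is a valid result and must not be treated as an error.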

2. Fetching through a proxy

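Why fetch through a proxy? Take google as an example: if you fetch google's data very frequently within a short time, your requests stop succeeding because google restricts your IP address. When that happens, you can switch to a proxy and fetch again.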

<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://blog.51yip.com");
curl_setopt($ch, CURLOPT_HEADER, false);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HTTPPROXYTUNNEL, TRUE);
// The proxy address must be passed as a "host:port" string.
curl_setopt($ch, CURLOPT_PROXY, "125.21.23.6:8080");
// If the proxy requires a password, add this line:
// curl_setopt($ch, CURLOPT_PROXYUSERPWD, 'user:password');
$result = curl_exec($ch);
curl_close($ch);
?>
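Switching proxies after a failure can be automated. The following is only a sketch under stated assumptions: the function name fetch_via_proxies is hypothetical, the proxy addresses in the usage comment are placeholders from a documentation-only address range, and CURLOPT_HTTPPROXYTUNNEL is left out to keep the sketch short.

```php
<?php
/**
 * Try each proxy in turn and return the first response body that
 * curl manages to fetch, or null if every attempt fails.
 * (fetch_via_proxies is a hypothetical helper name.)
 */
function fetch_via_proxies(string $url, array $proxies, int $timeoutSeconds = 10): ?string
{
    foreach ($proxies as $proxy) {
        $ch = curl_init();
        curl_setopt_array($ch, array(
            CURLOPT_URL            => $url,
            CURLOPT_HEADER         => false,
            CURLOPT_RETURNTRANSFER => true,
            CURLOPT_PROXY          => $proxy,          // "host:port" string
            CURLOPT_TIMEOUT        => $timeoutSeconds, // give up on slow proxies
        ));
        $body = curl_exec($ch);
        unset($ch); // release the handle before the next attempt

        if ($body !== false) {
            return $body; // this proxy worked
        }
    }
    return null; // no proxy succeeded, or the list was empty
}

// Usage (placeholder proxy addresses, left commented out):
// $html = fetch_via_proxies("http://blog.51yip.com",
//                           array("203.0.113.10:8080", "203.0.113.11:8080"));
```

A transfer that completes counts as success here even if the server answered with an error page; checking the HTTP status with curl_getinfo() would be a natural refinement.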

3. POSTing data and fetching the response

Submitting data deserves a section of its own: when you use curl you often have to send data as well as fetch it, so this case is important.

<?php
$ch = curl_init();
/* Note that the data to submit cannot be a two-dimensional (or deeper) array.
 * This works:  array('name'=>serialize(array('tank','zhang')),'sex'=>1,'birth'=>'20101010')
 * This raises an error: array('name'=>array('tank','zhang'),'sex'=>1,'birth'=>'20101010') */
$data = array('name'=>'test', 'sex'=>1, 'birth'=>'20101010');
curl_setopt($ch, CURLOPT_URL, 'http://localhost/mytest/curl/upload.php');
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
// CURLOPT_RETURNTRANSFER is not set, so the response is printed directly.
curl_exec($ch);
curl_close($ch);
?>

upload.php contains print_r($_POST); so curl fetches whatever upload.php outputs: Array ( [name] => test [sex] => 1 [birth] => 20101010 )
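Besides serialize(), one way around the nested-array restriction is to encode the data with http_build_query() and pass the resulting string to CURLOPT_POSTFIELDS. This is a sketch of an alternative, not the original author's method; the nested data reuses the values from the comment in the example above.

```php
<?php
// Nested form data that cannot be passed to CURLOPT_POSTFIELDS as a raw array.
$data = array(
    'name'  => array('tank', 'zhang'),
    'sex'   => 1,
    'birth' => '20101010',
);

// http_build_query() flattens nested arrays into bracketed, URL-encoded keys.
$body = http_build_query($data);
echo $body, "\n";
// name%5B0%5D=tank&name%5B1%5D=zhang&sex=1&birth=20101010

// The encoded string would then be sent as the POST body:
// curl_setopt($ch, CURLOPT_POSTFIELDS, $body);

// The receiving script gets the nested array back in $_POST;
// parse_str() shows the same decoding locally.
parse_str($body, $decoded);
print_r($decoded);
```

Passing a string makes curl send the body as application/x-www-form-urlencoded, whereas passing an array makes it send multipart/form-data, so the receiving script should accept the encoding you choose.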

4. Fetching pages protected by access control

I previously wrote an article on three methods of page access control; have a look if you are interested.

Fetching such a page with the methods described above produces the following error:

You are not authorized to view this page

You do not have permission to view this directory or page using the credentials that you supplied because your Web browser is sending a WWW-Authenticate header field that the Web server is not configured to accept.

In this case we need CURLOPT_USERPWD to authenticate.

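<?php
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, "http://club-china");
/* CURLOPT_USERPWD is mainly used to get past page access control,
 * for example the protection we normally set up with htpasswd.
 * Uncomment the next line and supply your own 'user:password'. */
//curl_setopt($ch, CURLOPT_USERPWD, '231144:2091XTAjmd=');
curl_setopt($ch, CURLOPT_HTTPGET, 1);
curl_setopt($ch, CURLOPT_REFERER, "http://club-china");
curl_setopt($ch, CURLOPT_HEADER, 0);
// CURLOPT_RETURNTRANSFER is not set, so the page is printed directly
// and $result only holds true or false.
$result = curl_exec($ch);
curl_close($ch);
?>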
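To make clearer what CURLOPT_USERPWD does, the sketch below builds the request header used by HTTP Basic authentication, assuming the protected page uses the Basic scheme (the usual scheme for htpasswd protection). The function name basic_auth_header and the user:password pair are hypothetical.

```php
<?php
/**
 * Build the header that HTTP Basic authentication sends for a
 * user/password pair. (basic_auth_header is a hypothetical helper name.)
 */
function basic_auth_header(string $user, string $password): string
{
    // Basic auth is just "user:password" encoded with base64.
    return 'Authorization: Basic ' . base64_encode($user . ':' . $password);
}

echo basic_auth_header('user', 'password'), "\n";
// Authorization: Basic dXNlcjpwYXNzd29yZA==
```

The credentials are only base64-encoded, not encrypted, so anyone who can read the traffic can recover them; over plain http this matters.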

5. Simulating a login to sina

The data we want to fetch may only be available after logging in; in that case we use curl to simulate the login.

<?php
// Returns 1 if the login response contains the expected redirect, 0 otherwise.
function checklogin($user, $password)
{
    if (empty($user) || empty($password)) {
        return 0;
    }

    $ch = curl_init();
    curl_setopt($ch, CURLOPT_REFERER, "http://mail.sina.com-/index.html");
    curl_setopt($ch, CURLOPT_HEADER, true);         // keep response headers so Location can be checked
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_USERAGENT, USERAGENT);
    curl_setopt($ch, CURLOPT_COOKIEJAR, COOKIEJAR); // file in which received cookies are saved
    curl_setopt($ch, CURLOPT_TIMEOUT, TIMEOUT);
    curl_setopt($ch, CURLOPT_URL, "http://mail.sina.com-/cgi-bin/login.cgi");
    curl_setopt($ch, CURLOPT_POST, true);
    // Both values are URL-encoded so that special characters survive the POST.
    curl_setopt($ch, CURLOPT_POSTFIELDS, "&logintype=uid&u=" . urlencode($user) . "&psw=" . urlencode($password));
    $contents = curl_exec($ch);
    curl_close($ch);

    // Look for a Location header that points to /cgi/index.php?check_time=...
    if (!preg_match("/Location: (.*)\\/cgi\\/index\\.php\\?check_time=(.*)\n/", $contents, $matches)) {
        return 0;
    } else {
        return 1;
    }
}

define("USERAGENT", $_SERVER['HTTP_USER_AGENT']);
define("COOKIEJAR", tempnam("/tmp", "cookie"));
define("TIMEOUT", 500);

echo checklogin("zhangying215", "xtaj227");
?>

Open the cookie file under /tmp and take a look:

# Netscape HTTP Cookie File

# http://curl.haxx.se/rfc/cookie_spec.html

# This file was generated by libcurl! Edit at your own risk.

mail.sina.com- FALSE / FALSE 0 SINAMAIL-WEBFACE-SESSID 65223c4bd8900284ed463d2a3e1ac182

#HttpOnly_.sina.com- TRUE / FALSE 0 SUE es%3D8d96db0820c6c79922ad57d422f575e8%26ev%3Dv0%26es2%3Dcddfb8400dc5ca95902367ddcd7f57dd

.sina.com- TRUE / FALSE 0 SUP cv%3D1%26bt%3D1286900433%26et%3D1286986833%26lt%3D1%26uid%3D1445632344%26user%3D%25E5%25BC%25A0%25E6%2598%25A02001%26ag%3D2%26name%3Dzhangying20015%2540sina.com%26nick%3D%25E5%25BC%25A0%25E6%2598%25A02001%26sex%3D1%26ps%3D0%26email%3Dzhangying20015%2540sina.com%26dob%3D1982-07-18

#HttpOnly_.sina.com- TRUE / FALSE 0 SID BihcallomxMx-QZxzGrOlcSQx%2F0B%2F0cmr.NyQ%2F0B%2FcmGGalmarlmcHrcGlSmrmxmfxal_CBZ%2F_afugCmmGirBYHm0Bc%40fr5ciZiGG5i

#HttpOnly_.sina.com- TRUE / FALSE 0 SPRIAL bfb4102951fd5892a3fd5b42d442cd26

#HttpOnly_.sina.com- TRUE / FALSE 0 SINA_USER %D5%C5%D2001
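A cookie file like the one above can also be read back in PHP. The following is a minimal sketch, assuming the file on disk is in the Netscape format with seven tab-separated fields per line (domain, subdomain flag, path, secure flag, expiry, name, value). The function name parse_cookie_file and the example.com sample data are hypothetical.

```php
<?php
/**
 * Parse the text of a Netscape-format cookie file into a list of cookies.
 * (parse_cookie_file is a hypothetical helper name.)
 */
function parse_cookie_file(string $text): array
{
    $cookies = array();
    foreach (preg_split('/\r\n|\n|\r/', $text) as $line) {
        $httpOnly = false;
        // Lines starting with "#HttpOnly_" are HttpOnly cookies, not comments.
        if (strpos($line, '#HttpOnly_') === 0) {
            $httpOnly = true;
            $line = substr($line, strlen('#HttpOnly_'));
        } elseif ($line === '' || $line[0] === '#') {
            continue; // blank line or ordinary comment
        }

        $fields = explode("\t", $line);
        if (count($fields) !== 7) {
            continue; // not a cookie line
        }
        list($domain, $subdomains, $path, $secure, $expires, $name, $value) = $fields;
        $cookies[] = array(
            'domain'             => $domain,
            'include_subdomains' => ($subdomains === 'TRUE'),
            'path'               => $path,
            'secure'             => ($secure === 'TRUE'),
            'expires'            => (int) $expires, // 0 means a session cookie
            'name'               => $name,
            'value'              => $value,
            'http_only'          => $httpOnly,
        );
    }
    return $cookies;
}

// Sample data with made-up values, in the same layout as the file above.
$sample = "# Netscape HTTP Cookie File\n"
        . "example.com\tFALSE\t/\tFALSE\t0\tSESSID\tabc123\n"
        . "#HttpOnly_.example.com\tTRUE\t/\tFALSE\t0\tSID\txyz\n";
$cookies = parse_cookie_file($sample);
echo count($cookies), " cookies parsed\n";
```

To parse the file written by the login example, pass it the result of file_get_contents(COOKIEJAR).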
