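The basics of networking were covered earlier in:
【Summary】The logic/flow and caveats of scraping web pages, analyzing page content, and emulating website login
along with how to implement simple web page scraping in C#.
This post continues from there and uses emulating login to the Baidu homepage as an example of how to emulate logging in to a website with C#.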
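First, the prerequisites for this post. It assumes that you have already read:
【Summary】The logic/flow and caveats of scraping web pages, analyzing page content, and emulating website login
and understand the basic networking concepts, and that you have also read:
【Summary】Developer tools in the browser (F12 in IE9 and Ctrl+Shift+I in Chrome): powerful tools for web page analysis
and know how to use tools such as IE9's F12 to analyze what happens while a page is being processed.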
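1. Before emulating login to a website, work out the internal logic of logging in to that site
Before trying to emulate login to the Baidu homepage programmatically, i.e. in C# code, you first need to understand what actually happens internally when you log in to the site yourself.
For how to use tools to work out the internal logic of logging in to the Baidu homepage, see:
【Tutorial】Step by step: using a tool (IE9's F12) to analyze the internal logic of emulating login to a website (the Baidu homepage)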
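2. Only then implement the login logic in the chosen language (C#)
Once you understand the internal login logic of the Baidu homepage as analyzed with F12, implementing it in C# is comparatively not very hard.
Notes:
(1) If you are unfamiliar with how to use cookies in C#, first read:
【Experience summary】Knowledge about HTTP, web page access, HttpRequest and HttpResponse
(2) If you are unfamiliar with regular expressions, read up on them first.
(3) If you are unfamiliar with Regex, the regular expression class in C#, read up on that as well.
The flow obtained from the analysis is reproduced here so that it can be compared against the code: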
Step | URL | Method | Data sent | Value to obtain/extract from the response
1 | http://www.baidu.com/ | GET | None | BAIDUID in the returned cookies
2 | https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true | GET | Includes the BAIDUID cookie | Extract the token value from the returned HTML
3 | https://passport.baidu.com/v2/api/?login | POST | A set of POST data, in which the token value is the one extracted earlier | Verify that the returned cookies include BDUSS, PTOKEN, STOKEN and SAVEUSERID
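The cookie hand-off in the table (BAIDUID received in step 1, then sent again in steps 2 and 3) can be tried offline. The sketch below is only an illustration: the class and method names are made up, the domain `.baidu.com` is an assumption about how the BAIDUID cookie is scoped, and no request is actually sent. It uses one CookieContainer as a shared cookie jar, which is a different approach from the CookieCollection copying used in the listings in this post.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original project.
public static class CookieJarSketch
{
    // Stores a cookie as if it had been received in a response from `origin`.
    // The ".baidu.com" domain is an assumption made for this illustration.
    public static void Remember(CookieContainer jar, Uri origin, string name, string value)
    {
        jar.Add(origin, new Cookie(name, value, "/", ".baidu.com"));
    }

    // Returns the value the jar would send to `target`, or null if nothing matches.
    public static string Lookup(CookieContainer jar, Uri target, string name)
    {
        Cookie found = jar.GetCookies(target)[name];
        return found == null ? null : found.Value;
    }
}
```

Assigning the same container to `req.CookieContainer` on every HttpWebRequest should let cookies from one response accompany the next request without copying them by hand.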
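With that, the C# code that demonstrates an emulated login to the Baidu homepage can be written.
【Version 1: Complete C# code for emulating login to the Baidu homepage, simplified version】
In the UI, clicking "Get cookie BAIDUID" calls the following code: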
private void btnGetBaiduid_Click(object sender, EventArgs e)
{
    //http://www.baidu.com/
    string baiduMainUrl = txbBaiduMainUrl.Text;
    //generate http request
    HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

    //add the following code to handle cookies
    req.CookieContainer = new CookieContainer();
    req.CookieContainer.Add(curCookies);

    req.Method = "GET";
    //use request to get response
    HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
    txbGotBaiduid.Text = "";
    foreach (Cookie ck in resp.Cookies)
    {
        txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
        if (ck.Name == "BAIDUID")
        {
            gotCookieBaiduid = true;
        }
    }

    if (gotCookieBaiduid)
    {
        //store cookies
        curCookies = resp.Cookies;
    }
    else
    {
        MessageBox.Show("Error: cookie BAIDUID not found!");
    }
}
This obtains the value of the BAIDUID cookie.
Next, clicking "Get token value" calls the following code:
private void btnGetToken_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid)
    {
        string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

        //add previously got cookies
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);

        req.Method = "GET";
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string respHtml = sr.ReadToEnd();

        //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
        string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
        Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
        if (foundTokenVal.Success)
        {
            //extracted the token value
            txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
            extractTokenValueOK = true;
        }
        else
        {
            txbExtractedTokenVal.Text = "Error: token value not found!";
        }
    }
    else
    {
        MessageBox.Show("Error: the BAIDUID cookie was not obtained earlier!");
    }
}
This yields the corresponding token value.
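The token extraction can also be isolated from the UI and the network so that the regular expression can be checked on its own. The following is a minimal sketch under that assumption: the class name `TokenSketch` and the method `ExtractToken` are hypothetical, while the pattern is the one used in the handler above.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original project.
public static class TokenSketch
{
    // Same pattern as in btnGetToken_Click; the named group captures the token.
    private static readonly Regex TokenPattern =
        new Regex(@"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';");

    // Returns the token, or null when the pattern does not occur in the HTML.
    public static string ExtractToken(string html)
    {
        Match m = TokenPattern.Match(html ?? "");
        return m.Success ? m.Groups["tokenVal"].Value : null;
    }
}
```

Feeding it the sample line from the code comment, `bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';`, should return the 32-character token, and a page without that line should return null.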
Then fill in your Baidu username and password and click "Emulate login to Baidu homepage", which calls the following code:
private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
{
    if (gotCookieBaiduid && extractTokenValueOK)
    {
        string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

        //init post dict info
        Dictionary<string, string> postDict = new Dictionary<string, string>();
        //postDict.Add("ppui_logintime", "");
        postDict.Add("charset", "utf-8");
        //postDict.Add("codestring", "");
        postDict.Add("token", txbExtractedTokenVal.Text);
        postDict.Add("isPhone", "false");
        postDict.Add("index", "0");
        //postDict.Add("u", "");
        //postDict.Add("safeflg", "0");
        postDict.Add("staticpage", staticpage);
        postDict.Add("loginType", "1");
        postDict.Add("tpl", "mn");
        postDict.Add("callback", "parent.bdPass.api.login._postCallback");
        postDict.Add("username", txbBaiduUsername.Text);
        postDict.Add("password", txbBaiduPassword.Text);
        //postDict.Add("verifycode", "");
        postDict.Add("mem_pass", "on");

        string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
        HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

        //add cookie
        req.CookieContainer = new CookieContainer();
        req.CookieContainer.Add(curCookies);

        //set to POST
        req.Method = "POST";
        req.ContentType = "application/x-www-form-urlencoded";

        //prepare post data
        string postDataStr = quoteParas(postDict);
        byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
        req.ContentLength = postBytes.Length;

        //send post data
        Stream postDataStream = req.GetRequestStream();
        postDataStream.Write(postBytes, 0, postBytes.Length);
        postDataStream.Close();

        //got response
        HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

        //got returned html
        StreamReader sr = new StreamReader(resp.GetResponseStream());
        string loginBaiduRespHtml = sr.ReadToEnd();

        //check whether got all expected cookies
        Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
        string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
        foreach (String cookieToCheck in cookiesNameList)
        {
            cookieCheckDict.Add(cookieToCheck, false);
        }

        foreach (Cookie singleCookie in resp.Cookies)
        {
            if (cookieCheckDict.ContainsKey(singleCookie.Name))
            {
                cookieCheckDict[singleCookie.Name] = true;
            }
        }

        bool allCookiesFound = true;
        foreach (bool foundCurCookie in cookieCheckDict.Values)
        {
            allCookiesFound = allCookiesFound && foundCurCookie;
        }

        loginBaiduOk = allCookiesFound;
        if (loginBaiduOk)
        {
            txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu homepage!";
        }
        else
        {
            txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu homepage!";
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
            txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
            txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
            txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
            txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
        }
    }
    else
    {
        MessageBox.Show("Error: the BAIDUID cookie was not obtained and/or the token value was not extracted!");
    }
}
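One detail of the POST step is worth a closer look: how the name/value pairs are turned into an application/x-www-form-urlencoded body. The project's quoteParas helper encodes values with HttpUtility.UrlPathEncode, which, as far as I know, is meant for the path part of a URL and leaves characters such as `&` and `=` untouched, so a password containing them could corrupt the body. The sketch below is an alternative that uses Uri.EscapeDataString; the class name `FormBodySketch` is hypothetical and this is not the author's code.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of the original project.
public static class FormBodySketch
{
    // Joins fields as name=value pairs separated by '&',
    // percent-encoding both names and values.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        StringBuilder sb = new StringBuilder();
        foreach (KeyValuePair<string, string> field in fields)
        {
            if (sb.Length > 0)
            {
                sb.Append('&');
            }
            sb.Append(Uri.EscapeDataString(field.Key));
            sb.Append('=');
            sb.Append(Uri.EscapeDataString(field.Value ?? ""));
        }
        return sb.ToString();
    }
}
```

A Dictionary<string, string> such as postDict can be passed directly, since it enumerates as key/value pairs. Whether the Baidu endpoint expects exactly this encoding is not something this sketch verifies.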
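If the username and password are both correct, the login succeeds.
If you deliberately enter a wrong username or password, a login error is shown and the returned headers and HTML are printed.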
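The success/failure decision rests entirely on whether the four expected cookies came back. As a small sketch of that check in isolation (the class `LoginCheckSketch` and the method `MissingCookies` are hypothetical names), the function below reports which required cookies are absent, which also makes a failed login a little easier to diagnose than a single boolean does:

```csharp
using System.Collections.Generic;
using System.Net;

// Hypothetical helper, not part of the original project.
public static class LoginCheckSketch
{
    // Returns the required cookie names that do not occur in `received`,
    // in the order they were requested. An empty list means all were found.
    public static List<string> MissingCookies(CookieCollection received, IEnumerable<string> required)
    {
        List<string> missing = new List<string>();
        foreach (string name in required)
        {
            bool found = false;
            foreach (Cookie ck in received)
            {
                // exact, case-sensitive match, as in the handler above
                if (ck.Name == name)
                {
                    found = true;
                    break;
                }
            }
            if (!found)
            {
                missing.Add(name);
            }
        }
        return missing;
    }
}
```

Calling it with resp.Cookies and {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"} gives an empty list exactly when the handler's allCookiesFound would be true.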
The complete C# code for emulating login to the Baidu homepage is as follows:
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            //init
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        *******************************************************************************/

        //quote the input dict values
        //note: the returned result has no '&' before the first parameter
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            //generate http request
            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainUrl);

            //add the following code to handle cookies
            req.CookieContainer = new CookieContainer();
            req.CookieContainer.Add(curCookies);

            req.Method = "GET";
            //use request to get response
            HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(getapiUrl);

                //add previously got cookies
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);

                req.Method = "GET";
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string respHtml = sr.ReadToEnd();

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: the BAIDUID cookie was not obtained earlier!");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                HttpWebRequest req = (HttpWebRequest)WebRequest.Create(baiduMainLoginUrl);

                //add cookie
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.Add(curCookies);

                //set to POST
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

                //prepare post data
                string postDataStr = quoteParas(postDict);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

                //send post data
                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();

                //got response
                HttpWebResponse resp = (HttpWebResponse)req.GetResponse();

                //got returned html
                StreamReader sr = new StreamReader(resp.GetResponseStream());
                string loginBaiduRespHtml = sr.ReadToEnd();

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in resp.Cookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu homepage!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu homepage!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned headers:";
                    txbEmulateLoginResult.Text += Environment.NewLine + resp.Headers.ToString();
                    txbEmulateLoginResult.Text += Environment.NewLine + Environment.NewLine;
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: the BAIDUID cookie was not obtained and/or the token value was not extracted!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "https://www.crifan.org/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The corresponding complete VS2010 C# project can be downloaded here:
emulateLoginBaidu_csharp_2012-11-07.7z
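【Version 2: Complete C# code for emulating login to the Baidu homepage, crifanLib.cs version】
Later, the code above was changed to use my C# library crifanLib.cs, so that the networking helper functions are easier to reuse in future.
Below is the complete C# code of the version that uses crifanLib.cs: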
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.Net;
using System.IO;
using System.Text.RegularExpressions;
using System.Web;

namespace emulateLoginBaidu
{
    public partial class frmEmulateLoginBaidu : Form
    {
        CookieCollection curCookies = null;
        bool gotCookieBaiduid, extractTokenValueOK, loginBaiduOk;

        public frmEmulateLoginBaidu()
        {
            InitializeComponent();
        }

        private void frmEmulateLoginBaidu_Load(object sender, EventArgs e)
        {
            this.AcceptButton = this.btnEmulateLoginBaidu;

            //init for crifanLib.cs
            curCookies = new CookieCollection();

            //init for demo login
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;
        }

        /******************************************************************************
        functions in crifanLib.cs
        Online browser: http://code.google.com/p/crifanlib/source/browse/trunk/csharp/crifanLib.cs
        Download: http://code.google.com/p/crifanlib/
        *******************************************************************************/

        //quote the input dict values
        //note: the returned result has no '&' before the first parameter
        public string quoteParas(Dictionary<string, string> paras)
        {
            string quotedParas = "";
            bool isFirst = true;
            string val = "";
            foreach (string para in paras.Keys)
            {
                if (paras.TryGetValue(para, out val))
                {
                    if (isFirst)
                    {
                        isFirst = false;
                        quotedParas += para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                    else
                    {
                        quotedParas += "&" + para + "=" + HttpUtility.UrlPathEncode(val);
                    }
                }
                else
                {
                    break;
                }
            }

            return quotedParas;
        }

        /*********************************************************************/
        /* cookie */
        /*********************************************************************/

        //add a single cookie to cookies, if already exist, update its value
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies, bool overwriteDomain)
        {
            bool found = false;

            if (cookies.Count > 0)
            {
                foreach (Cookie originalCookie in cookies)
                {
                    if (originalCookie.Name == toAdd.Name)
                    {
                        // !!! for a different domain, the cookie is not the same,
                        // so should not set the cookie value here while their domains are not the same
                        // only if it explicitly needs to overwrite the domain
                        if ((originalCookie.Domain == toAdd.Domain) ||
                            ((originalCookie.Domain != toAdd.Domain) && overwriteDomain))
                        {
                            //here can not force convert CookieCollection to HttpCookieCollection,
                            //then use .remove to remove this cookie then add
                            // so no good way to copy all field value
                            originalCookie.Value = toAdd.Value;

                            originalCookie.Domain = toAdd.Domain;

                            originalCookie.Expires = toAdd.Expires;
                            originalCookie.Version = toAdd.Version;
                            originalCookie.Path = toAdd.Path;

                            //following fields seems should not change
                            //originalCookie.HttpOnly = toAdd.HttpOnly;
                            //originalCookie.Secure = toAdd.Secure;

                            found = true;
                            break;
                        }
                    }
                }
            }

            if (!found)
            {
                if (toAdd.Domain != "")
                {
                    // adding a cookie with an empty domain will make the later req.CookieContainer.Add(cookies) fail !!!
                    cookies.Add(toAdd);
                }
            }
        }//addCookieToCookies

        //add single cookie to cookies, default no overwrite domain
        public void addCookieToCookies(Cookie toAdd, ref CookieCollection cookies)
        {
            addCookieToCookies(toAdd, ref cookies, false);
        }

        //check whether the cookies contains the ckToCheck cookie
        //support:
        //ckToCheck is Cookie/string
        //cookies is Cookie/string/CookieCollection/string[]
        public bool isContainCookie(object ckToCheck, object cookies)
        {
            bool isContain = false;

            if ((ckToCheck != null) && (cookies != null))
            {
                string ckName = "";
                Type type = ckToCheck.GetType();

                //string typeStr = ckType.ToString();
                //if (ckType.FullName == "System.string")
                if (type.Name.ToLower() == "string")
                {
                    ckName = (string)ckToCheck;
                }
                else if (type.Name == "Cookie")
                {
                    ckName = ((Cookie)ckToCheck).Name;
                }

                if (ckName != "")
                {
                    type = cookies.GetType();

                    // is single Cookie
                    if (type.Name == "Cookie")
                    {
                        if (ckName == ((Cookie)cookies).Name)
                        {
                            isContain = true;
                        }
                    }
                    // is CookieCollection
                    else if (type.Name == "CookieCollection")
                    {
                        foreach (Cookie ck in (CookieCollection)cookies)
                        {
                            if (ckName == ck.Name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                    // is single cookie name string
                    else if (type.Name.ToLower() == "string")
                    {
                        if (ckName == (string)cookies)
                        {
                            isContain = true;
                        }
                    }
                    // is cookie name string[]
                    else if (type.Name.ToLower() == "string[]")
                    {
                        foreach (string name in ((string[])cookies))
                        {
                            if (ckName == name)
                            {
                                isContain = true;
                                break;
                            }
                        }
                    }
                }
            }

            return isContain;
        }//isContainCookie

        // update cookiesToUpdate to localCookies
        // if omitUpdateCookies designated, then omit cookies of omitUpdateCookies in cookiesToUpdate
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies, object omitUpdateCookies)
        {
            if (cookiesToUpdate.Count > 0)
            {
                if (localCookies == null)
                {
                    localCookies = cookiesToUpdate;
                }
                else
                {
                    foreach (Cookie newCookie in cookiesToUpdate)
                    {
                        if (isContainCookie(newCookie, omitUpdateCookies))
                        {
                            // need omit process this
                        }
                        else
                        {
                            addCookieToCookies(newCookie, ref localCookies);
                        }
                    }
                }
            }
        }//updateLocalCookies

        //update cookiesToUpdate to localCookies
        public void updateLocalCookies(CookieCollection cookiesToUpdate, ref CookieCollection localCookies)
        {
            updateLocalCookies(cookiesToUpdate, ref localCookies, null);
        }

        /*********************************************************************/
        /* HTTP */
        /*********************************************************************/

        /* get url's response */
        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict,
                                              int timeout,
                                              string postDataStr)
        {
            //CookieCollection parsedCookies;

            HttpWebResponse resp = null;

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(url);

            req.AllowAutoRedirect = true;
            req.Accept = "*/*";

            //const string gAcceptLanguage = "en-US"; // zh-CN/en-US
            //req.Headers["Accept-Language"] = gAcceptLanguage;

            req.KeepAlive = true;

            //IE8
            //const string gUserAgent = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; InfoPath.3; .NET4.0C; .NET4.0E";
            //IE9
            //const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0)"; // x64
            const string gUserAgent = "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)"; // x86
            //Chrome
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/533.4 (KHTML, like Gecko) Chrome/5.0.375.99 Safari/533.4";
            //Mozilla Firefox
            //const string gUserAgent = "Mozilla/5.0 (Windows; U; Windows NT 6.1; rv:1.9.2.6) Gecko/20100625 Firefox/3.6.6";
            req.UserAgent = gUserAgent;

            req.Headers["Accept-Encoding"] = "gzip, deflate";
            req.AutomaticDecompression = DecompressionMethods.GZip;

            req.Proxy = null;

            if (timeout > 0)
            {
                req.Timeout = timeout;
            }

            if (curCookies != null)
            {
                req.CookieContainer = new CookieContainer();
                req.CookieContainer.PerDomainCapacity = 40; // following will exceed max default 20 cookie per domain
                req.CookieContainer.Add(curCookies);
            }

            if (headerDict != null)
            {
                foreach (string header in headerDict.Keys)
                {
                    string headerValue = "";
                    if (headerDict.TryGetValue(header, out headerValue))
                    {
                        // the following allow the caller to overwrite the default header settings
                        if (header.ToLower() == "referer")
                        {
                            req.Referer = headerValue;
                        }
                        else if (header.ToLower() == "allowautoredirect")
                        {
                            bool isAllow = false;
                            if (bool.TryParse(headerValue, out isAllow))
                            {
                                req.AllowAutoRedirect = isAllow;
                            }
                        }
                        else if (header.ToLower() == "accept")
                        {
                            req.Accept = headerValue;
                        }
                        else if (header.ToLower() == "keepalive")
                        {
                            bool isKeepAlive = false;
                            if (bool.TryParse(headerValue, out isKeepAlive))
                            {
                                req.KeepAlive = isKeepAlive;
                            }
                        }
                        else if (header.ToLower() == "accept-language")
                        {
                            req.Headers["Accept-Language"] = headerValue;
                        }
                        else if (header.ToLower() == "useragent")
                        {
                            req.UserAgent = headerValue;
                        }
                        else
                        {
                            req.Headers[header] = headerValue;
                        }
                    }
                    else
                    {
                        break;
                    }
                }
            }

            if (postDict != null || postDataStr != "")
            {
                req.Method = "POST";
                req.ContentType = "application/x-www-form-urlencoded";

                if (postDict != null)
                {
                    postDataStr = quoteParas(postDict);
                }

                //byte[] postBytes = Encoding.GetEncoding("utf-8").GetBytes(postData);
                byte[] postBytes = Encoding.UTF8.GetBytes(postDataStr);
                req.ContentLength = postBytes.Length;

                Stream postDataStream = req.GetRequestStream();
                postDataStream.Write(postBytes, 0, postBytes.Length);
                postDataStream.Close();
            }
            else
            {
                req.Method = "GET";
            }

            //may timeout, has fixed in:
            //https://www.crifan.org/fixed_problem_sometime_httpwebrequest_getresponse_timeout/
            resp = (HttpWebResponse)req.GetResponse();

            updateLocalCookies(resp.Cookies, ref curCookies);

            return resp;
        }

        public HttpWebResponse getUrlResponse(string url,
                                              Dictionary<string, string> headerDict,
                                              Dictionary<string, string> postDict)
        {
            return getUrlResponse(url, headerDict, postDict, 0, "");
        }

        public HttpWebResponse getUrlResponse(string url)
        {
            return getUrlResponse(url, null, null, 0, "");
        }

        // valid charset:"GB18030"/"UTF-8", invalid:"UTF8"
        public string getUrlRespHtml(string url,
                                     Dictionary<string, string> headerDict,
                                     string charset,
                                     Dictionary<string, string> postDict,
                                     int timeout,
                                     string postDataStr)
        {
            string respHtml = "";

            //HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout);
            HttpWebResponse resp = getUrlResponse(url, headerDict, postDict, timeout, postDataStr);

            //long realRespLen = resp.ContentLength;

            StreamReader sr;
            if ((charset != null) && (charset != ""))
            {
                Encoding htmlEncoding = Encoding.GetEncoding(charset);
                sr = new StreamReader(resp.GetResponseStream(), htmlEncoding);
            }
            else
            {
                sr = new StreamReader(resp.GetResponseStream());
            }
            respHtml = sr.ReadToEnd();

            return respHtml;
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, string charset, Dictionary<string, string> postDict, string postDataStr)
        {
            return getUrlRespHtml(url, headerDict, charset, postDict, 0, postDataStr);
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict, Dictionary<string, string> postDict)
        {
            return getUrlRespHtml(url, headerDict, "", postDict, "");
        }

        public string getUrlRespHtml(string url, Dictionary<string, string> headerDict)
        {
            return getUrlRespHtml(url, headerDict, null);
        }

        public string getUrlRespHtml(string url, string charset, int timeout)
        {
            return getUrlRespHtml(url, null, charset, null, timeout, "");
        }

        public string getUrlRespHtml(string url, string charset)
        {
            return getUrlRespHtml(url, charset, 0);
        }

        public string getUrlRespHtml(string url)
        {
            return getUrlRespHtml(url, "");
        }

        /******************************************************************************
        Demo emulate login baidu related functions
        *******************************************************************************/

        private void btnGetBaiduid_Click(object sender, EventArgs e)
        {
            //http://www.baidu.com/
            string baiduMainUrl = txbBaiduMainUrl.Text;
            HttpWebResponse resp = getUrlResponse(baiduMainUrl);

            txbGotBaiduid.Text = "";
            foreach (Cookie ck in resp.Cookies)
            {
                txbGotBaiduid.Text += "[" + ck.Name + "]=" + ck.Value;
                if (ck.Name == "BAIDUID")
                {
                    gotCookieBaiduid = true;
                }
            }

            if (gotCookieBaiduid)
            {
                //store cookies
                curCookies = resp.Cookies;
            }
            else
            {
                MessageBox.Show("Error: cookie BAIDUID not found!");
            }
        }

        private void btnGetToken_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid)
            {
                string getapiUrl = "https://passport.baidu.com/v2/api/?getapi&class=login&tpl=mn&tangram=true";
                string respHtml = getUrlRespHtml(getapiUrl);

                //bdPass.api.params.login_token='5ab690978812b0e7fbbe1bfc267b90b3';
                string tokenValP = @"bdPass\.api\.params\.login_token='(?<tokenVal>\w+)';";
                Match foundTokenVal = (new Regex(tokenValP)).Match(respHtml);
                if (foundTokenVal.Success)
                {
                    //extracted the token value
                    txbExtractedTokenVal.Text = foundTokenVal.Groups["tokenVal"].Value;
                    extractTokenValueOK = true;
                }
                else
                {
                    txbExtractedTokenVal.Text = "Error: token value not found!";
                }
            }
            else
            {
                MessageBox.Show("Error: the BAIDUID cookie was not obtained earlier!");
            }
        }

        private void btnEmulateLoginBaidu_Click(object sender, EventArgs e)
        {
            if (gotCookieBaiduid && extractTokenValueOK)
            {
                string staticpage = "http://www.baidu.com/cache/user/html/jump.html";

                //init post dict info
                Dictionary<string, string> postDict = new Dictionary<string, string>();
                //postDict.Add("ppui_logintime", "");
                postDict.Add("charset", "utf-8");
                //postDict.Add("codestring", "");
                postDict.Add("token", txbExtractedTokenVal.Text);
                postDict.Add("isPhone", "false");
                postDict.Add("index", "0");
                //postDict.Add("u", "");
                //postDict.Add("safeflg", "0");
                postDict.Add("staticpage", staticpage);
                postDict.Add("loginType", "1");
                postDict.Add("tpl", "mn");
                postDict.Add("callback", "parent.bdPass.api.login._postCallback");
                postDict.Add("username", txbBaiduUsername.Text);
                postDict.Add("password", txbBaiduPassword.Text);
                //postDict.Add("verifycode", "");
                postDict.Add("mem_pass", "on");

                string baiduMainLoginUrl = "https://passport.baidu.com/v2/api/?login";
                string loginBaiduRespHtml = getUrlRespHtml(baiduMainLoginUrl, null, postDict);

                //check whether got all expected cookies
                Dictionary<string, bool> cookieCheckDict = new Dictionary<string, bool>();
                string[] cookiesNameList = {"BDUSS", "PTOKEN", "STOKEN", "SAVEUSERID"};
                foreach (String cookieToCheck in cookiesNameList)
                {
                    cookieCheckDict.Add(cookieToCheck, false);
                }

                foreach (Cookie singleCookie in curCookies)
                {
                    if (cookieCheckDict.ContainsKey(singleCookie.Name))
                    {
                        cookieCheckDict[singleCookie.Name] = true;
                    }
                }

                bool allCookiesFound = true;
                foreach (bool foundCurCookie in cookieCheckDict.Values)
                {
                    allCookiesFound = allCookiesFound && foundCurCookie;
                }

                loginBaiduOk = allCookiesFound;
                if (loginBaiduOk)
                {
                    txbEmulateLoginResult.Text = "Successfully emulated login to the Baidu homepage!";
                }
                else
                {
                    txbEmulateLoginResult.Text = "Failed to emulate login to the Baidu homepage!";
                    txbEmulateLoginResult.Text += Environment.NewLine + "Returned HTML source:";
                    txbEmulateLoginResult.Text += Environment.NewLine + loginBaiduRespHtml;
                }
            }
            else
            {
                MessageBox.Show("Error: the BAIDUID cookie was not obtained and/or the token value was not extracted!");
            }
        }

        private void lklEmulateLoginTutorialUrl_LinkClicked(object sender, LinkLabelLinkClickedEventArgs e)
        {
            string emulateLoginTutorialUrl = "https://www.crifan.org/emulate_login_website_using_csharp";
            System.Diagnostics.Process.Start(emulateLoginTutorialUrl);
        }

        private void btnClearAll_Click(object sender, EventArgs e)
        {
            curCookies = new CookieCollection();
            gotCookieBaiduid = false;
            extractTokenValueOK = false;
            loginBaiduOk = false;

            txbGotBaiduid.Text = "";
            txbExtractedTokenVal.Text = "";
            txbBaiduUsername.Text = "";
            txbBaiduPassword.Text = "";
            txbEmulateLoginResult.Text = "";
        }
    }
}
The complete VS2010 project can be downloaded here:
emulateLoginBaidu_csharp_crifanLibVersion_2012-11-07.7z
About crifanLib.cs:
Browse online: crifanLib.cs
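【Summary】
As you can see, although the flow for emulating login to the Baidu homepage worked out earlier is not that complicated, implementing it in C# is far more involved than implementing it in Python.
The main reason is that Python wraps up many common, convenient library functions, whereas in C# many details have to be handled yourself, including the various parameters of a GET or POST; anything involving cookies in particular is tedious.
So, for scraping and analyzing web pages and for emulating website login, Python is still the more convenient choice.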
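【Postscript 2013-09-11】
1. After investigation:
【Record】Investigating why the C# code for emulating Baidu login does not work under .NET 4.0
it is confirmed that the earlier code works on .NET 3.5 and earlier but does not work on .NET 4.0.
2. The cause has been found and fixed.
The cause is:
For a cookie with no expires field specified, .NET 4.0 sets the cookie's expires value to the default (year 0001, 00:00:00, i.e. DateTime.MinValue) and as a result treats the cookie as expired. This invalidates Baidu's cookie:
H_PS_PSSID
and causes all subsequent operations to fail.
On .NET 3.5 and earlier the expires value is the same default, but the cookie remains usable in practice, so the subsequent steps work and the problem does not occur.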
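One conceivable workaround for the behaviour described above is to give every cookie that still carries the default expiry an explicit one before the cookies are reused. This is only a sketch of that idea and not necessarily the fix used in the downloadable project; the class name, method name and the choice of lifetime are all assumptions.

```csharp
using System;
using System.Net;

// Hypothetical helper, not part of the original project.
public static class CookieExpirySketch
{
    // For each cookie whose Expires is still DateTime.MinValue (no expiry given),
    // sets an explicit expiry `lifetime` from now. Returns how many were changed.
    public static int GiveSessionCookiesAnExpiry(CookieCollection cookies, TimeSpan lifetime)
    {
        int patched = 0;
        foreach (Cookie ck in cookies)
        {
            if (ck.Expires == DateTime.MinValue)
            {
                ck.Expires = DateTime.Now.Add(lifetime);
                patched++;
            }
        }
        return patched;
    }
}
```

It could be called on the stored cookies, for example with `TimeSpan.FromDays(1)`, just before they are added to a request's CookieContainer. Cookies that already have an explicit expiry are left untouched.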
3. The fixed code
Available for download:
(1) Emulate Baidu login, standalone complete code version, .NET 4.0
emulateLoginBaidu_csharp_independentCodeVersion_2013-09-11.7z
(2) Emulate Baidu login, crifanLib version (using my own library), .NET 4.0
emulateLoginBaidu_csharp_crifanLibVersion_2013-09-11.7z
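(These two files will be uploaded when time permits, because the upload currently fails with:
xxx.7z: unknown Bytes complete FAILED! :Upload canceled : VIRUS DETECTED! (Heuristics.Broken.Executable FOUND) |
I will try uploading again another time, and deal with it if the same error persists.)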
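【Summary】
Whether in 3.5 and earlier or in the latest 4.0, .NET's parsing of the Set-Cookie header of an HTTP response into a CookieCollection has always been terrible and riddled with bugs.
From now on, use resp.Cookies as little as possible; otherwise C# will trip you up and you will not even know how it happened.
It is better to parse Set-Cookie with a self-written parsing function and get a correct CookieCollection that way.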
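To give a feel for what such a self-written parser involves, here is a deliberately small sketch. It is not the author's function: the names are made up, it handles exactly one Set-Cookie header value at a time, and it does not deal with several cookies folded into one comma-separated header, which is the harder part in practice.

```csharp
using System;
using System.Net;

// Hypothetical helper, not the parser from crifanLib.cs.
public static class SetCookieSketch
{
    // Parses ONE Set-Cookie value such as "BAIDUID=abc; path=/; domain=.baidu.com".
    // `defaultDomain` is used when the header carries no domain attribute.
    // Returns null when the first segment is not a name=value pair.
    public static Cookie ParseSingle(string headerValue, string defaultDomain)
    {
        string[] parts = headerValue.Split(';');
        string[] nameValue = parts[0].Split(new[] { '=' }, 2);
        if (nameValue.Length != 2 || nameValue[0].Trim().Length == 0)
        {
            return null;
        }

        Cookie cookie = new Cookie(nameValue[0].Trim(), nameValue[1].Trim());
        cookie.Domain = defaultDomain;
        cookie.Path = "/";

        for (int i = 1; i < parts.Length; i++)
        {
            string[] attr = parts[i].Split(new[] { '=' }, 2);
            string key = attr[0].Trim().ToLowerInvariant();
            string val = attr.Length == 2 ? attr[1].Trim() : "";

            if (key == "domain" && val.Length > 0)
            {
                cookie.Domain = val;
            }
            else if (key == "path" && val.Length > 0)
            {
                cookie.Path = val;
            }
            else if (key == "expires")
            {
                // only applied when the date format is one DateTime can parse
                DateTime when;
                if (DateTime.TryParse(val, out when))
                {
                    cookie.Expires = when;
                }
            }
            else if (key == "secure")
            {
                cookie.Secure = true;
            }
            else if (key == "httponly")
            {
                cookie.HttpOnly = true;
            }
        }
        return cookie;
    }
}
```

Each parsed cookie can then be added to a CookieCollection in place of reading resp.Cookies. Unusual cookie names or values may still be rejected by the Cookie class itself.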
For details, see: