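【Background】
Previous posts on emulating login mostly dealt with logging in using a username and password. The approach was:
first analyze the logic with a tool: 【Tutorial】Step by step: how to use a tool (the IE9 F12 developer tools) to analyze the internal logic of emulating login to a website (the Baidu home page)
then implement that logic in code.
None of those posts really handled uploading a file at the same time.
There was one earlier attempt at emulating a file upload, but it failed. (In hindsight, the likely cause was an incorrectly set boundary.)
Then a comment on one of the posts said:
"I recently needed to use the POST method of C# HttpWebRequest to submit an img image to an https website, and ran into many problems."
So this post summarizes how to emulate uploading a file.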
【Process】
1. For the address mentioned above:
https://www.peuland.com/captcha/captcha_demo.htm
debug with the IE10 F12 developer tools and choose any file, for example:
E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif
Then upload it. The returned result is:
{"message":"user error","text":""}
The captured data is as follows:
(1) Request header:
Key               Value
Request           POST /captcha/captchaimg.php HTTP/1.1
Accept            text/html, application/xhtml+xml, */*
Referer           https://www.peuland.com/captcha/captcha_demo.htm
Accept-Language   zh-CN
User-Agent        Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
Content-Type      multipart/form-data; boundary=---------------------------7dd2e12d50c9a
Accept-Encoding   gzip, deflate
Host              www.peuland.com
Content-Length    3720
DNT               1
Connection        Keep-Alive
Cache-Control     no-cache
As shown in the screenshot:
(2) Request body (post data):
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="user"

test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="pwd"

test
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="img"; filename="captcha.gif"
Content-Type: image/gif

<binary file data not shown>
---------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="type"

100000
-----------------------------7dd2e12d50c9a
Content-Disposition: form-data; name="button"

鎻愪氦
-----------------------------7dd2e12d50c9a--
(The button value 鎻愪氦 is how the tool displays the UTF-8 bytes of 提交, "Submit", when it reads them in the wrong character set.)
As shown in the screenshot:
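2. The points in the captured data that need explaining are:
(1) The value of Content-Type in the request header
It is set to: multipart/form-data; boundary=xxx
Here it is:
multipart/form-data; boundary=---------------------------7dd2e12d50c9a
This indicates that the post data that follows carries multiple values, one per part.
Here these are:
the username, user
the password, pwd
the data type, type
and so on.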
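As a small illustration of the header just described, the following C# sketch builds the Content-Type value from a boundary. The names MultipartContentType, NewBoundary and BuildContentType are made up for this example and are not part of the original code or of crifanLib. The GUID-based boundary is only one assumed way of getting a token that is unlikely to occur inside the data.

```csharp
using System;

public static class MultipartContentType
{
    // Hypothetical helper: returns a boundary token that is unlikely to
    // appear inside any part. IE generated its own value; any unique
    // token of reasonable length should serve the same purpose.
    public static string NewBoundary()
    {
        return "----------" + Guid.NewGuid().ToString("N");
    }

    // Builds the Content-Type request header value announcing the boundary.
    public static string BuildContentType(string boundary)
    {
        return "multipart/form-data; boundary=" + boundary;
    }
}
```

For the captured request, passing the boundary shown above to BuildContentType gives exactly the Content-Type value that IE10 sent.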
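(2) The delimiter in the post data is the boundary value prefixed with two hyphens
In the post data, each part is separated by:
-----------------------------7dd2e12d50c9a
This value is the boundary value from the request header above:
---------------------------7dd2e12d50c9a
with two hyphens:
--
prepended. That is:
delimiter between the parts of the post data:
-----------------------------7dd2e12d50c9a
=
-- + boundary value from the request header
=
-- + ---------------------------7dd2e12d50c9a
In addition, the last line of the post data:
-----------------------------7dd2e12d50c9a--
has a further two hyphens appended. That is:
closing delimiter at the end of the post data:
-----------------------------7dd2e12d50c9a--
=
-- + boundary value from the request header + --
=
-- + ---------------------------7dd2e12d50c9a + --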
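The two rules above can be written down directly. This is a minimal sketch; the class name MultipartDelimiters and its method names are invented for illustration and do not appear in the original code.

```csharp
using System;

public static class MultipartDelimiters
{
    // Delimiter placed before every part: "--" + boundary.
    public static string PartDelimiter(string boundary)
    {
        return "--" + boundary;
    }

    // Closing delimiter after the last part: "--" + boundary + "--".
    public static string ClosingDelimiter(string boundary)
    {
        return "--" + boundary + "--";
    }
}
```

On the wire, each delimiter sits on its own line, so it is preceded and followed by CRLF ("\r\n").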
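3. With this understood, code can be written to emulate the process.
It is based on the earlier C# code from the previous posts and on someone else's code:
Upload files with HTTPWebrequest (multipart/form-data)
The final code is as follows: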
/*
* [File]
* frmEmulateUploadFile.cs
*
* [Function]
* emulate upload file using C# HTTPWebrequest code
* [Tutorial] Emulating login analysis: how to emulate uploading a file, involving Content-Disposition, multipart/form-data, boundary
* https://www.crifan.org/emulate_login_example_for_analysis_upload_file_multipart_form_data_content_disposition
*
* [Version]
* 2013-10-06
*
* [Author]
* Crifan Li
*
* [Contact]
* https://www.crifan.org/contact_me/
*
*/
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Data;
using System.Drawing;
using System.Text;
using System.Windows.Forms;
using System.IO;

namespace EmulateUploadFile
{
    public partial class frmEmulateUploadFile : Form
    {
        crifanLib crl;

        public frmEmulateUploadFile()
        {
            crl = new crifanLib();
            InitializeComponent();
        }

        public struct FileParameter
        {
            public string fileKeyInForm;
            public string filename;
            public string fileContentType;
            public string fileContentStr;
        }

        public string generateMultiPartFormData(string boundaryStr, Dictionary<string, string> postFormDict, FileParameter fileToUpload)
        {
            //IE10 captured:
            //(1)request header:
            //Key Value
            //Request POST /captcha/captchaimg.php HTTP/1.1
            //Accept text/html, application/xhtml+xml, */*
            //Referer https://www.peuland.com/captcha/captcha_demo.htm
            //Accept-Language zh-CN
            //User-Agent Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)
            //Content-Type multipart/form-data; boundary=---------------------------7dd2e12d50c9a
            //Accept-Encoding gzip, deflate
            //Host www.peuland.com
            //Content-Length 3720
            //DNT 1
            //Connection Keep-Alive
            //Cache-Control no-cache
            //(2)post data:
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="user"
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="pwd"
            // test
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="img"; filename="captcha.gif"
            // Content-Type: image/gif
            // <binary file data not shown>
            // ---------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="type"
            // 100000
            // -----------------------------7dd2e12d50c9a
            // Content-Disposition: form-data; name="button"
            // 鎻愪氦
            // -----------------------------7dd2e12d50c9a--
            string multiPartFormDataStr = "";

            //template for one ordinary form field: delimiter, header, blank line, value
            string singlePartTemplate =
                "--{0}\r\n"
                + "Content-Disposition: form-data; name=\"{1}\"\r\n"
                + "\r\n"
                + "{2}"
                + "\r\n" /* auto add CRLF for each line */;
            //string tailTemplate = "\r\n--{0}--";
            //string tailTemplate = "\r\n--{0}--\r\n";
            string singlePartStr = "";

            //template for the file part; no trailing ';' after filename, to match the captured data
            //the final CRLF makes sure the closing delimiter starts on its own line
            string fileParaTemplate =
                "--{0}\r\n"
                + "Content-Disposition: form-data; name=\"{1}\"; filename=\"{2}\"\r\n"
                + "Content-Type: {3}\r\n\r\n"
                + "{4}\r\n";
            string fileParaStr = String.Format(fileParaTemplate,
                boundaryStr,
                fileToUpload.fileKeyInForm,
                fileToUpload.filename ?? fileToUpload.fileKeyInForm,
                fileToUpload.fileContentType ?? "application/octet-stream",
                fileToUpload.fileContentStr);

            string tailTemplate = "--{0}--\r\n"; //every previous part already ends with CRLF
            string tailStr = String.Format(tailTemplate, boundaryStr);

            //1. post form data: key and value
            if ((null != postFormDict) && (postFormDict.Count > 0))
            {
                foreach (string postKey in postFormDict.Keys)
                {
                    string postValue = postFormDict[postKey];
                    singlePartStr = String.Format(singlePartTemplate, boundaryStr, postKey, postValue);
                    multiPartFormDataStr += singlePartStr;
                }
            }

            //2. file parameters
            multiPartFormDataStr += fileParaStr;

            //3. add tail in the end
            multiPartFormDataStr += tailStr;

            return multiPartFormDataStr;
        }

        public void demoFileUpload()
        {
            //access main url
            string mainUrl = "https://www.peuland.com/captcha/captcha_demo.htm";
            string respHtml = crl.getUrlRespHtml(mainUrl);

            //emulate upload file
            string fileUploadUrl = "https://www.peuland.com/captcha/captchaimg.php";
            string boundaryValue = "---------------------------7dd2e12d50c9a";
            string contentTypeValueTemplate = "multipart/form-data; boundary={0}";
            string contentTypeValue = String.Format(contentTypeValueTemplate, boundaryValue);
            Dictionary<string, string> headerDict = new Dictionary<string, string>();
            headerDict.Add("Content-Type", contentTypeValue);

            Dictionary<string, string> postDict = new Dictionary<string, string>();
            postDict.Add("user", "test");
            postDict.Add("pwd", "test");
            postDict.Add("type", "100000");
            postDict.Add("button", "提交");

            //string fileFullpath = "E:/Dev_Root/svn_dev_root/website/python/BlogsToWordpress/captcha/captcha.gif";
            string fileFullpath = @"E:\Dev_Root\svn_dev_root\website\python\BlogsToWordpress\captcha\captcha.gif";
            //NOTE: ReadAllText decodes the file as text, which can alter the bytes of a
            //binary file such as a GIF; a byte[]-based request body avoids this
            string fileContentStr = File.ReadAllText(fileFullpath);

            FileParameter fileToUpload = new FileParameter();
            fileToUpload.fileKeyInForm = "img";
            fileToUpload.filename = "captcha.gif";
            fileToUpload.fileContentType = "image/gif";
            fileToUpload.fileContentStr = fileContentStr;

            string postDataStr = generateMultiPartFormData(boundaryValue, postDict, fileToUpload);
            string respJson = crl.getUrlRespHtml(fileUploadUrl, headerDict: headerDict, postDataStr: postDataStr);
            //return:
            //"{\"message\":\"user error\",\"text\":\"\"}"
            //same as what we saw in the web browser
            //Console.WriteLine(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
            MessageBox.Show(String.Format("after emulate upload file {0}, returned json={1}", fileFullpath, respJson));
        }

        private void frmEmulateUploadFile_Load(object sender, EventArgs e)
        {
            demoFileUpload();
        }
    }
}
The result when run is shown in the screenshot:
Note:
For details on the crifanLib.cs used in the code above, see:
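The code above passes the file content around as a string (read with File.ReadAllText), which may not preserve the bytes of a binary file such as a GIF. As an alternative, the sketch below assembles the same multipart/form-data layout as a byte array, so the file bytes are copied through unchanged. This is not the author's method and does not use crifanLib; MultipartBodySketch, Build and WriteText are hypothetical names, and encoding the text portions as UTF-8 is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class MultipartBodySketch
{
    // Builds a multipart/form-data body as raw bytes: the ordinary form
    // fields first, then one file part, then the closing delimiter.
    public static byte[] Build(string boundary,
                               IDictionary<string, string> fields,
                               string fileKey, string fileName,
                               string fileContentType, byte[] fileBytes)
    {
        using (MemoryStream body = new MemoryStream())
        {
            foreach (KeyValuePair<string, string> field in fields)
            {
                // delimiter, part header, blank line, value, CRLF
                WriteText(body, "--" + boundary + "\r\n"
                    + "Content-Disposition: form-data; name=\"" + field.Key + "\"\r\n"
                    + "\r\n"
                    + field.Value + "\r\n");
            }

            // file part header, followed by the untouched file bytes
            WriteText(body, "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fileKey
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + fileContentType + "\r\n"
                + "\r\n");
            body.Write(fileBytes, 0, fileBytes.Length);
            WriteText(body, "\r\n");

            // closing delimiter: "--" + boundary + "--"
            WriteText(body, "--" + boundary + "--\r\n");
            return body.ToArray();
        }
    }

    // Assumption: header lines and field values are sent as UTF-8.
    private static void WriteText(Stream stream, string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        stream.Write(bytes, 0, bytes.Length);
    }
}
```

With fileBytes taken from File.ReadAllBytes, the resulting array could be written to the stream returned by HttpWebRequest.GetRequestStream(), after setting the request's Method to POST and its ContentType to the multipart/form-data value containing the same boundary.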
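【Summary】
In short, the same advice as always:
use a tool to capture the website's internal logic,
then emulate that logic in code.
For more, see:
A detailed explanation of the principles and implementation of crawling websites, emulating login, and scraping dynamic web pages (Python, C#, etc.)
When reposting, please credit: 在路上 » 【Tutorial】Emulating login: how to analyze and emulate uploading a file in code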