I can't seem to get past the login page. Here is an abridged version of my login page (http://ift.tt/1DtPTE1), taken from IE's View Source:
<html ...>
<head ...></head>
<body>...
<form method="post" action="/support" id="mainform">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="<stuff>" />
<script type="text/javascript">
//<![CDATA[
var theForm = document.forms['mainform'];
if (!theForm) {
theForm = document.mainform;
}
function __doPostBack(eventTarget, eventArgument) {
if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
theForm.__EVENTTARGET.value = eventTarget;
theForm.__EVENTARGUMENT.value = eventArgument;
theForm.submit();
}
}
//]]>
</script>
...
<input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="87894A7C" />
<input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="<stuff>" />
<input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="<stuff>" />...
<div id="maincontent_0_content_0_pnlLogin" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'maincontent_0_content_0_butLogin')">
<h2>HELP24 eSupport Portal</h2>
<input type="hidden" name="startURL" value="" />
<input type="hidden" name="loginURL" value="" />
<input type="hidden" name="useSecure" value="true" />
<input type="hidden" name="orgId" value="00D700000008gWM" />
<input type="hidden" name="portalId" value="06070000000DZJN" />
<input type="hidden" name="loginType" value="2" />
<label for="username">Username:</label>
<input type="text" id="username" name="username" maxlength="80" value="" class="captionblack" />
<label for="password">Password:</label>
<input type="password" id="password" name="password" maxlength="80" class="captionblack" />
<input type="submit" name="maincontent_0$content_0$butLogin" value="Log in" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;maincontent_0$content_0$butLogin&quot;, &quot;&quot;, false, &quot;&quot;, &quot;http://ift.tt/1O8cWzl&quot;, false, false))"
id="maincontent_0_content_0_butLogin" />
</div>
...
</form>
</body>
</html>
I wrote this crawler to process the login page:
import logging

import scrapy


class ACIspider(scrapy.Spider):
    name = "aci"
    allowed_domains = ["aciworldwide.com"]
    start_urls = [
        "http://ift.tt/1DtPTUi"
    ]

    def parse(self, response):
        # Print the title of the login page before submitting the form.
        title = response.xpath('//title/text()').extract()
        print('Starting title is ' + title[0])
        # Fill in the credentials and submit the form found in the response.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'myuser@my.com', 'password': 'mypass'},
            clickdata={'type': 'submit'},
            callback=self.after_login
        )

    def after_login(self, response):
        print('Hello next page')
        # Check that the login succeeded before going on.
        # response.body is a byte string, so compare against bytes.
        if b"authentication failed" in response.body:
            self.log("Login failed", level=logging.ERROR)
            return
        title = response.xpath('//title/text()').extract()
        print('Title is ' + title[0])
Here is an excerpt from my output:
[time] [aci] DEBUG: Redirecting (301) to <GET http://ift.tt/1O8cVeH> from <GET http://www.aciworldwide.com/support.aspx>
[time] [aci] DEBUG: Crawled (200) <GET http://ift.tt/1O8cVeH> (referer: None)
Starting title is Support
[time] [aci] DEBUG: Crawled (200) <POST http://ift.tt/1O8cVeH> (referer: http://ift.tt/1DtPTnm)
Hello next page
Title is Support
Note that I print the page title both at the start and in the callback, and both times it is "Support", so the response to the login request is the same login page. What am I doing wrong? Why is the response to the login not the page that follows authentication?
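One way to narrow this down is to list the fields a browser would post when the "Log in" button is clicked, and compare them with what the spider sends. The following is only a rough sketch under assumptions: it uses the Python 3 standard library (`html.parser`) rather than Scrapy, it works on a shortened copy of the abridged form above with `stuff` standing in for the elided values, and `FormFieldCollector` and `build_login_payload` are hypothetical helper names.

```python
from html.parser import HTMLParser

# Shortened copy of the abridged login form; "stuff" stands in for elided values.
LOGIN_HTML = """
<form method="post" action="/support" id="mainform">
<input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
<input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="stuff" />
<input type="hidden" name="orgId" value="00D700000008gWM" />
<input type="text" id="username" name="username" value="" />
<input type="password" id="password" name="password" />
<input type="submit" name="maincontent_0$content_0$butLogin" value="Log in" />
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs of every named <input> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}          # non-submit inputs: name -> value
        self.submit_buttons = {}  # submit inputs: name -> value

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attrs = dict(attrs)
        name = attrs.get("name")
        if not name:
            return
        value = attrs.get("value") or ""
        if attrs.get("type", "text").lower() == "submit":
            self.submit_buttons[name] = value
        else:
            self.fields[name] = value


def build_login_payload(html, username, password, button_name):
    """Return the fields a browser would post when button_name is clicked."""
    collector = FormFieldCollector()
    collector.feed(html)
    payload = dict(collector.fields)  # hidden fields are sent unchanged
    payload["username"] = username
    payload["password"] = password
    # Only the clicked submit button is included in the post.
    payload[button_name] = collector.submit_buttons[button_name]
    return payload


payload = build_login_payload(
    LOGIN_HTML, "myuser@my.com", "mypass", "maincontent_0$content_0$butLogin"
)
for name in sorted(payload):
    print(name, "=", payload[name])
```

If the spider's request is missing one of these fields, in particular the named submit button, passing `clickdata={'name': 'maincontent_0$content_0$butLogin'}` to `FormRequest.from_response` may be worth trying. This does not by itself establish the cause of the failed login.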