Monday, 20 April 2015

Scrapy spider does not get past the login page

I can't get my crawler past the login page. Here is an abridged version of the login page (http://ift.tt/1DtPTE1), taken from IE's View Source:

<html ...>

<head ...></head>

<body>...

  <form method="post" action="/support" id="mainform">
    <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
    <input type="hidden" name="__EVENTARGUMENT" id="__EVENTARGUMENT" value="" />
    <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="<stuff>" />

    <script type="text/javascript">
      //<![CDATA[
      var theForm = document.forms['mainform'];
      if (!theForm) {
        theForm = document.mainform;
      }

      function __doPostBack(eventTarget, eventArgument) {
          if (!theForm.onsubmit || (theForm.onsubmit() != false)) {
            theForm.__EVENTTARGET.value = eventTarget;
            theForm.__EVENTARGUMENT.value = eventArgument;
            theForm.submit();
          }
        }
        //]]>
    </script>
    ...
    <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="87894A7C" />
    <input type="hidden" name="__PREVIOUSPAGE" id="__PREVIOUSPAGE" value="<stuff>" />
    <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="<stuff>" />...
    <div id="maincontent_0_content_0_pnlLogin" onkeypress="javascript:return WebForm_FireDefaultButton(event, &#39;maincontent_0_content_0_butLogin&#39;)">

      <h2>HELP24 eSupport Portal</h2>
      <input type="hidden" name="startURL" value="" />
      <input type="hidden" name="loginURL" value="" />
      <input type="hidden" name="useSecure" value="true" />
      <input type="hidden" name="orgId" value="00D700000008gWM" />
      <input type="hidden" name="portalId" value="06070000000DZJN" />
      <input type="hidden" name="loginType" value="2" />
      <label for="username">Username:</label>
      <input type="text" id="username" name="username" maxlength="80" value="" class="captionblack" />
      <label for="password">Password:</label>
      <input type="password" id="password" name="password" maxlength="80" class="captionblack" />


      <input type="submit" name="maincontent_0$content_0$butLogin" value="Log in" onclick="javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions(&quot;maincontent_0$content_0$butLogin&quot;, &quot;&quot;, false, &quot;&quot;, &quot;http://ift.tt/1O8cWzl&quot;, false, false))"
      id="maincontent_0_content_0_butLogin" />
    </div>
    ...
  </form>
</body>

</html>
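
To see what a client that does not run JavaScript would submit for this form, the fields can be listed with the standard library. This is only a minimal Python 3 sketch: `SAMPLE_FORM` is a trimmed copy of the markup above with a dummy value in place of `<stuff>`, and `FormFieldCollector` and `collect_form_fields` are illustrative names, not part of the page or of Scrapy.

```python
from html.parser import HTMLParser

# Trimmed, hypothetical copy of the login form shown above; the "<stuff>"
# placeholder is replaced with a dummy value for illustration only.
SAMPLE_FORM = """
<form method="post" action="/support" id="mainform">
  <input type="hidden" name="__EVENTTARGET" id="__EVENTTARGET" value="" />
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="dummy" />
  <input type="hidden" name="orgId" value="00D700000008gWM" />
  <input type="text" id="username" name="username" maxlength="80" value="" />
  <input type="password" id="password" name="password" maxlength="80" />
  <input type="submit" name="maincontent_0$content_0$butLogin" value="Log in" />
</form>
"""


class FormFieldCollector(HTMLParser):
    """Collect the form's action and its named <input> elements."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.fields = []  # list of (name, type, value) tuples

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <input ... /> also end up here.
        attrs = dict(attrs)
        if tag == "form":
            self.action = attrs.get("action")
        elif tag == "input" and "name" in attrs:
            self.fields.append(
                (attrs["name"], attrs.get("type", "text"), attrs.get("value") or "")
            )


def collect_form_fields(html_text):
    """Return (form action, list of (name, type, value)) for the markup."""
    parser = FormFieldCollector()
    parser.feed(html_text)
    return parser.action, parser.fields


if __name__ == "__main__":
    action, fields = collect_form_fields(SAMPLE_FORM)
    print("action:", action)
    for name, input_type, value in fields:
        print(input_type, name, repr(value))
```

Running the same helper over the full page source would show the hidden ASP.NET fields (`__VIEWSTATE`, `__EVENTVALIDATION`, and so on) next to `username` and `password`, along with the `action` that a plain form submission would target.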

I wrote this crawler to process the login page:

import logging

import scrapy


class ACIspider(scrapy.Spider):
    name = "aci"
    allowed_domains = ["aciworldwide.com"]
    start_urls = [
        "http://ift.tt/1DtPTUi"
    ]

    def parse(self, response):
        # Print the title of the login page before submitting the form.
        title = response.xpath('//title/text()').extract()
        print('Starting title is ' + title[0])
        # Fill in the credentials and simulate a click on the submit button.
        return scrapy.FormRequest.from_response(
            response,
            formdata={'username': 'myuser@my.com', 'password': 'mypass'},
            clickdata={'type': 'submit'},
            callback=self.after_login
        )

    def after_login(self, response):
        print('Hello next page')
        # Check that the login succeeded before going on.
        # response.body is a byte string, hence the bytes literal.
        if b"authentication failed" in response.body:
            self.log("Login failed", level=logging.ERROR)
            return

        title = response.xpath('//title/text()').extract()
        print('Title is ' + title[0])

Here is an excerpt from my output:

[time] [aci] DEBUG: Redirecting (301) to http://ift.tt/1O8cVeH from http://www.aciworldwide.com/support.aspx
[time] [aci] DEBUG: Crawled (200) http://ift.tt/1O8cVeH (referer: None)
Starting title is Support
[time] [aci] DEBUG: Crawled (200) http://ift.tt/1O8cVeH (referer: http://ift.tt/1DtPTnm)
Hello next page
Title is Support

Note that I print the page title both at the start and in the callback, and it is the same page both times. What am I doing wrong, such that the response to the login request is not the page that follows authentication?
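
One thing that may be worth checking: the submit button's `onclick` handler builds a `WebForm_PostBackOptions` object, and in ASP.NET Web Forms the fifth argument of that constructor is the `actionUrl` that the client script assigns to the form before submitting it. Scrapy does not execute JavaScript, so a request built from the form alone would go to the form's own `action` (`/support`). The Python 3 sketch below uses only the standard library to pull that fifth argument out of an `onclick` string so the two targets can be compared. The helper name `postback_action_url` and the `example.invalid` URL are hypothetical; the real value is whatever appears in the page's markup.

```python
import html
import re

# Hypothetical onclick value in the same shape as the one on the login button;
# the URL is a placeholder, not the real target.
SAMPLE_ONCLICK = (
    "javascript:WebForm_DoPostBackWithOptions(new WebForm_PostBackOptions("
    "&quot;maincontent_0$content_0$butLogin&quot;, &quot;&quot;, false, "
    "&quot;&quot;, &quot;https://example.invalid/login&quot;, false, false))"
)


def postback_action_url(onclick):
    """Return the actionUrl argument of WebForm_PostBackOptions, or None."""
    text = html.unescape(onclick)  # turn &quot; back into "
    match = re.search(r"new WebForm_PostBackOptions\((.*)\)\s*\)", text, re.S)
    if match is None:
        return None
    # Split the argument list into quoted strings and bare literals
    # (true, false, null). Each token is a (string, literal) pair.
    tokens = re.findall(r'"([^"]*)"|([A-Za-z]+)', match.group(1))
    if len(tokens) < 5:
        return None
    action_url = tokens[4][0]  # fifth argument, if it was a quoted string
    return action_url or None


if __name__ == "__main__":
    print(postback_action_url(SAMPLE_ONCLICK))
```

If the extracted URL differs from the form's `action`, the login POST may need to be sent to that URL instead. This is a guess based on the markup, not something confirmed against the site.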
