Saturday, February 28, 2015

Python - Logging in to web scrape

I'm trying to web-scrape a page on www.roblox.com that requires me to be logged in. I have done this using the .ROBLOSECURITY cookie; however, that cookie changes every few days. Instead, I want to log in through the login form using Python. The form and what I have so far are below. I do NOT want to use any third-party libraries such as mechanize or requests.
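For comparison, the cookie-based approach described above comes down to sending the cookie in a request header yourself. The following is a minimal Python 3 standard-library sketch. It assumes the cookie value is copied from a logged-in browser session. The cookie value, the /home URL and the helper name build_cookie_request are hypothetical placeholders.

```python
import urllib.request

# Hypothetical placeholder: the real value is copied from a logged-in browser
# and, as described above, changes every few days.
ROBLOSECURITY = "cookie-value-here"


def build_cookie_request(url, cookie_value):
    """Build a GET request that sends the .ROBLOSECURITY cookie by hand."""
    return urllib.request.Request(
        url,
        headers={
            "User-Agent": "Mozilla/5.0",
            "Cookie": ".ROBLOSECURITY=" + cookie_value,
        },
    )


# Hypothetical page URL; urllib.request.urlopen(req) would perform the fetch.
req = build_cookie_request("https://www.roblox.com/home", ROBLOSECURITY)
```

This is the approach that breaks whenever the cookie rotates, which is why logging in through the form is the goal.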


Form:



<form action="/newlogin" id="loginForm" method="post" novalidate="novalidate" _lpchecked="1"> <div id="loginarea" class="divider-bottom" data-is-captcha-on="False">
<div id="leftArea">
<div id="loginPanel">
<table id="logintable">
<tbody><tr id="username">
<td><label class="form-label" for="Username">Username:</label></td>
<td><input class="text-box text-box-medium valid" data-val="true" data-val-required="The Username field is required." id="Username" name="Username" type="text" value="" autocomplete="off" aria-required="true" aria-invalid="false" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
</tr>
<tr id="password">
<td><label class="form-label" for="Password">Password:</label></td>
<td><input class="text-box text-box-medium" data-val="true" data-val-required="The Password field is required." id="Password" name="Password" type="password" autocomplete="off" style="cursor: auto; background-image: url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEAAAABCAQAAAC1HAwCAAAAC0lEQVR4nGP6zwAAAgcBApocMXEAAAAASUVORK5CYII=);"></td>
</tr>
</tbody></table>
<div>
</div>
<div>
<div id="forgotPasswordPanel">
<a class="text-link" href="/Login/ResetPasswordRequest.aspx" target="_blank">Forgot your password?</a>
</div>
<div id="signInButtonPanel" data-use-apiproxy-signin="False" data-sign-on-api-path="http://ift.tt/1EBcovl">
<a roblox-js-onclick="" class="btn-medium btn-neutral">Sign In</a>
<a roblox-js-oncancel="" class="btn-medium btn-negative">Cancel</a>
</div>
<div class="clearFloats">
</div>
</div>
<span id="fb-root">
<div id="SplashPageConnect" class="fbSplashPageConnect">
<a class="facebook-login" href="/Facebook/SignIn?returnTo=/home" ref="form-facebook">
<span class="left"></span>
<span class="middle">Login with Facebook<span>Login with Facebook</span></span>
<span class="right"></span>
</a>
</div>
</span>
</div>
</div>
<div id="rightArea" class="divider-left">
<div id="signUpPanel" class="FrontPageLoginBox">
<p class="text">Not a member?</p>
<h2>Sign Up to Build &amp; Make Friends</h2>


Sign Up


^ I don't know why that "Sign Up" text appears there, and I can't delete it.
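A browser submits this form with POST to its action attribute, /newlogin, and sends the inputs under their name attributes, Username and Password. The following is a minimal sketch that reads those details from the markup using the standard library's html.parser. It is fed a trimmed copy of the form above, and the class name LoginFormParser is hypothetical. If the live page contains hidden inputs inside the form, they would appear in fields as well and would presumably need to be posted too.

```python
from html.parser import HTMLParser


class LoginFormParser(HTMLParser):
    """Collect the action, method and input names of the form with a given id."""

    def __init__(self, form_id="loginForm"):
        super().__init__()
        self.form_id = form_id
        self.in_form = False
        self.action = None
        self.method = None
        self.fields = {}  # input name -> default value ("" if none)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self.in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "get").upper()
        elif tag == "input" and self.in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


# Trimmed copy of the login form shown above
html = '''<form action="/newlogin" id="loginForm" method="post">
<input id="Username" name="Username" type="text" value="">
<input id="Password" name="Password" type="password">
</form>'''

parser = LoginFormParser()
parser.feed(html)
print(parser.action, parser.method, sorted(parser.fields))
# prints: /newlogin POST ['Password', 'Username']
```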


What I have so far:



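import cookielib
import urllib
import urllib2

# Cookie jar that keeps any cookies the server sets (e.g. the session cookie)
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Send a browser-like User-Agent header with every request
opener.addheaders = [('User-agent', 'Mozilla/5.0')]

# Make urllib2.urlopen() use this opener (and therefore the cookie jar)
urllib2.install_opener(opener)

authentication_url = 'http://ift.tt/1AVbOUS'

# Fields to post; Username and Password match the name attributes of the form inputs
payload = {
    'ReturnUrl': 'http://ift.tt/1sfn0d7',
    'Username': 'usernamehere',
    'Password': 'passwordhere'
}

data = urllib.urlencode(payload)

# Supplying data makes urllib2 send a POST request instead of a GET
req = urllib2.Request(authentication_url, data)

resp = urllib2.urlopen(req)
contents = resp.read()
print contents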
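The script above is Python 2. In Python 3 the same standard-library pieces were renamed: cookielib became http.cookiejar, urllib2 became urllib.request, and urllib.urlencode became urllib.parse.urlencode. The following is a minimal Python 3 sketch of the same flow. The POST URL is a hypothetical value (the form's /newlogin action on the HTTPS site), the helper name build_login_request is hypothetical, and the network call is left commented out.

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Cookie jar plus opener, as in the Python 2 version
cj = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
opener.addheaders = [("User-agent", "Mozilla/5.0")]


def build_login_request(url, fields):
    """Return a POST request whose body is the url-encoded form fields."""
    # In Python 3 a request body must be bytes, not str.
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data)


# Hypothetical URL: the form action /newlogin on the HTTPS site
req = build_login_request(
    "https://www.roblox.com/newlogin",
    {"Username": "usernamehere", "Password": "passwordhere"},
)
# resp = opener.open(req)  # network call; opener.open() avoids install_opener()
# contents = resp.read().decode("utf-8", "replace")
```

Calling opener.open() directly keeps the cookie handling local to this opener instead of installing it globally.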


I am very new to Python, so I don't know how much of this works. Please let me know what is wrong with my code: when I print contents, I only get the login page.
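When the script prints the login page again, two quick checks can show whether the POST was accepted. The first is whether the response still contains the login form. The second is whether the server stored an authentication cookie in the jar. The following is a minimal Python 3 sketch. It assumes, based on the first paragraph, that a successful login sets a cookie named .ROBLOSECURITY, and that the form keeps id="loginForm". Both function names are hypothetical.

```python
import http.cookiejar


def still_on_login_page(html):
    """Heuristic: the response still contains the login form shown above."""
    return 'id="loginForm"' in html


def has_auth_cookie(jar, name=".ROBLOSECURITY"):
    """True if the cookie jar holds a cookie with the given name."""
    return any(cookie.name == name for cookie in jar)


jar = http.cookiejar.CookieJar()
print(still_on_login_page('<form action="/newlogin" id="loginForm">'))  # True
print(has_auth_cookie(jar))  # False: nothing has been stored yet
```

In the script, these functions would be applied to contents and cj after the POST.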


PS: The login page is served over HTTPS.
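The form's action, /newlogin, is a relative path. A browser resolves it against the URL of the page the form came from, so on an HTTPS login page the credentials are posted to an https:// URL on the same host. The following sketch performs the same resolution with urllib.parse.urljoin. The login-page URL is a hypothetical placeholder.

```python
from urllib.parse import urljoin, urlparse

# Hypothetical URL of the HTTPS page that serves the login form
LOGIN_PAGE = "https://www.roblox.com/login"
FORM_ACTION = "/newlogin"  # action attribute of the form above

post_url = urljoin(LOGIN_PAGE, FORM_ACTION)
print(post_url)  # https://www.roblox.com/newlogin
print(urlparse(post_url).scheme)  # https
```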

