| % mechanize |
| |
| Stateful programmatic web browsing in Python, after Andy Lester's Perl |
| module [`WWW::Mechanize`](http://search.cpan.org/dist/WWW-Mechanize/). |
| |
| * `mechanize.Browser` and `mechanize.UserAgentBase` implement the |
| interface of `urllib2.OpenerDirector`, so: |
| |
| * any URL can be opened, not just `http:` |
| |
| * `mechanize.UserAgentBase` offers easy dynamic configuration of |
| user-agent features like protocol, cookie, redirection and |
| `robots.txt` handling, without having to make a new |
| `OpenerDirector` each time, e.g. by calling `build_opener()`. |
| |
| * Easy HTML form filling. |
| |
| * Convenient link parsing and following. |
| |
| * Browser history (`.back()` and `.reload()` methods). |
| |
| * The `Referer` HTTP header is added properly (optional). |
| |
| * Automatic observance of |
| [`robots.txt`](http://www.robotstxt.org/wc/norobots.html). |
| |
| * Automatic handling of HTTP-Equiv and Refresh. |
| |
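Because any URL scheme can be opened, the link and form features listed above can be tried offline against a local file. The following is a minimal sketch rather than part of the mechanize distribution: the page content, the helper names `write_demo_page` and `inspect_page`, and the form field `customer` are all invented for illustration, and it assumes mechanize is installed.

```python
import os
import tempfile

try:
    from urllib.request import pathname2url  # Python 3
except ImportError:
    from urllib import pathname2url  # Python 2

# Hypothetical page used only for this demonstration.
DEMO_HTML = """<html><head><title>Cheese Shop</title></head>
<body>
<a href="http://www.example.com/about">About the cheese shop</a>
<form name="order" action="http://www.example.com/order" method="get">
<input type="text" name="customer" value="">
<input type="submit" value="Buy">
</form>
</body></html>
"""


def write_demo_page():
    """Write DEMO_HTML to a temporary .html file and return a file: URL for it."""
    fd, path = tempfile.mkstemp(suffix=".html")
    with os.fdopen(fd, "w") as f:
        f.write(DEMO_HTML)
    return "file:" + pathname2url(path)


def inspect_page(url):
    """Open url with mechanize; return its title, link texts and a form value."""
    import mechanize  # third-party; imported here so the helper above needs only the stdlib

    br = mechanize.Browser()
    br.open(url)  # a file: URL is opened the same way as an http: one
    link_texts = [link.text for link in br.links()]
    br.select_form(name="order")
    br["customer"] = "joe"  # forwarded to the selected form
    return br.title(), link_texts, br.form["customer"]
```

Calling `inspect_page(write_demo_page())` reads the title, the link text and the filled-in field back from the local page. The form is deliberately not submitted, since its action points at the non-existent `example.com`.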
| |
| Examples |
| -------- |
| |
| The examples below are written for a website that does not exist |
| (`example.com`), so they cannot be run as written. There are also some [working |
| examples](documentation.html#examples) that you can run. |
| |
| ~~~~{.python} |
| import re |
| import mechanize |
| |
| br = mechanize.Browser() |
| br.open("http://www.example.com/") |
| # follow second link with element text matching regular expression |
| response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1) |
| assert br.viewing_html() |
| print br.title() |
| print response1.geturl() |
| print response1.info() # headers |
| print response1.read() # body |
| |
| br.select_form(name="order") |
| # Browser passes through unknown attributes (including methods) |
| # to the selected HTMLForm. |
| br["cheeses"] = ["mozzarella", "caerphilly"] # (the method here is __setitem__) |
| # Submit current form. Browser calls .close() on the current response on |
| # navigation, so this closes response1 |
| response2 = br.submit() |
| |
| # print currently selected form (don't call .submit() on this, use br.submit()) |
| print br.form |
| |
| response3 = br.back() # back to cheese shop (same data as response1) |
| # the history mechanism returns cached response objects |
| # we can still use the response, even though it was .close()d |
| response3.get_data() # like .seek(0) followed by .read() |
| response4 = br.reload() # fetches from server |
| |
| for form in br.forms(): |
|     print form |
| # .links() optionally accepts the keyword args of .follow_/.find_link() |
| for link in br.links(url_regex="python.org"): |
|     print link |
|     br.follow_link(link) # takes EITHER Link instance OR keyword args |
|     br.back() |
| ~~~~ |
| |
| You may control the browser's policy by using the methods of |
| `mechanize.Browser`'s base class, `mechanize.UserAgentBase`. For example: |
| |
| ~~~~{.python} |
| import logging |
| import sys |
| |
| import mechanize |
| |
| br = mechanize.Browser() |
| # Explicitly configure proxies (Browser will attempt to set good defaults). |
| # Note the userinfo ("joe:password@") and port number (":3128") are optional. |
| br.set_proxies({"http": "joe:password@myproxy.example.com:3128", |
|                 "ftp": "proxy.example.com", |
|                 }) |
| # Add HTTP Basic/Digest auth username and password for HTTP proxy access. |
| # (equivalent to using "joe:password@..." form above) |
| br.add_proxy_password("joe", "password") |
| # Add HTTP Basic/Digest auth username and password for website access. |
| br.add_password("http://example.com/protected/", "joe", "password") |
| # Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML). |
| br.set_handle_equiv(False) |
| # Ignore robots.txt. Do not do this without thought and consideration. |
| br.set_handle_robots(False) |
| # Don't add Referer (sic) header |
| br.set_handle_referer(False) |
| # Don't handle Refresh redirections |
| br.set_handle_refresh(False) |
| # Don't handle cookies |
| br.set_cookiejar(None) |
| # Supply your own mechanize.CookieJar (NOTE: cookie handling is ON by |
| # default: no need to do this unless you have some reason to use a |
| # particular cookiejar) |
| cj = mechanize.CookieJar() |
| br.set_cookiejar(cj) |
| # Log information about HTTP redirects and Refreshes. |
| br.set_debug_redirects(True) |
| # Log HTTP response bodies (i.e. the HTML, most of the time). |
| br.set_debug_responses(True) |
| # Print HTTP headers. |
| br.set_debug_http(True) |
| |
| # To make sure you're seeing all debug output: |
| logger = logging.getLogger("mechanize") |
| logger.addHandler(logging.StreamHandler(sys.stdout)) |
| logger.setLevel(logging.INFO) |
| |
| # Sometimes it's useful to process bad headers or bad HTML: |
| response = br.response() # this is a copy of response |
| headers = response.info() # currently, this is a mimetools.Message |
| headers["Content-type"] = "text/html; charset=utf-8" |
| response.set_data(response.get_data().replace("<!---", "<!--")) |
| br.set_response(response) |
| ~~~~ |
| |
| mechanize exports the complete interface of `urllib2`: |
| |
| ~~~~{.python} |
| import mechanize |
| response = mechanize.urlopen("http://www.example.com/") |
| print response.read() |
| ~~~~ |
| |
| When using mechanize, anything you would normally import from `urllib2` should |
| be imported from mechanize instead. |
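
As a sketch of that substitution, the function below uses `mechanize.Request`, `mechanize.urlopen` and `mechanize.HTTPError` where `urllib2` code would use the names with the same spelling. The function name `fetch_body` is hypothetical, the example assumes mechanize is installed, and calling it performs a real network request.

```python
def fetch_body(url):
    """Return the response body, or None if the server reports an HTTP error."""
    import mechanize  # imported in place of urllib2

    try:
        response = mechanize.urlopen(mechanize.Request(url))
    except mechanize.HTTPError as error:
        # The server answered, but with an error status such as 404.
        print("HTTP error %d for %s" % (error.code, url))
        return None
    try:
        return response.read()
    finally:
        response.close()
```

Other failures, such as an unreachable host, are not caught here and propagate to the caller.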
| |
| |
| Credits |
| ------- |
| |
| Much of the code was originally derived from the work of the following people: |
| |
| * Gisle Aas -- [libwww-perl](http://search.cpan.org/dist/libwww-perl/) |
| |
| * Jeremy Hylton (and many others) -- |
| [urllib2](http://docs.python.org/release/2.6/library/urllib2.html) |
| |
| * Andy Lester -- [WWW::Mechanize](http://search.cpan.org/dist/WWW-Mechanize/) |
| |
| * Johnny Lee (coincidentally-named) -- MSIE CookieJar Perl code from which |
| mechanize's support for that is derived. |
| |
| Also: |
| |
| * Gary Poster and Benji York at Zope Corporation -- contributed significant |
| changes to the HTML forms code |
| |
| * Ronald Tschalar -- provided help with Netscape cookies |
| |
| Thanks also to the many people who have contributed [bug reports and |
| patches](support.html). |
| |
| |
| See also |
| -------- |
| |
| There are several wrappers around mechanize designed for functional testing of |
| web applications: |
| |
| * [`zope.testbrowser`](http://cheeseshop.python.org/pypi?:action=display&name=zope.testbrowser) |
| |
| * [twill](http://twill.idyll.org/) |
| |
| See [the FAQ](faq.html) page for other links to related |
| software. |
| |
| |
| <!-- Local Variables: --> |
| <!-- fill-column:79 --> |
| <!-- End: --> |