 % mechanize

 Stateful programmatic web browsing in Python, after Andy Lester's Perl
 module [`WWW::Mechanize`](http://search.cpan.org/dist/WWW-Mechanize/).

   * `mechanize.Browser` and `mechanize.UserAgentBase` implement the
     interface of `urllib2.OpenerDirector`, so:

       * any URL can be opened, not just `http:`

       * `mechanize.UserAgentBase` offers easy dynamic configuration of
         user-agent features like protocol, cookie, redirection and
         `robots.txt` handling, without having to make a new
         `OpenerDirector` each time, e.g. by calling `build_opener()`.

   * Easy HTML form filling.

   * Convenient link parsing and following.

   * Browser history (`.back()` and `.reload()` methods).

   * The `Referer` HTTP header is added properly (optional).

   * Automatic observance of
     [`robots.txt`](http://www.robotstxt.org/wc/norobots.html).

   * Automatic handling of HTTP-Equiv and Refresh.
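
As a point of comparison for the first bullet, the sketch below uses only the standard library's `OpenerDirector` interface, not mechanize itself. It assumes Python 3, where `urllib2` lives on as `urllib.request`; the helper `open_local` and the temporary file are hypothetical names introduced for illustration. It opens a `file:` URL through an opener built with `build_opener()`, the same kind of `.open()` call that the list above says `mechanize.Browser` supports.

```python
import os
import pathlib
import tempfile
import urllib.request  # Python 3 home of the urllib2 interface


def open_local(path):
    """Read a local file through an OpenerDirector, using a file: URL."""
    opener = urllib.request.build_opener()  # returns an OpenerDirector
    url = pathlib.Path(path).resolve().as_uri()  # e.g. file:///tmp/page.html
    with opener.open(url) as response:
        return response.read()


# Write a throwaway page so the example needs no network access.
with tempfile.NamedTemporaryFile("w", suffix=".html", delete=False) as f:
    f.write("<title>cheese shop</title>")
    page_path = f.name

try:
    body = open_local(page_path)
finally:
    os.remove(page_path)

print(body)
```

With the plain standard library, changing which handlers are active generally means building a new opener. The list above describes `mechanize.UserAgentBase` as removing that step.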


 Examples
 --------

 The examples below are written for a website that does not exist
 (`example.com`), so they cannot be run as written.  There are also some
 [working examples](documentation.html#examples) that you can run.

 ~~~~{.python}
 import re
 import mechanize

 br = mechanize.Browser()
 br.open("http://www.example.com/")
 # follow second link with element text matching regular expression
 response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)
 assert br.viewing_html()
 print br.title()
 print response1.geturl()
 print response1.info()  # headers
 print response1.read()  # body

 br.select_form(name="order")
 # Browser passes through unknown attributes (including methods)
 # to the selected HTMLForm.
 br["cheeses"] = ["mozzarella", "caerphilly"]  # (the method here is __setitem__)
 # Submit current form.  Browser calls .close() on the current response on
 # navigation, so this closes response1
 response2 = br.submit()

 # print currently selected form (don't call .submit() on this, use br.submit())
 print br.form

 response3 = br.back()  # back to cheese shop (same data as response1)
 # the history mechanism returns cached response objects
 # we can still use the response, even though it was .close()d
 response3.get_data()  # like .seek(0) followed by .read()
 response4 = br.reload()  # fetches from server

 for form in br.forms():
     print form
 # .links() optionally accepts the keyword args of .follow_/.find_link()
 for link in br.links(url_regex="python.org"):
     print link
     br.follow_link(link)  # takes EITHER Link instance OR keyword args
     br.back()
 ~~~~

 You can control the browser's policy using the methods of
 `mechanize.Browser`'s base class, `mechanize.UserAgentBase`.  For example:

 ~~~~{.python}
 import logging
 import sys

 import mechanize

 br = mechanize.Browser()
 # Explicitly configure proxies (Browser will attempt to set good defaults).
 # Note the userinfo ("joe:password@") and port number (":3128") are optional.
 br.set_proxies({"http": "joe:password@myproxy.example.com:3128",
                 "ftp": "proxy.example.com",
                 })
 # Add HTTP Basic/Digest auth username and password for HTTP proxy access.
 # (equivalent to using "joe:password@..." form above)
 br.add_proxy_password("joe", "password")
 # Add HTTP Basic/Digest auth username and password for website access.
 br.add_password("http://example.com/protected/", "joe", "password")
 # Don't handle HTTP-EQUIV headers (HTTP headers embedded in HTML).
 br.set_handle_equiv(False)
 # Ignore robots.txt.  Do not do this without thought and consideration.
 br.set_handle_robots(False)
 # Don't add Referer (sic) header
 br.set_handle_referer(False)
 # Don't handle Refresh redirections
 br.set_handle_refresh(False)
 # Don't handle cookies
 br.set_cookiejar(None)
 # Supply your own mechanize.CookieJar (NOTE: cookie handling is ON by
 # default: no need to do this unless you have some reason to use a
 # particular cookiejar)
 cj = mechanize.CookieJar()
 br.set_cookiejar(cj)
 # Log information about HTTP redirects and Refreshes.
 br.set_debug_redirects(True)
 # Log HTTP response bodies (i.e. the HTML, most of the time).
 br.set_debug_responses(True)
 # Print HTTP headers.
 br.set_debug_http(True)

 # To make sure you're seeing all debug output:
 logger = logging.getLogger("mechanize")
 logger.addHandler(logging.StreamHandler(sys.stdout))
 logger.setLevel(logging.INFO)

 # Sometimes it's useful to process bad headers or bad HTML:
 response = br.response()  # this is a copy of response
 headers = response.info()  # currently, this is a mimetools.Message
 headers["Content-type"] = "text/html; charset=utf-8"
 response.set_data(response.get_data().replace("<!---", "<!--"))
 br.set_response(response)
 ~~~~
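
The warning about `set_handle_robots(False)` is easier to weigh with a concrete picture of the kind of check being switched off. The sketch below is not mechanize's implementation. It uses the Python 3 standard library's `urllib.robotparser`, and the `robots.txt` rules, the agent name `my-bot` and the helper `allowed` are all made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for example.com (an assumption, not real data).
ROBOTS_TXT = """\
User-agent: *
Disallow: /protected/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse rules directly, no network fetch


def allowed(url, agent="my-bot"):
    """Return True if the parsed rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(allowed("http://example.com/"))
print(allowed("http://example.com/protected/page.html"))
```

Under these rules the first URL is permitted and the second is refused. A client that observes `robots.txt` makes a decision of this kind before fetching a page.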

 mechanize exports the complete interface of `urllib2`:

 ~~~~{.python}
 import mechanize
 response = mechanize.urlopen("http://www.example.com/")
 print response.read()
 ~~~~

 When using mechanize, anything you would normally import from `urllib2` should
 be imported from mechanize instead.


 Credits
 -------

 Much of the code was originally derived from the work of the following people:

  * Gisle Aas -- [libwww-perl](http://search.cpan.org/dist/libwww-perl/)

  * Jeremy Hylton (and many others) --
 [urllib2](http://docs.python.org/release/2.6/library/urllib2.html)

  * Andy Lester -- [WWW::Mechanize](http://search.cpan.org/dist/WWW-Mechanize/)

  * Johnny Lee (coincidentally-named) -- MSIE CookieJar Perl code from which
 mechanize's support for that is derived.

 Also:

  * Gary Poster and Benji York at Zope Corporation -- contributed significant
 changes to the HTML forms code

  * Ronald Tschalar -- provided help with Netscape cookies

 Thanks also to the many people who have contributed [bug reports and
 patches](support.html).


 See also
 --------

 There are several wrappers around mechanize designed for functional testing of
 web applications:

   * [`zope.testbrowser`](http://cheeseshop.python.org/pypi?:action=display&name=zope.testbrowser)

   * [twill](http://twill.idyll.org/)

 See [the FAQ](faq.html) page for other links to related
 software.


 <!-- Local Variables: -->
 <!-- fill-column:79 -->
 <!-- End: -->