'You are trying to run the Python 2 version of Beautiful Soup under Python 3. This will not work.'!='You need to convert the code, either by installing it (`python setup.py install`) or by running 2to3 (`2to3 -w bs4`).'

class BeautifulSoup(Tag):
    """
    This class defines the basic interface called by the tree builders.

    These methods will be called by the parser:
      reset()
      feed(markup)

    The tree builder may call these methods from its feed() implementation:
      handle_starttag(name, attrs) # See note about return value
      handle_endtag(name)
      handle_data(data) # Appends to the current data node
      endData(containerClass=NavigableString) # Ends the current data node

    No matter how complicated the underlying parser is, you should be
    able to build a tree using 'start tag' events, 'end tag' events,
    'data' events, and "done with data" events.

    If you encounter an empty-element tag (aka a self-closing tag,
    like HTML's <br> tag), call handle_starttag and then
    handle_endtag.
    """
    ROOT_TAG_NAME = '[document]'

    # If the end-user gives no indication which tree builder they
    # want, look for one with these features.
    DEFAULT_BUILDER_FEATURES = ['html', 'fast']

    ASCII_SPACES = '\x20\x0a\x09\x0c\x0d'

    NO_PARSER_SPECIFIED_WARNING = "No parser was explicitly specified, so I'm using the best available %(markup_type)s parser for this system (\"%(parser)s\"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.\n\nTo get rid of this warning, change this:\n\n BeautifulSoup([your markup])\n\nto this:\n\n BeautifulSoup([your markup], \"%(parser)s\")\n"

    def __init__(self, markup="", features=None, builder=None,
                 parse_only=None, from_encoding=None, exclude_encodings=None,
                 **kwargs):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser."""

        if 'convertEntities' in kwargs:
            # Remove the legacy argument, as the two branches below do.
            del kwargs['convertEntities']
            warnings.warn(
                "BS4 does not respect the convertEntities argument to the "
                "BeautifulSoup constructor. Entities are always converted "
                "to Unicode characters.")

        if 'markupMassage' in kwargs:
            del kwargs['markupMassage']
            warnings.warn(
                "BS4 does not respect the markupMassage argument to the "
                "BeautifulSoup constructor. The tree builder is responsible "
                "for any necessary markup massage.")

        if 'smartQuotesTo' in kwargs:
            del kwargs['smartQuotesTo']
            warnings.warn(
                "BS4 does not respect the smartQuotesTo argument to the "
                "BeautifulSoup constructor. Smart quotes are always converted "
                "to Unicode characters.")
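
The class docstring above describes an event interface: a tree builder feeds markup to a parser and reports start tags, end tags, data, and "done with data" back to the soup object. The following is a minimal sketch of that flow, not the Beautiful Soup implementation. `ToyTree`, `ToyBuilder`, and `build` are hypothetical names invented for illustration, the tree is stored as plain dicts, and the standard library's `html.parser.HTMLParser` stands in for the underlying parser.

```python
from html.parser import HTMLParser


class ToyTree:
    """Hypothetical stand-in for the soup object (assumption, not bs4 code)."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.root = {"name": "[document]", "attrs": {}, "children": []}
        self.stack = [self.root]   # open tags, innermost last
        self.current_data = []     # pieces of the current data node

    def handle_starttag(self, name, attrs):
        self.endData()             # flush pending text before opening a tag
        tag = {"name": name, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(tag)
        self.stack.append(tag)
        return tag

    def handle_endtag(self, name):
        self.endData()
        # Close the most recently opened tag with this name, if there is one.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["name"] == name:
                del self.stack[i:]
                break

    def handle_data(self, data):
        # Appends to the current data node.
        self.current_data.append(data)

    def endData(self):
        # Ends the current data node and attaches it to the innermost open tag.
        if self.current_data:
            self.stack[-1]["children"].append("".join(self.current_data))
            self.current_data = []


class ToyBuilder(HTMLParser):
    """Hypothetical tree builder that forwards parser events to a ToyTree."""

    def __init__(self, tree):
        super().__init__()
        self.tree = tree

    def handle_starttag(self, name, attrs):
        self.tree.handle_starttag(name, attrs)

    def handle_startendtag(self, name, attrs):
        # Empty-element tag: a start tag followed immediately by an end tag.
        self.tree.handle_starttag(name, attrs)
        self.tree.handle_endtag(name)

    def handle_endtag(self, name):
        self.tree.handle_endtag(name)

    def handle_data(self, data):
        self.tree.handle_data(data)


def build(markup):
    tree = ToyTree()
    builder = ToyBuilder(tree)
    builder.feed(markup)
    builder.close()
    tree.endData()                 # the final "done with data" event
    return tree.root
```

Calling `build("<p>Hi<br/>there</p>")` in this sketch produces a root whose single child is the `p` tag, with the two text nodes and the empty `br` tag as its children. The point is only that four event types are enough to assemble the tree, whatever parser sits underneath.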
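
The constructor handles each legacy Beautiful Soup 3 keyword argument the same way: detect it, warn, and discard it. As a hedged illustration of that pattern in isolation, the sketch below uses a table-driven helper. `LEGACY_ARGUMENTS` and `drop_legacy_kwargs` are hypothetical names that do not exist in bs4, and the message text is assembled from the warnings shown in the constructor.

```python
import warnings

# Assumption of this sketch: one explanation per legacy argument, taken from
# the warning messages in the constructor above.
LEGACY_ARGUMENTS = {
    "convertEntities": "Entities are always converted to Unicode characters.",
    "markupMassage": "The tree builder is responsible for any necessary markup massage.",
    "smartQuotesTo": "Smart quotes are always converted to Unicode characters.",
}


def drop_legacy_kwargs(kwargs):
    """Return a copy of kwargs without legacy arguments, warning about each one."""
    remaining = dict(kwargs)
    for name, reason in LEGACY_ARGUMENTS.items():
        if name in remaining:
            del remaining[name]
            warnings.warn(
                "BS4 does not respect the %s argument to the "
                "BeautifulSoup constructor. %s" % (name, reason))
    return remaining
```

A caller can observe the behaviour with `warnings.catch_warnings(record=True)`: passing `{"smartQuotesTo": "html"}` records one warning and returns an empty dict, while unrelated keys pass through untouched.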