hub / github.com/csev/py4e / BeautifulSoup

Class BeautifulSoup

old/old-code/BeautifulSoup.py:1465–1616

This parser knows the following facts about HTML:

* Some tags have no closing tag and should be interpreted as being closed as soon as they are encountered.
* The text inside some tags (e.g. 'script') may contain tags which are not really part of the document and which should be parsed as text, not tags.
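The second fact can be seen in miniature with Python 3's standard-library html.parser. That is a different parser from the BeautifulSoup class documented here, but it also hands the body of a script element to the caller as text. A minimal sketch, where EventLogger and log_events are hypothetical names made up for this illustration:

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Records which start tags and which text the parser reports."""

    def __init__(self):
        super().__init__()
        self.start_tags = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        # Text may arrive in several chunks, so collect and join later.
        self.text_chunks.append(data)


def log_events(markup):
    """Return (list of start tags, all text joined) for the given markup."""
    parser = EventLogger()
    parser.feed(markup)
    parser.close()
    return parser.start_tags, "".join(parser.text_chunks)


start_tags, text = log_events('<script>document.write("<p>hi</p>")</script>')
# Only the script tag itself is reported as a tag; the <p> inside it is text.
print(start_tags)
print(text)
```

The markup inside the script body never reaches handle_starttag, which is the behavior the summary describes: parse it as text, and parse it again explicitly if the tags are wanted.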

Source from the content-addressed store, hash-verified

        return j

class BeautifulSoup(BeautifulStoneSoup):

    """This parser knows the following facts about HTML:

    * Some tags have no closing tag and should be interpreted as being
      closed as soon as they are encountered.

    * The text inside some tags (e.g. 'script') may contain tags which
      are not really part of the document and which should be parsed
      as text, not tags. If you want to parse the text as tags, you can
      always fetch it and parse it explicitly.

    * Tag nesting rules:

      Most tags can't be nested at all. For instance, the occurrence of
      a <p> tag should implicitly close the previous <p> tag.

       <p>Para1<p>Para2
        should be transformed into:
       <p>Para1</p><p>Para2

      Some tags can be nested arbitrarily. For instance, the occurrence
      of a <blockquote> tag should _not_ implicitly close the previous
      <blockquote> tag.

       Alice said: <blockquote>Bob said: <blockquote>Blah
        should NOT be transformed into:
       Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

      Some tags can be nested, but the nesting is reset by the
      interposition of other tags. For instance, a <tr> tag should
      implicitly close the previous <tr> tag within the same <table>,
      but not close a <tr> tag in another table.

       <table><tr>Blah<tr>Blah
        should be transformed into:
       <table><tr>Blah</tr><tr>Blah
        but,
       <tr>Blah<table><tr>Blah
        should NOT be transformed into
       <tr>Blah<table></tr><tr>Blah

    Differing assumptions about tag nesting rules are a major source
    of problems with the BeautifulSoup class. If BeautifulSoup is not
    treating as nestable a tag your page author treats as nestable,
    try ICantBelieveItsBeautifulSoup, MinimalSoup, or
    BeautifulStoneSoup before writing your own subclass."""

    def __init__(self, *args, **kwargs):
        # Default smartQuotesTo to HTML_ENTITIES unless the caller set it.
        if 'smartQuotesTo' not in kwargs:
            kwargs['smartQuotesTo'] = self.HTML_ENTITIES
        # Mark the input as HTML for the base class.
        kwargs['isHTML'] = True
        BeautifulStoneSoup.__init__(self, *args, **kwargs)

    # Tags that have no closing tag (first bullet of the docstring).
    SELF_CLOSING_TAGS = buildTagMap(None,
                                    ('br', 'hr', 'input', 'img', 'meta',
                                     'spacer', 'link', 'frame', 'base', 'col'))

Callers 6

wikigrade.py · File · 0.90
urllinks.py · File · 0.90
urllink2.py · File · 0.90
wsave.py · File · 0.90
urlforever.py · File · 0.90
BeautifulSoup.py · File · 0.70

Calls 1

buildTagMap · Function · 0.70
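The body of buildTagMap is not part of the listing on this page. Judging only from the call `buildTagMap(None, (...))`, a helper with that call shape could plausibly turn a default value and one or more collections of tag names into a dictionary. The following is a guess along those lines, written for Python 3; build_tag_map_sketch is a hypothetical name and its behavior is an assumption, not the library's code.

```python
def build_tag_map_sketch(default, *portions):
    """Merge dicts, sequences, and single names into one tag -> value map.

    Assumption: names from sequences and single names map to `default`,
    while dict portions keep their own values.
    """
    built = {}
    for portion in portions:
        if isinstance(portion, dict):
            built.update(portion)
        elif isinstance(portion, (list, tuple, set)):
            for name in portion:
                built[name] = default
        else:
            built[portion] = default
    return built


self_closing = build_tag_map_sketch(None, ('br', 'hr', 'img'))
print(sorted(self_closing))
print('br' in self_closing)
```

A dictionary gives constant-time membership checks, so a parser can ask whether a tag is self-closing with a plain `in` test.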

Tested by

no test coverage detected