Detect HTML Attributes

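    The full code is:

    import re

    # Read input (Python 2; on Python 3 use input() instead of raw_input())
    N = int(raw_input())
    html = ''
    for _ in range(N):
        # Keep the line break so tokens on adjacent lines do not run together
        html += raw_input() + '\n'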

    # Extract tags and attributes using regex
    tags = {}
    pattern = r'<(\w+)([^>]*)>'
    matches = re.findall(pattern, html)

    for match in matches:
        tag, attrs = match
        if tag not in tags:
            tags[tag] = set()

        # Every word followed by '=' is treated as an attribute name
        attr_pattern = r'(\w+)='
        attrs_match = re.findall(attr_pattern, attrs)
        tags[tag].update(attrs_match)

    # Print output
    for tag in sorted(tags):
        if tag == 'a':
            # For <a>, keep only accesskey, href and title
            attrs = ','.join(sorted([attr for attr in tags[tag] if attr in ['accesskey', 'href', 'title']]))
        else:
            attrs = ','.join(sorted(tags[tag]))

        if attrs:
            print("%s:%s" % (tag, attrs))
        else:
            print("%s:" % tag)
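
    One weakness of the `(\w+)=` pattern is that it also matches `=` signs inside attribute values, such as query parameters in a URL. The sketch below is one possible variation, not the approach from the post: it is written for Python 3, wraps the same two-regex idea in a hypothetical helper called `detect_attributes`, and consumes each quoted value together with its name so that nothing inside the value is counted. It assumes attribute values are always quoted and never contain `>`, and it makes no attempt to skip HTML comments.

```python
import re

# Opening tag: name, then everything up to the next '>'.
# Closing tags such as </p> do not match because '/' is not a word character.
TAG_RE = re.compile(r'<(\w+)([^>]*)>')

# Attribute name, '=', then a single- or double-quoted value.
# The value is consumed by the match, so '=' inside it is never rescanned.
ATTR_RE = re.compile(r'''([\w-]+)\s*=\s*(?:"[^"]*"|'[^']*')''')


def detect_attributes(html):
    """Return 'tag:attr1,attr2' lines, sorted by tag and by attribute name.

    Hypothetical helper for illustration; assumes quoted attribute values.
    """
    tags = {}
    for tag, attrs in TAG_RE.findall(html):
        # One set of attribute names per tag, merged across occurrences.
        tags.setdefault(tag, set()).update(ATTR_RE.findall(attrs))
    return ["%s:%s" % (tag, ",".join(sorted(tags[tag]))) for tag in sorted(tags)]


if __name__ == "__main__":
    sample = ('<p><a href="http://x.com/?a=1&b=2" title="T">link</a></p>'
              "<div class=\"c\" id='i'></div>")
    for line in detect_attributes(sample):
        print(line)
```

    For the sample string this prints `a:href,title`, `div:class,id` and `p:`, without needing a special case for the `a` tag.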