import re

# Read input
N = int(input())
html = ''
for _ in range(N):
    # Keep the line break so a tag split across lines is not fused together
    html += input() + '\n'

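# Extract tags and attributes using regex
tags = {}
pattern = r'<(\w+)([^>]*)>'
matches = re.findall(pattern, html)

for tag, attrs in matches:
    # Register the tag even when it carries no attributes
    if tag not in tags:
        tags[tag] = set()

    # Match name=value pairs and consume each value, so an '=' inside a
    # quoted value (e.g. a URL query string) is not read as an attribute
    attr_pattern = r'''([\w-]+)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+)'''
    attrs_match = re.findall(attr_pattern, attrs)
    tags[tag].update(attrs_match)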

# Print output
for tag in sorted(tags):
    if tag == 'a':
        # For anchors, report only accesskey, href, and title
        attrs = ','.join(sorted(attr for attr in tags[tag]
                                if attr in ('accesskey', 'href', 'title')))
    else:
        attrs = ','.join(sorted(tags[tag]))

    # An empty attribute list still prints the tag as "tag:"
    print("%s:%s" % (tag, attrs))
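The script above reads from stdin, which makes it awkward to try on a small sample. A minimal sketch of the same tag and attribute extraction wrapped in functions might look like the following. The names `collect_attributes`, `format_report`, `TAG_RE`, and `ATTR_RE` are hypothetical and not part of the original solution, and the special case for the `a` tag is left out for brevity.

```python
import re

# Hypothetical helper names; the original script works on stdin directly.
# Opening tags only: '</p>' is skipped because '/' is not a word character.
TAG_RE = re.compile(r'<(\w+)([^>]*)>')
# Consume each attribute value so an '=' inside it is not read as a name.
ATTR_RE = re.compile(r'''([\w-]+)\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'>]+)''')


def collect_attributes(html):
    """Map each opening tag name to the set of attribute names seen on it."""
    tags = {}
    for tag, attrs in TAG_RE.findall(html):
        tags.setdefault(tag, set()).update(ATTR_RE.findall(attrs))
    return tags


def format_report(tags):
    """Render one 'tag:attr1,attr2' line per tag, with both levels sorted."""
    return ['%s:%s' % (tag, ','.join(sorted(tags[tag])))
            for tag in sorted(tags)]


sample = ('<p><a href="http://example.com?x=1" title="home">Home</a></p>'
          '<div class="box"></div>')
for line in format_report(collect_attributes(sample)):
    print(line)
# Prints:
# a:href,title
# div:class
# p:
```

Returning the lines rather than printing them inside the loop keeps the parsing step testable on its own. The `x=1` in the sample URL is not reported because the attribute pattern consumes the whole quoted value.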
