Detect HTML links

  • + 0 comments

    Python 3

    import re
    import sys
    
    html = sys.stdin.read()
    
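    # Group 1 is the URL (everything up to the closing quote, hyphens included);
    # group 3 is the link text after at most one nested tag.
    pattern = r'href="([^"]*)"[^>]*>(<[^>]*>)?\s?([^>]*?)<'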
    
    matches = re.findall(pattern, html)
    
    for match in matches:
        print(",".join((match[0], match[2])))
    
  • + 0 comments

    Detecting HTML links means finding each <a> tag with an href attribute and pairing its URL with its text name. The tricky part is that the text name, such as "HackerRank", can be hidden within multiple nested tags.
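To make the nested-tag case concrete, here is a minimal sketch that swaps the regex approach for Python's standard `html.parser` module. The class name `LinkCollector` and the sample markup are hypothetical and chosen only for illustration; the sketch assumes links are not nested inside each other.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for <a> tags, joining text from nested tags."""

    def __init__(self):
        super().__init__()
        self.links = []    # finished (href, text) pairs
        self._href = None  # href of the <a> tag currently open, if any
        self._text = []    # text fragments seen inside that <a> tag

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased; attrs is a list of (name, value) pairs.
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Text at any nesting depth inside the open <a> tag ends up here.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href.strip(), "".join(self._text).strip()))
            self._href = None


# Hypothetical input: the link text sits two tags deep.
collector = LinkCollector()
collector.feed('<p><a href="http://www.hackerrank.com"><h1><b>HackerRank</b></h1></a></p>')
collector.close()
print(collector.links)
```

A parser keeps track of the open `<a>` tag itself, so the depth of nesting around the text does not matter. The regex solutions in this thread have to account for nested tags explicitly.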

  • + 0 comments

    Java 15:

    import java.io.*;
    import java.util.*;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    public class Solution {
        public static void main(String[] args) {
            Scanner scanner = new Scanner(System.in);
            int n = Integer.parseInt(scanner.nextLine());
            // group(1): URL; group(2): link text. The trailing group matches closing tags only,
            // so it cannot swallow the opening tag of an adjacent link on the same line.
            String regex = "<a href\\s*=\\s*\"([^\"]+)\"[^>]*>(?:<[^<>]+>)*([^><]*)(?:</[^<>]+>)*";
            Pattern pattern = Pattern.compile(regex);
            for (int i = 0; i < n; i++) {
                Matcher matcher = pattern.matcher(scanner.nextLine());
                while (matcher.find()) {
                    System.out.println(matcher.group(1).trim() + "," + matcher.group(2).trim());
                }
            }
        }
    }
    
    /*
    regex = "<a href\\s*=\\s*\"([^\"]+)\"[^>]*>(?:<[^<>]+>)*([^><]*)(?:</[^<>]+>)*"
    
    > Keeping "<a " is important because the <area> tag also has an href attribute.
    > \\s* means zero or more whitespace characters.
    > \" denotes a literal double quote.
    > \"([^\"]+)\" means double quote + (one or more characters other than a double quote) + double quote.
    > The parenthesized part forms group(1), i.e. the required URL.
    > [^>]*> means (anything other than >) + >
        The opening <a> tag may carry other attributes, such as title="---", before it closes; this part skips them.
         
    > (?:___) is a non-capturing group: it is not numbered, so it does not become group(2).
    > (?:<[^<>]+>) is a non-capturing group matching < + (anything other than < and >) + >, i.e. a single tag like <h1>.
    > (?:<[^<>]+>)* handles any opening tags nested within the <a> tag, like <h1><p>.
    > ([^><]*) means anything other than < or >. It forms group(2), i.e. the required text name of the URL.
    > (?:</[^<>]+>)* handles the closing tags that follow the text, like </p></h1>. It matches only tags starting with "</",
        so it cannot consume the opening tag of a following link, e.g. in <a href="x">A</a><a href="y">B</a>.
    
    > So (?:<[^<>]+>)*([^><]*)(?:</[^<>]+>)*
        means (any opening tags like <h1>) + (text name) + (any closing tags like </h1>)
    */
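The walkthrough above can be tried out interactively. The following is a rough Python sketch of the same pattern structure, not a port of the full solution. The trailing group is limited to closing tags (`</...>`) so that it cannot consume the opening tag of an adjacent link. The names `LINK_RE`, `extract_links`, and the sample markup are hypothetical.

```python
import re

# Pattern pieces in the order described in the walkthrough.
LINK_RE = re.compile(
    r'<a href\s*=\s*"([^"]+)"'  # group 1: URL between the double quotes
    r'[^>]*>'                   # remaining attributes, then the end of the <a ...> tag
    r'(?:<[^<>]+>)*'            # any nested opening tags, e.g. <h1><b>
    r'([^><]*)'                 # group 2: the link text
    r'(?:</[^<>]+>)*'           # closing tags only, e.g. </b></h1></a>
)


def extract_links(html):
    """Return (url, text) pairs with surrounding whitespace stripped."""
    return [(url.strip(), text.strip()) for url, text in LINK_RE.findall(html)]


# Hypothetical sample, not taken from the problem's test cases.
sample = (
    '<li><a href="http://example.com/a" title="A"><h1><b>First</b></h1></a></li>'
    '<area href="http://example.com/skip">'
    '<li><a href="http://example.com/b"> Second </a></li>'
)

for url, text in extract_links(sample):
    print(f"{url},{text}")
```

The `<area>` tag is skipped because the pattern requires the literal `<a href` prefix, and both list items are found even though they sit on one line.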
    
  • + 0 comments

    Python 3

    import re
    import sys
    n = int(input())
    html = sys.stdin.read()
    pattern = r'<a\s.*?href="(.*?)".*?>\s*([^<>]*?)</'
    matches = re.findall(pattern, html)
    for match in matches:
        print(match[0], match[1], sep=",")
    
  • + 0 comments

    Javascript solution:

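    // Assumes Node.js: read all of stdin into `input` (skip this line if your template already provides it).
    const input = require("fs").readFileSync(0, "utf8");
    // Group 1: href value; group 2: link text (must start with a character other than "<").
    const lines = input.matchAll(/<a\s+href=['"]?(.*?)['"\s].*?>\s*([^<].*?)</g);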
    for (const line of lines) {
      const output = `${line[1].trim()},${line[2].trim()}`;
      console.log(output);
    }