Detect the Domain Name


  •
    import re
    import sys
    
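    # A domain is one or more dot-separated labels of letters, digits, or hyphens.
    # The optional www./ww2. prefix sits outside the capturing group, so it is dropped.
    # No trailing delimiter is required, so a URL at the end of a line still matches.
    domain_pattern = re.compile(r'https?://(?:www\.|ww2\.)?([a-zA-Z0-9\-]+(?:\.[a-zA-Z0-9\-]+)+)')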
    
    input_stream = sys.stdin
    
    unique_domains = set()
    
    n = int(input_stream.readline().strip())
    
    for _ in range(n):
        line = input_stream.readline().strip()
        for match in domain_pattern.findall(line):
            unique_domains.add(match)
    
    print(';'.join(sorted(unique_domains)))
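A pattern that ends with a required delimiter class such as `[^a-zA-Z0-9.\-]` needs one more character after the domain, so a URL at the very end of a stripped line is silently skipped. A minimal comparison on two hypothetical sample lines (the strings are made up for illustration):

```python
import re

# Requires one extra non-domain character after the domain.
with_delimiter = re.compile(r'https?://(?:www\.|ww2\.)?([a-zA-Z0-9\-]+\.[a-zA-Z0-9.\-]+)[^a-zA-Z0-9.\-]')
# Same pattern with the trailing delimiter class removed.
without_delimiter = re.compile(r'https?://(?:www\.|ww2\.)?([a-zA-Z0-9\-]+\.[a-zA-Z0-9.\-]+)')

# Hypothetical lines: one URL followed by '/', one URL at the very end of the line.
lines = ['see http://www.example.com/page', 'see http://www.example.org']

for line in lines:
    print(with_delimiter.findall(line), without_delimiter.findall(line))
# ['example.com'] ['example.com']
# [] ['example.org']
```

Because `[a-zA-Z0-9.\-]+` is greedy, it already stops at the first character that cannot belong to a domain, so the trailing class adds nothing except this failure mode.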
    
  •

    I think there is a problem with the expected output of Test case 1. If you inspect the input (HTML), you can see these lines:

    ... '//www.googletagservices.com/tag/js/gpt.js'; ... _gaq.push(['_addIgnoredOrganic', 'www.timesofindia.com']); ...

    However, those two URLs do not appear in the expected output.
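Both quoted references lack an explicit scheme: the first is protocol-relative (`//www...`) and the second is a bare hostname inside a JavaScript string. If, as the solutions in this thread assume, the checker only counts matches that start with `http://` or `https://`, neither would be picked up. A small sketch to check this, with that scheme requirement stated as an assumption and a hypothetical absolute URL added for comparison:

```python
import re

# Assumption: only references starting with http:// or https:// count as URLs.
url_domain = re.compile(r'https?://(?:ww[w2]\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)')

fragments = [
    "'//www.googletagservices.com/tag/js/gpt.js'",                # protocol-relative
    "_gaq.push(['_addIgnoredOrganic', 'www.timesofindia.com']);",  # bare hostname
    "<a href='http://www.timesofindia.com/'>",                     # hypothetical absolute URL
]

for text in fragments:
    print(url_domain.findall(text))
# []
# []
# ['timesofindia.com']
```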

  •

    Java 15

    import java.io.*;
    import java.util.*;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    public class Solution {
        public static void main(String[] args) {
            Scanner scanner = new Scanner(System.in);
            int n = scanner.nextInt(); scanner.nextLine();
            // The only capturing group is ([a-z0-9-]+(?:\\.[a-z0-9-]+)+), i.e. the required domain without the www./ww2. prefix.
            String regex = "https?://(?:ww[w2]\\.)?([a-z0-9-]+(?:\\.[a-z0-9-]+)+)";
            Pattern pattern = Pattern.compile(regex);
            // Creating hashset to keep only unique values
            HashSet<String> hashset = new HashSet<String>();
            for(int i=0;i<n;i++){
                Matcher matcher = pattern.matcher(scanner.nextLine());
                while(matcher.find())   hashset.add(matcher.group(1));
            }  
            // Copy the HashSet into an ArrayList so it can be sorted with Collections.sort()
            ArrayList<String> arraylist = new ArrayList<String>(hashset);
            Collections.sort(arraylist);
            // Use listiterator to traverse through arraylist 
            ListIterator<String> listiterator = arraylist.listIterator();
            while(listiterator.hasNext()){
                System.out.print(listiterator.next());
                if (listiterator.hasNext()) System.out.print(";");
            }
        }
    }
    
    /*
    > () in a regex creates a capturing group.
    > If regex = "(A)(B)", group() or group(0) returns the entire matched string, group(1) returns the part matched by "A", and group(2) the part matched by "B".
    > (?:...) is a non-capturing group: the "?:" right after "(" means its match is not stored in any group(index).
    
    ------------"https?://(?:ww[w2]\\.)?([a-z0-9-]+(?:\\.[a-z0-9-]+)+)"------
    Example: http://www.hydrogencars-now.com.uk.org/blog2/index.php
    Required domain: hydrogencars-now.com.uk.org
    
    > http
    > s? means the "s" is optional (zero or one occurrence).
    > ://
    
    > \\. matches a literal dot (the Java string "\\." becomes the regex \.).
    > (?:ww[w2]\\.)? is an optional non-capturing group matching "www." or "ww2.".
        
    > ([a-z0-9-]+(?:\\.[a-z0-9-]+)+) is the required output, i.e. group(1).
    > It matches a domain of the form "word.word.word...",
      where each word consists only of lowercase letters, digits, or hyphens.
    > [a-z0-9-]+ matches one word.
    > (?:\\.[a-z0-9-]+)+ matches ".word" one or more times, i.e. ".word.word.word..."
    
    */
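The group semantics described in the comment carry over to Python's `re` module. A minimal Python sketch of the same regex on the example URL; in a raw string the literal dot is written `\.` rather than Java's `\\.`:

```python
import re

# Same regex as the Java solution, written as a Python raw string.
pattern = re.compile(r'https?://(?:ww[w2]\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)')

m = pattern.search('http://www.hydrogencars-now.com.uk.org/blog2/index.php')
print(m.group(0))      # whole match -> http://www.hydrogencars-now.com.uk.org
print(m.group(1))      # captured domain -> hydrogencars-now.com.uk.org
print(pattern.groups)  # number of capturing groups -> 1
```

`pattern.groups` counts only capturing groups, so the non-capturing `(?:...)` parts do not shift the domain away from index 1.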
    
  •

    Python 3

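    import re, sys
    n = int(input())  # first line is the line count; the rest is read in one go
    html = sys.stdin.read()
    # The www./ww2. prefix is non-capturing, so findall returns the domain strings
    # directly; each domain is one or more dot-separated labels (no trailing dot).
    pattern = r'https?://(?:www\.|ww2\.)?([a-z0-9\-]+(?:\.[a-z0-9\-]+)+)'
    
    domains = set(re.findall(pattern, html))
    print(";".join(sorted(domains)))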
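`re.findall` changes its return type with the number of capturing groups in the pattern: with two or more it returns one tuple per match (with `''` for a group that did not participate), with exactly one it returns plain strings. That is why a pattern with a capturing `(www\.|ww2\.)?` prefix needs `match[1]`. A small sketch on a made-up HTML string:

```python
import re

html = 'a http://www.example.com/x b https://ww2.sample.org c http://site.net'

# Two capturing groups: one (prefix, domain) tuple per match.
two_groups = re.findall(r'https?://(www\.|ww2\.)?([a-z0-9\-]+\.[\.a-z0-9\-]+)', html)
print(two_groups)
# [('www.', 'example.com'), ('ww2.', 'sample.org'), ('', 'site.net')]

# Non-capturing prefix: one group left, so findall returns the domains directly.
one_group = re.findall(r'https?://(?:www\.|ww2\.)?([a-z0-9\-]+\.[\.a-z0-9\-]+)', html)
print(one_group)
# ['example.com', 'sample.org', 'site.net']
```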
    
  •
    import re, sys
    
    # Only hosts ending in one of the listed TLDs are matched; www./ww2. is dropped.
    matches = re.findall(r"https?://(?:ww[w2]\.)?([a-z0-9\-.]+\.(?:org|com|in|tv|me|net))", sys.stdin.read())
    print(*sorted(set(matches)), sep=';')
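The hard-coded TLD list in this solution has two side effects worth checking: hosts whose TLD is not listed are skipped entirely, and because the greedy `[a-z0-9\-.]+` backtracks until a listed TLD follows, a host such as `example.com.au` is truncated to `example.com`. A sketch comparing a TLD-whitelist pattern of this kind with a general label-based pattern, on hypothetical hosts:

```python
import re

# TLD-whitelist pattern of the kind used in the solution above.
whitelist = re.compile(r'https?://(?:www\.)?([a-z0-9\-.]+\.(?:org|com|in|tv|me|net))')
# General label-based pattern that does not depend on a TLD list.
general = re.compile(r'https?://(?:ww[w2]\.)?([a-z0-9-]+(?:\.[a-z0-9-]+)+)')

# Hypothetical sample hosts.
text = 'http://www.example.com.au/ http://news.example.co.uk/ http://site.org/'

print(whitelist.findall(text))  # ['example.com', 'site.org']
print(general.findall(text))    # ['example.com.au', 'news.example.co.uk', 'site.org']
```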