Detect the Domain Name

    Java 15

    import java.io.*;
    import java.util.*;
    import java.util.regex.Pattern;
    import java.util.regex.Matcher;
    public class Solution {
        public static void main(String[] args) {
            Scanner scanner = new Scanner(System.in);
            int n = scanner.nextInt(); scanner.nextLine();
            // The only capturing group is ([a-z0-9-]+(?:\\.[a-z0-9-]+)+), i.e. the required domain without the www./ww2. prefix.
            String regex = "https?://(?:ww[w2]\\.)?([a-z0-9-]+(?:\\.[a-z0-9-]+)+)";
            Pattern pattern = Pattern.compile(regex);
            // A HashSet keeps only unique domains
            HashSet<String> hashset = new HashSet<String>();
            for(int i=0;i<n;i++){
                Matcher matcher = pattern.matcher(scanner.nextLine());
                while(matcher.find())   hashset.add(matcher.group(1));
            }  
            // Copy the set into an ArrayList so it can be sorted with Collections.sort()
            ArrayList<String> arraylist = new ArrayList<String>(hashset);
            Collections.sort(arraylist);
            // Traverse with a ListIterator, printing ";" between entries but not after the last one
            ListIterator<String> listiterator = arraylist.listIterator();
            while(listiterator.hasNext()){
                System.out.print(listiterator.next());
                if (listiterator.hasNext()) System.out.print(";");
            }
        }
    }
    
    /*
    > () in a regex denotes a capturing group.
    > If regex="(A)(B)", group() or group(0) returns the entire matched string, group(1) returns the part matched by "A", and group(2) returns the part matched by "B".
    > (?:...) is a non-capturing group: the "?:" right after "(" means the match is not stored, so it gets no group(index).
    
    ------------"https?://(?:ww[w2]\\.)?([a-z0-9-]+(?:\\.[a-z0-9-]+)+)"------
    Example: http://www.hydrogencars-now.com.uk.org/blog2/index.php
    Required domain: hydrogencars-now.com.uk.org
    
    > http
    > s? means the s is optional: it appears zero or one time.
    > ://
    
    > \\. matches a literal dot. The backslash is doubled because it must itself be escaped in a Java string literal.
    > (?:ww[w2]\\.)? is a non-capturing group that matches "www." or "ww2.".
        The trailing ? makes it optional: it appears zero or one time.
        
    > ([a-z0-9-]+(?:\\.[a-z0-9-]+)+) is the required output, i.e. group(1).
    > It says the required domain has the form "word.word.word...", with at least one dot.
    > Each word may consist only of lowercase letters, digits, or hyphens.
    > [a-z0-9-]+ matches the first word.
    > (?:\\.[a-z0-9-]+)+ matches ".word" one or more times, i.e. ".word.word.word...".
    
    */
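
To see the difference between group(0) and group(1) on the example URL from the notes, here is a minimal sketch. It reuses the same regex as the solution; the class name `GroupDemo` and the helper `firstDomain` are made up for illustration and are not part of the original submission.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of the submitted solution.
class GroupDemo {
    // Same pattern as in the solution: one non-capturing prefix group, one capturing group.
    static final Pattern DOMAIN =
        Pattern.compile("https?://(?:ww[w2]\\.)?([a-z0-9-]+(?:\\.[a-z0-9-]+)+)");

    // Hypothetical helper: returns group(1) of the first URL found in text, or null if none matches.
    static String firstDomain(String text) {
        Matcher m = DOMAIN.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String url = "http://www.hydrogencars-now.com.uk.org/blog2/index.php";
        Matcher m = DOMAIN.matcher(url);
        if (m.find()) {
            // Only capturing groups are counted; the two (?:...) groups are not.
            System.out.println(m.groupCount()); // 1
            // group(0) is the whole match, including the scheme and the "www." prefix.
            System.out.println(m.group(0));     // http://www.hydrogencars-now.com.uk.org
            // group(1) is just the captured domain.
            System.out.println(m.group(1));     // hydrogencars-now.com.uk.org
        }
    }
}
```

Saved as `GroupDemo.java`, this can be run directly with `java GroupDemo.java` on Java 11 or later. Note that the match stops at the first `/`, because `/` is not in `[a-z0-9-]` and is not a dot.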