Python | Difference between re.findall() and re.finditer()

re.finditer() returns an iterator of match objects found in the string, while re.findall() returns a list of the matched strings (or a list of tuples when the pattern contains more than one capturing group). The snippets below illustrate the difference between re.finditer() and re.findall().
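
Before the longer examples, here is a minimal sketch of that difference. The sample string and the `\d+` pattern are assumptions chosen for illustration and are not part of the original snippets. It shows that re.findall() hands back plain strings, while each object yielded by re.finditer() also knows where the match was found.

```python
import re

# Hypothetical sample data, chosen only for illustration.
sample = "a1 b22 c333"
digits = r"\d+"

# findall() builds and returns a list of matched strings.
all_matches = re.findall(digits, sample)
print(all_matches)

# finditer() returns an iterator of match objects; each one exposes
# the matched text via group() and its position via span().
spans = [(m.group(), m.span()) for m in re.finditer(digits, sample)]
print(spans)
```

If only the matched text is needed, re.findall() is usually the shorter option. If positions or individual groups are needed, re.finditer() tends to be the better fit.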

Code Snippet 1: Extracting domain names from URLs

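  • Using re.finditer

    import re
    
    text = ''' Extract the domain from the urls www.gcptutorials.com,
    www.wikipedia.org, www.google.com'''
    
    # Escape the dots so they match a literal "." rather than any character.
    # Group 1: full host, group 2: domain name, group 3: suffix such as ".com".
    pattern = r'(www\.([A-Za-z_0-9-]+)(\.\w+))'
    
    # finditer() returns a lazy iterator of match objects.
    find_iter_result = re.finditer(pattern, text)
    
    print(type(find_iter_result))
    print(find_iter_result)
    
    for i in find_iter_result:
      print(i.group(2))  # group 2 holds the domain name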
    
    
    

    Output

     
    <class 'callable_iterator'>
    <callable_iterator object at 0x7f0c5cc24e48>
    gcptutorials
    wikipedia
    google
      
     
     
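  • Using re.findall

    import re
    
    text = ''' Extract the domain from the urls www.gcptutorials.com,
    www.wikipedia.org, www.google.com'''
    
    # Escape the dots so they match a literal "." rather than any character.
    # Group 1: full host, group 2: domain name, group 3: suffix such as ".com".
    pattern = r'(www\.([A-Za-z_0-9-]+)(\.\w+))'
    
    # With several groups, findall() returns a list of tuples, one per match.
    find_all_result = re.findall(pattern, text)
    
    print(type(find_all_result))
    print(find_all_result)
    for i in find_all_result:
      print(i[1])  # index 1 corresponds to group 2, the domain name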
    
    
    

    Output

     
    <class 'list'>
    [('www.gcptutorials.com', 'gcptutorials', '.com'), ('www.wikipedia.org', 'wikipedia', '.org'), ('www.google.com', 'google', '.com')]
    gcptutorials
    wikipedia
    google
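
The list of tuples printed above follows from the pattern having three capturing groups. As a rough sketch, the shape of the re.findall() result depends on how many groups the pattern contains. The sample string and the three patterns here are illustrative assumptions, not part of the original example.

```python
import re

sample = "www.example.com"  # hypothetical input

# No groups: a list of whole-match strings.
no_groups = re.findall(r"www\.\w+\.\w+", sample)

# Exactly one group: a list of that group's strings.
one_group = re.findall(r"www\.(\w+)\.\w+", sample)

# Two or more groups: a list of tuples, one element per group.
two_groups = re.findall(r"www\.(\w+)\.(\w+)", sample)

print(no_groups)
print(one_group)
print(two_groups)
```

This is why the loop above indexes each result with i[1], whereas the re.finditer() version calls i.group(2) on a match object.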
      
     
     

Code Snippet 2: Extracting URLs from HTML href attributes

Using re.finditer

      
    import re
    
    text = '''
    <li class="nav-item"><a class="nav-link"
      href="/companies-listing/corporate-filings-announcements">Announcements</a></li>
    <li class="nav-item"><a class="nav-link"
    href="/companies-listing/corporate-filings-board-meetings">Board Meetings</a></li>
    <li class="nav-item">
    <a class="nav-link"
        href="/companies-listing/corporate-filings-actions">Corporate Actions</a>
    </li>
            '''
    
    # Capture the text between href=" and the last double quote on the line.
    # The greedy .* is safe here only because href is the last quoted
    # attribute on each line.
    finditer_output = re.finditer(r'href="(.*)"', text)
    
    print("Output of finditer()")
    print(type(finditer_output))
    print(finditer_output)
    for i in finditer_output:
        print(i.group(1))  # group 1 holds the URL inside the quotes
    
    
    

    Example Output

     
    Output of finditer()
    <class 'callable_iterator'>
    <callable_iterator object at 0x00000278CC7BDE48>
    /companies-listing/corporate-filings-announcements
    /companies-listing/corporate-filings-board-meetings
    /companies-listing/corporate-filings-actions
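
The callable_iterator shown above can be traversed only once, which is a practical difference from the list that re.findall() returns. Here is a small sketch under assumed inputs (the sample string and pattern are hypothetical):

```python
import re

# Hypothetical sample data for illustration.
sample = "www.example.com, www.example.org"
matches = re.finditer(r"www\.(\w+)\.\w+", sample)

first_pass = [m.group(1) for m in matches]
second_pass = [m.group(1) for m in matches]  # the iterator is already exhausted

print(first_pass)
print(second_pass)
```

To loop over the matches more than once, either call re.finditer() again or store the match objects with list(re.finditer(...)).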
    
    

Using re.findall

     
    import re
    
    text = '''
    <li class="nav-item"><a class="nav-link"
      href="/companies-listing/corporate-filings-announcements">Announcements</a></li>
    <li class="nav-item"><a class="nav-link"
    href="/companies-listing/corporate-filings-board-meetings">Board Meetings</a></li>
    <li class="nav-item">
    <a class="nav-link"
        href="/companies-listing/corporate-filings-actions">Corporate Actions</a>
    </li>
         '''
    
    # With a single capturing group, findall() returns a list of strings.
    findall_output = re.findall(r'href="(.*)"', text)
    print("Output of findall()")
    print(type(findall_output))
    print(findall_output)
    
     
    

    Example Output

     
    Output of findall()
    <class 'list'>
    ['/companies-listing/corporate-filings-announcements',
      '/companies-listing/corporate-filings-board-meetings',
      '/companies-listing/corporate-filings-actions']
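
The pattern href="(.*)" works on the sample HTML because each href value is the last quoted attribute on its line. As a hedged sketch of what can happen otherwise, the hypothetical tag below places a class attribute after href, and the greedy .* is compared with a negated character class that stops at the first closing quote. The tag and the second pattern are assumptions added for illustration.

```python
import re

# Hypothetical tag with a second quoted attribute after href on the same line.
tag = '<a href="/about" class="nav-link">About</a>'

# Greedy: .* runs to the last double quote on the line.
greedy = re.findall(r'href="(.*)"', tag)

# Bounded: [^"]* stops at the first double quote after the value.
bounded = re.findall(r'href="([^"]*)"', tag)

print(greedy)
print(bounded)
```

The same consideration applies to re.finditer(), since both functions use the same matching rules and differ only in how the results are returned.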