Raiders of the Lost Accounts

Finding forgotten INTERNET accounts with Python

WHY WE’RE HERE

If I had used a password manager like KeePass from the beginning (it was released in 2003) I wouldn’t be in this predicament. One of my year-end goals was to update my password manager with not only my primary accounts but also those that have been long abandoned.

I shouldn’t blame the ease of creating online accounts for the number of accounts I’ve abandoned and even forgotten entirely, but I’m tempted to. If you’re using a different password for each website, orphaned accounts aren’t nearly as much of a security risk as they otherwise might be, assuming they don’t contain any significant personal information.

For me, tracking down accounts from yesteryear isn’t so much about wanting to improve my internet security (though it’s a good opportunity to change passwords) but more about my own curiosity. There are probably easier ways than doing this yourself, such as the websites that will cross-reference usernames with numerous sites. That simply doesn’t interest me.

The method I’m using may not be that useful for people who delete their emails. However, if you’re like me, you don’t, which means there’s a good chance of finding some forgotten accounts.

One of the most laborious and dull ways would be to simply fire up each email account in turn and search for various keywords associated with the confirmation emails you get when making a new account. Or you know, I could try to code something.

Chapter 16 of Al Sweigart’s fantastic book Automate the Boring Stuff with Python: Practical Programming for Total Beginners was the inspiration for this. Some of the code in it is outdated, as IMAPClient has had updates since the book was released, and I get into that below. His books are fantastic and I highly recommend supporting him!

HUNTING WITH PYTHON

I’m not a programmer by trade and my attempt is akin to someone stumbling through a foreign language they don’t quite grasp but have an interest in learning. If you’re someone who is familiar with the Style Guide for Python Code you may be disturbed by what comes next. Also, be aware that I am not at all responsible for anything going wrong with your email and all of that.

The Python script that I’ve cooked up utilizes IMAPClient, pyzmail, email, and a few other libraries to log into an email account, search through the inbox, and make a list of all the unique addresses I’ve received emails from. I’m using Python 3.6.

To further narrow down results you could also extract the email subjects and cross-reference them against keywords found in the typical boiler-plate language of confirmation/activation emails received when making an account. I’m not going to add that extra layer.
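For anyone who does want that extra layer, here is a minimal sketch of what the keyword check could look like. The keyword list and the looks_like_signup name are my own assumptions for illustration; tune the phrases to the kind of emails you actually receive.

```python
# Hypothetical phrases that tend to show up in sign-up emails; adjust to taste.
SIGNUP_KEYWORDS = (
    "welcome",
    "confirm your",
    "verify your",
    "activate your",
    "thanks for signing up",
)


def looks_like_signup(subject):
    """Return True if the subject contains any of the sign-up phrases."""
    lowered = subject.lower()
    return any(keyword in lowered for keyword in SIGNUP_KEYWORDS)


subjects = ["Please confirm your email address", "Your weekly newsletter"]
print([s for s in subjects if looks_like_signup(s)])
```

A plain substring check like this will throw up false positives, but for a curiosity-driven hunt that is probably good enough.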

Most email providers allow their users to use the Internet Message Access Protocol (IMAP) to access their mail servers. However, some providers like Gmail have IMAP turned off by default while others like Yahoo allow access without updating any settings.

Before moving on you should make sure your email account is set up to be accessed using IMAP. For Gmail, click the settings gear > Settings > Forwarding and POP/IMAP > IMAP Access > Enable IMAP.

A note about Gmail: I found that even after I enabled IMAP I was unable to log in. Under Google Account > Security there should be a hyperlink to Secure Account. You may find that Google has blocked access from your computer and listed the attempt as a security issue. There should be an option to choose whether or not it was you who attempted to access your account. After confirming it was me I found that I had to wait a few hours before I could finally access my Gmail account over IMAP.

If you’re unable to log in you may also have to change your settings to allow access from less secure devices. For Gmail go to Google Account > Security > Less secure app access. For Yahoo go to Account Info > Account Security > Allow apps that use less secure sign in. It’s best to turn this back to off after you’re done.

Using two-factor authentication? You should look into Application Specific Passwords to be able to access your email. Both Gmail and Yahoo have help pages explaining how to create one.

After you’ve allowed IMAP you first need to figure out what the IMAP domain is for your email provider. I’ve listed a few common ones below:

  • Gmail: imap.gmail.com
  • Outlook.com/Hotmail.com: imap-mail.outlook.com and outlook.office365.com
  • Yahoo Mail: imap.mail.yahoo.com
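If you juggle several accounts, a small lookup table saves retyping the server name. This is only a sketch: the hosts come from the list above, while the IMAP_HOSTS dictionary and the imap_host_for function are names I made up for illustration.

```python
# Hosts taken from the list above; the names here are illustrative only.
IMAP_HOSTS = {
    "gmail.com": "imap.gmail.com",
    "outlook.com": "imap-mail.outlook.com",
    "hotmail.com": "imap-mail.outlook.com",
    "yahoo.com": "imap.mail.yahoo.com",
}


def imap_host_for(email_address):
    """Look up the IMAP host for an address, or return None if unknown."""
    # Everything after the last "@" is the domain
    domain = email_address.rpartition("@")[2].lower()
    return IMAP_HOSTS.get(domain)


print(imap_host_for("someone@yahoo.com"))
```

Returning None for an unknown domain leaves it up to the caller to ask for the host by hand.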

THE SCRIPT

Login

With all that out of the way it’s time to fire up your preferred Python IDE. My oldest email is a Yahoo account so I’ll be using that for this example. First things first, we need to import the modules we’ll be using. Add the code below to your file:

import imapclient
import pyzmail
import pprint
import getpass
import imaplib
import datetime
import email

If you’re missing IMAPClient or pyzmail you can install them using the pip package manager:

pip install imapclient
pip install pyzmail36

If you’re using Python 3.6 you’ll need to install the fork of pyzmail above. If you’re using an earlier version you can simply remove the “36”.

Before moving on we’re going to add a single line of code:

imaplib._MAXLINE = 10000000

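imaplib limits the size of a single response line from the server. The limit was 10,000 bytes in older releases (newer Python versions ship a larger default), and for people with a lot of email the response to a search can still be too big. To avoid our script erroring as a result we’ll increase the limit to 10,000,000 bytes.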

Now that we’ve imported the necessary modules it’s time to get cracking. First, let’s simply connect to our email by adding the following code to your file:

email_address = input("Enter Email Address:")
email_pass = getpass.getpass("Enter Password:")

imap_obj = imapclient.IMAPClient('imap.mail.yahoo.com', ssl=True)

imap_obj.login(email_address, email_pass)

print("SUCCESS")

Let’s break it down:

email_address = input("Enter Email Address:")
email_pass = getpass.getpass("Enter Password:")

You shouldn’t insert passwords directly into your code. In this case, we’re simply asking the user to input the email and password each time. The getpass module keeps the password from being echoed to the screen, hiding it from any prying eyes.

imap_obj = imapclient.IMAPClient('imap.mail.yahoo.com', ssl=True)

Calling imapclient.IMAPClient() creates an IMAPClient object which connects to the IMAP server at the address given (Yahoo’s IMAP server in this case). Most email providers require SSL or TLS so we add ssl=True. We’ll use our newly created imap_obj with various IMAPClient methods, including the next line of code below:

imap_obj.login(email_address, email_pass) 

Next we simply pass the email and password provided by the user as strings into the login() method, which will attempt to log in.

Now run the script. If you receive an error you may want to try the following:

  • Double-check to make sure the email and password you used are correct
  • Make sure IMAP is allowed
  • Allow access from less secure devices
  • Check “Secure Account” if you’re using Gmail

If all went well you should see SUCCESS printed to your console.

Mailboxes

Now that we can connect to the IMAP server let’s see what mailboxes are available by adding the code below:

email_folders = imap_obj.list_folders()
pprint.pprint(email_folders)

Your script should output something like this:

[((b'\Junk', b'\HasNoChildren'), b'/', 'Bulk Mail'),
((b'\Archive', b'\HasNoChildren'), b'/', 'Archive'),
((b'\Drafts', b'\HasNoChildren'), b'/', 'Draft'),
((b'\HasNoChildren',), b'/', 'Inbox'),]

You may see more or fewer entries depending on which folders you have in your account.

The list_folders() method returns a list of tuples, one per folder, each holding the folder’s flags, its delimiter, and its name. All we want is the folder’s full name so that we can later pass it to the select_folder() method. In this case it would be the Bulk Mail, Archive, Draft, and Inbox folder names. If you’re using Gmail some folder names may be preceded by “[Gmail]/”. This is part of the full folder name.

We can use indexing to make the output more readable by replacing the pprint function with the code below:

for i in range(0, len(email_folders)):
    print(email_folders[i][-1])

Which should in turn output something like this:

Bulk Mail
Archive
Draft
Inbox
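If indexing with [-1] feels a bit cryptic, tuple unpacking is another way to get at the names. The sketch below uses sample data shaped like the output above so it runs on its own; with a live connection you would use the result of imap_obj.list_folders() instead.

```python
# Sample data shaped like the list_folders() output shown earlier
email_folders = [
    ((b'\\Junk', b'\\HasNoChildren'), b'/', 'Bulk Mail'),
    ((b'\\Archive', b'\\HasNoChildren'), b'/', 'Archive'),
    ((b'\\Drafts', b'\\HasNoChildren'), b'/', 'Draft'),
    ((b'\\HasNoChildren',), b'/', 'Inbox'),
]

# Each entry is (flags, delimiter, name); unpack it and keep only the name
folder_names = [name for flags, delimiter, name in email_folders]

for name in folder_names:
    print(name)
```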

Let’s go ahead and add the next line of code:

imap_obj.select_folder('Inbox', readonly=True)

We pass the name of the folder we want to search to the select_folder() method. In this case let’s just search through the Inbox folder. So that we don’t accidentally delete or otherwise mess with anything, we’ll use the readonly=True parameter. This also means emails won’t be marked as read.

Searching a Folder

Now that we’ve selected a folder let’s look at some emails by adding the code below:

email_uids = imap_obj.search(['ALL'])
print(email_uids)

Your output will be a long, messy list of numbers. Each email is represented by a unique identifier (UID). UIDs are assigned in ascending order and are unique within a folder. These UIDs stand in for the emails, and we can pass them to various methods to fetch email data. They are returned by the search() method as a list.

In the example above the ['ALL'] IMAP search key was used with the search() method. This returns all messages in the folder selected when we used the select_folder() method.

There are a number of IMAP search keys you can use to narrow down your results if everything is a little too broad for your tastes. For example, if we wanted to pull only emails received since May 3rd 2017 we could use the following code:

email_uids = imap_obj.search(['SINCE', datetime.date(2017, 5, 3)])
print(email_uids)

You could also use BEFORE instead of SINCE, or combine them like so to get all emails between May 3rd 2017 and May 3rd 2018:

email_uids = imap_obj.search(['SINCE', datetime.date(2017, 5, 3), 'BEFORE', datetime.date(2018, 5, 4)])
print(email_uids)

If you’ve looked at Chapter 16 of Automate the Boring Stuff you may notice that it uses a different date format: a string with the day, month, and year separated by hyphens, for example 03-MAY-2017. This date format no longer works with newer releases of IMAPClient, which is why we’re using the datetime.date constructor.

Also, on the subject of dates: while SINCE includes the day provided, BEFORE does not. As a result I’ve had to use 'BEFORE', datetime.date(2018, 5, 4) to capture May 3rd.
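To avoid doing that off-by-one arithmetic in your head, you could let timedelta handle it. This is a small sketch; inclusive_date_criteria is a helper name I invented, not something IMAPClient provides.

```python
import datetime


def inclusive_date_criteria(first_day, last_day):
    """Build search criteria covering first_day through last_day, inclusive."""
    # BEFORE excludes the date it is given, so push it one day past last_day
    day_after = last_day + datetime.timedelta(days=1)
    return ['SINCE', first_day, 'BEFORE', day_after]


criteria = inclusive_date_criteria(datetime.date(2017, 5, 3),
                                   datetime.date(2018, 5, 3))
print(criteria)
```

The resulting list can be handed to imap_obj.search() just like the hand-written one above, and timedelta takes care of month and year boundaries.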

But Wait, That’s Not All!

There’s even more search key goodness to make use of. Here’s a handy list from Automate the Boring Stuff:

Search Key
    Meaning

'ALL'
    Returns all messages in the folder. You may run into imaplib size limits if you request all the messages in a large folder.

'BEFORE', date
'ON', date
'SINCE', date
    These three search keys return, respectively, messages that were received by the IMAP server before, on, or after the given date.

'SUBJECT', 'string'
'BODY', 'string'
'TEXT', 'string'
    Returns messages where string is found in the subject, body, or either, respectively. If string has spaces in it, then enclose it with double quotes: 'TEXT "search with spaces"'.

'FROM', 'string'
'TO', 'string'
'CC', 'string'
'BCC', 'string'
    Returns all messages where string is found in the “from” email address, “to” addresses, “cc” (carbon copy) addresses, or “bcc” (blind carbon copy) addresses, respectively. If there are multiple email addresses in string, then separate them with spaces and enclose them all with double quotes: 'CC "firstcc@example.com secondcc@example.com"'.

'SEEN'
'UNSEEN'
    Returns all messages with and without the \Seen flag, respectively. An email obtains the \Seen flag if it has been accessed with a fetch() method call (described later) or if it is clicked when you’re checking your email in an email program or web browser. It’s more common to say the email has been “read” rather than “seen,” but they mean the same thing.

'ANSWERED'
'UNANSWERED'
    Returns all messages with and without the \Answered flag, respectively. A message obtains the \Answered flag when it is replied to.

'DELETED'
'UNDELETED'
    Returns all messages with and without the \Deleted flag, respectively. Email messages deleted with the delete_messages() method are given the \Deleted flag but are not permanently deleted until the expunge() method is called (see “Deleting Emails” in the book). Note that some email providers, such as Gmail, automatically expunge emails.

'DRAFT'
'UNDRAFT'
    Returns all messages with and without the \Draft flag, respectively. Draft messages are usually kept in a separate Drafts folder rather than in the INBOX folder.

'FLAGGED'
'UNFLAGGED'
    Returns all messages with and without the \Flagged flag, respectively. This flag is usually used to mark email messages as “Important” or “Urgent.”

'LARGER', N
'SMALLER', N
    Returns all messages larger or smaller than N bytes, respectively.

'NOT', search-key
    Returns the messages that search-key would not have returned.

'OR', search-key1, search-key2
    Returns the messages that match either the first or second search-key.

Automate the Boring Stuff With Python by Al Sweigart is licensed under CC BY 3.
Edited several rows for changes in input syntax and also clarity. See IMAPClient documentation for version history.

Let’s get into some more examples! However, please note I haven’t tested every IMAP Key with every email provider.

email_uids = imap_obj.search(['ANSWERED', 'BEFORE', datetime.date(2015, 5, 3)])
print(email_uids)

email_uids = imap_obj.search(['LARGER', 100])
print(email_uids)

email_uids = imap_obj.search(['FROM', 'exampleperson@email.com'])
print("You have received", len(email_uids), "email from exampleperson@email.com")

email_uids = imap_obj.search(['OR', 'FROM', 'example1@email.com', 'FROM', "example2@email.com"])
print(email_uids)
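Since OR only takes two search keys, matching more than two senders means nesting one OR inside another. Here is a sketch of a helper that builds that chain. The any_sender_criteria name is made up, and passing the result as one flat list is my assumption based on how the two-address example above is written; I haven’t tried longer chains against every provider.

```python
def any_sender_criteria(addresses):
    """Build search criteria matching mail from any of the given addresses."""
    if not addresses:
        raise ValueError("need at least one address")
    # Start with the last address, then wrap each earlier one in an OR
    criteria = ['FROM', addresses[-1]]
    for address in reversed(addresses[:-1]):
        criteria = ['OR', 'FROM', address] + criteria
    return criteria


print(any_sender_criteria(['a@example.com', 'b@example.com', 'c@example.com']))
```

With a live connection the result would go straight into imap_obj.search().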

The Finale?

Hopefully the options shown in the previous section are helpful. For now I’m going to search my entire inbox, as I want a full list to look over. I’ll go over two ways to go about this.

Method One:

emails_from_address = []

# Search once up front so the total isn't re-queried on every iteration
email_uids = imap_obj.search(['ALL'])
total = len(email_uids)

for progress_uid, i in enumerate(email_uids, start=1):
    # Overwrite the same console line with the current progress
    print(progress_uid, "of", total, end="\r")

    temp_rawmessage = imap_obj.fetch([i], ['BODY[]', 'FLAGS'])
    temp_message = pyzmail.PyzMessage.factory(temp_rawmessage[i][b'BODY[]'])

    temp_fromraw = temp_message.get_addresses('from')
    if not temp_fromraw:
        continue  # skip messages without a usable From header
    temp_from_address = temp_fromraw[0][1]

    # Only keep addresses we haven't seen yet
    if temp_from_address not in emails_from_address:
        emails_from_address.append(temp_from_address)

for i in emails_from_address:
    print(i)

imap_obj.logout()

In short, we search the whole folder once and iterate through every UID in the email_uids list, echoing the script’s current progress as we go. After parsing each message we check whether temp_from_address is already in the emails_from_address list. If it is, we skip it; if it’s new, we add it. At the end we print the unique addresses and log out of the IMAP server.
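Checking a list for duplicates gets slower as the list grows, which starts to matter with tens of thousands of emails. A set makes that check cheap. Below is a sketch of the same idea on its own, with sample addresses standing in for real ones. The unique_addresses name and the choice to ignore upper and lower case are my own assumptions.

```python
def unique_addresses(addresses):
    """Return addresses with duplicates removed, keeping first-seen order."""
    seen = set()
    unique = []
    for address in addresses:
        # Treat News@Example.com and news@example.com as the same sender
        key = address.lower()
        if key not in seen:
            seen.add(key)
            unique.append(address)
    return unique


sample = ['news@example.com', 'bob@example.org',
          'News@Example.com', 'bob@example.org']
print(unique_addresses(sample))
```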

temp_rawmessage = imap_obj.fetch([i], ['BODY[]', 'FLAGS'])

The fetch() method does what it says on the tin and fetches the actual email’s content. The first argument, [i], is the UID from the email_uids list that we created earlier. The second, ['BODY[]', 'FLAGS'], asks for the whole raw message (BODY[]) in all its RFC 822 glory, plus its flags. The result comes back as a defaultdict keyed by UID. We’ll look more at this in a moment.

temp_message = pyzmail.PyzMessage.factory(temp_rawmessage[i][b'BODY[]'])

Using the pyzmail module we can turn the raw email data into a PyzMessage object, which allows us to easily pull certain data from the email. In this case we’re pulling the sender’s email address:

temp_fromraw = temp_message.get_addresses('from')
temp_from_address = temp_fromraw[0][1]

The get_addresses('from') method returns the sender as a list of (name, address) tuples. In this case I only wanted the address. You could also use 'to', 'cc', and 'bcc'.

We can also get the subject and the body of the email using the following lines of code:

# Subject
print(temp_message.get_subject())

# HTML body
print(temp_message.html_part.get_payload().decode(temp_message.html_part.charset))

# Text body
print(temp_message.text_part.get_payload().decode(temp_message.text_part.charset))

Getting the body of the email is evidently a little more involved. Emails can be sent as HTML, plaintext, or both. A solution could be doing something like this:

if temp_message.text_part is not None:
    temp_email_body = temp_message.text_part.get_payload().decode(temp_message.text_part.charset)
elif temp_message.html_part is not None:
    temp_email_body = temp_message.html_part.get_payload().decode(temp_message.html_part.charset)
else:
    temp_email_body = ''  # neither a text nor an HTML part was found

Method Two:

We can also use the email library to parse our emails into something intelligible for us mere mortals.

email_uid = 5

raw_msg = imap_obj.fetch([email_uid], 'RFC822')
email_message = email.message_from_bytes(raw_msg[email_uid][b'RFC822'])

In the above example we pass 'RFC822' as the data parameter of the fetch() method. The result is keyed by UID, so we look up raw_msg[email_uid] and then the b'RFC822' key (note the b prefix, as the keys are bytes) to get the raw message, which we hand to the message_from_bytes() function.

Our email_message variable is an email.message.Message object, which means we can request some information from it. For example:

print(email_message["from"])
print(email_message["to"])
print(email_message["subject"])
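Headers don’t always come back as tidy as that. Subjects are often encoded, and the from header usually bundles a display name with the address. Here is a sketch of cleaning both up with the standard library, using a hand-written message in place of one fetched from the server so that it runs on its own.

```python
import email
from email.header import decode_header, make_header
from email.utils import parseaddr

# A made-up raw message standing in for raw_msg[email_uid][b'RFC822']
raw = (
    b"From: Example Site <signup@example.com>\r\n"
    b"Subject: =?utf-8?b?V2VsY29tZSE=?=\r\n"
    b"\r\n"
    b"Hello\r\n"
)
email_message = email.message_from_bytes(raw)

# Decode any encoded words in the subject into readable text
subject = str(make_header(decode_header(email_message["subject"])))

# Split the From header into display name and bare address
display_name, address = parseaddr(email_message["from"])

print(subject)
print(display_name)
print(address)
```

The bare address is the part worth collecting if the goal is a list of unique senders.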

You could also loop through numerous emails using the following:

for uid, message_data in imap_obj.fetch(email_uids, 'RFC822').items():
    email_message = email.message_from_bytes(message_data[b'RFC822'])
    print(email_message["from"])
    print(email_message["to"])
    print(email_message["subject"])

When we call the fetch() method with a list of UIDs it returns a dictionary mapping each UID to its email data, which we can conveniently pass through to message_from_bytes().
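The email library can also dig out the body. One way, sketched below, is to parse with policy.default, which gives back an EmailMessage object with a get_body() method. The sample message is made up so the snippet runs without a server; with real mail you would pass in message_data[b'RFC822'] instead.

```python
import email
from email import policy

# A made-up raw message standing in for message_data[b'RFC822']
raw = (
    b"From: Example Site <signup@example.com>\r\n"
    b"To: me@example.org\r\n"
    b"Subject: Welcome\r\n"
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"\r\n"
    b"Thanks for signing up!\r\n"
)

# policy.default returns an EmailMessage, which has get_body()
message = email.message_from_bytes(raw, policy=policy.default)

# Prefer the plain-text part and fall back to HTML
body_part = message.get_body(preferencelist=('plain', 'html'))
body_text = body_part.get_content() if body_part is not None else ''

print(body_text)
```

get_body() returns None when a message has neither kind of part, hence the fallback to an empty string.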

I hope something I’ve gone over is helpful and sparked an interest in doing something with Python and emails.