Finding Forgotten Accounts With Keywords
Cyber hygiene is invaluable, but like many, I don’t remember every throwaway account I’ve made using an alternative email. The method outlined below uses IMAP to search emails for specific keywords. Of course, the keywords can be anything and can be used in many different ways.
I was inspired by chapter 16 of Al Sweigart’s book Automate the Boring Stuff with Python: Practical Programming for Total Beginners. Unfortunately, some of the IMAPClient code has become outdated since the book was released, so I’ve had to make some changes. Sweigart’s book is fantastic and I highly recommend supporting his work. The code here targets Python 3+.
I’m not a Python developer by trade. If you’re familiar with the style guide for Python you may notice some departures from it. I am also not responsible for any user issues.
To further narrow down results you could also extract the email subjects and cross-reference them against keywords found in the typical boiler-plate language of confirmation/activation emails received when making an account. I have not added that extra layer.
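A minimal sketch of what that extra layer might look like, assuming the subjects have already been pulled out as plain strings. The keyword list and the helper name looks_like_signup are hypothetical and not part of the script below:

```python
# Hypothetical phrases that often appear in account confirmation emails.
SIGNUP_KEYWORDS = ("welcome", "confirm", "verify", "activate", "your account")


def looks_like_signup(subject, keywords=SIGNUP_KEYWORDS):
    """Return True if any keyword occurs in the subject (case-insensitive)."""
    lowered = subject.lower()
    return any(keyword in lowered for keyword in keywords)


# Made-up subjects standing in for ones extracted from real messages.
subjects = [
    "Please confirm your email address",
    "Lunch on Friday?",
    "Welcome to ExampleForum!",
]
signup_subjects = [s for s in subjects if looks_like_signup(s)]
print(signup_subjects)
```

A plain substring check like this is crude and will produce false positives, so treat the result as a shortlist to review rather than a final answer.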
NOTES ON EMAIL SETTINGS
Most email providers allow their users to use the Internet Message Access Protocol (IMAP) to access their mail servers. However, some providers like Gmail have IMAP turned off by default while others like Yahoo allow access without updating any settings.
If you’re using Gmail you may have difficulty connecting even after enabling IMAP. I recommend checking Secure Account. Google may have blocked access and listed your system as a security issue. Several hours after confirming it was me I was able to access my account with the script.
If you’re still unable to connect you may also have to change your settings to allow access from less secure devices. Proceed at your own risk. It’s best to turn this setting back off after you’re done. If you’re using two-factor authentication you should look into application-specific passwords to be able to access your email. Both Gmail and Yahoo provide instructions for setting these up.
After you’ve allowed IMAP you first need to figure out what the IMAP domain is for your email provider. I’ve listed a few common ones below:
- Gmail: imap.gmail.com
- Outlook.com/Hotmail.com: imap-mail.outlook.com and outlook.office365.com
- Yahoo Mail: imap.mail.yahoo.com
import imapclient
import pyzmail
import pprint
import getpass
import imaplib
import datetime
import email
If you’re missing IMAPClient or Pyzmail you can download them using the pip package manager:
pip install imapclient
pip install pyzmail36
If you’re using Python 3.6 you’ll need to install the fork of pyzmail above. If you’re using an earlier version you can simply remove the “36”.
Before moving on we’re going to add a single line of code:
imaplib._MAXLINE = 10000000
imaplib limits how long a single line of a server response may be, and in some Python versions the default is as low as 10,000 bytes. For people with a lot of email this can be too small for search results. To avoid our script erroring as a result we’ll increase the limit to 10,000,000 bytes.
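If you would rather not lower a limit that your Python release may already have set higher, one cautious variation is to only ever raise it. This is a sketch, not something the original script does; WANTED_MAXLINE is a hypothetical name, and _MAXLINE is a private attribute that could change between Python versions.

```python
import imaplib

# Hypothetical name; 10,000,000 bytes matches the value used above.
WANTED_MAXLINE = 10_000_000

# _MAXLINE is private, so fall back to 0 if it is ever missing.
current_limit = getattr(imaplib, "_MAXLINE", 0)

# Only raise the limit, never lower it.
imaplib._MAXLINE = max(current_limit, WANTED_MAXLINE)
print(imaplib._MAXLINE)
```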
Now that we’ve imported the necessary modules it’s time to get cracking. First, let’s simply connect to our email by adding the following code to your file:
email_address = input("Enter Email Address:")
email_pass = getpass.getpass("Enter Password:")
imap_obj = imapclient.IMAPClient('imap.mail.yahoo.com', ssl=True)
imap_obj.login(email_address, email_pass)
print("SUCCESS")
Let’s break it down:
email_address = input("Enter Email Address:")
email_pass = getpass.getpass("Enter Password:")
You shouldn’t insert passwords directly into your code. In this case, we’re simply asking the user to input the email and password each time. We’re using the getpass module so the password isn’t echoed to the screen as it’s typed, hiding it from any potential prying eyes.
imap_obj = imapclient.IMAPClient('imap.mail.yahoo.com', ssl=True)
The imapclient.IMAPClient() function creates an IMAPClient object which connects to the IMAP server using the address parameter (Yahoo’s IMAP server in this case). Most email providers require SSL or TLS so we add ssl=True. We’ll use our newly created imap_obj with various IMAPClient methods, including the next line of code below:
imap_obj.login(email_address, email_pass)
Next we simply pass the email and password provided by the user as strings into the login() method which will attempt to log in.
Now run the script. If you receive an error you may want to try the following:
- Double-check to make sure the email and password you used are correct
- Make sure IMAP is allowed
- Allow access from less secure devices
- Check “Secure Account” if you’re using Gmail
If all went well you should see something like this:
SUCCESS
Now that we can connect to the IMAP server let’s see what mailboxes are available by adding the code below:
email_folders = imap_obj.list_folders()
pprint.pprint(email_folders)
Your script should output something like this:
[((b'\Junk', b'\HasNoChildren'), b'/', 'Bulk Mail'),
((b'\Archive', b'\HasNoChildren'), b'/', 'Archive'),
((b'\Drafts', b'\HasNoChildren'), b'/', 'Draft'),
((b'\HasNoChildren',), b'/', 'Inbox'),]
You may see more or fewer entries depending on which folders your account has.
The list_folders() method returns a list of tuples, each holding the folder’s flags, its delimiter, and its name. All we want is the folder’s full name so that we can later pass it to the select_folder() method. In this case it would be the Bulk Mail, Archive, Draft, and Inbox folder names. If you’re using Gmail some folder names may be preceded by “[Gmail]/”. This is part of the full folder name.
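To make the tuple layout concrete, here is a small sketch that unpacks each entry into its three parts. It runs against hard-coded sample data shaped like the output above rather than a live server, so the values are only illustrative.

```python
# Sample data shaped like the list_folders() output shown above.
email_folders = [
    ((b"\\Junk", b"\\HasNoChildren"), b"/", "Bulk Mail"),
    ((b"\\Archive", b"\\HasNoChildren"), b"/", "Archive"),
    ((b"\\Drafts", b"\\HasNoChildren"), b"/", "Draft"),
    ((b"\\HasNoChildren",), b"/", "Inbox"),
]

folder_names = []
for flags, delimiter, name in email_folders:
    # flags: tuple of bytes, delimiter: hierarchy separator, name: full folder name
    folder_names.append(name)

print(folder_names)
```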
We can use indexing to make the output more readable by replacing the pprint function with the code below:
for i in range(0, len(email_folders)):
    print(email_folders[i][-1])
Which should in turn output something like this:
Bulk Mail
Archive
Draft
Inbox
Let’s go ahead and add the next line of code:
imap_obj.select_folder('Inbox', readonly=True)
We pass the name of the folder we want to search to the select_folder() method. In this case let’s just search through the inbox folder. So we don’t accidentally delete or otherwise mess with anything we’ll use the readonly=True parameter. Emails will also not be marked as read.
Searching a Folder
Now that we’ve selected a folder let’s look at some emails by adding the code below:
email_uids = imap_obj.search(['ALL'])
print(email_uids)
Your output may look something like this:
What a messy list of numbers. Each email is represented by a unique identifier (UID). UIDs are assigned in ascending order and are unique within a folder. In effect these UIDs stand in for the emails, and we can pass them to various methods to fetch email data. They are returned by the search() method as a list.
In the example above the ['ALL'] IMAP search key was used with the search() method. This returns all messages in the folder selected when we used the select_folder() method.
There are a number of IMAP search keys available to narrow down your results if everything is a little too broad for your tastes. For example, if we wanted to pull only emails received since May 3rd 2017 we could use the following code:
email_uids = imap_obj.search(['SINCE', datetime.date(2017, 5, 3)])
print(email_uids)
You could also use BEFORE instead of SINCE or combine them like so to get all emails between May 3rd 2017 and May 3rd 2018:
email_uids = imap_obj.search(['SINCE', datetime.date(2017, 5, 3), 'BEFORE', datetime.date(2018, 5, 4)])
print(email_uids)
If you’ve looked at Chapter 16 of Automate the Boring Stuff you may notice that it uses a different date format, specifically day, month, year with hyphen delimiters. 03-MAY-2017 is an example. This date format no longer works with newer releases of IMAPClient which is why we’re using datetime.date objects instead.
Also, on the subject of dates, while SINCE includes the day provided, BEFORE does not. As a result I’ve had to use 'BEFORE', datetime.date(2018, 5, 4) to capture May 3rd.
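To avoid having to remember that off-by-one, the adjustment can be wrapped in a helper. This is a sketch under the behaviour described above (SINCE inclusive, BEFORE exclusive); the function name between_inclusive is hypothetical.

```python
import datetime


def between_inclusive(first_day, last_day):
    """Build search criteria covering first_day through last_day, inclusive."""
    # BEFORE excludes the date it is given, so push it one day forward.
    day_after_last = last_day + datetime.timedelta(days=1)
    return ["SINCE", first_day, "BEFORE", day_after_last]


criteria = between_inclusive(datetime.date(2017, 5, 3), datetime.date(2018, 5, 3))
print(criteria)
# The resulting list can then be handed to imap_obj.search(criteria).
```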
But Wait, That’s Not All!
There’s even more search key goodness to make use of. Here’s a handy list from Automate the Boring Stuff:
|‘ALL’||Returns all messages in the folder. You may run into imaplib size limits if you request all the messages in a large folder.|
|‘BEFORE’, date; ‘ON’, date; ‘SINCE’, date||These three search keys return, respectively, messages that were received by the IMAP server before, on, or after the given date.|
|‘SUBJECT’, string; ‘BODY’, string; ‘TEXT’, string||Returns messages where string is found in the subject, body, or either, respectively. If string has spaces in it, then enclose it with double quotes: ‘TEXT “search with spaces”’.|
|‘FROM’, string; ‘TO’, string; ‘CC’, string; ‘BCC’, string||Returns all messages where string is found in the “from” email address, “to” addresses, “cc” (carbon copy) addresses, or “bcc” (blind carbon copy) addresses, respectively. If there are multiple email addresses in string, then separate them with spaces and enclose them all with double quotes: ‘CC “email@example.com firstname.lastname@example.org”’.|
|‘SEEN’; ‘UNSEEN’||Returns all messages with and without the \Seen flag, respectively. An email obtains the \Seen flag if it has been accessed with a fetch() method call (described later) or if it is clicked when you’re checking your email in an email program or web browser. It’s more common to say the email has been “read” rather than “seen,” but they mean the same thing.|
|‘ANSWERED’; ‘UNANSWERED’||Returns all messages with and without the \Answered flag, respectively. A message obtains the \Answered flag when it is replied to.|
|‘DELETED’; ‘UNDELETED’||Returns all messages with and without the \Deleted flag, respectively. Email messages deleted with the delete_messages() method are given the \Deleted flag but are not permanently deleted until the expunge() method is called. Note that some email providers, such as Gmail, automatically expunge emails.|
|‘DRAFT’; ‘UNDRAFT’||Returns all messages with and without the \Draft flag, respectively. Draft messages are usually kept in a separate Drafts folder rather than in the INBOX folder.|
|‘FLAGGED’; ‘UNFLAGGED’||Returns all messages with and without the \Flagged flag, respectively. This flag is usually used to mark email messages as “Important” or “Urgent.”|
|‘LARGER’, N; ‘SMALLER’, N||Returns all messages larger or smaller than N bytes, respectively.|
|‘NOT’, search-key||Returns the messages that search-key would not have returned.|
|‘OR’, search-key1, search-key2||Returns the messages that match either the first or second search-key.|
Let’s get into some more examples! However, please note I haven’t tested every IMAP search key with every email provider.
email_uids = imap_obj.search(['ANSWERED', 'BEFORE', datetime.date(2015, 5, 3)])
print(email_uids)
email_uids = imap_obj.search(['LARGER', 100])
print(email_uids)
email_uids = imap_obj.search(['FROM', 'email@example.com'])
print("You have received", len(email_uids), "email from email@example.com")
email_uids = imap_obj.search(['OR', 'FROM', 'email@example.com', 'FROM', "firstname.lastname@example.org"])
print(email_uids)
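Since OR takes exactly two search keys, matching more than two senders means nesting one OR inside another. The sketch below builds such a list in the same flat style as the two-address example above. The helper name any_sender is hypothetical, and I’m assuming IMAPClient passes a longer flat list through in the same way; I haven’t verified it against every provider.

```python
def any_sender(addresses):
    """Build criteria matching mail from any of the given addresses."""
    if not addresses:
        raise ValueError("need at least one address")
    # Start with the last address, then wrap each earlier one in an OR.
    criteria = ["FROM", addresses[-1]]
    for address in reversed(addresses[:-1]):
        criteria = ["OR", "FROM", address] + criteria
    return criteria


three_senders = any_sender(["a@example.com", "b@example.com", "c@example.com"])
print(three_senders)
# The result could then be passed to imap_obj.search(three_senders).
```

With three addresses the list reads as OR FROM a (OR FROM b FROM c) in IMAP’s prefix notation.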
Hopefully the options shown in the previous section are helpful. For now I’m going to search my entire inbox as I want to have a full list to look over. I’ll go over two ways to go about this.
emails_from_address = []
progress_uid = 1
# Count once up front instead of re-running the search on every pass
total = len(email_uids)
for i in email_uids:
    print(progress_uid, "of", total, end="\r")
    duplicate_temp = False
    # 'BODY[]' asks the server for the complete raw message
    temp_rawmessage = imap_obj.fetch([i], ['BODY[]', 'FLAGS'])
    temp_message = pyzmail.PyzMessage.factory(temp_rawmessage[i][b'BODY[]'])
    # get_addresses() returns a list of (name, address) tuples
    temp_fromraw = temp_message.get_addresses('from')
    temp_from_address = temp_fromraw[0][1]
    for i2 in emails_from_address:
        if temp_from_address == i2:
            duplicate_temp = True
    if duplicate_temp == False:
        emails_from_address.append(temp_from_address)
    progress_uid = progress_uid + 1
for i in emails_from_address:
    print(i)
imap_obj.logout()
In short, we’re iterating through every UID from the email_uids list. We echo the script’s current progress. duplicate_temp is initially false. After parsing the message we check the temp_from_address by looping through the emails_from_address list. If it’s already in the list we don’t add it again. If it’s unique, we add it to the list and continue, increasing our progress by one email. At the end we iterate through the unique addresses and log out of the IMAP server.
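The duplicate check walks the whole list for every message, which is fine for a small inbox. An alternative is to let a set handle uniqueness. The sketch below uses the standard library’s email.utils.parseaddr on made-up From headers so it can run without a server; it is a variation on the idea, not the code used in this post.

```python
from email.utils import parseaddr

# Hypothetical From headers standing in for values pulled from real messages.
from_headers = [
    "Example Forum <noreply@forum.example.com>",
    "noreply@forum.example.com",
    "Shop Newsletter <News@shop.example.org>",
]

unique_addresses = set()
for header in from_headers:
    # parseaddr() splits a header into (display name, address).
    name, address = parseaddr(header)
    if address:
        # Lowercase so the same address in different case counts once.
        unique_addresses.add(address.lower())

for address in sorted(unique_addresses):
    print(address)
```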
temp_rawmessage = imap_obj.fetch([i], ['BODY[]', 'FLAGS'])
The fetch() method does what it says on the tin and fetches the actual email’s content. The first argument, [i], is the UID we’re passing from the email_uids list that we created earlier. The second argument includes 'BODY[]', which requests the body of the email in all its RFC 822 glory. The result comes back as a defaultdict keyed by UID. We’ll look more at this in a moment.
temp_message = pyzmail.PyzMessage.factory(temp_rawmessage[i][b'BODY[]'])
With the pyzmail module we can turn the raw email data into a PyzMessage object, which allows us to easily pull certain data from the email. In this case we’re pulling the sender’s email address:
temp_fromraw = temp_message.get_addresses('from')
temp_from_address = temp_fromraw[0][1]
The get_addresses('from') method returns the from address as a list of (name, address) tuples, so it contains both the sender’s name and address. In this case I only wanted the address, hence the [0][1] index. You could also use 'to', 'cc', and 'bcc'.
We can also get the subject and the body of the email using the following lines of code:
# Subject
print(temp_message.get_subject())
# HTML body
print(temp_message.html_part.get_payload().decode(temp_message.html_part.charset))
# Text body
print(temp_message.text_part.get_payload().decode(temp_message.text_part.charset))
Getting the body of the email is evidently a little more involved. Emails can be sent as HTML, plaintext, or both. A solution could be doing something like this:
if temp_message.text_part is not None:
    temp_email_body = temp_message.text_part.get_payload().decode(temp_message.text_part.charset)
else:
    temp_email_body = temp_message.html_part.get_payload().decode(temp_message.html_part.charset)
We can also use the email module from the standard library instead of pyzmail:
email_uid = 5
raw_msg = imap_obj.fetch([email_uid], 'RFC822')
email_message = email.message_from_bytes(raw_msg[email_uid][b'RFC822'])
In the above example we pass 'RFC822' as the data parameter of the fetch() method. Then we index the fetched result by the UID and the b'RFC822' key (note the b' bytes prefix) and pass that to the email.message_from_bytes() function. The resulting email_message variable is an email.message.Message object, which means we can request some information from it. For example:
print(email_message["from"])
print(email_message["to"])
print(email_message["subject"])
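The standard library can also pick out the message body. The sketch below is an assumption-laden variation rather than what the script above does: it parses with policy=email.policy.default, which yields an EmailMessage that offers get_body(), and it builds its own sample message locally so it runs without a server.

```python
import email
import email.policy
from email.message import EmailMessage

# Build a small multipart message locally so the sketch runs without a server.
sample = EmailMessage()
sample["From"] = "Example Forum <noreply@forum.example.com>"
sample["Subject"] = "Confirm your account"
sample.set_content("Plain text body")
sample.add_alternative("<p>HTML body</p>", subtype="html")
raw_bytes = sample.as_bytes()

# policy=default yields an EmailMessage, which offers get_body().
parsed = email.message_from_bytes(raw_bytes, policy=email.policy.default)

# Prefer the plain text part and fall back to HTML if there is none.
body_part = parsed.get_body(preferencelist=("plain", "html"))
body_text = body_part.get_content() if body_part is not None else ""

print(parsed["subject"])
print(body_text)
```

With a real message, raw_bytes would be the value stored under the b'RFC822' key of the fetch() result.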
You could also loop through numerous emails using the following:
for uid, message_data in imap_obj.fetch(email_uids, 'RFC822').items():
    email_message = email.message_from_bytes(message_data[b'RFC822'])
    print(email_message["from"])
    print(email_message["to"])
    print(email_message["subject"])
When we call on the fetch() method it returns both the UID and the email data that we can conveniently pass through to the email.message_from_bytes() function.
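Fetching every message in one call pulls all of them into memory at once, which may be a lot for a large mailbox. One way around that is to fetch in batches. The helper below is a hypothetical sketch that only does the slicing, and the batch size of 100 is an arbitrary choice.

```python
def in_batches(uids, batch_size=100):
    """Yield successive slices of uids, each at most batch_size long."""
    for start in range(0, len(uids), batch_size):
        yield uids[start:start + batch_size]


# Hypothetical UIDs standing in for the result of imap_obj.search().
sample_uids = list(range(1, 251))
batches = list(in_batches(sample_uids))
print([len(batch) for batch in batches])
# Each batch could then be passed to imap_obj.fetch(batch, 'RFC822').
```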
I hope something I’ve gone over is helpful and has sparked an interest in doing something with Python and emails.