Getting an email's sending time with Python

Today I kept piling more code onto the heap I had already written.

The problem is this: use Python to get the sending time of an email.
CSDN is not much help here: there are basically no experts on it, and if there are any, they can't be bothered to solve this problem.

Python has an email module, but as far as I could tell it has no ready-made function for getting the time.
It only wraps a function for getting a header.
So I figured I would make do with that:

This is a standard header:

Date: Tue, 30 Jul 2013 16:53:17 +0800
Received: from db-sysnoc-mailrelay3.db01.baidu (unknown [123.125.66.194])
    by newmx38.qq.com (NewMx) with SMTP id 
    for <hongtenzone@foxmail.com>; Tue, 30 Jul 2013 16:53:16 +0800
X-QQ-SSF: 00500000010000010rF000C1040000r
X-QQ-mid: mx38t1375174396thkz17254
Received: from mail-out.sys.baidu.com (cq01-passport-mis00.cq01.baidu.com [10.46.78.11])
    by db-sysnoc-mailrelay3.db01.baidu.com (Postfix) with SMTP id 515EF798060
    for <hongtenzone@foxmail.com>; Tue, 30 Jul 2013 16:53:16 +0800 (CST)
From: =?UTF-8?B?YmFpZHU=?=<passport@baidu.com>
To: =?UTF-8?B?aG9uZ3RlbnpvbmVAZm94bWFpbC5jb20=?=<hongtenzone@foxmail.com>
Subject: =?UTF-8?B?55m+5bqm5LqR6YCB5L2gMTAwR+WtmOWCqOepuumXtOmAmuefpQ==?=
MIME-Version: 1.0
Content-Type: text/html;
    charset="UTF-8"
Content-Transfer-Encoding: base64

I fetched the Date header with the message's get function, and the call succeeded:

<Tue>

Let's look at why:

The header I had was

Date: Tue, 30 Jul 2013 16:53:17 +0800

and the get call for Date gave me back only Tue, the part before the comma.
So what I got was the day of the week.
What use is the day of the week?
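For comparison, here is a minimal sketch of reading the same header with the standard library's email parser. The variable names and the shortened header text are made up for illustration. Parsed this way, the lookup hands back the whole header value rather than only the weekday, so the truncated result above may come from how the message was passed in rather than from the module itself.

```python
import email

# Shortened copy of the sample header shown earlier (illustrative only).
raw_header = (
    "Date: Tue, 30 Jul 2013 16:53:17 +0800\n"
    "From: =?UTF-8?B?YmFpZHU=?=<passport@baidu.com>\n"
    "MIME-Version: 1.0\n"
    "\n"
)

# Parse the text into a Message object and look up the Date header.
msg = email.message_from_string(raw_header)
date_value = msg.get("Date")  # header names are matched case-insensitively
print(date_value)
```

The rest of this post still takes the hand-written regular expression route.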

So I wrote the script by hand, starting with regular expression matching.

Step 1: get the mail.

Step 2: join the lines of the mail.

Step 3: decode the bytes as UTF-8 with the error mode set to replace (whatever the real encoding is, it's fine as long as the English text is not garbled).

Step 4: match the time with a regular expression.

Step 5: convert the time format.

Time obtained successfully.

Anyone with some programming ability doesn't need to read further; this isn't difficult. For beginners it's a headache (everything I write gives me a headache).

Getting the mail: use Python's poplib module.

    import poplib

    p = poplib.POP3_SSL(pop3server)
    p.user(user)
    p.pass_(password)

Here the server address, user name, and password are all values configured beforehand.

Then:

mail = p.retr(100)[1]

This retrieves the email with sequence number 100. The [1] picks the list of message lines out of what retr returns.
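retr returns a three-part tuple, which is why it is indexed with [1]. Below is a small sketch of unpacking it, assuming conn is an already logged-in poplib connection. fetch_lines is a hypothetical helper name, not part of poplib.

```python
def fetch_lines(conn, number):
    """Return the raw lines (a list of bytes) of message `number`."""
    # stat() reports (message count, mailbox size in bytes).
    count, _size = conn.stat()
    if not 1 <= number <= count:
        raise IndexError(
            "mailbox holds {} messages, asked for {}".format(count, number)
        )
    # retr() returns (server response, list of lines, size in octets).
    _response, lines, _octets = conn.retr(number)
    return lines
```

With a logged-in connection p, fetch_lines(p, 100) gives the same list as p.retr(100)[1], but fails with a readable message if the mailbox holds fewer than 100 messages.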

Since I could not get this out of the email module, I don't use it here and do the work by hand.
poplib returns the message as a list of bytes lines, so they have to be joined with b'\n'. That gives you the full content of the message. If you can't be bothered to handle all of it, you can take just the email header directly; the index value should be 0.

mail = b'\n'.join(mail)
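If only the headers are needed, one option is to cut the list of lines at the first empty line before joining, since the header block of a message ends at the first blank line. This is a sketch, with header_bytes as a made-up helper name and a shortened sample standing in for a real message:

```python
def header_bytes(lines):
    """Join only the header lines of a message given as a list of bytes lines."""
    try:
        end = lines.index(b"")  # first empty line separates header from body
    except ValueError:
        end = len(lines)        # no blank line: treat everything as header
    return b"\n".join(lines[:end])


sample = [
    b"Date: Tue, 30 Jul 2013 16:53:17 +0800",
    b"MIME-Version: 1.0",
    b"",
    b"body text",
]
print(header_bytes(sample))
```

poplib also has a top(which, howmuch) call that asks the server for the headers plus howmuch lines of the body, which avoids downloading the whole message.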

The message is now a bytes object and has to be decoded.
Decode it as UTF-8. If the mail is actually in GBK, the Chinese text will come out garbled.
If you don't want garbled text, you need to write try... except... around the decoding. If you only want the time, you don't need that.
Set errors to replace.

mail = mail.decode('UTF-8', errors='replace')
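If the garbled Chinese does matter, the try... except... idea can be sketched as trying a few encodings in order and falling back to replace only at the end. The encoding list and the helper name decode_mail are assumptions for illustration. A real message declares its charset per part, and this sketch ignores that.

```python
def decode_mail(raw, encodings=("utf-8", "gbk")):
    """Decode raw message bytes, trying each encoding in turn."""
    for encoding in encodings:
        try:
            return raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
    # Nothing decoded cleanly: keep the ASCII parts and replace the rest.
    return raw.decode("utf-8", errors="replace")


print(decode_mail("Date: 中文".encode("gbk")))
```

For extracting the Date header alone this is more than needed, because that header is plain ASCII.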

Then run a regular expression match on the time string Date: Tue, 30 Jul 2013 16:53:17 +0800.

Most people don't use this time format day to day, so after matching we process the result a second time and convert it into a more familiar format.

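import calendar
import re

# Captures weekday, day, month abbreviation, year and hour of the Date header.
DATE_RE = re.compile(r'Date:\s([A-Za-z]{1,3}),\s([0-9]{1,2})\s([A-Za-z]{1,3})\s([0-9]{1,4})\s([0-9]{1,2}):')

def add_time(mail):
    # Accept raw bytes as well as text that has already been decoded.
    if isinstance(mail, bytes):
        mail = mail.decode('UTF-8', errors='replace')
    get_date = DATE_RE.search(mail)
    if get_date is None:
        return None  # no Date header in the expected format
    # The position of the abbreviation in calendar.month_abbr is the month number.
    month = list(calendar.month_abbr).index(get_date.group(3))
    return '{}-{}-{} {}Time'.format(get_date.group(4), str(month).zfill(2), get_date.group(2).zfill(2), get_date.group(5))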

One detail to watch: the month is looked up by its position in calendar.month_abbr, so it comes back as a number and must be converted to a string with str() before zfill can pad it.
Alternatively, build a lookup table for the mapping.
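The lookup-table option could look like the sketch below. The names MONTH_NUMBER and month_to_number are made up for illustration. One reason to prefer a fixed table is that calendar.month_abbr follows the current locale, so under a non-English locale its entries are not Jan, Feb and so on.

```python
# Hand-written table: English month abbreviation -> zero-padded month number.
MONTH_NUMBER = {
    "Jan": "01", "Feb": "02", "Mar": "03", "Apr": "04",
    "May": "05", "Jun": "06", "Jul": "07", "Aug": "08",
    "Sep": "09", "Oct": "10", "Nov": "11", "Dec": "12",
}


def month_to_number(abbr):
    """Return '01'..'12' for a month abbreviation, or None if it is unknown."""
    # capitalize() turns 'JUL' or 'jul' into 'Jul' before the lookup.
    return MONTH_NUMBER.get(abbr.capitalize())


print(month_to_number("Jul"))
```

Because the values are already strings, no str() or zfill step is needed afterwards.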

With this, the extracted time becomes:

2013-10-12 11Time

Of course, I only extracted part of the time (down to the hour) and ignored the rest.
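If the minutes and seconds are wanted as well, the same approach can be extended. The sketch below is one way to do it under stated assumptions: the pattern, the MONTHS list and the name full_time are not from the original script, and the time zone offset is still ignored.

```python
import re
from datetime import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Captures day, month abbreviation, year, hour, minute and second.
FULL_DATE_RE = re.compile(
    r"Date:\s[A-Za-z]{3},\s(\d{1,2})\s([A-Za-z]{3})\s(\d{4})\s(\d{1,2}):(\d{2}):(\d{2})"
)


def full_time(text):
    """Return 'YYYY-MM-DD HH:MM:SS' from the first Date header, or None."""
    match = FULL_DATE_RE.search(text)
    if match is None or match.group(2) not in MONTHS:
        return None
    day, month_name, year, hour, minute, second = match.groups()
    month = MONTHS.index(month_name) + 1
    try:
        stamp = datetime(int(year), month, int(day),
                         int(hour), int(minute), int(second))
    except ValueError:
        return None  # impossible date or time, e.g. 31 Feb
    return stamp.strftime("%Y-%m-%d %H:%M:%S")


print(full_time("Date: Tue, 30 Jul 2013 16:53:17 +0800\n"))
```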

Some details:

If the email was sent from China, the offset is usually +0800, meaning the UTC+8 time zone (East Eight). Mail from abroad, or from a system with wrong settings, will not necessarily be UTC+8, so try not to hard-code +0800 in the regular expression.
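Rather than matching +0800 in the pattern, the offset can be read separately. One hedged option, which swaps the regular expression for the standard library's email.utils.parsedate_to_datetime, is sketched below. The value string is the sample Date header with the leading "Date: " removed.

```python
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime

value = "Tue, 30 Jul 2013 16:53:17 +0800"

# Parse the header value into a timezone-aware datetime.
sent = parsedate_to_datetime(value)
offset = sent.utcoffset()                 # whatever offset the mail declared
sent_utc = sent.astimezone(timezone.utc)  # same instant, expressed in UTC

print(sent.isoformat())
print(sent_utc.isoformat())
```

This works for any declared offset, so mail from other time zones needs no special case.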

Regular expression matching has one catch: if nothing matches, re.search returns None, and None has no group() method. So you need to write error handling for that case; otherwise the function crashes and exits.
Second: most emails fit this time pattern, but mail generated directly by some program may not fit it, and then the time cannot be extracted.
If you can't extract it, write error handling that returns a fallback value, or write another regular expression for that format.
Third, if the date-handling code receives malformed data, the lookup fails with an error saying the value cannot be found in the index, or that the index has no thirteenth entry.
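The error handling described in these points can be sketched as follows. The name safe_time, the default value and the hand-written MONTHS list are assumptions for illustration, not the original script.

```python
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Captures day, month abbreviation, year and hour of the Date header.
DATE_RE = re.compile(
    r"Date:\s[A-Za-z]{1,3},\s([0-9]{1,2})\s([A-Za-z]{1,3})\s([0-9]{1,4})\s([0-9]{1,2}):"
)


def safe_time(text, default="unknown"):
    """Return 'YYYY-MM-DD HH', or `default` when the header cannot be used."""
    match = DATE_RE.search(text)
    if match is None:      # re.search returns None when nothing matches
        return default
    day, month_name, year, hour = match.groups()
    try:
        month = MONTHS.index(month_name) + 1
    except ValueError:     # list.index raises ValueError for an unknown name
        return default
    return "{}-{:02d}-{} {}".format(year, month, day.zfill(2), hour)


print(safe_time("Date: Tue, 30 Jul 2013 16:53:17 +0800"))
print(safe_time("X-Mailer: something without a date"))
```

Returning a fallback value keeps a batch run going when one message has an unusual header.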

What gets matched is the sending time of the email, because the sending time comes first in the message.
Once the time is matched, save it to Excel.
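The post does not show how the result was saved. As one possibility, the sketch below writes CSV with the standard csv module, which Excel can open. This is a stand-in rather than a native Excel file, and the column names and the write_times helper are assumptions.

```python
import csv
import io


def write_times(rows, fileobj):
    """Write (message number, sending time) pairs as CSV rows."""
    writer = csv.writer(fileobj)
    writer.writerow(["number", "sent"])
    writer.writerows(rows)


# An in-memory buffer stands in for a real file here.
buffer = io.StringIO()
write_times([(100, "2013-07-30 16Time")], buffer)
print(buffer.getvalue())
```

For a file on disk, open it with newline="" and pass the file object in place of the buffer.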

Keywords: Python

Added by dandare on Tue, 08 Mar 2022 02:56:05 +0200