Mail Conversion Questions

JayMBS

Member
Nov 20, 2018
Rhode Island
cPanel Access Level: DataCenter Provider
Hello there,

We're looking to free up some space on our system with many accounts. We've found a handful of accounts with mailboxes above 10GB.

We've recently enabled compression, and we understand that it only affects new emails. We'd like to try and save space with the existing email.

After some research, it appears that maildir is not only the default but also the best option for ease of restoring individual mail messages and for avoiding potential corruption or data loss. And since no one is complaining about performance, we don't see mdbox's higher efficiency as worth the risk.

Throughout our testing we've found, on at least two mailboxes, that conversion saves significant space.

For example, a 19GB maildir mailbox is 13GB after conversion to mdbox, and it remains at 13GB after converting back to maildir.

For reference we are testing 'mailbox size' using the `du -h` command inside the /home/<user>/mail/<domain>/ folder

I am unsure whether the conversion process is compressing the mail or whether we are losing data. I do know that running diff against two folders (a maildir folder before the conversion and the same folder after conversion to mdbox and back to maildir) shows many differences.

Can anyone suggest an approach that allows us to understand where the 6GB is going? Is it truly space savings without data loss, either through compression or through some other side effect of going from maildir > mdbox > maildir, such as inode usage?

Can anyone provide any advanced knowledge or experience using mail conversion that would either clear up what is going on, or help steer us one way or the other from moving down a path of converting mailboxes back and forth to free up space?

Obviously hand-checking each and every mail message file is not possible, and after the conversion to mdbox and then again to maildir, some messages seem binary when `cat`-ed that were pure text before.
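One way to see where the space went is to compare on-disk bytes with decompressed bytes for every message file. The sketch below assumes compressed messages were written as gzip or bzip2 streams and detects them by their magic bytes; other algorithms are not handled, and the function names are illustrative rather than part of any cPanel or Dovecot tool.

```python
import bz2
import gzip
import os

GZIP_MAGIC = b"\x1f\x8b"
BZIP2_MAGIC = b"BZh"


def read_message(path):
    """Return (on_disk_size, message_bytes, was_compressed) for one file."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if raw.startswith(GZIP_MAGIC):
        return len(raw), gzip.decompress(raw), True
    if raw.startswith(BZIP2_MAGIC):
        return len(raw), bz2.decompress(raw), True
    return len(raw), raw, False


def size_report(maildir_root):
    """Sum on-disk and decompressed bytes for files under cur/ and new/."""
    report = {"files": 0, "compressed_files": 0, "on_disk": 0, "logical": 0}
    for dirpath, _dirs, files in os.walk(maildir_root):
        # Only cur/ and new/ hold delivered messages in a maildir.
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        for name in files:
            size, data, compressed = read_message(os.path.join(dirpath, name))
            report["files"] += 1
            report["compressed_files"] += 1 if compressed else 0
            report["on_disk"] += size
            report["logical"] += len(data)
    return report
```

If the "files" and "logical" totals match before and after a round trip while "on_disk" drops, the difference is compression rather than lost mail. Note that `du` reports allocated blocks, so its figures will not match these byte counts exactly.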

Any forum threads I found here suggesting that the conversion process compresses existing mail have replies that seem to refute that claim.

Our highest priority is to be sure we do not lose any mail data for any clients. Since this seems to be two-way conversion that seems to "save space" on the surface we'd like to understand the risks of using it.

Any thoughts or suggestions are appreciated.

Thanks for your time in reviewing this ask.

-Jay
 

LucasRolff

Well-Known Member
Community Guide Contributor
May 27, 2013
cPanel Access Level: Root Administrator
Hi Jay,

Compression works regardless of whether you're using maildir or mdbox. If compression was enabled after the mailbox was created, only new emails will be compressed.

When you then convert the mailbox using the mailbox conversion tool, the tool compresses the emails and stores them in mdbox format.

If you decide to then convert from mdbox to maildir again, your email will still be compressed because that's what you have enabled in the settings. If you'd uncheck compression and run conversion again, you'd see a growth in disk space.

The main benefit of mdbox is that it stores multiple emails in a single file, thus using far fewer inodes and also making reads faster (which may not matter much if you're using SSD storage).

Now, about loss of data: it is very possible that you might lose data during a conversion, and it really depends on how the emails were stored originally. There are known bugs where the conversion tool from Dovecot "merges" and deletes emails if you have double INBOX prefixes such as INBOX.INBOX.Spam. When it merges the emails during conversion, you'll lose some of them.
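A quick pre-flight check for the double-prefix case could look like the sketch below. It assumes you already have the account's IMAP folder names as a list (for example from an IMAP LIST or from `doveadm mailbox list -u <user>`), and that the namespace prefix is "INBOX" with "." as the separator; `find_double_inbox` is a made-up helper name.

```python
def find_double_inbox(folders, prefix="INBOX", sep="."):
    """Return folder names that start with a repeated prefix, e.g. INBOX.INBOX.Spam."""
    doubled = prefix + sep + prefix
    return [
        name
        for name in folders
        # Match the doubled folder itself or anything nested below it,
        # but not look-alikes such as INBOX.INBOXES.
        if name == doubled or name.startswith(doubled + sep)
    ]
```

Any account where this returns a non-empty list is one to test on a restored copy before converting.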

There are also cases where special characters might result in lost emails. There's a whole list of internal cPanel cases for bugs in the conversion tool that have to be resolved before I'd call it production ready.
I've had cases where I lost roughly 80% of all emails in an account (luckily I do backups), simply because the conversion tool decided to throw them out.

I hope cPanel will eventually put focus on it, so we can use the tool in a reliable way.
 

JayMBS

@LucasRolff - Thank you very much for your time and insight. I had found conflicting information in these forums on whether conversion compressed existing email. Thanks for clearing that up.

We are using SSD, so the conversion process is simply to get the compression out of it. We'd like to stick with maildir.

Very insightful about the double prefixes, thank you. Something to watch out for.

From a mail client perspective the use of maildir vs mdbox should be completely transparent. It'd be great if there was some way for it to build an index or "map" of a mailbox that could be used after the conversions to evaluate if the mail client can no longer see some mail, that way we could identify it and attempt to resolve.
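For a maildir-to-maildir round trip, a rough version of such a "map" can be built straight from the files on disk. The sketch below records a (folder, Message-ID) pair for every message and reports what is missing afterwards. It assumes any compressed messages are gzip streams, and `build_manifest` and `missing_after` are illustrative names, not existing tools.

```python
import gzip
import os
from collections import Counter
from email.parser import BytesHeaderParser


def _message_id(path):
    """Read one message file (gunzipping if needed) and return its Message-ID."""
    with open(path, "rb") as fh:
        raw = fh.read()
    if raw[:2] == b"\x1f\x8b":  # gzip magic bytes
        raw = gzip.decompress(raw)
    headers = BytesHeaderParser().parsebytes(raw)
    return str(headers.get("Message-ID", "")).strip()


def build_manifest(maildir_root):
    """Count (folder, Message-ID) pairs for every file under cur/ and new/."""
    manifest = Counter()
    for dirpath, _dirs, files in os.walk(maildir_root):
        if os.path.basename(dirpath) not in ("cur", "new"):
            continue
        # The folder is the directory that contains cur/ or new/.
        folder = os.path.relpath(os.path.dirname(dirpath), maildir_root)
        for name in files:
            manifest[(folder, _message_id(os.path.join(dirpath, name)))] += 1
    return manifest


def missing_after(before, after):
    """Return entries that were in `before` but are absent or fewer in `after`."""
    return before - after
```

Run `build_manifest` on a copy of the mailbox before conversion and again after converting back, then inspect `missing_after(before, after)`. Messages without a Message-ID header all share one key per folder, so only their counts are compared.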

Thanks again,
Jay
 

LucasRolff

JayMBS said:
> We are using SSD, so the conversion process is simply to get the compression out of it. We'd like to stick with maildir.
Technically, you could compress it without having to convert; there are a few examples on the Dovecot mailing list.

JayMBS said:
> From a mail client perspective the use of maildir vs mdbox should be completely transparent. It'd be great if there was some way for it to build an index or "map" of a mailbox that could be used after the conversions to evaluate if the mail client can no longer see some mail, that way we could identify it and attempt to resolve.
For the client it doesn't matter much, because Dovecot does the compression and decompression of the data. And sure, cPanel could build a way to resolve possible issues.

There are plenty of accounts I'd actually like to move to mdbox because they're consuming a ton of inodes, but I simply can't because of the risk.

If I actually had to do it, I'd inform users about a password change on their account, move the old account to "<username>[email protected]", and then use imapsync to "migrate" to mdbox that way. It's actually reliable, but a lot more time-consuming.
 

cPanelMichael

Administrator
Staff member
Apr 11, 2011
Hello,

LucasRolff said:
> Now, about loss of data: it is very possible that you might lose data during a conversion, and it really depends on how the emails were stored originally. There are known bugs where the conversion tool from Dovecot "merges" and deletes emails if you have double INBOX prefixes such as INBOX.INBOX.Spam. When it merges the emails during conversion, you'll lose some of them.
For reference, the internal case that's open to investigate reports of lost INBOX.INBOX.* folders during the conversion from Maildir to MDBox is CPANEL-17461. I see recent activity on this case, but there's no decision or time frame to offer on a solution at this time. I've linked this forums thread to the case, and I'll update this thread with new information on the case status as soon as it becomes available.

LucasRolff said:
> There are also cases where special characters might result in lost emails. There's a whole list of internal cPanel cases for bugs in the conversion tool that have to be resolved before I'd call it production ready.
> I've had cases where I lost roughly 80% of all emails in an account (luckily I do backups), simply because the conversion tool decided to throw them out.
Do you happen to know the case numbers for any of these issues? I'd like to compile them into a list, track their status, and then report back here as they are solved.

Thanks!
 

JayMBS

Here's an update so far:

On a test system with last night's backup (pkgacct), I've tested the email conversion on one very large account.

With compression newly enabled:

Going from maildir -> mdbox produced a 13GB savings.
Going from mdbox -> maildir left the usage the same.

I checked all email accounts using Roundcube, simply looking at the total number of emails in each folder.

It does not appear any emails are missing. I've done this 3 times with 3 different backup files.

At this point, for this particular account it appears the built-in conversion UI works well.

Of course, I am unaware of any potential issues that may arise from converting from maildir -> mdbox -> maildir again. I don't know if there is anything to look out for after a successful conversion, or if once messages are confirmed, there should be no other issues from the conversion process.

The plan is to move ahead with each account one at a time, testing thoroughly beforehand and confirming email counts before and after.
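Checking counts by hand in webmail gets tedious across many accounts, so the before/after comparison could be scripted over IMAP. The following is a minimal sketch using Python's standard `imaplib`; it assumes IMAP over TLS on the default port, simple folder names without embedded quotes, and a STATUS reply of the usual `"folder" (MESSAGES n)` shape. The helper names are invented for illustration.

```python
import imaplib
import re

_STATUS_RE = re.compile(rb"\(MESSAGES (\d+)\)\s*$")


def parse_status(line):
    """Extract the message count from one IMAP STATUS response line."""
    match = _STATUS_RE.search(line)
    return int(match.group(1)) if match else None


def folder_counts(host, user, password, folders):
    """Return {folder: message count} by asking the server for STATUS."""
    counts = {}
    conn = imaplib.IMAP4_SSL(host)
    try:
        conn.login(user, password)
        for folder in folders:
            # Quote the name so folders containing spaces are sent correctly.
            typ, data = conn.status('"%s"' % folder, "(MESSAGES)")
            counts[folder] = parse_status(data[0]) if typ == "OK" else None
    finally:
        conn.logout()
    return counts


def count_changes(before, after):
    """Return {folder: (before, after)} for folders whose counts differ."""
    folders = set(before) | set(after)
    return {
        f: (before.get(f), after.get(f))
        for f in sorted(folders)
        if before.get(f) != after.get(f)
    }
```

Call `folder_counts` once before conversion and once after, then pass both results to `count_changes`; an empty dict means every folder kept its count. Matching counts do not prove the message contents are intact, so this complements a content-level check rather than replacing it.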

Thoughts?
 

cPanelMichael

JayMBS said:
> Of course, I am unaware of any potential issues that may arise from converting from maildir -> mdbox -> maildir again. I don't know if there is anything to look out for after a successful conversion, or if once messages are confirmed, there should be no other issues from the conversion process.
>
> The plan is to move ahead with each account one at a time, testing thoroughly beforehand and confirming email counts before and after.
Hello @JayMBS,

If you haven't already done so, take a look at the Warnings section of the following document to get an idea of which issues to look out for:

Mailbox Conversion - Version 76 Documentation - cPanel Documentation

Other than that, it's a good idea to enable backups on the system so you have something to restore in case anything goes wrong.

Let us know if you encounter any problems.

Thank you.
 

LucasRolff

cPanelMichael said:
> Do you happen to know the case numbers for any of these issues? I'd like to compile them into a list, track their status, and then report back here as they are solved.
>
> Thanks!
Hi Michael,

The only cases I have are:
CPANEL-21367
CPANEL-21372
CPANEL-19556
CPANEL-17461
CPANEL-22405

I know of one account that was problematic with conversion, so I'll take a backup of that account, restore it in a test environment, run the conversion, and see which emails actually change. If I can find a way to reproduce it consistently, I'll open a new ticket about it, so that even if it's already logged internally, we at least have a consistent way of reproducing it.

Might take a bit, since I have a massive list of things to get through, but eventually it will be there!
 

cPanelMichael

Hi Lucas,

Regarding the case numbers you noted, here's the current status for each one:

07/19/2019 Update:

CPANEL-19556 & CPANEL-23733 - Add workaround for dovecot double counting of INBOX.INBOX - Fixed in version 76.

CPANEL-17461 - "INBOX.INBOX.*" folders are lost during Maildir -> MDBox conversion - Fixed in version 78

CPANEL-22518 - Add script to clean up INBOX.INBOX and INBOX.$subfolder folders and save the emails - Open

I'll update this thread with more information on the status of CPANEL-17461 as it becomes available.

Thanks!
 

sahostking

Well-Known Member
May 15, 2012
Cape Town, South Africa
cPanel Access Level: Root Administrator
We use maildir on all our shared hosting servers. Is anyone using mdbox yet, and is it by far superior? Should we convert all servers to it?

I see we have many clients with 100GB of disk usage now, and it would be nice if we could save some space.

Besides that, are there any issues or problems we should be aware of?
 

cPanelMichael

Hello @sahostking,

I merged your thread into this one, as this one includes the discussion you are looking for.

Let me know if you have any questions.

Thank you.
 

JayMBS

Hello all!

We've finally gotten around to testing external solutions for this. This came about from a need to gracefully migrate email from one server to another as part of a decommissioning process. A side benefit of that work is relevant to anyone interested in this topic.
  • Although we implemented compression on the old server it did not affect existing email.
  • We had already enabled mail compression on the new server.
  • We used the pkgacct/restoracct scripts to migrate the account (website, mail, and all) to the new server. This seeded our mail on server #2.
  • In this situation we had a small number of users, all employees, so we could easily enforce a password reset for the migration and notify them out-of-band (Slack).
  • We used the following script, which worked perfectly to one-way sync the mail before and after the MX record change. It retained all mail client flag settings and folders from the old server.

    Script: Official imapsync migration tool (release 1.977)
The resulting mailbox on the destination server is now completely compressed, including old mail, without needing to convert to mdbox and back to maildir.
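For anyone scripting the same approach across several mailboxes, the imapsync call can be wrapped as in the sketch below. This is not the exact command used in the post above. The hostnames and password-file paths are placeholders, it assumes both servers accept IMAP over TLS, and it defaults to imapsync's `--dry` mode so nothing is copied until you turn that off.

```python
import subprocess


def imapsync_command(old_host, new_host, user, passfile_old, passfile_new, dry_run=True):
    """Build an imapsync argument list for a one-way sync of a single mailbox."""
    cmd = [
        "imapsync",
        "--host1", old_host, "--user1", user, "--passfile1", passfile_old, "--ssl1",
        "--host2", new_host, "--user2", user, "--passfile2", passfile_new, "--ssl2",
    ]
    if dry_run:
        # --dry makes imapsync report what it would do without copying anything.
        cmd.append("--dry")
    return cmd


def run_sync(cmd):
    """Run imapsync and return its exit code (0 means success)."""
    return subprocess.run(cmd, check=False).returncode
```

Reading passwords from files keeps them out of the process list. Running the same command again after the MX change picks up mail that arrived on the old server in the meantime.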

Requiring a credential reset is not ideal and may not be appropriate for many other scenarios. Still, I wanted to confirm that, if you are able to change passwords and have folks reset them post-migration, you can keep the proper flags set and sync easily and gracefully while also getting the compression savings.

The code is available on GitHub, and the author accepts payment for the latest version along with support. It works very well.

In my research I found other options out there, but this was the easiest to configure. Time was of the essence for us, so we accepted the requirement to change users' passwords for the mail migration.

Hope this helps others.
 

JayMBS

sahostking said:
> We use maildir on all our shared hosting servers. Is anyone using mdbox yet, and is it by far superior? Should we convert all servers to it?
>
> I see we have many clients with 100GB of disk usage now, and it would be nice if we could save some space.
>
> Besides that, are there any issues or problems we should be aware of?
FYI - There are various pros and cons between the two that I found online with some Google searching. Our choice to keep maildir was based on statements found online about dealing with corruption in an isolated way rather than having it affect the entire mail system. Of course, that concern may have been more academic. We started with maildir and it works just fine, so we're not trying to 'fix' that.