

cPanel filter for non-English characters

Discussion in 'General Discussion' started by dandanfireman, May 1, 2006.

  1. dandanfireman

    dandanfireman Well-Known Member
    PartnerNOC

    Joined:
    May 31, 2002
    I have a customer who wants to filter all incoming email on an account to exclude any message that contains non-English characters. Since the customer can't read any language other than English, this seems fairly logical.


    I understand that some languages might be difficult to detect. What about just excluding Eastern languages that use a completely different character set? The customer is receiving a lot of messages in Korean specifically and would like them to go away.


    To anyone reading this who might think this is in some way discriminatory: please don't bother replying. It is simply a technical question about keeping unwanted email away from a customer.
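    A minimal Python sketch of the check the customer is asking for, assuming the message body has already been decoded to a Python string: it measures what share of the visible characters fall in Korean, Japanese or Chinese Unicode blocks. The range list, the function names and the 0.2 threshold are illustrative assumptions, not anything cPanel ships.

```python
# Illustrative subset of Unicode blocks used by Korean, Japanese and Chinese.
# Assumption: chosen for this example only, not an exhaustive list.
CJK_RANGES = [
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3040, 0x30FF),  # Hiragana and Katakana
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul Syllables
]


def is_cjk(ch: str) -> bool:
    """Return True if the character's code point falls in CJK_RANGES."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def cjk_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are CJK (0.0 for empty text)."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_cjk(c) for c in chars) / len(chars)


def should_filter(text: str, threshold: float = 0.2) -> bool:
    """Flag text whose CJK share exceeds a hypothetical threshold."""
    return cjk_ratio(text) > threshold


if __name__ == "__main__":
    print(should_filter("Hello, your invoice is attached."))  # False
    print(should_filter("안녕하세요 회원님"))  # True
```

    Using a share rather than rejecting on the first CJK character keeps an otherwise English message with one stray symbol from being discarded.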
     
  2. chirpy

    chirpy Well-Known Member

    Joined:
    Jun 15, 2002
    Have a look at the email headers. You might see this header change from its standard ASCII value:

    Content-Transfer-Encoding: 7bit

    to something else for those languages. If so, you could filter on that header.
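    To see which header values actually differ, a short Python sketch using the standard `email` package can list the transfer encodings and declared charsets in a raw message. The `SUSPECT_CHARSETS` set and the `looks_foreign` rule are hypothetical. Many ordinary English messages also use `quoted-printable` or `base64`, so the declared charset (e.g. `euc-kr`) is probably a more telling header than `Content-Transfer-Encoding` on its own.

```python
from email import message_from_bytes
from email.header import decode_header

# Hypothetical blocklist of East Asian charsets; extend or trim as needed.
SUSPECT_CHARSETS = {
    "euc-kr", "ks_c_5601-1987", "iso-2022-kr",  # Korean
    "gb2312", "gbk", "big5",                    # Chinese
    "shift_jis", "iso-2022-jp", "euc-jp",       # Japanese
}


def inspect_headers(raw: bytes) -> tuple[set[str], set[str]]:
    """Return (transfer encodings, declared charsets) found in a raw message."""
    msg = message_from_bytes(raw)
    encodings: set[str] = set()
    charsets: set[str] = set()
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers carry no body of their own
        # A missing header means the RFC 2045 default, 7bit.
        encodings.add(str(part.get("Content-Transfer-Encoding", "7bit")).strip().lower())
        charset = part.get_content_charset()
        if charset:
            charsets.add(charset.lower())
    # RFC 2047 encoded words in the Subject also declare a charset.
    for _, charset in decode_header(str(msg.get("Subject", ""))):
        if charset:
            charsets.add(charset.lower())
    return encodings, charsets


def looks_foreign(raw: bytes) -> bool:
    """Flag a message that declares any charset in SUSPECT_CHARSETS."""
    _, charsets = inspect_headers(raw)
    return bool(charsets & SUSPECT_CHARSETS)


SAMPLE = (
    b"Subject: =?euc-kr?B?vsiz58fPvLy/5A==?=\n"
    b"Content-Type: text/plain; charset=euc-kr\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"vsiz58fPvLy/5A==\n"
)

if __name__ == "__main__":
    print(inspect_headers(SAMPLE))  # ({'base64'}, {'euc-kr'})
    print(looks_foreign(SAMPLE))    # True
```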
     
  3. casey

    casey Well-Known Member

    Joined:
    Jan 17, 2003
    Look into the SpamAssassin configuration as well:

    # Speakers of Asian languages, like Chinese, Japanese and Korean, will almost
    # definitely want to uncomment the following lines. They will switch off some
    # rules that detect 8-bit characters, which commonly trigger on mails using CJK
    # character sets, or that assume a western-style charset is in use.
    #
    # score HTML_COMMENT_8BITS 0
    # score UPPERCASE_25_50 0
    # score UPPERCASE_50_75 0
    # score UPPERCASE_75_100 0
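    The comments above say several SpamAssassin rules fire on 8-bit characters, which are common in CJK mail. As a rough illustration of that idea (not SpamAssassin's actual rule logic), this Python sketch measures the share of decoded body bytes with the high bit set. The function name and sample messages are assumptions for the example.

```python
from email import message_from_bytes


def high_bit_ratio(raw: bytes) -> float:
    """Share of decoded text-part bytes that have the high bit set (>= 0x80)."""
    msg = message_from_bytes(raw)
    total = high = 0
    for part in msg.walk():
        # Skip multipart containers and non-text parts such as attachments.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        # decode=True undoes base64 / quoted-printable, so we count the
        # charset bytes (e.g. EUC-KR or UTF-8), not the ASCII transport form.
        payload = part.get_payload(decode=True) or b""
        total += len(payload)
        high += sum(1 for byte in payload if byte >= 0x80)
    return high / total if total else 0.0


KOREAN = (
    b"Content-Type: text/plain; charset=euc-kr\n"
    b"Content-Transfer-Encoding: base64\n"
    b"\n"
    b"vsiz58fPvLy/5A==\n"  # "안녕하세요" encoded in EUC-KR
)
ENGLISH = b"Content-Type: text/plain; charset=us-ascii\n\nHello there\n"

if __name__ == "__main__":
    print(high_bit_ratio(KOREAN))   # 1.0
    print(high_bit_ratio(ENGLISH))  # 0.0
```

    Accented Latin letters are 8-bit as well, so on its own this is a blunt signal; it works better alongside a charset or script check.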
     