The Community Forums


UTF-8 filenames and URLs

Discussion in 'General Discussion' started by fbonomi, Aug 30, 2013.

  1. fbonomi

    fbonomi Member

    Joined:
    Aug 4, 2009
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    (Running WHM 11.38.2 on CentOS 6.4 x86_64)
    I have a file (on a local Windows machine) called ò.txt
    If I upload it via FTP to cPanel, I see the name correctly in the FTP listing.
    Over SSH, I see it listed as \362.txt, but it is actually correctly named, i.e. if I type
    #rm "ò.txt"
    the file gets deleted.

    This file is not accessible as http://example.com/ò.txt, only in its percent-encoded form (http://example.com/%f2.txt).
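
As a side note, the `\362` shown over SSH is an octal escape for a single byte, and it appears to be the same byte as the `%f2` in the working URL. A minimal Python sketch (the variable names are illustrative only) makes the comparison explicit:

```python
# The shell listing shows the name with an octal escape: \362.txt
octal_byte = int("362", 8)           # octal 362 -> decimal 242
url_byte = int("f2", 16)             # the byte from the %f2 URL

same_byte = octal_byte == url_byte   # both are 0xF2

# 0xF2 is the letter o-grave in ISO-8859-1.
as_latin1 = bytes([octal_byte]).decode("iso-8859-1")

# In UTF-8 the same letter (U+00F2) takes two bytes: 0xC3 0xB2.
utf8_bytes = "\u00f2".encode("utf-8")

print(same_byte, as_latin1, utf8_bytes)
```

If that holds, the name on disk is a single ISO-8859-1 byte rather than the two-byte UTF-8 sequence.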

    I know that these characters are not allowed by the URL standard, but I am migrating several sites whose HTML references images with accented names, and I don't want to have to edit all of them.

    Plus, in other cases I see this can be done, and the percent-encoded URL is just a graphical variant of the UTF-8 URL (i.e. you can type both versions and you'll get the same file).

    Is there a way to make the accented form of the URL work on cPanel too?
     
  2. simonas

    simonas Well-Known Member

    Joined:
    Apr 21, 2013
    Messages:
    141
    Likes Received:
    0
    Trophy Points:
    16
    Location:
    Lithuania
    cPanel Access Level:
    Root Administrator
    It is up to the browser to encode Unicode characters (Firefox encodes them automatically, so it will open your file).
    The web server is almost always set to URL-decode requests.

    You don't need to edit them; just make sure the files get packed properly (if you are not using the standard cPanel migration).

    So when any modern browser sees a Unicode character in a URL, it encodes it automatically, the web server decodes it, and the file opens.
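
The encode/decode round trip described here can be sketched in Python with the standard `urllib.parse` module. This only imitates what a browser and a server do and is not taken from either; `typed_path` and the other names are placeholders.

```python
from urllib.parse import quote, unquote

typed_path = "/\u00f2.txt"        # what the user types: /o-grave.txt

# Modern browsers percent-encode the UTF-8 bytes of non-ASCII characters.
# quote() also defaults to UTF-8 and leaves "/" untouched.
sent_path = quote(typed_path)

# A server that decodes the escapes as UTF-8 recovers the original path.
decoded_path = unquote(sent_path)

print(sent_path)
print(decoded_path == typed_path)
```

Note that the round trip only finds the file if the name stored on disk uses those same UTF-8 bytes.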
     
  3. fbonomi

    fbonomi Member

    Joined:
    Aug 4, 2009
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    I think I made some progress.

    I have the same file ( ò.txt ) uploaded via FTP to two different servers
    (all URLs are entered as CODE to avoid interference, otherwise the forum changes them!)

    1) Non-cPanel server
    The file is accessible as
    Code:
    www.domain.it/%C3%B2.txt
    If you type the accented url, the url gets encoded by the browser and everything works:
    Code:
    http://www.domain.it/ò.txt
    So far, everything is ok

    2) cPanel server
    The same file is only accessible with another encoding, namely
    Code:
    www.domaintoo.it/%f2.txt
    The fact is, if the user types the ò.txt form, the browser will encode it as %C3%B2.txt, not %f2.txt, and the request will therefore fail:

    Code:
    www.domaintoo.it/%C3%B2.txt
    It seems cPanel is using ISO-8859-1 to decode URLs, not UTF-8.

    Where can I change this setting?

    I have already tried what is suggested here: http://forums.cpanel.net/f185/cant-get-utf-8-default-charset-352911.html#post1418721 (but of course using AddDefaultCharset UTF-8), with no effect.
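
To tell the two encodings apart for a given URL, one rough approach is to decode the escapes to raw bytes and test whether they form valid UTF-8. The helper below is hypothetical (`guess_name_encoding` is not part of any library) and is only a heuristic, since any byte sequence is valid ISO-8859-1.

```python
from urllib.parse import unquote_to_bytes


def guess_name_encoding(encoded_path: str) -> str:
    """Hypothetical helper: guess which charset the escaped bytes fit."""
    raw = unquote_to_bytes(encoded_path)
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Every byte sequence decodes as ISO-8859-1, so this is only a guess;
        # it could be another single-byte charset.
        return "iso-8859-1"


print(guess_name_encoding("/%C3%B2.txt"))
print(guess_name_encoding("/%f2.txt"))
```

Applied to the two URLs above, this separates the form the browser sends from the form that currently works on the cPanel server.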
     
  4. fbonomi

    fbonomi Member

    Joined:
    Aug 4, 2009
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    Actually, I think the AddDefaultCharset directive probably has nothing to do with my issue, as it changes the encoding of the responses the server sends, not of the requests it receives.

    Any other pointers?
     
  5. fbonomi

    fbonomi Member

    Joined:
    Aug 4, 2009
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    More detail:

    1) This is what gets typed in the browser
    Code:
    http://www.example.com/ò.txt
    2) These are the request headers as seen by Live HTTP Headers
    Code:
    GET /%C3%B2.txt HTTP/1.1
    Host: www.example.com
    User-Agent: Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0
    Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
    Accept-Language: it-IT,it;q=0.8,en-US;q=0.5,en;q=0.3
    Accept-Encoding: gzip, deflate
    Connection: keep-alive
    
    3) This is what is seen in the access log
    Code:
    84.253.135.183 - - [31/Aug/2013:16:10:32 +0200] "GET /%C3%B2.txt HTTP/1.1" 404 - "-" "Mozilla/5.0 (Windows NT 5.1; rv:23.0) Gecko/20100101 Firefox/23.0"
    
    4) This is the error log
    Code:
    [Sat Aug 31 16:10:32 2013] [error] [client 84.253.135.183] File does not exist: /home/example/public_html/\xc3\xb2.txt
    Am I wrong in supposing the problem is between steps 3 and 4?
    I.e., Apache gets a request for a UTF-8 percent-encoded file name, but does not decode it correctly?
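
One way to check this hypothesis is to compare the bytes Apache looked for with the bytes of the file name on disk. The sketch below is only an illustration under assumptions: it takes `\xc3\xb2` from the error log above and `\362` from the SSH listing in the first post, and the variable names are made up.

```python
from urllib.parse import unquote_to_bytes

requested = unquote_to_bytes("%C3%B2.txt")   # path from the access log
looked_up = b"\xc3\xb2.txt"                  # name from the error log
on_disk = b"\362.txt"                        # name from the SSH listing (octal escape)

decoded_ok = requested == looked_up          # did percent-decoding keep the bytes?
name_matches = looked_up == on_disk          # does a file with those bytes exist?

# Both byte strings spell the same visible name, just in different charsets.
same_text = on_disk.decode("iso-8859-1") == looked_up.decode("utf-8")

print(decoded_ok, name_matches, same_text)
```

If these values reflect the real server, the percent-decoding between steps 3 and 4 preserved the bytes, and the mismatch would instead lie in the name stored on disk (ISO-8859-1 rather than UTF-8).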
     
    #5 fbonomi, Aug 31, 2013
    Last edited: Aug 31, 2013
  6. cPanelMichael

    cPanelMichael Forums Analyst
    Staff Member

    Joined:
    Apr 11, 2011
    Messages:
    30,744
    Likes Received:
    662
    Trophy Points:
    113
    cPanel Access Level:
    Root Administrator
    Hello :)

    Could you open a support ticket so we can take a closer look?

    Submit A Ticket

    You can post the ticket number here so we can update this thread with the outcome.

    Thank you.
     
  7. fbonomi

    fbonomi Member

    Joined:
    Aug 4, 2009
    Messages:
    8
    Likes Received:
    0
    Trophy Points:
    1
    Thanks,
    the ticket number is 4336315
     
  8. cPanelMichael

    cPanelMichael Forums Analyst
    Staff Member

    Joined:
    Apr 11, 2011
    Messages:
    30,744
    Likes Received:
    662
    Trophy Points:
    113
    cPanel Access Level:
    Root Administrator