Community Forums
  1. #1
    Member monarobase
    Join Date
    Jan 2010
    Location
    France
    Posts
    387
    cPanel/Enkompass Access Level

    Root Administrator

    Ability to set collation when creating a new database

    Some scripts need UTF-8 set as the default character set/collation; most work fine with latin1 or change this themselves.

    When cPanel creates a database, the default for both MySQL and PostgreSQL is latin1.

    I would like users to be able to choose their own collation when creating a database: a simple dropdown with latin1 as the default and UTF-8 as the second choice (I don't know if any others would be needed, maybe UTF-16?).

    You can do this in phpMyAdmin after creating a MySQL database, but not in PostgreSQL. However, lots of beginner users don't know how to do this, so being able to choose this option when creating a new database would make things a lot easier.
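    For reference, switching an existing MySQL database to UTF-8 after the fact (as you would in phpMyAdmin) comes down to two kinds of statement, sketched below in Python. The helper names (`convert_database_sql`, `quote_mysql_ident`), the example database/table names and the latin1/utf8 mapping are illustrative assumptions using MySQL 5.x collation names. `ALTER DATABASE` only changes the default for tables created later, which is why existing tables also need `ALTER TABLE ... CONVERT TO`. PostgreSQL offers no comparable `ALTER DATABASE` option for an existing database's encoding, which matches the point above.

```python
# Hypothetical sketch: statements needed to move an existing MySQL database
# to UTF-8 (MySQL 5.x names assumed; the mapping below is illustrative).
COLLATIONS = {"latin1": "latin1_swedish_ci", "utf8": "utf8_general_ci"}


def quote_mysql_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def convert_database_sql(db: str, tables: list[str], charset: str = "utf8") -> list[str]:
    """Return ALTER statements for the database default and each existing table."""
    if charset not in COLLATIONS:
        raise ValueError(f"unsupported character set: {charset!r}")
    collation = COLLATIONS[charset]
    # Changing the database default only affects tables created afterwards...
    stmts = [f"ALTER DATABASE {quote_mysql_ident(db)} "
             f"CHARACTER SET {charset} COLLATE {collation}"]
    # ...so every existing table has to be converted explicitly.
    for table in tables:
        stmts.append(f"ALTER TABLE {quote_mysql_ident(db)}.{quote_mysql_ident(table)} "
                     f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}")
    return stmts


for stmt in convert_database_sql("user_forum", ["posts"]):
    print(stmt)
```

    Converting a table rewrites it, so on a large database this is best done during a quiet period and after a backup.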

    I don't think this would be very difficult to implement.

  2. #2
    Technical Product Specialist cPanelDavidG
    Join Date
    Nov 2006
    Location
    Houston, TX
    Posts
    11,189
    cPanel/Enkompass Access Level

    Root Administrator

    Re: Ability to set collation when creating a new database

    I discussed this with a developer yesterday. Like you, I thought this wouldn't be very difficult to implement, and I'm a big fan of UTF8 since it gets the job done for essentially everyone without the hassles of more limited character sets such as ASCII and latin1.

    We focused on MySQL in our discussions.

    First, we need to pick which UTF8 collation we would want to use. MySQL offers a binary UTF8 collation (utf8_bin) and case-insensitive ones such as utf8_general_ci and utf8_unicode_ci. After researching the issue, we found that the differences are most easily explained with an example.

    Let's say we have a German-language website. In the German used in Germany, words with a double s are often written with "ß" instead, so strasse becomes straße. However, someone from Switzerland (where German is also spoken) would probably not use the letter ß; they use ss all the time. Let's say that German speakers from both countries are on the same internet forum, which has a search function that relies on MySQL.

    If you created the database using UTF8 binary (utf8_bin), MySQL compares the raw character values literally. This means a search for straße will not return any results for strasse.

    However, if you created the database using a linguistic UTF8 collation such as utf8_unicode_ci, MySQL does a linguistic comparison instead. In this collation, MySQL knows that straße is linguistically identical to strasse, so a search for either one returns both. In fact, it goes as far as treating "strasse" and "straße" as duplicates when enforcing unique keys.
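    To make the binary vs. linguistic distinction concrete, here is a rough analogy in Python. It is an assumption-laden stand-in, not how MySQL implements utf8_bin or utf8_unicode_ci: plain `==` plays the role of a binary collation, and `str.casefold()` (which lowercases and also expands "ß" to "ss") plays the role of a case-insensitive collation with expansions. The helper names are hypothetical.

```python
# Rough analogy only: Python string operations stand in for MySQL collations.
# MySQL does not implement utf8_bin or utf8_unicode_ci this way internally.


def binary_equal(a: str, b: str) -> bool:
    """Like a binary collation (utf8_bin): compare code points exactly."""
    return a == b


def linguistic_equal(a: str, b: str) -> bool:
    """Like a case-insensitive collation with expansions (utf8_unicode_ci).

    str.casefold() lowercases and also expands the German "ß" to "ss".
    """
    return a.casefold() == b.casefold()


def unique_key_violations(values: list[str], same) -> list[str]:
    """Return the values a UNIQUE index would reject under the comparison `same`."""
    accepted: list[str] = []
    rejected: list[str] = []
    for value in values:
        if any(same(value, existing) for existing in accepted):
            rejected.append(value)
        else:
            accepted.append(value)
    return rejected


for a, b in [("straße", "strasse"), ("Bob", "bob")]:
    print(a, b, binary_equal(a, b), linguistic_equal(a, b))
# straße strasse False True
# Bob bob False True

print(unique_key_violations(["strasse", "straße"], binary_equal))      # []
print(unique_key_violations(["strasse", "straße"], linguistic_equal))  # ['straße']
```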

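    Based on that, it seems we've boiled our options down to a linguistic, case-insensitive UTF8 collation. By default, cPanel & WHM uses latin1_swedish_ci (ci means case insensitive), so for consistency we will probably use UTF8 case insensitive (utf8_general_ci). This means when you do a search in MySQL, "Bob" will match "Bob", "bob", etc. One caveat: utf8_general_ci is a simpler, faster collation than utf8_unicode_ci and does not support expansions, so it treats ß as equal to s rather than ss; sites that need strasse to match straße would want utf8_unicode_ci.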
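    A minimal sketch of the proposed dropdown on the MySQL side, assuming latin1 stays the default and utf8_general_ci is the second choice. The names `DROPDOWN`, `create_database_sql` and `quote_mysql_ident` are hypothetical, not part of cPanel. Only whitelisted values are interpolated into the SQL, so whatever the browser submits cannot inject an extra clause.

```python
# Hypothetical sketch of a "collation" dropdown for MySQL database creation.
# latin1 stays the default; utf8 (utf8_general_ci) is the second choice.
DROPDOWN = {
    "latin1": ("latin1", "latin1_swedish_ci"),  # current default
    "utf8": ("utf8", "utf8_general_ci"),        # proposed UTF-8 option
}


def quote_mysql_ident(name: str) -> str:
    """Backtick-quote a MySQL identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def create_database_sql(db: str, choice: str = "latin1") -> str:
    """Build CREATE DATABASE for one of the whitelisted dropdown choices."""
    if choice not in DROPDOWN:
        raise ValueError(f"unknown dropdown choice: {choice!r}")
    charset, collation = DROPDOWN[choice]
    return (f"CREATE DATABASE {quote_mysql_ident(db)} "
            f"CHARACTER SET {charset} COLLATE {collation}")


print(create_database_sql("user_wiki"))
print(create_database_sql("user_wiki", "utf8"))
```

    Adding another option later (say utf8_unicode_ci) would only mean adding an entry to the mapping.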


    If we can agree on the collation, doing this in MySQL seems quite doable; we can even change this default at runtime: MySQL :: MySQL 3.23, 4.0, 4.1 Reference Manual :: 9.1.3.1 Server Character Set and Collation
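    Changing the server default at runtime uses the `character_set_server` and `collation_server` variables, which `CREATE DATABASE` falls back on when no character set is given. The Python helper below (`server_default_sql`, a hypothetical name) just builds the two `SET GLOBAL` statements, relying on the MySQL convention that a collation name starts with its character set name. Assumptions worth stating: `SET GLOBAL` needs the SUPER privilege, only affects connections opened afterwards, and is lost on restart unless the same values also go into my.cnf (`character-set-server`, `collation-server`).

```python
import re

# MySQL collation names begin with their character set name, e.g.
# "utf8_general_ci" -> "utf8", "latin1_swedish_ci" -> "latin1".
COLLATION_NAME = re.compile(r"([a-z0-9]+)_[a-z0-9_]+")


def server_default_sql(collation: str) -> list[str]:
    """Build SET GLOBAL statements making `collation` the server-wide default."""
    match = COLLATION_NAME.fullmatch(collation)
    if match is None:
        # Reject anything that doesn't look like a plain collation name.
        raise ValueError(f"not a collation name: {collation!r}")
    charset = match.group(1)
    return [
        f"SET GLOBAL character_set_server = '{charset}'",
        f"SET GLOBAL collation_server = '{collation}'",
    ]


for stmt in server_default_sql("utf8_general_ci"):
    print(stmt)
```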


    EDIT: As for PostgreSQL, setting things up is more complicated (not by much) but the character set decision is much easier: http://www.postgresql.org/docs/8.1/s...multibyte.html - we'd probably go with UTF-8 here.
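    On the PostgreSQL side the encoding has to be chosen at creation time. A minimal Python sketch, assuming a LATIN1 default with UTF8 as the alternative; `create_pg_database_sql` and `quote_pg_ident` are hypothetical names. `TEMPLATE template0` is used because current PostgreSQL versions refuse a new encoding that differs from the default template's (`template1`); the server locale must also be compatible with the chosen encoding (the C locale works with any).

```python
# Hypothetical sketch: PostgreSQL database creation with a chosen encoding.
ENCODINGS = ("LATIN1", "UTF8")


def quote_pg_ident(name: str) -> str:
    """Double-quote a PostgreSQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def create_pg_database_sql(db: str, encoding: str = "LATIN1") -> str:
    """Build CREATE DATABASE with an explicit, whitelisted encoding."""
    # Accept "utf-8" / "latin1" style input from a form and normalise it.
    enc = encoding.upper().replace("-", "")
    if enc not in ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding!r}")
    # template0 avoids the encoding mismatch with template1.
    return f"CREATE DATABASE {quote_pg_ident(db)} ENCODING '{enc}' TEMPLATE template0"


print(create_pg_database_sql("user_shop", "utf-8"))
```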
    Last edited by cPanelDavidG; 09-08-2011 at 09:08 AM. Reason: Forgot to show PostgreSQL some love.

  3. #3
    Member monarobase
    Join Date
    Jan 2010
    Location
    France
    Posts
    387
    cPanel/Enkompass Access Level

    Root Administrator

    Re: Ability to set collation when creating a new database

    I believe utf8_general_ci is the best option for the default UTF8 collation. If someone needs a specific case-sensitive UTF8 collation, they can change this after they have created the database. As far as I know, most databases work with utf8_general_ci.

    For PostgreSQL, a choice between LATIN1 and UTF-8 would be great.

    Some scripts still need latin1_swedish_ci for MySQL and LATIN1 for PostgreSQL, so as far as I'm concerned the default should be latin1, with the option to change it to UTF-8 while creating the database.

Similar Threads & Tags
Similar threads

  1. default MySQL collation and character set / charset
    By tgavin in forum Database Discussions
    Replies: 17
    Last Post: 02-01-2012, 04:51 PM
  2. default MySQL collation and character set / charset
    By tgavin in forum cPanel and WHM Discussions
    Replies: 12
    Last Post: 08-26-2010, 05:32 PM
  3. How can I change my MySQL database collation via phpMyAdmin?
    By dreamzgaurav in forum Database Discussions
    Replies: 0
    Last Post: 01-07-2010, 04:56 AM
  4. Disable ability to set up email on subdomains...
    By 4u123 in forum cPanel and WHM Discussions
    Replies: 0
    Last Post: 04-27-2009, 05:07 AM
  5. Mysql character set / collation problem with Cpanel
    By bonis in forum New User Questions
    Replies: 0
    Last Post: 06-07-2008, 09:03 PM