Skip to main content

Case sensitivity in MySQL (do you really need it?)

File under

Not sure how this escaped my notice until now, but today I realized the default collation for MySQL columns is not case-sensitive. Among other things, this means that if you're searching for distinct values on a column with varchar, text, or any other non-binary data type, values that differ only in their cases (such as "apple" and "APpLE") are considered equal, and you won't get both values back. MySQL will return "apple" or "APpLE", but not both.

One guess as to how I discovered this. Heh.

Solution

I found three possible solutions to this problem:

  1. Use binary data types for the columns you want to be case-sensitive.

    You can do this by using the BINARY attribute during table creation; for instance:

    CREATE TABLE animals (name varchar(100) BINARY)

    You can also achieve this by changing the collation of an existing column to a binary format. For instance, if you're using the default collation of latin1_swedish_ci (and if you're reading this post, there's a good chance you are!), change it to latin1_bin.

    Doing this will give you the case-sensitivity you were after, but one drawback is that it might not sort your results the way you expected. Remember, in binary, "Z" comes before "a"; so if you're storing text in this column, and want to sort on it, using a binary collation might not be your best bet.

  2. Cast the column to binary on-the-fly in your query, like so:

    SELECT DISTINCT BINARY value FROM variable
  3. By casing the column in this manner, you retain the sort order specified by the column's collation. Essentially, it forces MySQL to do a byte-level comparison instead of a character-level comparison. So we've solved the sorting problem.

    However, as is so often the case, this solution causes an additional problem: if you've got an index on a binary column, MySQL might not be able to make efficient use of that index. The MySQL doc doesn't say under which circumstances you might run into problems, though, so maybe it's a very rare corner case (otherwise, I think they'd be more explicit). My guess is that in most situations, you're probably fine.

  4. Use a case-sensitive collation.

    latin1_general_cs is one such collation. You can tell by the "cs" suffix (which stands for ... well, you know).

Were I implementing case-sensitive searches (I'm not; more on that below), I would go with option number three. The results would be sorted the way I expect ("A", "a", "B", and so on), and comparisons would be case-sensitive. I wouldn't have to worry about any edge cases (unlikely as I am to run into them) and I wouldn't need to remember to add the BINARY attirbute to all my comparisons.

Even better, I wouldn't have to write a special query just to deal with the columns that I need to be case-sensitive. And fewer special cases is a good thing.

One additional wrinkle: it appears that you can't have a FULLTEXT index on columns with different collations.

That Answers How, but a Word of Caution

There is a fourth solution: don't use case-sensitive searches. Before you engage in an orgy of column collation change-ups, ask yourself whether you really need to perform case-sensitive searches, or if you just think you do.

For instance: suppose you've got a user management table that stores usenames and passwords, and you want this functionality because you plan to make your usernames case-sensitive. But would you really want users named "joe", "jOe", and "JOe" (not to mention "Joe") to exist on the same system all at the same time? Besides being a pain in the ass for the user to remember (was it "joe" or "Joe"?), it could easily lead to mass identity confusion. Kind of a contrived example, but hopefully you get the point.

Personally, I'm not using any case-sensitive columns. I entertained the possibility, but soon realized that they weren't needed in my situation. So remember: before you hack ... take a step back.

great information thanks

great information
 thanks

Comment viewing options

Select your preferred way to display the comments and click "Save settings" to activate your changes.