Create YouTube-Like IDs With PHP/Python/JavaScript/Java/SQL

IDs are often numbers. Unfortunately there are only 10 digits to work with, so if you have a lot of records, IDs tend to get very lengthy. For computers that's OK. But human beings like their IDs as short as possible. So how can we make IDs shorter? Well, we could borrow characters from the alphabet and have them pose as additional digits. Alphabet to the rescue!

Other title options were:

  • How to create unique short string IDs with PHP & MySQL
  • Or how to create IDs similar to YouTube e.g. yzNjIBEdyww

I created this function a long time ago. Time to be nice and share.

More is Less - the 'math'

The alphabet has 26 characters. That's a lot more than 10 digits. If we also distinguish upper- and lowercase, and add digits to the bunch for the heck of it, we already have 26 x 2 + 10 = 62 options we can use per position in the ID.

Now of course we can also add additional funny characters to 'the bunch' like - / * & # but those may cause problems in URLs and that's our target audience for now.

OK, so because we can use roughly 6x more characters per position, IDs get much shorter. We can simply fit a lot more data in each position.
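To make the saving concrete, here is a small Python sketch that counts how many positions a number needs in base 10 versus base 62. The helper name `id_length` is made up for this illustration:

```python
def id_length(number, base):
    """Count the positions needed to write a non-negative integer in the given base."""
    length = 1
    while number >= base:
        number //= base
        length += 1
    return length

for n in (999, 999999999999, 9007199254740989):
    print(n, "base 10:", id_length(n, 10), "base 62:", id_length(n, 62))

# 999999999999 takes 12 digits in base 10 but only 7 characters in base 62.
```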

This is basically what URL shortening services like tinyurl, is.gd, or bit.ly do. But similar IDs can also be found at YouTube: http://www.youtube.com/watch?v=**yzNjIBEdyww**

Convert your IDs

Now, unlike database servers, webservers are easy to scale, so you can let them do a bit of converting to ease the life of your users, while keeping your database fast with numbers (MySQL really likes them plain numbers ; )

To do the conversion I've written a PHP function that can translate big numbers to short strings and vice versa. I call it: alphaID.

The resulting string is not hard to decipher, but it can be a very nice feature to make URLs or directory structures more compact and significant.
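As a rough illustration of the directory idea, the sketch below splits an ID into nested directory levels, using the first two characters as levels and the rest as the leaf. The function name `id_to_path` is hypothetical; it mirrors the substr() example in the doc comment of the function further down rather than any code that ships with alphaID:

```python
def id_to_path(alpha_id):
    """Use the first two characters as directory levels, the rest as the leaf."""
    part1 = alpha_id[0:1]
    part2 = alpha_id[1:2]
    part3 = alpha_id[2:]
    return "/" + part1 + "/" + part2 + "/" + part3

print(id_to_path("PpQXn7COf"))  # /P/p/QXn7COf
```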

So basically:

  • when someone requests rLHWfKd
  • alphaID() converts it to 999999999999
  • you look up the record for id 999999999999 in your database
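The flow above can be sketched in Python as follows. This is a minimal sketch, not the alphaID code itself: `encode_base62`, `decode_base62`, `handle_request` and the `RECORDS` dict (standing in for a database table) are made-up names, and writing the most significant character first is an assumption, so the strings it produces need not match alphaID's.

```python
INDEX = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(INDEX)

def encode_base62(number):
    """Turn a non-negative integer into a short string."""
    if number == 0:
        return INDEX[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(INDEX[remainder])
    return "".join(reversed(chars))

def decode_base62(text):
    """Turn a short string back into the integer it was made from."""
    number = 0
    for char in text:
        number = number * BASE + INDEX.index(char)
    return number

# Hypothetical stand-in for a database table keyed by numeric ID.
RECORDS = {999999999999: {"title": "Some record"}}

def handle_request(short_id):
    """Decode the short ID from the URL, then look the record up by number."""
    numeric_id = decode_base62(short_id)
    return RECORDS.get(numeric_id)

short_id = encode_base62(999999999999)
print(short_id, handle_request(short_id))
```

The database only ever sees the plain number; the short string exists purely at the URL level.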

Source

<?php
/**
 * Translates a number to a short alphanumeric version
 *
 * Translates any number up to 9007199254740992
 * to a shorter version in letters e.g.:
 * 9007199254740989 --> PpQXn7COf
 *
 * specifying the second argument true, it will
 * translate back e.g.:
 * PpQXn7COf --> 9007199254740989
 *
 * this function is based on any2dec && dec2any by
 * fragmer[at]mail[dot]ru
 * see: http://nl3.php.net/manual/en/function.base-convert.php#52450
 *
 * If you want the alphaID to be at least 3 letter long, use the
 * $pad_up = 3 argument
 *
 * In most cases this is better than totally random ID generators
 * because this can easily avoid duplicate ID's.
 * For example if you correlate the alpha ID to an auto incrementing ID
 * in your database, you're done.
 *
 * The reverse is done because it makes it slightly more cryptic,
 * but it also makes it easier to spread lots of IDs in different
 * directories on your filesystem. Example:
 * $part1 = substr($alpha_id,0,1);
 * $part2 = substr($alpha_id,1,1);
 * $part3 = substr($alpha_id,2,strlen($alpha_id));
 * $destindir = "/".$part1."/".$part2."/".$part3;
 * // by reversing, directories are more evenly spread out. The
 * // first 26 directories already occupy 26 main levels
 *
 * more info on limitation:
 * - http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/165372
 *
 * if you really need this for bigger numbers you probably have to look
 * at things like: http://theserverpages.com/php/manual/en/ref.bc.php
 * or: http://theserverpages.com/php/manual/en/ref.gmp.php
 * but I haven't really dug into this. If you have more info on those
 * matters feel free to leave a comment.
 *
 * The following code block can be utilized by PEAR's Testing_DocTest
 * <code>
 * // Input //
 * $number_in = 2188847690240;
 * $alpha_in  = "SpQXn7Cb";
 *
 * // Execute //
 * $alpha_out  = alphaID($number_in, false, 8);
 * $number_out = alphaID($alpha_in, true, 8);
 *
 * if ($number_in != $number_out) {
 *   echo "Conversion failure, ".$alpha_in." returns ".$number_out." instead of the ";
 *   echo "desired: ".$number_in."\n";
 * }
 * if ($alpha_in != $alpha_out) {
 *   echo "Conversion failure, ".$number_in." returns ".$alpha_out." instead of the ";
 *   echo "desired: ".$alpha_in."\n";
 * }
 *
 * // Show //
 * echo $number_out." => ".$alpha_out."\n";
 * echo $alpha_in." => ".$number_out."\n";
 * echo alphaID(238328, false)." => ".alphaID(alphaID(238328, false), true)."\n";
 *
 * // expects:
 * // 2188847690240 => SpQXn7Cb
 * // SpQXn7Cb => 2188847690240
 * // aaab => 238328
 *
 * </code>
 *
 * @author  Kevin van Zonneveld <kevin@vanzonneveld.net>
 * @author  Simon Franz
 * @author  Deadfish
 * @author  SK83RJOSH
 * @copyright 2008 Kevin van Zonneveld (http://kevin.vanzonneveld.net)
 * @license   http://www.opensource.org/licenses/bsd-license.php New BSD Licence
 * @version   SVN: Release: $Id: alphaID.inc.php 344 2009-06-10 17:43:59Z kevin $
 * @link    http://kevin.vanzonneveld.net/
 *
 * @param mixed   $in   String or long input to translate
 * @param boolean $to_num  Reverses translation when true
 * @param mixed   $pad_up  Number or boolean pads the result up to a specified length
 * @param string  $pass_key Supplying a password makes it harder to calculate the original ID
 *
 * @return mixed string or long
 */
function alphaID($in, $to_num = false, $pad_up = false, $pass_key = null)
{
  $out   =   '';
  $index = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';
  $base  = strlen($index);

  if ($pass_key !== null) {
    // Although this function's purpose is to just make the
    // ID short - and not so much secure,
    // with this patch by Simon Franz (http://blog.snaky.org/)
    // you can optionally supply a password to make it harder
    // to calculate the corresponding numeric ID

    for ($n = 0; $n < strlen($index); $n++) {
      $i[] = substr($index, $n, 1);
    }

    $pass_hash = hash('sha256',$pass_key);
    $pass_hash = (strlen($pass_hash) < strlen($index) ? hash('sha512', $pass_key) : $pass_hash);

    for ($n = 0; $n < strlen($index); $n++) {
      $p[] =  substr($pass_hash, $n, 1);
    }

    array_multisort($p, SORT_DESC, $i);
    $index = implode($i);
  }

  if ($to_num) {
    // Digital number  <<--  alphabet letter code
    $out = 0; // accumulate into a number, not the '' set above
    $len = strlen($in) - 1;

    for ($t = $len; $t >= 0; $t--) {
      $bcp = bcpow($base, $len - $t);
      $out = $out + strpos($index, substr($in, $t, 1)) * $bcp;
    }

    if (is_numeric($pad_up)) {
      $pad_up--;

      if ($pad_up > 0) {
        $out -= pow($base, $pad_up);
      }
    }
  } else {
    // Digital number  -->>  alphabet letter code
    if (is_numeric($pad_up)) {
      $pad_up--;

      if ($pad_up > 0) {
        $in += pow($base, $pad_up);
      }
    }

    for ($t = ($in != 0 ? floor(log($in, $base)) : 0); $t >= 0; $t--) {
      $bcp = bcpow($base, $t);
      $a   = floor($in / $bcp) % $base;
      $out = $out . substr($index, $a, 1);
      $in  = $in - ($a * $bcp);
    }
  }

  return $out;
}
?>

Get from GitHub
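The documented ceiling of 9007199254740992 is 2^53, the point past which a double-precision float can no longer represent every whole number. Assuming the limit comes from the floating-point steps in the function (floor() and log() both return floats), this short Python sketch shows what goes wrong beyond it:

```python
LIMIT = 2 ** 53  # 9007199254740992

# Just below the limit, converting to float and back is lossless.
print(int(float(LIMIT - 1)) == LIMIT - 1)  # True

# One past the limit, the float rounds to a neighbouring value.
print(int(float(LIMIT + 1)) == LIMIT + 1)  # False
```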

Example

Running:

alphaID(9007199254740989);

will return PpQXn7COf and:

alphaID('PpQXn7COf', true);

will return 9007199254740989

Easy right?

More features

  • There is also an optional third argument: $pad_up. This enables you to make the resulting alphaID at least X characters long.
  • You can support even more characters (making the resulting alphaID even smaller) by adding characters to the $index var at the top of the function body.
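The padding works by adding base^(pad_up - 1) to the number before encoding, which is the smallest value that needs pad_up characters, and subtracting it again when decoding. A minimal Python sketch of that idea follows; the `encode` and `decode` helpers are illustrative only and write the most significant character first, so their output is not meant to match alphaID's:

```python
INDEX = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(INDEX)

def encode(number, pad_up=0):
    if pad_up > 1:
        number += BASE ** (pad_up - 1)  # smallest value with pad_up characters
    chars = []
    while True:
        number, remainder = divmod(number, BASE)
        chars.append(INDEX[remainder])
        if number == 0:
            break
    return "".join(reversed(chars))

def decode(text, pad_up=0):
    number = 0
    for char in text:
        number = number * BASE + INDEX.index(char)
    if pad_up > 1:
        number -= BASE ** (pad_up - 1)  # undo the offset added by encode
    return number

print(encode(7))                   # h
print(encode(7, pad_up=5))         # baaah
print(decode("baaah", pad_up=5))   # 7
```

Note that the same pad_up value has to be passed when decoding, otherwise the offset is never removed.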

Bonus

Thanks to some wonderful contributions in the comment section, here are some interesting updates & additions:

Pro tip

You may want to remove vowels (a, e, o, u, i) from $index so as to avoid combinations that result in: 'penis' or other dirty words that could get your customers upset.

You can also use the $pad_up argument to enforce a minimum length of 5 characters so as to avoid: 'nsfw' and 'wtf'.

Thanks to William for pointing this out ; )
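A vowel-free index is easy to derive from the default one. The sketch below (in Python, with a made-up `SAFE_INDEX` name) strips upper- and lowercase vowels, which leaves 52 characters per position instead of 62:

```python
INDEX = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
VOWELS = set("aeiouAEIOU")

# Keep the original ordering, just drop the vowels.
SAFE_INDEX = "".join(ch for ch in INDEX if ch not in VOWELS)

print(SAFE_INDEX)
print(len(SAFE_INDEX))  # 52
```

The price is slightly longer IDs, since each position now carries a little less information.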

Postgres Implementation

Thanks to William as well:

CREATE OR REPLACE FUNCTION string_to_bits(input_text TEXT)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
BEGIN
    output_text := '';

    FOR i IN 1..char_length(input_text) LOOP
        output_text := output_text || ascii(substring(input_text FROM i FOR 1))::bit(8);
    END LOOP;

    return output_text;
END;
$$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION id_to_sid(id INTEGER)
RETURNS TEXT AS $$
DECLARE
    output_text TEXT;
    i INTEGER;
    index TEXT[];
    bits TEXT;
    bit_array TEXT[];
    input_text TEXT;
BEGIN
    input_text := id::TEXT;
    output_text := '';
    index := string_to_array('0,d,A,3,E,z,W,m,D,S,Q,l,K,s,P,b,N,c,f,j,5,I,t,C,i,y,o,G,2,r,x,h,V,J,k,-,T,w,H,L,9,e,u,X,p,U,a,O,v,4,R,B,q,M,n,g,1,F,6,Y,_,8,7,Z', ',');

    bits := string_to_bits(input_text);

    IF length(bits) % 6 <> 0 THEN
        bits := rpad(bits, length(bits) + 6 - (length(bits) % 6), '0');
    END IF;

    FOR i IN 1..((length(bits) / 6)) LOOP
        IF i = 1 THEN
            bit_array[i] := substring(bits FROM 1 FOR 6);
        ELSE
            bit_array[i] := substring(bits FROM 1 + (i - 1) * 6 FOR 6);
        END IF;

        output_text := output_text || index[bit_array[i]::bit(6)::integer + 1];
    END LOOP;

    return output_text;
END;
$$ LANGUAGE plpgsql;

Java Implementation

Thanks to Ant Kutschera there also is a Java version.

package uk.co.maxant.util;

import java.math.BigInteger;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * allows you to convert a whole number into a compacted representation of that number,
 * based upon the dictionary you provide. very similar to base64 encoding, or indeed hex
 * encoding.
 */
public class BaseX {

  /**
   * contains hexadecimals 0-F only.
   */
  public static final char[] DICTIONARY_16 =
    new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F'};

  /**
   * contains only alphanumerics, in capitals and excludes letters/numbers which can be confused,
   * eg. 0 and O or L and I and 1.
   */
  public static final char[] DICTIONARY_32 =
    new char[]{'1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','J','K','M','N','P','Q','R','S','T','U','V','W','X','Y','Z'};

  /**
   * contains only alphanumerics, including both capitals and smalls.
   */
  public static final char[] DICTIONARY_62 =
    new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z'};

  /**
   * contains alphanumerics, including both capitals and smalls, and the following special chars:
   * +"@*#%&/|()=?~[]{}$-_.:,;<> (you might not be able to read all those using a browser!)
   */
  public static final char[] DICTIONARY_89 =
    new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','+','"','@','*','#','%','&','/','|','(',')','=','?','~','[',']','{','}','$','-','_','.',':',',',';','<','>'};

  protected char[] dictionary;

  /**
   * create an encoder with the given dictionary.
   *
   * @param dictionary the dictionary to use when encoding and decoding.
   */
  public BaseX(char[] dictionary){
    this.dictionary = dictionary;
  }

  /**
   * creates an encoder with the {@link #DICTIONARY_62} dictionary.
   */
  public BaseX(){
    this.dictionary = DICTIONARY_62;
  }

  /**
   * tester method.
   */
  public static void main(String[] args) {
    String original = "123456789012345678901234567890";
    System.out.println("Original: " + original);
    BaseX bx = new BaseX(DICTIONARY_62);
    String encoded = bx.encode(new BigInteger(original));
    System.out.println("encoded: " + encoded);
    BigInteger decoded = bx.decode(encoded);
    System.out.println("decoded: " + decoded);
    if(original.equals(decoded.toString())){
      System.out.println("Passed! decoded value is the same as the original.");
    }else{
      System.err.println("FAILED! decoded value is NOT the same as the original!!");
    }
  }

  /**
   * encodes the given number into the base of the dictionary provided in the constructor.
   * @param value the number to encode.
   * @return the encoded string.
   */
  public String encode(BigInteger value) {

    List<Character> result = new ArrayList<Character>();
    BigInteger base = new BigInteger("" + dictionary.length);
    int exponent = 1;
    BigInteger remaining = value;
    while(true){
      BigInteger a = base.pow(exponent); //16^1 = 16
      BigInteger b = remaining.mod(a); //119 % 16 = 7 | 112 % 256 = 112
      BigInteger c = base.pow(exponent - 1);
      BigInteger d = b.divide(c);

      //b is smaller than base^exponent and c is base^(exponent-1), so d is
      //always smaller than the base and therefore a valid index of the dictionary.
      //(BigInteger.compareTo could be used to check this explicitly.)
      result.add(dictionary[d.intValue()]);
      remaining = remaining.subtract(b); //119 - 7 = 112 | 112 - 112 = 0

      //finished?
      if(remaining.equals(BigInteger.ZERO)){
        break;
      }

      exponent++;
    }

    //need to reverse it, since the start of the list contains the least significant values
    StringBuffer sb = new StringBuffer();
    for(int i = result.size()-1; i >= 0; i--){
      sb.append(result.get(i));
    }
    return sb.toString();
  }

  /**
   * decodes the given string from the base of the dictionary provided in the constructor.
   * @param str the string to decode.
   * @return the decoded number.
   */
  public BigInteger decode(String str) {

    //reverse it, coz its already reversed!
    char[] chars = new char[str.length()];
    str.getChars(0, str.length(), chars, 0);

    char[] chars2 = new char[str.length()];
    int i = chars2.length -1;
    for(char c : chars){
      chars2[i--] = c;
    }

    //for efficiency, make a map
    Map<Character, BigInteger> dictMap = new HashMap<Character, BigInteger>();
    int j = 0;
    for(char c : dictionary){
      dictMap.put(c, new BigInteger("" + j++));
    }

    BigInteger bi = BigInteger.ZERO;
    BigInteger base = new BigInteger("" + dictionary.length);
    int exponent = 0;
    for(char c : chars2){
      BigInteger a = dictMap.get(c);
      BigInteger b = base.pow(exponent).multiply(a);
      bi = bi.add(new BigInteger("" + b));
      exponent++;
    }

    return bi;

  }
}

JavaScript Implementation

Thanks to Even Simon, there's a JavaScript implementation. You will also find a PHP version there that implements the encode & decode functions as separate methods in a class.

/**
 *  Javascript AlphabeticID class
 *  (based on a script by Kevin van Zonneveld <kevin@vanzonneveld.net>)
 *
 *  Author: Even Simon <even.simon@gmail.com>
 *
 *  Description: Translates a numeric identifier into a short string and backwards.
 *
 *  Usage:
 *    var str = AlphabeticID.encode(9007199254740989); // str = 'fE2XnNGpF'
 *    var id = AlphabeticID.decode('fE2XnNGpF'); // id = 9007199254740989;
 **/

var AlphabeticID = {
  index:'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ',

  /**
   *  @function AlphabeticID.encode
   *  @description Encode a number into short string
   *  @param integer
   *  @return string
   **/
  encode:function(_number){
    if('undefined' == typeof _number){
      return null;
    }
    else if('number' != typeof(_number)){
      throw new Error('Wrong parameter type');
    }

    var ret = '';

    for(var i=Math.floor(Math.log(parseInt(_number))/Math.log(AlphabeticID.index.length));i>=0;i--){
      ret = ret + AlphabeticID.index.substr((Math.floor(parseInt(_number) / AlphabeticID.bcpow(AlphabeticID.index.length, i)) % AlphabeticID.index.length),1);
    }

    return ret.reverse();
  },

  /**
   *  @function AlphabeticID.decode
   *  @description Decode a short string and return number
   *  @param string
   *  @return integer
   **/
  decode:function(_string){
    if('undefined' == typeof _string){
      return null;
    }
    else if('string' != typeof _string){
      throw new Error('Wrong parameter type');
    }

    var str = _string.reverse();
    var ret = 0;

    for(var i=0;i<=(str.length - 1);i++){
      ret = ret + AlphabeticID.index.indexOf(str.substr(i,1)) * (AlphabeticID.bcpow(AlphabeticID.index.length, (str.length - 1) - i));
    }

    return ret;
  },

  /**
   *  @function AlphabeticID.bcpow
   *  @description Raise _a to the power _b
   *  @param float _a
   *  @param integer _b
   *  @return string
   **/
  bcpow:function(_a, _b){
    return Math.floor(Math.pow(parseFloat(_a), parseInt(_b)));
  }
};

/**
 *  @function String.reverse
 *  @description Reverse a string
 *  @return string
 **/
String.prototype.reverse = function(){
  return this.split('').reverse().join('');
};

C# Implementation

Thanks to Romas, there's a C# implementation.

Improved by rumble|strip

using System;

// Must be static: ReverseString below is an extension method.
static class ShortId
{
    public static readonly string Alphabet = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private static decimal BcPow(double a, double b)
    {
      return Math.Floor((decimal)Math.Pow(a, b));
    }

    public static ulong Decode(string value, int pad = 0)
    {
      value = value.ReverseString();
      var len = value.Length - 1;
      ulong result = 0;

      for (int t = len; t >= 0; t--)
      {
        var bcp = (ulong)BcPow(Alphabet.Length, len - t);
        result += (ulong)Alphabet.IndexOf(value[t]) * bcp;
      }

      if (pad > 0)
      {
        result -= (ulong)BcPow(Alphabet.Length, pad);
      }

      return result;
    }

    public static string Encode(byte[] value, int startIndex = 0, int pad = 0)
    {
      return Encode(BitConverter.ToUInt64(value, startIndex), pad);
    }

    public static string Encode(Guid guid, int pad = 0)
    {
      var bytes = guid.ToByteArray();

      var first = Encode(bytes, 0, pad);
      var second = Encode(bytes, 8, pad);

      return first + second;
    }

    public static string Encode(ulong value, int pad = 0)
    {
      var result = string.Empty;

      if (pad > 0)
      {
        value += (ulong)BcPow(Alphabet.Length, pad);
      }

      for (var t = (value != 0 ? Math.Floor(Math.Log(value, Alphabet.Length)) : 0); t >= 0; t--)
      {
        var bcp = (ulong)BcPow(Alphabet.Length, t);
        var a = ((ulong)Math.Floor((decimal)value / (decimal)bcp)) % (ulong)Alphabet.Length;
        result += Alphabet[(int)a];
        value  = value - (a * bcp);
      }

      return result.ReverseString();
    }

    private static string ReverseString(this string value)
    {
      char[] arr = value.ToCharArray();
      Array.Reverse(arr);
      return new string(arr);
    }
  }

Python Implementations

Thanks to wessite, there's a Python implementation.

ALPHABET = "bcdfghjklmnpqrstvwxyz0123456789BCDFGHJKLMNPQRSTVWXYZ"
BASE = len(ALPHABET)
MAXLEN = 6

def encode_id(n):
    # Offset the number so the result is at least MAXLEN characters long
    pad = MAXLEN - 1
    n = int(n) + BASE ** pad

    s = []
    # Highest power of BASE that fits in n (integer math avoids float rounding)
    t = 0
    while BASE ** (t + 1) <= n:
        t += 1
    while True:
        bcp = BASE ** t
        a = (n // bcp) % BASE
        s.append(ALPHABET[a:a+1])
        n = n - (a * bcp)
        t -= 1
        if t < 0: break

    return "".join(reversed(s))

def decode_id(n):
    n = "".join(reversed(n))
    s = 0
    l = len(n) - 1
    t = 0
    while True:
        bcpow = BASE ** (l - t)
        s = s + ALPHABET.index(n[t:t+1]) * bcpow
        t += 1
        if t > l: break

    # Remove the offset added by encode_id
    pad = MAXLEN - 1
    return s - BASE ** pad

Noah Miller contributed a version based on Wessite's, and changed it so it can use a passkey, and rolled it into one function:

import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"

def alphaID(idnum, to_num=False, pad_up=False, passkey=None):
  index = ALPHABET
  if passkey:
    # Shuffle the alphabet by sorting its characters on the passkey's hash digits
    i = list(index)
    key = passkey.encode('utf-8') if isinstance(passkey, str) else passkey
    passhash = hashlib.sha256(key).hexdigest()
    passhash = hashlib.sha512(key).hexdigest() if len(passhash) < len(index) else passhash
    p = list(passhash)[0:len(index)]
    index = ''.join(list(zip(*sorted(zip(p, i))))[1])

  base = len(index)

  if to_num:
    # Short string --> number
    idnum = idnum[::-1]
    out = 0
    length = len(idnum) - 1
    t = 0
    while True:
      bcpow = base ** (length - t)
      out = out + index.index(idnum[t:t+1]) * bcpow
      t += 1
      if t > length: break

    if pad_up:
      pad_up -= 1
      if pad_up > 0:
        out -= base ** pad_up
  else:
    # Number --> short string
    if pad_up:
      pad_up -= 1
      if pad_up > 0:
        idnum += base ** pad_up

    out = []
    # Highest power of base that fits in idnum (integer math avoids float rounding)
    t = 0
    while base ** (t + 1) <= idnum:
      t += 1
    while True:
      bcp = base ** t
      a = (idnum // bcp) % base
      out.append(index[a:a+1])
      idnum = idnum - (a * bcp)
      t -= 1
      if t < 0: break

    out = ''.join(out[::-1])

  return out

HaXe Implementation

Thanks to Andy Li, there's a HaXe implementation.

/**
 *  HaXe version of AlphabeticID
 *  Author: Andy Li <andy@onthewings.net>
 *  ported from...
 *
 *  Javascript AlphabeticID class
 *  Author: Even Simon <even.simon@gmail.com>
 *  which is based on a script by Kevin van Zonneveld <kevin@vanzonneveld.net>)
 *
 *  Description: Translates a numeric identifier into a short string and backwards.
 *  http://kevin.vanzonneveld.net/techblog/article/create_short_ids_with_php_like_youtube_or_tinyurl/
 **/

class AlphaID {
    static public var index:String = 'abcdefghijklmnopqrstuvwxyz0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ';

    static public function encode(_number:Int):String {
        var strBuf = new StringBuf();

        var i = 0;
        var end = Math.floor(Math.log(_number)/Math.log(index.length));
        while(i <= end) {
            strBuf.add(index.charAt((Math.floor(_number / bcpow(index.length, i++)) % index.length)));
        }

        return strBuf.toString();
    }

    static public function decode(_string:String):Int {
        var str = reverseString(_string);
        var ret = 0;

        var i = 0;
        var end = str.length - 1;
        while(i <= end) {
            ret += Std.int(index.indexOf(str.charAt(i)) * (bcpow(index.length, end-i)));
            ++i;
        }

        return ret;
    }

    inline static private function bcpow(_a:Float, _b:Float):Float {
        return Math.floor(Math.pow(_a, _b));
    }

    inline static private function reverseString(inStr:String):String {
        var ary = inStr.split("");
        ary.reverse();
        return ary.join("");
    }
}