Base64 data encoding: rules, character tables, and implementation code

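Base64 is an encoding that represents 8-bit binary data (including bytes that cannot be displayed or printed) using 64 printable characters.

The full definition of Base64 can be found in the PEM (Privacy Enhancement for Internet Electronic Mail) and MIME (Multipurpose Internet Mail Extensions) specifications.

Base64 stores 3 bytes of binary data in 4 characters, so the encoded length grows by 1/3.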
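As a quick check of the 4-for-3 ratio, the padded output length can be computed from the input length alone. This is only a sketch: the helper name `encoded_length` is illustrative and not part of any library, and line breaks are not counted.

```python
import base64


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input (no line breaks)."""
    # Every started group of 3 input bytes yields 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


for n in (0, 1, 2, 3, 4, 300):
    # Compare the formula against the standard library encoder.
    assert encoded_length(n) == len(base64.b64encode(bytes(n)))
    print(n, encoded_length(n))
```

For inputs that are not a multiple of 3 bytes the growth is slightly more than 1/3, because the last group is padded to a full 4 characters.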

Base64 uses the following characters:

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

or

ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

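Encoding rules

Base64 converts every three 8-bit bytes into four 6-bit groups (3*8 = 4*6 = 24), then prepends two high-order 0 bits to each 6-bit group (2⁶ = 64 possible values), producing four 8-bit bytes.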

Before conversion	After conversion	Decimal	Base64
10101101 10111010 01110110	00101011 00011011 00101001 00110110	43 27 41 54	r b p 2
01110011 00110001 00110011	00011100 00110011 00000100 00110011	28 51 04 51	c z E z

These 4 numbers are used as indices into the character table; the 4 characters found there form the encoded string.
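The conversion of one 3-byte group can be sketched with plain bit operations. The function name `encode_group` is made up for this illustration; the two calls reproduce the two rows of the table above.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_group(b0: int, b1: int, b2: int) -> str:
    """Encode exactly three bytes into four Base64 characters."""
    # Join the three bytes into one 24-bit integer.
    block = (b0 << 16) | (b1 << 8) | b2
    # Cut it into four 6-bit indices, most significant first.
    indices = [(block >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    # Look each index up in the alphabet.
    return "".join(ALPHABET[i] for i in indices)


print(encode_group(0b10101101, 0b10111010, 0b01110110))  # rbp2
print(encode_group(0b01110011, 0b00110001, 0b00110011))  # czEz
```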

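If the length of the binary data is not a multiple of 3, 1 or 2 bytes are left over at the end. Base64 fills the final group with zero bits (equivalent to appending \x00 bytes), then appends = to the encoded output: two = signs for 1 leftover byte, one = sign for 2 leftover bytes.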
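The padding rule is easy to observe with the standard library. In this small sketch, `padding_needed` is an illustrative helper name, not a library function.

```python
import base64


def padding_needed(n_bytes: int) -> int:
    """Number of '=' characters appended for an input of n_bytes."""
    return (3 - n_bytes % 3) % 3


for data in (b"abc", b"ab", b"a"):
    encoded = base64.b64encode(data).decode("ascii")
    # Prints YWJj, YWI= and YQ== together with the predicted padding count.
    print(data, encoded, padding_needed(len(data)))
```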

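MIME limits the line length of Base64-encoded data to 76 characters. MIME inherited its encoding from PEM (Privacy Enhancement for Internet Electronic Mail), but PEM uses a line length of 64 characters. Both the MIME and the PEM limits stem from restrictions in SMTP.
Unless the relevant specification explicitly directs the encoder to add line breaks after a specific number of characters, implementations must not add line breaks to encoded data.
Padding the encoded data with = is required when no assumption can be made about the length of the transported data; a specification may drop it when the length is known. Decoders may also ignore excess trailing pad characters such as ===.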
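Python's standard library happens to offer both behaviours, which makes the difference easy to see: `base64.b64encode` never inserts line breaks, while `base64.encodebytes` produces MIME-style output with a newline after every 76 characters. The 100-byte sample input is arbitrary.

```python
import base64

data = bytes(range(100))

single_line = base64.b64encode(data)   # no line breaks are added
mime_style = base64.encodebytes(data)  # newline after every 76 characters

lines = mime_style.splitlines()
print(len(single_line), [len(line) for line in lines])

# Removing the line breaks gives back the single-line form.
assert b"".join(lines) == single_line
```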

Base64 character table

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	+
12	M	29	d	46	u	63	/
13	N	30	e	47	v	(pad)	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y

URL- and filename-safe Base64 character table

Value	Encoding	Value	Encoding	Value	Encoding	Value	Encoding
0	A	17	R	34	i	51	z
1	B	18	S	35	j	52	0
2	C	19	T	36	k	53	1
3	D	20	U	37	l	54	2
4	E	21	V	38	m	55	3
5	F	22	W	39	n	56	4
6	G	23	X	40	o	57	5
7	H	24	Y	41	p	58	6
8	I	25	Z	42	q	59	7
9	J	26	a	43	r	60	8
10	K	27	b	44	s	61	9
11	L	28	c	45	t	62	-
12	M	29	d	46	u	63	_
13	N	30	e	47	v	(pad)	=
14	O	31	f	48	w
15	P	32	g	49	x
16	Q	33	h	50	y
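Both tables can be generated rather than typed out, since they differ only in the last two entries. The sketch below builds the two alphabets as strings (the constant names are illustrative) and converts standard output to the URL-safe form with a byte translation table.

```python
import base64
import string

STANDARD = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
URL_SAFE = STANDARD[:62] + "-_"

# Translation table that maps "+" to "-" and "/" to "_".
TO_URL_SAFE = bytes.maketrans(b"+/", b"-_")

raw = b"i\xb7\x1d\xfb\xef\xff"
standard = base64.b64encode(raw)

print(STANDARD.index("r"), URL_SAFE[62], URL_SAFE[63])
print(standard, standard.translate(TO_URL_SAFE))

# The translated text matches what the library's URL-safe encoder produces.
assert standard.translate(TO_URL_SAFE) == base64.urlsafe_b64encode(raw)
```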

MIME

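In MIME email, Base64 encodes a sequence of binary bytes as text made up of ASCII characters.

In email, MIME (RFC 2045) limits encoded lines to 76 characters, each followed by a CRLF line break. Counting each line break as one character, the encoded data is about 135.1% of the original length (4/3 × 77/76); counting the two-byte CRLF, it is about 136.8%.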
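The overhead figure can be reproduced with a little arithmetic. The function below is a sketch for long inputs only and ignores the padding of the final group. Its name and parameters are illustrative. A line break counted as one character gives roughly 135.1%, and a two-byte CRLF gives roughly 136.8%.

```python
def mime_overhead(line_length: int = 76, newline_bytes: int = 2) -> float:
    """Ratio of encoded size to raw size for long inputs."""
    # 4 output characters per 3 input bytes, plus one line break per full line.
    return (4 / 3) * (line_length + newline_bytes) / line_length


print(round(mime_overhead(newline_bytes=1) * 100, 1))  # 135.1
print(round(mime_overhead(newline_bytes=2) * 100, 1))  # 136.8
```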

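Special characters

Standard Base64 is not well suited to being placed directly in a URL, because a URL encoder converts the / and + characters into the %XX form, and the % sign then needs yet another conversion when the value is stored in a database (ANSI SQL uses % as a wildcard).

One option is Base62. Another is the URL-safe variant of Base64, which replaces the + and / of standard Base64 with - and _ respectively and usually omits the trailing = padding. This avoids the conversions during URL encoding and decoding and during database storage, does not increase the data length, and gives identifiers in databases, forms, and similar places a uniform format.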
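A URL-friendly token can be produced with the standard library's URL-safe encoder. Stripping the trailing = is a common convention rather than something the library does itself, and `urlsafe_token` is a name invented for this sketch.

```python
import base64


def urlsafe_token(data: bytes) -> str:
    """URL-safe Base64 text with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


print(urlsafe_token(b"i\xb7\x1d\xfb\xef\xff"))  # abcd--__
print(urlsafe_token(b"abcd"))                    # YWJjZA
```

A decoder receiving such a token has to restore the padding before calling a strict decoder.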

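Note:

In regular expressions, + is a quantifier and / is the pattern delimiter in several languages, so both may need escaping.

Using + and / in programming-language identifiers or keywords causes errors, because they are operators in most languages.

Traditional text search and indexing tools treat + and / as word breaks.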
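The regular-expression point can be illustrated briefly. An unescaped + acts as a quantifier, so a literal search needs `re.escape`; inside a character class, + and / are ordinary characters. The validation pattern below is a sketch for padded, standard-alphabet text only, and `looks_like_base64` is an illustrative name.

```python
import re

# Unescaped, "a+b" means "one or more a, then b" and does not match the text "a+b".
assert re.search("a+b", "a+b") is None
# Escaping the pattern makes "+" a literal character.
assert re.search(re.escape("a+b"), "a+b") is not None

# Inside [...] the characters "+" and "/" need no escaping.
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def looks_like_base64(text: str) -> bool:
    """True if text is well-formed, padded, standard-alphabet Base64."""
    return BASE64_RE.fullmatch(text) is not None


print(looks_like_base64("YWJjZA=="), looks_like_base64("YWJjZA"))
```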

Python implementation

Every character in string must be within the ASCII character set.

def base(string: str) -> str:
    encoded, old_string, new_string = "", "", []
    base64_list = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M", "N", "O", "P",
                   "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z", "a", "b", "c", "d", "e", "f",
                   "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v",
                   "w", "x", "y", "z", "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "+", "/"]

    # Convert each character to its 8-bit binary form, zero-padded on the left.
    for i in string:
        old_string += format(ord(i), "08b")

    # Split the bit string into groups of 6; pad the last group with zeros on the right.
    for j in range(0, len(old_string), 6):
        new_string.append(old_string[j:j + 6].ljust(6, "0"))

    # Look up the character for each 6-bit value in base64_list and join them.
    for bits in new_string:
        encoded += base64_list[int(bits, 2)]

    # Append "=" padding according to how many bytes were left over.
    if len(string) % 3 == 1:
        encoded += "=="
    elif len(string) % 3 == 2:
        encoded += "="
    return encoded
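The listing above only encodes. A matching decoder can be sketched with the same bit-string approach. The name `unbase` is hypothetical, and the function assumes padded, standard-alphabet input that decodes to ASCII text.

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def unbase(encoded: str) -> str:
    """Decode padded, standard-alphabet Base64 text back to an ASCII string."""
    if len(encoded) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    body = encoded.rstrip("=")
    if len(encoded) - len(body) > 2:
        raise ValueError("at most two '=' characters are allowed")
    # Turn every character into its 6-bit index, written as binary digits.
    # str.index raises ValueError for characters outside the alphabet.
    bits = "".join(format(ALPHABET.index(ch), "06b") for ch in body)
    # Drop the filler bits added by the encoder, keeping whole bytes only.
    usable = len(bits) - len(bits) % 8
    return "".join(chr(int(bits[i:i + 8], 2)) for i in range(0, usable, 8))


print(unbase("YWJjZA=="))  # abcd
```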

JavaScript implementation

// Create the namespace objects if they do not exist yet.
if (!Shotgun)
    var Shotgun = {};
if (!Shotgun.Js)
    Shotgun.Js = {};
Shotgun.Js.Base64 = {
    // Standard Base64 alphabet; the array index is the 6-bit value.
    _table: [
        'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
        'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
        'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
        'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/'
    ],
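    // Encode an array of byte values (0-255) into a Base64 string.
    encode: function (bin) {
        var codes = [];
        // Number of bytes left over after the last full 3-byte group.
        var un = bin.length % 3;
        // Work on a zero-padded copy so the caller's array is not modified.
        var data = Array.prototype.slice.call(bin);
        if (un == 1)
            data.push(0, 0);
        else if (un == 2)
            data.push(0);
        for (var i = 2; i < data.length; i += 3) {
            // Pack three bytes into one 24-bit integer.
            var c = data[i - 2] << 16;
            c |= data[i - 1] << 8;
            c |= data[i];
            // Emit four 6-bit indices, most significant first.
            codes.push(this._table[c >> 18 & 0x3f]);
            codes.push(this._table[c >> 12 & 0x3f]);
            codes.push(this._table[c >> 6 & 0x3f]);
            codes.push(this._table[c & 0x3f]);
        }
        // Replace the characters produced by the filler bytes with "=".
        if (un >= 1)
            codes[codes.length - 1] = "=";
        if (un == 1)
            codes[codes.length - 2] = "=";
        return codes.join("");
    },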
    // Decode a Base64 string into an array of byte values (0-255).
    decode: function (base64Str) {
        var i = 0;
        var bin = [];
        // x counts the characters in the current group of 4; eq counts "=" signs.
        var x = 0, code = 0, eq = 0;
        while (i < base64Str.length) {
            var c = base64Str.charAt(i++);
            var idx = this._table.indexOf(c);
            if (idx == -1) {
                switch (c) {
                    // Padding contributes zero bits; remember how many were seen.
                    case '=': idx = 0; eq++; break;
                    // Skip whitespace, including line breaks.
                    case ' ':
                    case '\n':
                    case "\r":
                    case '\t':
                        continue;
                    default:
                        throw new Error("Base64 error: invalid character: " + c);
                }
            }
            // Once padding has started, only more padding may follow.
            if (eq > 0 && c != '=')
                throw new Error("Base64 error: malformed encoding");
            // Collect four 6-bit values into one 24-bit integer.
            code = code << 6 | idx;
            if (++x != 4)
                continue;
            // Split the 24-bit integer back into three bytes.
            bin.push(code >> 16);
            bin.push(code >> 8 & 0xff);
            bin.push(code & 0xff);
            code = x = 0;
        }
        // A trailing incomplete group means the input length was wrong.
        if (x != 0)
            throw new Error("Base64 error: invalid encoded data length");
        // Remove the bytes that came from padding.
        if (eq == 1)
            bin.pop();
        else if (eq == 2) {
            bin.pop();
            bin.pop();
        } else if (eq > 2)
            throw new Error("Base64 error: malformed encoding");
        return bin;
    }
};

Encoding examples

Input data	Hexadecimal	Binary	6-bit groups	Decimal	Output
0x14fb9c03d97e	1 4 f b 9 c 0 3 d 9 7 e	00010100 11111011 10011100 00000011 11011001 01111110	000101 001111 101110 011100 000000 111101 100101 111110	5 15 46 28 0 61 37 62	F P u c A 9 l +
0x14fb9c03d9	1 4 f b 9 c 0 3 d 9	00010100 11111011 10011100 00000011 11011001	000101 001111 101110 011100 000000 111101 100100	5 15 46 28 0 61 36	F P u c A 9 k =
0x14fb9c03	1 4 f b 9 c 0 3	00010100 11111011 10011100 00000011	000101 001111 101110 011100 000000 110000	5 15 46 28 0 48	F P u c A w = =
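The rows of this table can be recomputed from the hexadecimal input. The sketch below derives the 6-bit values by hand and prints them next to the standard library's result; `six_bit_values` is an illustrative helper name.

```python
import base64


def six_bit_values(raw: bytes) -> list:
    """Return the 6-bit values Base64 derives from raw, zero-filling the last group."""
    bits = "".join(format(byte, "08b") for byte in raw)
    groups = [bits[i:i + 6].ljust(6, "0") for i in range(0, len(bits), 6)]
    return [int(group, 2) for group in groups]


for hex_input in ("14fb9c03d97e", "14fb9c03d9", "14fb9c03"):
    raw = bytes.fromhex(hex_input)
    print(hex_input, six_bit_values(raw), base64.b64encode(raw).decode("ascii"))
```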

Usage examples

Quick interactive tests in an IDE shell:

# Python2.7
>>> import base64
>>> base64.b64encode("binary\x00string")
'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode("YmluYXJ5AHN0cmluZw==")
'binary\x00string'
 
# Python3.6
>>> import base64
>>> base64.b64encode("binary\x00string".encode("utf-8"))
b'YmluYXJ5AHN0cmluZw=='
>>> base64.b64decode(b"YmluYXJ5AHN0cmluZw==")
b'binary\x00string'
 
# Python3.6
>>> base64.b64encode("简体中文".encode("utf-8"))
b'566A5L2T5Lit5paH'
>>> base64.b64decode(b"566A5L2T5Lit5paH")
b'\xe7\xae\x80\xe4\xbd\x93\xe4\xb8\xad\xe6\x96\x87'
>>> base64.b64decode(b"566A5L2T5Lit5paH").decode("utf-8")
'简体中文'
 
>>> base64.b64encode("简体中文".encode("gb18030"))
b'vPLM5dbQzsQ='

# Python2.7
>>> base64.b64encode("i\xb7\x1d\xfb\xef\xff")
'abcd++//'
>>> base64.urlsafe_b64encode("i\xb7\x1d\xfb\xef\xff")
'abcd--__'
>>> base64.urlsafe_b64decode("abcd--__")
'i\xb7\x1d\xfb\xef\xff'

Because the = character can appear in Base64 output but is ambiguous in URLs and cookies, the trailing = padding can be removed:

# Python2.7
>>> base64.b64decode("YWJjZA==")
'abcd'
>>> base64.b64decode("YWJjZA")
Traceback (most recent call last):
  ...
TypeError: Incorrect padding
>>> s = "YWJjZA"
>>> base64.b64decode(s + "=" * (-len(s) % 4))  # restore the stripped padding first
'abcd'
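The standard library's base64 module has no helper that restores stripped padding, so one has to be written by hand. A minimal Python 3 sketch, with `safe_b64decode` as a user-defined name:

```python
import base64


def safe_b64decode(text: str) -> bytes:
    """Decode Base64 text whose trailing '=' padding may have been stripped."""
    # Add back as many "=" as needed to make the length a multiple of 4.
    padded = text + "=" * (-len(text) % 4)
    return base64.b64decode(padded)


print(safe_b64decode("YWJjZA"))    # b'abcd'
print(safe_b64decode("YWJjZA=="))  # b'abcd'
```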

# Python3.6
>>> string = "Man is distinguished, not only by his reason, but by this singular passion from other animals, which is a lust of the mind, that by a perseverance of delight in the continued and indefatigable generation of knowledge, exceeds the short vehemence of any carnal pleasure."
>>> from base64 import encodebytes
>>> print(encodebytes(string.encode("utf-8")).decode("ascii"), end="")  # encodebytes wraps at 76 characters
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlz
IHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2Yg
dGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGlu
dWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRo
ZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

See also:

Domain Name System Security Extensions

MIME (Multipurpose Internet Mail Extensions)

The Base16, Base32, and Base64 Data Encodings

PEM (Privacy Enhancement for Internet Electronic Mail)