分享代码系列——parseInt(包含java和c语言的atoi方法）

jdk中的Integer类是int对象的包装类，正常的Integer占用内存开销要比int大，比例大概是1:4 。今天分享的代码是Integer类中的静态方法parseInt(String, int)。这个方法众所周知，甚至在我们一开始学习编程时就尝试的写过这样的代码，一个正常的思路：遍历输入的字符数组（java的字符串就是一个字符数组），然后parse每个char，依据参数给定的进制，判断每个char是否满足，满足则继续，否则抛出异常或中断，直到处理完毕所有字符，返回结果。

那么我们看看jdk给出的实现：

    public static int parseInt(String s, int radix)
        throws NumberFormatException
    {
        if (s == null) {
            throw new NumberFormatException("null");
        }

    if (radix < Character.MIN_RADIX) {
        throw new NumberFormatException("radix " + radix +
                        " less than Character.MIN_RADIX");
    }

    if (radix > Character.MAX_RADIX) {
        throw new NumberFormatException("radix " + radix +
                        " greater than Character.MAX_RADIX");
    }

    int result = 0;
    boolean negative = false;
    int i = 0, max = s.length();
    int limit;
    int multmin;
    int digit;

    if (max > 0) {
        if (s.charAt(0) == '-') {
        negative = true;
        limit = Integer.MIN_VALUE;
        i++;
        } else {
        limit = -Integer.MAX_VALUE;
        }
        multmin = limit / radix;
        if (i < max) {
        digit = Character.digit(s.charAt(i++),radix);
        if (digit < 0) {
            throw NumberFormatException.forInputString(s);
        } else {
            result = -digit;
        }
        }
        while (i < max) {
        // Accumulating negatively avoids surprises near MAX_VALUE
        digit = Character.digit(s.charAt(i++),radix);
        if (digit < 0) {
            throw NumberFormatException.forInputString(s);
        }
        if (result < multmin) {
            throw NumberFormatException.forInputString(s);
        }
        result *= radix;
        if (result < limit + digit) {
            throw NumberFormatException.forInputString(s);
        }
        result -= digit;
        }
    } else {
        throw NumberFormatException.forInputString(s);
    }
    if (negative) {
        if (i > 1) {
        return result;
        } else {    /* Only got "-" */
        throw NumberFormatException.forInputString(s);
        }
    } else {
        return -result;
    }
    }

过程就是按照思路来的，但是更全面一些，首先做一些参数检查，然后定义了局部变量用于计算：result是对应的int结果，negative对应是否是负数的判断，i是遍历用的索引指针，max代表字符串的长度，limit是合法数字的上限（下限），digit是当前扫描到的字符对应的数字，multmin是在做乘法计算时能走到的合法下限。

严谨是这段程序最大的特点，因为有符号int的上下限是-2147483648~2147483647，可见负数表达的范围比正数多一个，这样就好理解为什么在开头要把limit全部表达为负数（下限），这样的操作减少了后续的判断，可以一步到位，相当于二者选择取其大一样，大的包含了小的。同理，那么multmin也就是负数了，而且可以认为是只和进制参数radix有关系。接着每个char的扫描计算digit利用到了Character.digit(char,int) 方法，这个方法就是在调用CharacterDataLatin1.digit(codePoint, radix) 方法，而这个新的方法其实只是去静态数组中取个映射而已。最后当顺利的执行完while循环后，result结果也就计算好了。

作为程序设计人员，我最初接触的语言是C++，当初用到的库函数是atoi，那么我们看看atoi的库标准实现：

int
atoi(str)
	const char *str;
{
	_DIAGASSERT(str != NULL);

	return((int)strtol(str, (char **)NULL, 10));
}

其中调用了strtol方法，参数传递的radix是10，也就是说我们常用的atoi是默认转化字符串到10进制的。其中开始时还进行了一个trim的操作，而且支持16进制的0x开头，可谓完全的尽善尽美啊。

strtol方法：

#define	_FUNCNAME	strtol
#define	__INT		long
#define	__INT_MIN	LONG_MIN
#define	__INT_MAX	LONG_MAX

__INT
_FUNCNAME(const char *nptr, char **endptr, int base)
{
	const char *s;
	__INT acc, cutoff;
	char c;
	int i, neg, any, cutlim;

	_DIAGASSERT(nptr != NULL);
	/* endptr may be NULL */

	/* check base value */
	if (base && (base < 2 || base > 36)) {
		errno = EINVAL;
		return(0);
	}

	/*
	 * Skip white space and pick up leading +/- sign if any.
	 * If base is 0, allow 0x for hex and 0 for octal, else
	 * assume decimal; if base is already 16, allow 0x.
	 */
	s = nptr;
	do {
		c = *s++;
	} while (isspace(c));
	if (c == '-') {
		neg = 1;
		c = *s++;
	} else {
		neg = 0;
		if (c == '+')
			c = *s++;
	}
	if ((base == 0 || base == 16) &&
	    c == '0' && (*s == 'x' || *s == 'X')) {
		c = s[1];
		s += 2;
		base = 16;
	}
	if (base == 0)
		base = c == '0' ? 8 : 10;

	/*
	 * Compute the cutoff value between legal numbers and illegal
	 * numbers.  That is the largest legal value, divided by the
	 * base.  An input number that is greater than this value, if
	 * followed by a legal input character, is too big.  One that
	 * is equal to this value may be valid or not; the limit
	 * between valid and invalid numbers is then based on the last
	 * digit.  For instance, if the range for longs is
	 * [-2147483648..2147483647] and the input base is 10,
	 * cutoff will be set to 214748364 and cutlim to either
	 * 7 (neg==0) or 8 (neg==1), meaning that if we have accumulated
	 * a value > 214748364, or equal but the next digit is > 7 (or 8),
	 * the number is too big, and we will return a range error.
	 *
	 * Set any if any `digits' consumed; make it negative to indicate
	 * overflow.
	 */
	cutoff = neg ? __INT_MIN : __INT_MAX;
	cutlim = (int)(cutoff % base);
	cutoff /= base;
	if (neg) {
		if (cutlim > 0) {
			cutlim -= base;
			cutoff += 1;
		}
		cutlim = -cutlim;
	}
	for (acc = 0, any = 0;; c = *s++) {
		if (!isascii(c))
			break;
		if (isdigit(c))
			i = c - '0';
		else if (isalpha(c))
			i = c - (isupper(c) ? 'A' - 10 : 'a' - 10);
		else
			break;
		if (i >= base)
			break;
		if (any < 0)
			continue;
		if (neg) {
			if (acc < cutoff || (acc == cutoff && i > cutlim)) {
				any = -1;
				acc = __INT_MIN;
				errno = ERANGE;
			} else {
				any = 1;
				acc *= base;
				acc -= i;
			}
		} else {
			if (acc > cutoff || (acc == cutoff && i > cutlim)) {
				any = -1;
				acc = __INT_MAX;
				errno = ERANGE;
			} else {
				any = 1;
				acc *= base;
				acc += i;
			}
		}
	}
	if (endptr != 0)
		/* LINTED interface specification */
		*endptr = __DECONST(char *, (any ? s - 1 : nptr));
	return(acc);
}

当然，类似的代码还有很多，这里只列出了两大语言的库实现，总体思路是一致的，当我们设计api时，这种编程思路和风格以及功能的考虑是我们需要学习的。

下面这两篇stackoverflow的问答给出了一些比较全面的c风格代码，可以参考，这里不贴全文只给link：

http://stackoverflow.com/questions/194465/how-to-parse-a-string-to-an-int-in-c

http://stackoverflow.com/questions/4442658/c-parse-int-from-string

参考文献：

jdk文档及源码

c库函数源码及文档

分享代码系列——parseInt(包含java和c语言的atoi方法）

分享代码系列——parseInt(包含java和c语言的atoi方法）

你可能感兴趣的:(分享代码系列——parseInt(包含java和c语言的atoi方法）)