The Memory Leak in Java 6's String.substring() Method

substring(start, end) is used all the time in Java programming, yet I had not realized that careless use of it can cause a memory leak.

 

The best way to understand substring() is to read its source code (JDK 6):

    /**
     * <blockquote><pre>
     * "hamburger".substring(4, 8) returns "urge"
     * "smiles".substring(1, 5) returns "mile"
     * </pre></blockquote>
     *
     * @param      beginIndex   the beginning index, inclusive.
     * @param      endIndex     the ending index, exclusive.
     * @return     the specified substring.
     * @exception  IndexOutOfBoundsException  if the
     *             <code>beginIndex</code> is negative, or
     *             <code>endIndex</code> is larger than the length of
     *             this <code>String</code> object, or
     *             <code>beginIndex</code> is larger than
     *             <code>endIndex</code>.
     */
    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > count) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        if (beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException(endIndex - beginIndex);
        }
        return ((beginIndex == 0) && (endIndex == count)) ? this :
            new String(offset + beginIndex, endIndex - beginIndex, value);
    }

 

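As an aside, this substring() source is a good example of how to write an API. It reminded me of an article by Lao Zhao (老赵): the way it validates arguments and handles exceptions follows a similar line of thinking.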

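Note that calling substring(i, i) (that is, beginIndex == endIndex) or substring(stringLength) (that is, beginIndex == the length of the string) does not throw an exception. Both return an empty string, because the call ends up as new String(offset + beginIndex, 0, value).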
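The edge cases above are easy to check. The following is a minimal sketch; the class name EmptySubstringDemo and its helper methods are made up for illustration and are not part of the JDK.

```java
class EmptySubstringDemo {
    // substring(i, i) is empty for any valid index i (0 <= i <= s.length()).
    static String emptyAt(String s, int i) {
        return s.substring(i, i);
    }

    // substring(s.length()) starts at the very end, so it is empty as well.
    static String tailAtLength(String s) {
        return s.substring(s.length());
    }

    // Going one position past the end is rejected with an exception.
    static boolean throwsPastEnd(String s) {
        try {
            s.substring(s.length() + 1);
            return false;
        } catch (StringIndexOutOfBoundsException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String s = "hamburger";
        System.out.println("[" + emptyAt(s, 4) + "]");    // prints []
        System.out.println("[" + tailAtLength(s) + "]");  // prints []
        System.out.println(throwsPastEnd(s));             // prints true
    }
}
```

The valid range for beginIndex therefore includes the length of the string itself, which is why code that slices "everything after position i" does not need a special case for i == length.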

 

Back to the main topic. The string is actually created by the String(int, int, char[]) constructor, whose source is:

    // Package private constructor which shares value array for speed.
    String(int offset, int count, char value[]) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

 

A string in Java 6 is essentially defined by three private fields:

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
{
    /** The value is used for character storage. */
    private final char value[];

    /** The offset is the first index of the storage that is used. */
    private final int offset;

    /** The count is the number of characters in the String. */
    private final int count;
}

 

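When memory is allocated for a string, the char array stores the characters, offset is 0, and count is the length of the string. The problem is that when substring(start, end) calls the String(int, int, char[]) constructor, it produces the substring merely by changing offset and count, while the value[] array of the substring still refers to the array of the original string. Suppose the original string s takes 1 GB and all we need is a small piece such as s.substring(1, 10). Because the value[] array inside the substring still points to the 1 GB array of the original string, that array cannot be reclaimed by the GC for as long as the small substring is reachable, and the result is a memory leak.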
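Modern JDKs no longer behave this way, so the effect cannot be observed with java.lang.String itself. The sketch below imitates the Java 6 layout with a hypothetical class, SharedString, invented here purely for illustration. Its fields and private constructor mirror the ones quoted above, while retainedChars() and sharesArrayWith() are made-up helpers for inspecting the sharing.

```java
import java.util.Arrays;

// Hypothetical stand-in for the Java 6 String layout; not a JDK class.
class SharedString {
    private final char[] value;  // backing storage, possibly shared
    private final int offset;    // first used index in value
    private final int count;     // number of characters in this string

    SharedString(String s) {
        this(0, s.length(), s.toCharArray());
    }

    // Shares the array instead of copying it, like the Java 6 constructor.
    private SharedString(int offset, int count, char[] value) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    SharedString substring(int beginIndex, int endIndex) {
        if (beginIndex < 0 || endIndex > count || beginIndex > endIndex) {
            throw new StringIndexOutOfBoundsException();
        }
        // Only offset and count change; value is the same array object.
        return new SharedString(offset + beginIndex, endIndex - beginIndex, value);
    }

    int length() { return count; }

    // Size of the array this object keeps reachable (illustrative helper).
    int retainedChars() { return value.length; }

    // True when both objects point at the same backing array (illustrative helper).
    boolean sharesArrayWith(SharedString other) { return value == other.value; }

    @Override
    public String toString() { return new String(value, offset, count); }

    public static void main(String[] args) {
        char[] big = new char[1000000];
        Arrays.fill(big, 'x');
        SharedString s = new SharedString(new String(big));
        SharedString small = s.substring(1, 10);
        System.out.println(small.length());            // prints 9
        System.out.println(small.retainedChars());     // prints 1000000
        System.out.println(small.sharesArrayWith(s));  // prints true
    }
}
```

Even after every reference to s is dropped, small alone keeps the full one-million-character array reachable, which is the leak in miniature.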

 

Why was it designed this way? Because String is immutable, sharing a single character array brings the following benefits:

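Calling substring() does not copy the array but reuses value[], and substring() runs in constant time rather than linear time, which improves performance. (This is what the comment in the second code listing means: "shares value array for speed".)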

The drawback is the potential memory leak. (In fact, this was reported long ago in the Sun/Oracle bug database: http://bugs.sun.com/view_bug.do?bug_id=4513622)

 

How can the problem be avoided? There is a workaround: use a constructor that copies the relevant part of the array:

    /**
     * Initializes a newly created {@code String} object so that it represents
     * the same sequence of characters as the argument; in other words, the
     * newly created string is a copy of the argument string. Unless an
     * explicit copy of {@code original} is needed, use of this constructor is
     * unnecessary since Strings are immutable.
     *
     * @param  original
     *         A {@code String}
     */
    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

    // smallStr no longer holds the 1 GB value[] array
    String smallStr = new String(s.substring(1,10));

 

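The constructor above copies the relevant range of the array into v and then assigns v to the new string's value field, so the new string no longer references the large array and the leak is avoided.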
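A typical place for this workaround is code that keeps a short piece of each large record for a long time. The following is a minimal sketch, assuming a hypothetical helper extractKeys that treats the first few characters of each record as its key. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class TrimmedKeys {
    // Hypothetical helper: keeps only the first keyLength characters of each record.
    static List<String> extractKeys(List<String> records, int keyLength) {
        List<String> keys = new ArrayList<String>();
        for (String record : records) {
            int end = Math.min(keyLength, record.length());
            // On Java 6, new String(...) copies just the characters of the
            // substring, so the stored key does not pin the record's array.
            keys.add(new String(record.substring(0, end)));
        }
        return keys;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("user-42:payload", "ab");
        System.out.println(extractKeys(records, 7));  // prints [user-42, ab]
    }
}
```

On JDKs where substring() already copies, the extra new String(...) is redundant but harmless, so the same code behaves correctly on both old and new runtimes.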

 

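In Java 7 (from update 6 onward), the implementation of String changed: substring() no longer shares the array and instead makes a conventional copy. This eliminates the memory leak, but the running time goes from constant to linear: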

    public String substring(int beginIndex, int endIndex) {
        if (beginIndex < 0) {
            throw new StringIndexOutOfBoundsException(beginIndex);
        }
        if (endIndex > value.length) {
            throw new StringIndexOutOfBoundsException(endIndex);
        }
        int subLen = endIndex - beginIndex;
        if (subLen < 0) {
            throw new StringIndexOutOfBoundsException(subLen);
        }
        return ((beginIndex == 0) && (endIndex == value.length)) ? this
                : new String(value, beginIndex, subLen);
    }

    /**
     * Allocates a new {@code String} that contains characters from a subarray
     * of the character array argument. The {@code offset} argument is the
     * index of the first character of the subarray and the {@code count}
     * argument specifies the length of the subarray. The contents of the
     * subarray are copied; subsequent modification of the character array does
     * not affect the newly created string.
     *
     * @param  value
     *         Array that is the source of characters
     *
     * @param  offset
     *         The initial offset
     *
     * @param  count
     *         The length
     *
     * @throws  IndexOutOfBoundsException
     *          If the {@code offset} and {@code count} arguments index
     *          characters outside the bounds of the {@code value} array
     */
    public String(char value[], int offset, int count) {
        if (offset < 0) {
            throw new StringIndexOutOfBoundsException(offset);
        }
        if (count < 0) {
            throw new StringIndexOutOfBoundsException(count);
        }
        // Note: offset or count might be near -1>>>1.
        if (offset > value.length - count) {
            throw new StringIndexOutOfBoundsException(offset + count);
        }
        this.value = Arrays.copyOfRange(value, offset, offset+count);
    }
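
The key call in this constructor is Arrays.copyOfRange, which allocates a fresh array holding only the requested range. The sketch below shows that the copy is independent of its source. The class CopyOnSubstringDemo and the helper copyRange are names made up for this illustration.

```java
import java.util.Arrays;

class CopyOnSubstringDemo {
    // Copies count characters starting at offset, using the same
    // Arrays.copyOfRange call as the constructor shown above.
    static char[] copyRange(char[] source, int offset, int count) {
        return Arrays.copyOfRange(source, offset, offset + count);
    }

    public static void main(String[] args) {
        char[] source = "hamburger".toCharArray();
        char[] copy = copyRange(source, 4, 4);

        // Changing the source afterwards does not affect the copy.
        source[4] = 'X';

        System.out.println(new String(copy));  // prints urge
        System.out.println(copy.length);       // prints 4
    }
}
```

Because the new array holds exactly count characters, a small substring keeps only its own few characters alive, at the cost of an O(count) copy on every call.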

 

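This constructor copies the array every time, unlike the Java 6 implementation. It is hard to say which approach is better: sharing gives a constant-time substring() at the risk of retaining large arrays, while copying costs linear time but never retains more than it needs.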

I have heard that a data structure called the rope can handle strings more efficiently. I should take a closer look at it.

 

References:

http://javarevisited.blogspot.hk/2011/10/how-substring-in-java-works.html

http://eyalsch.wordpress.com/2009/10/27/stringleaks/

http://blog.zhaojie.me/2013/03/string-and-rope-1-string-in-dotnet-and-java.html

http://www.transylvania-jug.org/archives/5530
