基友前两天参加了阿里的实习生面试,问了个问题,就是关于字符串的子串搜索的问题。想想实现方式无非就是两层循环,但是 java 中是有现成实现的,于是我就去查查源码,看看 java 语言怎么实现这个的,发现也就是差不多的意思。
java.lang 包中 String 类 有几个 indexOf() 函数,我要寻找的是 indexOf(String str) 这个的具体实现,发现了
public int indexOf(String str) {
return indexOf(str, 0);
}
public int indexOf(String str, int fromIndex) {
return indexOf(value, offset, count, str.value, str.offset, str.count,
fromIndex);
}这个调用应该就是算法的实现了,继续 F3
/**
* Code shared by String and StringBuffer to do searches. The source is the
* character array being searched, and the target is the string being
* searched for.
*
* @param source
* the characters being searched.
* @param sourceOffset
* offset of the source string.
* @param sourceCount
* count of the source string.
* @param target
* the characters being searched for.
* @param targetOffset
* offset of the target string.
* @param targetCount
* count of the target string.
* @param fromIndex
* the index to begin searching from.
*/
static int indexOf(char[] source, int sourceOffset, int sourceCount,
char[] target, int targetOffset, int targetCount, int fromIndex) {
if (fromIndex >= sourceCount) {
return (targetCount == 0 ? sourceCount : -1);
}
if (fromIndex < 0) {
fromIndex = 0;
}
if (targetCount == 0) {
return fromIndex;
}
char first = target[targetOffset];
int max = sourceOffset + (sourceCount - targetCount);
for (int i = sourceOffset + fromIndex; i <= max; i++) {
/* Look for first character. */
if (source[i] != first) {
while (++i <= max && source[i] != first)
;
}
/* Found first character, now look at the rest of v2 */
if (i <= max) {
int j = i + 1;
int end = j + targetCount - 1;
for (int k = targetOffset + 1; j < end
&& source[j] == target[k]; j++, k++)
;
if (j == end) {
/* Found whole string. */
return i - sourceOffset;
}
}
}
return -1;
}
注意这个函数是静态函数,是String and StringBuffer公用的一个工具方法,具体算法原理代码中很显而易见。
又查阅了一些资料,目前子串搜索的方法有下面几种,
KMP算法, BM算法,Sunday算法
其中无论是简单程度还是效率排序均为下面:
Sunday > BM > KMP
Sunday 算法的核心思想如下(转自百度百科):
附一个Sunday算法的 C++ 实现(原文链接:http://hi.baidu.com/azuryy/item/8a50f54a2f8c72e51381dad3)
/* Sunday.h */
class Sunday
{
public:
Sunday();
~Sunday();
public:
int find(const char* pattern, const char* text);
private:
void preCompute(const char* pattern);
private:
//Let's assume all characters are all ASCII
static const int ASSIZE = 128;
int _td[ASSIZE] ;
int _patLength;
int _textLength;
};
源文件
/* Sunday.cpp */
Sunday::Sunday()
{
}
Sunday::~Sunday()
{
}
void Sunday::preCompute(const char* pattern)
{
for(int i = 0; i < ASSIZE; i++ )
_td[i] = _patLength + 1;
const char* p;
for ( p = pattern; *p; p++)
_td[*p] = _patLength - (p - pattern);
}
int Sunday::find(const char* pattern, const char* text)
{
_patLength = strlen( pattern );
_textLength = strlen( text );
if ( _patLength <= 0 || _textLength <= 0)
return -1;
preCompute( pattern );
const char *t, *p, *tx = text;
while (tx + _patLength <= text + _textLength)
{
for (p = pattern, t = tx; *p; ++p, ++t)
{
if (*p != *t)
break;
}
if (*p == 0)
return tx-text;
tx += _td[tx[_patLength]];
}
return -1;
}
简单测试下:
int main()
{
char* text = "blog.csdn,blog.net";
char* pattern = "csdn,blog" ;
Sunday sunday;
printf("The First Occurence at: %d\n",sunday.find(pattern,text));
return 1;
}