CF1200E Compress Words

Problem description

Amugae has a sentence consisting of n words. He wants to compress this sentence into one word. Amugae doesn't like repetitions, so when he merges two words into one word, he removes the longest prefix of the second word that coincides with a suffix of the first word. For example, he merges "sample" and "please" into "samplease".

Amugae will merge his sentence left to right (i.e. first merge the first two words, then merge the result with the third word and so on). Write a program that prints the compressed word after the merging process ends.

Input format

The first line contains an integer n (1 ≤ n ≤ 10^5), the number of words in Amugae's sentence.

The second line contains n words separated by a single space. Each word is non-empty and consists of uppercase and lowercase English letters and digits ('A', 'B', ..., 'Z', 'a', 'b', ..., 'z', '0', '1', ..., '9'). The total length of the words does not exceed 10^6.

Output format

In the only line output the compressed word after the merging process ends as described in the problem.

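Problem summary

Amugae has n words and wants to turn them into a single word by merging them from left to right, two at a time. To merge two words, find the largest i (i ≥ 0) such that the suffix of length i of the first word equals the prefix of length i of the second word, then append the part of the second word after its first i characters to the first word. Output the final word.

Note: the strings contain uppercase letters, lowercase letters, and digits.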

Sample input and output

Input #1

5
I want to order pizza

Output #1

Iwantorderpizza

Input #2

5
sample please ease in out

Output #2

sampleaseinout

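There are two ways to solve this problem: the KMP algorithm and hashing. I used hashing.

Approach

The solution works with two hashes: one for the main string (the merged result so far) and one for the incoming word. Comparing the hash of a suffix of the main string with the hash of the same-length prefix of the word tells us how much of the word overlaps; the rest of the word is then appended.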

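1. The pitfall: each word is merged against the suffix of the whole merged string built so far, not against the suffix of the previous word alone. For example:

Given the three words i, ab, iab, the merged result is iab, not iabiab.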
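The pitfall above is easy to check against a brute-force reference. The following is only a sketch for small inputs, not the solution used in this post: `merge_naive` is a hypothetical helper name, and it assumes the caller has reserved enough room in `acc` for the appended characters.

```c
#include <string.h>

/* Hypothetical brute-force reference (too slow for the real limits).
 * Appends w to acc, skipping the longest prefix of w that equals a
 * suffix of acc. Assumes acc has room for the appended characters. */
void merge_naive(char *acc, const char *w)
{
    size_t h = strlen(acc), k = strlen(w);
    size_t len = h < k ? h : k;          /* longest overlap worth trying */

    /* Try overlap lengths from longest to shortest; stop at the first match. */
    for (; len > 0; len--) {
        if (memcmp(acc + h - len, w, len) == 0)
            break;
    }
    strcat(acc, w + len);                /* append the non-overlapping rest */
}
```

Starting from a buffer holding "i" and merging "ab" and then "iab" leaves "iab" in the buffer, because the third word is compared against the whole merged string.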

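2. Difficulties of the hashing approach:

1. You need polynomial (base-B) string hashing, and it has to be double hashing, otherwise test 65 fails. Double hashing means two strings count as equal only if they agree under two hash functions that use different bases (keys) and different moduli.

2. The hash of a suffix of the main string has to be obtained in O(1) instead of being recomputed by brute force, otherwise the solution exceeds the time limit.

Computing a substring hash in O(1):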

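Suppose we have a string S = s1s2s3s4s5. By definition, its prefix hashes are as follows (ignoring MOD for now to keep things readable):

hash[1] = s1

hash[2] = s1*Base + s2

hash[3] = s1*Base^2 + s2*Base + s3

hash[4] = s1*Base^3 + s2*Base^2 + s3*Base + s4

Now suppose we want the hash of the substring s3s4. By definition it is s3*Base + s4, and from the list above, hash[4] - hash[2]*Base^2 gives exactly that value. Generalizing from this example yields the formula for the hash of the substring on the interval [l, r]:

ans = ((hash[r] - hash[l-1] * Base^(r-l+1)) % MOD + MOD) % MOD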
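The formula can be combined with double hashing in a few lines. This is a minimal sketch, not the code from this post: the names `build_hash`, `sub_hash1`, `sub_hash2` and `same_substring`, the 1-indexed layout, and the constants are all assumptions chosen for illustration, and any two distinct base and modulus pairs would serve.

```c
#include <string.h>

#define MAXN 1000001 /* hypothetical capacity, matching the 10^6 length limit */

/* Illustrative constants (assumed, not the ones used in the post). */
static const long long MOD1 = 1000000007LL, MOD2 = 1000000009LL;
static const long long BASE1 = 131, BASE2 = 377;

static long long pre1[MAXN], pre2[MAXN]; /* prefix hashes, 1-indexed */
static long long pw1[MAXN], pw2[MAXN];   /* powers of the two bases */

/* Fills the prefix hashes and base powers for str under both hash functions. */
void build_hash(const char *str)
{
    size_t n = strlen(str);
    pw1[0] = pw2[0] = 1;
    pre1[0] = pre2[0] = 0;
    for (size_t i = 1; i <= n; i++) {
        long long c = (unsigned char)str[i - 1];
        pw1[i] = pw1[i - 1] * BASE1 % MOD1;
        pw2[i] = pw2[i - 1] * BASE2 % MOD2;
        pre1[i] = (pre1[i - 1] * BASE1 + c) % MOD1;
        pre2[i] = (pre2[i - 1] * BASE2 + c) % MOD2;
    }
}

/* Hash of characters l..r (1-indexed, inclusive) under the first function. */
long long sub_hash1(size_t l, size_t r)
{
    return ((pre1[r] - pre1[l - 1] * pw1[r - l + 1]) % MOD1 + MOD1) % MOD1;
}

/* Same substring under the second function. */
long long sub_hash2(size_t l, size_t r)
{
    return ((pre2[r] - pre2[l - 1] * pw2[r - l + 1]) % MOD2 + MOD2) % MOD2;
}

/* Two substrings are treated as equal only if both hashes agree. */
int same_substring(size_t l1, size_t r1, size_t l2, size_t r2)
{
    return sub_hash1(l1, r1) == sub_hash1(l2, r2)
        && sub_hash2(l1, r1) == sub_hash2(l2, r2);
}
```

After `build_hash("abcab")`, `same_substring(1, 2, 4, 5)` compares the two occurrences of "ab" in constant time. Adding MOD before the final reduction keeps the result non-negative, since `%` in C can return a negative value for a negative left operand.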

With the approach and the pitfalls covered, here is the code.

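//Compress Words  CF1200E (double string hashing + suffix hashing)
#include <stdio.h>
#include <string.h>
long long  mod = 1e8+4,mod2= 1e9+9;
int base = 131, hgf = 377;
char s[1000001], t[1000001];
long long ans[1000001], bns[1000001];//prefix hashes of the main string, one array per hash function (double hashing)
long long hj[1000001], hg[1000001];//precomputed powers of the two bases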
int main()
{
	int n, k, h, i, j, q;
	long long f, u;
	hg[0] = hj[0] = 1;
	for (i = 1; i <= 1000000; i++)//precompute the powers of both bases
	{
		hj[i] = (hj[i - 1] * base) % mod;
		hg[i] = (hg[i - 1] * hgf) % mod2;
	}
	scanf("%d", &n);
	scanf("%s", t);//read the first word
	h = strlen(t);
	for (i = 0; i < h; i++)//compute its prefix hashes
	{
		ans[i + 1] = (ans[i] * base + t[i]) % mod;
		bns[i + 1] = (bns[i] * hgf + t[i]) % mod2;
	}
	for (i = 2; i <= n; i++)
	{
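		scanf("%s", s);//read the next word
		k = strlen(s);
		int ww = 0, flag = 0;//ww: index in s of the last character of the longest overlap; flag: whether any overlap was found
		long long gf = 0, kg = 0;
		for (j = 0; j < h && j < k; j++)//try every prefix of the word, shortest first
		{
			//both hashes of the word's prefix s[0..j]
			gf = (gf * base + s[j]) % mod;
			kg = (kg * hgf + s[j]) % mod2;
			//both hashes of the main string's first h-j-1 characters followed by s[0..j], computed in O(1);
			//they equal the full-string hashes when the last j+1 characters of the main string match s[0..j] (barring collisions)
			f = (gf + ans[h - j - 1] * hj[j + 1]) % mod;
			u = (kg + bns[h - j - 1] * hg[j + 1]) % mod2;
			if (f == ans[h] && u == bns[h])//both hashes agree, so this prefix overlaps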
			{
				ww = j; flag = 1;
			}
		}
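		if (flag == 1)//an overlap exists: turn the last overlapping index into a length
			ww++;
		for (j = ww; j < k; j++)//append the non-overlapping rest to the main string and extend its hashes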
		{
			t[h++] = s[j];
			ans[h] = (ans[h - 1] * base + s[j]) % mod;
			bns[h] = (bns[h - 1] * hgf + s[j]) % mod2;
		}
		t[h] = '\0';
	}
	printf("%s", t);//print the answer
	return 0;
}
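For comparison, the KMP route mentioned at the start can be sketched with the prefix function. This is not the method used in this post, only a hedged illustration: `overlap_kmp` is a hypothetical helper, and it assumes '#' never occurs in the input, which holds here because words contain only letters and digits.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: length of the longest prefix of w (length k) that is
 * also a suffix of acc (length h). Builds w + '#' + tail of acc and reads the
 * last prefix-function value. Returns 0 if allocation fails. */
size_t overlap_kmp(const char *acc, size_t h, const char *w, size_t k)
{
    size_t m = h < k ? h : k;      /* only the last m characters of acc matter */
    size_t n = k + 1 + m;
    char *buf = malloc(n);
    size_t *pi = malloc(n * sizeof *pi);
    size_t res = 0;

    if (buf != NULL && pi != NULL) {
        memcpy(buf, w, k);
        buf[k] = '#';              /* separator assumed absent from the input */
        memcpy(buf + k + 1, acc + h - m, m);

        /* Standard prefix function over buf. */
        pi[0] = 0;
        for (size_t i = 1; i < n; i++) {
            size_t j = pi[i - 1];
            while (j > 0 && buf[i] != buf[j])
                j = pi[j - 1];
            if (buf[i] == buf[j])
                j++;
            pi[i] = j;
        }
        res = pi[n - 1];
    }
    free(buf);
    free(pi);
    return res;
}
```

The caller would then append `w + overlap_kmp(acc, h, w, k)` to the main string. Because only the last min(h, k) characters of the main string are copied, the total work stays linear in the total input length, and no hash collisions are possible.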

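This problem took me 3 to 4 hours, and I had to read the editorial. It cost a lot of time, but I came away with a better understanding of base-B hashing, double hashing, and the O(1) method for substring hashes, so it was worth it.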
