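This was an interview question a friend of mine was asked. Questions like this are rarely as simple as they look: the problem statement does not say so, but you should assume the array is very large, so the obvious, straightforward solution will not do.
The question probably expects us to process the large array with distributed computing or multiple threads. The general idea is likely map-reduce, i.e. divide and conquer. Here I start with a multithreaded version.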
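As a point of comparison, the map-reduce idea can be sketched on a single machine with Java parallel streams. This is only a minimal sketch, assuming the task is the one the code in this post implements (report the numbers that occur more than k times, most frequent first). The class name `StreamCountSketch` and the method `moreThanK` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

class StreamCountSketch {

    // Hypothetical helper: values occurring more than k times, most frequent first.
    static List<Map.Entry<Integer, Long>> moreThanK(int[] data, int k) {
        // "Map" step: the parallel stream counts chunks of the array independently.
        // "Reduce" step: groupingBy merges the partial counts into one map.
        Map<Integer, Long> counts = Arrays.stream(data)
                .parallel()
                .boxed()
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

        // Keep only counts greater than k and sort by count, descending.
        return counts.entrySet().stream()
                .filter(e -> e.getValue() > k)
                .sorted(Map.Entry.<Integer, Long>comparingByValue().reversed())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 2, 3, 3, 3};
        for (Map.Entry<Integer, Long> e : moreThanK(data, 1)) {
            System.out.println("number: " + e.getKey() + ", count: " + e.getValue());
        }
    }
}
```

Whether this beats a plain loop depends on the array size and the number of distinct values, so it should be measured rather than assumed.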
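Multithreaded approach: this is also divide and conquer. Split the array into several smaller arrays, hand each one to a thread task to count, then merge the results from all threads. The implementation follows (it does not return a result array, it only prints the result):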
package com.os.manager;
import java.util.*;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
/**
* @description:
* @author:LiuWeiYi
* @date: 2018/4/19
* @modified:
*/
public class Top {
    // Shared map that collects the reduce results: key is the number, value is how often it occurs
    public static ConcurrentHashMap<Integer, AtomicInteger> concurrentHashMap = new ConcurrentHashMap<>();
    public static void main(String[] args) throws InterruptedException {
        Scanner scanner = new Scanner(System.in);
        // Generate the array
        System.out.println("Enter the size of the array to generate:");
        Integer[] arr = new Integer[Integer.parseInt(scanner.next())];
        Random random = new Random();
        for (int i = 0; i < arr.length; i++) {
            arr[i] = random.nextInt(10);
        }
        System.out.println("Array length: " + arr.length);
        for (Integer integer : arr) {
            System.out.print(integer + ",");
        }
        System.out.println("\nEnter k:");
        int k = Integer.parseInt(scanner.next());
        Top top = new Top();
        top.multithreading(arr, k);
        long start = System.currentTimeMillis();
        top.singlethreading(arr, k);
        System.out.println("Single-threaded run time (ms): " + (System.currentTimeMillis() - start));
    }
    private void multithreading(Integer[] arrs, int k) throws InterruptedException {
        concurrentHashMap.clear();
        // Split the array into several smaller arrays
        List<Integer[]> list = split(arrs);
        // Create a fixed-size thread pool, one thread per chunk (a pool needs at least one thread)
        ExecutorService executor = Executors.newFixedThreadPool(Math.max(1, list.size()));
        CountDownLatch countDownLatch = new CountDownLatch(list.size());
        // Start timing before the first task is submitted
        long start = System.currentTimeMillis();
        for (int i = 0; i < list.size(); i++) {
            // Create a task and submit it
            Task task = new Task(list.get(i), countDownLatch, Top.concurrentHashMap);
            executor.execute(task);
        }
        // Wait until every task has finished
        countDownLatch.await();
        // Filtered result (key: number, value: occurrence count)
        Map<Integer, Integer> temp = new HashMap<>();
        Iterator<Map.Entry<Integer, AtomicInteger>> iterator = Top.concurrentHashMap.entrySet().iterator();
        while (iterator.hasNext()) {
            Map.Entry<Integer, AtomicInteger> next = iterator.next();
            // Keep only the numbers that occur more than k times
            if (next.getValue().get() > k) {
                temp.put(next.getKey(), next.getValue().get());
            }
        }
        // Sort by occurrence count, descending
        List<Map.Entry<Integer, Integer>> result = new ArrayList<>(temp.entrySet());
        Collections.sort(result, (o1, o2) -> o2.getValue() - o1.getValue());
        // Print
        int sumCount = 0;
        for (Map.Entry<Integer, Integer> entry : result) {
            Integer key = entry.getKey();
            Integer value = entry.getValue();
            sumCount += value;
            System.out.println("number: " + key + ", count: " + value);
        }
        System.out.println("Total count: " + sumCount);
        System.out.println("Multithreaded run time (ms): " + (System.currentTimeMillis() - start));
        // Shut down the thread pool immediately
        executor.shutdownNow();
    }
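    // Split the array into several smaller arrays
    private List<Integer[]> split(Integer[] arr) {
        List<Integer[]> list = new ArrayList<>();
        // An empty array yields no chunks (and avoids dividing by zero below)
        if (arr.length == 0) {
            return list;
        }
        // Split into one chunk per CPU core; this is also the number of threads to run
        int core = Runtime.getRuntime().availableProcessors();
        if (arr.length < core) { // When the array is shorter than the core count, use one chunk per element
            core = arr.length;
        }
        // Length of each chunk
        int num = arr.length / core;
        for (int i = 0; i < core; i++) {
            Integer[] childArr = new Integer[num];
            System.arraycopy(arr, i * num, childArr, 0, childArr.length);
            list.add(childArr);
        }
        // Any leftover elements go into one extra chunk
        int remain = arr.length % core;
        if (remain > 0) {
            Integer[] childArr = new Integer[remain];
            System.arraycopy(arr, core * num, childArr, 0, remain);
            list.add(childArr);
        }
        return list;
    }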
    /**
     * Task class
     */
    class Task implements Runnable {
        Integer[] childArr;
        CountDownLatch countDownLatch;
        ConcurrentHashMap<Integer, AtomicInteger> concurrentHashMap;

        public Task(Integer[] childArr, CountDownLatch countDownLatch, ConcurrentHashMap<Integer, AtomicInteger> concurrentHashMap) {
            this.childArr = childArr;
            this.countDownLatch = countDownLatch;
            this.concurrentHashMap = concurrentHashMap;
        }

        @Override
        public void run() {
            for (Integer integer : childArr) {
                // The operations below are thread-safe.
                // putIfAbsent inserts the key and returns null if it is absent; otherwise it returns the existing value
                AtomicInteger value = concurrentHashMap.putIfAbsent(integer, new AtomicInteger(1));
                if (value != null) {
                    value.incrementAndGet();
                }
            }
            // Count down once this task is done
            countDownLatch.countDown();
        }
    }
    private void singlethreading(Integer[] arrs, int k) throws InterruptedException {
        Map<Integer, Integer> map = new HashMap<>();
        for (Integer arr : arrs) {
            // First occurrence stores 1; later occurrences increment the stored count
            Integer integer = map.putIfAbsent(arr, 1);
            if (integer != null) {
                map.put(arr, ++integer);
            }
        }
        // Filtered result (key: number, value: occurrence count)
        Map<Integer, Integer> temp = new HashMap<>();
        Iterator<Map.Entry<Integer, Integer>> iterator = map.entrySet().iterator();
        while (iterator.hasNext()) {
            Map.Entry<Integer, Integer> next = iterator.next();
            // Keep only the numbers that occur more than k times
            if (next.getValue() > k) {
                temp.put(next.getKey(), next.getValue());
            }
        }
        // Sort by occurrence count, descending
        List<Map.Entry<Integer, Integer>> result = new ArrayList<>(temp.entrySet());
        Collections.sort(result, (o1, o2) -> o2.getValue() - o1.getValue());
        // Print
        int sumCount = 0;
        for (Map.Entry<Integer, Integer> entry : result) {
            Integer key = entry.getKey();
            Integer value = entry.getValue();
            sumCount += value;
            System.out.println("number: " + key + ", count: " + value);
        }
        System.out.println("Total count: " + sumCount);
    }
}
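Embarrassingly, the single-threaded version runs faster than the multithreaded one (facepalm). The reason may be that I use a single ConcurrentHashMap to collect all the counts, so time is lost waiting on contended updates.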
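If the shared map is kept, one thing worth trying is a different counter type. In the code above, `putIfAbsent(integer, new AtomicInteger(1))` allocates a new `AtomicInteger` for every element even when the key already exists, and with only ten distinct values all threads keep updating the same few counters. The sketch below is an assumption about what might help, not a measured fix: it uses `computeIfAbsent` with `LongAdder`, which the JDK provides for heavily contended counters. The names `SharedCounterSketch` and `count` are made up for illustration, and the sketch works on index ranges of an `int[]` instead of copied `Integer[]` chunks.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

class SharedCounterSketch {

    // Hypothetical helper: counts every value using one shared concurrent map.
    static Map<Integer, LongAdder> count(int[] data, int threads) {
        ConcurrentHashMap<Integer, LongAdder> counts = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        int chunk = (data.length + threads - 1) / threads; // ceiling division
        for (int t = 0; t < threads; t++) {
            // Each worker gets an index range; no sub-array is copied.
            final int from = Math.min(data.length, t * chunk);
            final int to = Math.min(data.length, from + chunk);
            pool.execute(() -> {
                for (int i = from; i < to; i++) {
                    // A LongAdder is created only when the key is missing.
                    counts.computeIfAbsent(data[i], key -> new LongAdder()).increment();
                }
            });
        }
        pool.shutdown();
        try {
            // Wait for the workers; one minute is an arbitrary limit for this sketch.
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 2, 3, 3, 3};
        count(data, 2).forEach((number, adder) ->
                System.out.println("number: " + number + ", count: " + adder.sum()));
    }
}
```

Even with this change, every element still goes through a shared map, so the single-threaded loop may remain faster for small inputs.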
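Then I stopped using a ConcurrentHashMap to collect the counts. Instead, each thread returns its own result and the main thread merges them, which is divide and conquer again. The code is as follows: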
import java.util.*;
import java.util.concurrent.*;
/**
* @description:
* @author:LiuWeiYi
* @date: 2018/4/19
* @modified:
*/
public class Top2 {
    public static void main(String[] args) throws InterruptedException, ExecutionException {
        Scanner scanner = new Scanner(System.in);
        // Generate the array
        System.out.println("Enter the size of the array to generate:");
        Integer[] arr = new Integer[Integer.parseInt(scanner.next())];
        Random random = new Random();
        for (int i = 0; i < arr.length; i++) {
            arr[i] = random.nextInt(1000000);
        }
        System.out.println("Array length: " + arr.length);
        for (Integer integer : arr) {
            System.out.print(integer + ",");
        }
        System.out.println("\nEnter k:");
        int k = Integer.parseInt(scanner.next());
        Top2 top = new Top2();
        // Multithreaded
        top.multithreading(arr, k);
        // Single-threaded
        long start = System.currentTimeMillis();
        top.singlethreading(arr, k);
        System.out.println("Single-threaded run time (ms): " + (System.currentTimeMillis() - start));
    }
    private void multithreading(Integer[] arrs, int k) throws InterruptedException, ExecutionException {
        // Split the array into several smaller arrays
        List<Integer[]> list = split(arrs);
        // Create a fixed-size thread pool, one thread per chunk
        ExecutorService executor = Executors.newFixedThreadPool(list.size());
        long start = 0;
        List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();
        for (int i = 0; i < list.size(); i++) {
            // Create a task and submit it
            Task task = new Task(list.get(i));
            futures.add(executor.submit(task));
            if (i == 0) {
                start = System.currentTimeMillis();
            }
        }
        Map<Integer, Integer> temp = new HashMap<>();
for (Future
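However, the multithreaded version still takes longer than the single-threaded one, which is really embarrassing (facepalm). From what I observed, most of the time goes into merging the per-thread results. My CPU has 4 cores, so 4 or 5 threads are started and 4 or 5 results have to be merged. With the array length set to 1,000,000 and the random range set to 100,000, the multithreaded version is still slower than the single-threaded one. Does the array length have to reach the hundreds of millions before this makes any practical difference? (facepalm)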
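For reference, here is a minimal sketch of the per-thread counting idea described above, with two changes that are my own assumptions rather than part of the original code: workers read index ranges of a primitive `int[]` instead of copied `Integer[]` chunks, and the merge uses `Map.merge`. The names `LocalCountSketch` and `count` are made up for illustration. Note that the merge cost grows with the number of distinct values per thread, so with 100,000 possible values and 1,000,000 elements the merge is a sizable share of the total work.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class LocalCountSketch {

    // Hypothetical helper: each worker counts into its own map, the caller merges them.
    static Map<Integer, Integer> count(int[] data, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            int chunk = (data.length + threads - 1) / threads; // ceiling division
            List<Future<Map<Integer, Integer>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int from = Math.min(data.length, t * chunk);
                final int to = Math.min(data.length, from + chunk);
                futures.add(pool.submit(() -> {
                    // Thread-local map: no sharing, so no synchronization while counting.
                    Map<Integer, Integer> local = new HashMap<>();
                    for (int i = from; i < to; i++) {
                        local.merge(data[i], 1, Integer::sum);
                    }
                    return local;
                }));
            }
            // Merge step, done on the calling thread.
            Map<Integer, Integer> total = new HashMap<>();
            for (Future<Map<Integer, Integer>> future : futures) {
                future.get().forEach((number, n) -> total.merge(number, n, Integer::sum));
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 2, 3, 3, 3};
        count(data, 4).forEach((number, n) ->
                System.out.println("number: " + number + ", count: " + n));
    }
}
```

The timings may also be skewed by the measurement itself: the multithreaded method runs first on a cold JVM, and the measured interval includes the `println` calls. Running each variant several times and timing only the counting would probably give a fairer comparison.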
I would appreciate pointers from anyone more experienced: did I get something wrong?