Distributed TensorFlow: distributed TensorFlow training based on Keras and Kubernetes

(TensorFlow @ O’Reilly AI Conference, San Francisco '18)
Compiled from the talk "Distributed TensorFlow training using Keras and Kubernetes" given by the Google TensorFlow team at the TensorFlow @ O'Reilly AI Conference in September 2018; the content and slide images come from the YouTube video.

Topic of the talk

The Distribution Strategy API provided by TensorFlow.
[Slide 1]

Let's begin with the obvious question: why should one care about distributed training? Training complex neural networks with large amounts of data can often take a long time. In the graph here, you can see that training the ResNet model on a single but powerful GPU can take up to four days. If you have some experience running complex machine learning models, this may sound rather familiar. Bringing your training time down from days to hours can have a significant effect on your productivity, because you can try out new ideas faster.

In this talk, we're going to cover distributed training, that is, running training in parallel on multiple devices such as CPUs, GPUs, or TPUs to bring down your training time. With the techniques we'll talk about, you can bring your training time down from weeks or days to hours with just a few lines of code changed and some powerful hardware. To achieve these goals, we're pleased to introduce the new Distribution Strategy API. This is an easy way to distribute your TensorFlow training with very little modification to your code. With the Distribution Strategy API, you no longer need to place ops or parameters on specific devices, and you don't need to restructure your model so that the losses and gradients get aggregated correctly across devices. Distribution Strategy takes care of all of that for you. So let's go through the key goals of Distribution Strategy.

Goals

[Slide 2]
  1. The first one is ease of use. We want you to make minimal code changes in order to distribute your training.
  2. The second is to give great performance out of the box. Ideally, the user shouldn't have to change or configure any settings to get the most performance out of their hardware.
  3. And third, we want Distribution Strategy to work in a variety of different situations: whether you want to scale your training on different hardware like GPUs or TPUs, use different APIs like Keras or Estimator, or run different distribution architectures like synchronous or asynchronous training, we want Distribution Strategy to be useful for you in all these situations.

Training with multiple GPUs on a single machine

[Slide 3]

So if you’re just beginning with machine learning, you might start your training with a multi-core CPU on your desktop. TensorFlow takes care of scaling onto a multi-core CPU automatically. Next, you may add a GPU to your desktop to scale up your training. As long as you build your program with the right CUDA libraries, TensorFlow will automatically run your training on the GPU and give you a nice performance boost. But what if you have multiple GPUs on your machine, and you want to use all of them for your training? This is where distribution strategy comes in.

[Slide 4]

In the next section, we’re going to talk about how you can use distribution strategy to scale your training to multiple GPUs.

[Slide 5]

First, we'll look at some code to train the ResNet model without any distribution. We'll use the Keras API, which is the recommended TensorFlow high-level API. We begin by creating some datasets for training and validation using the tf.data API. For the model, we'll simply reuse the ResNet that's prepackaged with Keras in TensorFlow. Then we create an optimizer that we'll be using in our training. Once we have these pieces, we can compile the model, providing the loss and optimizer and maybe a few other things like metrics, which I've omitted in the slide here. Once the model is compiled, you can begin training by calling model.fit, providing the training dataset that you created earlier along with how many epochs you want to run the training for. fit will train your model and update the model's variables. Then you can call evaluate with the validation dataset to see how well your training did.
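As a concrete reference, here is a minimal sketch of that undistributed flow, written against the TF 1.x-era Keras API used in the talk; the synthetic dataset and the hyperparameters are placeholders rather than a real input pipeline.

```python
import tensorflow as tf

# Synthetic stand-in for a real tf.data input pipeline.
def make_dataset(num_examples):
    images = tf.random_uniform([num_examples, 224, 224, 3])
    labels = tf.one_hot(
        tf.random_uniform([num_examples], maxval=1000, dtype=tf.int32), 1000)
    return tf.data.Dataset.from_tensor_slices((images, labels)).repeat().batch(8)

train_dataset = make_dataset(64)
eval_dataset = make_dataset(16)

# Reuse the ResNet50 that ships with Keras, trained from scratch.
model = tf.keras.applications.resnet50.ResNet50(weights=None)
optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.2)

model.compile(loss='categorical_crossentropy',
              optimizer=optimizer,
              metrics=['accuracy'])
model.fit(train_dataset, epochs=2, steps_per_epoch=8)   # train
model.evaluate(eval_dataset, steps=2)                    # validate
```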
[Slide 6]
So given this code to run your training on a single machine or a single GPU, let's see how we can use Distribution Strategy to run it on multiple GPUs. It's actually very simple. You need to make only two changes (see the sketch after this list):

  1. First, create an instance of something called MirroredStrategy, and
  2. second, pass the strategy instance to the compile call with the distribute argument. That's it. That's all the code change you need to run this code on multiple GPUs using Distribution Strategy.
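A sketch of that two-line change, using the TF 1.x-era API shown in the talk (where compile accepted a distribute argument; newer TensorFlow versions instead build the model under strategy.scope()). The dataset and hyperparameters are placeholders.

```python
import tensorflow as tf

distribution = tf.contrib.distribute.MirroredStrategy()       # change 1: create the strategy

images = tf.random_uniform([32, 224, 224, 3])
labels = tf.one_hot(tf.random_uniform([32], maxval=1000, dtype=tf.int32), 1000)
train_dataset = tf.data.Dataset.from_tensor_slices((images, labels)).repeat().batch(8)

model = tf.keras.applications.resnet50.ResNet50(weights=None)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.train.GradientDescentOptimizer(0.2),
              distribute=distribution)                         # change 2: pass it to compile
model.fit(train_dataset, epochs=2, steps_per_epoch=8)
```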

MirroredStrategy

MirroredStrategy is one type of the Distribution Strategy API that we introduced earlier. This API is available in the upcoming TensorFlow point release, which will be out very shortly. At the bottom of the slide, we've linked to a complete example of training MNIST with Keras and multiple GPUs that you can try out. With MirroredStrategy, you don't need to make any changes to your model code or your training loop, so it is very easy to use. This is because we've changed many underlying components of TensorFlow to be distribution-aware: the optimizer, batch norm layers, metrics, and summaries are all now distribution-aware. You don't need to make any changes to your input pipeline either, as long as you're using the recommended tf.data APIs. And finally, saving and checkpointing work seamlessly as well, so you can save with no distribution strategy, or with one strategy, and restore with another seamlessly.

[Slide 7]

Data parallelism and AllReduce

Now that you've seen some code on how to use MirroredStrategy to scale to multiple GPUs, let's look under the hood a little bit and see what MirroredStrategy does. In a nutshell, MirroredStrategy implements a data parallelism architecture. It mirrors the variables on each GPU device, hence the name, and it uses AllReduce to keep these variables in sync. Using these techniques, it implements synchronous training. That's a lot of terminology, so let's unpack each of these a bit.

[Slides 8-9]

What is data parallelism? Let's say you have N workers or N devices. In data parallelism, each device runs the same model and computation, but on a different subset of the input data. Each device computes the loss and gradients based on the training samples that it sees. We then combine these gradients and update the model's parameters. The updated model is then used in the next round of computation. As I mentioned before, MirroredStrategy mirrors the variables across the different devices. So if you have a variable A in your model, it will be replicated as A0, A1, A2, and A3 across the four different devices, and together these four copies form a single conceptual variable called a mirrored variable. These variables are kept in sync by applying identical updates. A class of algorithms called AllReduce can be used to keep the variables in sync by applying identical gradient updates: AllReduce aggregates the gradients across the different devices, for example by adding them up, and makes the result available on each device. These algorithms can be fused and made very efficient, which reduces the overhead of synchronization by quite a bit.

[Slides 10-11]

There are many versions of AllReduce algorithms, depending on the kind of communication available between the different devices. One common algorithm is known as ring AllReduce. In ring AllReduce, each device sends a chunk of its gradients to its successor on the ring and receives another chunk from its predecessor. There are a few more such rounds of gradient exchanges, and at the end of these exchanges, each device has received a combined copy of all the gradients. Ring AllReduce also uses network bandwidth optimally, because it ensures that both the upload and download bandwidth at each host is fully utilized. We have a team working on fast implementations of AllReduce for various network topologies. Some hardware vendors, such as NVIDIA, provide specialized implementations of AllReduce for their hardware, for example NVIDIA's NCCL library. The bottom line is that AllReduce can be fast when you have multiple devices on a single machine, or a small number of machines with strong connectivity. Putting all these pieces together, MirroredStrategy uses mirrored variables and AllReduce to implement synchronous training.
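To make the communication pattern concrete, here is a small NumPy simulation of ring AllReduce (a scatter-reduce phase followed by an all-gather phase) over simulated workers. This is a toy illustration of the algorithm described above, not the implementation TensorFlow or NCCL actually uses.

```python
import numpy as np

def ring_allreduce(grads):
    """grads: one equal-length 1-D gradient array per simulated worker.
    Returns a list in which every worker holds the element-wise sum."""
    n = len(grads)
    # Each worker splits its gradient into n chunks.
    chunks = [np.array_split(g.astype(float), n) for g in grads]

    # Phase 1: scatter-reduce. After n-1 rounds of passing chunks to the
    # successor, worker i holds the fully reduced chunk (i + 1) % n.
    for step in range(n - 1):
        sends = [chunks[i][(i - step) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i - step) % n] += sends[i]

    # Phase 2: all-gather. Each worker passes its fully reduced chunk around
    # the ring until every worker has every chunk.
    for step in range(n - 1):
        sends = [chunks[i][(i + 1 - step) % n].copy() for i in range(n)]
        for i in range(n):
            chunks[(i + 1) % n][(i + 1 - step) % n] = sends[i]

    return [np.concatenate(c) for c in chunks]

grads = [(w + 1) * np.arange(8.0) for w in range(4)]   # 4 simulated workers
result = ring_allreduce(grads)
# Every worker now holds (1+2+3+4) * [0, 1, ..., 7].
```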

How AllReduce works

[Slide 12]

So let's see how that works. Say you have two devices, device 0 and device 1, and your model has two layers, A and B. Each layer has a single variable, and as you can see, the variables are replicated across the two devices. Each device receives one subset of the input data and computes the forward pass using its local copy of the variables. It then runs the backward pass and computes the gradients. Once gradients are computed on each device, the devices communicate with each other using AllReduce to aggregate the gradients. Once the gradients are aggregated, each device updates its local copy of the variables, so the devices are always kept in sync. The next forward pass doesn't begin until each device has received a copy of the combined gradients and updated its variables. AllReduce can further optimize things and bring down your training time by overlapping the computation of gradients at the lower layers of the network with the transmission of gradients for the higher layers. In this case, you can compute the gradients of layer A while you're transmitting the gradients of layer B, and this can further reduce your training time.
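Here is a tiny NumPy simulation of that synchronous loop, with two simulated devices, mirrored parameters, and averaged ("all-reduced") gradients applied identically on both copies. The toy regression model and learning rate are made up for illustration only.

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(8)
y = 3.0 * x + 1.0                                   # toy regression data
shards = [(x[:4], y[:4]), (x[4:], y[4:])]           # one input shard per device

params = [np.array([0.0, 0.0]) for _ in range(2)]   # mirrored copies of (a, b)
lr = 0.1

for step in range(200):
    grads = []
    for dev, (xd, yd) in enumerate(shards):
        a, b = params[dev]
        err = (a * xd + b) - yd                     # forward pass on the local shard
        grads.append(np.array([np.mean(2 * err * xd),   # dL/da
                               np.mean(2 * err)]))      # dL/db
    g = np.mean(grads, axis=0)                      # "all-reduce": aggregate the gradients
    for dev in range(2):
        params[dev] -= lr * g                       # identical update keeps copies in sync

print(params[0], params[1])                         # both copies approach a=3, b=1
```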

(TPU section omitted)

Multi-node distributed training

[Slide 13]

What about multiple nodes, that is, multiple computers? The fact is that even though you can cram a lot of GPU cards into a single computer, sooner or later, if you do massive amounts of training, you will need to consider an architecture where you can scale out to multiple nodes as well. This is an example where we have four worker nodes with four GPU cards in each of them. In terms of multi-node support, we currently have support for premade Estimators in [INAUDIBLE], which is due to be released shortly. And we are working very hard with some awesome developers to get this support into Keras as well, so Keras support will be there as soon as possible.

[Slides 14-15]

Converting Keras to an Estimator

However, if you do want to use Keras with a multi-node distribution strategy, you can achieve that using a little trick available in Keras: a function called tf.keras.estimator.model_to_estimator that takes a Keras model as an argument and returns an Estimator that you can use for multi-node training.
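A minimal sketch of that conversion; the model and optimizer here are placeholders.

```python
import tensorflow as tf

model = tf.keras.applications.resnet50.ResNet50(weights=None)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.train.GradientDescentOptimizer(0.2))

# The "trick": wrap the compiled Keras model in an Estimator for multi-node training.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model)
# estimator.train(input_fn=...) / estimator.evaluate(input_fn=...) from here on.
```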

[Slide 16]

Setting up the multi-node environment with Kubernetes

[Slide 17]

So how do we set up a multi-node training environment in the first place? This was a really difficult problem until the now open-source technology called Kubernetes was released. Even though you can set up multi-node training with TensorFlow without Kubernetes, it certainly helps to use Kubernetes as the orchestration platform to fire up multiple nodes. And Kubernetes is available on most clouds: GCP, and I think AWS and others as well.

[Slide 18]

So how does that work? Well, a Kubernetes cluster contains a set of nodes. In this particular picture, you can see three nodes, each of them a worker node. What TensorFlow requires in order for this to work is that each of these nodes has an environment variable called TF_CONFIG defined. Every single node in your cluster needs to have this variable defined. TF_CONFIG has two parts: first, the cluster part, which lists all of the hosts that participate in the distributed training, that is, all the nodes in your cluster; and second, the task part, which answers "who am I, what is my identity within this cluster?" The task points at one of the worker entries in the cluster spec, identifying which host and port this particular process is. That's how you need to configure your cluster in order to do this.
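For illustration, here is what setting TF_CONFIG might look like on one node; the host names, port, and index are placeholders for your own cluster.

```python
import json
import os

os.environ["TF_CONFIG"] = json.dumps({
    # "cluster": every host:port participating in training, grouped by job name.
    "cluster": {
        "worker": ["worker0.example.com:5000",
                   "worker1.example.com:5000",
                   "worker2.example.com:5000"],
    },
    # "task": who am I? This process is worker index 1, i.e. the second
    # host:port entry listed in the cluster spec above.
    "task": {"type": "worker", "index": 1},
})
```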

[Slide 19]

It is really cumbersome to go around to all of the nodes and provide this specific configuration by hand, and Kubernetes provides an excellent way of doing it through its deployment configuration, the yaml file, so you can distribute the configuration, that is, the environment variables to set, to the respective nodes. So how do we integrate that with TensorFlow? It's part of the initial support, and this is just one way of doing it; there are multiple ways, but this is one way that we've tested. You can use a template engine called Jinja. You create a Jinja file, and there is actually such a file available in the tensorflow/ecosystem repository (note: the ecosystem repository, not the main TensorFlow repository). There is a directory in that repository called distribution_strategy that contains useful functions to use with distribution strategies. You can use this file as a template to automatically generate the deployment.yaml for the Kubernetes cluster.

[Slide 20]

So what would that look like for a configuration like this, where we have three nodes? It's really simple. The only thing you need to change in this file, the Jinja file, is the highlighted configuration up here: you set the worker replicas to three. The rest is just code that you keep the same for all of the runs you set up. This is actually a macro that populates TF_CONFIG based on that parameter.

[Slide 21]

So that's very simple, but what about the code? We've now configured the Kubernetes cluster to be able to do this distributed training with TensorFlow, but there is also some work to do in the code, just as there was for the single node. It's approximately the same as the single-node, multi-GPU configuration, but in Estimator lingo. I provide a config here; you see the RunConfig? It's just a standard Estimator construct. I set the train_distribute parameter to tf.contrib.distribute.CollectiveAllReduceStrategy, so not MirroredStrategy for the multi-node configuration but CollectiveAllReduceStrategy, and I specify the number of GPUs I have available on each of the workers in my cluster. And that's it. Given that config object, I can just pass it as the config parameter when I do the conversion from Keras to an Estimator, and I now have multi-node training, with multiple GPUs in each of the nodes, configured for TensorFlow. A sketch of this configuration follows below.
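A hedged sketch of that Estimator configuration, using the TF 1.x contrib API named in the talk; the model and the num_gpus_per_worker value are placeholders.

```python
import tensorflow as tf

# Multi-node strategy: CollectiveAllReduceStrategy instead of MirroredStrategy.
distribution = tf.contrib.distribute.CollectiveAllReduceStrategy(
    num_gpus_per_worker=4)
config = tf.estimator.RunConfig(train_distribute=distribution)

model = tf.keras.applications.resnet50.ResNet50(weights=None)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.train.GradientDescentOptimizer(0.2))

# Convert Keras to an Estimator, passing the distributed RunConfig.
estimator = tf.keras.estimator.model_to_estimator(keras_model=model,
                                                  config=config)
```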

[Slide 22]

CollectiveAllReduceStrategy

So let's look at this CollectiveAllReduceStrategy, because it's something different from the MirroredStrategy we talked about previously. What is it? Well, it is specifically designed for multiple worker nodes. It's essentially based on MirroredStrategy, but it adds functionality to deal with multiple hosts, or multiple workers, in the cluster. And the good thing about it is that it automatically selects the best algorithm for performing the AllReduce across the cluster. So what kinds of algorithms do we have for doing AllReduce in a multi-node configuration? One of them is very similar to what we have for a single node: ring AllReduce, in which case the gradients simply travel across the nodes, performing one overall ring reduction across multiple hosts and GPUs. So it's essentially the same as for a single node, except that the ring traverses hosts, with all of the penalties that of course brings, depending on the interconnect between those hosts.

[Slides 23-24]

Another algorithm is hierarchical AllReduce. What happens here is that, within each host, all of the GPUs first send their gradients to a single GPU card on that host. Then those lead GPUs perform an AllReduce across the nodes, and each GPU performing that operation propagates the result back to the individual GPUs within its own node.
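A toy NumPy sketch of hierarchical AllReduce, assuming two hosts with two GPUs each; real implementations overlap these steps and use the actual interconnects, but the structure is the same reduce-within-host, AllReduce-across-hosts, broadcast-back pattern described above.

```python
import numpy as np

def hierarchical_allreduce(grads_per_host):
    """grads_per_host: for each host, a list of per-GPU gradient arrays."""
    # Step 1: within each host, reduce all local GPU gradients onto one lead GPU.
    host_sums = [np.sum(gpu_grads, axis=0) for gpu_grads in grads_per_host]
    # Step 2: all-reduce across hosts between the lead GPUs (a plain sum here;
    # a real system would run ring all-reduce over the network).
    total = np.sum(host_sums, axis=0)
    # Step 3: each lead GPU broadcasts the result back to the GPUs on its host.
    return [[total.copy() for _ in gpu_grads] for gpu_grads in grads_per_host]

grads = [[1.0 * np.ones(4), 2.0 * np.ones(4)],   # host 0: GPU 0, GPU 1
         [3.0 * np.ones(4), 4.0 * np.ones(4)]]   # host 1: GPU 0, GPU 1
result = hierarchical_allreduce(grads)
# Every GPU now holds [10., 10., 10., 10.].
```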

Depending on the network and other characteristics of your setup and hardware, one of these approaches will work better, and the thing with CollectiveAllReduceStrategy is that it will automatically detect the best algorithm to use in your distributed cluster. So that was multi-node training with multiple accelerator cards within the nodes.

Other ways to scale to multiple nodes

[Slide 25]

There are also other ways to scale to multiple nodes with TensorFlow. How many of you are familiar with parameter servers? This is the classical way of doing distributed TensorFlow training. Eventually you should not continue to do it the classical way: once we roll out distribution strategies, that's the way to go. So what I'm describing here is essentially the parameter server strategy, but instead of describing it in the old, classical TensorFlow way, I'm going to describe how to do it with distribution strategies. Does that make sense? If it didn't, and you haven't used the classical approach, just don't worry about it.

[Slides 26-27]

Just listen to what I have to say here. To recap what the parameter server strategy is: it's a strategy where we have shared storage and a number of worker nodes working on batches from that shared storage. They work independently (well, not completely independently, as we'll see shortly), calculating gradients based on their batches. Then we have a number of parameter servers. When the workers finish a batch, they send their gradients up to the parameter servers. The parameter servers collect the updates from all the workers, combine the gradients, and then pass the updated variables back down to the workers. So it's not synchronous: the workers get updates to the variables in an asynchronous fashion, which has good sides and bad sides. The good side is that one worker can go down and the other workers can still continue as normal.

So how can we set this up in a distribution strategy cluster? It's really easy. In the Jinja file, instead of specifying only the worker replicas, we also specify ps_replicas, the number of parameter servers in our Kubernetes cluster. That is the Kubernetes setup. Now what about the code? That's also really easy. You saw the RunConfig, the config parameter, previously. Instead of using CollectiveAllReduceStrategy (I got that right this time), you use ParameterServerStrategy. So it's just another strategy type there. You still specify the number of GPUs per worker, you pass the config object to the Keras model_to_estimator call, and you're done (a sketch of this swap follows below). So very few lines of code need changing, even though we're talking about a massively different way of doing distributed TensorFlow training.

There is one more configuration that we are working on; I think we will have a release of it that you can at least try out. It's a really cool setup where you actually run distributed training from your laptop, and in this particular case, you have all of your model training code locally. So forget about the parameter servers for now.
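Before turning to that last configuration, here is a hedged sketch of the ParameterServerStrategy swap just described, again using the TF 1.x contrib API named in the talk; everything else (the model, the model_to_estimator call) stays the same as in the CollectiveAllReduceStrategy sketch earlier.

```python
import tensorflow as tf

# Only the strategy type changes: parameter server training instead of
# collective all-reduce. num_gpus_per_worker is a placeholder for your setup.
distribution = tf.contrib.distribute.ParameterServerStrategy(
    num_gpus_per_worker=4)
config = tf.estimator.RunConfig(train_distribute=distribution)
```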

[Slides 28-30]

Now we're back to multiple workers and AllReduce here. The only thing you fire up on these workers is tf_std_server.py, or whatever variant of it you want to use, because this code is also available in the tensorflow/ecosystem repository; you can go check out how we did it for this standard setup and change it any way you want. The thing is that this script, installed on the workers, doesn't contain the model program at all. When we fire up the model training from our laptop or workstation, it distributes the model over to those workers. So if you have any changes to your model code, you can just make them locally, and they will automatically be distributed out to all of the workers. Now you may say, oh, that's a hassle, because now I've got to install this script on all the workers. But you do not have to do that, because the only thing you do is specify the script parameter in the Jinja file that you've seen a couple of times now (we have the same number of workers here), and that means the script will be started on all of these nodes. So what we're talking about here is the capability to fire up a Kubernetes cluster with an arbitrary number of nodes and, without installing any code on them, use a local laptop that automatically distributes the model and the training to all of these worker nodes, just by having these two lines here.

[Slide 31]

What about the code? So again, we have the RunConfig here. This time, we're going to set a parameter called experimental_distribute to a DistributeConfig. As part of the DistributeConfig, we embed a CollectiveAllReduceStrategy with, as we saw before, the number of GPUs we have per worker. But the DistributeConfig requires one more parameter, and that is the remote cluster: the master node here needs to know the cluster to which it should send all the model code, the workers that are waiting there for the model code to be shared. So you've got to specify that parameter. Then you finish up your config object and pass it to model_to_estimator as the config argument (sketched below). As you've seen before, it's just a couple of lines of difference between these different configurations. That's really it for TensorFlow multi-node training.
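A hedged sketch of that standalone-client configuration, assuming the TF 1.x-era contrib API described in the talk (DistributeConfig and the experimental_distribute argument to RunConfig); the worker addresses are placeholders for the tf_std_server endpoints in your Kubernetes cluster.

```python
import tensorflow as tf

distribution = tf.contrib.distribute.CollectiveAllReduceStrategy(
    num_gpus_per_worker=4)

# The remote cluster tells the local "master" where to ship the model code.
dist_config = tf.contrib.distribute.DistributeConfig(
    train_distribute=distribution,
    remote_cluster={"worker": ["worker0.example.com:5000",
                               "worker1.example.com:5000",
                               "worker2.example.com:5000"]})
config = tf.estimator.RunConfig(experimental_distribute=dist_config)

model = tf.keras.applications.resnet50.ResNet50(weights=None)
model.compile(loss='categorical_crossentropy',
              optimizer=tf.train.GradientDescentOptimizer(0.2))
estimator = tf.keras.estimator.model_to_estimator(keras_model=model,
                                                  config=config)
```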

[Slide 32]
