Twice I’ve tried to realistically present the performance of the algorithm. Twice my paper was rejected because of “unfinished methods” or “disappointing results”. There’s a whole culture of “rounding-up”, and trying to do the evaluations fairly just gives you trouble. When fair evaluations get rejected and rounders-up pass through, what do you do?
Anonymous’s story is surely common.
On any given paper, there is an incentive to “cheat” with some of the above methods. This can be hard to resist when so much rides on a paper acceptance _and_ some of the above cheats are not easily detected. Nevertheless, it should be resisted because “cheating” of this sort inevitably fools you as well as others. Fooling yourself in research is a recipe for a career that goes nowhere. Your techniques simply won’t apply well to new problems, you won’t be able to tackle competitions, and ultimately you won’t even trust your own intuition, which is fatal in research.
My best advice for Anonymous is to accept that life is difficult here. Spend extra time testing on many datasets rather than a few. Spend extra time thinking about what makes a good algorithm, or not. Take the long view and note that, in the long run, the quantity of papers you write is not important, but rather their level of impact. Using a “cheat” very likely subverts long term impact.
How about an index of negative results in machine learning? There’s a Journal of Negative Results in other domains: Ecology & Evolutionary Biology, Biomedicine, and there is Journal of Articles in Support of the Null Hypothesis. A section on negative results in machine learning conferences? This kind of information is very useful in preventing people from taking pathways that lead nowhere: if one wants to classify an algorithm into good/bad, one certainly benefits from unexpectedly bad examples too, not just unexpectedly good examples.
I visited the workshop on negative results at NIPS 2002. My impression was that it did not work well.
The difficulty with negative results in machine learning is that they are too easy. For example, there are a plethora of ways to say that “learning is impossible (in the worst case)”. On the applied side, it’s still common for learning algorithms to not work on simple-seeming problems. In this situation, positive results (this works) are generally more valuable than negative results (this doesn’t work).
This discussion reminds me of some interesting research on “anti-learning”, by Adam Kowalczyk. This research studies (empirically and theoretically) machine learning algorithms that yield good performance on the training set but worse than random performance on the independent test set.
Hmm, rereading this post. What do you mean by “brittle”? Why is mutual information brittle?
Standard deviation of loss across the CV folds is not a bad summary of variation in CV performance. I’m not sure one can just reject a paper where the authors bothered to disclose the variation, rather than just plopping out the average. Standard error carries some Gaussian assumptions, but it is still a valid summary. The distribution of loss is sometimes quite close to being Gaussian, too.
As for significance, I came up with the notion of CV-values that measure how often method A is better than method B in a randomly chosen fold of cross-validation replicated very many times.
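As a rough illustration of the CV-value idea, here is a minimal Python sketch. It assumes you already have paired per-fold scores for methods A and B, measured on the same folds of a replicated cross-validation. The function name `cv_value` and the convention of counting ties as half a win are my own assumptions, not part of the definition above, and the numbers are made up.

```python
def cv_value(scores_a, scores_b):
    """Fraction of paired folds in which method A scores higher than method B.

    Ties count as half a win (an assumed convention).
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    wins = 0.0
    for a, b in zip(scores_a, scores_b):
        if a > b:
            wins += 1.0
        elif a == b:
            wins += 0.5
    return wins / len(scores_a)


# Hypothetical accuracies on the same six folds (illustrative numbers only).
acc_a = [0.80, 0.75, 0.82, 0.79, 0.77, 0.81]
acc_b = [0.78, 0.76, 0.80, 0.79, 0.74, 0.79]
print(cv_value(acc_a, acc_b))
```

In practice the two score lists would come from many replications of K-fold cross-validation, so the fraction is taken over a large number of folds.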
What I mean by brittle: Suppose you have a box which takes some feature values as input and predicts some probability of label 1 as output. You are not allowed to open this box or determine how it works other than by this process of giving it inputs and observing outputs.
Let x be an input.
Let y be an output.
Assume (x,y) are drawn from a fixed but unknown distribution D.
Let p(x) be a prediction.
For classification error I(|y - p(x)| > 0.5) you can prove a theorem of the rough form:
forall D, with high probability over the draw of m examples independently from D,
expected classification error rate of the box with respect to D is bounded by a function of the observations.
What I mean by “brittle” is that no statement of this sort can be made for any unbounded loss (including log-loss which is integral to mutual information and entropy). You can of course open up the box and analyze its structure or make extra assumptions about D to get a similar but inherently more limited analysis.
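One standard instance of a statement of this form for a bounded loss is the Hoeffding bound on a held-out set. The sketch below is only that one instance, under the assumptions that the m examples are i.i.d. from D and that the box was fixed before they were drawn. Tighter bounds based on the binomial tail exist, and the name `hoeffding_upper_bound` is just for illustration.

```python
import math


def hoeffding_upper_bound(num_errors, m, delta=0.05):
    """Upper bound on the true error rate that holds with probability >= 1 - delta.

    Assumes the m examples are drawn i.i.d. from D and the box was fixed
    before seeing them. The loss must be bounded in [0, 1].
    """
    if m <= 0 or not 0 <= num_errors <= m or not 0 < delta < 1:
        raise ValueError("invalid arguments")
    empirical = num_errors / m
    slack = math.sqrt(math.log(1.0 / delta) / (2.0 * m))
    return min(1.0, empirical + slack)


# 30 mistakes observed on 1000 held-out examples.
print(hoeffding_upper_bound(30, 1000))
```

The bound never looks inside the box, and it depends on the loss only through its range. That dependence on the range is exactly what is unavailable for an unbounded loss such as log-loss.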
The situation with leave-one-out cross validation is not so bad, but it’s still pretty bad. In particular, there exists a very simple learning algorithm/problem pair with the property that the leave-one-out estimate has the variance and deviations of a single coin flip. Yoshua Bengio and Yves Grandvalet in fact proved that there is no unbiased estimator of variance. The paper that I pointed to above shows that for K-fold cross validation on m examples, all moments of the deviations might only be as good as on a test set of size $m/K$.
I’m not sure what a ‘valid summary’ is, but leave-one-out cross validation can not provide results I trust, because I know how to break it.
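To make "I know how to break it" concrete, here is one well-known construction, sketched in Python. I am assuming this is the kind of algorithm/problem pair meant above. The labels are fair coin flips that do not depend on the inputs, so every predictor has true error 0.5, and the (deliberately silly) learner always predicts the parity of its training labels.

```python
import random


def parity_learner(train_labels):
    """Return a constant prediction: the parity of the training labels."""
    return sum(train_labels) % 2


def loo_error(labels):
    """Leave-one-out error rate of parity_learner on a list of 0/1 labels."""
    mistakes = 0
    for i, y in enumerate(labels):
        train = labels[:i] + labels[i + 1:]
        if parity_learner(train) != y:
            mistakes += 1
    return mistakes / len(labels)


rng = random.Random(0)
estimates = []
for _ in range(200):
    labels = [rng.randint(0, 1) for _ in range(25)]
    estimates.append(loo_error(labels))

# Every estimate is exactly 0.0 or 1.0, although the true error rate is 0.5.
print(sorted(set(estimates)))
```

Leaving out example i flips the parity exactly when y_i = 1, so the held-out prediction is correct if and only if the parity of all the labels is even. Every fold therefore agrees, and the whole estimate is decided by a single coin flip.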
I have personally observed people using leave-one-out cross validation with feature selection to quickly achieve a severe overfit.
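A small simulation shows how quickly this happens. This is a sketch with made-up sizes (20 examples, 2000 candidate features), not a reconstruction of any particular case I saw. The labels and features are all independent coin flips, and the "classifier" for a feature is the fixed rule that predicts the feature value.

```python
import random

rng = random.Random(0)
n_examples, n_features = 20, 2000

# Labels and features are independent fair coin flips, so no feature is predictive.
labels = [rng.randint(0, 1) for _ in range(n_examples)]
features = [[rng.randint(0, 1) for _ in range(n_examples)] for _ in range(n_features)]


def agreement(column, ys):
    """Accuracy of the fixed rule 'predict the feature value'."""
    return sum(int(x == y) for x, y in zip(column, ys)) / len(ys)


# The rule has nothing to fit, so its leave-one-out estimate is just its
# accuracy on the 20 examples.
scores = [agreement(col, labels) for col in features]
best = max(range(n_features), key=lambda j: scores[j])
selected_estimate = scores[best]

# On fresh data the selected feature is still an independent coin flip.
fresh_n = 10000
fresh_acc = sum(int(rng.randint(0, 1) == rng.randint(0, 1)) for _ in range(fresh_n)) / fresh_n

print(selected_estimate, fresh_acc)
```

The selected feature looks far better than chance on the leave-one-out estimate, while its accuracy on fresh draws stays near 0.5.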
Thanks for the explanation of brittleness! This is a problem with log-loss, but I’d say that it is not a problem with mutual information. Mutual information has well-defined upper bounds. For log-loss, you can put a bound into effect by mixing the prediction with a uniform distribution over y, bounding the maximum log-loss in a way that’s analogous to the Laplace probability estimate. While I agree that unmixed log-loss is brittle, I find classification accuracy noisy.
A reasonable compromise is Brier score. It’s a proper loss function (so it makes good probabilistic sense), and it’s a generalization of classification error where the Brier score of a non-probabilistic classifier equals its classification error, but a probabilistic classifier can benefit from distributing the odds. So, the result you mention holds also for Brier score.
If I perform 2-replicated 5-fold CV of the NBC performance on the Pima indians dataset, I get the following [0.76 0.75 0.87 0.76 0.74 0.77 0.79 0.72 0.78 0.82 0.81 0.79 0.73 0.74 0.82 0.79 0.74 0.77 0.83 0.75 0.79 0.73 0.79 0.80 0.76]. Of course, I can plop out the average of 0.78. But it is nicer to say that the standard deviation is 0.04, and summarize the result as 0.78 +- 0.04. The performance estimate is a random quantity too. In fact, if you perform many replications of cross-validation, the classification accuracy will have a Gaussian-like shape too (a bit skewed, though).
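For anyone who wants to check the summary, the computation on the 25 values listed above is short. I use the sample standard deviation here. That choice is my assumption, but the population version rounds to the same figure.

```python
import statistics

fold_accuracies = [
    0.76, 0.75, 0.87, 0.76, 0.74, 0.77, 0.79, 0.72, 0.78, 0.82,
    0.81, 0.79, 0.73, 0.74, 0.82, 0.79, 0.74, 0.77, 0.83, 0.75,
    0.79, 0.73, 0.79, 0.80, 0.76,
]

mean_acc = statistics.mean(fold_accuracies)
std_acc = statistics.stdev(fold_accuracies)  # sample standard deviation

print(f"{mean_acc:.2f} +- {std_acc:.2f}")
```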
I too recommend against LOO, for the simple reason that the above empirical summaries are often awfully strange.
Very very interesting. However, I still feel (but would love to be convinced otherwise) that when the dataset is small and no additional data can be obtained, LOO-CV is the best among the (admittedly non-ideal) choices. What do you suggest as a practical alternative for a small dataset?
I’m not convinced by your observation about people using LOO-CV with feature selection to overfit. Isn’t this just a problem with reusing the same validation set multiple times? Even if I use a completely separately drawn validation set, which Bengio and Grandvalet show yields an unbiased estimate of the variance of the prediction error, I can still easily overfit the validation set when doing feature selection, right?
This is my first post on your blog. Thanks so much for putting it up — a very nice resource!
Aleks’s technique for bounding log loss by wrapping the box in a system that mixes with the uniform distribution has a problem: it introduces perverse incentives for the box. One reason why people consider log loss is that the optimal prediction is the probability. When we mix with the uniform distribution, this no longer becomes true. Mixing with the uniform distribution shifts all probabilistic estimates towards 0.5, which means that if the box wants to minimize log loss, it should make an estimate p such that after mixing, you get the actual probability.
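Both halves of this can be seen in a few lines of Python. The sketch assumes binary labels and a mixing weight `eps`. The function names are mine, and `eps = 0.1` is an arbitrary illustrative value.

```python
import math


def mix_with_uniform(p, eps):
    """Mix a predicted probability of label 1 with the uniform distribution over {0, 1}."""
    return (1.0 - eps) * p + eps * 0.5


def log_loss(q, y):
    """Log loss of predicting probability q for label 1 when the true label is y."""
    return -math.log(q if y == 1 else 1.0 - q)


def report_needed(true_p, eps):
    """Raw report p such that mix_with_uniform(p, eps) equals true_p (may leave [0, 1])."""
    return (true_p - eps * 0.5) / (1.0 - eps)


eps = 0.1
# Worst case: the box says 0 and the label is 1; the loss is capped at -log(eps / 2).
print(log_loss(mix_with_uniform(0.0, eps), 1), -math.log(eps / 2))
# To have the mixed prediction equal 0.8, the box must report more than 0.8.
print(report_needed(0.8, eps))
```

The first line shows the bound that mixing buys. The second shows the incentive problem: a box that knows the true probability is 0.8 does best by reporting something more extreme than 0.8.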
David McAllester advocates truncation as a solution to the unboundedness. This has the advantage that it doesn’t create perverse incentives over all nonextreme probabilities.
Even when we swallow the issues of bounding log loss, rates of convergence are typically slower than for classification, essentially because the dynamic range of the loss is larger. Thus, we can expect log loss estimates to be more “noisy”.
Before trusting mutual information, etc., I want to see rate of convergence bounds of the form I mentioned above.
I’m not sure what Brier score is precisely, but just using L(p,y)=(p-y)^2 has all the properties mentioned.
I consider reporting standard deviation of cross validation to be problematic. The basic reason is that it’s unclear what I’m supposed to learn. If it has a small deviation, this does not mean that I can expect the future error rate on i.i.d. samples to be within the range of the +/-. It does not mean that if I cut the data in another way (and the data is i.i.d.), I can expect to get results in the same range. There are specific simple counterexamples to each of these intuitions. So, while reporting the range of results you see may be a ‘summary’, it does not seem to contain much useful information for developing confidence in the results.
One semi-reasonable alternative is to report the confidence interval for a Binomial with m/K coin flips, which fits intuition (1), for the classifier formed by drawing randomly from the set of cross-validated classifiers. This won’t leave many people happy, because the intervals become much broader.
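Here is a minimal sketch of that alternative, using an exact one-sided binomial tail inversion. The dataset size, number of folds, and observed error rate are hypothetical, and the bisection-based `error_upper_bound` is my own illustration rather than any particular published code.

```python
import math


def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def error_upper_bound(num_errors, n, delta=0.05):
    """Largest error rate p with P(Binomial(n, p) <= num_errors) >= delta, found by bisection."""
    if num_errors >= n:
        return 1.0
    lo, hi = num_errors / n, 1.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if binomial_cdf(num_errors, n, mid) >= delta:
            lo = mid
        else:
            hi = mid
    return lo


m, K = 200, 10           # hypothetical dataset size and number of folds
n = m // K               # treat the estimate as if it came from m/K coin flips
observed_error_rate = 0.1
print(error_upper_bound(round(observed_error_rate * n), n))
```

With only m/K = 20 effective coin flips, the upper bound sits well above the observed 10% error rate, which is the broadening mentioned above.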
The notion that cross validation errors are “gaussian-like” is also false in general, on two counts:
This is an important issue because it’s not always obvious from experimental results (and intuitions derived from experimental results) whether the approach works. The math says that if you rely on leave-one-out cross-validation in particular you’ll end up with bad intuitions about future performance. You may not encounter this on some problems, but the monsters are out there.
For rif’s questions — keep in mind that I’m only really considering methods of developing confidence here. I’m ok with people using whatever ugly nasty hacks they want in producing a good predictor. You are correct about the feature selection example being about using the same validation set multiple times. (Bad!) The use of leave-one-out simply aggravated the effect of this with respect to using a holdout set because it’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.
Developing good confidence on a small dataset is a hard problem. The simplest solution is to accept the need for a test set even though you have few examples. In this case, it might be worthwhile to compute very exact confidence intervals (code here). Doing K-fold cross validation on m examples and using confidence intervals for m/K coin flips is better, but by an unknown (and variable) amount. The theory approach, which has never yet worked well, is to very carefully use the examples for both purposes. A blend of these two approaches can be helpful, but the computation is a bit rough. I’m currently working with Matti Kääriäinen on seeing how well the progressive validation approach can be beat into shape.
And of course we should remember that all of this is only meaningful when the data is i.i.d., which it often clearly is not.
I think we have a case where the assumptions of applied machine learners differ from the assumptions of the theoretical machine learners. Let’s hash it out!
==
* (Half-)Brier score is 0.5(p-y)^2, where p and y are vectors of probabilities (p-predicted, y-observed).
* A side consequence of mixing is also truncation; but mixing is smooth, whereas truncation results in discontinuities of the gradient. There is a good justification for mixing: if you see that you misclassify in 10% of the cases on the unseen test data, you can anticipate similar error in the future, and calibrate the predictions by mixing with the uniform distribution.
* Standard deviation of the CV results is a foundation for bias/variance decomposition and a tremendous amount of work in applied statistics and machine learning. I wouldn’t toss it away so lightly, and especially not based on the argument of non-independence of folds. The purpose of non-independence of folds in the first place is that you get a better estimate of the distribution over all the training/test splits of a fixed proportion (one could say that the split is chosen by i.i.d., not the instances). You get a better estimate with 10-fold CV than by picking 10 train/test splits at random.
* Both the binomial and the Gaussian model of the error distribution are just models. Neither of them is ‘true’, but they are based on slightly different assumptions. I generally look at the histogram and eyeball it for gaussianity, as I have done in my example. The fact that it is a skewed distribution (with the truncated hump at ~85%) empirically invalidates the binomial error model too. One can compute the first two moments as a “finite” and informative summary even if the underlying distribution has more of them.
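To make the (half-)Brier definition in the first point concrete, here is a small Python sketch. It assumes y is a one-hot vector of the observed class, and the function name `half_brier` is only for illustration.

```python
def half_brier(p, y):
    """Half-Brier score 0.5 * sum_k (p_k - y_k)^2 for one example.

    p is the predicted probability vector over classes, y the one-hot observed vector.
    """
    if len(p) != len(y):
        raise ValueError("p and y must have the same length")
    return 0.5 * sum((pk - yk) ** 2 for pk, yk in zip(p, y))


observed = [0.0, 1.0]                      # true class is the second one
print(half_brier([0.0, 1.0], observed))    # hard prediction, correct
print(half_brier([1.0, 0.0], observed))    # hard prediction, wrong
print(half_brier([0.3, 0.7], observed))    # probabilistic prediction
```

A hard prediction scores 0 when right and 1 when wrong, so averaging over examples gives back the classification error, while a probabilistic prediction lands in between.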
I am not advocating ‘tossing’ cross-validation. I am saying that caution should be exercised in trusting it.
Do you have a URL for this other analysis?
You are right to be skeptical about models, but the ordering of skepticism seems important. Models which make more assumptions (and in particular which makes assumptions that are clearly false) should be viewed with more skepticism.
What is the standard deviation of cross validation errors supposed to describe? I listed and dismissed a couple of possibilities, so now I’m left without an understanding.
I’d like to follow up a bit on your comment that “It’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.” I was not familiar with this fact. Could you discuss this in more detail, or provide a reference that would help me follow this up? Quite interesting.
I didn’t mean to imply that you’d disagree with cross-validation in general. The issue at hand is whether the standard deviation of CV errors is useful or not. I can see two reasons for why one can be unhappy about it:
a) It can happen that you get accuracy of 0.99 +- 0.03. What could that mean? The standard deviation is a summary. If you provide a summary consisting of the first two moments, it does not mean that you believe in the Gaussian model – of course those statistics are not sufficient. It is a summary that roughly describes the variance of the classifier, inasmuch that the mean accuracy indicates its bias.
b) The instances in a training and test set are not i.i.d. Yes, but the above summary relates to the question: “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” This is a different question than “Given a randomly chosen instance, what will be the classifier’s expected accuracy?”
Several people have a problem with b) and use bootstrap instead of cross-validation in bias/variance analysis. Still, I don’t see a problem with the formulation, if one doesn’t attempt to perceive CV as an approximation to making statements about i.i.d. samples.
rif – see today’s post under “Examples”.
Aleks, I regard the 0.99 +/- 0.03 issue as a symptom that the wrong statistics are being used (i.e. assuming gaussianity on obviously non-gaussian draws).
I’m not particularly interested in “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” because I generally think the goal of learning is doing well on future examples. Why should I care about this?
Reporting 0.99 +- 0.03 does not imply that one who wrote it believes that the distribution is Gaussian. Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99? Anyone surely knows that the classification accuracy cannot be more than 1.0, it would be most arrogant to assume such ignorance.
CV is the de facto standard method of evaluating classifiers, and many people trust the results that come out of this. Even if I might not like this approach, it is a standard, it’s an experimental bottom line. “Future examples” are something you don’t have, something you can only make assumptions about. Cross-validation and learning curves employ the training data to empirically demonstrate the stability and convergence of the learning algorithm on what effectively *is* future data for the algorithm, under the weak assumption of permutability of the training data. Permutability is a weaker assumption than iid. My main problem with most applications of CV is that people don’t replicate the cross-validation on multiple assignments to folds, something that’s been pointed out quite nicely by, e.g.,
Estimating Replicability of Classifier Learning Experiments. ICML, 2004.
The problem with LOO is that you *cannot* perform multiple replications.
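A short sketch of what replication means in practice: reshuffle the assignment of instances to folds before each K-fold pass. The generator name `replicated_kfold` and the sizes below are illustrative assumptions.

```python
import random


def replicated_kfold(n, k, replications, seed=0):
    """Yield (replication, fold, test_indices) for repeated K-fold CV with fresh shuffles."""
    rng = random.Random(seed)
    for r in range(replications):
        order = list(range(n))
        rng.shuffle(order)
        for f in range(k):
            # Every k-th shuffled index goes to fold f, so the folds partition 0..n-1.
            yield r, f, sorted(order[f::k])


n = 12
splits = list(replicated_kfold(n, k=4, replications=3))
print(len(splits))
```

With k = n every fold is a single instance, so reshuffling yields the same set of train/test splits each time. That is why LOO has nothing to replicate.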
If your assumptions grow from iid, you shouldn’t use cross-validation, it’s a) not solving your problem, and b) you could get better results with an evaluation method that assumes more. It is unfair to criticize CV on these grounds. One can grow a whole different breed of statistics based on permutability and training/test splitting.
Reporting 0.99 +- 0.03 does mean that the inappropriate statistics are being used.
I am not trying to claim anything about the belief of the person making the application (and certainly not trying to be arrogant).
I have a problem with reporting the +/- 0.03. It seems that it has no interesting interpretation, and the obvious statistical interpretation is simply wrong.
The standard statistical “meaning” of 0.99 +- 0.03 is a confidence interval about an observation. A confidence interval [lower_bound(observation), upper_bound(observation)] has the property that, subject to your assumptions, it will contain the true value of some parameter with high probability over the random draw of the observation. The parameter I care about is the accuracy, the probability that the classifier is correct. Since the true error rate can not go above 1, this confidence interval must be constructed with respect to the wrong assumptions about the observation generating process. This isn’t that damning though – what’s really hard to swallow is that this method routinely results in intervals which are much narrower than the standard statistical interpretation would suggest. In other words, it generates overconfidence.
> Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99?
Absolutely. 0.99 can be interpreted as an unbiased Monte Carlo estimate of the “true” accuracy. I do not have an interpretation of 0.03, and the obvious interpretations are misleading due to nongaussianity and nonindependence in the basic process. Using this obvious interpretation routinely leads to overconfidence which is what this post was about.
I don’t regard the distinction between “permutable” and “independent” as significant here, because de Finetti’s theorem says that all exchangeable (i.e. permutable) sequences can be thought of as i.i.d. samples conditioned on the draw of a hidden random variable. We do not care what the value of this hidden random variable is because a good confidence interval for accuracy works no matter what the data-generation process is. Consequently, the ‘different breed’ you speak of will end up being the same breed.
Many people use cross validation in a way that I don’t disagree with. For example, tuning parameters might be reasonable. I don’t even have a problem with using cross validation error to report performance (except when this creates a subtle instance of “reproblem”). What seems unreasonable is making confidence interval-like statements subject to known-wrong assumptions. This seems especially unreasonable when there are simple alternatives which don’t make known-wrong assumptions.
I think you are correct: many other people (I would not say it’s quite “the” standard) try to compute (and report) confidence interval-like summaries. I think it’s harmful to do so because of the routine overconfidence this creates.
rif — Another reason LOO CV is bad is that it is asymptotically suboptimal. For example, if you use Leave One Out cross-validation for feature selection, you might end up selecting a suboptimal subset, even with an infinite training sample. The neural-nets FAQ talks about it: http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html
Experimentally, Ronny Kohavi and Breiman found independently that 10 is the best number of folds for CV.
The FAQ says “cross-validation is markedly superior [to split sample validation] for small data sets; this fact is demonstrated dramatically by Goutte (1997)”. (google scholar has the paper), but I’m not sure their conclusions extend beyond their Gaussian synthetic data.
I agree with you regarding the inappropriateness of +- notation, and I also agree about general overconfidence of confidence intervals. Over here it says: “LTCM’s loss in August 1998 was a -10.5 sigma-event on the firm’s risk model, and a -14 sigma-event in terms of the actual previous price movements.” Sometimes overfitting is very expensive.
LTCM “lost” quite a few hundred million US$ (“lost” — financial transactions are largely a zero-sum game).
What if I had written 0.99(0.03), without implying that 0.03 is a confidence interval (because it is not)? It is quite rare in statistics to provide confidence intervals – usually one provides either the standard deviation of the distribution or the standard error of the estimate of the mean. Still, I consider the 0.03 a very useful piece of information, and I’m grateful to any author that is diligent enough to provide some information about the variation in the performance. I’d reject a paper that only provides the mean for a small dataset, or didn’t perform multiply replicated experiments.
As far as I’m concerned, The Right Way of dealing with confidence intervals of cross-validated loss is to perform multiple replications of cross-validation, and provide the scores at appropriate percentiles. My level of agreement with the binomial model is about at the same level as your agreement with the Gaussian model. Probability of error is meaningless: there are instances that you can almost certainly predict right, there are instances that you usually misclassify, and there are boundary instances where the predictions of the classifier vary, depending on the properties of the split. Treating all these groups as one would be misleading.
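A rough sketch of what "multiple replications of cross-validation, reported at percentiles" could look like. Everything here is an illustrative assumption: the toy 1-D data, the nearest-centroid learner, and the choice of the 5th, 50th and 95th percentiles. Note that the replications reuse the same examples, so the spread describes sensitivity to the fold assignment on this data set and is not, by itself, a confidence interval for future accuracy.

```python
import random
import statistics

def make_data(n, rng):
    """Hypothetical toy data: 1-D feature, class 1 shifted by +1, classes balanced."""
    return [(rng.gauss(i % 2, 1.0), i % 2) for i in range(n)]

def train_centroid(train):
    """Toy learner: threshold halfway between the two class means."""
    m0 = statistics.mean(x for x, y in train if y == 0)
    m1 = statistics.mean(x for x, y in train if y == 1)
    thr = (m0 + m1) / 2
    return lambda x: int(x > thr) if m1 >= m0 else int(x < thr)

def kfold_accuracy(data, k, rng):
    """One replication: draw a fresh fold assignment, pool accuracy over the k held-out folds."""
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held = set(fold)
        train = [data[i] for i in idx if i not in held]
        predict = train_centroid(train)
        correct += sum(predict(data[i][0]) == data[i][1] for i in fold)
    return correct / len(data)

def replicated_cv(data, k=10, replications=50, seed=0):
    """Repeat k-fold CV over many fold assignments; return 5th, 50th, 95th percentile scores."""
    rng = random.Random(seed)
    scores = sorted(kfold_accuracy(data, k, rng) for _ in range(replications))
    pct = lambda q: scores[min(len(scores) - 1, int(q * len(scores)))]
    return pct(0.05), pct(0.5), pct(0.95)

data = make_data(100, random.Random(1))
print(replicated_cv(data))
```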
Regarding de Finetti, one has to be careful: there is a difference between finite and infinite exchangeability. The theorem goes from *infinite* exchangeability to iid. When you have an infinite set, there is no difference between forming a finite sample by sampling-with-replacement (bootstrap) versus sampling-without-replacement (cross-validation). When you have a finite set to sample from, it’s two different breeds.
As for assumptions, they are all wrong… But some are more agreeable than others.
0.99(0.03) is somewhat better, but I suspect people still interpret it as a confidence interval, even when you explicitly state that it is not.
Another problem is that I still don’t know why it’s interesting. You assert it’s very interesting, but can you explain why? How do you use it? Saying 0.99(0.03) seems semantically equivalent to saying “I achieved test set performance of 0.99 with variation 0.03 across all problems on the UCI database”, except not nearly as reassuring because the cross-validation folds do not encompass as much variation across real-world problems.
On Binomial vs. Gaussian model: the Binomial model (at least) has the advantage that it is not trivially disprovable.
On probability of error: it’s easy to criticize any small piece of information as incomplete. Nevertheless, we like small pieces of information because we can better understand and use them. “How often should I expect the classifier to be wrong in the future” seems like an interesting (if incomplete) piece of information to me. A more practical problem with your objection is that distinguishing between “always right”, “always wrong” and “sometimes right” examples is much harder, requiring more assumptions, than distinguishing error rate. Hence, such judgements will be more often wrong.
I had assumed you were interested in infinite exchangeability because we are generally interested in what the data tells us about future (not yet seen) events. Analysis which is only meaningful with respect to known labeled examples simply doesn’t interest me, in the same way that training error rate doesn’t interest me.
Why bother to make a paper, at all? Why don’t you code stuff and throw it into e-market? There are forums, newsgroups, and selected “peers” for things that are incomplete and require some discussion.
No, 0.99(0.03) means 0.99 classification error across 90:10 training-test splits on a single data set. It is quite meaningless to try to assume any kind of average classification error across different data sets.
Regarding probability of error, if it’s easy to acquire this kind of information, why not do it?
Infinite exchangeability does not apply to a finite population. What do you do when I gather *all* the 25 cows from the farm and measure them? You cannot pretend that there are infinitely many cows in the farm. You can, however, wonder about the number of cows (2, 5, 10, 25?) you really need to measure to characterize all the 25 with reasonable precision.
I maintain that future is unknowable. Any kind of a statement regarding the performance of a particular classifier trained from data should always be seen as relative to the data set.
This still isn’t answering my question: Why is 0.03 useful? I can imagine using an error rate in decision making. I can imagine using a confidence interval on the error rate in decision making. But, I do not know how to use 0.03 in any useful way.
Note that 0.99 means 0.99 average classification error across multiple 90:10 splits. 0.99(0.03) should mean something else if 0.03 is useful.
Your comment on exchangeability makes more sense now. In this situation, what happens is that (basically) you trade using a Binomial distribution for a Hypergeometric distribution to analyze the number of errors on the portion of the set you haven’t seen. The trade Binomial->Hypergeometric doesn’t alter intuitions very much because the distributions are fairly similar (Binomial is a particular limit of the Hypergeometric, etc…)
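The similarity of the two distributions can be checked numerically. The sketch below compares the Binomial (sampling with replacement) and Hypergeometric (sampling without replacement from a finite population) distributions of the number of errors, using total variation distance. The population sizes and error counts are made-up values for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """P[k errors in n independent draws, each an error with probability p]."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def hypergeom_pmf(k, N, K, n):
    """P[k errors in n draws without replacement from N items, K of which are errors]."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def total_variation(N, K, n):
    """Total variation distance between the two models at matched error rate K/N."""
    p = K / N
    return 0.5 * sum(abs(binom_pmf(k, n, p) - hypergeom_pmf(k, N, K, n))
                     for k in range(n + 1))

# Same 10% error rate, same sample of 20, increasingly large populations.
for N in (25, 100, 10_000):
    K = N // 10 if N >= 100 else 3   # roughly 10% errors; 3 of 25 in the small case
    print(N, round(total_variation(N, K, 20), 4))
```

For a tiny population (the 25 cows) the gap is visible; as the population grows relative to the sample, it shrinks toward zero, which is the limit referred to above.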
0.03 gives you an indication of reliability, stability of a classifier. This relates to the old bias/variance tradeoff. A short bias/variance reading list:
Neural networks and the bias/variance dilemma
Bias, Variance, and Arcing Classifiers
A Unified Bias-Variance Decomposition for Zero-One and Squared Loss
This still isn’t the answer I want. How is 0.03 useful? How do you use it?
The meaning of “stability” here seems odd. It seems to imply nothing about how the algorithm would perform for new problems or even for a new draw of the process generating the current training examples. Why do we care about this very limited notion of stability?
If you don’t mind a somewhat philosophical argument, examine the Figure 5 in Modelling Modelled. The NBC becomes highly stable beyond 150 instances. On the other hand, C4.5 has a higher average utility, but also a greater variation in its utility on the test set. Is it meaningful to compare both methods when the training set consists of ~100 instances? The difference in expected utility is negligible in comparison to the total amount of variation in performance.
This still isn’t answering my question. How and why do you use 0.03? There should be a simple answer to this, just like there are simple answers for 0.99 and for confidence intervals about 0.99.
(I don’t want to spend time debating what is and is not “meaningful”, because that seems too vague.)
(0.03) indicates how much the classification accuracy is affected by the choice of the training data across the experiments. It quantifies the variance of the learned model. It describes that the estimate of classification accuracy across test sets of a certain size is not a number, it is a distribution.
I get my distribution of expected classification accuracy through sampling, and the only assumption is the fixed choice of the relative size of the training and test set. The purpose of (0.03) is to stress that the classification accuracy estimate depends on the assignment of instances to training or test set. You get your confidence interval starting from an arbitrary point estimate “0.99” along with a very strong binomial assumption, one that is invalidated by the above sampling experiments. It’s a simple answer alright, but a very dubious set of assumptions.
By now, I’ve listed sufficiently many papers that attempt to justify the bias/variance problem, and the purpose of (0.03) should be apparent in the context of this problem. Do you have a good reason for disagreeing with the whole issue of bias/variance decomposition?
I know what (0.03) indicates, but this still doesn’t answer my question. How do we _use_ it? How is this information supposed to affect the choices that we make? The central question is whether or not (0.03) is relevant to decision making, and I don’t yet see that relevance.
“Binomial distribution” is not the assumption. Instead, it is the implication. The assumption is iid samples. This assumption is not always true, but none of the experiments in the ‘modelling modeled’ reference seem to be the sort which disprove the correctness of the assumption. In particular, cutting up the data in several different ways and learning different classifiers with different observed test error rates cannot disprove the independence assumption.
This reminds me of Lance’s post on calibrating weather prediction numbers. The weatherman tells us that the (subjective) probability of rain tomorrow is 0.8. How do (should) we use that? Now suppose we know something about the prior he used to come up with the 0.8 estimate. Does that change the way we use the number?
Re: Yaroslav – Yes, if the prior doesn’t match our own prior, we can squeeze out the update and update *our* prior.
Re: John – If you accept the bias/variance issue, then (0.03) is interesting and therefore intrinsically useful.
I guess you don’t buy this. It concerns the estimation of risk, second-order probability (probability-of-probability), etc. The issue is that you cannot characterize the error rate reliably, and must therefore use a probability distribution. This is the same pattern as with introducing error rate because you cannot say whether a classifier is always correct or always wrong.
A more practical utility is comparing two classifiers in two cases. In one case, the classifier A gets the classification accuracy of 0.88(0.31) and B gets 0.90(0.40). What probability would you assign to the statement “A is better than B?” in the absence of any other information? Now consider another experiment, where you get 0.88(0.01) for A and 0.90(0.01) for B.
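One way such a probability could be computed is sketched below. It rests on strong assumptions that are themselves contested in this thread: the accuracies of A and B on a test batch are treated as independent Gaussians with the reported means and standard deviations. The function name is hypothetical, and a Gaussian with mean 0.88 and deviation 0.31 puts mass above 1, so the first number is only a rough guide.

```python
from statistics import NormalDist

def prob_a_beats_b(mean_a, sd_a, mean_b, sd_b):
    """P[accuracy of A > accuracy of B] under independent Gaussian assumptions."""
    # The difference of two independent Gaussians is Gaussian;
    # the means subtract and the variances add.
    diff = NormalDist(mean_a - mean_b, (sd_a**2 + sd_b**2) ** 0.5)
    return 1.0 - diff.cdf(0.0)

print(round(prob_a_beats_b(0.88, 0.31, 0.90, 0.40), 3))  # close to a coin flip
print(round(prob_a_beats_b(0.88, 0.01, 0.90, 0.01), 3))  # B is favoured clearly
```

If the fold results of A and B are correlated, as they are when both are evaluated on the same splits, the independence assumption fails and a paired comparison would be more appropriate.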
Why would I want to assign a probability to “A is better than B”? How would you even do that given this information? And what does “better” mean?
a) What is the definition you use to do model selection? b) Any assignment is based upon a particular data set. c) “better” – lower aggregate loss on the test set.
a) I am generally inclined to avoid model selection because it is a source of overfitting. I would generally rather make a weighted integration of predictions. If pressed for computational reasons, I might choose the classifier with the smallest cross validation or validation set error rate.
I still don’t understand why you want to assign a probability.
b) I don’t understand your response. You give examples of 0.88(0.01) and 0.90(0.01). How do you use the 0.01 to decide?
c) I agree with your definition of better, as long as the test set is not involved in the cross validation process.
Interesting! Now I understand: all the stuff I’ve been talking about in this thread is very much about the tools and tricks in order to do model selection. But you dislike model selection, so obviously these tools and tricks may indeed seem useless.
a) If you have to make a choice, how easy is it for you to then state that A is better than B? It’s very rare that A would always be better than B. Instead, it may usually be better. Probability captures the uncertainty inherent to making such a choice. The probability of 0.9 means that in 90% of the test batches, A will be better.
b) With A:0.88(0.01) vs B:0.90(0.01), B will almost always be better than A. With A:0.88(0.1) vs B:0.90(0.1), we can’t really say which one will be better, and a choice could be arbitrary.
c) OK, but assume you have a certain batch of the data. That’s all you have. What do you do? Create a single test/train split, or create a bunch of them and ‘integrate out’ the dependence of your estimate on the particular choice?
Regarding the purpose of model selection. I’m sometimes working with experts, e.g. MD’s, who gathered the data and want to see the model. I train SVM, I train classification trees, I train NBC, I train many other things. Eventually, I would like to give them a single nicely presented model. They cannot evaluate or teach this ensemble of models. They won’t get insights from an overly complex model, they need something simpler, something they can teach/give to their ambulance staff to make decisions. So the nitty-gritty reality of practical machine learning has quite an explicit model complexity cost.
And one way of dealing with model complexity is model selection. It’s cold and brutal, but it gets the job done. The above probability is a way of quantifying how unjustified or arbitrary it is in a particular case. If it’s too brutal and if the models are making independent errors, then one can think about how to approximate or present the ensemble. Of course, I’d want to hand the experts the full Bayesian posterior, but how do I print it out on an A4 sheet of paper so that the expert can compare it to her intuition and experience?
Of course, I’m not saying that everyone should be concerned about model complexity and presentability. I am just trying to justify its importance to applied data analysis.
I understand that some form of predictor simplification/model selection is sometimes necessary.
a) I still don’t understand why you want to assign a probability to one being better than another. If we accept that model selection/simplification must be done, then it seems like you must make a hard choice. Why are probabilities required?
b) The reasoning about B and A does not hold on future data in general (and I am not interested in examples where we have already measured the label). In particular, I can give you learning algorithm/problem pairs in which there is a very good chance you will observe something which looks like a significant difference over cross validation folds, but which is not significant. The extreme example mentioned in this post shows you can get 1.00(0.00) and 0.00(0.00) for two algorithms producing classifiers with the same error rate.
c) If I thought there was any chance of a time ordering in the data, I would use a single train/test split with later things in the test set. I might also be tempted to play with “progressive validation” (although that’s much less standard). If there was obviously no time dependence, I might use k-fold cross validation (with _small_ k) and consider the average error rate a reasonable predictor of future performance. If I wanted to know roughly how well I might reasonably expect to do in the future and thought the data was i.i.d. (or effectively so), I would use the test set bound.
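A minimal sketch of the two time-respecting options mentioned in c). The helper names, the 20% test fraction and the running-majority predictor are assumptions made for illustration; progressive validation is shown only in its simplest form, where each example is predicted from the examples that came before it and then revealed to the learner.

```python
def time_ordered_split(examples, test_fraction=0.2):
    """Single train/test split with the later examples in the test set (no shuffling)."""
    cut = len(examples) - int(len(examples) * test_fraction)
    return examples[:cut], examples[cut:]

def progressive_validation(labels):
    """Predict each label from the earlier ones, then reveal it to the learner.

    Toy predictor (an assumption): the running majority label, ties predict 0.
    Returns the fraction of mistakes made along the way.
    """
    ones = 0
    mistakes = 0
    for t, y in enumerate(labels):
        prediction = int(2 * ones > t)   # majority of the t labels seen so far
        mistakes += int(prediction != y)
        ones += y                        # the learner only now sees label y
    return mistakes / len(labels)

train, test = time_ordered_split(list(range(10)))
print(train, test)
print(progressive_validation([0, 0, 1, 1, 1]))
```

In both cases no example is ever predicted by a model that was trained on later data.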
a) I consider 10-fold cross-validation to be a series of 10 experiments. For each of these experiments, we obtain a particular error rate. For a particular experiment, A might be better than B, but for a different experiment B would be better than A. Both probability and the standard deviations are ways of modelling the uncertainty that comes with this. If I cannot make a sure choice, and if modelling uncertainty is not too expensive, why not model it?
b) Any fixed method can be defeated by an adaptive adversary. I’m looking for a sensible evaluation protocol that will discount both overfitting and underfitting, and I realize that nothing is perfect.
c) I agree with your suggestions, especially with the choice of a small ‘k’. Still, I would stress that cross-validation is to be replicated multiple times, with several different permutations of the fold-assignment vector. Otherwise, the results are excessively dependent on a particular assignment to folds. If something affects your results, and if you are unsure about it, then you should not keep it fixed, but vary it.
a) I consider the notion that 10-fold cross validation is 10 experiments very misleading, because there can exist very strong dependencies between the 10 “experiments”. It’s like computing the average and standard deviations of the wheel locations of race car #1 and race car #2. These simply aren’t independent, and so the amount of evidence they provide towards “race car #1 is better than race car #2” is essentially the same as the amount of evidence given by “race car #1 is in front of race car #2”.
b) Pleading “but nothing works in general” is not convincing to me. In the extreme, this argument can be used to justify anything. There are some things which are more robust than other things, and it seems obvious that we should prefer the more robust things. If you use confidence intervals, this nasty example will not result in nonsense numbers, as it does with the empirical variance approach.
You may try to counterclaim that there are examples where confidence intervals fail, but the empirical variance approach works. If so, state them. If not, the confidence interval approach at least provides something reasonable subject to a fairly intuitive assumption. No such statement holds for the empirical variance approach.
c) I generally agree, as time allows.
I agree about b), but continue to disagree about a). The argument behind it is somewhat intricate. We’re estimating something random with a non-random set of experiments. Let me pose a small problem/analogy: if you wanted to use monte carlo sampling to estimate the area of a certain shape in 2D, but you can only take 10 samples, would you draw these samples purely at random? You would not, because you would risk the chance that you’d sample the same point twice, and would gain no information. Cross-validation is a bit like that: it tries to diversify the samples in order to get a better estimate with fewer samples. Does it make sense?
No, it does not. Cross validation makes samples which are (in analogy) more likely to be the same than independent samples. That’s why you can get the 1.00(0.00) or 0.00(0.00) behavior.
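One construction that produces this all-or-nothing behavior is sketched below. It is an assumed illustration and not necessarily the exact example from the post: the labels are fair coin flips, and the "learner" ignores the features and always predicts the parity (XOR) of its training labels. Its true error rate is 0.5, yet every leave-one-out fold gives the same result, because each fold's prediction is correct exactly when the parity of the full label set is even.

```python
import random

def loo_parity(labels):
    """Per-fold leave-one-out correctness for a learner that predicts the
    parity (XOR) of the labels it was trained on."""
    total = sum(labels) % 2
    results = []
    for y in labels:
        prediction = (total - y) % 2   # parity of the other n-1 labels
        results.append(int(prediction == y))
    return results

rng = random.Random(0)
for trial in range(5):
    labels = [rng.randrange(2) for _ in range(50)]
    folds = loo_parity(labels)
    mean = sum(folds) / len(folds)
    spread = max(folds) - min(folds)
    print(f"trial {trial}: LOO accuracy {mean:.2f}, spread across folds {spread}")
```

Each trial reports either 1.00 or 0.00 with zero spread across folds, so the empirical variance over folds says nothing about how uncertain the estimate really is.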
Back to this tar baby
I understand your concern, but it is inherent to *sampling without replacement* of instances as contrasted to *sampling with replacement* of instances. I was not arguing bootstrap or iid versus training/test split or cross-validation. I was arguing for cross-validation compared to random splitting into the training and test set.
It’s quite clear that i.i.d. is often incompatible with sampling without replacement, and I can demonstrate this experimentally. In some cases, i.i.d. is appropriate (large populations, random sampling), and in other cases splitting is appropriate (finite populations, exhaustive or stratified sampling). These two stances should be kept apart and not mixed, as seems to be the fashion. What should be a challenge is to study learning in the latter case.
I don’t understand what is meant by “incompatible” here.
Assuming m independent samples, what we know (detailed here) is that K-fold cross validation has a smaller variance, skew, or other higher order moment than a random train/test split with the test set of size m/K. We do not and cannot (fully) know how much smaller this variance is. There exist examples where K-fold cross validation has the same behavior as a random train/test split.
If you want to argue that cross-validation is a good idea because it removes variance, I can understand that. If you want to argue that the individual runs with different held out folds are experiments, I disagree. This really is like averaging the position of wheels on a race car. It reduces variance (i.e. doesn’t let a race car with a missing wheel win), but it is still only one experiment (i.e. one race). If you want more experiments, you should not share examples between runs of the learning algorithm.
Incompatible means that assuming i.i.d. within the classifier will penalize you if the classifier is evaluated using cross-validation: the classifier is not as confident as it can afford to be. I’m not arguing that CV is better, I’m just arguing that it’s different. I try to be agnostic with respect to evaluation protocols, and adapt to the problem at hand. CV tests some things, bootstrap other things, each method has its pathologies, but advocating a single individual train/test split is complete rubbish unless you’re in a highly cost-constrained adversarial situation.
But now I’ll play the devil’s advocate again. Assume that I’m training on 10% and testing on 90% of data in “-10”-fold CV. Yes, the experiments are not independent. Why should they be? Why shouldn’t I exhaustively test all the tires of the car in four *dependent* experiments? Why shouldn’t I test the blood pressure of every patient just once, even if this makes my experiments dependent? Why shouldn’t I hold out for validation each and every choice of 10% of instances? Why is having this kind of dependence any less silly than sampling the *same* tire multiple times in order to keep the samplings “independent”? Would it be less silly than sampling just one tire and computing a bound based on that single measurement, as any additional measure could be dependent? Why is using a Gaussian to model the heights of *all* the players in a basketball team silly, even if the samples are not independent?
The notion that “advocating a single individual train/test split is complete rubbish except in a cost constrained adversarial situation” is rubbish. As an example, suppose you have data from wall street and are trying to predict stock performance. This data is cheap and plentiful, but the notion of using cross validation is simply insane due to the “survivor effect”: future nonzero stock price is a strong and unfair predictor of past stock price. If you try to use cross validation, you will simply solve the wrong problem.
What’s happening here is that cross validation relies upon identicality of data in a far more essential manner than just having a training set and a test set. It is essential to understand this in considering methods for looking at your performance.
For your second point, I agree with the idea of reducing variance via cross validation (see second paragraph of comment 42) when the data is IID. What I disagree with is making confidence interval-like statements about the error rate based upon these nonindependent tests. If you want to know that one race car is better than another, you run them both on different tracks and observe the outcome. You don’t average over their wheel positions in one race and pretend that each wheel position represents a different race.
Well, of course neither cross-validation nor bootstrap makes sense when the assumption of instance exchangeability is clearly not justified. It was very funny to see R. Kalman make this mistake in http://www.pnas.org/cgi/content/abstract/101/38/13709/ – a journalist noticed this and wrote a pretty devastating paper on why peer review is important. My comment on “rubbish” was in the context of the validity of instance exchangeability, of course.
Regarding your note on “reducing variance”: I believe that you’re trying to find some benefit of cross-validation in the context of IID. Although you might do that, the crux of my message is that finite exchangeability (FEX) exercised by CV is different from infinite exchangeability (iid) exercised by bootstrap. Finite exchangeability has value on its own, not just as an approximation to infinite exchangeability. In fact, I’d consider finite exchangeability as primary, and infinite exchangeability as an approximation to it. I guess that your definition of confidence interval is based upon IID, so if I do “confidence intervals” based on FEX, it may look wrong.
I hope that I understand you correctly. What I’m suggesting is to allow for and appreciate the assumption of finite exchangeability, and build theory that accommodates it. Until then, it would be unfair to dismiss empirical work assuming FEX in some places just because most theory work assumes IID.
I’ve worked on FEX confidence intervals here. The details change, but not the basic message w.r.t. the IID assumption.
The basic issue we seem to be debating, regardless of assumptions about the world, is whether we should think of the different runs of cross validation as “different” experiments. I know of no reasonable assumption under which the answer is “yes” and many reasonable assumptions under which the answer is “no”. For this conversation to be further constructive, I think you need to (a) state a theorem and (b) argue that it is relevant.
[...] Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”. It isn’t too surprising to observe this: when the testers of a drug have $10^9 or more riding on the outcome the temptation to make the outcome “right” is extreme. [...]
Useful list. Should be made required reading for students of ML.