Twice I’ve tried to realistically present the performance of the algorithm. Twice my paper was rejected because of “unfinished methods” or “disappointing results”. There’s a whole culture of “rounding-up”, and trying to do the evaluations fairly just gives you trouble. When fair evaluations get rejected and rounders-up pass through, what do you do?
Anonymous’s story is surely common.
On any given paper, there is an incentive to “cheat” with some of the above methods. This can be hard to resist when so much rides on a paper acceptance _and_ some of the above cheats are not easily detected. Nevertheless, it should be resisted because “cheating” of this sort inevitably fools you as well as others. Fooling yourself in research is a recipe for a career that goes nowhere. Your techniques simply won’t apply well to new problems, you won’t be able to tackle competitions, and ultimately you won’t even trust your own intuition, which is fatal in research.
My best advice for anonymous is to accept that life is difficult here. Spend extra time testing on many datasets rather than a few. Spend extra time thinking about what makes a good algorithm, or not. Take the long view and note that, in the long run, the quantity of papers you write is not important, but rather their level of impact. Using a “cheat” very likely subverts long term impact.
How about an index of negative results in machine learning? There’s a Journal of Negative Results in other domains: Ecology & Evolutionary Biology, Biomedicine, and there is Journal of Articles in Support of the Null Hypothesis. A section on negative results in machine learning conferences? This kind of information is very useful in preventing people from taking pathways that lead nowhere: if one wants to classify an algorithm into good/bad, one certainly benefits from unexpectedly bad examples too, not just unexpectedly good examples.
I visited the workshop on negative results at NIPS 2002. My impression was that it did not work well.
The difficulty with negative results in machine learning is that they are too easy. For example, there are a plethora of ways to say that “learning is impossible (in the worst case)”. On the applied side, it’s still common for learning algorithms to not work on simple-seeming problems. In this situation, positive results (this works) are generally more valuable than negative results (this doesn’t work).
This discussion reminds me of some interesting research on “anti-learning”, by Adam Kowalczyk. This research studies (empirically and theoretically) machine learning algorithms that yield good performance on the training set but worse than random performance on the independent test set.
Hmm, rereading this post. What do you mean by “brittle”? Why is mutual information brittle?
Standard deviation of loss across the CV folds is not a bad summary of variation in CV performance. I’m not sure one can just reject a paper where the authors bothered to disclose the variation, rather than just plopping out the average. Standard error carries some Gaussian assumptions, but it is still a valid summary. The distribution of loss is sometimes quite close to being Gaussian, too.
As for significance, I came up with the notion of CV-values that measure how often method A is better than method B in a randomly chosen fold of cross-validation replicated very many times.
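For concreteness, a minimal sketch of how such CV-values could be computed, assuming scikit-learn, a synthetic data set, and naive Bayes versus logistic regression as stand-ins for methods A and B (none of which are specified above):

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import StratifiedKFold
    from sklearn.naive_bayes import GaussianNB

    # Synthetic data and two arbitrary methods stand in for "method A" and "method B".
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    method_a, method_b = GaussianNB(), LogisticRegression(max_iter=1000)

    wins, folds = 0, 0
    for rep in range(50):  # 50 replications of 5-fold CV, each with a different fold assignment
        cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=rep)
        for tr, te in cv.split(X, y):
            acc_a = method_a.fit(X[tr], y[tr]).score(X[te], y[te])
            acc_b = method_b.fit(X[tr], y[tr]).score(X[te], y[te])
            wins += acc_a > acc_b
            folds += 1

    print("CV-value (fraction of folds where A beats B): %.2f" % (wins / folds))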
What I mean by brittle: Suppose you have a box which takes some feature values as input and predicts some probability of label 1 as output. You are not allowed to open this box or determine how it works other than by this process of giving it inputs and observing outputs.
Let x be an input.
Let y be an output.
Assume (x,y) are drawn from a fixed but unknown distribution D.
Let p(x) be a prediction.
For classification error I(|y – p(x)| ≥ 0.5) you can prove a theorem of the rough form:
for all D, with high probability over the draw of m examples independently from D,
the expected classification error rate of the box with respect to D is bounded by a function of the observations.
What I mean by “brittle” is that no statement of this sort can be made for any unbounded loss (including log-loss which is integral to mutual information and entropy). You can of course open up the box and analyze its structure or make extra assumptions about D to get a similar but inherently more limited analysis.
The situation with leave-one-out cross validation is not so bad, but it’s still pretty bad. In particular, there exists a very simple learning algorithm/problem pair with the property that the leave-one-out estimate has the variance and deviations of a single coin flip. Yoshua Bengio and Yves Grandvalet in fact proved that there is no unbiased estimator of variance. The paper that I pointed to above shows that for K-fold cross validation on m examples, all moments of the deviations might only be as good as on a test set of size m/K.
I’m not sure what a ‘valid summary’ is, but leave-one-out cross validation can not provide results I trust, because I know how to break it.
I have personally observed people using leave-one-out cross validation with feature selection to quickly achieve a severe overfit.
Thanks for the explanation of brittleness! This is a problem with log-loss, but I’d say that it is not a problem with mutual information. Mutual information has well-defined upper bounds. For log-loss, you can put a bound into effect by mixing the prediction with a uniform distribution over y, bounding the maximum log-loss in a way that’s analogous to the Laplace probability estimate. While I agree that unmixed log-loss is brittle, I find classification accuracy noisy.
A reasonable compromise is Brier score. It’s a proper loss function (so it makes good probabilistic sense), and it’s a generalization of classification error where the Brier score of a non-probabilistic classifier equals its classification error, but a probabilistic classifier can benefit from distributing the odds. So, the result you mention holds also for Brier score.
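A small sketch of the two ideas above, assuming binary labels and a few made-up predictions: unmixed log-loss, log-loss after mixing the prediction with the uniform distribution (which caps the worst case), and the (half-)Brier score:

    import numpy as np

    y = np.array([1, 0, 1])                 # binary labels
    p = np.array([0.9, 0.2, 1e-9])          # predicted P(y=1); the last one is confidently wrong

    def log_loss(p, y):
        # per-example negative log-likelihood; unbounded as the prediction goes wrong with certainty
        return -(y * np.log(p) + (1 - y) * np.log(1 - p))

    def mixed(p, eps=0.01):
        # mix with the uniform distribution over {0, 1}; worst-case log-loss <= -log(eps / 2)
        return (1 - eps) * p + eps * 0.5

    def half_brier(p, y):
        # (half-)Brier score for binary labels; equals 0/1 error when p is exactly 0 or 1
        return (p - y) ** 2

    print(log_loss(p, y))          # the last term is huge and grows without bound as p -> 0
    print(log_loss(mixed(p), y))   # bounded after mixing
    print(half_brier(p, y))        # bounded between 0 and 1 by construction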
If I perform 5-replicated 5-fold CV of the NBC performance on the Pima Indians dataset, I get the following [0.76 0.75 0.87 0.76 0.74 0.77 0.79 0.72 0.78 0.82 0.81 0.79 0.73 0.74 0.82 0.79 0.74 0.77 0.83 0.75 0.79 0.73 0.79 0.80 0.76]. Of course, I can plop out the average of 0.78. But it is nicer to say that the standard deviation is 0.04, and summarize the result as 0.78 +- 0.04. The performance estimate is a random quantity too. In fact, if you perform many replications of cross-validation, the classification accuracy will have a Gaussian-like shape too (a bit skewed, though).
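A hedged sketch of this kind of replicated cross-validation summary, using RepeatedStratifiedKFold from scikit-learn; a synthetic data set of the same size stands in for the Pima Indians data, and GaussianNB for the NBC:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score
    from sklearn.naive_bayes import GaussianNB

    # Synthetic stand-in for the Pima Indians data (768 examples, 8 features).
    X, y = make_classification(n_samples=768, n_features=8, n_informative=4, random_state=0)

    cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
    scores = cross_val_score(GaussianNB(), X, y, cv=cv)     # 25 per-fold accuracies

    print("%.2f +- %.2f" % (scores.mean(), scores.std()))   # summary in the 0.78 +- 0.04 style
    print(np.percentile(scores, [2.5, 97.5]))               # or a percentile summary of the folds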
I too recommend against LOO, for the simple reason that the above empirical summaries are often awfully strange.
Very very interesting. However, I still feel (but would love to be convinced otherwise) that when the dataset is small and no additional data can be obtained, LOO-CV is the best among the (admittedly non-ideal) choices. What do you suggest as a practical alternative for a small dataset?
I’m not convinced by your observation about people using LOO-CV with feature selection to overfit. Isn’t this just a problem with reusing the same validation set multiple times? Even if I use a completely separately drawn validation set, which Bengio and Grandvalet show yields an unbiased estimate of the variance of the prediction error, I can still easily overfit the validation set when doing feature selection, right?
This is my first post on your blog. Thanks so much for putting it up — a very nice resource!
Aleks’s technique for bounding log loss by wrapping the box in a system that mixes with the uniform distribution has a problem: it introduces perverse incentives for the box. One reason why people consider log loss is that the optimal prediction is the probability. When we mix with the uniform distribution, this no longer becomes true. Mixing with the uniform distribution shifts all probabilistic estimates towards 0.5, which means that if the box wants to minimize log loss, it should make an estimate p such that after mixing, you get the actual probability.
David McAllester advocates truncation as a solution to the unboundedness. This has the advantage that it doesn’t create perverse incentives over all nonextreme probabilities.
Even when we swallow the issues of bounding log loss, rates of convergence are typically slower than for classification, essentially because the dynamic range of the loss is larger. Thus, we can expect log loss estimates to be more “noisy”.
Before trusting mutual information, etc…, I want to see rate of convergence bounds of the form I mentioned above.
I’m not sure what Brier score is precisely, but just using L(p,y)=(p-y)^2 has all the properties mentioned.
I consider reporting standard deviation of cross validation to be problematic. The basic reason is that it’s unclear what I’m supposed to learn. If it has a small deviation, this does not mean that I can expect the future error rate on i.i.d. samples to be within the range of the +/-. It does not mean that if I cut the data in another way (and the data is i.i.d.), I can expect to get results in the same range. There are specific simple counterexamples to each of these intuitions. So, while reporting the range of results you see may be a ‘summary’, it does not seem to contain much useful information for developing confidence in the results.
One semi-reasonable alternative is to report the confidence interval for a Binomial with m/K coin flips, which fits intuition (1), for the classifier formed by drawing randomly from the set of cross-validated classifiers. This won’t leave many people happy, because the intervals become much broader.
The notion that cross validation errors are “gaussian-like” is also false in general, on two counts:
This is an important issue because it’s not always obvious from experimental results (and intuitions derived from experimental results) whether the approach works. The math says that if you rely on leave-one-out cross-validation in particular you’ll end up with bad intuitions about future performance. You may not encounter this problem on some problems, but the monsters are out there.
For rif’s questions — keep in mind that I’m only really considering methods of developing confidence here. I’m ok with people using whatever ugly nasty hacks they want in producing a good predictor. You are correct about the feature selection example being about using the same validation set multiple times. (Bad!) The use of leave-one-out simply aggravated the effect of this with respect to using a holdout set because it’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.
Developing good confidence on a small dataset is a hard problem. The simplest solution is to accept the need for a test set even though you have few examples. In this case, it might be worthwhile to compute very exact confidence intervals (code here). Doing K-fold cross validation on m examples and using confidence intervals for m/K coin flips is better, but by an unknown (and variable) amount. The theory approach, which has never yet worked well, is to very carefully use the examples for both purposes. A blend of these two approaches can be helpful, but the computation is a bit rough. I’m currently working with Matti Kääriäinen on seeing how well the progressive validation approach can be beat into shape.
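One standard way to compute such exact intervals is the Clopper-Pearson construction; the sketch below is a generic version under the i.i.d. assumption, not necessarily the code referred to above, and the example numbers are made up:

    from scipy.stats import beta

    def clopper_pearson(errors, n, alpha=0.05):
        # Exact Binomial confidence interval for the error rate, given `errors`
        # mistakes observed on n i.i.d. test examples.
        lower = beta.ppf(alpha / 2, errors, n - errors + 1) if errors > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, errors + 1, n - errors) if errors < n else 1.0
        return lower, upper

    # e.g. 8 mistakes on a held-out fold of 77 examples (roughly m/K for m = 768, K = 10)
    print(clopper_pearson(8, 77))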
And of course we should remember that all of this is only meaningful when the data is i.i.d, which it often clearly is not.
I think we have a case where the assumptions of applied machine learners differ from the assumptions of the theoretical machine learners. Let’s hash it out!
==
* (Half-)Brier score is 0.5(p-y)^2, where p and y are vectors of probabilities (p-predicted, y-observed).
* A side consequence of mixing is also truncation; but mixing is smooth, whereas truncation results in discontinuities of the gradient. There is a good justification for mixing: if you see that you misclassify in 10% of the cases on the unseen test data, you can anticipate similar error in the future, and calibrate the predictions by mixing with the uniform distribution.
* Standard deviation of the CV results is a foundation for bias/variance decomposition and a tremendous amount of work in applied statistics and machine learning. I wouldn’t toss it away so lightly, and especially not based on the argument of non-independence of folds. The point of the non-independence of folds in the first place is that you get a better estimate of the distribution over all the training/test splits of a fixed proportion (one could say that the splits are chosen i.i.d., not the instances). You get a better estimate with 10-fold CV than by picking 10 train/test splits at random.
* Both binomial and Gaussian models of the error distribution are just models. Neither of them is ‘true’, but they are based on slightly different assumptions. I generally look at the histogram and eyeball it for gaussianity, as I have done in my example. The fact that it is a skewed distribution (with the truncated hump at ~85%) empirically invalidates the binomial error model too. One can compute the first two moments as an informative finite summary even if the underlying distribution has more of them.
I am not advocating ‘tossing’ cross-validation. I am saying that caution should be exercised in trusting it.
Do you have a URL for this other analysis?
You are right to be skeptical about models, but the ordering of skepticism seems important. Models which make more assumptions (and in particular which makes assumptions that are clearly false) should be viewed with more skepticism.
What is the standard deviation of cross validation errors supposed to describe? I listed and dismissed a couple of possibilities, so now I’m left without an understanding.
I’d like to follow up a bit on your comment that “It’s easier to achieve large deviations from the expectation on a leave-one-out estimate than on a holdout set.” I was not familiar with this fact. Could you discuss this in more detail, or provide a reference that would help me follow this up? Quite interesting.
I didn’t mean to imply that you’d disagree with cross-validation in general. The issue at hand is whether the standard deviation of CV errors is useful or not. I can see two reasons for why one can be unhappy about it:
a) It can happen that you get accuracy of 0.99 +- 0.03. What could that mean? The standard deviation is a summary. If you provide a summary consisting of the first two moments, it does not mean that you believe in the Gaussian model – of course those statistics are not sufficient. It is a summary that roughly describes the variance of the classifier, inasmuch as the mean accuracy indicates its bias.
b) The instances in a training and test set are not i.i.d. Yes, but the above summary relates to the question: “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” This is a different question than “Given a randomly chosen instance, what will be the classifier’s expected accuracy?”
Several people have a problem with b) and use bootstrap instead of cross-validation in bias/variance analysis. Still, I don’t see a problem with the formulation, if one doesn’t attempt to perceive CV as an approximation to making statements about i.i.d. samples.
rif – see today’s post under “Examples”.
Aleks, I regard the 0.99 +/- 0.03 issue as a symptom that the wrong statistics are being used (i.e. assuming gaussianity on obviously non-gaussian draws).
I’m not particularly interested in “Given a randomly chosen training/test 9:1 split of instances, what can we say about the classifier’s accuracy on the test set?” because I generally think the goal of learning is doing well on future examples. Why should I care about this?
Reporting 0.99 +- 0.03 does not imply that one who wrote it believes that the distribution is Gaussian. Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99? Anyone surely knows that the classification accuracy cannot be more than 1.0, it would be most arrogant to assume such ignorance.
CV is the de facto standard method of evaluating classifiers, and many people trust the results that come out of this. Even if I might not like this approach, it is a standard, it’s an experimental bottom line. “Future examples” are something you don’t have, something you can only make assumptions about. Cross-validation and learning curves employ the training data so as to empirically demonstrate the stability and convergence of the learning algorithm on what effectively *is* future data for the algorithm, under the weak assumption of permutability of the training data. Permutability is a weaker assumption than iid. My main problem with most applications of CV is that people don’t replicate the cross-validation on multiple assignments to folds, something that’s been pointed out quite nicely by, e.g.,
Estimating Replicability of Classifier Learning Experiments. ICML, 2004.
The problem with LOO is that you *cannot* perform multiple replications.
If your assumptions grow from iid, you shouldn’t use cross-validation: a) it’s not solving your problem, and b) you could get better results with an evaluation method that assumes more. It is unfair to criticize CV on these grounds. One can grow a whole different breed of statistics based on permutability and training/test splitting.
Reporting 0.99 +- 0.03 does mean that the inappropriate statistics are being used.
I am not trying to claim anything about the belief of the person making the application (and certainly not trying to be arrogant).
I have a problem with reporting the +/- 0.03. It seems that it has no interesting interpretation, and the obvious statistical interpretation is simply wrong.
The standard statistical “meaning” of 0.99 +- 0.03 is a confidence interval about an observation. A confidence interval [lower_bound(observation), upper_bound(observation)] has the property that, subject to your assumptions, it will contain the true value of some parameter with high probability over the random draw of the observation. The parameter I care about is the accuracy, the probability that the classifier is correct. Since the true error rate can not go above 1, this confidence interval must be constructed with respect to the wrong assumptions about the observation generating process. This isn’t that damning though – what’s really hard to swallow is that this method routinely results in intervals which are much narrower than the standard statistical interpretation would suggest. In other words, it generates overconfidence.
> Would you argue that reporting 0.99 +- 0.03 is worse than just reporting 0.99?
Absolutely. 0.99 can be interpreted as an unbiased monte carlo estimate of the “true” accuracy. I do not have an interpretation of 0.03, and the obvious interpretations are misleading due to nongaussianity and nonindependence in the basic process. Using this obvious interpretation routinely leads to overconfidence which is what this post was about.
I don’t regard the distinction between “permutable” and “independent” as significant here, because de Finetti’s theorem says that all exchangeable (i.e. permutable) sequences can be thought of as i.i.d. samples conditioned on the draw of a hidden random variable. We do not care what the value of this hidden random variable is because a good confidence interval for accuracy works no matter what the data generation process is. Consequently, the ‘different breed’ you speak of will end up being the same breed.
Many people use cross validation in a way that I don’t disagree with. For example, tuning parameters might be reasonable. I don’t even have a problem with using cross validation error to report performance (except when this creates a subtle instance of “reproblem”). What seems unreasonable is making confidence interval-like statements subject to known-wrong assumptions. This seems especially unreasonable when there are simple alternatives which don’t make known-wrong assumptions.
I think you are correct: many other people (I would not say it’s quite “the” standard) try to compute (and report) confidence interval-like summaries. I think it’s harmful to do so because of the routine overconfidence this creates.
rif — Another reason LOO CV is bad is that it is asymptotically suboptimal. For example, if you use Leave One Out cross-validation for feature selection, you might end up selecting a suboptimal subset, even with an infinite training sample. The neural-nets FAQ talks about it: http://www.faqs.org/faqs/ai-faq/neural-nets/part3/section-12.html
Experimentally, Ronny Kohavi and Breiman found independently that 10 is the best number of folds for CV.
The FAQ says “cross-validation is markedly superior [to split sample validation] for small data sets; this fact is demonstrated dramatically by Goutte (1997)”. (google scholar has the paper), but I’m not sure their conclusions extend beyond their Gaussian synthetic data.
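A small simulation of the failure mode mentioned earlier in the thread (leave-one-out combined with feature selection producing a severe overfit); the data sizes, the correlation-based selection rule, and the nearest-neighbour classifier are all illustrative choices:

    import numpy as np
    from sklearn.model_selection import LeaveOneOut, cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 1000))      # pure noise: 50 examples, 1000 features
    y = rng.integers(0, 2, size=50)      # labels independent of X, so true accuracy is 0.5

    # Broken protocol: pick the 10 features most correlated with y using ALL the labels,
    # then report leave-one-out accuracy as if the selection step had never seen them.
    corr = np.abs([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    best = np.argsort(corr)[-10:]
    loo = cross_val_score(KNeighborsClassifier(n_neighbors=3), X[:, best], y, cv=LeaveOneOut())
    print("Apparent LOO accuracy on pure noise: %.2f" % loo.mean())   # typically far above 0.5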
I agree with you regarding the inappropriateness of +- notation, and I also agree about general overconfidence of confidence intervals. Over here it says: “LTCM’s loss in August 1998 was a -10.5 sigma-event on the firm’s risk model, and a -14 sigma-event in terms of the actual previous price movements.” Sometimes overfitting is very expensive.
LTCM “lost” quite a few hundred million US$ (“lost” — financial transactions are largely a zero-sum game).
What if I had written 0.99(0.03), without implying that 0.03 is a confidence interval (because it is not)? It is quite rare in statistics to provide confidence intervals – usually one provides either the standard deviation of the distribution or the standard error of the estimate of the mean. Still, I consider the 0.03 a very useful piece of information, and I’m grateful to any author that is diligent enough to provide some information about the variation in the performance. I’d reject a paper that only provides the mean for a small dataset, or didn’t perform multiply replicated experiments.
As far as I’m concerned, the Right Way of dealing with confidence intervals of cross-validated loss is to perform multiple replications of cross-validation, and provide the scores at appropriate percentiles. My level of agreement with the binomial model is about at the same level as your agreement with the Gaussian model. Probability of error is meaningless: there are instances that you can almost certainly predict right, there are instances that you usually misclassify, and there are boundary instances where the predictions of the classifier vary, depending on the properties of the split. Treating all these groups as one would be misleading.
Regarding de Finetti, one has to be careful: there is a difference between finite and infinite exchangeability. The theorem goes from *infinite* exchangeability to iid. When you have an infinite set, there is no difference between forming a finite sample by sampling-with-replacement (bootstrap) versus sampling-without-replacement (cross-validation). When you have a finite set to sample from, it’s two different breeds.
As for assumptions, they are all wrong… But some are more agreeable than others.
0.99(0.03) is somewhat better, but I suspect people still interpret it as a confidence interval, even when you explicitly state that it is not.
Another problem is that I still don’t know why it’s interesting. You assert it’s very interesting, but can you explain why? How do you use it? Saying 0.99(0.03) seems semantically equivalent to saying “I achieved test set performance of 0.99 with variation 0.03 across all problems on the UCI database”, except not nearly as reassuring because the cross-validation folds do not encompass as much variation across real-world problems.
On Binomial vs. Gaussian model: the Binomial model (at least) has the advantage that it is not trivially disprovable.
On probability of error: it’s easy to criticize any small piece of information as incomplete. Nevertheless, we like small pieces of information because we can better understand and use them. “How often should I expect the classifier to be wrong in the future” seems like an interesting (if incomplete) piece of information to me. A more practical problem with your objection is that distinguishing between “always right”, “always wrong” and “sometimes right” examples is much harder, requiring more assumptions, than distinguishing error rate. Hence, such judgements will be more often wrong.
I had assumed you were interested in infinite exchangeability because we are generally interested in what the data tells us about future (not yet seen) events. Analysis which is only meaningful with respect to known labeled examples simply doesn’t interest me, in the same way that training error rate doesn’t interest me.
Why bother to make a paper, at all? Why don’t you code stuff and throw it into e-market? There are forums, newsgroups, and selected “peers” for things that are incomplete and require some discussion.
No, 0.99(0.03) means 0.99 classification error across 90:10 training-test splits on a single data set. It is quite meaningless to try to assume any kind of average classification error across different data sets.
Regarding probability of error, if it’s easy to acquire this kind of information, why not do it?
Infinite exchangeability does not apply to a finite population. What do you do when I gather *all* the 25 cows from the farm and measure them? You cannot pretend that there are infinitely many cows in the farm. You can, however, wonder about the number of cows (2,5, 10, 25?) you really need to measure to characterize all the 25 with reasonable precision.
I maintain that future is unknowable. Any kind of a statement regarding the performance of a particular classifier trained from data should always be seen as relative to the data set.
This still isn’t answering my question: Why is 0.03 useful? I can imagine using an error rate in decision making. I can imagine using a confidence interval on the error rate in decision making. But, I do not know how to use 0.03 in any useful way.
Note that 0.99 means 0.99 average classification error across multiple 90:10 splits. 0.99(0.03) should mean something else if 0.03 is useful.
Your comment on exchangeability makes more sense now. In this situation, what happens is that (basically) you trade using a Binomial distribution for a Hypergeometric distribution to analyze the number of errors on the portion of the set you haven’t seen. The trade Binomial->Hypergeometric doesn’t alter intuitions very much because the distributions are fairly similar (Binomial is a particular limit of the Hypergeometric, etc…)
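A quick numerical check of this remark, with made-up population sizes: the Hypergeometric distribution of errors on the unseen portion of a finite population against its Binomial limit:

    from scipy.stats import binom, hypergeom

    # Finite population of N examples, K of which the box misclassifies; we look at the
    # distribution of the number of mistakes among n examples we have not yet seen.
    N, K, n = 250, 50, 25            # 20% errors in the population, unseen portion of 25
    for k in range(11):
        p_hyper = hypergeom.pmf(k, N, K, n)   # sampling without replacement (finite population)
        p_binom = binom.pmf(k, n, K / N)      # i.i.d. / with-replacement limit
        print(k, round(p_hyper, 4), round(p_binom, 4))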
0.03 gives you an indication of reliability, stability of a classifier. This relates to the old bias/variance tradeoff. A short bias/variance reading list:
Neural networks and the bias/variance dilemma
Bias, Variance, and Arcing Classifiers
A Unified Bias-Variance Decomposition for Zero-One and Squared Loss
This still isn’t the answer I want. How is 0.03 useful? How do you use it?
The meaning of “stability” here seems odd. It seems to imply nothing about how the algorithm would perform for new problems or even for a new draw of the process generating the current training examples. Why do we care about this very limited notion of stability?
If you don’t mind a somewhat philosophical argument, examine Figure 5 in Modelling Modelled. The NBC becomes highly stable beyond 150 instances. On the other hand, C4.5 has a higher average utility, but also a greater variation in its utility on the test set. Is it meaningful to compare both methods when the training set consists of ~100 instances? The difference in expected utility is negligible in comparison to the total amount of variation in performance.
This still isn’t answering my question. How and why do you use 0.03? There should be a simple answer to this, just like there are simple answeres for 0.99 and for confidence intervals about 0.99.
(I don’t want to spend time debating what is and is not “meaningful”, because that seems to vague.)
(0.03) indicates how much the classification accuracy is affected by the choice of the training data across the experiments. It quantifies the variance of the learned model. It conveys that the estimate of classification accuracy across test sets of a certain size is not a single number but a distribution.
I get my distribution of expected classification accuracy through sampling, and the only assumption is the fixed choice of the relative size of the training and test set. The purpose of (0.03) is to stress that the classification accuracy estimate depends on the assignment of instances to training or test set. You get your confidence interval starting from an arbitrary point estimate “0.99” along with a very strong binomial assumption, one that is invalidated by the above sampling experiments. It’s a simple answer alright, but a very dubious set of assumptions.
By now, I’ve listed sufficiently many papers that attempt to justify the bias/variance problem, and the purpose of (0.03) should be apparent in the context of this problem. Do you have a good reason for disagreeing with the whole issue of bias/variance decomposition?
I know what (0.03) indicates, but this still doesn’t answer my question. How do we _use_ it? How is this information supposed to affect the choices that we make? The central question is whether or not (0.03) is relevant to decision making, and I don’t yet see that relevance.
“Binomial distribution” is not the assumption. Instead, it is the implication. The assumption is iid samples. This assumption is not always true, but none of the experiments in the ‘modelling modeled’ reference seem to be the sort which disprove the correctness of the assumption. In particular, cutting up the data in several different ways and learning different classifiers with different observed test error rates cannot disprove the independence assumption.
This reminds me of Lance’s post on calibrating weather prediction numbers. The weatherman tells us that the (subjective) probability of rain tomorrow is 0.8. How do (should) we use that? Now suppose we know something about the prior he used to come up with the 0.8 estimate. Does that change the way we use the number?
Re: Yaroslav – Yes, if the prior doesn’t match our own prior, we can squeeze out the update and update *our* prior.
Re: John – If you accept the bias/variance issue, then (0.03) is interesting and therefore intrinsically useful.
I guess you don’t buy this. It concerns the estimation of risk, second-order probability (probability-of-probability), etc. The issue is that you cannot characterize the error rate reliably, and must therefore use a probability distribution. This is the same pattern as with introducing error rate because you cannot say whether a classifier is always correct or always wrong.
A more practical utility is comparing two classifiers in two cases. In one case, the classifier A gets the classification accuracy of 0.88(0.31) and B gets 0.90(0.40). What probability would you assign to the statement “A is better than B?” in the absence of any other information? Now consider another experiment, where you get 0.88(0.01) for A and 0.90(0.01) for B.
Why would I want to assign a probability to “A is better than B”? How would you even do that given this information? And what does “better” mean?
a) What is the definition you use to do model selection? b) Any assignment is based upon a particular data set. c) “better” – lower aggregate loss on the test set.
a) I am generally inclined to avoid model selection because it is a source of overfitting. I would generally rather make a weighted integration of predictions. If pressed for computational reasons, I might choose the classifier with the smallest cross validation or validation set error rate.
I still don’t understand why you want to assign a probability.
b) I don’t understand your response. You give examples of 0.88(0.01) and 0.90(0.01). How do you use the 0.01 to decide?
c) I agree with your definition of better, as long as the test set is not involved in the cross validation process.
Interesting! Now I understand: all the stuff I’ve been talking about in this thread is very much about the tools and tricks in order to do model selection. But you dislike model selection, so obviously these tools and tricks may indeed seem useless.
a) If you have to make a choice, how easy is it for you to then state that A is better than B? It’s very rare that A would always be better than B. Instead, it may usually be better. Probability captures the uncertainty inherent to making such a choice. The probability of 0.9 means that in 90% of the test batches, A will be better.
b) With A:0.88(0.01) vs B:0.90(0.01), B will almost always be better than A. With A:0.88(0.1) vs B:0.90(0.1), we can’t really say which one will be better, and a choice could be arbitrary.
c) OK, but assume you have a certain batch of the data. That’s all you have. What do you do? Create a single test/train split, or create a bunch of them and ‘integrate out’ the dependence of your estimate on the particular choice?
Regarding the purpose of model selection. I’m sometimes working with experts, e.g. MD’s, who gathered the data and want to see the model. I train SVM, I train classification trees, I train NBC, I train many other things. Eventually, I would like to give them a single nicely presented model. They cannot evaluate or teach this ensemble of models. They won’t get insights from an overly complex model, they need something simpler, something they can teach/give to their ambulance staff to make decisions. So the nitty-gritty reality of practical machine learning has quite an explicit model complexity cost.
And one way of dealing with model complexity is model selection. It’s cold and brutal, but it gets the job done. The above probability is a way of quantifying how unjustified or arbitrary it is in a particular case. If it’s too brutal and if the models are making independent errors, then one can think about how to approximate or present the ensemble. Of course, I’d want to hand the experts the full Bayesian posterior, but how do I print it out on an A4 sheet of paper so that the expert can compare it to her intuition and experience?
Of course, I’m not saying that everyone should be concerned about model complexity and presentability. I am just trying to justify its importance to applied data analysis.
I understand that some form of predictor simplification/model selection is sometimes necessary.
a) I still don’t understand why you want to assign a probability to one being better than another. If we accept that model selection/simplification must be done, then it seems like you must make a hard choice. Why are probabilities required?
b) The reasoning about B and A does not hold on future data in general (and I am not interested in examples where we have already measured the label). In particular, I can give you learning algorithm/problem pairs in which there is a very good chance you will observe something which looks like a significant difference over cross validation folds, but which is not significant. The extreme example mentioned in this post shows you can get 1.00(0.00) and 0.00(0.00) for two algorithms producing classifiers with the same error rate.
c) If I thought there was any chance of a time ordering in the data, I would use a single train/test split with later things in the test set. I might also be tempted to play with “progressive validation” (although that’s much less standard). If there was obviously no time dependence, I might use k-fold cross validation (with _small_ k) and consider the average error rate a reasonable predictor of future performance. If I wanted to know roughly how well I might reasonably expect to do in the future and thought the data was i.i.d. (or effectively so), I would use the test set bound.
a) I consider 10-fold cross-validation to be a series of 10 experiments. For each of these experiments, we obtain a particular error rate. For a particular experiment, A might be better than B, but for a different experiment B would be better than A. Both probability and the standard deviations are ways of modelling the uncertainty that comes with this. If I cannot make a sure choice, and if modelling uncertainty is not too expensive, why not model it?
b) Any fixed method can be defeated by an adaptive adversary. I’m looking for a sensible evaluation protocol that will discount both overfitting and underfitting, and I realize that nothing is perfect.
c) I agree with your suggestions, especially with the choice of a small ‘k’. Still, I would stress that cross-validation is to be replicated multiple times, with several different permutations of the fold-assignment vector. Otherwise, the results are excessively dependent on a particular assignment to folds. If something affects your results, and if you are unsure about it, then you should not keep it fixed, but vary it.
a) I consider the notion that 10-fold cross validation is 10 experiments very misleading, because there can exist very strong dependencies between the 10 “experiments”. It’s like computing the average and standard deviations of the wheel locations of race car #1 and race car #2. These simply aren’t independent, and so the amount of evidence they provide towards “race car #1 is better than race car #2” is essentially the same as the amount of evidence given by “race car #1 is in front of race car #2”.
b) Pleading “but nothing works in general” is not convincing to me. In the extreme, this argument can be used to justify anything. There are some things which are more robust than other things, and it seems obvious that we should prefer the more robust things. If you use confidence intervals, this nasty example will not result in nonsense numbers, as it does with the empirical variance approach.
You may try to counterclaim that there are examples where confidence intervals fail, but the empirical variance approach works. If so, state them. If not, the confidence interval approach at least provides something reasonable subject to a fairly intuitive assumption. No such statement holds for the empirical variance approach.
c) I generally agree, as time allows.
I agree about b), but continue to disagree about a). The argument behind it is somewhat intricate. We’re estimating something random with a non-random set of experiments. Let me pose a small problem/analogy: if you wanted to use monte carlo sampling to estimate the area of a certain shape in 2D, but you can only take 10 samples, would you draw these samples purely at random? You would not, because you would risk the chance that you’d sample the same point twice, and would gain no information. Cross-validation is a bit like that: it tries to diversify the samples in order to get a better estimate with fewer samples. Does it make sense?
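A tiny numerical version of the Monte Carlo analogy itself (not of the cross-validation claim), comparing purely random points with one point per grid cell; the shape and the sample budget of 10 are arbitrary:

    import numpy as np

    rng = np.random.default_rng(0)
    inside = lambda pts: (pts ** 2).sum(axis=1) <= 1.0   # quarter disc in the unit square; true area pi/4

    def random_estimate(n=10):
        return inside(rng.uniform(size=(n, 2))).mean()

    def stratified_estimate():
        # one random point in each cell of a fixed 5 x 2 grid (10 points in total)
        corners = np.array([(i / 5, j / 2) for i in range(5) for j in range(2)])
        pts = corners + rng.uniform(size=(10, 2)) * np.array([1 / 5, 1 / 2])
        return inside(pts).mean()

    reps = 2000
    print("std, purely random points:", np.std([random_estimate() for _ in range(reps)]))
    print("std, stratified points:   ", np.std([stratified_estimate() for _ in range(reps)]))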
No, it does not. Cross validation makes samples which are (in analogy) more likely to be the same than independent samples. That’s why you can get the 1.00(0.00) or 0.00(0.00) behavior.
Back to this tar baby
I understand your concern, but it is inherent to *sampling without replacement* of instances as contrasted to *sampling with replacement* of instances. I was not arguing bootstrap or iid versus training/test split or cross-validation. I was arguing for cross-validation compared to random splitting into the training and test set.
It’s quite clear that i.i.d. is often incompatible with sampling without replacement, and I can demonstrate this experimentally. In some cases, i.i.d. is appropriate (large populations, random sampling), and in other cases splitting is appropriate (finite populations, exhaustive or stratified sampling). These two stances should be kept apart and not mixed, as seems to be the fashion. What should be a challenge is to study learning in the latter case.
I don’t understand what is meant by “incompatible” here.
Assuming m independent samples, what we know (detailed here) is that K-fold cross validation has a smaller variance, skew, or other higher order moment than a random train/test split with the test set of size m/K. We do not and cannot (fully) know how much smaller this variance is. There exist examples where K-fold cross validation has the same behavior as a random train/test split.
If you want to argue that cross-validation is a good idea because it removes variance, I can understand that. If you want to argue that the individual runs with different held out folds are experiments, I disagree. This really is like averaging the position of wheels on a race car. It reduces variance (i.e. doesn’t let a race car with a missing wheel win), but it is still only one experiment (i.e. one race). If you want more experiments, you should not share examples between runs of the learning algorithm.
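A hedged simulation of the variance claim above, under an assumed i.i.d. logistic data-generating process and an arbitrary classifier: compare the spread of the K-fold CV estimate with that of a single random split whose test set has size m/K, over many fresh draws of the data:

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import KFold, cross_val_score, train_test_split

    rng = np.random.default_rng(0)
    w_true = rng.normal(size=10)            # one fixed distribution D: a logistic model with this weight vector

    def sample(m, r):
        X = r.normal(size=(m, 10))
        y = (r.uniform(size=m) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(int)
        return X, y

    def one_draw(seed, m=200, k=10):
        r = np.random.default_rng(seed)
        X, y = sample(m, r)                 # a fresh draw of m i.i.d. examples from D
        clf = LogisticRegression(max_iter=1000)
        kfold = 1 - cross_val_score(clf, X, y, cv=KFold(k, shuffle=True, random_state=seed)).mean()
        Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=1.0 / k, random_state=seed)
        holdout = 1 - clf.fit(Xtr, ytr).score(Xte, yte)
        return kfold, holdout

    draws = np.array([one_draw(s) for s in range(300)])
    print("std of the K-fold CV error estimate:  %.3f" % draws[:, 0].std())
    print("std of a single m/K holdout estimate: %.3f" % draws[:, 1].std())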
Incompatible means that assuming i.i.d. within the classifier will penalize you if the classifier is evaluated using cross-validation: the classifier is not as confident as it can afford to be. I’m not arguing that CV is better, I’m just arguing that it’s different. I try to be agnostic with respect to evaluation protocols, and adapt to the problem at hand. CV tests some things, bootstrap other things, each method has its pathologies, but advocating a single individual train/test split is complete rubbish unless you’re in a highly cost-constrained adversarial situation.
But now I’ll play the devil’s advocate again. Assume that I’m training on 10% and testing on 90% of data in “-10”-fold CV. Yes, the experiments are not independent. Why should they be? Why shouldn’t I exhaustively test all the tires of the car in four *dependent* experiments? Why shouldn’t I test the blood pressure of every patient just once, even if this makes my experiments dependent? Why shouldn’t I hold out for validation each and every choice of 10% of instances? Why is having this kind of dependence any less silly than sampling the *same* tire multiple times in order to keep the samplings “independent”? Would it be less silly than sampling just one tire and computing a bound based on that single measurement, as any additional measure could be dependent? Why is using a Gaussian to model the heights of *all* the players in a basketball team silly, even if the samples are not independent?
The notion that “advocating a single individual train/test split is complete rubbish except in a cost constrained adversarial situation” is rubbish. As an example, suppose you have data from Wall Street and are trying to predict stock performance. This data is cheap and plentiful, but the notion of using cross validation is simply insane due to the “survivor effect”: future nonzero stock price is a strong and unfair predictor of past stock price. If you try to use cross validation, you will simply solve the wrong problem.
What’s happening here is that cross validation relies upon identicality of data in a far more essential manner than just having a training set and a test set. It is essential to understand this in considering methods for looking at your performance.
For your second point, I agree with the idea of reducing variance via cross validation (see second paragraph of comment 42) when the data is IID. What I disagree with is making confidence interval-like statements about the error rate based upon these nonindependent tests. If you want to know that one race car is better than another, you run them both on different tracks and observe the outcome. You don’t average over their wheel positions in one race and pretend that each wheel position represents a different race.
Well, of course neither cross-validation nor bootstrap makes sense when the assumption of instance exchangeability is clearly not justified. It was very funny to see R. Kalman make this mistake in http://www.pnas.org/cgi/content/abstract/101/38/13709/ – a journalist noticed this and wrote a pretty devastating paper on why peer review is important. My comment on “rubbish” was in the context of the validity of instance exchangeability, of course.
Regarding your note on “reducing variance”: I believe that you’re trying to find some benefit of cross-validation in the context of IID. Although you might do that, the crux of my message is that finite exchangeability (FEX) exercised by CV is different from infinite exchangeability (iid) exercised by bootstrap. Finite exchangeability has value on its own, not just as an approximation to infinite exchangeability. In fact, I’d consider finite exchangeability as primary, and infinite exchangeability as an approximation to it. I guess that your definition of confidence interval is based upon IID, so if I do “confidence intervals” based on FEX, it may look wrong.
I hope that I understand you correctly. What I’m suggesting is to allow for and appreciate the assumption of finite exchangeability, and build theory that accommodates it. Until then, it would be unfair to dismiss empirical work assuming FEX in some places just because most theory work assumes IID.
I’ve worked on FEX confidence intervals here. The details change, but not the basic message w.r.t. the IID assumption.
The basic issue we seem to be debating, regardless of assumptions about the world, is whether we should think of the different runs of cross validation as “different” experiments. I know of no reasonable assumption under which the answer is “yes” and many reasonable assumptions under which the answer is “no”. For this conversation to be further constructive, I think you need to (a) state a theorem and (b) argue that it is relevant.
[...] Drug studies. Pharmaceutical companies make predictions about the effects of their drugs and then conduct blind clinical studies to determine their effect. Unfortunately, they have also been caught using some of the more advanced techniques for cheating here: including “reprobleming”, “data set selection”, and probably “overfitting by review”. It isn’t too surprising to observe this: when the testers of a drug have $10^9 or more riding on the outcome the temptation to make the outcome “right” is extreme. [...]
Useful list. Should be made required reading for students of ML.