SQL Aggregate Functions for Strings
In this article, you’ll learn tips for getting started with SQL string functions for data munging in SQL Server. In many cases, machine learning outcomes are only as good as the data they’re built on, but the work of preparing data for analytics (that is, data wrangling) can eat up as much as 80% of a project’s effort.
In this guide, we’ll see the following topics:
We’ll look at specific SQL string function examples including
Data munging (or wrangling) is the process of transforming data into various states so that it is easier to work with and understand. The transformation may involve manually converting, modifying, or merging data into a certain format to generate well-defined streams of data that are ready for consumption by data analysis tools and techniques.
The various data-sources are
You don’t have to work in data science for very long before you discover the importance of SQL. Consider the KDnuggets Software Poll of the top analytics, data mining, and data science software used in 2015, and look at SQL’s place in it. In this survey of data professionals, SQL is placed third in terms of usage, and it is the first database tool on the list. R is right at the top.
Note: The picture above is taken from the following website: www.kdnuggets.com
Let’s now take a deep dive into SQL string functions to see the different phases of data munging.
SQL commands are further classified into Data Manipulation Language (DML) and Data Definition Language (DDL). These commands are used to work with datasets more efficiently. We’ll take a look at DML commands in a later part of the article.
Now, let’s discuss and analyze some of the SQL commands and understand why data scientists need to know them to do their work efficiently. In most cases, the majority of their work is data gathering, data preparation, data cleaning, and data restructuring. Only after the data preparation phase can the scientist move forward with the data analysis. It is often assumed that about 70% to 80% of the time on a data science project is spent on data manipulation; if this is the case, then much of that time is spent working with SQL queries.
Data cleansing is an art in data science. We often collect data from multiple data sources, and many times the same data is stored differently in multiple systems. Let us classify the data munging process into the following categories:
As a basic principle, whenever we start working with a new dataset, it is recommended to spend more time understanding the type and nature of the data.
For example, some datasets may use abbreviations for departments, while other datasets spell out the full name. We need to reformat the data to get it into a consistent format.
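The department case above can be sketched as follows. This is a minimal illustration, not the original article's code: the table variable `@staff`, its columns, and the abbreviation mappings are hypothetical and exist only to show the idea of normalizing mixed abbreviations and full names into one format.

```sql
-- Hypothetical sample data: the same department stored in different ways.
DECLARE @staff TABLE (EmployeeName varchar(30), Department varchar(30));

INSERT INTO @staff (EmployeeName, Department)
VALUES ('Ann', 'HR'),
       ('Bob', 'Human Resources'),
       ('Cy',  ' hr '),
       ('Di',  'IT');

-- Trim stray blanks, compare case-insensitively, and expand known abbreviations.
SELECT EmployeeName,
       CASE UPPER(LTRIM(RTRIM(Department)))
            WHEN 'HR' THEN 'Human Resources'
            WHEN 'IT' THEN 'Information Technology'
            ELSE LTRIM(RTRIM(Department))   -- already spelled out: keep as-is
       END AS DepartmentName
FROM @staff;
```

In a real project the mapping would more likely live in a lookup table than in a CASE expression, so that new abbreviations can be added without changing the query.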
Function | Description | Example |
ASCII | The ASCII() SQL string function returns the ASCII code value of the leftmost character of a character expression. | In the following example, the ASCII values are returned for the given input. |
CHAR | The CHAR() SQL string function converts an int ASCII code to a character value. Use CHAR to insert control characters into character strings. | Commonly used control characters are tab (CHAR(9)), line feed (CHAR(10)), and carriage return (CHAR(13)). |
UPPER | The UPPER() SQL string function converts the lowercase characters of the given string to uppercase. | Convert the text ‘SQL Server 2017’ to uppercase. |
LOWER | The LOWER() SQL string function converts the uppercase characters of the given string to lowercase. | Convert the text ‘SQL Server 2017’ to lowercase. |
CONCAT | The CONCAT() SQL string function concatenates two or more strings. | Concatenate the strings ‘SQL Shack’ and ‘2018’ into ‘SQL Shack 2018’. |
DISTINCT | The DISTINCT keyword eliminates duplicate records from the SQL results. | |
TRIM | The TRIM() SQL string function removes blanks from the leading and trailing positions of the given string. | The following example removes the spaces before and after the text ‘SQL Server 2017’. |
LTRIM | The LTRIM() SQL string function removes the leading blanks from a given string. | The following example removes the leading spaces from the text ‘SQL Server 2017’. |
RTRIM | The RTRIM() SQL string function removes the trailing blanks from a given string. | The following example removes the trailing spaces from the text ‘SQL Server 2017’. |
RIGHT | The RIGHT() SQL string function returns a specified number of characters from the right side of the given string. | The following example returns the 4 rightmost characters of the text ‘SQL Server 2017’. |
LEFT | The LEFT() SQL string function returns a specified number of characters from the left side of the given string. | The following example returns the 10 leftmost characters of the text ‘SQL Server 2017’. |
REPLACE | The REPLACE() SQL string function replaces all occurrences of a source string within a given string with a target string. | Replace the string ‘vNext’ with ‘2018’. |
REPLICATE | The REPLICATE() SQL string function repeats the given string a specified number of times. | In the following example, the string ‘SQL Shack Author’ is repeated 5 times. |
SPACE | The SPACE() SQL string function returns a string made up of the specified number of blanks. | The following example concatenates ‘SQL Shack’, two spaces, and the word ‘2018’. |
TRANSLATE | The TRANSLATE() SQL string function performs a one-to-one, single-character substitution on a given string. | In the following example, the string replacement is performed using the TRANSLATE function. |
REVERSE | The REVERSE() SQL string function returns a mirror image of the given string. | The following example returns the word with its characters reversed. |
FORMAT | The FORMAT() SQL string function returns a value formatted as specified. | The FORMAT SQL string function was introduced in SQL Server 2012. It returns the value formatted with the specified format and an optional culture. Examples cover date and time formats. |
CONCAT_WS | The CONCAT_WS() SQL string function (concatenate with separator) is a special form of CONCAT(). | CONCAT_WS() emulates the behavior of the STUFF and COALESCE functions. In the following example, ‘-’ is the delimiter specified in the first argument, followed by the first name, middle name, and last name. The output concatenates three columns from the Person table, separating the values with a ‘-’. |
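The examples described in the table can be sketched with literal values as shown below. These are illustrative queries rather than the article's original samples. The names ‘Ken’, ‘J’, and ‘Sanchez’ are stand-ins for columns of the Person table, and the input to TRANSLATE is an assumed example. TRIM, TRANSLATE, and CONCAT_WS require SQL Server 2017 or later.

```sql
-- Character codes
SELECT ASCII('A') AS AsciiOfA,          -- 65
       CHAR(65)   AS CharOf65;          -- A

-- Case conversion
SELECT UPPER('SQL Server 2017') AS UpperCased,   -- SQL SERVER 2017
       LOWER('SQL Server 2017') AS LowerCased;   -- sql server 2017

-- Concatenation, with and without a separator
SELECT CONCAT('SQL Shack', ' ', '2018')       AS Joined,         -- SQL Shack 2018
       'SQL Shack' + SPACE(2) + '2018'        AS WithTwoSpaces,  -- two blanks in between
       CONCAT_WS('-', 'Ken', 'J', 'Sanchez')  AS JoinedWithSep;  -- Ken-J-Sanchez

-- Trimming blanks
SELECT TRIM('   SQL Server 2017   ') AS Trimmed,
       LTRIM('   SQL Server 2017')   AS LeftTrimmed,
       RTRIM('SQL Server 2017   ')   AS RightTrimmed;

-- Taking characters from either end
SELECT RIGHT('SQL Server 2017', 4) AS Rightmost4,    -- 2017
       LEFT('SQL Server 2017', 10) AS Leftmost10;    -- SQL Server

-- Replacing, repeating, translating, and reversing
SELECT REPLACE('SQL Server vNext', 'vNext', '2018')     AS Replaced,    -- SQL Server 2018
       REPLICATE('SQL Shack Author ', 5)                AS Repeated,
       TRANSLATE('2*[3+4]/{7-2}', '[]{}', '()()')       AS Translated,  -- 2*(3+4)/(7-2)
       REVERSE('SQL Shack')                             AS Reversed;    -- kcahS LQS

-- Formatting a date with an explicit format string and culture
SELECT FORMAT(CAST('20180315' AS date), 'dd/MM/yyyy', 'en-US') AS FormattedDate;  -- 15/03/2018
```

REPLACE substitutes whole substrings, while TRANSLATE maps individual characters position by position, so the second and third arguments of TRANSLATE must have the same length.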
In addition to matching and reformatting strings, we sometimes need to take them apart and extract pieces of strings. SQL Server provides some general-purpose SQL string functions for extracting and overwriting strings. Let’s start with a simple string that’s easy to experiment with.
Function | Description | Usage |
LEN | The LEN() SQL string function returns the length of the given string, excluding trailing blanks. | The following example returns the number of characters, excluding the trailing spaces. |
DATALENGTH | The LEN() SQL string function excludes the trailing blanks in a given string. If this is a problem, use the DATALENGTH() SQL string function, which returns the number of bytes used by the expression and therefore includes the trailing blanks. | In this example, the trailing blanks are also counted when evaluating the string length. |
CHARINDEX | The CHARINDEX() SQL string function returns the location of a substring in a given string. | In the following example, the starting position of ‘Shack’ within the given expression is returned. |
PATINDEX | The PATINDEX() SQL string function returns the starting position of the first occurrence of a given pattern in a specified expression. | In the following examples, the ‘%’ and ‘_’ wildcard characters are used to find the position of the pattern in a given expression. PATINDEX works just like the LIKE operator, but it returns the matching position. |
SUBSTRING | The SUBSTRING() SQL string function returns a specific portion of a given string. | The following query returns only part of the input character string. |
STUFF | The STUFF() SQL string function places a string within another string. It deletes a specified number of characters at the start position and inserts the second string there. | The following example returns a character string created by inserting the word ‘Demo’ at starting position 5 without deleting any letters from the given string ‘SQL Shack’. |
STRING_AGG | The STRING_AGG() SQL string function is an aggregate function that computes a single result from a set of input values by concatenating them with a separator. | The following example returns the names separated by ‘-’ in a single result. |
STRING_SPLIT | The STRING_SPLIT() SQL string function splits the input string by a specified separator character and returns the split values in the form of a table. | SQL Server 2016 introduced the new STRING_SPLIT table-valued function. In earlier versions, we had to write a user-defined function or CLR code to split the values. |
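The table's usage notes can be tried out with a short script like the one below. It is a sketch under stated assumptions: the variable `@s`, the table variable `@names`, and the sample names are hypothetical. STRING_AGG needs SQL Server 2017 or later, and STRING_SPLIT needs SQL Server 2016 or later with database compatibility level 130 or higher.

```sql
-- LEN ignores trailing blanks; DATALENGTH counts the bytes actually stored.
DECLARE @s varchar(50) = 'SQL Shack   ';      -- three trailing spaces

SELECT LEN(@s)        AS LenValue,            -- 9
       DATALENGTH(@s) AS DataLengthValue;     -- 12 (one byte per character for varchar here)

-- Locating and extracting pieces of a string
SELECT CHARINDEX('Shack', 'SQL Shack')    AS CharIndexPos,    -- 5
       PATINDEX('%Sh_ck%', 'SQL Shack')   AS PatIndexPos,     -- 5
       SUBSTRING('SQL Shack', 5, 5)       AS SubstringValue,  -- Shack
       STUFF('SQL Shack', 5, 0, 'Demo ')  AS Stuffed;         -- SQL Demo Shack

-- Aggregating many rows into one delimited string
DECLARE @names TABLE (FirstName varchar(20));
INSERT INTO @names (FirstName) VALUES ('Kim'), ('Tim'), ('Jim');

SELECT STRING_AGG(FirstName, '-') WITHIN GROUP (ORDER BY FirstName) AS Names
FROM @names;

-- Splitting a delimited string back into rows (one row per value)
SELECT value
FROM STRING_SPLIT('Kim,Tim,Jim', ',');
```

Passing 0 as the third argument of STUFF is what makes it a pure insert; a positive length would delete that many characters before inserting. For nvarchar data, DATALENGTH typically returns two bytes per character, so it is not a drop-in replacement for a character count.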
Sometimes we need to reformat numbers. This is especially true when calculations produce results with a large number of decimal digits.
Function | Description | Usage |
CONVERT | The CONVERT() SQL string function converts an expression from one data type to another. It also accepts an optional style parameter, which is used for formatting the SQL output. | Implicit conversions do not require either the CAST function or the CONVERT function. Only explicit conversions require the CAST or CONVERT function to be specified. When converting a value from float or numeric to an integer, the CONVERT() SQL string function truncates the result. For other conversions, the CONVERT() SQL string function rounds the result. |
CAST | The CAST() SQL string function converts an expression from one data type to another. | Convert an expression from a valid date string to DateTime. |
STR | The STR() SQL string function converts a numeric value into a string. | The following example converts an integer to a character string and concatenates the value with the first string. |
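A few literal-based queries illustrate the conversion behavior described in the table. The values are assumed examples, not taken from the original article. The date string is written in the unseparated `yyyymmdd` form so that the result does not depend on the session's language or date format settings.

```sql
-- Numeric to integer truncates; numeric to a shorter numeric rounds.
SELECT CONVERT(int, 25.65)            AS TruncatedToInt,   -- 25
       CONVERT(decimal(5, 1), 25.65)  AS RoundedDecimal;   -- 25.7

-- The style argument of CONVERT controls how dates are rendered as text.
SELECT CONVERT(varchar(10), CAST('20180315' AS date), 101) AS UsStyle,       -- 03/15/2018
       CONVERT(varchar(10), CAST('20180315' AS date), 103) AS BritishStyle;  -- 15/03/2018

-- CAST from a valid date string to datetime
SELECT CAST('20180315' AS datetime) AS CastToDateTime;

-- STR turns a number into a string so it can be concatenated.
SELECT 'SQL Server ' + STR(2017, 4)   AS Concatenated,     -- SQL Server 2017
       LTRIM(STR(123.456, 7, 2))      AS TwoDecimals;      -- 123.46
```

STR right-justifies its result within the requested length (10 by default), which is why the second call is wrapped in LTRIM and the first passes an explicit length of 4.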
The WHERE and HAVING clauses in a SELECT statement control which subset of the source tables appears in the output. WHERE filters individual rows before grouping, while HAVING filters groups after aggregation.
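The difference between the two clauses is easiest to see when both appear in one query. The sketch below uses a hypothetical table variable `@sales` with made-up regions and amounts.

```sql
-- Hypothetical sample data
DECLARE @sales TABLE (Region varchar(10), Amount int);

INSERT INTO @sales (Region, Amount)
VALUES ('East', 100), ('East', 250),
       ('West', 50),  ('West', 30),
       ('North', 400);

SELECT Region,
       SUM(Amount) AS Total
FROM @sales
WHERE Amount >= 50             -- row filter: the West row with 30 is dropped before grouping
GROUP BY Region
HAVING SUM(Amount) > 100;      -- group filter: keeps East (350) and North (400), drops West (50)
```

A condition on an aggregate such as SUM(Amount) cannot go in the WHERE clause, because the aggregate does not exist yet when rows are being filtered.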
Regular expression search is based on patterns, wildcards, and special characters. Even if you never write CLR code, regex (as it’s also known) can be useful to you right now. Open a new query window in SQL Server Management Studio.
The regular expression transformation exposes the power of regular expression matching within the pipeline. One or more columns can be selected, and an individual expression can be applied to each column. The way multiple columns are handled can be set on the options page: the AND option means all columns must match, while the OR option means only one column has to match. Rows that pass their tests are sent to the successful match output, and rows that fail are directed to the alternate output.
Wildcard | Description | Usage |
% | Matches a string of zero or more characters. | The following example returns any values that match the string ‘Kim’. |
Underscore (_) | Matches a single character of the given string. | The following example returns any values whose first character is unknown, followed by ‘im’. |
[…] | Matches any character within a specified range. | The following example returns the values whose first character is in the range A to S, followed by ‘im’. |
[^…] | Matches any character not within a specified range. | The following example returns the values whose first character is unknown, whose second and third characters are ‘im’, and whose fourth character is not within the l to Z range. |
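The four wildcards can be tried against a small set of names. The table variable `@people`, the sample names, and the exact patterns are assumptions made for illustration. Results of the range patterns depend on the collation in use, so no output is shown for them.

```sql
-- Hypothetical sample data
DECLARE @people TABLE (FirstName varchar(20));

INSERT INTO @people (FirstName)
VALUES ('Kim'), ('Tim'), ('Kimberly'), ('Jim'), ('Zim'), ('Sam');

-- % : zero or more characters after 'Kim'
SELECT FirstName FROM @people WHERE FirstName LIKE 'Kim%';      -- Kim, Kimberly

-- _ : exactly one unknown character followed by 'im'
SELECT FirstName FROM @people WHERE FirstName LIKE '_im';       -- Kim, Tim, Jim, Zim

-- [...] : first character within the range A to S, followed by 'im'
SELECT FirstName FROM @people WHERE FirstName LIKE '[A-S]im';

-- [^...] : any first character, then 'im', then a fourth character outside the range l to z
SELECT FirstName FROM @people WHERE FirstName LIKE '_im[^l-z]%';
```

Note that `[^…]` still requires a character to be present at that position, so a three-letter name such as ‘Kim’ cannot match the last pattern.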
This article showcases the SQL string functions available for manipulating raw data into a more meaningful dataset, so that the data scientist can perform data analysis using SQL Server. Hopefully these SQL string functions can save you both time and money!
Translated from: https://www.sqlshack.com/sql-string-functions-for-data-munging-wrangling/