Flink Review 3-2-4-6-1 (v1.17.0): Application Development - DataStream API - State and Fault Tolerance - Data Types & Serialization - Overview

Data Types & Serialization

Supported Data Types（支持的数据类型）
- Tuples and Case Classes
- POJOs
- Primitive Types（基本数据类型）
- General Class Types（一般类型）
- Values
- Hadoop Writables
- Special Types（特殊类型）
- Type Erasure & Type Inference（类型擦除和类型推断）
Type handling in Flink（Flink中的类型处理）
Most Frequent Issues（最常见问题）
Flink’s TypeInformation class（Flink的TypeInformation类）
- Rules for POJO types（POJO类型的规则）
- Creating a TypeInformation or TypeSerializer（创建TypeInformation或TypeSerializer）
Type Information in the Scala API（Scala API中的类型信息）
- No Implicit Value for Evidence Parameter Error（证据参数错误没有隐式值）
- Generic Methods（泛型方法）
Type Information in the Java API（Java API中的类型信息）
- Type Hints in the Java API（Java API中的类型提示）
- Type extraction for Java 8 lambdas（Java 8 lambda的类型提取）
- Serialization of POJO types（POJO类型的序列化）
Disabling Kryo Fallback（禁用Kryo Fallback）
Defining Type Information using a Factory（使用Factory定义类型信息）

Apache Flink handles data types and serialization in a unique way, containing its own type descriptors, generic type extraction, and type serialization framework. This document describes the concepts and the rationale behind them.

Apache Flink以独特的方式处理数据类型和序列化，包含自己的类型描述符、泛型类型提取和类型序列化框架。本文档描述了这些概念及其背后的基本原理。

Supported Data Types（支持的数据类型）

Flink places some restrictions on the type of elements that can be in a DataStream. The reason for this is that the system analyzes the types to determine efficient execution strategies.
Flink对数据流中的元素类型进行了一些限制。这方便系统分析类型来确定高效的执行策略。

There are seven different categories of data types:
有七种不同的数据类型：

Java Tuples and Scala Case Classes
Java POJOs
Primitive Types
Regular Classes
Values
Hadoop Writables
Special Types

Tuples and Case Classes

Tuples are composite types that contain a fixed number of fields with various types. The Java API provides classes from Tuple1 up to Tuple25. Every field of a tuple can be an arbitrary Flink type including further tuples, resulting in nested tuples. Fields of a tuple can be accessed directly using the field’s name as tuple.f4, or using the generic getter method tuple.getField(int position). The field indices start at 0. Note that this stands in contrast to the Scala tuples, but it is more consistent with Java’s general indexing.
元组是包含固定数量的具有各种类型的字段的复合类型。Java API提供从Tuple1到Tuple25的类。元组的每个字段都可以是任意的Flink类型，包括更多的元组，从而产生嵌套的元组。元组的字段可以使用字段名称tuple.f4直接访问，也可以使用通用getter方法tuple.getField(int position)访问。字段索引从0开始。请注意，这与Scala元组相反，但它与Java的通用索引更一致。

DataStream<Tuple2<String, Integer>> wordCounts = env.fromElements(
    new Tuple2<String, Integer>("hello", 1),
    new Tuple2<String, Integer>("world", 2));

wordCounts.map(new MapFunction<Tuple2<String, Integer>, Integer>() {
    @Override
    public Integer map(Tuple2<String, Integer> value) throws Exception {
        return value.f1;
    }
});

wordCounts.keyBy(value -> value.f0);

POJOs

Java and Scala classes are treated by Flink as a special POJO data type if they fulfill the following requirements:
如果Java和Scala类满足以下要求，Flink会将它们视为一种特殊的POJO数据类型：

The class must be public.
类必须是公共的。
It must have a public constructor without arguments (default constructor).
它必须有一个没有参数的公共构造函数(默认构造函数)。
All fields are either public or must be accessible through getter and setter functions. For a field called foo the getter and setter methods must be named getFoo() and setFoo().
所有字段要么是公共的，要么必须通过getter和setter函数访问。对于名为foo的字段，getter和setter方法必须命名为getFoo()和setFoo()。
The type of a field must be supported by a registered serializer.
已注册的序列化程序必须支持字段的类型。

POJOs are generally represented with a PojoTypeInfo and serialized with the PojoSerializer (using Kryo as configurable fallback). The exception is when the POJOs are actually Avro types (Avro Specific Records) or produced as “Avro Reflect Types”. In that case the POJOs are represented by an AvroTypeInfo and serialized with the AvroSerializer. You can also register your own custom serializer if required; see Serialization for further information.
POJO通常用PojoTypeInfo表示，并用PojoSerializer序列化（使用Kryo作为可配置的回退）。例外情况是POJO实际上是Avro类型（Avro特定记录）或作为“Avro反射类型”生成。在这种情况下，POJO由AvroTypeInfo表示，并使用AvroSerializer进行序列化。如果需要，您还可以注册自己的自定义序列化程序；有关详细信息，请参阅序列化。

Flink analyzes the structure of POJO types, i.e., it learns about the fields of a POJO. As a result POJO types are easier to use than general types. Moreover, Flink can process POJOs more efficiently than general types.
Flink分析POJO类型的结构，即它了解POJO的字段。因此，POJO类型比一般类型更易于使用。此外，Flink相较于一般类型能更高效地处理POJO。

You can test whether your class adheres to the POJO requirements via org.apache.flink.types.PojoTestUtils#assertSerializedAsPojo() from the flink-test-utils. If you additionally want to ensure that no field of the POJO will be serialized with Kryo, use assertSerializedAsPojoWithoutKryo() instead.
您可以通过flink-test-utils中的org.apache.flink.types.PojoTestUtils#assertSerializedAsPojo()来测试您的类是否符合POJO要求。如果您还想确保POJO的任何字段都不会使用Kryo进行序列化，请改用assertSerializedAsPojoWithoutKryo()来代替。

The following example shows a simple POJO with two public fields.
下面的示例展示了一个具有两个公共字段的简单POJO。

public class WordWithCount {

    public String word;
    public int count;

    public WordWithCount() {}

    public WordWithCount(String word, int count) {
        this.word = word;
        this.count = count;
    }
}

DataStream<WordWithCount> wordCounts = env.fromElements(
    new WordWithCount("hello", 1),
    new WordWithCount("world", 2));

wordCounts.keyBy(value -> value.word);

Primitive Types（基本数据类型）

Flink supports all Java and Scala primitive types such as Integer, String, and Double.
Flink支持所有Java和Scala基本类型，如Integer, String和Double。

General Class Types（一般类型）

Flink supports most Java and Scala classes (API and custom). Restrictions apply to classes containing fields that cannot be serialized, like file pointers, I/O streams, or other native resources. Classes that follow the Java Beans conventions work well in general.
Flink支持大多数Java和Scala类（API和自定义）。限制包含无法序列化的字段的类，如文件指针、I/O流或其他本机资源。遵循Java bean约定的类通常工作得很好。

All classes that are not identified as POJO types (see POJO requirements above) are handled by Flink as general class types. Flink treats these data types as black boxes and is not able to access their content (e.g., for efficient sorting). General types are de/serialized using the serialization framework Kryo.
Flink将所有未标识为POJO类型的类（请参阅上面的POJO要求）作为一般类型进行处理。Flink将这些数据类型视为黑盒，无法访问其内容（例如，为了实现高效排序）。一般类型使用序列化框架Kryo进行序列化和反序列化。

Values

Value types describe their serialization and deserialization manually. Instead of going through a general purpose serialization framework, they provide custom code for those operations by means of implementing the org.apache.flink.types.Value interface with the methods read and write. Using a Value type is reasonable when general purpose serialization would be highly inefficient. An example would be a data type that implements a sparse vector of elements as an array. Knowing that the array is mostly zero, one can use a special encoding for the non-zero elements, while the general purpose serialization would simply write all array elements.
值类型手动描述它们的序列化和反序列化。它们没有经过通用的序列化框架，而是通过实现带有read和write方法的org.apache.flink.types.Value接口，为这些操作提供自定义代码。当通用序列化效率极低时，使用Value类型是合理的。一个例子是将元素的稀疏向量实现为数组的数据类型。知道数组大部分为零后，可以对非零元素使用特殊编码，而通用序列化将简单地写入所有数组元素。

The org.apache.flink.types.CopyableValue interface supports manual internal cloning logic in a similar way.
org.apache.flink.types.CopyableValue接口以类似的方式支持手动内部克隆逻辑。

Flink comes with pre-defined Value types that correspond to basic data types (ByteValue, ShortValue, IntValue, LongValue, FloatValue, DoubleValue, StringValue, CharValue, BooleanValue). These Value types act as mutable variants of the basic data types: their value can be altered, allowing programmers to reuse objects and take pressure off the garbage collector.
Flink附带了与基本数据类型相对应的预定义值类型。（ByteValue、ShortValue、IntValue、LongValue、FloatValue、DoubleValue、StringValue、CharValue、BooleanValue）。这些值类型充当基本数据类型的可变变体：它们的值可以更改，允许程序员重用对象并减轻垃圾收集器的压力。
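The sparse-vector case described above can be sketched in plain Java. This is only an illustration of the idea, not Flink code: the class name SparseVector and its helper methods are hypothetical, and java.io.DataOutput and DataInput stand in for the views that a real org.apache.flink.types.Value implementation would receive in write and read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical sparse vector with hand-written serialization (plain java.io, not Flink's Value).
final class SparseVector {
    double[] values;

    SparseVector() {
        this.values = new double[0];
    }

    SparseVector(double[] values) {
        this.values = values;
    }

    // Writes the length, the number of non-zero entries, then only (index, value) pairs.
    void write(DataOutput out) throws IOException {
        int nonZero = 0;
        for (double v : values) {
            if (v != 0.0) {
                nonZero++;
            }
        }
        out.writeInt(values.length);
        out.writeInt(nonZero);
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                out.writeInt(i);
                out.writeDouble(values[i]);
            }
        }
    }

    // Rebuilds the dense array; positions that were not written stay zero.
    void read(DataInput in) throws IOException {
        values = new double[in.readInt()];
        int nonZero = in.readInt();
        for (int k = 0; k < nonZero; k++) {
            int index = in.readInt();
            values[index] = in.readDouble();
        }
    }

    // Convenience helper for the sketch: serialize to a byte array.
    static byte[] toBytes(SparseVector vector) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            vector.write(new DataOutputStream(buffer));
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience helper for the sketch: deserialize from a byte array.
    static SparseVector fromBytes(byte[] bytes) {
        try {
            SparseVector vector = new SparseVector();
            vector.read(new DataInputStream(new ByteArrayInputStream(bytes)));
            return vector;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this encoding, a vector of 1000 doubles holding two non-zero entries takes 32 bytes (two int headers plus two index/value pairs), whereas writing every element would take 8000 bytes.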

Hadoop Writables

You can use types that implement the org.apache.hadoop.io.Writable interface. The serialization logic defined in the write() and readFields() methods will be used for serialization.
可以使用实现org.apache.hadoop.io.Writable接口的类型。write()和readFields()方法中定义的序列化逻辑将用于序列化。

Special Types（特殊类型）

You can use special types, including Scala’s Either, Option, and Try. The Java API has its own custom implementation of Either. Similarly to Scala’s Either, it represents a value of two possible types, Left or Right. Either can be useful for error handling or operators that need to output two different types of records.
可以使用特殊类型，包括Scala的Either、Option和Try。Java API有自己的Either自定义实现。类似于Scala的Either，它表示两种可能类型的值，Left或Right。Either可用于错误处理或需要输出两种不同类型记录的操作符。

Type Erasure & Type Inference（类型擦除和类型推断）

Note: This Section is only relevant for Java.
注：本节仅与Java相关。

The Java compiler throws away much of the generic type information after compilation. This is known as type erasure in Java. It means that at runtime, an instance of an object does not know its generic type any more. For example, instances of DataStream<String> and DataStream<Long> look the same to the JVM.
Java编译器在编译后丢弃了许多泛型类型信息。这在Java中被称为类型擦除。这意味着在运行时，对象的实例不再知道其泛型类型。例如，DataStream<String>和DataStream<Long>的实例在JVM中看起来是相同的。
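Type erasure can be observed with ordinary collections. The following minimal sketch uses java.util.List instead of DataStream so that it runs without Flink; the class name ErasureDemo is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of type erasure; no Flink classes are involved.
final class ErasureDemo {

    // Returns true when two differently parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Long> longs = new ArrayList<>();
        return strings.getClass() == longs.getClass();
    }

    public static void main(String[] args) {
        System.out.println(sameRuntimeClass()); // prints "true"
    }
}
```

Both lists are plain ArrayList instances at runtime, which is why a framework that needs the element type has to obtain it some other way.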

Flink requires type information at the time when it prepares the program for execution (when the main method of the program is called). The Flink Java API tries to reconstruct the type information that was thrown away in various ways and store it explicitly in the data sets and operators. You can retrieve the type via DataStream.getType(). The method returns an instance of TypeInformation, which is Flink’s internal way of representing types.
Flink在准备程序执行时(当程序的主方法被调用时)需要类型信息。Flink Java API试图重建以各种方式丢弃的类型信息，并将其显式地存储在数据集和操作符中。可以通过DataStream.getType()检索类型。该方法返回TypeInformation的实例，TypeInformation是Flink表示类型的内部方式。

The type inference has its limits and needs the “cooperation” of the programmer in some cases. Examples for that are methods that create data sets from collections, such as StreamExecutionEnvironment.fromCollection(), where you can pass an argument that describes the type. But also generic functions like MapFunction may need extra type information.
类型推断有其局限性，在某些情况下需要程序员的“配合”。例如从集合中创建数据集的方法，例如StreamExecutionEnvironment.fromCollection()，可以在其中传递描述类型的参数。但是像MapFunction这样的泛型函数可能需要额外的类型信息。

The ResultTypeQueryable interface can be implemented by input formats and functions to tell the API explicitly about their return type. The input types that the functions are invoked with can usually be inferred by the result types of the previous operations.
ResultTypeQueryable接口可以通过输入格式和函数来实现，以显式地告诉API它们的返回类型。调用函数时使用的输入类型通常可以通过前面操作的结果类型推断出来。

Type handling in Flink（Flink中的类型处理）

Flink tries to infer a lot of information about the data types that are exchanged and stored during the distributed computation. Think about it like a database that infers the schema of tables. In most cases, Flink infers all necessary information seamlessly by itself. Having the type information allows Flink to do some cool things:
Flink试图推断出许多关于在分布式计算过程中交换和存储的数据类型的信息。把它想象成一个推断表模式的数据库。在大多数情况下，Flink会自己无缝地推断出所有必要的信息。有了类型信息，Flink可以做一些很酷的事情：

The more Flink knows about data types, the better the serialization and data layout schemes are. That is quite important for the memory usage paradigm in Flink (work on serialized data inside/outside the heap where ever possible and make serialization very cheap).
Flink对数据类型了解得越多，序列化和数据布局方案就越好。这对于Flink中的内存使用模式非常重要（尽可能在堆内/堆外处理序列化数据，使序列化变得非常便宜）。
Finally, it also spares users in the majority of cases from worrying about serialization frameworks and having to register types.
最后，在大多数情况下，它还使用户不必担心序列化框架和必须注册类型。

In general, the information about data types is needed during the pre-flight phase - that is, when the program’s calls on DataStream are made, and before any call to execute(), print(), count(), or collect().
一般来说，有关数据类型的信息是在预运行阶段（有时会被翻译成飞行阶段）需要的 —— 也就是说，当程序对DataStream进行调用时，以及在调用execute()、print()、count()或collect()之前。

Most Frequent Issues（最常见问题）

The most frequent issues where users need to interact with Flink’s data type handling are:
用户需要与Flink的数据类型处理进行交互的最常见问题是:

Registering subtypes: If the function signatures describe only the supertypes, but they actually use subtypes of those during execution, it may increase performance a lot to make Flink aware of these subtypes. For that, call .registerType(clazz) on the StreamExecutionEnvironment for each subtype.
注册子类型：如果函数签名只描述超类型，但它们在执行过程中实际使用了超类型的子类型，那么让Flink知道这些子类型可以大大提高性能。为此，请在StreamExecutionEnvironment上为每个子类型调用.registerType(clazz)。
Registering custom serializers: Flink falls back to Kryo for the types that it does not handle transparently by itself. Not all types are seamlessly handled by Kryo (and thus by Flink). For example, many Google Guava collection types do not work well by default. The solution is to register additional serializers for the types that cause problems. Call .getConfig().addDefaultKryoSerializer(clazz, serializer) on the StreamExecutionEnvironment. Additional Kryo serializers are available in many libraries. See 3rd party serializer for more details on working with external serializers.
注册自定义序列化器：Flink因其自身无法透明处理的类型而求助于Kryo。但并非所有类型都可以由Kryo无缝处理（Flink也是如此）。例如，默认情况下，许多Google Guava集合类型不能很好地工作。解决方案是为导致问题的类型注册额外的序列化程序。在StreamExecutionEnvironment上调用.getConfig().addDefaultKryoSerializer(clazz, serializer)。许多库中提供了额外的Kryo序列化程序。有关使用外部序列化程序的更多详细信息，请参阅第三方序列化程序。
Adding Type Hints: Sometimes, when Flink cannot infer the generic types despite all tricks, a user must pass a type hint. That is generally only necessary in the Java API. The Type Hints Section describes that in more detail.
添加类型提示：有时，当Flink无法推断出泛型类型时，用户必须传递类型提示。这通常只在Java API中是必需的。类型提示部分对此进行了更详细的描述。
Manually creating a TypeInformation: This may be necessary for some API calls where it is not possible for Flink to infer the data types due to Java’s generic type erasure. See Creating a TypeInformation or TypeSerializer for details.
手动创建TypeInformation：对于某些API调用来说，这可能是必需的，因为Java的泛型类型擦除导致Flink无法推断数据类型。有关详细信息，请参阅创建TypeInformation或TypeSerializer。

Flink’s TypeInformation class（Flink的TypeInformation类）

The class TypeInformation is the base class for all type descriptors. It reveals some basic properties of the type and can generate serializers and, in specializations, comparators for the types. (Note that comparators in Flink do much more than defining an order - they are basically the utility to handle keys)
类TypeInformation是所有类型描述符的基类。它展示了类型的一些基本属性，并可以为类型生成序列化程序，在特殊化中，还可以生成比较器。（请注意，Flink中的比较器所做的远不止定义顺序 —— 它们大体上是处理keys的实用程序）

Internally, Flink makes the following distinctions between types:
在内部，Flink对类型进行了以下区分：

Basic types: All Java primitives and their boxed form, plus void, String, Date, BigDecimal, and BigInteger.
基本类型：所有Java基元及其装箱形式，加上void、String、Date、BigDecimal和BigInteger。
Primitive arrays and Object arrays
基本数组和对象数组
Composite types
复合类型
- Flink Java Tuples (part of the Flink Java API): max 25 fields, null fields not supported
  Flink Java元组(Flink Java API的一部分)：最多25个字段，不支持空字段
- Scala case classes (including Scala tuples): null fields not supported
  Scala case类(包括Scala元组)：不支持空字段
- Row: tuples with arbitrary number of fields and support for null fields
  Row：具有任意数量字段的元组，并支持空字段
- POJOs: classes that follow a certain bean-like pattern
  POJO：遵循某种类似bean模式的类
Auxiliary types (Option, Either, Lists, Maps, …)
辅助类型(Option, Either, Lists, Maps，…)
Generic types: These will not be serialized by Flink itself, but by Kryo.
泛型类型：Flink本身不会序列化这些类型，而是由Kryo序列化。

POJOs are of particular interest, because they support the creation of complex types. They are also transparent to the runtime and can be handled very efficiently by Flink.
POJO特别有趣，因为它们支持创建复杂类型。它们对运行时也是透明的，Flink可以非常有效地处理它们。

Rules for POJO types（POJO类型的规则）

Flink recognizes a data type as a POJO type (and allows “by-name” field referencing) if the following conditions are fulfilled:
如果满足以下条件，Flink将数据类型识别为POJO类型（并允许“by-name”字段引用）：

The class is public and standalone (no non-static inner class)
类是公共且独立的(没有非静态内部类)
The class has a public no-argument constructor
该类有一个公共的无参数构造函数
All non-static, non-transient fields in the class (and all superclasses) are either public (and non-final) or have public getter and setter methods that follow the Java beans naming conventions for getters and setters.
类（和所有超类）中的所有非静态、非瞬态字段要么是公共的（也是非最终的），要么具有公共的getter和setter方法，这些方法遵循getter和setter的Java bean命名约定。

Note that when a user-defined data type can’t be recognized as a POJO type, it must be processed as GenericType and serialized with Kryo.
请注意，当用户定义的数据类型无法识别为POJO类型时，必须将其处理为GenericType并使用Kryo进行序列化。
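The three rules above can be approximated with a few reflection calls. The checker below is a simplified, hypothetical sketch (PojoRules and its example classes are made-up names); it is not Flink's type extractor and ignores details such as boolean isFoo() getters. For real projects, PojoTestUtils from flink-test-utils, mentioned earlier, is the tool to use.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Simplified, hypothetical checker for the three rules above; this is not Flink's TypeExtractor.
final class PojoRules {

    static boolean looksLikePojo(Class<?> type) {
        int mods = type.getModifiers();
        // Rule 1: public and standalone (a nested class must be static).
        if (!Modifier.isPublic(mods) || (type.isMemberClass() && !Modifier.isStatic(mods))) {
            return false;
        }
        // Rule 2: public no-argument constructor (getConstructor only sees public ones).
        try {
            type.getConstructor();
        } catch (NoSuchMethodException e) {
            return false;
        }
        // Rule 3: every non-static, non-transient field, including inherited ones,
        // is public and non-final or has a public getter and setter.
        for (Class<?> c = type; c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field field : c.getDeclaredFields()) {
                int fm = field.getModifiers();
                if (Modifier.isStatic(fm) || Modifier.isTransient(fm) || field.isSynthetic()) {
                    continue;
                }
                boolean publicNonFinal = Modifier.isPublic(fm) && !Modifier.isFinal(fm);
                if (!publicNonFinal && !hasAccessors(type, field)) {
                    return false;
                }
            }
        }
        return true;
    }

    // Looks for public getFoo()/setFoo(..) for a field named foo (boolean isFoo() is ignored here).
    private static boolean hasAccessors(Class<?> type, Field field) {
        String name = field.getName();
        String suffix = Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            boolean getterMatches = type.getMethod("get" + suffix).getReturnType() == field.getType();
            type.getMethod("set" + suffix, field.getType());
            return getterMatches;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    // Example classes (hypothetical) used to exercise the checker.
    public static class Good {
        public String word;
        private int count;

        public Good() {}

        public int getCount() { return count; }

        public void setCount(int count) { this.count = count; }
    }

    public static class NoDefaultConstructor {
        public String word;

        public NoDefaultConstructor(String word) { this.word = word; }
    }

    public static class HiddenField {
        private int secret;

        public HiddenField() {}
    }
}
```

Here Good satisfies all three rules, NoDefaultConstructor fails the second rule, and HiddenField fails the third because its private field has no accessors.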

Creating a TypeInformation or TypeSerializer（创建TypeInformation或TypeSerializer）

To create a TypeInformation object for a type, use the language specific way:
要为类型创建TypeInformation对象，请使用特定于语言的方式:

Because Java generally erases generic type information, you need to pass the type to the TypeInformation construction:
因为Java通常会擦除泛型类型信息，所以需要将类型传递给TypeInformation构造:

For non-generic types, you can pass the Class:
对于非泛型类型，可以传递Class:

TypeInformation<String> info = TypeInformation.of(String.class);

For generic types, you need to “capture” the generic type information via the TypeHint:
对于泛型类型，你需要通过TypeHint“捕获”泛型类型信息:

TypeInformation<Tuple2<String, Double>> info = TypeInformation.of(new TypeHint<Tuple2<String, Double>>(){});

Internally, this creates an anonymous subclass of the TypeHint that captures the generic information to preserve it until runtime.
在内部，这将创建TypeHint的一个匿名子类，该子类捕获泛型信息并将其保存到运行时。
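The anonymous-subclass trick is a general Java idiom and can be reproduced without Flink. The sketch below uses a hypothetical class TypeCapture as a stand-in for TypeHint; it only demonstrates how the generic superclass of an anonymous subclass remains readable through reflection, and makes no claim about how TypeHint is implemented internally.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.List;

// Hypothetical stand-in for the idea behind TypeHint; not Flink's class.
abstract class TypeCapture<T> {
    private final Type captured;

    protected TypeCapture() {
        // An anonymous subclass records its generic superclass in the class file,
        // so the type argument survives erasure and can be read reflectively.
        Type superType = getClass().getGenericSuperclass();
        if (!(superType instanceof ParameterizedType)) {
            throw new IllegalStateException("Create TypeCapture as an anonymous subclass with a type argument");
        }
        this.captured = ((ParameterizedType) superType).getActualTypeArguments()[0];
    }

    Type captured() {
        return captured;
    }
}

final class TypeCaptureDemo {

    // The trailing {} creates the anonymous subclass that carries List<String>.
    static String describe() {
        return new TypeCapture<List<String>>() {}.captured().getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(describe()); // prints "java.util.List<java.lang.String>"
    }
}
```

The empty braces after the constructor call are what matter: without them there is no subclass, and the type argument would be erased like any other.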

To create a TypeSerializer, simply call typeInfo.createSerializer(config) on the TypeInformation object.
要创建TypeSerializer，只需在TypeInformation对象上调用typeInfo.createSerializer(config)。

The config parameter is of type ExecutionConfig and holds the information about the program’s registered custom serializers. Wherever possible, try to pass the program’s proper ExecutionConfig. You can usually obtain it from DataStream via calling getExecutionConfig(). Inside functions (like MapFunction), you can get it by making the function a Rich Function and calling getRuntimeContext().getExecutionConfig().
config参数的类型是ExecutionConfig，并保存有关程序注册的自定义序列化器的信息。只要有可能，请尽量传入程序自身的ExecutionConfig。通常可以通过调用getExecutionConfig()从DataStream获得它。在函数内部(如MapFunction)，可以通过将函数设置为Rich函数并调用getRuntimeContext().getExecutionConfig()来获得它。

Type Information in the Scala API（Scala API中的类型信息）

Scala has very elaborate concepts for runtime type information through type manifests and class tags. In general, types and methods have access to the types of their generic parameters - thus, Scala programs do not suffer from type erasure as Java programs do.
Scala通过类型清单和类标签为运行时类型信息提供了非常详细的概念。一般来说，类型和方法可以访问其泛型参数的类型——因此，Scala程序不会像Java程序那样遭受类型擦除。

In addition, Scala allows running custom code in the Scala compiler through Scala macros - that means that some Flink code gets executed whenever you compile a Scala program written against Flink’s Scala API.
此外，Scala允许通过Scala宏在Scala编译器中运行自定义代码，这意味着每当编译针对Flink的Scala API编写的Scala程序时，就会执行一些Flink代码。

We use the Macros to look at the parameter types and return types of all user functions during compilation - that is the point in time when certainly all type information is perfectly available. Within the macro, we create a TypeInformation for the function’s return types (or parameter types) and make it part of the operation.
我们使用宏来查看编译期间所有用户函数的参数类型和返回类型——这是所有类型信息都完全可用的时间点。在宏中，我们为函数的返回类型(或参数类型)创建TypeInformation，并使其成为操作的一部分。

No Implicit Value for Evidence Parameter Error（证据参数错误没有隐式值）

In the case where TypeInformation could not be created, programs fail to compile with an error stating “could not find implicit value for evidence parameter of type TypeInformation”.
在无法创建TypeInformation的情况下，程序编译失败，并出现“无法找到TypeInformation类型的证据参数的隐式值”的错误。

A frequent reason is that the code that generates the TypeInformation has not been imported. Make sure to import the entire flink.api.scala package.
一个常见的原因是生成TypeInformation的代码尚未导入。请确保导入整个flink.api.scala包。

import org.apache.flink.api.scala._

Another common cause is generic methods, which can be fixed as described in the following section.
另一个常见的原因是泛型方法，可以按照下一节中的描述进行修复。

Generic Methods（泛型方法）

Consider the following case:
考虑以下情况：

def selectFirst[T](input: DataStream[(T, _)]) : DataStream[T] = {
  input.map { v => v._1 }
}

val data : DataStream[(String, Long)] = ...

val result = selectFirst(data)

For such generic methods, the data types of the function parameters and return type may not be the same for every call and are not known at the site where the method is defined. The code above will result in an error that not enough implicit evidence is available.
对于此类泛型方法，函数参数的数据类型和返回类型可能在每次调用中都不相同，并且在定义方法的地方是未知的。上面的代码将导致一个错误（即：没有足够的隐式证据可用）。

In such cases, the type information has to be generated at the invocation site and passed to the method. Scala offers implicit parameters for that.
在这种情况下，必须在调用的地方生成类型信息并将其传递给方法。Scala为此提供了隐式参数。

The following code tells Scala to bring a type information for T into the function. The type information will then be generated at the sites where the method is invoked, rather than where the method is defined.
下面的代码告诉Scala将T的类型信息带入函数中。然后，类型信息将在调用方法的位置生成，而不是在定义方法的位置生成。

def selectFirst[T : TypeInformation](input: DataStream[(T, _)]) : DataStream[T] = {
  input.map { v => v._1 }
}

Type Information in the Java API（Java API中的类型信息）

In the general case, Java erases generic type information. Flink tries to reconstruct as much type information as possible via reflection, using the few bits that Java preserves (mainly function signatures and subclass information). This logic also contains some simple type inference for cases where the return type of a function depends on its input type:
在一般情况下，Java会擦除泛型类型信息。Flink尝试通过反射重建尽可能多的类型信息，使用Java保留的少量信息(主要是函数签名和子类信息)。对于函数的返回类型取决于其输入类型的情况，该逻辑还会包含一些简单的类型推断:

public class AppendOne<T> implements MapFunction<T, Tuple2<T, Long>> {

    public Tuple2<T, Long> map(T value) {
        return new Tuple2<T, Long>(value, 1L);
    }
}
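What "simple type inference" means for a function like AppendOne can be illustrated with plain reflection. The sketch below is an assumption-laden stand-in: it uses java.util.function.Function and Map.Entry instead of MapFunction and Tuple2, and the names ReturnTypeInference and resolveReturnType are hypothetical. It reads the type variable that Java preserves in the class signature and substitutes a known input type for it.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;
import java.util.function.Function;

// Plain-Java sketch of resolving a return type from a known input type; not Flink's logic.
final class ReturnTypeInference {

    // Generic function whose return type depends on its input type, like AppendOne above.
    static final class AppendOne<T> implements Function<T, Map.Entry<T, Long>> {
        @Override
        public Map.Entry<T, Long> apply(T value) {
            return new SimpleEntry<>(value, 1L);
        }
    }

    // Describes the declared return type after substituting the known input type.
    // Assumes the class directly implements a parameterized two-argument interface.
    static String resolveReturnType(Class<?> functionClass, Class<?> inputType) {
        ParameterizedType iface = (ParameterizedType) functionClass.getGenericInterfaces()[0];
        Type in = iface.getActualTypeArguments()[0];  // the type variable T
        Type out = iface.getActualTypeArguments()[1]; // Map.Entry<T, Long>
        return substitute(out, in, inputType);
    }

    // Recursively renders a type, replacing the given type variable with a concrete class.
    private static String substitute(Type type, Type variable, Class<?> replacement) {
        if (type.equals(variable)) {
            return replacement.getSimpleName();
        }
        if (type instanceof Class) {
            return ((Class<?>) type).getSimpleName();
        }
        if (type instanceof ParameterizedType) {
            ParameterizedType p = (ParameterizedType) type;
            StringBuilder sb = new StringBuilder(((Class<?>) p.getRawType()).getSimpleName()).append('<');
            Type[] args = p.getActualTypeArguments();
            for (int i = 0; i < args.length; i++) {
                if (i > 0) {
                    sb.append(", ");
                }
                sb.append(substitute(args[i], variable, replacement));
            }
            return sb.append('>').toString();
        }
        return type.getTypeName();
    }

    public static void main(String[] args) {
        System.out.println(resolveReturnType(AppendOne.class, String.class)); // prints "Entry<String, Long>"
    }
}
```

The class signature alone only says "Entry of T and Long"; the concrete result becomes known once the input type is supplied, which is typically taken from the result type of the preceding operation.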

There are cases where Flink cannot reconstruct all generic type information. In that case, a user has to help out via type hints.
在某些情况下，Flink无法重建所有泛型类型信息。在这种情况下，用户必须通过类型提示提供帮助。

Type Hints in the Java API（Java API中的类型提示）

In cases where Flink cannot reconstruct the erased generic type information, the Java API offers so called type hints. The type hints tell the system the type of the data stream or data set produced by a function:
在Flink无法重建被擦除的泛型类型信息的情况下，Java API提供了所谓的类型提示。类型提示告诉系统由函数产生的数据流或数据集的类型:

DataStream<SomeType> result = stream
    .map(new MyGenericNonInferrableFunction<Long, SomeType>())
    .returns(SomeType.class);

The returns statement specifies the produced type, in this case via a class. The hints support type definition via
returns语句指定生成的类型，在本例中是通过一个类指定的。提示支持通过以下两种定义类型

Classes, for non-parameterized types (no generics)
类，用于非参数化类型(无泛型)
TypeHints in the form of returns(new TypeHint<Tuple2<Integer, SomeType>>(){}). The TypeHint class can capture generic type information and preserve it for the runtime (via an anonymous subclass).
returns(new TypeHint<Tuple2<Integer, SomeType>>(){}) 形式的类型提示。TypeHint类可以捕获泛型类型信息并为运行时保留它（通过匿名子类）。

Type extraction for Java 8 lambdas（Java 8 lambda的类型提取）

Type extraction for Java 8 lambdas works differently than for non-lambdas, because lambdas are not associated with an implementing class that extends the function interface.
Java 8 lambda的类型提取与非lambda不同，因为lambda与扩展函数接口的实现类没有关联。

Currently, Flink tries to figure out which method implements the lambda and uses Java’s generic signatures to determine the parameter types and the return type. However, these signatures are not generated for lambdas by all compilers. If you observe unexpected behavior, manually specify the return type using the returns method.
目前，Flink会试图找出哪个方法实现了lambda，并使用Java的泛型签名来确定参数类型和返回类型。但是，并非所有编译器都为lambda生成这些签名。如果观察到意外行为，请使用returns方法手动指定返回类型。
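The difference between lambdas and implementing classes can be seen with plain reflection. The following sketch (LambdaSignatureDemo is a made-up name, and java.util.function.Function stands in for a Flink function interface) checks whether the runtime class of a function object still records the type arguments of the interface it implements. It shows only the class-level difference and does not reproduce Flink's lambda analysis.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.function.Function;

// Plain-Java comparison of an anonymous class and a lambda; no Flink classes are involved.
final class LambdaSignatureDemo {

    // Reports whether the object's class records type arguments for its first interface.
    static boolean keepsTypeArguments(Object function) {
        Type[] interfaces = function.getClass().getGenericInterfaces();
        return interfaces.length > 0 && interfaces[0] instanceof ParameterizedType;
    }

    // An anonymous class is a real subclass, so Function<String, Integer> is stored in its signature.
    static Function<String, Integer> anonymousClass() {
        return new Function<String, Integer>() {
            @Override
            public Integer apply(String s) {
                return s.length();
            }
        };
    }

    // A lambda is backed by a runtime-generated class that implements the raw interface.
    static Function<String, Integer> lambda() {
        return s -> s.length();
    }

    public static void main(String[] args) {
        System.out.println(keepsTypeArguments(anonymousClass()));
        System.out.println(keepsTypeArguments(lambda()));
    }
}
```

On common JVMs the anonymous class reports its type arguments while the lambda does not, which is why an explicit returns(...) call is the usual remedy when a lambda produces a generic type.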

Serialization of POJO types（POJO类型的序列化）

The PojoTypeInfo creates serializers for all the fields inside the POJO. Standard types such as int, long, and String are handled by serializers that ship with Flink. For all other types, Flink falls back to Kryo.
PojoTypeInfo为POJO中的所有字段创建序列化器。int、long、String等标准类型由Flink附带的序列化器处理。对于其他类型，求助于Kryo。

If Kryo is not able to handle the type, you can ask the PojoTypeInfo to serialize the POJO using Avro. To do so, you have to call
如果Kryo无法处理该类型，可以要求PojoTypeInfo使用Avro序列化POJO。要做到这一点，必须调用

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().enableForceAvro();

Note that Flink is automatically serializing POJOs generated by Avro with the Avro serializer.
注意，Flink使用Avro序列化器自动序列化Avro生成的pojo。

If you want your entire POJO Type to be treated by the Kryo serializer, set
如果希望整个POJO类型由Kryo序列化器处理，请设置

final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.getConfig().enableForceKryo();

If Kryo is not able to serialize your POJO, you can add a custom serializer to Kryo, using
如果Kryo不能序列化POJO，可以向Kryo添加一个自定义序列化器，使用

env.getConfig().addDefaultKryoSerializer(Class<?> type, Class<? extends Serializer<?>> serializerClass);

There are different variants of these methods available.
这些方法有不同的变体。

Disabling Kryo Fallback（禁用Kryo Fallback）

There are cases when programs may want to explicitly avoid using Kryo as a fallback for generic types. The most common one is wanting to ensure that all types are efficiently serialized either through Flink’s own serializers, or via user-defined custom serializers.
在某些情况下，程序可能希望显式地避免使用Kryo作为泛型类型的后备。最常见的一个是希望通过Flink自己的序列化器或用户定义的自定义序列化器确保所有类型都有效地序列化。

The setting below will raise an exception whenever a data type is encountered that would go through Kryo:
每当遇到要通过Kryo的数据类型时，以下设置都会引发异常：

env.getConfig().disableGenericTypes();

Defining Type Information using a Factory（使用Factory定义类型信息）

A type information factory allows for plugging-in user-defined type information into the Flink type system. You have to implement org.apache.flink.api.common.typeinfo.TypeInfoFactory to return your custom type information. The factory is called during the type extraction phase if either the corresponding type or a POJO’s field using this type has been annotated with the @org.apache.flink.api.common.typeinfo.TypeInfo annotation.
类型信息工厂允许将用户定义的类型信息插入Flink类型系统。您必须实现org.apache.flink.api.common.typeinfo.TypeInfoFactory才能返回您的自定义类型信息。如果相应的类型或使用该类型的POJO字段已使用@org.apache.flink.api.common.typeinfo.TypeInfo注解进行了注释，则会在类型提取阶段调用工厂。

Type information factories can be used in both the Java and Scala API.
类型信息工厂可以在Java和Scala API中使用。

In a hierarchy of types, the closest factory is chosen while traversing upwards; however, a built-in factory has highest precedence. A factory also has higher precedence than Flink’s built-in types, so you should know what you are doing.
在类型层次结构中，向上遍历时将选择最近的工厂，但是，内置工厂具有最高优先级。工厂也比Flink的内置类型具有更高的优先级，因此您应该知道自己在做什么。

The following example shows how to annotate a custom type MyTuple and supply custom type information for it using a factory in Java.
以下示例显示如何使用Java中的工厂对自定义类型MyTuple进行注释并为其提供自定义类型信息。

The annotated custom type:
带注释的自定义类型：

@TypeInfo(MyTupleTypeInfoFactory.class)
public class MyTuple<T0, T1> {
  public T0 myfield0;
  public T1 myfield1;
}

The factory supplying custom type information:
提供自定义类型信息的工厂：

public class MyTupleTypeInfoFactory extends TypeInfoFactory<MyTuple> {

  @Override
  public TypeInformation<MyTuple> createTypeInfo(Type t, Map<String, TypeInformation<?>> genericParameters) {
    return new MyTupleTypeInfo(genericParameters.get("T0"), genericParameters.get("T1"));
  }
}

Instead of annotating the type itself, which may not be possible for third-party code, you can also annotate the usage of this type inside a valid Flink POJO like this:
您也可以在有效的Flink POJO中注释此类型的用法，而不是注释类型本身（这对于第三方代码来说也许是不可能的），如下所示：

public class MyPojo {
  public int id;

  @TypeInfo(MyTupleTypeInfoFactory.class)
  public MyTuple<Integer, String> tuple;
}

The method createTypeInfo(Type, Map<String, TypeInformation<?>>) creates type information for the type the factory is targeted for. The parameters provide additional information about the type itself as well as the type’s generic type parameters if available.
方法createTypeInfo(Type, Map<String, TypeInformation<?>>)为工厂的目标类型创建类型信息。参数提供了关于类型本身的附加信息，以及类型的泛型类型参数（如果可用）。

If your type contains generic parameters that might need to be derived from the input type of a Flink function, make sure to also implement org.apache.flink.api.common.typeinfo.TypeInformation#getGenericParameters for a bidirectional mapping of generic parameters to type information.
如果您的类型包含可能需要从Flink函数的输入类型派生的泛型参数，请确保也实现了org.apache.flink.api.common.typeinfo.TypeInformation#getGenericParameters，以实现泛型参数到类型信息的双向映射。

请介绍一下大数据主要是干什么的？决策支持预测分析用户行为分析个性化服务操作优化风险管理创新与产品开发加拿大卡尔加里大学历史背景学术结构研究和创新校园设施盛溪的猫猫感悟大数据英语加拿大
目录请介绍一下大数据主要是干什么的？决策支持预测分析用户行为分析个性化服务操作优化风险管理创新与产品开发加拿大卡尔加里大学历史背景学术结构研究和创新校园设施国际化学生生活大语言模型目前的问题卡尔加里经济地理和气候文化和活动教育交通绿色城市AVL树的旋转单右旋（LL旋转）单左旋（RR旋转）左右旋（LR旋转）右左旋（RL旋转）请介绍一下大数据主要是干什么的？大数据是一个涉及从极其庞大和复杂的数据集中提