孤鸿子_

Data Processing with pandas

Pandas

*pandas* is a Python library for data analysis. It offers a number of data exploration, cleaning and transformation operations that are critical when working with data in Python.

*pandas* builds upon *numpy* and *scipy*, providing easy-to-use data structures and data manipulation functions with integrated indexing.

The main data structures *pandas* provides are *Series* and *DataFrames*. After a brief introduction to these two data structures and data ingestion, the key features of *pandas* this notebook covers are:

* Generating descriptive statistics on data
* Data cleaning using built-in pandas functions
* Frequent data operations for subsetting, filtering, insertion, deletion and aggregation of data
* Merging multiple datasets using DataFrames
* Working with timestamps and time-series data

**Additional Recommended Resources:**

* *pandas* Documentation: http://pandas.pydata.org/pandas-docs/stable/
* *Python for Data Analysis* by Wes McKinney
* *Python Data Science Handbook* by Jake VanderPlas

Let's get started with our first *pandas* notebook!

Import Libraries

import pandas as pd

Introduction to pandas Data Structures

*pandas* has two main data structures it uses, namely, *Series* and *DataFrames*.

pandas Series

A *pandas Series* is a one-dimensional labeled array.

ser = pd.Series(data = [100, 'foo', 300, 'bar', 500], index = ['tom', 'bob', 'nancy', 'dan', 'eric'])

ser

tom      100
bob      foo
nancy    300
dan      bar
eric     500
dtype: object

ser.index

Index(['tom', 'bob', 'nancy', 'dan', 'eric'], dtype='object')

ser.loc[['nancy','bob']]

nancy    300
bob      foo
dtype: object

ser.iloc[[4, 3, 1]]

eric     500
dan      bar
bob      foo
dtype: object

ser.iloc[2]

300

'bob' in ser

True

ser

tom      100
bob      foo
nancy    300
dan      bar
eric     500
dtype: object

ser * 2

tom      200
bob      foofoo
nancy    600
dan      barbar
eric     1000
dtype: object

ser[['nancy', 'eric']] ** 2

nancy    90000
eric     250000
dtype: object
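The Series examples above mix label-based and position-based access. The following is a minimal sketch that puts the two side by side, reusing the same toy Series; the variable names `by_label` and `by_position` are only illustrative.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                index=['tom', 'bob', 'nancy', 'dan', 'eric'])

# .loc selects by index label, .iloc selects by integer position
by_label = ser.loc[['nancy', 'bob']]
by_position = ser.iloc[[4, 3, 1]]

print(by_label.tolist())            # [300, 'foo']
print(by_position.index.tolist())   # ['eric', 'dan', 'bob']

# The `in` operator tests index labels, not the stored values
print('bob' in ser, 'foo' in ser)   # True False
```

Using `.loc` and `.iloc` explicitly avoids ambiguity when a list of integers could be read either as labels or as positions.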

pandas DataFrame

*pandas DataFrame* is a 2-dimensional labeled data structure.

Create DataFrame from dictionary of Python Series

d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill', 'dancy'])}

df = pd.DataFrame(d)
print(df)

	one	two
apple	100.0	111.0
ball	200.0	222.0
cerill	NaN	333.0
clock	300.0	NaN
dancy	NaN	4444.0

df.index

Index(['apple', 'ball', 'cerill', 'clock', 'dancy'], dtype='object')

df.columns

Index(['one', 'two'], dtype='object')

pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

	one	two
dancy	NaN	4444.0
ball	200.0	222.0
apple	100.0	111.0

pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])

	two	five
dancy	4444.0	NaN
ball	222.0	NaN
apple	111.0	NaN

Create DataFrame from list of Python dictionaries

data = [{'alex': 1, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]

pd.DataFrame(data)

	alex	alice	dora	ema	joe
0	1.0	NaN	NaN	NaN	2.0
1	NaN	20.0	10.0	5.0	NaN

pd.DataFrame(data, index=['orange', 'red'])

	alex	alice	dora	ema	joe
orange	1.0	NaN	NaN	NaN	2.0
red	NaN	20.0	10.0	5.0	NaN

pd.DataFrame(data, columns=['joe', 'dora','alice'])

	joe	dora	alice
0	2.0	NaN	NaN
1	NaN	10.0	20.0
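In every construction above, pandas aligns the inputs on their labels and fills the gaps with NaN. A small sketch of that alignment, using two made-up Series whose indexes only partly overlap:

```python
import pandas as pd

one = pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock'])
two = pd.Series([111., 222., 333.], index=['apple', 'ball', 'cerill'])

# The resulting index is the union of both input indexes
frame = pd.DataFrame({'one': one, 'two': two})

print(frame.index.tolist())           # ['apple', 'ball', 'cerill', 'clock']
print(frame.isna().sum().to_dict())   # {'one': 1, 'two': 1}
```

Each column ends up with one NaN: the row label that exists only in the other Series.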

Basic DataFrame operations

df

	one	two
apple	100.0	111.0
ball	200.0	222.0
cerill	NaN	333.0
clock	300.0	NaN
dancy	NaN	4444.0

df['one']

apple     100.0
ball      200.0
cerill    NaN
clock     300.0
dancy     NaN
Name: one, dtype: float64

df['three'] = df['one'] * df['two']
df

	one	two	three
apple	100.0	111.0	11100.0
ball	200.0	222.0	44400.0
cerill	NaN	333.0	NaN
clock	300.0	NaN	NaN
dancy	NaN	4444.0	NaN

df['flag'] = df['one'] > 250
df

	one	two	three	flag
apple	100.0	111.0	11100.0	False
ball	200.0	222.0	44400.0	False
cerill	NaN	333.0	NaN	False
clock	300.0	NaN	NaN	True
dancy	NaN	4444.0	NaN	False

three = df.pop('three')

three

apple     11100.0
ball      44400.0
cerill    NaN
clock     NaN
dancy     NaN
Name: three, dtype: float64

df

	one	two	flag
apple	100.0	111.0	False
ball	200.0	222.0	False
cerill	NaN	333.0	False
clock	300.0	NaN	True
dancy	NaN	4444.0	False

del df['two']

df

	one	flag
apple	100.0	False
ball	200.0	False
cerill	NaN	False
clock	300.0	True
dancy	NaN	False

df.insert(2, 'copy_of_one', df['one'])
df

	one	flag	copy_of_one
apple	100.0	False	100.0
ball	200.0	False	200.0
cerill	NaN	False	NaN
clock	300.0	True	300.0
dancy	NaN	False	NaN

df['one_upper_half'] = df['one'][:2]
df

	one	flag	copy_of_one	one_upper_half
apple	100.0	False	100.0	100.0
ball	200.0	False	200.0	200.0
cerill	NaN	False	NaN	NaN
clock	300.0	True	300.0	NaN
dancy	NaN	False	NaN	NaN
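The column operations above (assignment, `pop`, slicing) all rely on index alignment. The sketch below repeats them on a tiny made-up frame so the effect is easy to check; the column names `first_two` and `flag` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'one': [100., 200., 300.]},
                  index=['apple', 'ball', 'clock'])

# Assigning a shorter Series aligns on the index; unmatched rows get NaN
df['first_two'] = df['one'][:2]

# A comparison produces a boolean column
df['flag'] = df['one'] > 150

# pop removes the column from the frame and returns it as a Series
popped = df.pop('flag')

print(df.columns.tolist())               # ['one', 'first_two']
print(df['first_two'].isna().tolist())   # [False, False, True]
print(popped.tolist())                   # [False, True, True]
```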

Case Study: Movie Data Analysis

This notebook uses a dataset from the MovieLens website. We will describe the dataset further as we explore it using *pandas*.

Download the Dataset

Please note that **you will need to download the dataset**. Although the video for this notebook says that the data is in your folder, the folder turned out to be too large to fit on the edX platform due to size constraints.

Here are the links to the data source and location:

* **Data Source:** MovieLens web site (filename: ml-20m.zip)
* **Location:** https://grouplens.org/datasets/movielens/

Once the download completes, please make sure the data files are in a directory called *movielens* in your *Week-3-pandas* folder.

Let us look at the files in this dataset using the UNIX command ls.

# Note: Adjust the name of the folder to match your local directory
# These shell commands are for Linux
!ls ./movielens

!cat ./movielens/movies.csv | wc -l

!head -5 ./movielens/ratings.csv

Use Pandas to Read the Dataset

In this notebook, we will be using three CSV files:

ratings.csv : userId, movieId, rating, timestamp
tags.csv : userId, movieId, tag, timestamp
movies.csv : movieId, title, genres

Using the read_csv function in pandas, we will ingest these three files.

movies = pd.read_csv('./movielens/movies.csv', sep=',')
print(type(movies))
movies.head(15)

# Timestamps represent seconds since midnight Coordinated Universal Time (UTC) of January 1, 1970

tags = pd.read_csv('./movielens/tags.csv', sep=',')
tags.head()

	userId	movieId	tag	timestamp
0	18	4141	Mark Waters	1240597180
1	65	208	dark hero	1368150078
2	65	353	dark hero	1368150079
3	65	521	noir thriller	1368149983
4	65	592	dark hero	1368150078

# The timestamp column holds epoch seconds; it is read as plain integers here and parsed later
ratings = pd.read_csv('./movielens/ratings.csv', sep=',')
ratings.head()

	userId	movieId	rating	timestamp
0	1	2	3.5	1112486027
1	1	29	3.5	1112484676
2	1	32	3.5	1112484819
3	1	47	3.5	1112484727
4	1	50	3.5	1112484580

# For current analysis, we will remove timestamp (we will come back to it!)

del ratings['timestamp']
del tags['timestamp']
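If the MovieLens files are not at hand, the same `read_csv` call can be tried on in-memory text. This is a minimal sketch with three made-up rows in the `ratings.csv` layout; `usecols` is used here as an alternative to deleting the timestamp column after loading.

```python
import io
import pandas as pd

# Illustrative stand-in for ratings.csv (not real MovieLens content)
csv_text = """userId,movieId,rating,timestamp
1,2,3.5,1112486027
1,29,3.5,1112484676
2,32,4.0,1112484819
"""

# usecols keeps only the listed columns while reading
sample = pd.read_csv(io.StringIO(csv_text),
                     usecols=['userId', 'movieId', 'rating'])

print(sample.shape)              # (3, 3)
print(sample.columns.tolist())   # ['userId', 'movieId', 'rating']
```

Skipping unused columns at read time can also reduce memory use on a file as large as the 20M ratings.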

Data Structures

Series

# Extract the 0th row: notice that it is in fact a Series

row_0 = tags.iloc[0]
type(row_0)

pandas.core.series.Series

print(row_0)

userId     18
movieId    4141
tag        Mark Waters
Name: 0, dtype: object

row_0.index

Index(['userId', 'movieId', 'tag'], dtype='object')

row_0['userId']

'rating' in row_0

False

row_0.name

row_0 = row_0.rename('first_row')
row_0.name

'first_row'

DataFrames

tags.head()

	userId	movieId	tag
0	18	4141	Mark Waters
1	65	208	dark hero
2	65	353	dark hero
3	65	521	noir thriller
4	65	592	dark hero

tags.index

RangeIndex(start=0, stop=465564, step=1)

tags.columns

Index(['userId', 'movieId', 'tag'], dtype='object')

# Extract rows 0, 11 and 2000 from the DataFrame

tags.iloc[ [0,11,2000] ]

	userId	movieId	tag
0	18	4141	Mark Waters
11	65	1783	noir thriller
2000	910	68554	conspiracy theory

Descriptive Statistics

Let's look at how the ratings are distributed!

ratings['rating'].describe()

count    2.000026e+07
mean     3.525529e+00
std      1.051989e+00
min      5.000000e-01
25%      3.000000e+00
50%      3.500000e+00
75%      4.000000e+00
max      5.000000e+00
Name: rating, dtype: float64

ratings.describe()

	userId	movieId	rating
count	2.000026e+07	2.000026e+07	2.000026e+07
mean	6.904587e+04	9.041567e+03	3.525529e+00
std	4.003863e+04	1.978948e+04	1.051989e+00
min	1.000000e+00	1.000000e+00	5.000000e-01
25%	3.439500e+04	9.020000e+02	3.000000e+00
50%	6.914100e+04	2.167000e+03	3.500000e+00
75%	1.036370e+05	4.770000e+03	4.000000e+00
max	1.384930e+05	1.312620e+05	5.000000e+00

ratings['rating'].mean()

3.5255285642993797

ratings.mean()

userId     69045.872583
movieId    9041.567330
rating     3.525529
dtype: float64

ratings['rating'].min()

0.5

ratings['rating'].max()

5.0

ratings['rating'].std()

1.051988919275684

ratings['rating'].mode()

0    4.0
dtype: float64

ratings.corr()

	userId	movieId	rating
userId	1.000000	-0.000850	0.001175
movieId	-0.000850	1.000000	0.002606
rating	0.001175	0.002606	1.000000

filter_1 = ratings['rating'] > 5
print(filter_1)
filter_1.any()

0           False
1           False
2           False
3           False
4           False
            ...
20000258    False
20000259    False
20000260    False
20000261    False
20000262    False
Name: rating, dtype: bool

False

filter_2 = ratings['rating'] > 0
filter_2.all()

True
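The `any()` and `all()` checks above work as quick sanity tests on a column. A small sketch on made-up ratings, combining them with `describe()` and `mode()`; the 0.5 to 5.0 bounds mirror the min and max reported above.

```python
import pandas as pd

# Illustrative ratings, not taken from the MovieLens files
ratings = pd.DataFrame({'rating': [0.5, 3.0, 3.5, 4.0, 4.0, 5.0]})

summary = ratings['rating'].describe()

# between() is inclusive on both ends by default
valid = ratings['rating'].between(0.5, 5.0)

print(summary['count'], summary['max'])   # 6.0 5.0
print(valid.all())                        # True
print((ratings['rating'] > 5).any())      # False
print(ratings['rating'].mode().tolist())  # [4.0]
```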

Data Cleaning: Handling Missing Data

movies.shape

(27278, 3)

# Does any column contain NULL values?

movies.isnull().any()

movieId    False
title      False
genres     False
dtype: bool

That's nice! No NULL values!

ratings.shape

(20000263, 3)

# Does any column contain NULL values?

ratings.isnull().any()

userId     False
movieId    False
rating     False
dtype: bool

That's nice! No NULL values!

tags.shape

(465564, 3)

# Does any column contain NULL values?

tags.isnull().any()

userId     False
movieId    False
tag        True
dtype: bool

We have some tags which are NULL.

tags = tags.dropna()

# Check again: does any column contain NULL values?

tags.isnull().any()

userId     False
movieId    False
tag        False
dtype: bool

tags.shape

(465548, 3)

That's nice! No NULL values! Notice that the number of rows has gone down.
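The same clean-up can be reproduced on a tiny frame. This sketch uses three made-up tag rows, counts missing values per column with `isnull().sum()`, and drops only the rows where `tag` is missing via the `subset` argument.

```python
import numpy as np
import pandas as pd

# Illustrative rows in the tags.csv layout, with one missing tag
tags = pd.DataFrame({
    'userId': [18, 65, 65],
    'movieId': [4141, 208, 353],
    'tag': ['Mark Waters', np.nan, 'dark hero'],
})

# Number of missing values in each column
missing_per_column = tags.isnull().sum()

# Drop rows only when the 'tag' column is missing
cleaned = tags.dropna(subset=['tag'])

print(missing_per_column['tag'])   # 1
print(tags.shape, cleaned.shape)   # (3, 3) (2, 3)
```

Counting with `sum()` gives more detail than `any()`: it shows how many rows a `dropna()` is about to remove.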

Data Visualization

%matplotlib inline

ratings.hist(column='rating', figsize=(15,10))


ratings.boxplot(column='rating', figsize=(15,20))

tags['tag'].head()

0    Mark Waters
1    dark hero
2    dark hero
3    noir thriller
4    dark hero
Name: tag, dtype: object

movies[['title','genres']].head()

	title	genres
0	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	Jumanji (1995)	Adventure|Children|Fantasy
2	Grumpier Old Men (1995)	Comedy|Romance
3	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	Father of the Bride Part II (1995)	Comedy

ratings[-10:]

	userId	movieId	rating
20000253	138493	60816	4.5
20000254	138493	61160	4.0
20000255	138493	65682	4.5
20000256	138493	66762	4.5
20000257	138493	68319	4.5
20000258	138493	68954	4.5
20000259	138493	69526	4.5
20000260	138493	69644	3.0
20000261	138493	70286	5.0
20000262	138493	71619	2.5

tag_counts = tags['tag'].value_counts()
tag_counts[-10:]

Venice Film Festival Winner 2002        1
based on the life of Buford Pusser      1
but no way as good as the other two     1
see                                     1
Jeffrey Kimball                         1
tolerable                               1
Fake History - Don't Believe a Thing    1
Boy                                     1
urlaub                                  1
conservative                            1
Name: tag, dtype: int64

tag_counts[:10].plot(kind='bar', figsize=(15,10))

is_highly_rated = ratings['rating'] >= 4.0

ratings[is_highly_rated][30:50]

	userId	movieId	rating
68	1	2021	4.0
69	1	2100	4.0
70	1	2118	4.0
71	1	2138	4.0
72	1	2140	4.0
73	1	2143	4.0
74	1	2173	4.0
75	1	2174	4.0
76	1	2193	4.0
79	1	2288	4.0
80	1	2291	4.0
81	1	2542	4.0
82	1	2628	4.0
90	1	2762	4.0
92	1	2872	4.0
94	1	2944	4.0
96	1	2959	4.0
97	1	2968	4.0
101	1	3081	4.0
102	1	3153	4.0

is_animation = movies['genres'].str.contains('Animation')

movies[is_animation][5:15]

	movieId	title	genres
310	313	Swan Princess, The (1994)	Animation|Children
360	364	Lion King, The (1994)	Adventure|Animation|Children|Drama|Musical|IMAX
388	392	Secret Adventures of Tom Thumb, The (1993)	Adventure|Animation
547	551	Nightmare Before Christmas, The (1993)	Animation|Children|Fantasy|Musical
553	558	Pagemaster, The (1994)	Action|Adventure|Animation|Children|Fantasy
582	588	Aladdin (1992)	Adventure|Animation|Children|Comedy|Musical
588	594	Snow White and the Seven Dwarfs (1937)	Animation|Children|Drama|Fantasy|Musical
589	595	Beauty and the Beast (1991)	Animation|Children|Fantasy|Musical|Romance|IMAX
590	596	Pinocchio (1940)	Animation|Children|Fantasy|Musical
604	610	Heavy Metal (1981)	Action|Adventure|Animation|Horror|Sci-Fi

movies[is_animation].head(15)

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
12	13	Balto (1995)	Adventure|Animation|Children
47	48	Pocahontas (1995)	Animation|Children|Drama|Musical|Romance
236	239	Goofy Movie, A (1995)	Animation|Children|Comedy|Romance
241	244	Gumby: The Movie (1995)	Animation|Children
310	313	Swan Princess, The (1994)	Animation|Children
360	364	Lion King, The (1994)	Adventure|Animation|Children|Drama|Musical|IMAX
388	392	Secret Adventures of Tom Thumb, The (1993)	Adventure|Animation
547	551	Nightmare Before Christmas, The (1993)	Animation|Children|Fantasy|Musical
553	558	Pagemaster, The (1994)	Action|Adventure|Animation|Children|Fantasy
582	588	Aladdin (1992)	Adventure|Animation|Children|Comedy|Musical
588	594	Snow White and the Seven Dwarfs (1937)	Animation|Children|Drama|Fantasy|Musical
589	595	Beauty and the Beast (1991)	Animation|Children|Fantasy|Musical|Romance|IMAX
590	596	Pinocchio (1940)	Animation|Children|Fantasy|Musical
604	610	Heavy Metal (1981)	Action|Adventure|Animation|Horror|Sci-Fi
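The filters above are boolean Series used as row masks, and masks can be combined with `&`, `|` and `~`. A minimal sketch on three made-up movie rows; `regex=False` is passed so the search term is treated as a plain substring.

```python
import pandas as pd

# Illustrative rows in the movies.csv layout
movies = pd.DataFrame({
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Heat (1995)'],
    'genres': ['Adventure|Animation|Children', 'Adventure|Children', 'Action|Crime'],
})

is_animation = movies['genres'].str.contains('Animation', regex=False)
is_adventure = movies['genres'].str.contains('Adventure', regex=False)

# & means "both masks true", ~ negates a mask
animated_adventures = movies[is_animation & is_adventure]
not_adventure = movies[~is_adventure]

print(animated_adventures['title'].tolist())   # ['Toy Story (1995)']
print(not_adventure['title'].tolist())         # ['Heat (1995)']
```

When combining comparisons inline, each one needs its own parentheses because `&` binds more tightly than `>=` or `==`.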

Group By and Aggregate

ratings_count = ratings[['movieId','rating']].groupby('rating').count()
ratings_count

	movieId
rating
0.5	239125
1.0	680732
1.5	279252
2.0	1430997
2.5	883398
3.0	4291193
3.5	2200156
4.0	5561926
4.5	1534824
5.0	2898660

average_rating = ratings[['movieId','rating']].groupby('movieId').mean()
average_rating.head()

	rating
movieId
1	3.921240
2	3.211977
3	3.151040
4	2.861393
5	3.064592

movie_count = ratings[['movieId','rating']].groupby('movieId').count()
movie_count.head()

	rating
movieId
1	49695
2	22243
3	12735
4	2756
5	12161

# Select the columns of interest, then group by a key column, much like GROUP BY in SQL
movie_count = ratings[['movieId','rating']].groupby('movieId').count()
movie_count.tail()

	rating
movieId
131254	1
131256	1
131258	1
131260	1
131262	1
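The count and the mean were computed above in separate `groupby` calls. As a sketch on made-up ratings, both can also be obtained in a single pass with `agg`; the name `per_movie` is only illustrative.

```python
import pandas as pd

# Illustrative ratings, not taken from the MovieLens files
ratings = pd.DataFrame({
    'movieId': [1, 1, 2, 2, 2, 3],
    'rating':  [4.0, 5.0, 3.0, 3.5, 2.5, 4.0],
})

# One groupby, several aggregations on the same column
per_movie = ratings.groupby('movieId')['rating'].agg(['count', 'mean'])

print(per_movie.loc[1, 'mean'])    # 4.5
print(per_movie.loc[2, 'count'])   # 3
print(per_movie.loc[2, 'mean'])    # 3.0
```

Keeping the count next to the mean makes it easy to spot averages that rest on a single rating, as in the tail of `movie_count` above.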

Merge Dataframes

tags.head()

	userId	movieId	tag
0	18	4141	Mark Waters
1	65	208	dark hero
2	65	353	dark hero
3	65	521	noir thriller
4	65	592	dark hero

movies.head()

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

t = movies.merge(tags, on='movieId', how='inner')
t.head()
# Run ?movies.merge in the notebook for detailed documentation

	movieId	title	genres	userId	tag
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	1644	Watched
1	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	1741	computer animation
2	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	1741	Disney animated feature
3	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	1741	Pixar animation
4	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	1741	Téa Leoni does not star in this movie

More examples: http://pandas.pydata.org/pandas-docs/stable/merging.html
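The `how` argument decides which keys survive a merge. The sketch below contrasts `inner` and `left` on two small made-up frames: one movie has no tags, and one tag refers to a movie that is not in the movie list.

```python
import pandas as pd

# Illustrative frames; movieId 99 exists only in tags, movieId 2 has no tags
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Toy Story (1995)', 'Jumanji (1995)', 'Heat (1995)'],
})
tags = pd.DataFrame({
    'movieId': [1, 1, 3, 99],
    'tag': ['pixar', 'animation', 'crime', 'orphan'],
})

# inner keeps only keys present in both frames
inner = movies.merge(tags, on='movieId', how='inner')

# left keeps every movie and fills missing tags with NaN
left = movies.merge(tags, on='movieId', how='left')

print(len(inner), len(left))       # 3 4
print(left['tag'].isna().sum())    # 1
```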

Combine aggregation, merging, and filters to get useful analytics

# Passing columns= to rename() renames the 'rating' column
avg_ratings = ratings.groupby('movieId', as_index=False).mean().rename(columns={'rating':'avg_rating'})
del avg_ratings['userId']
avg_ratings.head()

	movieId	avg_rating
0	1	3.921240
1	2	3.211977
2	3	3.151040
3	4	2.861393
4	5	3.064592

# avg_ratings already has the 'avg_rating' column, so no further rename is needed
box_office = movies.merge(avg_ratings, on='movieId', how='inner')
box_office.head()

	movieId	title	genres	avg_rating
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	3.921240
1	2	Jumanji (1995)	Adventure|Children|Fantasy	3.211977
2	3	Grumpier Old Men (1995)	Comedy|Romance	3.151040
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	2.861393
4	5	Father of the Bride Part II (1995)	Comedy	3.064592

is_highly_rated = box_office['avg_rating'] >= 4.0

box_office[is_highly_rated][-5:]

	movieId	title	genres	avg_rating
26737	131250	No More School (2000)	Comedy	4.0
26738	131252	Forklift Driver Klaus: The First Day on the Jo…	Comedy|Horror	4.0
26739	131254	Kein Bund für’s Leben (2007)	Comedy	4.0
26740	131256	Feuer, Eis & Dosenbier (2002)	Comedy	4.0
26743	131262	Innocence (2014)	Adventure|Fantasy|Horror	4.0

is_comedy = box_office['genres'].str.contains('Comedy')

box_office[is_comedy][:5]

	movieId	title	genres	avg_rating
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy	3.921240
2	3	Grumpier Old Men (1995)	Comedy|Romance	3.151040
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance	2.861393
4	5	Father of the Bride Part II (1995)	Comedy	3.064592
6	7	Sabrina (1995)	Comedy|Romance	3.366484

box_office[is_comedy & is_highly_rated][-5:]

	movieId	title	genres	avg_rating
26736	131248	Brother Bear 2 (2006)	Adventure|Animation|Children|Comedy|Fantasy	4.0
26737	131250	No More School (2000)	Comedy	4.0
26738	131252	Forklift Driver Klaus: The First Day on the Jo…	Comedy|Horror	4.0
26739	131254	Kein Bund für’s Leben (2007)	Comedy	4.0
26740	131256	Feuer, Eis & Dosenbier (2002)	Comedy	4.0
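The three steps above (aggregate, merge, filter) can be written as one short pipeline. This is a sketch on made-up data; the 4.0 threshold matches the one used above, and the name `good_comedies` is only illustrative.

```python
import pandas as pd

# Illustrative data in the ratings.csv and movies.csv layouts
ratings = pd.DataFrame({
    'movieId': [1, 1, 2, 3, 3],
    'rating':  [4.0, 5.0, 3.0, 4.5, 4.0],
})
movies = pd.DataFrame({
    'movieId': [1, 2, 3],
    'title': ['Movie A', 'Movie B', 'Movie C'],
    'genres': ['Comedy|Children', 'Comedy', 'Drama'],
})

# 1. Aggregate: average rating per movie, with the key kept as a column
avg = (ratings.groupby('movieId', as_index=False)['rating']
              .mean()
              .rename(columns={'rating': 'avg_rating'}))

# 2. Merge: attach the average to the movie metadata
joined = movies.merge(avg, on='movieId', how='inner')

# 3. Filter: comedies with an average of at least 4.0
is_comedy = joined['genres'].str.contains('Comedy', regex=False)
good_comedies = joined[is_comedy & (joined['avg_rating'] >= 4.0)]

print(good_comedies['title'].tolist())   # ['Movie A']
```

Selecting `['rating']` before `mean()` avoids averaging the other columns, so no column has to be deleted afterwards.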

Vectorized String Operations

movies.head()

	movieId	title	genres
0	1	Toy Story (1995)	Adventure|Animation|Children|Comedy|Fantasy
1	2	Jumanji (1995)	Adventure|Children|Fantasy
2	3	Grumpier Old Men (1995)	Comedy|Romance
3	4	Waiting to Exhale (1995)	Comedy|Drama|Romance
4	5	Father of the Bride Part II (1995)	Comedy

Split ‘genres’ into multiple columns

movie_genres = movies['genres'].str.split('|', expand=True)

movie_genres[:10]

	0	1	2	3	4	5	6	7	8	9
0	Adventure	Animation	Children	Comedy	Fantasy	None	None	None	None	None
1	Adventure	Children	Fantasy	None	None	None	None	None	None	None
2	Comedy	Romance	None	None	None	None	None	None	None	None
3	Comedy	Drama	Romance	None	None	None	None	None	None	None
4	Comedy	None	None	None	None	None	None	None	None	None
5	Action	Crime	Thriller	None	None	None	None	None	None	None
6	Comedy	Romance	None	None	None	None	None	None	None	None
7	Adventure	Children	None	None	None	None	None	None	None	None
8	Action	None	None	None	None	None	None	None	None	None
9	Action	Adventure	Thriller	None	None	None	None	None	None	None

Add a new column for comedy genre flag

movie_genres['isComedy'] = movies['genres'].str.contains('Comedy')

movie_genres[:10]

	0	1	2	3	4	5	6	7	8	9	isComedy
0	Adventure	Animation	Children	Comedy	Fantasy	None	None	None	None	None	True
1	Adventure	Children	Fantasy	None	None	None	None	None	None	None	False
2	Comedy	Romance	None	None	None	None	None	None	None	None	True
3	Comedy	Drama	Romance	None	None	None	None	None	None	None	True
4	Comedy	None	None	None	None	None	None	None	None	None	True
5	Action	Crime	Thriller	None	None	None	None	None	None	None	False
6	Comedy	Romance	None	None	None	None	None	None	None	None	True
7	Adventure	Children	None	None	None	None	None	None	None	None	False
8	Action	None	None	None	None	None	None	None	None	None	False
9	Action	Adventure	Thriller	None	None	None	None	None	None	None	False

Extract year from title e.g. (1995)

# str.extract accepts a regular expression; a raw string keeps the backslashes intact
movies['year'] = movies['title'].str.extract(r'.*\((.*)\).*', expand=True)

movies.tail()

	movieId	title	genres	year
27273	131254	Kein Bund für’s Leben (2007)	Comedy	2007
27274	131256	Feuer, Eis & Dosenbier (2002)	Comedy	2002
27275	131258	The Pirates (2014)	Adventure	2014
27276	131260	Rentun Ruusu (2001)	(no genres listed)	2001
27277	131262	Innocence (2014)	Adventure|Fantasy|Horror	2014

More here: http://pandas.pydata.org/pandas-docs/stable/text.html#text-string-methods
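The year extraction above captures whatever sits inside the last pair of parentheses. A stricter variant, sketched here on made-up titles, only accepts a four-digit year at the end of the title and returns NaN otherwise; this pattern is an alternative, not the one used in the notebook.

```python
import pandas as pd

# Illustrative titles, including one with an extra parenthesised part
movies = pd.DataFrame({
    'title': ['Toy Story (1995)',
              'City of Lost Children, The (Cité des enfants perdus, La) (1995)',
              'Untitled'],
    'genres': ['Adventure|Animation', 'Adventure|Drama|Fantasy', 'Comedy'],
})

# Four digits in parentheses, anchored at the end of the title
movies['year'] = movies['title'].str.extract(r'\((\d{4})\)\s*$', expand=False)

# expand=True spreads the pieces over as many columns as the longest row needs
genre_cols = movies['genres'].str.split('|', expand=True)

print(movies['year'].tolist()[:2])   # ['1995', '1995']
print(genre_cols.shape)              # (3, 3)
```

With `expand=False` and a single capture group, `str.extract` returns a Series, which is convenient for assigning to one column.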

Parsing Timestamps

Timestamps are common in sensor data or other time series datasets. Let us revisit the *tags.csv* dataset and read the timestamps!

tags = pd.read_csv('./movielens/tags.csv', sep=',')

tags.dtypes

userId       int64
movieId      int64
tag          object
timestamp    int64
dtype: object

Unix time / POSIX time / epoch time records time in seconds
since midnight Coordinated Universal Time (UTC) of January 1, 1970

tags.head(5)

	userId	movieId	tag	timestamp
0	18	4141	Mark Waters	1240597180
1	65	208	dark hero	1368150078
2	65	353	dark hero	1368150079
3	65	521	noir thriller	1368149983
4	65	592	dark hero	1368150078

# Parse the epoch seconds into datetime values
tags['parsed_time'] = pd.to_datetime(tags['timestamp'], unit='s')

Data Type datetime64[ns] maps to either <M8[ns] or >M8[ns] depending on the hardware


tags['parsed_time'].dtype

dtype('<M8[ns]')

tags.head(2)

	userId	movieId	tag	timestamp	parsed_time
0	18	4141	Mark Waters	1240597180	2009-04-24 18:19:40
1	65	208	dark hero	1368150078	2013-05-10 01:41:18

Selecting rows based on timestamps

greater_than_t = tags['parsed_time'] > '2015-02-01'

selected_rows = tags[greater_than_t]

tags.shape, selected_rows.shape

((465564, 5), (12130, 5))
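Parsing, filtering and sorting on timestamps can be tried on a handful of rows. The sketch below reuses three epoch values that appear in the tables of this section; the cut-off date `2010-01-01` is an arbitrary choice for illustration.

```python
import pandas as pd

# Three (tag, timestamp) pairs copied from the tables in this section
tags = pd.DataFrame({
    'tag': ['Mark Waters', 'dark hero', 'monty python'],
    'timestamp': [1240597180, 1368150078, 1135429210],
})

# unit='s' tells pandas the integers are seconds since 1970-01-01 UTC
tags['parsed_time'] = pd.to_datetime(tags['timestamp'], unit='s')

# A datetime column can be compared directly against a date string
recent = tags[tags['parsed_time'] > '2010-01-01']

# Sorting on the parsed column orders rows chronologically
oldest_first = tags.sort_values(by='parsed_time')

print(tags['parsed_time'].iloc[0])    # 2009-04-24 18:19:40
print(recent['tag'].tolist())         # ['dark hero']
print(oldest_first['tag'].tolist())   # ['monty python', 'Mark Waters', 'dark hero']
```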

Sorting the table using the timestamps

tags.sort_values(by='parsed_time', ascending=True)[:10]

	userId	movieId	tag	timestamp	parsed_time
333932	100371	2788	monty python	1135429210	2005-12-24 13:00:10
333927	100371	1732	coen brothers	1135429236	2005-12-24 13:00:36
333924	100371	1206	stanley kubrick	1135429248	2005-12-24 13:00:48
333923	100371	1193	jack nicholson	1135429371	2005-12-24 13:02:51
333939	100371	5004	peter sellers	1135429399	2005-12-24 13:03:19
333922	100371	47	morgan freeman	1135429412	2005-12-24 13:03:32
333921	100371	47	brad pitt	1135429412	2005-12-24 13:03:32
333936	100371	4011	brad pitt	1135429431	2005-12-24 13:03:51
333937	100371	4011	guy ritchie	1135429431	2005-12-24 13:03:51
333920	100371	32	bruce willis	1135429442	2005-12-24 13:04:02

Average Movie Ratings over Time

Are movie ratings related to the year of launch?

average_rating = ratings[['movieId','rating']].groupby('movieId', as_index=False).mean()
average_rating.tail()

	movieId	rating
26739	131254	4.0
26740	131256	4.0
26741	131258	2.5
26742	131260	3.0
26743	131262	4.0

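joined = movies.merge(average_rating, on='movieId', how='inner')
joined.head()
# numeric_only=True (pandas 1.5 or later) restricts the correlation to the numeric columns
joined.corr(numeric_only=True)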

	movieId	rating
movieId	1.000000	-0.090369
rating	-0.090369	1.000000

# With as_index=False the group key stays a regular column; otherwise it becomes the index
yearly_average = joined[['year','rating']].groupby('year', as_index=False).mean()
yearly_average[:10]

	year	rating
0	1891	3.000000
1	1893	3.375000
2	1894	3.071429
3	1895	3.125000
4	1896	3.183036
5	1898	3.850000
6	1899	3.625000
7	1900	3.166667
8	1901	5.000000
9	1902	3.738189

import matplotlib.pyplot as plt
yearly_average[-20:].plot(x='year', y='rating', figsize=(15,10), grid=True)
plt.show()
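The extracted `year` values are strings, so they group and sort as text. A possible refinement, sketched here on made-up rows, converts them to numbers first and drops rows without a usable year before averaging; the plotting step is left out so the snippet runs without a display.

```python
import pandas as pd

# Illustrative joined data; one row has no year
joined = pd.DataFrame({
    'year':   ['1994', '1994', '1995', '1995', None],
    'rating': [4.0, 3.0, 3.5, 4.5, 2.0],
})

# errors='coerce' turns values that are not numbers into NaN
joined['year'] = pd.to_numeric(joined['year'], errors='coerce')

yearly_average = (joined.dropna(subset=['year'])
                        .groupby('year', as_index=False)['rating']
                        .mean())

print(yearly_average['year'].tolist())     # [1994.0, 1995.0]
print(yearly_average['rating'].tolist())   # [3.5, 4.0]
```

With a numeric `year` column, the x-axis of the line plot is spaced by actual year rather than by row position.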

Do some years look better for box office movies than others?

Does any data point seem like an outlier in some sense?

Note: this article is a set of notes from the UCSD course Python for Data Science on edX.

后端基于ThinkPHP，前端基于jQuery和BootstrapCo.MZ 企业系统轻量级企业网站管理系统运行环境:PHP5.3+, MySQL5.0 系统预览系统下载：http://www.tecmz.com 预览地址：http://co.tecmz.com 各种设备自适应响应式的网站设计能够对用户产生友好度，并且对于
Transaction and redelivery in JMS (JMS的事务和失败消息重发机制) darrenzhu jms 事务承认 MQ acknowledge
JMS Message Delivery Reliability and Acknowledgement Patterns http://wso2.com/library/articles/2013/01/jms-message-delivery-reliability-acknowledgement-patterns/ Transaction and redelivery in
Centos添加硬盘完全教程 dcj3sjt126com linux centos hardware
Linux的硬盘识别: sda 表示第1块SCSI硬盘 hda 表示第1块IDE硬盘 scd0 表示第1个USB光驱一般使用“fdisk -l”命
yii2 restful web服务路由 dcj3sjt126com PHP yii2
路由随着资源和控制器类准备，您可以使用URL如 http://localhost/index.php?r=user/create访问资源，类似于你可以用正常的Web应用程序做法。在实践中，你通常要用美观的URL并采取有优势的HTTP动词。例如，请求POST /users意味着访问user/create动作。这可以很容易地通过配置urlManager应用程序组件来完成如下所示
MongoDB查询(4)——游标和分页[八] eksliang mongodb MongoDB游标 MongoDB深分页
转载请出自出处：http://eksliang.iteye.com/blog/2177567 一、游标数据库使用游标返回find的执行结果。客户端对游标的实现通常能够对最终结果进行有效控制，从shell中定义一个游标非常简单，就是将查询结果分配给一个变量（用var声明的变量就是局部变量），便创建了一个游标，如下所示： > var
Activity的四种启动模式和onNewIntent() gundumw100 android
Android中Activity启动模式详解　　在Android中每个界面都是一个Activity，切换界面操作其实是多个不同Activity之间的实例化操作。在Android中Activity的启动模式决定了Activity的启动运行方式。　　Android总Activity的启动模式分为四种： Activity启动模式设置： <acti
攻城狮送女友的CSS3生日蛋糕 ini html Web html5 css css3
在线预览：http://keleyi.com/keleyi/phtml/html5/29.htm 代码如下： <!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>攻城狮送女友的CSS3生日蛋糕-柯乐义<
读源码学Servlet（1）GenericServlet 源码分析 jzinfo tomcat Web servlet 网络应用网络协议
Servlet API的核心就是javax.servlet.Servlet接口，所有的Servlet 类（抽象的或者自己写的）都必须实现这个接口。在Servlet接口中定义了5个方法，其中有3个方法是由Servlet 容器在Servlet的生命周期的不同阶段来调用的特定方法。先看javax.servlet.servlet接口源码： package
JAVA进阶：VO(DTO)与PO(DAO)之间的转换 snoopy7713 java VO Hibernate po
PO即 Persistence Object　　VO即 Value Object 　VO和PO的主要区别在于：　　VO是独立的Java Object。　　PO是由Hibernate纳入其实体容器（Entity Map）的对象，它代表了与数据库中某条记录对应的Hibernate实体，PO的变化在事务提交时将反应到实际数据库中。　实际上，这个VO被用作Data Transfer
mongodb group by date 聚合查询日期统计每天数据（信息量） qiaolevip 每天进步一点点学习永无止境 mongodb 纵观千象
/* 1 */ { "_id" : ObjectId("557ac1e2153c43c320393d9d"), "msgType" : "text", "sendTime" : ISODate("2015-06-12T11:26:26.000Z")
java之18天常用的类(一) Luob. Math Date System Runtime Rundom
System类 import java.util.Properties; /** * System: * out:标准输出,默认是控制台 * in:标准输入,默认是键盘 * * 描述系统的一些信息 * 获取系统的属性信息:Properties getProperties(); * * * */ public class Sy
maven wuai maven
1、安装maven：解压缩、添加M2_HOME、添加环境变量path 2、创建maven_home文件夹，创建项目mvn_ch01,在其下面建立src、pom.xml，在src下面简历main、test、main下面建立java文件夹 3、编写类，在java文件夹下面依照类的包逐层创建文件夹，将此类放入最后一级文件夹 4、进入mvn_ch01 4.1、mvn compile ,执行后会在