MongoDB in Action
1.
The code examples are written in JavaScript, the language of the MongoDB shell,
and Ruby, a popular scripting language.
You can download the book’s source code, with some sample data, from the book’s
site at http://mongodb-book.com
2.
In a JSON document, every key and every string value needs double quotes; numeric values don't.
The MongoDB shell uses JavaScript and gets documents in JSON.
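The quoting rule can be checked in plain JavaScript. This is a minimal sketch; the document and its field names are made up for illustration.

```javascript
// Shell-style JavaScript object literal: keys may be left unquoted.
const doc = { username: "smith", age: 30 };

// Serialized as JSON, the keys and the string value are double-quoted; the number is not.
const json = JSON.stringify(doc);
console.log(json); // {"username":"smith","age":30}

// Strict JSON parsing rejects unquoted keys and single-quoted strings.
let rejected = false;
try {
  JSON.parse("{username: 'smith'}");
} catch (err) {
  rejected = err instanceof SyntaxError;
}
console.log(rejected); // true
```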
3.
Indexes in MongoDB are implemented as a B-tree data structure. B-tree indexes,
also used in many relational databases, are optimized for a variety of queries, including
range scans and queries with sort clauses. But WiredTiger has support for log-structured
merge-trees (LSM) that’s expected to be available in the MongoDB 3.2 production
release.
Because MongoDB and most RDBMSs use the same data structure for their indexes,
advice for managing indexes in both of these systems is similar.
4.
MongoDB provides database replication via a topology known as a replica set. Replica
sets distribute data across two or more machines for redundancy and automate
failover in the event of server and network outages. Additionally, replication is used
to scale database reads.
5.
Journaling has been enabled by default since MongoDB v2.0. With journaling, every write
is flushed to the journal file every 100 ms. If the server is ever shut down uncleanly
(say, in a power outage), the journal will be used to ensure that MongoDB’s data files
are restored to a consistent state when you restart the server. This is the safest way to
run MongoDB.
It’s possible to run the server without journaling as a way of increasing performance
for some write loads. The downside is that the data files may be corrupted after
an unclean shutdown. As a consequence, anyone planning to disable journaling should
run with replication, preferably to a second datacenter, to increase the likelihood that
a pristine copy of the data will still exist even if there’s a failure.
6.
MongoDB was designed to make horizontal scaling manageable. It does so via a
range-based partitioning mechanism, known as sharding, which automatically manages
the distribution of data across nodes. There’s also a hash- and tag-based sharding
mechanism, but it’s just another form of the range-based sharding mechanism.
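One way to picture why hash-based sharding is a form of range-based sharding: the lookup is the same range lookup, applied to the hashed key instead of the key itself. The chunk table, shard names, and toy hash below are assumptions for illustration, not MongoDB's actual hash function or metadata.

```javascript
// Hypothetical chunk table: each chunk owns a half-open key range [min, max).
const chunks = [
  { min: -Infinity, max: 100, shard: "shardA" },
  { min: 100, max: 200, shard: "shardB" },
  { min: 200, max: Infinity, shard: "shardC" },
];

// Range lookup: find the chunk whose range contains the key.
function findShard(key) {
  return chunks.find((c) => key >= c.min && key < c.max).shard;
}

// Toy hash for illustration only; it always returns a value in [0, 300).
function toyHash(value) {
  let h = 0;
  for (const ch of String(value)) {
    h = (h * 31 + ch.charCodeAt(0)) % 300;
  }
  return h;
}

// Range-based: the key itself is looked up.
const rangeShard = findShard(150);
// Hash-based: the same range lookup, applied to the hashed key.
const hashedShard = findShard(toyHash("smith"));
console.log(rangeShard, hashedShard);
```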
MongoDB v1.0 was released in November 2009.
(shard is NA in TD right now)
7.
MongoDB is bundled with several command-line utilities:
■ mongodump and mongorestore: Standard utilities for backing up and restoring
a database. mongodump saves the database’s data in its native BSON format and
thus is best used for backups only.
■ mongoexport and mongoimport: Export and import JSON, CSV, and TSV data;
this is useful if you need your data in widely supported formats. mongoimport
can also be good for initial imports of large data sets.
■ mongotop: Similar to top, this utility polls MongoDB and shows the amount of
time it spends reading and writing data in each collection.
8.
All collections in a database are grouped in the
same files, so it makes sense, from a memory perspective, to keep related collections
in the same database.
9.
db.users.update({username: "smith"}, {$set: {country: "Canada"}}) // add or set fields
db.users.update({username: "smith"}, {country: "Canada"}) // replace the entire document
db.users.update({username: "smith"}, {$unset: {country: 1}}) // remove the country field
db.users.update( {username: "jones"},
... {
... $set: {
... favorites: {
... movies: ["Casablanca", "Rocky"]
... }
... }
... })
db.users.find({"favorites.movies": "Casablanca"})
db.numbers.find( {num: {"$gt": 19995 }} )
db.numbers.find( {num: {"$gt": 20, "$lt": 25 }} )
Others include $gte for greater than or equal to, $lte for less than or equal to, and $ne for not equal to.
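As a rough mental model of the comparison operators, each one is a predicate on a field value, and a condition with several operators matches only when all of them hold. The sketch below is plain JavaScript for illustration, not how the server evaluates queries.

```javascript
// Illustrative predicates for the comparison operators; not MongoDB's implementation.
const comparators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
  $ne: (value, bound) => value !== bound,
};

// A value matches when it satisfies every operator in the condition.
function matches(value, condition) {
  return Object.entries(condition).every(([op, bound]) => comparators[op](value, bound));
}

const nums = [18, 20, 21, 24, 25, 30];
const inRange = nums.filter((n) => matches(n, { $gt: 20, $lt: 25 }));
console.log(inRange); // [ 21, 24 ]
```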
To exclude specific fields from the results, add those fields to the projection with
a value of 0:
db.users.find({}, {'addresses': 0, 'payment_methods': 0})
10.
To add an item to an array, use $push or $addToSet. Both operators add
an item to an array, but the second does so uniquely, preventing a duplicate addition.
db.users.update( {"favorites.movies": "Casablanca"},
... {$addToSet: {"favorites.movies": "The Maltese Falcon"} },
... false,
... true )
The third argument, false, controls whether an upsert is allowed.
The fourth argument, true, indicates that this is a multi-update.
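The difference between the two operators can be sketched on a plain array. The helper functions here are hypothetical stand-ins that mimic the semantics described above; they are not part of any MongoDB API.

```javascript
// Mimics $push: always appends, even if the item is already present.
function push(array, item) {
  return [...array, item];
}

// Mimics $addToSet: appends only when the item is not already present.
function addToSet(array, item) {
  return array.includes(item) ? array : [...array, item];
}

const movies = ["Casablanca", "Rocky"];

const pushed = push(movies, "Rocky");
console.log(pushed.length); // 3 (duplicate "Rocky" was added)

const added = addToSet(movies, "Rocky");
console.log(added.length); // 2 (duplicate was skipped)

const addedNew = addToSet(movies, "The Maltese Falcon");
console.log(addedNew.length); // 3 (new title was added)
```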
11.
The remove() operation doesn’t actually delete the collection; it merely
removes documents from a collection. You can think of it as being analogous to SQL’s
DELETE command.
db.foo.remove({}) // remove all documents; newer shells require a query argument
db.users.remove({"favorites.cities": "Cheyenne"})
If your intent is to delete the collection along with all of its indexes, use the drop()
method:
> db.users.drop()
> help
$ mongo --help
12.
db.numbers.find({num: {"$gt": 19995}}).explain("executionStats")
//docsExamined, nReturned, totalKeysExamined, db.numbers.getIndexes()
db.numbers.createIndex({num: 1})
db.numbers.getIndexes()
db.stats() //same with db.runCommand( {dbstats: 1} )
db.numbers.stats() //same with db.runCommand( {collstats: "numbers"} )
The getIndexes() JavaScript method can be replaced by the
db.runCommand( {"listIndexes": "numbers"} ) shell command.
13.
Run the methods below without parentheses to see their internal implementation:
> db.runCommand
> db.users.find
> db.numbers.save
14.
Type part of a method name and press the Tab key twice to see a list of all matching methods:
> db.numbers.get
15.
Ruby 20-minute tutorial at http://mng.bz/THR3
16.
MongoDB also allows you to expire documents from a collection after a certain
amount of time has passed. These are sometimes called time-to-live (TTL) collections,
though this functionality is actually implemented using a special kind of index. Here’s
how you would create such a TTL index:
> db.reviews.createIndex({time_field: 1}, {expireAfterSeconds: 3600})
->db.reviews.insert({time_field:new Date()});
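The expiry rule behind a TTL index can be sketched as a simple time comparison. This is an illustration of the rule only, using fixed example timestamps; the server removes expired documents with a background task, so deletion isn't instantaneous.

```javascript
// Same value as the expireAfterSeconds option in the index definition above.
const expireAfterSeconds = 3600;

// A document is eligible for removal once its indexed date is at least
// expireAfterSeconds in the past.
function isExpired(doc, now) {
  return now.getTime() - doc.time_field.getTime() >= expireAfterSeconds * 1000;
}

// Fixed timestamps so the example is deterministic.
const now = new Date(Date.UTC(2015, 0, 1, 12, 0, 0));
const fresh = { time_field: new Date(Date.UTC(2015, 0, 1, 11, 30, 0)) }; // 30 minutes old
const stale = { time_field: new Date(Date.UTC(2015, 0, 1, 10, 0, 0)) };  // 2 hours old

console.log(isExpired(fresh, now)); // false
console.log(isExpired(stale, now)); // true
```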
17.
db.system.namespaces.find();
db.system.indexes.find();
18.
All string values must be encoded as UTF-8.
BSON specifies three numeric types: double, int, and long.
The BSON datetime type is used to store temporal values. Time values are represented
using a signed 64-bit integer marking milliseconds since the Unix epoch. A negative
value marks milliseconds prior to the epoch.
If you’re creating dates in JavaScript, keep in
mind that months in JavaScript dates are 0-based.
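Both points, 0-based months and milliseconds relative to the Unix epoch, can be seen with the plain JavaScript Date type. The dates are arbitrary examples, and UTC is used so the output doesn't depend on the local time zone.

```javascript
// Month index 5 is June, not May, because months are 0-based.
const june = new Date(Date.UTC(2011, 5, 11));
console.log(june.toISOString()); // 2011-06-11T00:00:00.000Z

// getTime() returns milliseconds since the Unix epoch, the same unit the
// BSON datetime type stores as a signed 64-bit integer.
const afterEpoch = june.getTime();
console.log(afterEpoch > 0); // true

// Dates before 1970-01-01 produce negative values.
const beforeEpoch = new Date(Date.UTC(1969, 11, 31)).getTime();
console.log(beforeEpoch); // -86400000 (one day before the epoch)
```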
19.
BSON documents in MongoDB v2.0 and later are limited to 16 MB in size.
First, it’s there to prevent developers from creating ungainly data models.
The second reason for the 16 MB limit is performance-related.
MongoDB documents are also limited to a maximum nesting depth of 100.
20.
Users commonly ask what the ideal bulk insert size is, but the answer to this is
dependent on too many factors to respond concretely, and the ideal number can
range from 10 to 200. Benchmarking will be the best counsel in this case.
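When benchmarking batch sizes, a small helper that splits documents into fixed-size batches makes the size easy to vary. The helper name, the sample documents, and the batch size of 200 are assumptions for illustration; each batch is an array that could then be handed to an insert call.

```javascript
// Split an array of documents into batches of at most batchSize items.
function toBatches(docs, batchSize) {
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }
  return batches;
}

// 450 sample documents split into batches of 200.
const docs = Array.from({ length: 450 }, (_, i) => ({ num: i }));
const batches = toBatches(docs, 200);
console.log(batches.map((b) => b.length)); // [ 200, 200, 50 ]
```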
21.
The findOne method is similar to the following,
though a cursor is returned even when you apply a limit:
db.products.find({'slug': 'wheel-barrow-9092'}).limit(1)
22.
SET OPERATORS
Three query operators—$in, $all, and $nin—take a list of one or more values as
their predicate, so these are called set operators.
MongoDB’s Boolean operators include $ne, $not, $or, $and, $nor, and $exists.
db.users.find({'addresses': {$size: 3}})
23.
You can use the special $where operator to pass a JavaScript expression
to any query, as summarized here:
■ $where Execute some arbitrary JavaScript to select a document
db.reviews.find({
'$where': "function() { return this.helpful_votes > 3; }"
})
There’s also an abbreviated form for simple expressions like this one:
db.reviews.find({'$where': "this.helpful_votes > 3"})
This query works, but you’d never want to use it because you can easily express it using
other query operators. The problem is that JavaScript expressions can’t use an index,
and they incur substantial overhead because they must be evaluated within a JavaScript
interpreter context and are single-threaded. For these reasons, you should issue
JavaScript queries only when you can’t express your query using other query operators.
If you do need JavaScript, try to combine the JavaScript expression with at least
one other query operator.
24.
The aggregation framework is MongoDB’s
advanced query language, and it allows you to transform and combine data from
multiple documents to generate new information not available in any single document.
You can think of the aggregation framework as MongoDB’s equivalent to
the SQL GROUP BY clause.
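To make the GROUP BY analogy concrete, here is a sketch of a $group pipeline stage next to an in-memory loop that computes the same counts. The collection contents and field names are hypothetical, and the loop only imitates what the stage would produce.

```javascript
// Hypothetical review documents.
const reviews = [
  { product: "wheelbarrow", rating: 4 },
  { product: "wheelbarrow", rating: 5 },
  { product: "shovel", rating: 3 },
];

// A pipeline that counts reviews per product, roughly
// SELECT product, COUNT(*) FROM reviews GROUP BY product.
const pipeline = [{ $group: { _id: "$product", count: { $sum: 1 } } }];

// In-memory equivalent of that single $group stage, for illustration.
const counts = {};
for (const review of reviews) {
  counts[review.product] = (counts[review.product] || 0) + 1;
}
console.log(counts); // { wheelbarrow: 2, shovel: 1 }
```

In the shell, the pipeline array would be passed to the collection's aggregate() method.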
25.
Map-reduce was MongoDB’s first attempt at providing a flexible aggregation capability.
With map-reduce, you have the ability to use JavaScript in defining your entire
process. This provides a great deal of flexibility but generally performs much slower
than the aggregation framework.
26.
The use of targeted updates frequently means
less time spent serializing and transmitting data.
27.
The multi parameter {multi: true} is easy to
understand; it enables multi-updates causing the update to affect all documents matching
the selector—without {multi: true} an update will only affect the first matching
document.
28.
With compound
indexes, order matters.
Only one single-key index will be used to resolve a query. For queries containing
multiple keys (say, ingredient and recipe name), a compound index containing
those keys will best resolve the query.
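Why order matters can be sketched by sorting compound keys the way an ordered index would. The entries below are hypothetical (ingredient, recipe name) pairs; the point is that entries sharing the first key sit next to each other, while the second key on its own is not in sorted order.

```javascript
// Hypothetical entries of a compound index on {ingredient: 1, name: 1}.
const entries = [
  ["cashew", "marinara"],
  ["basil", "pesto"],
  ["cashew", "curry"],
  ["basil", "caprese"],
];

// Order by the first key, then by the second key within equal first keys.
const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
entries.sort((a, b) => cmp(a[0], b[0]) || cmp(a[1], b[1]));

// All "cashew" entries are contiguous, so a query on ingredient can scan one range.
const cashewPositions = entries
  .map((e, i) => (e[0] === "cashew" ? i : -1))
  .filter((i) => i >= 0);
console.log(cashewPositions); // [ 2, 3 ]

// The recipe names alone are not in sorted order, so a query on name only
// cannot be answered by scanning a single contiguous range of this index.
const names = entries.map((e) => e[1]);
const namesSorted = names.every((n, i) => i === 0 || names[i - 1] <= n);
console.log(namesSorted); // false
```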
29.
With sufficient RAM, all the data files in use will eventually be loaded into memory.
At a minimum, you need to make
sure that your indexes will fit in RAM. This is one reason why it’s important to avoid creating
any unneeded indexes.
A covering index is one where the entire query can be
satisfied from reading only the index, making queries very fast.
30.
To create a unique index, specify the unique option:
db.users.createIndex({username: 1}, {unique: true})
In a sparse index, only those documents having some value for the indexed key will
appear.
db.products.createIndex({sku: 1}, {unique: true, sparse: true})
use green
db.users.dropIndex("zip_1")
For large data sets, building an index can take hours, even days. But you can monitor
the progress of an index build from the MongoDB logs.
The index builds in two steps. In the first step, the values to be indexed are sorted.
For step two, the sorted values are inserted into the index.
In addition to examining the MongoDB log, you can check the index build progress
by running the shell’s currentOp() method.
->db.currentOp()
31.
If you’re running in production and can’t afford to halt access to the database, you
can specify that an index be built in the background. Although the index build will
still take a write lock, the job will yield to allow other readers and writers to access the
database.
db.values.createIndex({open: 1, close: 1}, {background: true})
32.
Building an index in the background may still put an unacceptable amount of load on
a production server. If this is the case, you may need to index the data offline.
When you run mongorestore, all the indexes declared
for any collections you’ve backed up will be re-created.
33.
Be careful about reindexing: the command will take out a write lock for the duration
of the rebuild, temporarily rendering your MongoDB instance unusable.
Though the requirements vary per application, it’s safe to
assume that for most apps, queries shouldn’t take much longer than 100 ms.
The
MongoDB logger has this assumption ingrained because it prints a warning whenever
any operation, including a query, takes longer than 100 ms.
34.
download http://mng.bz/ii49
unzip stocks.zip
mongorestore -d stocks dump/stocks
->
use stocks
db.values.find({"stock_symbol": "GOOG"}).sort({date: -1}).limit(1)
grep -E '[0-9]+ms' mongod.log
If 100 ms is too high a threshold, you can lower it with the --slowms server option
when you start MongoDB. If you define slow as taking longer than 50 ms, then start
mongod with --slowms 50.
For identifying slow queries, you can’t beat the built-in profiler. Profiling is disabled
by default, so let’s get started by enabling it. From the MongoDB shell, enter
the following:
use stocks
db.setProfilingLevel(2)
db.system.profile.find({millis: {$gt: 150}}).pretty();
The explain() command displays more information when used with the executionStats
option.
Pass true to the explain() method to include the list of plans the
query optimizer attempts.
35.
A storage engine provides the functions that MongoDB needs to store data.
MongoDB 3.0 comes bundled with an alternative to MMAPv1, which is WiredTiger.
36.
We highly recommend running a production MongoDB instance with both replication
and journaling, unless you’re prepared to lose data.
As another form of redundancy,
replicated nodes can also be delayed by a constant number of seconds, minutes,
or even hours behind the primary.
Because index builds are expensive, you may opt to build on a
secondary node first, swap the secondary with the existing primary, and then build
again on the new secondary.
The minimum recommended replica set configuration consists of three nodes,
because in a replica set with only two nodes you can’t have a majority in case the primary
server goes down.
In the minimal configuration, two of these three nodes serve as first-class, persistent
mongod instances. Either can act as the replica set primary, and both have a full
copy of the data. The third node in the set is an arbiter, which doesn’t replicate data
but merely acts as a kind of observer. Arbiters are lightweight mongod servers that participate
in the election of a primary but don’t replicate any of the data.
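The majority argument comes down to simple arithmetic: a primary can be elected only when more than half of the set's members can reach each other. The function names below are made up for this sketch.

```javascript
// Number of votes needed for a majority in a set with the given member count.
function majority(members) {
  return Math.floor(members / 2) + 1;
}

// A primary can be elected only if the reachable members form a majority.
function canElectPrimary(members, reachable) {
  return reachable >= majority(members);
}

// Two-node set, primary down: 1 of 2 reachable, but 2 votes are needed.
console.log(canElectPrimary(2, 1)); // false

// Three-node set (two data nodes plus an arbiter), primary down: 2 of 3 reachable.
console.log(canElectPrimary(3, 2)); // true
```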
37.
db.isMaster()
rs.status()
db.getReplicationInfo()
db.getSiblingDB("local").oplog.rs.findOne({op: "i"}) // the oplog lives in the local database; op "i" marks an insert
38.
On 64-bit systems, the default oplog size is the larger of 1 GB or 5% of free
disk space.
The default size won’t be ideal for all applications, so you can set it explicitly (in MB) at startup:
mongod --replSet myapp --oplogSize 1024
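The default sizing rule stated above is easy to work through for a given disk. This sketch applies only that rule (the larger of 1 GB or 5% of free disk space); the function name and the example disk sizes are assumptions.

```javascript
const GB = 1024 ** 3;

// Larger of 1 GB or 5% (one twentieth) of free disk space, in bytes.
function defaultOplogBytes(freeDiskBytes) {
  return Math.max(1 * GB, freeDiskBytes / 20);
}

// 10 GB free: 5% is only 0.5 GB, so the 1 GB floor applies.
console.log(defaultOplogBytes(10 * GB) / GB); // 1

// 100 GB free: 5% is 5 GB, which exceeds the floor.
console.log(defaultOplogBytes(100 * GB) / GB); // 5
```

For comparison, --oplogSize 1024 in the command above requests 1024 MB, the same as the 1 GB floor.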
39.
Operations on a single document are always atomic with MongoDB
databases, but operations that involve multiple documents aren’t atomic as a whole.
A replica set can consist of up to 50 nodes in MongoDB v3.0.
40.
Sharding is the process of partitioning a large dataset into smaller, more manageable
pieces.
It’s a complex system that adds administrative and performance
overhead, so make absolutely sure it’s what your application needs.
41.
Once you mount your fast filesystem, you can achieve another performance gain by
disabling updates to files’ last access time: atime. Normally, the operating system will
update a file’s atime every time the file is read or written. In a database environment,
this amounts to a lot of unnecessary work.
42.
You can check the current hard limit on open file descriptors with the
ulimit command:
ulimit -Hn
43.
When journaling is enabled, MongoDB will commit all writes to a journal
before writing to the core data files. This allows the MongoDB server to come back
online quickly and cleanly in the event of an unclean shutdown.
44.
Global server statistics: db.serverStatus()
Stats for currently running operation: db.currentOp()
Include stats for idle system operations: db.currentOp(true)
Per database counters and activity stats: db.runCommand({top:1})
Memory and disk usage statistics: db.stats()
45.
■ mongostat: Global system statistics
■ mongotop: Global operation statistics
■ mongosniff (advanced): Dump MongoDB network traffic
■ bsondump: Display BSON files as JSON
46.
Three general strategies for backing up a MongoDB database are as follows:
■ Using mongodump and mongorestore
■ Copying the raw data files
■ Using MMS Backups
47.
There are two ways to import and export data with MongoDB:
■ Use the included tools, mongoimport and mongoexport.
■ Write a simple program using one of the drivers.
48.
You can use mongoimport to import JSON, CSV, and TSV files. This is
frequently useful for loading data from relational databases into MongoDB:
$ mongoimport -d stocks -c values --type csv --headerline stocks.csv
The --headerline flag indicates that the first line of the CSV
contains the field names. You can see all the import options by running mongoimport
--help.
49.
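Use mongoexport to export all of a collection’s data to a JSON or CSV file. The output is JSON
unless you pass --type csv together with a field list, regardless of the file extension:
$ mongoexport -d stocks -c values -o stocks.csv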
50.
--rest—This flag enables a simple REST interface that enhances the server’s
default web console. The web console is always available 1000 port numbers
above the port the server listens on. Thus if the server is listening at localhost
on port 27017, then the web console will be available at http://localhost:28017.
Spend some time exploring the web console and the commands