MongoDB Workshop Summary
MongoDB is a source-available cross-platform document-oriented database program. Classified as a NoSQL database program, MongoDB uses JSON-like documents with optional schemas.
Normally we store data in files, files are organized in folders, and the folders live on storage, where the data is kept permanently.
When users perform I/O operations on that storage, it must be fast; otherwise the user experience suffers.
As more and more data is dumped onto the storage, I/O performance degrades and latency becomes a challenge.
Latency can be reduced by how the data is managed, planned, and organized: with the same storage and the same amount of data, I/O performance can still improve. This is known as the data model, i.e. how we model, manage, and organize the data.
Organizing data in a structured way is the SQL approach, where each database has its own data organization, and such databases solve many problems. Nowadays the way of managing data has shifted toward the NoSQL approach.
In the database world, data is stored inside tables (like files) and tables are stored inside a database (like a folder). In a table, each row is one record and each column name is a field.
In the real world we cannot always work with a strict schema, because it is not flexible. Databases with a strict schema are known as SQL databases. To address this we want a schema-less database where the user can add fields on the fly, and this is where MongoDB plays a role.
In the MongoDB world a record is known as a document and a table is known as a collection.
That is why MongoDB is known as a flexible document-oriented database.
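To make the record/document comparison concrete, here is a small sketch in plain Python (no MongoDB needed). The sample documents and field names are made up for illustration; the only point is that documents in one collection need not share the same fields.

```python
# A "collection" modeled as a plain list of dicts (illustrative only;
# the documents and field names below are invented for this sketch).
contacts = [
    {"name": "tom", "age": 30},
    {"name": "ana", "age": 25, "email": "ana@example.com"},  # extra field
    {"name": "raj", "phone": "555-0100"},                     # no age at all
]

# A SQL table would force every row into the same columns.
# Here each document carries only the fields it actually has.
all_fields = sorted({field for doc in contacts for field in doc})
print(all_fields)

# Adding a field "on the fly" touches only one document.
contacts[0]["city"] = "Pune"
print(contacts[0])
```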
MongoDB is a database server. To interact with the server (to create, retrieve, or change something) we need a database client. The server program is mongod, and the client is started with the mongo command.
Installing MongoDB on Ubuntu
sudo apt-get install mongodb -y
After installation the mongodb service should be active by default; if it is not, start the service.
Starting the mongodb service
systemctl start mongodb
Checking the status of the mongodb service
systemctl status mongodb
Connecting the mongo client to the mongod server
mongo
> show dbs is the command to retrieve all the databases
> use <database_name> is the command to switch to the database; MongoDB creates it once data is first stored in it
> db is the command that shows which database we are currently in
> db.createCollection("<collection_name>") is the command to create a collection
> show collections is the command to retrieve all the collections
> db.<collection_name>.insert({"fieldname": <value>, "fieldname": <value> ... }) is the command to insert a document into the collection
> db.<collection_name>.find() is the command to retrieve all the documents in the collection
> db.<collection_name>.find({"fieldname": <value>}) is the command to retrieve only the documents that have the given field name and value
> db.<collection_name>.drop() is the command to delete the collection from the current database
> db.dropDatabase() is the command to delete the current database
MongoDB follows the JSON format, where data takes the form of key-value pairs separated by commas. Each document is placed inside curly braces, and multiple documents are separated by commas inside square brackets.
A JSON formatter is an online tool with which we can validate the JSON syntax of our data.
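The same kind of syntax check can also be done locally. The following is a minimal sketch using Python's standard json module; the sample data and the helper name is_valid_json are assumptions made for this example.

```python
import json

# Two sample documents inside one JSON array (made-up data).
valid_text = '[{"name": "tom", "age": 30}, {"name": "ana", "age": 25}]'
# Same text, but the first document is missing its closing brace.
broken_text = '[{"name": "tom", "age": 30, {"name": "ana", "age": 25}]'


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(is_valid_json(valid_text))
print(is_valid_json(broken_text))

# Once parsed, the array becomes a list and each document a dict.
documents = json.loads(valid_text)
print(len(documents), documents[0]["name"])
```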
> db.<collection_name>.find({"fieldname": { $gt: <value> } }) is the command to retrieve all documents whose field is greater than the given value
> db.<collection_name>.find({"fieldname": { $lt: <value> } }) is the command to retrieve all documents whose field is less than the given value
where $gt and $lt are predefined operators in MongoDB
> db.<collection_name>.find({"fieldname": <value>}, {"fieldname": 1}) is the command to retrieve only selected fields. The first document is the filter; the second document lists field names, where 1 means the field is displayed and 0 means it is not.
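To show what the filter and the projection documents mean, here is an illustrative emulation in plain Python over a list of dicts. It is not how MongoDB works internally, and it covers only equality, $gt, $lt, and fields marked 1; the function names and sample data are made up. One known difference: real MongoDB also returns _id unless it is explicitly set to 0.

```python
# Illustrative emulation of find(filter, projection) on a list of dicts.
# This is NOT MongoDB's implementation; it only mirrors the semantics
# described above for equality, $gt, $lt, and an inclusion projection.

def matches(doc, query):
    """Check one document against a filter like {"age": {"$gt": 26}}."""
    for field, condition in query.items():
        value = doc.get(field)
        if isinstance(condition, dict):
            if value is None:
                return False
            if "$gt" in condition and not value > condition["$gt"]:
                return False
            if "$lt" in condition and not value < condition["$lt"]:
                return False
        elif value != condition:
            return False
    return True


def find(docs, query, projection=None):
    """Return matching documents, keeping only fields marked 1."""
    results = []
    for doc in docs:
        if not matches(doc, query):
            continue
        if projection:
            doc = {f: doc[f] for f, show in projection.items() if show and f in doc}
        results.append(doc)
    return results


people = [
    {"name": "tom", "age": 30},
    {"name": "ana", "age": 25},
    {"name": "raj", "age": 41},
]

print(find(people, {"age": {"$gt": 26}}))
print(find(people, {"age": {"$lt": 40}}, {"name": 1}))
```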
By default MongoDB assigns a unique _id to each document, which distinguishes documents even when all their other fields are identical. We can supply our own _id while inserting; if that _id already exists, MongoDB raises an error, because _id must be unique.
> db.<collection_name>.insert({ "_id": <value>, "fieldname": <value>, "fieldname": <value> ... })
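The uniqueness rule can be pictured with a toy collection in plain Python. Everything here (TinyCollection, DuplicateIdError, the UUID placeholder) is hypothetical and exists only for this sketch; MongoDB itself generates an ObjectId when no _id is given.

```python
import uuid


class DuplicateIdError(Exception):
    """Hypothetical error type for this sketch."""


class TinyCollection:
    """Toy stand-in for a collection: documents are keyed by _id."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc):
        doc = dict(doc)  # copy so the caller's dict is not modified
        # Generate an _id when none is supplied (MongoDB uses an ObjectId;
        # a random UUID string is used here only as a placeholder).
        doc.setdefault("_id", uuid.uuid4().hex)
        if doc["_id"] in self._docs:
            raise DuplicateIdError(f"duplicate _id: {doc['_id']!r}")
        self._docs[doc["_id"]] = doc
        return doc["_id"]

    def count(self):
        return len(self._docs)


users = TinyCollection()
users.insert({"_id": 1, "name": "tom"})
users.insert({"name": "tom"})  # same fields, auto-generated _id
try:
    users.insert({"_id": 1, "name": "ana"})  # _id 1 is already taken
except DuplicateIdError as err:
    print("rejected:", err)
print(users.count())
```

With pymongo, the comparable failure surfaces as pymongo.errors.DuplicateKeyError.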
> db.<collection_name>.update({ "fieldname": <value> }, { $set: { "fieldname": <value> } }) is the command to update existing data; the first document is the filter and the second says what to change
> db.<collection_name>.deleteOne({ "fieldname": <value> }) is the command to delete one entire document that matches the filter
We have three ways to interact with mongod: CLI, GUI, and API. In the real world we mostly use the API. Compass is the tool that provides a GUI for mongod.
pymongo is the Python library that lets us connect to the MongoDB server; we install it with pip.
pip install pymongo
From the pymongo library we import the MongoClient class and use it to create a client.
>>> from pymongo import MongoClient
>>> client = MongoClient("mongodb://127.0.0.1:27017")
>>> output=client["<database>"]["<collection>"].find({ "name": "tom"})
>>> for i in output:
...     print(i)
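The shell commands for insert, find, update, and delete have pymongo counterparts. The sketch below wraps insert_one, find, update_one, and delete_one in small helper functions that take a collection object; the helper names and the "name"/"age" fields are hypothetical and chosen only for illustration.

```python
# Helper functions that take a pymongo Collection object.
# The function names and the "name"/"age" fields are hypothetical.

def add_person(collection, name, age):
    """Insert one document and return its generated _id."""
    result = collection.insert_one({"name": name, "age": age})
    return result.inserted_id


def people_older_than(collection, age):
    """Return documents whose age is greater than the given value."""
    return list(collection.find({"age": {"$gt": age}}))


def rename_person(collection, old_name, new_name):
    """Update the first document matching old_name; return how many changed."""
    result = collection.update_one({"name": old_name}, {"$set": {"name": new_name}})
    return result.modified_count


def remove_person(collection, name):
    """Delete the first document matching name; return how many were deleted."""
    result = collection.delete_one({"name": name})
    return result.deleted_count
```

With a running server, these could be called with client["<database>"]["<collection>"] as the first argument, for example add_person(client["<database>"]["<collection>"], "tom", 30).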
mongoimport filename.json -d <databasename> -c <collectionname> --jsonArray is the command to upload a dataset into MongoDB
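The --jsonArray flag means the file holds all documents inside one top-level JSON array. As a rough sketch, such a file could be produced with Python's json module; the records and the file name contacts_sample.json are made up for this example.

```python
import json
import os
import tempfile

# Made-up sample records; real data would come from your own source.
records = [
    {"name": "tom", "age": 30},
    {"name": "ana", "age": 25},
]

# Write all documents as ONE top-level JSON array, which is the layout
# the --jsonArray flag tells mongoimport to expect.
path = os.path.join(tempfile.gettempdir(), "contacts_sample.json")
with open(path, "w", encoding="utf-8") as handle:
    json.dump(records, handle, indent=2)

# Read the file back to confirm it parses as a JSON array.
with open(path, encoding="utf-8") as handle:
    loaded = json.load(handle)
print(type(loaded).__name__, len(loaded))
```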
> db.<collection_name>.count() is the command to count the number of documents in the given collection
> db.<collection_name>.find().pretty() is the command to retrieve the output in a structured, indented format
> db.<collection_name>.findOne() is the command to retrieve a single document; with no filter it returns the first document in natural order, pretty-printed by default
In the real world we have huge amounts of data, and the concern is how quickly a query can find the data we need.
Before reducing the execution time of a query we have to analyze how long it takes. For that MongoDB provides the explain function. When no suitable index exists, MongoDB plans a collection scan, which examines every document.
> db.contacts.explain().find({"dob.age": { $gt: 60}})
To see the execution statistics of the query we pass "executionStats" to the explain function.
> db.contacts.explain("executionStats").find({ "dob.age": { $gt: 60}})
The above command shows that the total docs examined is 5000 and the execution time is 256 milliseconds.
With indexing we can retrieve the required data quickly from a huge dataset.
Whenever a collection is created, MongoDB creates an index on _id for it by default.
> db.contacts.getIndexes()
The _id values are unique and the index keeps them in ascending order, so if we know the _id of a document we can easily retrieve that document.
In the output, "_id" : 1 means the ids are arranged in ascending order, whereas -1 would mean descending order.
> db.contacts.createIndex({"dob.age": 1}) is the command to create the index
> db.contacts.explain("executionStats").find({ "dob.age": { $gt: 60}})
Running the command again now shows that the total docs examined is 1222 and the execution time is 65 milliseconds.
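Why an index lowers the number of documents examined can be sketched in plain Python. In this toy version a sorted list with binary search stands in for the index structure, and the 5000 sample documents are randomly generated, so the counts will not match the workshop's figures.

```python
import bisect
import random

random.seed(0)  # fixed seed so the sample data is repeatable
# Made-up collection: 5000 documents with an "age" field.
docs = [{"_id": i, "age": random.randint(20, 80)} for i in range(5000)]


def scan_older_than(docs, age):
    """Collection scan: every document is examined."""
    examined = 0
    hits = []
    for doc in docs:
        examined += 1
        if doc["age"] > age:
            hits.append(doc)
    return hits, examined


# A toy ascending "index": (age, position) pairs kept in sorted order.
age_index = sorted((doc["age"], pos) for pos, doc in enumerate(docs))
index_keys = [age for age, _ in age_index]


def index_older_than(docs, age):
    """Index lookup: binary search, then touch only matching documents."""
    start = bisect.bisect_right(index_keys, age)
    hits = [docs[pos] for _, pos in age_index[start:]]
    return hits, len(hits)


scan_hits, scan_examined = scan_older_than(docs, 60)
index_hits, index_examined = index_older_than(docs, 60)
print("scan examined:", scan_examined)
print("index examined:", index_examined)
```

Both functions return the same documents; only the amount of work differs, which is the effect explain("executionStats") makes visible.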
The aggregation pipeline is a concept in MongoDB where we write a query as a sequence of stages.
For example, suppose the requirement is: from all documents, retrieve those with gender female, then group them by location and count the documents for each location, and finally sort the results by that count. Here we have three stages.
db.contacts.aggregate([
{ $match: { gender: "female"}},
{ $group: { _id:{ state: "$location.state"}, totalfemale: {$sum: 1}}}])
The output here is the female count for each state.
db.contacts.aggregate([
{ $match: { gender: "female"}},
{ $group: { _id:{ state: "$location.state"}, totalfemale: {$sum: 1}}},
{ $sort: { totalfemale: 1 }}])
Here the data is arranged in ascending order.
db.contacts.aggregate([
{ $match: { gender: "female"}},
{ $group: { _id:{ state: "$location.state"}, totalfemale: {$sum: 1}}},
{ $sort: { totalfemale: -1 }}])
Here the data is arranged in descending order.
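What each stage does can be followed step by step in plain Python. This is only an emulation of the $match, $group, and $sort stages over a handful of made-up documents shaped like the contacts collection; it is not how MongoDB executes a pipeline.

```python
from collections import Counter

# Made-up sample documents shaped like the contacts collection above.
contacts = [
    {"gender": "female", "location": {"state": "ohio"}},
    {"gender": "male", "location": {"state": "ohio"}},
    {"gender": "female", "location": {"state": "texas"}},
    {"gender": "female", "location": {"state": "ohio"}},
    {"gender": "female", "location": {"state": "utah"}},
    {"gender": "female", "location": {"state": "texas"}},
    {"gender": "female", "location": {"state": "ohio"}},
]

# Stage 1 ($match): keep only the female contacts.
matched = [doc for doc in contacts if doc["gender"] == "female"]

# Stage 2 ($group with $sum: 1): count documents per state.
counts = Counter(doc["location"]["state"] for doc in matched)
grouped = [{"_id": {"state": state}, "totalfemale": n} for state, n in counts.items()]

# Stage 3 ($sort with -1): order by the count, largest first.
result = sorted(grouped, key=lambda row: row["totalfemale"], reverse=True)

for row in result:
    print(row)
```

Each stage consumes the output of the previous one, which is why the order of the stages in the pipeline matters.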
In the real world we use a cluster environment for high availability; we can set it up on our own servers, on AWS, or on the MongoDB cloud.