Data Science

MongoDB: A NoSQL Database

And how to use it using PyMongo.

ANKIT PRATAP SINGH


Index Of Contents
· Introduction
· How MongoDB stores data
· Why and When to use MongoDB
· Let’s talk about the CRUD operations in MongoDB (with python):
PyMongo
Create operations
Insert Operations
Query Inserted Documents
Query with Filter
Comparison Operator
String Filter
Update Data Collections
Delete or Drop
· Conclusion
· References

Introduction

MySQL and other SQL databases are relational databases, designed to store structured data in tabular format. MySQL was first released in 1995 and is now managed by Oracle. SQL databases were the dominant storage solution until the 2000s, when the internet and Web 2.0 boom started to generate large amounts of data. Much of that data was unstructured and could not easily be mapped onto a table-like schema, which created the need for a different class of database engine.

This is when NoSQL databases started arriving on the scene. MongoDB is one of them: it stores unstructured data as JSON-like documents.

MongoDB is a NoSQL database engine. It was first launched in 2009 and has since become one of the leading databases in the NoSQL space.

In this article, we will learn how to create, read, update, and delete data (the CRUD operations) in the MongoDB database engine with Python. But first, let’s dig a little deeper into how MongoDB stores data and in which cases it is more useful than MySQL databases.

How MongoDB stores data

In MySQL, data is stored in tables. Each table consists of rows and columns: the columns represent attributes, and each row represents a particular record. These tables, in turn, reside inside databases.

In MongoDB, data is stored in collections of documents, where each document is a JSON-like set of key-value pairs. A collection can consist of many documents, and there can be thousands of such collections inside a MongoDB database.
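To make the document model concrete, here is a minimal sketch of what two documents in the same collection might look like when written as Python dictionaries. The field names and values are hypothetical and chosen only for illustration.

```python
import json

# A hypothetical document: a field can hold a plain value, an array,
# or another nested document.
person = {
    "Name": "Jane Doe",
    "Role": "Analyst",
    "Skills": ["Python", "MongoDB"],
    "Address": {"City": "Pune", "Country": "India"},
}

# A second document in the same collection may carry a different set of
# fields, because a collection has no fixed schema.
another_person = {"Name": "John Roe", "Role": "Engineer"}

# Serialise to JSON text to see the key-value structure.
print(json.dumps(person, indent=2))
```

Both dictionaries could be stored side by side in one collection, which is what "no fixed schema" means in practice.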

SQL databases have a relational property: different tables are related to each other through primary keys and foreign keys. That’s why SQL databases are called relational databases. MongoDB does not enforce such relationships between its collections, and that’s why databases like it are called non-relational databases.

Why and When to use MongoDB

MongoDB is an attractive option for developers because its data storage philosophy is simple and easy to understand for anybody with programming experience. MongoDB stores data in collections with no fixed schema. This flexible approach makes it suitable for developers who may not be database experts, yet want to use a database to support the development of their applications.

MongoDB databases are commonly used in environments that need flexibility with big, unstructured data and ever-changing schemas. One of the most common use cases is in companies where large volumes of data are generated every day, which is also called big data. Other use cases include customer management systems, customer analytics, product management systems, and real-time data integration that requires a large volume of high-speed data logging and aggregation.

Let’s talk about the CRUD operations in MongoDB (with python):

Similar to MySQL, MongoDB comes with several products, viz. MongoDB Community Server, MongoDB Shell, MongoDB Compass, and MongoDB Atlas. Here is a short introduction to each of them.

  • MongoDB Community Server: MongoDB Community Server offers a flexible document data model along with support for ad-hoc queries, secondary indexing, and real-time aggregations to provide powerful ways to access and analyze your data. It’s the database engine that stores data in different data collections.
  • MongoDB Shell: The new MongoDB Shell lets you connect to MongoDB to work with your data and configure your database. With its enhanced usability features (intelligent autocomplete and syntax highlighting, easy-to-understand error messages, and contextual help), it’s the quickest way to work with MongoDB and Atlas.
  • MongoDB Compass: MongoDB Compass is a GUI for working with your data easily, built by and for MongoDB. Compass provides everything from schema analysis to index optimization to aggregation pipelines in a single, centralized interface.
  • MongoDB Atlas: MongoDB Atlas is the multi-cloud developer database platform. Using MongoDB Atlas, you can deploy your database across Microsoft Azure, AWS, and Google Cloud easily and securely.

PyMongo

PyMongo is a Python distribution containing tools for working with MongoDB, and it is the recommended way to work with MongoDB from Python. In this article, I will try to explain the CRUD operations of MongoDB from Python.

You can read the detailed documentation of PyMongo by visiting their official documentation website.

To start this journey, you first have to install PyMongo in your virtual environment. To install it, run pip install pymongo, and then import it into your Python script or notebook with import pymongo.

Once the PyMongo package is imported into your notebook, you can move on to getting a database and a data collection. Before we dive into pulling data from the database, one clarification: in MongoDB, data tables are called collections and each data row is called a document. This article uses both sets of terms, so don’t get confused by them.

To connect to a database server, import MongoClient from pymongo. Calling MongoClient() with no arguments connects to the MongoDB server running locally on the default host and port (localhost:27017).

If you want to connect to a specific server, pass the host and port in the following way:

client = MongoClient('mongodb://localhost:27017/')

Or you can pass the connection URL to connect your script with the MongoDB Atlas Cloud database. The MongoDB Atlas data cluster URL looks as follows:

mongodb://<username>:<password>@ac-ahziyw0-shard-00-00.quzykiv.mongodb.net:27017,ac-ahziyw0-shard-00-01.quzykiv.mongodb.net:27017,ac-ahziyw0-shard-00-02.quzykiv.mongodb.net:27017/?ssl=true&replicaSet=atlas-2tdw68-shard-0&authSource=admin&retryWrites=true&w=majority

In the above URL, you have to pass your username and password.
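If the username or password contains reserved characters such as @, : or /, they have to be percent-escaped before they go into the URL. The following is a minimal sketch for a single-host URL; build_mongo_uri and connect are hypothetical helper names, and the host, user, and password shown in the comments are placeholders.

```python
from urllib.parse import quote_plus


def build_mongo_uri(host="localhost", port=27017, username=None, password=None):
    """Return a mongodb:// connection string for a single host."""
    if username is None:
        return f"mongodb://{host}:{port}/"
    # Percent-escape reserved characters such as '@', ':' and '/'.
    user = quote_plus(username)
    secret = quote_plus(password or "")
    return f"mongodb://{user}:{secret}@{host}:{port}/"


def connect(uri, timeout_ms=5000):
    """Create a client for the given URI.

    PyMongo is imported here so that build_mongo_uri can be used
    without the package installed.
    """
    from pymongo import MongoClient
    return MongoClient(uri, serverSelectionTimeoutMS=timeout_ms)


# Example usage (requires a reachable MongoDB server):
# client = connect(build_mongo_uri("localhost", 27017, "my_user", "p@ss:word/1"))
# client.admin.command("ping")   # raises an error if the server cannot be reached
```

MongoClient does not contact the server when it is constructed, so the ping command is a simple way to check that the connection details are correct.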

Once you are connected to your data cluster/database, you can create a new data collection (data table) or use an existing one, insert data rows, update data rows, delete data rows, and delete data collections.

Create operations

To create a new database or new data tables, let’s practice the commands below:

# to print a list of available databases
client.list_database_names()
# to select an existing database, or to create a new one if it does not exist
db = client.db_name
# to print a list of available data tables
db.list_collection_names()
# to select an existing data table, or to create a new one if it does not exist
table = db.table_name
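Attribute access such as client.db_name works only when the name is a valid Python identifier. A minimal sketch of the equivalent dictionary-style access is shown below; get_collection is a hypothetical helper name, and client is assumed to be a connected MongoClient.

```python
def get_collection(client, db_name, collection_name):
    """Return a collection handle using dictionary-style access.

    Equivalent to client.db_name.collection_name, but it also works for
    names that are not valid Python identifiers, such as "my-db".
    """
    return client[db_name][collection_name]


# Example usage (requires a connected client):
# table = get_collection(client, "db_name", "table_name")
# Nothing exists on the server yet at this point. MongoDB creates the
# database and the collection lazily, when the first document is inserted,
# so they show up in list_database_names() / list_collection_names()
# only after that first insert.
```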

Insert Operations

Once you have created a new data table, you can insert data into it. As MongoDB stores data in JSON format, we have to provide the data in JSON/Python dictionary format. Let’s practice the following commands:

# first create a data dictionary
name = {'Name': "Ankit Pratap Singh",
        'Role': "Owner",
        'company': ""}
# to insert into table
table.insert_one(name)

Like SQL, MongoDB needs a “key: value” pair that uniquely identifies each data row, or rather each document. In SQL this is the primary key column, which many tables fill with an auto-generated ID. In MongoDB it is the _id field: if you do not provide one, MongoDB generates it automatically, and if you do provide one, MongoDB saves your value as the identifier of that document. In the cell above we inserted a document into the table without an _id, so let’s fetch it back and look at what was stored.

In the output we can see that there is an extra “key: value” pair, “_id: ObjectId(‘62d8df307118397f5930be04’)”. This is the unique identifier of the inserted document in the table. If you provide an _id yourself while inserting the data, MongoDB uses your value instead of generating one. It is also possible to store a generated id as a plain string rather than as an ObjectId instance. Have a look at the following code snippet:

import bson

# create a list of dictionaries
data_list = [{'Name': "Ankit Pratap Singh",
              'Role': "CEO",
              'Company': "XYZ",
              '_id': str(bson.ObjectId())},  # generated id, stored as a string

             {'Name': "Ankit Pratap Singh",
              'Role': "CTO",
              'Company': "ABC",
              '_id': str(bson.ObjectId())}]
# to insert multiple data rows
result = table.insert_many(data_list)

Now let’s see the output of the above-inserted documents.

Here the _id values were still generated by ObjectId, but they are stored as plain strings, so the ObjectId class does not appear in the documents.
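Both insert methods return a result object that reports the identifiers that were stored, which saves a separate query when you only need the ids. A minimal sketch follows; the two function names are hypothetical, and collection is assumed to be a PyMongo collection such as table above.

```python
def insert_one_and_report(collection, document):
    """Insert a single document and return the _id stored for it."""
    result = collection.insert_one(document)
    return result.inserted_id


def insert_many_and_report(collection, documents):
    """Insert several documents and return their _id values, in order."""
    result = collection.insert_many(documents)
    return result.inserted_ids


# Example usage (requires a connected collection):
# new_id = insert_one_and_report(table, {'Name': "Jane Doe", 'Role': "Analyst"})
# print(new_id)   # an ObjectId, because no _id was supplied
```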

Query Inserted Documents

A query is a request for information from a database. A query response usually returns data from one or more data tables within the database. Queries in MongoDB are a little different from SQL queries, but they are simple and easy to learn. You can read the official documentation by visiting the MongoDB queries webpage.

There are several ways to query data from data collections in MongoDB. Here we will focus on querying MongoDB using Python.

Let’s have a look at the following code snippets:

import pprint

# to get one row
pprint.pprint(table.find_one())
# to get all data rows
for doc in table.find():
    pprint.pprint(doc)

Here “pprint” is a module from Python’s standard library (not from PyMongo), which pretty-prints each queried document in a readable, JSON-like layout.
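find() also accepts an optional second argument, a projection, that limits which fields come back. Here is a minimal sketch, assuming documents shaped like the ones inserted earlier (with Name and Role fields); find_names_by_role is a hypothetical helper name.

```python
def find_names_by_role(collection, role):
    """Return the Name of every document whose Role equals `role`."""
    # First argument: the filter. Second argument: the projection,
    # where 1 includes a field and 0 excludes it.
    cursor = collection.find({"Role": role}, {"_id": 0, "Name": 1})
    return [doc.get("Name") for doc in cursor]


# Example usage (requires a connected collection):
# print(find_names_by_role(table, "CTO"))
```

Fetching only the fields you need keeps the printed output short and reduces the amount of data sent over the network.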

Query with Filter

Filtering data is an important part of querying a database, because we don’t want all the data every time. We need filter methods in our queries so that we fetch only the required data. To show query filters, I have created some tables and inserted some data from Excel sheets. You can find the Excel sheet that I am using in the notebook here.

MongoDB provides operators for filtering data in queries, and PyMongo passes them through to the server unchanged. You can read about them in detail on the official MongoDB page for Query and Projection Operators. There are several classes of query operators available in MongoDB; some of them are as follows:

  • Comparison Operators
  • Logical Operators
  • Evaluation Operators
  • Element Operators
  • Geospatial Operators

I will not go deep into each class, but I’ll describe a couple of them.

Comparison Operator

MongoDB provides a number of comparison operators, such as $gte (greater than or equal to), $gt (greater than), $in (matches any of the values in the given array), and $nin (matches none of the values in the given array).

Here I am writing some code to demonstrate how to apply these operators:

# in the authors table, which authors spend 10 hours or more writing per day
for author in authors_table.find({'Hrs Writing per Day': {'$gte': 10.0}}):
    pprint.pprint(author)
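Since a filter is just a Python dictionary, it can be built step by step before it is passed to find(). The sketch below builds range and membership filters for the two fields used in this article; hours_filter and country_filter are hypothetical helper names.

```python
def hours_filter(min_hours=None, max_hours=None):
    """Build a filter on 'Hrs Writing per Day' from optional bounds."""
    condition = {}
    if min_hours is not None:
        condition["$gte"] = min_hours
    if max_hours is not None:
        condition["$lte"] = max_hours
    return {"Hrs Writing per Day": condition} if condition else {}


def country_filter(countries, exclude=False):
    """Match documents whose country is (or is not) in the given list."""
    operator = "$nin" if exclude else "$in"
    return {"Country of Residence": {operator: list(countries)}}


# Example usage (requires a connected collection):
# query = {**hours_filter(min_hours=2, max_hours=10),
#          **country_filter(["India", "United States"])}
# for author in authors_table.find(query):
#     pprint.pprint(author)
```

Conditions on different fields in the same filter dictionary are combined with a logical AND, so the merged query above matches authors who satisfy both conditions.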

String Filter

Now let’s try to filter our query set with some string filters. Let’s have a look at the following code snippet:

# in the authors table, which authors are from the United States
for author in authors_table.find({'Country of Residence': 'United States'}):
    pprint.pprint(author)
# you can limit your query set as well
for author in authors_table.find({'Country of Residence': 'United States'}).limit(4):
    pprint.pprint(author)
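The filters above match the string exactly. For partial or case-insensitive matches, MongoDB offers the $regex operator. A minimal sketch follows; contains_filter is a hypothetical helper name, and re.escape is used so that the search text is treated literally rather than as a pattern.

```python
import re


def contains_filter(field, text, ignore_case=True):
    """Build a filter matching documents whose `field` contains `text`."""
    condition = {"$regex": re.escape(text)}
    if ignore_case:
        condition["$options"] = "i"   # 'i' makes the match case-insensitive
    return {field: condition}


# Example usage (requires a connected collection):
# for author in authors_table.find(contains_filter('Country of Residence', 'united')):
#     pprint.pprint(author)
```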

Update Data Collections

Updating values is the third pillar of working with any database. Similar to SQL databases, MongoDB has methods to update data collections.

There are two main types of update in a database:

  • Inserting new fields/columns into an existing data table
  • Replacing values of an existing column in a table

Sometimes we need to add one or more new data columns to an existing data table. Let’s add a new column to the authors table. To update a data collection in MongoDB, PyMongo provides two methods:

db.collection.update_one()
db.collection.update_many()

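We will update many documents at once by adding one new column. The empty filter {} matches every document in the collection. The snippet assumes that the datetime module has been imported and that IST is a timezone object for Indian Standard Time, for example datetime.timezone(datetime.timedelta(hours=5, minutes=30)).

column = authors_table.update_many({}, {'$set': {'Created': str(datetime.datetime.now(IST))}})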

The above code adds a new data column, Created, to every document in the authors table. To verify, fetch any document from the table and check that it now contains the new field.
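A self-contained way to build the same update document is sketched below. It is only one possible approach: IST is defined here with the standard library as UTC+5:30, which is an assumption about what the original code used, and created_update is a hypothetical helper name. Unlike the line above, it stores the datetime object itself rather than its string form, so MongoDB keeps it as a native date.

```python
from datetime import datetime, timedelta, timezone

# Assumption: IST means Indian Standard Time, i.e. UTC+5:30.
IST = timezone(timedelta(hours=5, minutes=30), name="IST")


def created_update(now=None):
    """Build a $set update that stamps documents with a 'Created' time."""
    if now is None:
        now = datetime.now(IST)
    return {"$set": {"Created": now}}


# Example usage (requires a connected collection):
# authors_table.update_many({}, created_update())
```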

Now let’s try to replace a value in a column of an existing data collection. We will apply the following update to the edition table:

  • Hardcover => Hardcover Book
edition_table.update_many({'Format': "Hardcover"}, {'$set': {'Format': "Hardcover Book"}})
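update_many() returns a result object that tells you how many documents matched the filter and how many were actually changed, which is a quick way to confirm that an update did what you expected. A minimal sketch, with rename_format as a hypothetical helper name:

```python
def rename_format(collection, old_value, new_value):
    """Replace one value of the Format field everywhere; return the counts."""
    result = collection.update_many(
        {"Format": old_value},               # which documents to update
        {"$set": {"Format": new_value}},     # what to change in them
    )
    return result.matched_count, result.modified_count


# Example usage (requires a connected collection):
# matched, modified = rename_format(edition_table, "Hardcover", "Hardcover Book")
# print(matched, modified)
```

Running the same update a second time should report zero matches, because no document has the old value any more.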

Delete or Drop

This is the fourth pillar of working with a database. Sometimes you need to delete records from your database: you may need to delete only some data rows, some data columns, whole tables, or the entire database.

Note: Be careful with these operations. Delete operations are sensitive: one wrong step and you can lose all of your data, so always keep a backup of your database.

There are three operations for deleting or dropping:

  • db.collection.delete_one(): deletes only one document, the first one in the collection that matches the filter.
  • db.collection.delete_many(): deletes every document that matches the filter you pass as the argument. If you pass an empty filter ({}), it deletes all the documents in the data collection.
  • db.collection.drop(): removes the entire data collection from your database.
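Because an empty filter makes delete_many() remove every document, a small guard can help avoid wiping a collection by accident. This is only a sketch of one possible precaution; safe_delete_many is a hypothetical helper, not part of PyMongo.

```python
def safe_delete_many(collection, query):
    """Delete matching documents, but refuse to run with an empty filter."""
    if not query:
        raise ValueError("Refusing to delete with an empty filter")
    result = collection.delete_many(query)
    return result.deleted_count


# Example usage (requires a connected collection):
# removed = safe_delete_many(table, {'Role': "CTO"})
# safe_delete_many(table, {})   # raises ValueError instead of deleting everything
```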

Let’s have a look at the following example:
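A minimal sketch of such a deletion, assuming the documents inserted earlier (some of which have Role set to CTO) and using hypothetical helper names, could look like this. The second function counts the remaining matches so that the result can be checked.

```python
def delete_by_role(collection, role):
    """Delete every document with the given Role; return how many were removed."""
    result = collection.delete_many({"Role": role})
    return result.deleted_count


def count_by_role(collection, role):
    """Count the documents that currently have the given Role."""
    return collection.count_documents({"Role": role})


# Example usage (requires a connected collection):
# print(delete_by_role(table, "CTO"))
# print(count_by_role(table, "CTO"))   # 0 once the deletion has succeeded
```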

And let’s verify that it worked on the table.

As you can see, there is no longer any document containing Role: CTO. Now let’s have a look at the following code snippet:

len(list(authors_table.find({'Country of Residence':"United States"})))
Output: 9
authors_table.delete_many({'Country of Residence':"United States"})
len(list(authors_table.find({'Country of Residence':"United States"})))
Output: 0

That’s done: we have deleted all the rows where the country of residence was the United States from the authors table.

You can find more examples on MongoDB's official documentation webpage.

Conclusion

That’s all about CRUD operations in MongoDB using PyMongo. Most of these operations work in much the same way in MongoDB Shell and in MongoDB Compass.

In this article we have covered:

  • Connecting to a MongoDB database
  • Creating a database
  • Creating tables/data collections in a database
  • Inserting data rows, or rather data documents
  • Querying data and printing it in the terminal or in the output cell of a Jupyter notebook
  • Querying data with filters
  • Updating tables, adding columns, and replacing values in a table
  • Deleting data rows and data tables

References

First of all, a big thanks to Aakash N S and Siddhant Ujjain for helping me throughout this journey of learning about data science. A big thanks also to my Code Hunters team, who always guide me whenever I get stuck.

Thanks for reading the article. I hope it helped you learn MongoDB and how to work with PyMongo. If you have any questions, you can reach out to me on my LinkedIn profile.
