
MongoDB Java Driver Batch Insert in MySQL: A Comprehensive Tutorial



A document can be either a Clojure map (in the majority of cases, it is) or an instance of com.mongodb.DBObject (referred to later as DBObject). If your application obtains DBObjects from other libraries, you can insert those as well:
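The Monger listing itself is missing from this page. Since the rest of this article's examples are Java, here is a rough equivalent using the MongoDB Java driver directly (the connection string, database, and collection names are placeholders):

```java
import com.mongodb.BasicDBObject;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;

public class InsertExample {
    public static void main(String[] args) {
        MongoDatabase db = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("monger-test");

        // The usual case: a plain Document, the driver-side analogue of a Clojure map.
        db.getCollection("documents").insertOne(new Document("language", "Clojure"));

        // A DBObject obtained from another library can be inserted too.
        MongoCollection<BasicDBObject> legacy = db.getCollection("documents", BasicDBObject.class);
        legacy.insertOne(new BasicDBObject("language", "Java"));
    }
}
```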


If you insert a document without the :_id key, the MongoDB Java driver that Monger uses under the hood will generate one for you. Unfortunately, it does so by mutating the document you pass to it. With Clojure's immutable data structures, that will not work the way the MongoDB Java driver authors expected.
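A minimal Java sketch of the mutation the paragraph describes (names are illustrative): the driver fills in the missing _id by modifying the very object you handed it.

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

public class IdMutationDemo {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("monger-test").getCollection("documents");

        Document doc = new Document("name", "alpha"); // no _id yet
        System.out.println(doc.get("_id"));           // prints null

        collection.insertOne(doc);                    // the driver generates an _id...
        System.out.println(doc.get("_id"));           // ...by mutating doc in place
    }
}
```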







Sometimes you need to insert a batch of documents all at once, and you need it done efficiently. MongoDB supports batch inserts for exactly this. To do it with Monger, use the monger.collection/insert-batch function:
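Again, the Clojure listing is missing here; the equivalent operation with the plain MongoDB Java driver is insertMany, sketched below with placeholder names:

```java
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.bson.Document;

import java.util.ArrayList;
import java.util.List;

public class BatchInsertDemo {
    public static void main(String[] args) {
        MongoCollection<Document> collection = MongoClients.create("mongodb://localhost:27017")
                .getDatabase("monger-test").getCollection("documents");

        List<Document> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            batch.add(new Document("n", i));
        }
        collection.insertMany(batch); // one round trip instead of 1000 single inserts
    }
}
```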


Spring Initializr creates a simple class for the application. The following listing shows the class that Initializr created for this example (in src/main/java/com/example/accessingdatamongodb/AccessingDataMongodbApplication.java):
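The listing is missing from this page; the class that Spring Initializr generates for this guide is the standard Boot entry point:

```java
package com.example.accessingdatamongodb;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class AccessingDataMongodbApplication {

    public static void main(String[] args) {
        SpringApplication.run(AccessingDataMongodbApplication.class, args);
    }
}
```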


Now you need to modify the simple class that the Initializr created for you. You need to set up some data and use it to generate output. The following listing shows the finished AccessingDataMongodbApplication class (in src/main/java/com/example/accessingdatamongodb/AccessingDataMongodbApplication.java):
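The finished listing is also missing here. Based on the Spring guide this passage comes from, it looks roughly like the following; the Customer document class and the CustomerRepository interface are defined elsewhere in that guide:

```java
package com.example.accessingdatamongodb;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class AccessingDataMongodbApplication implements CommandLineRunner {

    @Autowired
    private CustomerRepository repository;

    public static void main(String[] args) {
        SpringApplication.run(AccessingDataMongodbApplication.class, args);
    }

    @Override
    public void run(String... args) throws Exception {
        repository.deleteAll();

        // set up some data
        repository.save(new Customer("Alice", "Smith"));
        repository.save(new Customer("Bob", "Smith"));

        // use it to generate output
        for (Customer customer : repository.findAll()) {
            System.out.println(customer);
        }
        System.out.println(repository.findByFirstName("Alice"));
    }
}
```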


Optimistic locking assumes a low likelihood of record contention. It typically means inserting a timestamp column in each database table that is used concurrently by both batch and online processing. When an application fetches a row for processing, it also fetches the timestamp. As the application then tries to update the processed row, the update uses the original timestamp in the WHERE clause. If the timestamp matches, the data and the timestamp are updated. If the timestamp does not match, this indicates that another application has updated the same row between the fetch and the update attempt. Therefore, the update cannot be performed.
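A minimal JDBC sketch of this pattern; the ORDERS table and its STATUS and LAST_UPDATED columns are illustrative, not from the source:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;

public class OptimisticLockExample {

    // Succeeds only if the row still carries the timestamp read at fetch time.
    static boolean update(Connection conn, long id, String status, Timestamp fetched)
            throws SQLException {
        String sql = "UPDATE ORDERS SET STATUS = ?, LAST_UPDATED = CURRENT_TIMESTAMP "
                   + "WHERE ID = ? AND LAST_UPDATED = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, status);
            ps.setLong(2, id);
            ps.setTimestamp(3, fetched);
            return ps.executeUpdate() == 1; // 0 rows: another process updated the row first
        }
    }
}
```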


This scheme involves the addition of a hash column (key or index) to the database tables used to retrieve the driver record. This hash column has an indicator to determine which instance of the batch application processes this particular row. For example, if there are three batch instances to be started, an indicator of 'A' marks a row for processing by instance 1, an indicator of 'B' marks a row for processing by instance 2, and an indicator of 'C' marks a row for processing by instance 3.
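A hedged sketch of how each instance would then claim its own rows; DRIVER_TABLE and PROCESS_INDICATOR are hypothetical names for illustration:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class PartitionedReader {

    // Each batch instance passes its own indicator ('A', 'B', or 'C') and
    // only sees the rows marked for it.
    static void readRowsFor(Connection conn, String indicator) throws SQLException {
        String sql = "SELECT ID FROM DRIVER_TABLE WHERE PROCESS_INDICATOR = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, indicator);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    System.out.println("instance " + indicator + " processes row " + rs.getLong("ID"));
                }
            }
        }
    }
}
```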


ItemWriter is similar in functionality to an ItemReader, but with inverse operations. Resources still need to be located, opened, and closed, but they differ in that an ItemWriter writes out rather than reading in. In the case of databases or queues, these operations may be inserts, updates, or sends. The format of the serialization of the output is specific to each batch job.
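A minimal custom ItemWriter sketch. Customer is a placeholder domain type; note that Spring Batch 5 passes a Chunk, while the 4.x signature takes a List instead:

```java
import org.springframework.batch.item.Chunk;
import org.springframework.batch.item.ItemWriter;

public class ConsoleCustomerWriter implements ItemWriter<Customer> {

    @Override
    public void write(Chunk<? extends Customer> chunk) {
        for (Customer customer : chunk) {
            // In a real job this would be an insert, update, or send.
            System.out.println("writing " + customer);
        }
    }
}
```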


For example, consider a batch job that reads a file containing three different types of records: records to insert, records to update, and records to delete. If record deletion is not supported by the system, we would not want to send any deletable records to the ItemWriter. However, since these records are not actually bad records, we would want to filter them out rather than skip them. As a result, the ItemWriter would receive only insertable and updatable records.
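In Spring Batch, filtering is done by returning null from an ItemProcessor; the item then never reaches the ItemWriter and is counted as filtered rather than skipped. FileRecord and getOperation() below are hypothetical names for illustration:

```java
import org.springframework.batch.item.ItemProcessor;

public class DeleteFilteringProcessor implements ItemProcessor<FileRecord, FileRecord> {

    @Override
    public FileRecord process(FileRecord record) {
        if ("DELETE".equals(record.getOperation())) {
            return null; // filter deletable records out
        }
        return record;   // pass inserts and updates through to the writer
    }
}
```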


Consider an example of a batch job that reads from the database and writes to a flat file. The test method begins by setting up the database with test data. It clears the CUSTOMER table and then inserts 10 new records. The test then launches the Job by using the launchJob() method. The launchJob() method is provided by the JobLauncherTestUtils class. The JobLauncherTestUtils class also provides the launchJob(JobParameters) method, which lets the test give particular parameters. The launchJob() method returns the JobExecution object, which is useful for asserting particular information about the Job run. In the following case, the test verifies that the Job ended with a status of COMPLETED.
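A sketch of such a test; the CUSTOMER columns are illustrative, and the test-context annotations (such as @SpringBatchTest with a Boot test context) that wire up the autowired fields are omitted:

```java
import static org.junit.jupiter.api.Assertions.assertEquals;

import org.junit.jupiter.api.Test;
import org.springframework.batch.core.BatchStatus;
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.test.JobLauncherTestUtils;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.jdbc.core.JdbcTemplate;

class CustomerJobTests {

    @Autowired
    private JobLauncherTestUtils jobLauncherTestUtils;

    @Autowired
    private JdbcTemplate jdbcTemplate;

    @Test
    void jobCompletes() throws Exception {
        // Set up the database: clear CUSTOMER and insert 10 records.
        jdbcTemplate.update("DELETE FROM CUSTOMER");
        for (int i = 1; i <= 10; i++) {
            jdbcTemplate.update("INSERT INTO CUSTOMER (ID, NAME) VALUES (?, ?)", i, "customer" + i);
        }

        JobExecution jobExecution = jobLauncherTestUtils.launchJob();

        assertEquals(BatchStatus.COMPLETED, jobExecution.getStatus());
    }
}
```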


BATCH_JOB_INSTANCE, BATCH_JOB_EXECUTION, and BATCH_STEP_EXECUTION each contain columns ending in _ID. These fields act as primary keys for their respective tables. However, they are not database-generated keys. Rather, they are generated by separate sequences. This is necessary because, after inserting one of the domain objects into the database, the key it is given needs to be set on the actual object so that it can be uniquely identified in Java. Newer database drivers (JDBC 3.0 and up) support this feature with database-generated keys. However, rather than require that feature, sequences are used. Each variation of the schema contains some form of the following statements:
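The statements themselves are missing from this page; in the Spring Batch schema scripts they take roughly this form (databases without native sequences emulate them with tables):

```sql
CREATE SEQUENCE BATCH_STEP_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_EXECUTION_SEQ;
CREATE SEQUENCE BATCH_JOB_SEQ;
```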


The JDBC API defines a set of interfaces and classes that all major database providers adhere to in order to allow Java developers to seamlessly connect to many relational database management systems (RDBMS). All major vendors provide their own JDBC drivers, which contain the Java classes that enable you to connect to that particular database.
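For example, with the MySQL driver on the classpath, a connection takes a single call; the URL, user, and password below are placeholders:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcConnectExample {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:mysql://localhost:3306/mydb";
        try (Connection conn = DriverManager.getConnection(url, "mysql", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("Connected: " + rs.getInt(1));
        }
    }
}
```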


Even with BulkMode.ORDERED, there can be other failures during the batch insertion, for example network blips or server crashes. This approach is therefore fine if we are bulk inserting thousands of rows (which would probably take a second or two). However, for inserting millions of records, it is best to batch the process using Spring Batch or custom batching logic. The idea is that you want to be able to resume the insertion in the event of a failure. How the failure is handled depends entirely on the type of failure that occurred.
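A hedged sketch of that idea with Spring Data MongoDB: insert in fixed-size chunks and record a checkpoint after each one, so a retry can skip completed chunks. The collection name and the saveCheckpoint() helper are hypothetical:

```java
import java.util.List;

import org.bson.Document;
import org.springframework.data.mongodb.core.BulkOperations;
import org.springframework.data.mongodb.core.MongoTemplate;

public class ResumableBulkInsert {

    static void insertInChunks(MongoTemplate mongoTemplate, List<Document> docs, int resumeFrom) {
        int chunkSize = 1000;
        for (int from = resumeFrom; from < docs.size(); from += chunkSize) {
            List<Document> chunk = docs.subList(from, Math.min(from + chunkSize, docs.size()));
            mongoTemplate.bulkOps(BulkOperations.BulkMode.ORDERED, "items")
                         .insert(chunk)
                         .execute();
            saveCheckpoint(from + chunk.size()); // a failure can resume from here
        }
    }

    static void saveCheckpoint(int offset) {
        // Persist the offset (file, table, etc.) so a retry skips completed chunks.
    }
}
```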


Here is an example of setting up the plugin to fetch data from a MySQL database. First, we place the appropriate JDBC driver library in our current path (this can be placed anywhere on your filesystem). In this example, we connect to the mydb database as the user mysql and want to input all rows in the songs table that match a specific artist. The following example demonstrates a possible Logstash configuration for this. The schedule option in this example instructs the plugin to execute this input statement on the minute, every minute.
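The configuration itself is missing from this page; following the Logstash jdbc input documentation, it would look something like this (the driver jar version and the artist parameter are illustrative):

```
input {
  jdbc {
    jdbc_driver_library => "mysql-connector-java-5.1.36-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/mydb"
    jdbc_user => "mysql"
    parameters => { "favorite_artist" => "Beethoven" }
    schedule => "* * * * *"
    statement => "SELECT * FROM songs WHERE artist = :favorite_artist"
  }
}
```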


Generally, when you have a list of objects or a large amount of data, it is not a good idea to insert or update each record individually, because doing so makes a large number of database calls. To avoid so many calls, insert or update the records in batches and commit the transaction at the end of each batch execution.
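With plain JDBC this is the addBatch()/executeBatch() pattern, sketched below against an illustrative PERSON table:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.util.List;

public class BatchInsertExample {

    // Inserts all names in one batch and commits once at the end.
    static void insertBatch(Connection conn, List<String> names) throws Exception {
        String sql = "INSERT INTO PERSON (NAME) VALUES (?)";
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String name : names) {
                ps.setString(1, name);
                ps.addBatch();     // queue the statement locally
            }
            ps.executeBatch();     // one round trip for the whole batch
            conn.commit();         // commit at the end of the batch
        } catch (Exception e) {
            conn.rollback();
            throw e;
        }
    }
}
```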


I am trying to upgrade our MongoDB 4.2.1 replica set to 4.4.2. Our Spring Boot 2.4 application with Java driver 4.1.1 uses multi-document transactions and manipulates lots of documents in the same transaction (batch writing).


Let's say I have a CSV file that contains 100 records and 22 columns. Is a batch update suitable, or is there another way to insert the data one record at a time until it finishes? I hope you understand what I mean. Sorry for the trouble.


Hi Viral, first of all, thank you for your valuable posts. Here is the problem I faced while inserting CSV values line by line into a Postgres table using Java. I have lakhs of records in my CSV file. The Postgres table headers were not an exact match for the CSV headers, so I had to parse each value and populate the table. This process takes a very long time to insert a single file into the table. Any suggestions would be greatly appreciated. Thank you.

