Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
351 views
in Technique[技术] by (71.8m points)

c# - Azure Cosmos DB (SQL API) SDK bulk insert optimization and performance

There are two SQL tables without FK but have one common property which I use to join them into a list where I get 1 300 000 records. That column(property) is not unique inside both tables (can occur more than once). I need to insert those 1 300 000 records from that joined memory list inside Azure Cosmos DB using v3 SDK and using a bulk insert for that purpose. Problem is that this bulk insert lasts for more than 10-15 minutes even though I have set the throughput 50 000 or 100 000. I have tried few approaches and none of them gave me better results. The only time when I got it working was when I joined both tables with LINQ into the parent-children list then I got around 2900 documents and bulk insert was around 33 seconds but then the size of the documents even though I optimized JSON serialization (shortening strings, dates, ignoring null values, etc.) was for some cases more than 2MB. When inserting 1 300 000 documents with a throughput of 100 000(testing) I also get Service Unavailable or TooManyRequest exceptions. I am using an emulator and I am new to azure cosmos.

Creating Singleton instance:

serviceCollection.AddSingleton(x =>
        {
            var primaryConnectionString = _solutionConfiguration.AzureCosmosDb.PrimaryConnectionString;
            if (string.IsNullOrEmpty(primaryConnectionString))
            {
                throw new InvalidOperationException(
                    "Please specify a valid CosmosDBConnection in the solutionsettings.json file or your Azure Functions Settings.");
            }

            return new CosmosClientBuilder(primaryConnectionString)
            .WithSerializerOptions(new CosmosSerializationOptions() { PropertyNamingPolicy = CosmosPropertyNamingPolicy.CamelCase, IgnoreNullValues = true })
            .WithRequestTimeout(TimeSpan.FromMinutes(2))
            .WithBulkExecution(true).Build();
        });

Container creation:

public async static Task CreatContainerForDatabase(CosmosClient cosmosClient, string databaseId, string containerId, string partitionKey)
    {
        Console.WriteLine(">>>>> Create New Container >>>>>");

        var containerProperties = new ContainerProperties
        {
            Id = containerId,
            PartitionKeyPath = partitionKey
        };

        var result = await cosmosClient.GetDatabase(databaseId).CreateContainerAsync(containerProperties, 150000);


        Console.WriteLine($"New container with id: {result.Container.Id} has been created");
    }

Insert to Cosmos db

 Console.WriteLine("Bulk insert!");

        var cost = 0D;
        var errors = 0;

        var documentsToCreate = boatsWithPricesFirstYear.Count();

        var executionStarted = DateTime.Now;
        var container = _cosmosClient.GetContainer(_configuration.AzureCosmosDb.DatabaseId, _configuration.AzureCosmosDb.ContainerId);

        var tasks = new List<Task>(documentsToCreate);

        foreach (var boat in boatsWithPricesFirstYear)
        {

            var result = container.CreateItemAsync(boat, new PartitionKey(boat.BoatId));

            tasks.Add(result.ContinueWith(t =>
            {
                if (t.Status == TaskStatus.RanToCompletion)
                {
                    cost += t.Result.RequestCharge;
                }
                else
                {
                    Console.WriteLine($"Error creating boat with id: {boat.BoatId} document: {t.Exception.Message}");
                    errors++;
                }
            }));

        }

        await Task.WhenAll(tasks);

        Console.WriteLine($"{documentsToCreate - errors} documents has been created at cost of: {cost:0.##} RUs in {DateTime.Now.Subtract(executionStarted)}");

        Console.ReadLine();

Also when I console log Request charge (RU/s) I get around 9 million RU/s for all (1 300 000) documents is there any way to reduce this charge when inserting items (I am aware that SQL API for writing purposes takes around 10.2 RU/s for one item if I am right). When I make this list which I am trying to insert in cosmos as a parent-children list (boat->parent, prices for every day->children) I get around 2900 documents with a request charge of 850 000 RU/s.

I have tried Microsoft example as well with stream https://docs.microsoft.com/en-us/azure/cosmos-db/tutorial-sql-api-dotnet-bulk-import but I got almost the same results.

If I lack some information I will update this post. Thanks in advance.

P.S. My local memory is also dying when bulk insert is in process.

question from:https://stackoverflow.com/questions/66061811/azure-cosmos-db-sql-api-sdk-bulk-insert-optimization-and-performance

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Answer

0 votes
by (71.8m points)
Waitting for answers

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome to OStack Knowledge Sharing Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...