Software development

Cloud development overview for non-cloud developers

Andrii Biehunov
Expert Software Engineer
October 21, 2025 • 5 min read


Introduction

This article covers the basic concepts of web applications designed to run in a cloud environment. It is intended for software engineers who are not familiar with cloud-native development but work with other programming concepts and technologies. The article gives an overview of the basics through concepts that are already familiar to non-cloud developers, including mobile and desktop software engineers.

Basic concepts

Let's start with something simple. Imagine that we want to write a web application that allows users to create an account, order products, and write reviews on them. The simplest way is to have a single backend app combining UI and code. Alternatively, we may split it into a frontend and a backend that just provides an API.

Let's focus on the backend part. All communication between its components happens inside a single app, at the code level. From the executable file perspective, our app is a monolithic piece of code: a single file or package. Everything looks simple and clean: the code is split into several logical components, and each component has its own layers. The overall architecture may look as follows:

But as we develop our app, we quickly figure out that the above approach is not enough in the modern web environment. To understand what's wrong with the architecture, we need to pin down what makes web apps different from desktop or mobile apps. The points below are simple, and obvious to some (even non-web) developers, yet they are crucial for understanding the essential flaws of our app when it runs in a modern server environment.

A desktop or mobile app runs on the user's device, which means each user has their own copy of the app running independently. For web apps, the situation is the opposite: in a simplified view, the user connects to a server and uses an app instance running on that server. So, for web apps, all users share a single instance of the app. In real-world deployments it's usually not strictly a single instance because of scaling, but the key point is that at any moment the number of users is far greater than the number of app instances. Consequently, an app error or crash has an incomparably bigger user impact for web apps. When a desktop app crashes, only a single user is affected, and since the app runs on their device, they can simply restart it and continue. When a web app crashes, thousands of users may be affected. This brings us to two important requirements to consider.

  1.  Reliability and testability
     Since all the code lives in a single (physical) app, changes made to one component while developing a new feature may impact any other existing component. Hence, after implementing a single feature we have to retest the whole app. If a bug in the new code leads to a crash, the app becomes unavailable to all users. Until we figure out the crash, we have downtime during which users cannot use the app. Moreover, to prevent further crashes we have to roll back to a previous app version, and if we delivered other fixes or updates along with the new feature, we lose those improvements as well.
  2.  Scalability
     Suppose the number of users increases sharply over a short period. For our example app this may happen due to, e.g., discounts or attractive new products coming in. It quickly turns out that one running app instance is not enough: we have too many requests, and the app "times out" on requests it cannot handle. We may simply increase the number of running instances so that each instance independently handles user orders. But on closer inspection, we don't actually need to scale the whole app. The only part that needs to handle more requests is the one creating and storing orders for a particular product; the rest doesn't need to be scaled, and scaling it only results in unneeded memory growth. Yet since all the components are contained in a monolith (a single binary), we can only scale all of them at once by launching new instances.

The other thing to consider is network latency, which adds limitations that mobile or desktop apps don't have. Even though the UI layer itself runs directly in the browser (JavaScript), any heavy computation or CRUD operation requires an HTTP call. Since such network calls are relatively slow (compared to interactions between components in code), we should optimize the way we work with data and with server-side computations.

Let’s try to address the issues we described above.

Microservices

Let's take a simple step and split our app into a set of smaller apps called microservices. The diagram below illustrates the general architecture of our app rethought using microservices.

This helps us solve the problems of monolithic apps and has some additional advantages.

• Implementing a new feature (component) means adding a new service or modifying an existing one. This reduces development complexity and increases testability. If we hit a critical bug, we can simply disable that service; the other parts of the app keep working (except those that require interaction with the disabled service) and keep any other changes or fixes unrelated to the new feature.

• When we need to scale the app, we may do it for a particular component only. E.g., if the number of purchases increases, we may increase the number of running instances of the Order Service without touching the other services.

• Developers in a team can work fully independently while developing separate microservices. We’re also not limited by a single language. Each microservice may be written in a different language.

• Deployment becomes easier. We may update and deploy each microservice independently. Moreover, we can use different server/cloud environments for different microservices. Each service can use its own third-party dependency services like a database or message broker.

Besides its advantages, microservice architecture brings additional complexity that stems from the very nature of microservices: instead of a single big app, we now have multiple small applications that have to communicate with each other over the network.

In terms of desktop apps, the closest analogy is inter-process communication, or IPC. Imagine that a desktop app is split into several smaller apps running independently on our machine. Instead of calling methods of different app modules within a single binary, we now have multiple binaries. We have to design a communication protocol between them (e.g., based on the OS-native IPC API), consider the performance of that communication, and so on. There may also be several instances of a single app running at the same time on our machine, so we need a way to determine the location of each app within the host OS.

The situation described is very similar to what we have with microservices. But instead of running on a single machine, microservices run across a network, which adds even more complexity. On the other hand, we may use existing solutions, like HTTP for communication between services (which is how microservices communicate in most cases) and a RESTful API on top of it.

The key thing to understand here is that all the basic approaches described below are introduced mainly to solve the complexity resulting from splitting a single app into multiple microservices.

Locating microservices

Each microservice that calls the API of another microservice (and is then often called a client service) should know its location. In terms of calling a REST API over HTTP, the location consists of an address and a port. We could hardcode the location of the callee in the caller's configuration files or code. But the problem is that services can be instantiated, restarted, or moved independently of each other, so hardcoding is not a solution: if the callee service's location changes, the caller has to be restarted or even recompiled. Instead, we may use the Service Registry pattern.

To put it simply, a Service Registry is a separate application that holds a table mapping a service id to its location. Each service registers in the Service Registry on startup and deregisters on shutdown. When a client service needs to discover another service, it gets that service's location from the registry. So, in this model, each microservice doesn't know the concrete locations of its callee services, just their ids. Hence, if a service changes its location after a restart, the registry is updated and its client services can obtain the new location.

Service discovery using a Service registry may be done in two ways.

1. Client-side service discovery. The service gets the location of other services by directly querying the registry, then calls the discovered service's API by sending a request to that location. In this case, each service should know the location of the Service Registry, so its address and port should be fixed. (A minimal sketch of this approach follows the list below.)

2. Server-side service discovery. Service may send API call requests along with service id to a special service called Router. Router retrieves the actual location of the target service and forwards the request to it. In this case, each service should know the location of the Router.
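To make client-side discovery concrete, below is a minimal sketch in Python. It is illustrative only: the in-memory dictionary stands in for a real Service Registry application, and the service names, addresses, and the requests call are assumptions, not part of any specific framework.

import requests  # any HTTP client would do

# Hypothetical registry table: service id -> known locations.
# In a real system this table lives in a separate Service Registry app,
# is updated on service startup/shutdown, and is queried over the network.
SERVICE_REGISTRY = {
    "order-service": ["http://10.0.0.12:8081", "http://10.0.0.13:8081"],
    "review-service": ["http://10.0.0.21:8082"],
}

def discover(service_id):
    """Client-side discovery: resolve a service id to a concrete location."""
    locations = SERVICE_REGISTRY.get(service_id)
    if not locations:
        raise LookupError(f"no instances registered for {service_id}")
    return locations[0]  # a real client would load-balance across instances

def create_order(product_id, quantity):
    # The caller knows only the service id, never a hardcoded address.
    base_url = discover("order-service")
    response = requests.post(f"{base_url}/orders",
                             json={"product_id": product_id, "quantity": quantity})
    response.raise_for_status()
    return response.json()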

Communicating with microservices

So, our application consists of communicating microservices, each with its own API. The client of our microservices (e.g., a frontend or mobile app) has to use those APIs, but such usage becomes complicated even with just a few microservices. For another desktop analogy, in terms of inter-process communication, imagine a set of service apps/daemons that manage the file system. Some run constantly in the background, some are launched when needed. Instead of knowing the details of each service (its functionality and interface, its purpose, whether or not it is running), we may use a single facade daemon that exposes a consistent interface for file system management and internally knows which service to call.

Referring back to our example with the e-shop app, consider a mobile app that wants to use its API. We have five microservices, each with its own location. Remember also that the locations can change dynamically. So, our app would have to figure out which service each particular request should be sent to. Moreover, the dynamically changing locations make it almost impossible for our client mobile app to reliably determine the address and port of each service.

The solution is similar to our previous example with IPC on the desktop. We may deploy one service at a fixed, known location that accepts all requests from clients and forwards each request to the appropriate microservice. Such a pattern is called API Gateway.

Below is a diagram demonstrating how our example microservices may look using an API Gateway:

Additionally, this approach allows us to unify the communication protocol. Different services may use different protocols: some REST, some AMQP, and so on. With an API Gateway these details are hidden from the client: the client queries the Gateway using a single protocol (usually, but not necessarily, REST), and the Gateway translates those requests into whatever protocol a particular microservice uses.
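As an illustration only, here is a minimal gateway sketch in Python using Flask and requests (both are assumptions; real deployments typically rely on dedicated gateways such as NGINX, Kong, or a cloud-managed gateway). It forwards /api/<service>/... requests to whichever backend its routing table points at.

import requests
from flask import Flask, Response, request

app = Flask(__name__)

# Hypothetical routing table: URL prefix -> backend base URL.
# A real gateway would resolve these through the Service Registry.
ROUTES = {
    "orders": "http://order-service:8081",
    "reviews": "http://review-service:8082",
}

@app.route("/api/<service>/<path:path>", methods=["GET", "POST", "PUT", "DELETE"])
def proxy(service, path):
    backend = ROUTES.get(service)
    if backend is None:
        return Response("unknown service", status=404)
    # Forward the request as-is and relay the backend's response to the client.
    upstream = requests.request(
        method=request.method,
        url=f"{backend}/{path}",
        headers={k: v for k, v in request.headers if k.lower() != "host"},
        data=request.get_data(),
        timeout=5,
    )
    return Response(upstream.content, status=upstream.status_code)

if __name__ == "__main__":
    app.run(port=8080)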

Configuring microservices

When developing a desktop or mobile app, the app runs on several kinds of devices during its lifecycle. First, it runs on the local machine (either a computer or a mobile device/simulator in the case of a mobile app) of the developers working on it. Then it usually runs on a dev device to perform unit tests as part of CI/CD. After that, it's installed on a test device/machine for manual or automated testing. Finally, after the app is released, it is installed on users' machines and devices. Each type of device (local, dev, test, user) implies its own environment. For instance, a local build usually uses a dev backend API connected to a dev database. In the case of mobile apps, you may even develop on a simulator, which has its own specifics, like missing or limited system APIs. The backend for the app's test environment has a database configured very closely to the one used for the released app. So, each environment requires a separate configuration for the app, e.g., server addresses, simulator-specific settings, etc.

With a microservices-based web app, we have a similar situation. Our microservices usually run in different environments, typically dev, test, staging, and production. At a minimum, we could embed a configuration set per environment inside the app. While such an approach works for desktop/mobile apps, it is limiting for a web app, because we typically move the same app package/file from one environment to another without rebuilding it. A better approach is to externalize the configuration: we may store configuration data in a database or in external files available to our microservices, and each microservice reads its configuration on startup. An additional benefit of this approach is that when the configuration is updated, the app can pick it up on the fly, without rebuilding and/or redeploying it.
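A minimal sketch of externalized configuration in Python, assuming the environment name and the config location are provided through environment variables (the variable names and file layout here are illustrative conventions, not a standard):

import json
import os

def load_config():
    # Assumed convention (illustrative only):
    #   APP_ENV    - environment name, e.g. "dev", "test", "staging", "prod"
    #   CONFIG_DIR - directory with <env>.json files, kept outside the app package
    env = os.environ.get("APP_ENV", "dev")
    config_dir = os.environ.get("CONFIG_DIR", "/etc/myapp")
    with open(os.path.join(config_dir, f"{env}.json")) as f:
        return json.load(f)

# Read once on startup; the same package runs unchanged in every environment.
CONFIG = load_config()
DATABASE_URL = CONFIG["database_url"]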

Choosing cloud environment

We have our app developed with a microservices approach. The important thing to consider now is where to run our microservices. We should choose an environment that lets us take advantage of the microservice architecture. For cloud solutions, there are two basic types of environment: Infrastructure as a Service (IaaS) and Platform as a Service (PaaS). Both provide ready-to-use solutions and features for scalability, maintainability, and reliability that would take much effort to achieve on-premises, and each of them has advantages compared to traditional on-premises servers.

Summary

In this article, we’ve described key features of microservices architecture for the cloud-native environment. The advantages of microservices are:

- app scalability,

- reliability,

- faster and easier development,

- better testability.

To take full advantage of microservice architecture, we should use an IaaS or PaaS cloud environment.

Check related articles

Software development

How to manage an M5Stack Core2 for AWS. Part 3 – best of Micropython and C meld together

In the first part of this series, we covered Micropython usage with the UiFlow and VS Code environments. In the second one, we used C/C++ for more granular device control. This time, we'll try to run Micropython with an external library added.

Micropython with interactive console

There are two projects that enable Micropython for ESP32 to run external libraries. The first one is the M5Stack-official https://github.com/m5stack/Core2forAWS-MicroPython, and the second one is the unofficial https://github.com/mocleiri/tensorflow-micropython-examples. The common part is that you can run an interactive Micropython console on the controller over the serial port. Unfortunately, this is the only way to go: there is no IDE, and you can't upload a complex, multi-file application.

To open the console, you can simply use PuTTY and connect to the proper COM port (COM3, in my case) at 115200 baud.

To run the first project, the best way is to follow the official README documentation, but there is a bug in the code here:

https://github.com/m5stack/Core2forAWS-MicroPython/blob/master/ports/esp32/makelfs2.py#L20 .

One file is opened in ‘w’ (write) mode and another in ‘rb’ (read bytes). You need to change ‘w’ to ‘wb’ to run any example from the readme. It’s a good codebase because it’s small and M5Stack official. It contains upip, so you can include more official libraries after connecting to the Internet. You can also extend the codebase with more libraries before the build (some extra libraries are available in another official repository https://github.com/m5stack/micropython-lib ). However, TensorFlow is a complex library with multiple dependencies, so using the unofficial project is easier.

The Tensorflow Micropython Examples project offers pre-built images to download directly from GitHub. For our controller, you need the ESP32 version (not the ESP32 S3) for 16 MB memory.

Just open the GitHub Actions page https://github.com/mocleiri/tensorflow-micropython-examples/actions/workflows/build_esp32.yml , pick the newest green build and download the latest version.

Micropython and C++

Then extract the zip package and burn it using the command:

esptool.py -p /dev/ttyUSB0 -b 460800 --before default_reset --after hard_reset --chip esp32 write_flash --flash_mode dio --flash_size detect --flash_freq 40m 0x1000 bootloader/bootloader.bin 0x8000 partition_table/partition-table.bin 0x10000 micropython.bin

Ensure you have passed the USB port to WSL using usbip and allowed all users to use it with chmod (see the first part for details).

With this project, you can import the microlite library, which is exactly the one you need - TensorFlow Lite for Microcontrollers. If you want to extend this project with your own libraries, you can add those to the cloned source code and build it with the following commands:

git submodule init

git submodule update --recursive

cd micropython

git submodule update --init lib/axtls

git submodule update --init lib/berkeley-db-1.xx

cd ..

source ./micropython/tools/ci.sh && ci_esp32_setup_helper v4.3.1

source ./esp-idf/export.sh #as you can see, esp-idf is already included

pip3 install Pillow

pip3 install Wave

rm -rf ./micropython-modules/microlite/tflm

cd ./tensorflow

../micropython-modules/microlite/prepare-tflm-esp.sh

cd ../micropython

make -C mpy-cross V=1 clean all

cd ../boards/esp32/MICROLITE_SPIRAM_16M

rm -rf build

idf.py clean build

Your binaries are located under the boards/esp32/MICROLITE_SPIRAM_16M/build directory.

This way, you can add more libraries or your own program to the image, but you still need to use the interactive console to run it.

Summary

This three-part workshop aimed to utilize the M5Stack Core2 for AWS controller with an AWS IoT connection and ML included. Let's try to sum up all the possible ways to do that.

Micropython:

- UiFlow. Language: Python. Pros: low learning curve, easy UI; OTA updates. Cons: no TensorFlow available; not adequate for complex use cases.

- VS Code with vscode-m5stack-mpi plugin. Language: Python. Pros: full IDE (Visual Studio Code). Cons: no TensorFlow available.

- Tensorflow Micropython Examples. Language: Micropython with TensorFlow Lite for Microcontrollers. Pros: capability to build the project with more libraries or custom code included. Cons: necessity to run the code from the interactive Python console.

C/C++:

- VS Code with PlatformIO and FreeRTOS. Pros: all libraries available. Cons: complex code (C/C++); complex configuration.

- Arduino IDE. Pros: all libraries available; easy and powerful enough IDE. Cons: Arduino control loop limitation.

- ESP-IDF. Pros: small solution, without even a dedicated IDE (plugins for VS Code or Eclipse available). Cons: complex dependency management.

As you can see, we've tried various approaches and discovered their advantages and disadvantages. Now you can decide whether you wish to have complete control and use pure C/C++, or whether you prefer the much friendlier Python. You can choose which IDE (or at least plugin) you'd like to use, and whether you want to use OTA to update the entire firmware or only to transfer data between your devices and the cloud.

Software development

Accelerating data projects with parallel computing

Inspired by Petabyte Scale Solutions from CERN

The Large Hadron Collider (LHC) accelerator is the biggest device humankind has ever created. Handling the enormous amounts of data it produces requires one of the biggest computational infrastructures on Earth. However, it is quite easy to overwhelm even the best supercomputer with inefficient algorithms that do not correctly utilize the full power of the underlying, highly parallel hardware. In this article, I want to share insights born from my meeting with the CERN people, particularly how to validate and improve parallel computing in the data-driven world.

Struggling with data on the scale of megabytes (10^6) to gigabytes (10^9) is the bread and butter for data engineers, data scientists, and machine learning engineers. Moving forward, the terabyte (10^12) and petabyte (10^15) scale is becoming increasingly ordinary, and the chances of dealing with it in everyday data-related tasks keep growing. Although the claim "Moore's law is dead!" is quite controversial, the fact is that single-thread performance improvement has slowed down significantly since 2005, primarily due to the inability to increase the clock frequency indefinitely. The solution is parallelization, mainly through an increase in the number of logical cores available to one processing unit.

Microprocessor trends
Source: https://www.researchgate.net/figure/50-years-of-microprocessor-trend-data-6_fig1_355494155

Knowing it, the ability to properly parallelize computations is increasingly important.

In the data-driven world, we have a lot of ready-to-use, very good solutions that do most of the parallel work for us at all possible levels and expose an easy-to-use API. For example, on a large scale, Spark or Metaflow are excellent tools for distributed computing; at the other end, NumPy enables Python users to do very efficient matrix operations on the CPU, something plain Python is not good at, by integrating C, C++, and Fortran code behind a friendly snake_case API. Do you think it is worth learning how this is done behind the scenes if you have packages that do all of it for you? I honestly believe this knowledge can only help you use these tools more effectively and will allow you to work much faster and better in an unknown environment.

The LHC lies in a tunnel 27 kilometers (about 16.78 mi) in circumference, 175 meters (about 574.15 ft) under a small city built for that purpose on the France–Switzerland border. It has four main particle detectors that collect enormous amounts of data: ALICE, ATLAS, LHCb, and CMS. The LHCb detector alone collects about 40 TB of raw data every second. Many data points come in the form of images, since LHCb takes 41-megapixel photos every 25 ns. Such a huge amount of data must be compressed and filtered before permanent storage. From the initial 40 TB/s, only 10 GB/s are saved on disk: a compression ratio of 1:4000!

It was a surprise for me that about 90% of CPU usage at LHCb goes to simulation. One may wonder why they simulate the detector. One of the reasons is that a particle detector is a complicated machine, and scientists at CERN use, for example, Monte Carlo methods to understand the detector and its biases. Monte Carlo methods are well suited to massively parallel computing in physics.

View of CERN
Source: cern.org

Let us skip all the sophisticated techniques and algorithms used at CERN and focus on the aspects of parallel computing that are common regardless of the problem being solved. Let us divide the topic into four primary areas:

- SIMD,

- multitasking and multiprocessing,

- GPGPU,

- and distributed computing.

The following sections will cover each of them in detail.

SIMD

The acronym SIMD stands for Single Instruction, Multiple Data and is a type of parallel processing in Flynn's taxonomy.

Figure: Single Instruction, Multiple Data (SIMD)

In the data science world, this is often called vectorization. In practice, it means simultaneously performing the same operation on multiple data points (usually represented as a matrix). Modern CPUs and GPGPUs often have dedicated instruction sets for SIMD; examples are SSE and MMX. SIMD vector sizes have significantly increased over time.

Publishers of SIMD instruction sets often create language extensions (typically for C/C++) with intrinsic functions or special datatypes that guarantee vector code generation. A step further is abstracting them into a universal interface, e.g., std::experimental::simd from the C++ standard library. LLVM's (Low Level Virtual Machine) libcxx implements it (at least partially), allowing languages based on LLVM (e.g., Julia, Rust) to use IR (Intermediate Representation, the code representation LLVM uses internally) for implicit or explicit vectorization. For example, in Julia you can, if you are determined enough, access the LLVM IR using the @code_llvm macro and check your code for potential automatic vectorization.

In general, there are two main ways to apply vectorization to the program:

- auto-vectorization handled by compilers,

- and rewriting algorithms and data structures.

For a dev team at CERN, the second option turned out to be better since auto-vectorization did not work as expected for them. One of the CERN software engineers claimed that "vectorization is a killer for the performance." They put a lot of effort into it, and it was worth it. It is worth noting here that in data teams at CERN, Python is the language of choice, while C++ is preferred for any performance-sensitive task.

How to maximize the advantages of SIMD in everyday practice? Difficult to answer; it depends, as always. Generally, the best approach is to be aware of this effect every time you run heavy computation. In modern languages like Julia, or with the best compilers like GCC, you can in many cases rely on auto-vectorization. In Python, the best bet is the second option: using dedicated libraries like NumPy. Here you can find some examples of how to do it.

Below is a simple benchmark showing clearly that vectorization is worth the attention.

import numpy as np
from timeit import Timer

# Use numpy to create a large array of size 10**6
array = np.random.randint(1000, size=10**6)

# Method that adds elements using a for loop
def add_forloop():
    new_array = [element + 1 for element in array]

# Method that adds elements using SIMD (vectorized NumPy)
def add_vectorized():
    new_array = array + 1

# Compute execution times
computation_time_forloop = Timer(add_forloop).timeit(1)
computation_time_vectorized = Timer(add_vectorized).timeit(1)

# Print results
print(computation_time_forloop)    # gives 0.001202600
print(computation_time_vectorized) # gives 0.000236700

Multitasking and Multiprocessing

Let us start with two confusing yet important terms which are common sources of misunderstanding:

- concurrency: one CPU, many tasks,

- parallelism: many CPUs, one task.

Multitasking is about executing multiple tasks concurrently on one CPU. A scheduler is the mechanism that decides what the CPU should focus on at each moment, giving the impression that multiple tasks are happening simultaneously. Schedulers can work in two modes:

- preemptive,

- and cooperative.

A preemptive scheduler can halt, run, and resume the execution of a task. This happens without the knowledge or agreement of the task being controlled.

On the other hand, a cooperative scheduler lets the running task decide when to voluntarily yield control, e.g., when it is idle or blocked, allowing multiple applications to execute concurrently.

Switching context in cooperative multitasking can be cheap because parts of the context may remain on the stack and be stored in the higher levels of the memory hierarchy (e.g., the L3 cache). Additionally, code can stay close to the CPU for as long as it needs without interruption.

On the other hand, the preemptive model is good when a controlled task behaves poorly and needs to be controlled externally. This may be especially useful when working with external libraries which are out of your control.

Multiprocessing is the use of two or more CPUs within a single computer system. It comes in two types:

- Asymmetric: not all processors are treated equally; only a master processor runs the tasks of the operating system.

- Symmetric: two or more processors are connected to a single, shared memory and have full access to all input and output devices.

I guess that symmetric multiprocessing is what many people intuitively understand as typical parallelism.

Below are some examples of how to do simple tasks using cooperative multitasking, preemptive multitasking, and multiprocessing in Python; each example notes which library is used for which purpose.

- Cooperative multitasking example:

import asyncio
import sys
import time

# Define printing loop
async def print_time():
    while True:
        print(f"hello again [{time.ctime()}]")
        await asyncio.sleep(5)

# Define stdin reader
def echo_input():
    print(input().upper())

# Main function with event loop
async def main():
    asyncio.get_event_loop().add_reader(
        sys.stdin,
        echo_input
    )
    await print_time()

# Entry point
asyncio.run(main())





Just type something and admire the uppercase response.

- Preemptive multitasking example:

import threading
import time

# Define printing loop
def print_time():
    while True:
        print(f"hello again [{time.ctime()}]")
        time.sleep(5)

# Define stdin reader
def echo_input():
    while True:
        message = input()
        print(message.upper())

# Spawn threads
threading.Thread(target=print_time).start()
threading.Thread(target=echo_input).start()

The usage is the same as in the example above. However, the program may be less predictable due to the preemptive nature of the scheduler.

- Multiprocessing example:

import time
import sys
from multiprocessing import Process

# Define printing loop
def print_time():
    while True:
        print(f"hello again [{time.ctime()}]")
        time.sleep(5)

# Define stdin reader
def echo_input():
    sys.stdin = open(0)
    while True:
        message = input()
        print(message.upper())

# Spawn processes
Process(target=print_time).start()
Process(target=echo_input).start()

Notice that we must open stdin for the echo_input process because this is an exclusive resource and needs to be locked.

In Python, it may be tempting to use multiprocessing anytime you need to accelerate computations. But processes cannot share resources directly, while threads/asyncs can: each process has its own memory and can run on a separate CPU (with a separate context), while threads/asyncs are stuck to one CPU. So, for any state shared between processes you must use synchronization primitives (e.g., mutexes/locks or shared values), which complicates the source code. No clear winner here; only trade-offs to consider.
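For illustration, here is a minimal sketch (standard library only) of sharing a counter between processes with an explicit lock; the names and numbers are arbitrary:

from multiprocessing import Process, Value, Lock

def worker(counter, lock, n):
    for _ in range(n):
        with lock:               # without the lock, increments from different processes can race
            counter.value += 1

if __name__ == "__main__":
    counter = Value("i", 0)      # an integer living in shared memory
    lock = Lock()
    procs = [Process(target=worker, args=(counter, lock, 10_000)) for _ in range(4)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    print(counter.value)         # 40000 with the lock; possibly less without it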

Although this is a complex topic, I will not cover it in more detail, as data projects rarely work with these mechanisms directly. Usually, external libraries for data manipulation and data modeling encapsulate the appropriate code. However, I believe that being aware of how contemporary software handles these topics is particularly useful knowledge that can significantly accelerate your code in unconventional situations.

You may find other meanings of the terminology used here. After all, it is not so important what you call it but rather how to choose the right solution for the problem you are solving.

GPGPU

General-purpose computing on graphics processing units (GPGPU) utilizes shaders to perform massive parallel computations in applications traditionally handled by the central processing unit.

In 2006, Nvidia invented the Compute Unified Device Architecture (CUDA), which soon dominated the machine-learning acceleration niche. CUDA is a computing platform that offers an API giving you direct access to the GPU's parallel computation elements through the execution of compute kernels.

Returning to the LHCb detector, raw data is initially processed directly on CPUs operating on detectors to reduce network load. But the whole event may be processed on GPU if the CPU is busy. So, GPUs appear early in the data processing chain.

GPGPU's importance for data modeling and processing at CERN is still growing. The most popular machine learning models they use are decision trees (boosted or not, sometimes ensembled). Since deep learning models are harder to use, they are less popular at CERN, but their importance is still rising. However, I am quite sure that scientists worldwide who work with CERN's data use the full spectrum of machine learning models.

To accelerate machine learning training and prediction with GPGPU and CUDA, you need to create a computing kernel or leave that task to the libraries' creators and use simple API instead. The choice, as always, depends on what goals you want to achieve.

For a typical machine learning task, you can use any machine learning framework that supports GPU acceleration; examples are TensorFlow, PyTorch, or cuML, whose API mirrors Sklearn's. Before you start accelerating your algorithms, ensure that the latest GPU driver and CUDA driver are installed on your computer and that the framework of choice is installed with the appropriate flag for GPU support. Once the initial setup is done, you may need to run a code snippet that switches computation from the CPU (typically the default) to the GPU. For instance, in the case of PyTorch, it may look like this:

import torch

def get_default_device():
    if torch.cuda.is_available():
        return torch.device('cuda')
    else:
        return torch.device('cpu')

device = get_default_device()
print(device)

Depending on the framework, at this point you can either proceed as usual with your model or you may need an extra step. Some frameworks require, e.g., an explicit transfer of the model to the GPU-specific version. In PyTorch, you can do it with the following code:

net = MobileNetV3()

net = net.cuda()

At this point, we usually should be able to run .fit(), .predict(), .eval(), or something similar. Looks simple, doesn't it?
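For completeness, a short illustrative continuation (assuming net and device from the snippets above; the input shape is just an example for an image model, not something prescribed by PyTorch):

import torch

# Move the model and a dummy input batch to the same device, then run inference.
net = net.to(device)
net.eval()
x = torch.randn(1, 3, 224, 224, device=device)  # example batch: one RGB image, 224x224
with torch.no_grad():
    logits = net(x)
print(logits.shape)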

Writing a computing kernel is much more challenging. However, there is nothing special about a computing kernel in this context; it is just a function that runs on the GPU.

Let's switch to Julia; it is a perfect language for learning GPU computing. You can get familiar with why I prefer Julia for some machine learning projects here. Check this article if you need a brief introduction to the Julia programming language.

The data structures used must have an appropriate layout to enable a performance boost. Computers love linear structures like vectors and matrices and hate pointers, e.g., in linked lists. So, the very first step in talking to your GPU is to present a data structure that it loves.

using CUDA

# Data structures for CPU
N = 2^20
x = fill(1.0f0, N)  # a vector filled with 1.0
y = fill(2.0f0, N)  # a vector filled with 2.0

# CPU parallel adder
function parallel_add!(y, x)
    Threads.@threads for i in eachindex(y, x)
        @inbounds y[i] += x[i]
    end
    return nothing
end

# Data structures for GPU
x_d = CUDA.fill(1.0f0, N)  # a vector stored on the GPU filled with 1.0
y_d = CUDA.fill(2.0f0, N)  # a vector stored on the GPU filled with 2.0

# GPU parallel adder
function gpu_add!(y, x)
    CUDA.@sync y .+= x
    return
end

The GPU code in this example is about 4x faster than the parallel CPU version. Look how simple it is in Julia! To be honest, it is a kernel imitation at a very high level; a more real-life example may look like this:

function gpu_add_kernel!(y, x)
    index = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    stride = gridDim().x * blockDim().x
    for i = index:stride:length(y)
        @inbounds y[i] += x[i]
    end
    return
end

The CUDA analogs of threadid and nthreads are called threadIdx and blockDim. GPUs run a limited number of threads on each streaming multiprocessor (SM). The recent NVIDIA RTX 6000 Ada Generation should have 18,176 CUDA cores (streaming processors). Imagine how fast it can be even compared to one of the best CPUs for multithreading, the AMD EPYC 7773X (128 independent threads). By the way, its 768 MB of L3 cache (3D V-Cache technology) is amazing.
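For Python users, a roughly equivalent grid-stride kernel can be written with Numba's CUDA support. This is an illustrative sketch, not part of the CERN examples; it assumes the numba package and a CUDA-capable GPU are available.

import numpy as np
from numba import cuda

@cuda.jit
def gpu_add_kernel(y, x):
    start = cuda.grid(1)       # global thread index, like (blockIdx-1)*blockDim + threadIdx above
    stride = cuda.gridsize(1)  # total number of threads in the grid
    for i in range(start, y.size, stride):
        y[i] += x[i]

N = 2**20
x_d = cuda.to_device(np.full(N, 1.0, dtype=np.float32))
y_d = cuda.to_device(np.full(N, 2.0, dtype=np.float32))

threads_per_block = 256
blocks = (N + threads_per_block - 1) // threads_per_block
gpu_add_kernel[blocks, threads_per_block](y_d, x_d)

print(y_d.copy_to_host()[:5])  # expect [3. 3. 3. 3. 3.]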

Distributed Computing

The term distributed computing, in simple words, means the interaction of computers in a network to achieve a common goal. The network elements communicate with each other by passing messages (welcome back, cooperative multitasking). Since every node in the network is usually at least a standalone virtual machine, and often separate hardware, computation can happen truly simultaneously. A master node can split the workload into independent pieces, send them to the workers, let them do their job, and concatenate the resulting pieces into the final answer.

Figure: a distributed computing model

The computer case is the symbolic borderline between the methods presented above and distributed computing. The latter must rely on network infrastructure to send messages between nodes, which is also its bottleneck. CERN uses thousands of kilometers of optical fiber to create a huge and super-fast network for that purpose. CERN's data center offers about 300,000 physical and hyper-threaded cores in a bare-metal-as-a-service model running on about seventeen thousand servers. A perfect environment for distributed computing.

Moreover, since most of the data CERN produces is public, LHC experiments are completely international (1,400 scientists, 86 universities, and 18 countries), and together they form a worldwide computing and storage grid. That enables scientists and companies to run distributed computing in many ways.

CERN inside
Source: home.cern

Although this is important, I will not cover distributed computing technologies and methods here; the topic is huge and very well covered on the internet. An excellent framework, recommended and used by one of the CERN scientists, is Spark with its Scala interface. You can solve almost every data-related task using Spark and execute the code on a cluster that distributes the computation across nodes for you.

Spark interface framework for parallel computing
Source: Databricks
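As a tiny illustration (in PySpark rather than Scala, and assuming a configured Spark environment), the cluster handles the distribution while the code only describes the transformation:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("distributed-sum").getOrCreate()

# The driver describes the computation; Spark splits the data into partitions
# and runs the map/reduce steps on the cluster's worker nodes.
rdd = spark.sparkContext.parallelize(range(10_000_000), numSlices=100)
total = rdd.map(lambda x: x * x).reduce(lambda a, b: a + b)

print(total)
spark.stop()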

OK, just one piece of advice: be aware of how much data you send to the cluster; transferring big data can wipe out all the profit from distributing the calculations and cost you a lot of money.

Another excellent tool for distributed computation in the cloud is Metaflow. I wrote two articles about Metaflow: an introduction and how to run a simple project. I encourage you to read them and try it.

Conclusions

CERN researchers have convinced me that wise parallelization is crucial to achieving complex goals in the contemporary Big Data world. I hope I managed to infect you with this belief. Happy coding!

Software development

How to manage an M5Stack Core2 for AWS. Part 2 – C/C++

The first article discussed M5Stack management based on the Micropython language. Now we need to dive much deeper into the rabbit hole. Let's try to use C and C++ only. The most important advantage of using C is the possibility of full, low-level control of all controller aspects. The most important disadvantage of using C is the necessity of full, low-level control of all controller aspects. Well… with great power comes great responsibility.

FreeRTOS

AWS FreeRTOS is a real-time operating system dedicated to the AWS cloud and resource-constrained devices. There is a lot of code to write this time, so we'll use an example directly from AWS. There is no need to burn any firmware with the burning tool; however, we still need to pass the USB port to the WSL environment using usbip, as we've done in the "Micropython" section of the first chapter.

You can download the code we're going to use from https://github.com/m5stack/Core2-for-AWS-IoT-EduKit.git. The only subdirectory we need is Blinky-Hello-World, but the repository is really small, so using a sparse checkout is pointless, and you can simply clone the entire repo.

Open VSCode and install a plugin called PlatformIO. There is a bug in PlatformIO, so you can't see any files from your WSL environment using the PlatformIO browser (Windows WSL: I can't open any files (projects) in the PIO open browser. · Issue #2316 · platformio/platformio-home · GitHub). To fix it, close VSCode, edit the ~/.platformio/packages/contrib-piohome/main.*.min.js file in Windows, replace "\\": "/" with "/": "/", and open VSCode again.

To verify the connection between PlatformIO and your controller, open PlatformIO from the very left menu and then pick "Devices" from the main left menu. You should see /dev/ttyUSB0 in the center part of the screen. Please remember to pass the USB device to WSL using usbip and to allow all users to use the port with chmod.

If everything looks good so far, you can open the Blinky-Hello-World directory (not the entire cloned repository) as a project from the PlatformIO home screen. Now you can follow the essential elements of the official instruction provided below.

You need to have AWS CLI v2 installed on your machine. If you don't, you can install it using the official manual: Installing or updating the latest version of the AWS CLI - AWS Command Line Interface (amazon.com). Now ensure you have a valid token and that you can interact with your AWS account using the CLI (I propose listing some resources as verification, e.g., aws s3 ls).

We will use the built-in script to create a Thing in AWS IoT. Just open a terminal using PlatformIO (a standard bash terminal won't work, so you need to open it from Miscellaneous -> New Terminal in the main PlatformIO menu in VSC), make sure you're in the Blinky-Hello-World directory, and run pio run -e core2foraws-device_reg -t register thing. The script will create the Thing and download the necessary certificate/key files. You can do it manually if you don't trust such scripts; however, this one is created by the AWS team, so I believe it's trustworthy.

In the AWS IoT console, go to Manage -> All devices -> Things and see the new Thing created by the script. The Thing name is autogenerated; in my case, it's 0123FAA32AD40D8501.

OK, the next step is to allow the device to connect to the Internet. There is another script to help you with this task. Call pio run --environment core2foraws --target menuconfig. You'll see a simple menu. Navigate to AWS IoT EduKit Configuration and set up WiFi SSID and WiFi Password. Be aware that your network's SSID and password will now be stored as plaintext in a few files in your code.

Let's build the application. Just call pio run --environment core2foraws from the PlatformIO terminal and then pio run --environment core2foraws --target upload --target monitor to run it on your device and monitor logs.

Now you can use the MQTT test client from the AWS IoT console to send anything to the <<thing name>>/blink topic. In my case, it's 0123FAA32AD40D8501/blink. The message payload doesn't matter for this example. Just send something to start blinking and send anything again to stop it.

As you can see, we have done a lot just to communicate between AWS Cloud and the controller. It was much simpler with Micropython and even more so with UiFlow. However, C is much more powerful, and what's most important here, we can extend it with libraries.

TensorFlow Lite for Microcontrollers

TensorFlow is an end-to-end open-source platform for machine learning. TensorFlow Lite is a library for deploying models on mobile, microcontrollers, and other edge devices. TensorFlow Lite for Microcontrollers is just a lightweight version of TensorFlow Lite designed to run machine learning models on microcontrollers and other devices with only a few kilobytes of memory. The core runtime fits in 16 KB on an Arm Cortex-M3 and can run many basic models. It doesn't require operating system support, any standard C or C++ libraries, or dynamic memory allocation.

TensorFlow Lite is not designed to work on ESP32 processors, so the only version available for the M5Stack is TensorFlow Lite for Microcontrollers. It has some limitations: it supports just a limited subset of TensorFlow operations and devices, it requires manual memory management in the low-level C++ API, and it doesn't support on-device training. Therefore, to build a "learning at the edge" solution, you need a more powerful IoT Edge device, e.g. a Raspberry Pi. But you can still run ML models on the M5Stack controller.

Now, let's try to modify our Blinky-Hello-World to add the TensorFlow Lite for Microcontrollers library.

TensorFlow Lite for Microcontrollers in FreeRTOS

The first issue to solve is where to get the TensorFlow source code from. In the main TensorFlow repository, you can find information that it has moved to a standalone one (https://github.com/tensorflow/tensorflow/tree/master/tensorflow/lite/micro), even though most of the documentation and examples still point there. The standalone repository used to contain a makefile for ESP32, but it seems to have been deleted when the examples were moved to yet another repository (https://github.com/tensorflow/tflite-micro/commit/66cfa623cbe1c1ae3fcc8a4903e9fed1a345548a). Today, the best source seems to be this repository: https://github.com/espressif/tflite-micro-esp-examples/tree/master/components.

We'll need tflite-lib, but it doesn't work without esp-nn, so you should copy both to the components directory in your Blinky-Hello-World project.

Let's modify our code, starting from including the TensorFlow headers at the beginning of the main.c file.

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

Now we can try to use it. For example, just before the void app_main() function, let's declare the TF error reporter and use it in the function.

tflite::MicroErrorReporter micro_error_reporter;
tflite::ErrorReporter* error_reporter = &micro_error_reporter;

void app_main()
{
    Core2ForAWS_Init();
    Core2ForAWS_Display_SetBrightness(80);
    ui_init();

    TF_LITE_REPORT_ERROR(error_reporter,
                         "Hello TensorFlow "
                         "This is just a test message\n");

    initialise_wifi();

    xTaskCreatePinnedToCore(&aws_iot_task, "aws_iot_task", 4096 * 2, NULL, 5, NULL, 1);
    xTaskCreatePinnedToCore(&blink_task, "blink_task", 4096 * 1, NULL, 2, &xBlink, 1);
}

Obviously, this is not actual usage of TensorFlow, but it proves the library is linked and can be used however you need.

In the main directory, you must also add the new libraries tflite-lib and esp-nn to the required components in CMakeLists.txt:

set(COMPONENT_REQUIRES "nvs_flash" "esp-aws-iot" "esp-cryptoauthlib" "core2forAWS" "tflite-lib" "esp-nn")

It looks good, but it won't work yet. During compilation using pio run --environment core2foraws, you'll find out that the entire Blinky-Hello-World is written in pure C, and the TensorFlow Lite for Microcontrollers library requires C++. The easiest way to convert it is as follows:

  1. Rename main.c to main.cc
  2. Change main.c to main.cc in the first line of main/CMakeLists.txt
  3. Create extern "C" {} section for the entire main file code except for tensorflow imports.

It should look somewhat like this:

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern "C" {

/* original main.c content goes here */

tflite::MicroErrorReporter micro_error_reporter;
tflite::ErrorReporter* error_reporter = &micro_error_reporter;

void app_main()
{
    // the main function code from the listing above
}

}

  4. In main.cc, delete the TaskHandle_t xBlink; declaration because it's already declared in another file.
  5. In platformio.ini, in the [env:core2foraws] section, add build_flags = -fpermissive to change permissive compilation errors into warnings.

Now you can build the project again. When running it with the target --monitor, you'll see the "Hello TensorFlow" message in the logs, which means the TensorFlow library is included and working correctly. Now you can do whatever you want with an out-of-the-box machine learning library and AWS integration.

Arduino

As you can see, C is much more powerful but requires much more work. Let's try to connect the same blocks (TensorFlow, AWS IoT, and the M5Stack library) using a more user-friendly environment. Arduino is an open-source electronic prototyping platform enabling users to create interactive electronic objects. Let's try to combine the official M5Stack Core2 for AWS with the Arduino IDE manual (https://docs.m5stack.com/en/quick_start/core2_for_aws/arduino) with TensorFlow Lite for Microcontrollers (https://github.com/tanakamasayuki/Arduino_TensorFlowLite_ESP32).

Hello world!

Firstly, install the Arduino IDE from the official page https://www.arduino.cc/en/software. I assume you already have the CP210x driver installed and the USB mode selected on your device. Open the IDE, go to File -> Preferences, and add the boards' management URL: https://m5stack.oss-cn-shenzhen.aliyuncs.com/resource/arduino/package_m5stack_index.json. Then open the Boards manager from the left menu and install M5Stack-Core2. Now connect the controller to the computer and choose it from the top drop-down menu.

To use the M5Stack-specific library in the code, you need to open Sketch -> Include Libraries -> Library catalog and install M5Core2. Now you can write the simple "Hello World!" code and run it with the green arrow in the IDE top menu.

#include <M5Core2.h>

void setup()
{
    M5.begin();
    M5.Lcd.print("Hello World");
}

void loop() {}

Sometimes, Arduino cannot reset the controller via the RTS pin, so you need to reboot it manually after writing new code to it. So far, so good.

TensorFlow and AWS integration

The TensorFlow-official Arduino_TensorFlowLite_ESP32 library is not designed to be used with M5Stack. Let's adapt it. Clone the library and copy the Hello World example to another directory. You can open it from the Arduino IDE now. It's a fully working example of the usage of a TensorFlow model. Let's adapt it to use the M5Core2 library. To hello_world.ino you need to add #include <M5Core2.h> at the beginning of the file and also M5.begin(); at the beginning of the void setup() function. You can also add M5.Axp.SetLed(true); after this line to turn on the small green LED and ensure the device is running.

Now, start the application. You can see the TensorFlow output in the Serial Monitor tab. Just change the baud rate to 115200 to make it human-readable.

Can we mix it with AWS IoT integration? Yes, we can. We will use the PubSubClient library by Nick O'Leary, so open the library catalog in the Arduino IDE and install it, and then let's connect to AWS IoT and MQTT.

Using the Arduino IDE, create a new file secrets.h. We need a few declarations there:

#define AWS_IOT_PUBLISH_TOPIC "m5stack/pub"
#define AWS_IOT_SUBSCRIBE_TOPIC "m5stack/sub"
#define WIFI_SSID "ThisIsMyWiFiSSID"
#define WIFI_PASSWORD "Don't use so easy passwords!"
int8_t TIME_ZONE = 2;
#define MQTT_HOST "xxxx.iot.eu-west-1.amazonaws.com"
#define THINGNAME "UiFlow_test"

static const char* ca_cert = R"KEY(-----BEGIN CERTIFICATE-----
…
-----END CERTIFICATE-----)KEY";

static const char* client_cert = R"KEY(-----BEGIN CERTIFICATE-----
…
-----END CERTIFICATE-----)KEY";

static const char* privkey = R"KEY(-----BEGIN RSA PRIVATE KEY-----
…
-----END RSA PRIVATE KEY-----)KEY";

AWS_IOT_PUBLISH_TOPIC and AWS_IOT_SUBSCRIBE_TOPIC are the test topics we're going to use in this example. WIFI_SSID and WIFI_PASSWORD are our WiFi credentials. TIME_ZONE is the time zone offset. MQTT_HOST is the public AWS IoT endpoint (the same as in the first UiFlow example). THINGNAME is the name of the Thing in AWS (I've used the same one as in the UiFlow example). client_cert and privkey you need to copy from the secrets generated when creating the Thing for the UiFlow example. ca_cert is the public key of the AWS certificate authority, so you can obtain it from the Thing creation wizard (certificate step) or from https://good.sca1a.amazontrust.com/.

Now it's time to adapt the main hello_world.ino file. We should add new imports (including our secrets.h file).

#include <WiFiClientSecure.h>
#include <PubSubClient.h>
#include "secrets.h"
#include <time.h>

Then we need a few new fields.

WiFiClientSecure net;
PubSubClient client(net);
time_t now;
time_t nowish = 1510592825;

The field nowish is just some timestamp in the past.

In the setup() function, we need to open a WiFi connection with our local network and the Internet, set up the time to check certificates, install the certificates, set up the MQTT client, and open the AWS IoT connection.

delay(3000);
WiFi.mode(WIFI_STA);
WiFi.begin(WIFI_SSID, WIFI_PASSWORD);
WiFi.waitForConnectResult();
while (WiFi.status() != WL_CONNECTED)
{
    Serial.print(".");
    delay(1000);
}
M5.Lcd.println(String("Attempting to connect to SSID: ") + String(WIFI_SSID));
M5.Lcd.println(WiFi.localIP());

M5.Lcd.print("Setting time using SNTP");
configTime(TIME_ZONE * 3600, 0 * 3600, "pool.ntp.org", "time.nist.gov");
now = time(nullptr);
while (now < nowish)
{
    delay(500);
    Serial.print(".");
    now = time(nullptr);
}
M5.Lcd.println("done!");
struct tm timeinfo;
gmtime_r(&now, &timeinfo);
M5.Lcd.print("Current time: ");
M5.Lcd.print(asctime(&timeinfo));

net.setCACert(ca_cert);
net.setCertificate(client_cert);
net.setPrivateKey(privkey);

client.setServer(MQTT_HOST, 8883);
client.setCallback(messageReceived);

M5.Lcd.println("Connecting to AWS IOT");
while (!client.connect(THINGNAME))
{
    Serial.print(".");
    delay(1000);
}
if (!client.connected()) {
    M5.Lcd.println("AWS IoT Timeout!");
    return;
}
client.subscribe(AWS_IOT_SUBSCRIBE_TOPIC);
M5.Lcd.println("AWS IoT Connected!");

This is the entire code needed to set up the application, but I propose splitting it into multiple smaller and more readable functions. As you can see, I use the Serial output for debugging.

To receive messages, we need a new function (the name matches the declaration in client.setCallback(messageReceived);):

void messageReceived(char *topic, byte *payload, unsigned int length)
{
    M5.Lcd.print("Received [");
    M5.Lcd.print(topic);
    M5.Lcd.print("]: ");
    for (int i = 0; i < length; i++)
    {
        M5.Lcd.print((char)payload[i]);
    }
    M5.Lcd.println();
}

The last thing to do is to loop the client with the entire application. To do that, just add a one-liner to the loop() function:

client.loop();

You need another one-liner to send something to AWS, but I've added two more to make it visible on the controller's display.

M5.Lcd.println("Sending message");
client.publish(AWS_IOT_PUBLISH_TOPIC, "{\"message\": \"Hello from M5Stack\"}");
M5.Lcd.println("Sent");

The communication works both ways. You can subscribe to m5stack/pub using the MQTT test client in the AWS console to read messages from the controller, and you can publish to m5stack/sub to send messages to the controller.

As you can see, using Arduino is easier than using FreeRTOS, but unfortunately, it's a little bit babyish. Now we'll try to avoid all IDEs and use the pure console only.

Espressif IoT Development Framework

Basically, there are three ways to burn software to the controller from a Linux console: Arduino, esptool.py, and ESP-IDF. When you create a new project using PlatformIO, you can pick Arduino or ESP-IDF. Now, let's try to remove the IDE from the equation and use pure bash. First of all, you need to install a few prerequisites and then download and install the library.

sudo apt install git wget flex bison gperf python3 python3-venv cmake ninja-build ccache libffi-dev libssl-dev dfu-util libusb-1.0-0
mkdir -p ~/esp
cd ~/esp
git clone --recursive https://github.com/espressif/esp-idf.git
cd ~/esp/esp-idf
./install.sh esp32
./export.sh

Please note you need to run install and export (the last two commands) whenever you open a new WSL console. With the library, you also have some examples downloaded. Run one of them to check that everything works.

cd examples/get-started/hello_world/
idf.py set-target esp32
set ESPPORT=/dev/ttyUSB0
idf.py build flash monitor

You should see output like this one.

Hello world!
This is an esp32 chip with 2 CPU core(s), WiFi/BT/BLE, silicon revision 300, 2MB external flash
Minimum free heap size: 295868 bytes
Restarting in 10 seconds...
Restarting in 9 seconds...
Restarting in 8 seconds...
Restarting in 7 seconds...
Restarting in 6 seconds...
Restarting in 5 seconds...
Restarting in 4 seconds...
Restarting in 3 seconds...
Restarting in 2 seconds...
Restarting in 1 seconds...
Restarting in 0 seconds...
Restarting now.

To stop the serial port monitor, press CTRL + ]. Be aware that the application is still running on the controller; you need to power off the device with the hardware button on the side to stop it.

If you want to use TensorFlow Lite for Microcontrollers with ESP-IDF, you need to create a new project and add the proper library. You can use the command idf.py create-project <<project_name>> to create a project. My project name is hello_tf. The script creates a pure C project; we need to rename the hello_tf.c file to hello_tf.cc. Then, we can copy the tflite-micro and esp-nn libraries from the FreeRTOS example and place them in the components directory. The main/CMakeLists.txt content should be like this:

set(COMPONENT_SRCS "hello_tf.cc")
set(COMPONENT_REQUIRES "tflite-lib" "esp-nn")
register_component()

As you can see, the default component sources definition is changed, and the new libraries are added. Now, let's see the main hello_tf.cc file content.

#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

extern "C" {

tflite::MicroErrorReporter micro_error_reporter;
tflite::ErrorReporter* error_reporter = &micro_error_reporter;

void app_main(void)
{
    TF_LITE_REPORT_ERROR(error_reporter, "Hello from TensorFlow\n");
}

}

As you can see, we had to use the extern "C" block again because, by default, ESP-IDF runs the void app_main() function from a C, not C++, context. To run the application, run idf.py build flash monitor.

In the same way, you can add other libraries as needed, but without PlatformIO, dependency management is tricky, especially for the core2forAWS library with its multiple dependencies. Alternatively, you can use https://github.com/m5stack/M5Stack-IDF as a library with M5Stack dependencies to control the I/O devices of the controller.

Summary

As I wrote at the beginning of this article, with C/C++ you can do much more; however, you are forced to manage the entire device by yourself. Yes, you can use AWS integration, M5Stack I/O interfaces, and the TensorFlow library (the TensorFlow Lite for Microcontrollers version only) together, but it requires a lot of code. Can we do anything to join the advantages of using Micropython and C together? Let's try to do it in the last chapter.
