Top 10 File Operations In Bash With Examples


Introduction to Bash File

In bash, a file is a named region of storage where data, information, and various other parameters are kept for later use. With the huge volume of data present everywhere today, deriving meaning from large files by hand is a daunting task. Bash therefore offers a range of file operations to handle these scenarios, such as truncating a file, appending to a file, reading from a file, and many more.


File operations in bash

As mentioned in the introduction, bash supports many file operations; here we will cover 10 of them that are widely used in the industry. We want to inform you that this list is not exhaustive! So, let us get right into the gist of the article.

1. Truncating a File
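The code for this step did not survive, so the following is a minimal sketch of how a file is typically truncated in bash. The file name file1.txt and its sample content are illustrative assumptions (the original example used a 396-byte file):

```shell
# Create a sample file so the snippet is self-contained
printf 'Lorem ipsum dolor sit amet.\n' > file1.txt
echo "Size before truncation: $(wc -c < file1.txt) bytes"

# Empty the file while keeping it in place
truncate -s 0 file1.txt

echo "Size after truncation: $(wc -c < file1.txt) bytes"
```

The shell-only form : > file1.txt achieves the same result where the truncate utility is unavailable.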




Explanation: At first, we see that the size of file1 is 396 bytes; once we truncate it, the size drops to 0, as the output shows.

2. Appending a string to a file

This methodology is much like the truncation method; the only difference is that we add a line to the file.


echo "Size of the files in the folder: "
ls -lh
echo "Last line of the file: "
tail -n 1 file1.txt
echo "Adding a new line:"
echo "Mauris tempus quis est vitae eleifend." >> file1.txt
echo "New last line of the file: "
tail -n 1 file1.txt


Explanation: Here, we can see that the last line of the file has changed from Cras pharetra dolor eu eros iaculis porta to Mauris tempus quis est vitae eleifend.

3. Reading a single line

Now that we have seen how to write into a file, it becomes inevitable to understand how to read from a file. Here we would use the keyword read and the input redirection operator "<" to fulfill the utility.

Here, -r ensures the line is read raw, without interpreting backslash escape sequences. The word variableName is the variable where the line is stored for further use after it is read. Though this is one of the ways, there are other ways of reading a file.


echo "Reading the first line "
read -r variableName < file1.txt
echo "The first line stored in the variable is: $variableName"


4. Individual line-by-line read from a file


echo "Reading each line and printing them one by one"
count=1
while read -r variableName; do
    echo "Line $count"
    echo "$variableName"
    count=$((count+1))
done < file1.txt


5. Copy a file

The utility of this command is to copy a file to another location while keeping the original in place. One uses this utility to keep a backup of files so that an accidental deletion does not lead to significant consequences.


echo "Copying file1.txt to a backup folder"
echo "Files before copying: "
ls /home/user/Backup
cp file1.txt /home/user/Backup
echo "Files after copying: "
ls /home/user/Backup


Explanation: You will notice that file1.txt is still in the original location.

6. Move a file


echo "Moving file1.txt to a backup folder"
echo "Files in Backup before moving: "
ls /home/user/Backup
echo "Files in original folder before moving: "
ls /home/user
mv file1.txt /home/user/Backup
echo "Files in Backup after moving: "
ls /home/user/Backup
echo "Files in original folder after moving: "
ls /home/user


Explanation: Eventually, you'll realize that we've moved file1.txt from its original location to the new directory.

7. Finding the size of the file

It is often essential to know the size of a file to understand the disk's health in terms of free space. Using the command below, we can easily keep track of file sizes and hence the disk's health.


echo "Finding File Size of file2.txt"
# Count the bytes in the file and store the result
file_size=$(wc -c < file2.txt)
echo "Size of the file is: $file_size bytes"
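The stat utility offers another common way to obtain the size used above. This is a sketch; the -c%s format is GNU-specific, and file2.txt with sample content is an assumption for illustration:

```shell
# Create a sample file so the snippet is self-contained
printf 'hello\n' > file2.txt

# GNU stat prints the size in bytes with the %s format
file_size=$(stat -c%s file2.txt)
echo "Size of the file is: $file_size bytes"
```

On BSD/macOS, the equivalent is stat -f%z file2.txt; wc -c < file2.txt is a portable alternative.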



In this module, we have looked into all aspects of file operations in bash, and we would highly encourage you to get hands-on practice with all the commands to become acquainted with them. In this article, we have gone through each utility with an example to mark the effectiveness of learning bash the EDUCBA way.

Recommended Articles

We hope that this EDUCBA information on “Bash File” was beneficial to you. You can view EDUCBA’s recommended articles for more information.


10 Text Functions In Excel With Examples

Excel is all about working with numbers. However, if your data consists of too much text, you don’t have to worry at all. Excel provides several functions that make it easier to manipulate text strings. These functions let you easily find a string, count the characters in a string, remove extra spaces from a string, join two or more strings, and perform other similar tasks on the textual data.

What are Text functions in Excel?

Text Functions are Microsoft Excel’s native functions that allow transforming or analyzing textual data. Excel provides a total of 30+ Text functions and many of these are often used by people for data analysis. This post highlights 10 such Text functions, with their uses and examples.

10 Text Functions in Excel with Examples

Following is the list of top 10 functions in Excel:

FIND
LEN
LEFT
RIGHT
MID
SUBSTITUTE
UPPER
TRIM
CONCATENATE
TEXT
Let’s take a detailed look at these functions, one by one.


1] FIND

The FIND function allows you to find a text string within another. It returns the position at which a character or string begins within another text string.


FIND(find_text, within_text, [start_num])

find_text argument is used to enter the text the user wants to search.

within_text argument takes the text which contains the text that needs to be found.

[start_num] is an optional argument that takes the position from where to start the search. It takes the value 1 by default.


Let us say the A3 cell in an Excel sheet contains the string ‘The Windows Club’. If the user wants to find the position of ‘Win’ within the string, he may use the ‘FIND’ function as:

f(x)=FIND("Win", A3)

The output of the above function will be 5, as 5 represents the starting position of the text ‘Win’ within ‘The Windows Club’.

Note: The FIND function is case-sensitive. If you do not want to match the case, you can use the SEARCH function, which has the same syntax as the FIND function.

Read: How to use the new TEXTSPLIT function in Excel

2] LEN

The LEN function calculates the length of the string, i.e. the number of characters present in a string. It counts the spaces as characters.



LEN(text)

text argument takes the string whose length the user wants to find.


In the above example, if the user wants to find the length of the string ‘The Windows Club’, he may use the ‘LEN’ function as:

f(x)=LEN(A3)

The output of the above function will be 16, as there are 16 characters in the string ‘The Windows Club’, including spaces.

Also read: Arrows keys are not working in Microsoft Excel.


3] LEFT

The LEFT function returns several successive characters from the left side of a string, based on the number specified by the user.


LEFT(text, [num_chars])

text argument is used to specify the string that contains the characters that need to be found.

[num_chars] specifies the number of characters to be extracted from the left of the main string. This argument is optional. It takes ‘1’ as a default value, if not specified by the user.


In the same example as stated above, if the user wants to extract the first 7 characters from ‘The Windows Club’, he may use the ‘LEFT’ function as:

f(x)=LEFT (A3, 7)

The output of the above function will be The Win, as these are the 7 leftmost characters in the string ‘The Windows Club’, including spaces.


4] RIGHT

The RIGHT function is used to extract several characters from the extreme right of a string.


RIGHT(text, [num_chars])

text argument specifies the string that contains the desired characters.

[num_chars] argument specifies the number of characters that need to be extracted, moving from the extreme right to the left of the string. This is an optional argument that takes ‘1’ as the default value if left unspecified.


Taking the same example, if the user wants to extract the last 7 characters from the string ‘The Windows Club’, he may use the ‘RIGHT’ function as:

f(x)=RIGHT(A3, 7)

The output of the above function will be ws Club, since they are the 7 rightmost characters in ‘The Windows Club’, including spaces.

5] MID

The MID function returns several consecutive characters or a substring from the middle of another string.


MID(text, start_num, num_chars)

text argument takes the string that contains the desired characters.

start_num argument takes the position from where to start extracting the characters.

num_chars argument takes the number of characters the user wants to extract from the string.


In the above example, if the user wants to extract 4 characters starting from the 3rd character in the string ‘The Windows Club’, he may use the ‘MID’ function as:

f(x)=MID(A3, 3, 4)

The output of the above function will be e Wi, as ‘e’ is the third character and, starting from ‘e’ and counting spaces as well, ‘e Wi’ forms the 4 consecutive characters in the string ‘The Windows Club’.


6] SUBSTITUTE

The SUBSTITUTE function replaces existing text with new text in a given string.


SUBSTITUTE(text, old_text, new_text, [instance_num])

text argument specifies the main string.

old_text argument specifies the text that needs to be replaced.

new_text argument specifies the text that needs to be put in place of the existing text.

[instance_num] argument specifies which instance (or occurrence) of the existing text is to be replaced. This is an optional argument. If you specify this value, only that instance of the text will be replaced; otherwise, all the instances of the existing text will be replaced with the new text.


In the same example, if the user wants to substitute ‘Welcome to The’ for ‘The’ in ‘The Windows Club’, he may use the ‘SUBSTITUTE’ function as:

f(x)=SUBSTITUTE(A3, "The", "Welcome to The")

The output of the above function will be Welcome to The Windows Club, as the substitute function has replaced ‘The’ with ‘Welcome to The’ in the text string ‘The Windows Club’.


7] UPPER

The UPPER function converts a string into uppercase, i.e., it returns a string after capitalizing each letter.



UPPER(text)

text argument takes the string that needs to be capitalized.


Following the same example, if the user wants to capitalize each letter in the string ‘The Windows Club’, he may use the ‘UPPER’ function as:

f(x)=UPPER(A3)

The output of the above function will be THE WINDOWS CLUB.


If you want to convert a string into lowercase, you may use the LOWER function, having the same syntax as that of the UPPER function.

If you want to capitalize the first letter of each word in a string, you may use the PROPER function with the same syntax.


8] TRIM

The TRIM function removes all the extra spaces within a string, leaving just 1 space between two words.



TRIM(text)

text argument takes the string with irregular spacing.


In the example stated above, if the user wants to remove unnecessary spaces from the string ‘The      Windows        Club’, he may use the ‘TRIM’ function as:

f(x)=TRIM("The      Windows        Club")

The output of the above function will be The Windows Club, leaving just a single space between words.


9] CONCATENATE

The CONCATENATE function joins two or more strings in Excel.


CONCATENATE(text1, [text2], ...)

text1 argument is mandatory. It takes the first string to join.

text2 argument takes the additional string to join. You may join up to 255 strings.


Let us say the A3 cell in an Excel sheet contains the string ‘The’, the A4 cell contains the string ‘Windows’, and the A5 cell contains the string ‘Club’. If the user wants to join these strings, he may use the ‘CONCATENATE’ functions as:

f(x)=CONCATENATE(A3, " ", A4, " ", A5)

The output of the above function will be The Windows Club, joining the strings in A3, A4, and A5 cells with spaces between these strings.

Tip: You can also use the ampersand (&) symbol to concatenate two text strings, for example, f(x)=A3&" "&A4&" "&A5.

10] TEXT

The TEXT function converts the format of a number from ‘numeric’ to ‘text’. The function can be used to place formatted numbers between text.


TEXT(value, format_text)

value argument takes the numerical value that needs to be formatted.

format_text argument takes the format that needs to be applied to the value.


Let us say the A2 cell in Excel contains the string ‘The Windows Club started on’ and the A3 cell contains the numeric data ’20-04-2009′; the two of these can be combined in a single sentence using the ‘CONCATENATE’ and the ‘TEXT’ functions as:

f(x)=A2&" "&TEXT(A3,"mmmm d, yyyy")&"."

The output of the above functions will be The Windows Club started on April 20, 2009.

Also read: How to convert currencies in Excel.

What is an example of a text function?

The TEXT function in Excel is used to join a formatted number with a text string. For example, if an Excel sheet contains the string ‘Retail sales surge by’ in cell A1, and the number ‘20000’ in cell A2, then the TEXT function can be used to join the contents of these two cells as:

f(x)=A1&" "&TEXT(A2,"$##,###")&"."

The above function will return ‘Retail sales surge by $20,000.’, where the number 20000 has been formatted using a currency symbol and comma separator.

What is the use of lower function?

The LOWER function is used to change the case of a string to lowercase. If a given string is in uppercase, proper case, or sentence case, the LOWER function will return the string with each of its letters converted to lowercase. The syntax for the LOWER function is LOWER(text), where text specifies the string, or a reference to the cell containing the string, that needs to be converted to lowercase.

Read Next: Top 15 Financial functions in Microsoft Excel.

Polymorphism In Python With Examples

What is Polymorphism?

Polymorphism can be defined as a condition that occurs in many different forms. It is a concept in Python programming wherein an object defined in Python can be used in different ways. It allows the programmer to define methods in a derived class with the same names as methods present in the parent class. Such scenarios support method overriding in Python.


Polymorphism in Operators

An operator in Python helps perform mathematical and several other programming tasks. For example, the ‘+’ operator performs addition between two integers in Python, and in the same way, the same operator concatenates strings in Python programming.

Let us take an example of + (plus) operator in Python to display an application of Polymorphism in Python as shown below:

Python Code:

p = 55
q = 77
r = 9.5
g1 = "Guru"
g2 = "99!"
print("the sum of two numbers", p + q)
print("the data type of result is", type(p + q))
print("The sum of two numbers", q + r)
print("the data type of result is", type(q + r))
print("The concatenated string is", g1 + g2)
print("The data type of two strings", type(g1 + g2))


the sum of two numbers 132
the data type of result is <class 'int'>
The sum of two numbers 86.5
the data type of result is <class 'float'>
The concatenated string is Guru99!
The data type of two strings <class 'str'>

The above example can also be regarded as the example of operator overloading.

Polymorphism in user-defined methods

A user-defined method in the Python programming language is a method that the user creates; it is declared using the keyword def followed by the function name.

Polymorphism in the Python programming language is achieved through method overloading and overriding. Python defines methods with the def keyword, using the same name in both the child and parent class.

Let us take the following example as shown below: –

Python Code:

from math import pi

class square:
    def __init__(self, length):
        self.l = length
    def perimeter(self):
        return 4 * (self.l)
    def area(self):
        return self.l * self.l

class Circle:
    def __init__(self, radius):
        self.r = radius
    def perimeter(self):
        return 2 * pi * self.r
    def area(self):
        return pi * self.r ** 2

# Initialize the classes
sqr = square(10)
c1 = Circle(4)
print("Perimeter computed for square: ", sqr.perimeter())
print("Area computed for square: ", sqr.area())
print("Perimeter computed for Circle: ", c1.perimeter())
print("Area computed for Circle: ", c1.area())


Perimeter computed for square:  40
Area computed for square:  100
Perimeter computed for Circle:  25.132741228718345
Area computed for Circle:  50.26548245743669

In the above code, there are two user-defined methods, perimeter and area, defined in circle and square classes.

As shown above, both the Circle class and the square class invoke the same method names, displaying the characteristic of Polymorphism to deliver the required output.

Polymorphism in Functions

The built-in functions in Python are designed to be compatible with several data types. In Python, len() is one of the key built-in functions.

It works on several data types: list, tuple, string, and dictionary. The len() function returns information appropriate to each of these data types.

The following figure shows how Polymorphism can be applied in Python with relation to in-built functions: –

Following program helps in illustrating the application of Polymorphism in Python: –

Python Code:

print("The length of string Guru99 is ", len("Guru99"))
print("The length of list is ", len(["Guru99", "Example", "Reader"]))
print("The length of dictionary is ", len({"Website name": "Guru99", "Type": "Education"}))


The length of string Guru99 is  6
The length of list is  3
The length of dictionary is  2

In the above example, the len() function of Python performs Polymorphism for the string, list, and dictionary data types, respectively.

Polymorphism and Inheritance

Inheritance in Python can be defined as the programming concept wherein a defined child class inherits properties from a base class.

There are two key Python concepts termed method overriding and method overloading.

In method overloading, Python provides the feature of creating methods with the same name to perform or execute different functionalities in a given piece of code. In simpler terms, it allows you to overload methods and use them to perform different tasks.

In method overriding, Python overrides a method that shares the same name in the parent and child classes.

Let us take the following example of Polymorphism and inheritance as shown below: –

Python Code:

class baseclass:
    def __init__(self, name):
        self.name = name
    def area1(self):
        pass
    def __str__(self):
        return "a " + self.name

class rectangle(baseclass):
    def __init__(self, length, breadth):
        super().__init__("rectangle")
        self.length = length
        self.breadth = breadth
    def area1(self):
        return self.length * self.breadth

class triangle(baseclass):
    def __init__(self, height, base):
        super().__init__("triangle")
        self.height = height
        self.base = base
    def area1(self):
        return (self.base * self.height) / 2

a = rectangle(90, 80)
b = triangle(77, 64)
print("The shape is: ", b)
print("The area of shape is", b.area1())
print("The shape is:", a)
print("The area of shape is", a.area1())


The shape is:  a triangle
The area of shape is 2464.0
The shape is: a rectangle
The area of shape is 7200

In the above code, the methods share the same names: the __init__ method and the area1 method. The objects of the triangle and rectangle classes are then used to invoke the two methods to perform different tasks and output the areas of the triangle and rectangle.

Polymorphism with the Class Methods

Python enables programmers to achieve Polymorphism and method overloading with class methods. Different classes in Python can have methods that are declared with the same name across the code.

In Python, two different classes can be defined: one would be the child class, which derives attributes from another defined class, termed the parent class.

The following example illustrates the concept of Polymorphism with class methods: –

Python Code:

class amazon:
    def __init__(self, name, price):
        self.name = name
        self.price = price
    def info(self):
        print(f"This is a product and am class is invoked. The name is {self.name}. This costs {self.price} rupees.")

class flipkart:
    def __init__(self, name, price):
        self.name = name
        self.price = price
    def info(self):
        print(f"This is a product and fli class is invoked. The name is {self.name}. This costs {self.price} rupees.")

FLP = flipkart("Iphone", 2.5)
AMZ = amazon("Iphone", 4)
for product1 in (FLP, AMZ):
    product1.info()


This is a product and fli class is invoked. The name is Iphone. This costs 2.5 rupees.
This is a product and am class is invoked. The name is Iphone. This costs 4 rupees.

In the above code, two different classes named flipkart and amazon use the same method names, info and __init__, to provide the respective price quotations of the product, further illustrating the concept of Polymorphism in Python.

Difference between compile-time and run-time Polymorphism

In compile-time Polymorphism, the compiler of the Python program resolves the call. Compile-time Polymorphism is accomplished through method overloading.

Run-time Polymorphism, by contrast, is resolved during execution rather than by the compiler. It is accomplished through method overriding, wherein methods with the same signatures or properties form part of different classes.


Polymorphism can be defined as a condition that occurs in many different forms.

An operator in Python helps perform mathematical and several other programming tasks.

A user-defined method in the Python programming language is a method that the user creates; it is declared using the keyword def followed by the function name.

Polymorphism in Python offers several desirable qualities, such as it promotes the reusability of codes written for different classes and methods.

A child class is a derived class, and it gets its attributes from the parent class.

Polymorphism is also achieved through run-time method overriding and compile-time method overloading.

Polymorphism in Python is also attained through operator overloading and class methods.

Top 22 AutoML Case Studies/Examples

Though there is a lot of buzz around autoML, we haven’t found a good compilation of case studies. So we built our comprehensive list of automated machine learning case studies so you can see how autoML could be used in your function/industry.

This AutoML case study list will help us to understand what AutoML is and how you can use it in your business function. The most common application areas of autoML are decision-making and forecasting. Read on to discover how AutoML can support your business function.

What are the typical results of AutoML projects?

In these case studies, we discovered that companies gain various benefits from automating their processes. These benefits help companies improve their businesses and provide more efficient services. Below, you can find the top 3 typical results of those case studies.

Time savings: AutoML provides faster deployment by automating data extraction and algorithms. In the end, the manual parts of the analyses are eliminated and deployment time reduces significantly. As an example, Consensus Corporation reduced its deployment time from 3-4 weeks to 8 hours.

Improved accuracy: As businesses continue, their data grows and industry trends change. AutoML automates retraining for these changes and removes any manual actions. As a result, possible errors are eliminated, and continuously evolving algorithms improve accuracy. With this benefit, companies can reach high accuracy rates in their predictions. Trupanion can identify the two-thirds of its customers who will churn before they churn.

Democratization: Machine learning applications require high-level skills, which makes companies dependent on data scientists. With AutoML, these processes can be done without high-level knowledge.

What are typical cases for AutoML?

Companies can automate their machine learning processes for a variety of purposes. In most of these use cases, companies have already implemented machine learning and want to improve their performance. Mostly, companies want to have automated insights for better data-driven decisions and predictions. The typical processes we have observed from the case studies are:

Fraud Detection


Sales Management

Marketing Management

The full list of case studies that we have collected from different AutoML vendors can be found below. You can filter the list by the vendor, industry, or use case and investigate the achieved results.

The case studies were compiled with columns for Company, Country, AutoML Tool, Industry, Use Case, and Results; the reported results include:

▪ More accurate predictions on parking lot usage
▪ More accurate identification of risk
▪ Improved profit margins
▪ Reduced deployment time from 3-4 weeks to 8 hours
▪ Reduced cost by one tenth
▪ Improved pricing optimization from 1.5% to 4% of the revenue
▪ Improved service quality
▪ Improved accuracy to 95%
▪ Shortened credit application process
▪ Hortifrut (Chile, H2O.ai, Agriculture, Product Quality): reduced deployment time from weeks to hours
▪ Improved diagnosis results
▪ Reduced model creation time from 4 weeks to 3 days
▪ Reduced deployment time
▪ Improved customer experience
▪ Better identification of key drivers for prices
▪ Increased ticket sales by 83%
▪ Reduced model training time to under 2 hours
▪ Increased conversion rate by 300%
▪ Democratization of the process
▪ Net $10 million savings per year from 0.1% reduction in patient length of stay
▪ Identified that two-thirds of customers will churn before they churn
▪ Shortened and more accurate credit scoring process

You can also check out our sortable and data-driven list of AutoML Software.  To learn more about AutoML, you can read our in-depth AutoML guide.

You can also review our list of AutoML solution providers to find the right vendor for your business.






Top 10 Altcoins With 30X Gains Incoming In 2023

The top 10 altcoins with explosive potential for massive gains in 2023 are listed here.

As the cryptocurrency market continues to evolve and mature, many investors and traders are looking beyond Bitcoin and Ethereum for potential gains. Altcoins, or alternative cryptocurrencies, offer unique opportunities for growth and profitability. In 2023, several altcoins have the potential for significant gains, with some experts predicting up to 30x returns. These altcoins have strong use cases, innovative technology, and solid project teams behind them.

In this article, we will explore the top 10 altcoins with 30x gains incoming in 2023 and provide insights on why they have the potential to be profitable investments. However, it’s important to note that cryptocurrency investing carries significant risk, and investors should conduct their research and consider their risk tolerance before investing.

Ethereum (ETH)

Ethereum is the second-largest cryptocurrency in terms of market capitalization. It has a strong project team, a solid use case, and community support. Ethereum is a decentralized platform that allows developers to build decentralized applications (dApps) on top of it. The platform uses smart contracts, which are self-executing contracts that automate the negotiation and execution of a contract. Ethereum has a high adoption rate, and many companies have started using it to build dApps.

Binance Coin (BNB)

Binance Coin is the native token of the Binance exchange. It has a strong project team, a solid use case, and community support. Binance Coin is used to pay trading fees on the Binance exchange. The exchange offers a discount to users who pay trading fees in Binance Coin. Binance Coin has a high adoption rate, and the exchange has been expanding its offerings, which will increase the demand for Binance Coin.

Chainlink (LINK)

Chainlink is a decentralized oracle network that connects smart contracts to real-world data. It has a strong project team, a solid use case, and community support. Chainlink has a high adoption rate, and many companies have started using it to connect their smart contracts to real-world data.

Polkadot (DOT)

Polkadot is a blockchain protocol that allows different blockchain networks to communicate with each other. It has a strong project team, a solid use case, and community support. Polkadot has a high adoption rate, and many projects have started building on the platform.

Cardano (ADA)

Cardano is a decentralized platform that allows developers to build decentralized applications (dApps) on top of it. It has a strong project team, a solid use case, and community support. Cardano has a high adoption rate, and many companies have started using it to build dApps.

VeChain (VET)

VeChain is a blockchain platform that focuses on supply chain management. It has a strong project team, a solid use case, and community support. VeChain has a high adoption rate, and many companies have started using it to track their supply chain.

Polygon (MATIC)

Polygon (MATIC) is a Layer 2 scaling solution for Ethereum that has gained significant traction in the blockchain industry. The project aims to provide faster and cheaper transactions for the Ethereum network, which has been plagued by scalability issues due to its high usage and transaction fees. Polygon achieves this by building its network on top of Ethereum, which allows for faster and more efficient transactions. This has led to a significant increase in adoption, with many projects and dApps choosing to use Polygon as their primary scaling solution.

Solana (SOL)

Solana is a high-speed blockchain network that aims to offer fast transaction times and low fees. It has a strong project team, a solid use case, and community support. Solana uses a unique consensus mechanism called Proof of History, which allows for high throughput and scalability. Many developers have started building dApps on Solana, and it has seen significant adoption in the decentralized finance (DeFi) space.

Dogecoin (DOGE)

Dogecoin is a cryptocurrency that was created as a meme but has gained significant popularity and adoption in recent years. It has a strong community of supporters, including high-profile figures such as Elon Musk. Dogecoin has a low transaction fee and fast transaction times, making it ideal for small transactions and micropayments. However, it is important to note that Dogecoin has limited use cases compared to other cryptocurrencies.

Avalanche (AVAX)

Mlops Operations: A Beginner’s Guide In Python

This article was published as a part of the Data Science Blogathon


According to a report, 55% of businesses have never used a machine learning model before, and 85 per cent of models will not be brought into production. Lack of skill, a lack of change-management procedures, and the absence of automated systems are some of the key factors behind this failure. To overcome these issues, it is vital to combine the mechanics of DevOps and operations with machine learning development to properly launch a machine learning application.

Let’s start with a comprehensive step-by-step guide to the MLOps Operations lifecycle, which will teach you how to put machine learning models into production.

Definition of MLOps

MLOps, or Machine Learning Operations for Production, is a collection of defined methods for building, deploying, and governing the lifecycle of machine learning models. This architecture facilitates cross-functional collaboration and provides an automated framework for tracking everything needed for the complete cycle of machine learning models. MLOps approaches also improve the ML systems' scalability, security, and reliability, resulting in faster development cycles and higher revenues from ML initiatives.

Capabilities of MLOps

The collection of key competencies required in MLOps Operations varies depending on the various needs of industry organizations. Some businesses may demand a single integrated platform that can handle everything from data pretreatment through storage, modelling, deployment, and monitoring. Others may merely need model development, training, and deployment services. We’ll go over the entire list of MLOps’ primary strengths in this section.

Before we go into the complicated theory and some use-case implementation, let’s have a look at some examples.

terraform – Infrastructure as Code (IaC) tool to manage infrastructure on cloud platforms with configuration files.

Azure Databricks – Data Analytics platform for creating a workspace and allowing easy integration with libraries for ML.

mlflow – Open source platform helps manage the complete ML lifecycle.

Lifecycle of MLOps

All the processes are iterative, and the success of the overall machine learning system is contingent on each of these phases being completed successfully. Backtracking to the previous stage to check for any defects introduced can be caused by difficulties encountered in one phase. Let’s look at what happens at each stage of the MLOps lifecycle:


ML Development: This is the first phase, which entails building a comprehensive pipeline that starts with data processing and ends with model training and assessment algorithms.

Model Training: After the setup is complete, the model should be trained. To respond to fresh data or handle specific changes, continual training functionality is also required.

Model Evaluation: This entails running inference with the trained model and ensuring that the output results are accurate.

Model Deployment: Once the proof-of-concept stage is complete, we must deploy the model according to industry standards to deal with real-world data.

Serving Predictions: The model is now ready to provide predictions over the incoming data after deployment.

Model Monitoring: Issues such as concept drift can cause results to become erroneous over time, so it’s critical to monitor the model to make sure it’s still working properly.

Data and Model Management: A component of the central system that oversees the management of data and models. It entails storing data, keeping track of multiple versions, facilitating access, ensuring security, and configuring systems across diverse cross-functional teams.

Let’s get started with the fundamental MLOps.

A Hands-on Approach with MLOps Operations

Step 1  ML Development

ML Development is the initial work an ML project begins with. The problem statement, as well as the project outcome, should be thoroughly defined and understood at this point. This is where all the experiments for the proof of concept (POC) phase are carried out. Data selection, feature engineering, model development, and evaluation are among the steps that are carried out.

Features of ML development

1) Experiment tracking and version control are used to ensure reproducibility and the ability to go back to prior versions.

2) Data and model artifacts used in the code to allow access to previously trained modules are stored and tracked.

3) All versions of the experiment have complete information about the hyper-parameters employed.

4) Information on the metrics used to evaluate the model and the process employed.
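As a toy illustration of the bookkeeping listed above, the sketch below records what a tracker such as MLflow stores for each run: hyper-parameters, metrics, and a timestamp, so any past version can be inspected or reproduced. The `ExperimentTracker` class is purely illustrative and not part of any library.

```python
import json
import time


class ExperimentTracker:
    """Toy stand-in for an experiment tracker such as MLflow:
    each run records hyper-parameters, metrics, and a timestamp
    so any past version can be inspected or compared later."""

    def __init__(self):
        self.runs = []

    def log_run(self, params, metrics):
        run = {
            "run_id": len(self.runs) + 1,
            "timestamp": time.time(),
            "params": params,    # e.g. learning rate, batch size
            "metrics": metrics,  # e.g. accuracy, loss
        }
        self.runs.append(run)
        return run["run_id"]

    def best_run(self, metric):
        # Pick the run with the highest value of the given metric.
        return max(self.runs, key=lambda r: r["metrics"][metric])


tracker = ExperimentTracker()
tracker.log_run({"lr": 0.01, "batch_size": 128}, {"accuracy": 0.91})
tracker.log_run({"lr": 0.001, "batch_size": 128}, {"accuracy": 0.95})
best = tracker.best_run("accuracy")
print(json.dumps(best["params"]))  # the winning hyper-parameters
```

In MLflow itself, `mlflow.log_param` and `mlflow.log_metric` play this role, and the runs are persisted to a tracking server rather than kept in memory.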

1) On your local device, install the Terraform CLI and the Azure CLI.

a. To install HashiCorp’s Terraform, follow the steps below.

sudo apt-get update && sudo apt-get install -y gnupg software-properties-common curl

Add the HashiCorp GPG key:

curl -fsSL https://apt.releases.hashicorp.com/gpg | sudo apt-key add -

Add the official HashiCorp Linux repository:

sudo apt-add-repository "deb [arch=amd64] https://apt.releases.hashicorp.com $(lsb_release -cs) main"

Update the package index and install the Terraform CLI:

sudo apt-get update && sudo apt-get install terraform
# Validate the installation
terraform -help

b. To install the Azure CLI and log in, run the steps below.

curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash
az login

2) Infrastructure as Code (IaC) configuration files are used to set up the infrastructure.

a. Create a directory.

mkdir terraform-azure
cd terraform-azure

Create the main.tf configuration file and save it in the above folder.

# Save this file as main.tf in the terraform-azure directory and deploy to Azure
terraform {
  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
    }
  }
}

# Features block for the Azure provider
provider "azurerm" {
  features {}
}

# Resource group for the infrastructure
resource "azurerm_resource_group" "mlops_demo_rg" {
  name     = "MLOpsDemo"
  location = "eastus2" # check the location of the resource group on your portal
}

# Databricks workspace
resource "azurerm_databricks_workspace" "databricks_workspace_demo" {
  name                = "databricks-demo"
  resource_group_name = azurerm_resource_group.mlops_demo_rg.name
  location            = azurerm_resource_group.mlops_demo_rg.location
  sku                 = "premium"
}

# Public storage account for storing artifacts
resource "azurerm_storage_account" "storage_account_demo" {
  name                     = "mlopsstorageaccountdemo1"
  resource_group_name      = azurerm_resource_group.mlops_demo_rg.name
  location                 = azurerm_resource_group.mlops_demo_rg.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
  allow_blob_public_access = true
}

# Public storage container for storing artifacts
resource "azurerm_storage_container" "storage_container_demo" {
  name                  = "mlopsstoragecontainerdemo"
  storage_account_name  = azurerm_storage_account.storage_account_demo.name
  container_access_type = "blob"
}

b. Put in place the infrastructure.

1) ‘terraform init’ is the command to use. It will display the message ‘Terraform has been successfully initialized!’ after successful initialization.

terraform init


2) ‘terraform apply’ is the command to use. Type yes when prompted. The message ‘Apply complete!’ will appear after a successful application.

terraform apply

Check that the Databricks resource with the name ‘databricks-demo’ has been created in the Azure portal. Then, after selecting the resource, select Launch Workspace.

From the left bar, create a new Python notebook into which we will put our code.

Now we’ll use the Keras framework to develop a basic MNIST classifier. We’ll use MLflow to log the metrics and the model. You don’t need to install MLflow separately because Databricks has it built in.

Data Preprocessing:

First import the libraries and dependencies:

!pip install mlflow
import mlflow
import mlflow.keras
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

Next, assign the value for the Model/data parameters

num_classes = 10
input_shape = (28, 28, 1)

Perform the data split between train and test sets:

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.astype("float32") / 255
x_test = x_test.astype("float32") / 255

Next, make sure images have shape (28, 28, 1)

x_train = np.expand_dims(x_train, -1)
x_test = np.expand_dims(x_test, -1)

Then convert class vectors to binary class matrices

y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)

Step 2  Model Building and Training

Train the model using the Sequential API:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Flatten
from tensorflow.keras.layers import Conv2D, MaxPooling2D

After importing all the modules required for training, use the Sequential model to train the ML model.

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(32, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Conv2D(64, kernel_size=(3, 3), activation="relu"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(num_classes, activation="softmax"),
    ]
)

# Enable MLflow autologging
mlflow.tensorflow.autolog()

# Compile and train the model
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=15, validation_split=0.1)

# Fine-tune for another epoch
model.fit(x_train, y_train, batch_size=128, epochs=1, validation_split=0.1)

# Evaluate the model
test_loss, test_accuracy = model.evaluate(x_test, y_test)

If the model underperforms, raise an exception so the run fails visibly:

if test_accuracy < 0.80:
    raise Exception("Model is not accurate, retrain the model!")

Finally, to execute the notebook, you must create a cluster using the Create option on the leftmost panel. Give it a name, set the cluster mode to ‘Single Node’, and leave the rest of the options at their defaults.

You can attach the MNIST notebook to the cluster after it has been created. A green indicator will appear at the upper left once the cluster has successfully attached and started. Hurray! You’re ready to begin your first training run!

Step 3  Training Operationalization

After the model has been trained and tested once, it must be re-trained regularly as new data is ingested or the code is modified. Retraining also addresses the problem of concept drift at a technical level. Azure Databricks can help automate retraining by running it as a scheduled job.
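A scheduled retraining job usually reduces to a simple decision rule. The sketch below is a hypothetical trigger (the function name and thresholds are ours, not a Databricks API) that fires when enough new data has arrived or live accuracy has degraded:

```python
def should_retrain(new_rows, live_accuracy,
                   min_new_rows=10_000, accuracy_floor=0.85):
    """Return True when a scheduled run should kick off retraining:
    either enough fresh data has been ingested, or monitored
    accuracy has dropped below the acceptable floor (concept drift)."""
    if new_rows >= min_new_rows:
        return True
    if live_accuracy < accuracy_floor:
        return True
    return False


# A nightly job would evaluate this rule against monitoring stats.
print(should_retrain(new_rows=500, live_accuracy=0.91))  # healthy: no retrain
print(should_retrain(new_rows=500, live_accuracy=0.78))  # drifted: retrain
```

In practice, the scheduled Databricks job would run this check first and skip the expensive training step when neither condition holds.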

You can now return to the Databricks workspace where your notebook was created.

Create a new job by selecting the Create option from the left side. You can also add options and specify their values.

You may also control who has access to the runs and to what extent they have access. In businesses where many individuals work on a project and the code requires privacy and security, this option is crucial. To check, go back to the job running dashboard and select the Edit Permissions option.

The following are characteristics of training operationalization:

1) Configure scheduled or event-driven runs that are triggered when new data is present and the model degrades.

2) Set up ongoing training pipelines with specific hyper-parameter values, as well as keeping track of and archiving prior runs.

3) Access to the model registry, which houses the machine learning artifact repository.


Step 4  Model Versioning

Model versioning makes it easier to maintain track of the various versions of the model that have been developed. It involves storing model files, source code, training parameters like hyper-parameters, and data split information during training. When something goes wrong with the current system and you need to revert to a previous stable version, versioning is vital. Based on the parameters used on a single dashboard, one can easily analyze the performance of multiple versions.

Display the dashboard.

The following are some of the key elements of model versioning:

1) Tracking and storing different versions of the model.

2) The model variants with their parameter settings are easily accessible and convenient for keeping track of.

3) After each run, an MLflow model object is created automatically. The complete project environment, including files such as conda.yaml and requirements.txt, is provided.

Step 5  Registry Model

The model registry gives a broader view for regulating the lifecycle of ML models, whereas model versioning tracks models on a model-by-model basis. It lives in a central repository. This results in higher-quality production models.

If you select one model and then any version, you will be taken to a page containing the following information.

The following are some of the model registry’s most important features:

1) Throughout the development and deployment lifecycle, a central repository to track, manage, and regulate the versions of the ML model.

2) Model information maintenance and storage make reviewing, rolling back, and approving/rejecting models for other processes easier.
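MLflow’s model registry provides exactly this kind of version-and-stage bookkeeping. As a toy illustration, the minimal in-memory sketch below mimics its None / Staging / Production / Archived stage semantics; the class and method names are ours, not MLflow’s API.

```python
class ModelRegistry:
    """Minimal central registry: each registered model name holds
    numbered versions, and each version carries a lifecycle stage."""

    STAGES = ("None", "Staging", "Production", "Archived")

    def __init__(self):
        self.models = {}  # name -> list of version dicts

    def register(self, name, artifact_uri):
        versions = self.models.setdefault(name, [])
        version = {"version": len(versions) + 1,
                   "artifact_uri": artifact_uri,
                   "stage": "None"}
        versions.append(version)
        return version["version"]

    def transition(self, name, version, stage):
        assert stage in self.STAGES
        # Only one version may hold the Production stage at a time.
        if stage == "Production":
            for v in self.models[name]:
                if v["stage"] == "Production":
                    v["stage"] = "Archived"
        self.models[name][version - 1]["stage"] = stage

    def production_version(self, name):
        for v in self.models[name]:
            if v["stage"] == "Production":
                return v["version"]
        return None


registry = ModelRegistry()
v1 = registry.register("mnist-classifier", "runs:/abc/model")
v2 = registry.register("mnist-classifier", "runs:/def/model")
registry.transition("mnist-classifier", v1, "Production")
registry.transition("mnist-classifier", v2, "Staging")
registry.transition("mnist-classifier", v2, "Production")  # v1 is auto-archived
print(registry.production_version("mnist-classifier"))  # 2
```

The archive-on-promote rule mirrors the rollback story described above: the previous Production version is never deleted, only archived, so it can be restored if the new version misbehaves.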

Step 6  Model Governance

Model governance is concerned with the registration, validation, and evaluation of models at various points of the MLOps lifecycle, from staging to production. It aids in the maintenance of model status information, ensuring a smooth transition to deployment. This governance can be performed manually, semi-automatically, or entirely automated.

The following are some of the key characteristics of model governance:

1) The ability to save and add/update model artifacts.

2) Based on the provided evaluation indicators, we can select a winning model from among the several variants. Accuracy, precision, recall, and F-score are examples of such metrics.

3) Authorized people can view the stored artifacts and version progress. Models are assessed and then approved or rejected based on their performance. This ensures that concerns related to security, privacy, and financial aspects are mitigated.

Step 7  Model Deployment

One of the most important processes in the machine learning lifecycle is model deployment. In industry, most of the models developed never see the light of day, and a model that is not deployed is of limited utility. Once the training and validation steps have been completed and accepted, the next critical step is model deployment.

Deployment is a multi-step process that includes features like Continuous Integration (CI), Continuous Delivery (CD), online testing, and production deployment.

You can look through the section of registered models in Databricks.

The following are some of the most important characteristics of model deployment:

1) The continuous integration stage involves reading the source code and the model from the model registry to ensure that the model’s input-output format is correct. The target infrastructure’s performance is also validated here.

2) The three essential phases of the continuous delivery stage are deployment to staging, acceptance testing, and deployment to production, followed by progressive delivery.

3) Canary deployment, Shadow deployment, and Blue/green deployment are three different production deployment methodologies.

a) Canary deployment delivers services in small increments, allowing enterprises to test their apps with real users while also analyzing their various versions for improvement.

b) Shadow deployment runs both the new and the old version to test predictions. A replica of the production traffic is sent to the new version, and after successful testing the newer version is released.

c) The blue/green deployment keeps two phases running at the same time: staging and production. In the blue environment, new versions are tested for quality assurance, and then real traffic is sent from the green environment to the blue environment. After satisfactory testing, the product is moved to a green environment.

4) Smoke testing, A/B testing, and multi-armed bandit (MAB) testing are all examples of online experimentation used to check whether the new model outperforms the old one. While a new model is being considered for production deployment, the old model continues to run in the background. A fraction of traffic is then shifted to the newer version, and the final model is selected depending on performance.
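Shifting a fraction of traffic to a newer version, as described above, can be implemented as deterministic routing. The sketch below is a hypothetical helper (not a real deployment API) that hashes each request’s user id, so the same user consistently lands on the same model version:

```python
import hashlib


def route_request(user_id, canary_fraction=0.1):
    """Deterministically route a fraction of users to the canary model.
    Hashing the user id keeps each user's experience stable across requests."""
    digest = hashlib.sha256(user_id.encode()).digest()
    bucket = digest[0] / 255.0  # map the first digest byte to [0, 1]
    return "canary" if bucket < canary_fraction else "stable"


# With a 10% canary, roughly one user in ten hits the new model.
routes = [route_request(f"user-{i}") for i in range(1000)]
print(routes.count("canary"))  # close to 100 out of 1000
```

The same routing function covers shadow deployments too: instead of returning only one label, the caller can always invoke the stable model and additionally send the "canary" share to the new version for comparison.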

Step 8 Make a Serving Prediction

The model is deployed to its target environment after the preceding stages have been completed successfully. It will begin receiving actual data traffic in real time. Real-time streaming inference, near real-time online inference using REST endpoints, offline batch inference, or embedded inference on edge devices are all options for serving.

The following are some of the most important characteristics of prediction serving:

1) With real-world data, test the performance of the target infrastructure.

2) Keep track of performance by storing requests and responses in the serving logs and using them for model analysis and monitoring.
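Storing requests and responses, as point 2 suggests, can be as simple as an append-only log. The sketch below is a hypothetical stdlib-only version; a real deployment would write to durable storage such as the blob container provisioned earlier.

```python
import json
import time


class ServingLog:
    """Append-only log of prediction requests and responses,
    the raw material for later model analysis and monitoring."""

    def __init__(self):
        self.entries = []

    def record(self, features, prediction):
        self.entries.append({
            "ts": time.time(),
            "features": features,
            "prediction": prediction,
        })

    def dump(self):
        # One JSON object per line (JSONL), handy for batch analysis jobs.
        return "\n".join(json.dumps(e) for e in self.entries)


log = ServingLog()
log.record({"pixels_mean": 0.13}, {"digit": 7, "confidence": 0.98})
log.record({"pixels_mean": 0.22}, {"digit": 3, "confidence": 0.91})
print(len(log.entries))  # 2
```

The monitoring stage described next consumes exactly this kind of log, comparing the distribution of logged features against the training data.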

Step 9 Model Monitoring

After deployment, model monitoring is necessary to ensure that the deployed model’s effectiveness is maintained. Ingestion of new data over time can change the properties the model sees, leading to model degradation. Performance checks can be carried out by comparing prediction schemas in the saved serving logs against ideal schemas. When an anomaly is detected, a signal can be delivered to an authorized person so they can take action.

In Databricks, go back to the serving area of registered models.

Model versions, model events, and cluster settings are the three tabs over there. The model versions tab will show you the various variants and their production-ready status.

For all activities performed at that interface, the model events tab provides the timestamp, event type, and message.

The following are some of the most important aspects of model monitoring:

1) Provides protection against the challenges of data and concept drift.

a) The disparity between the data used for training and validation and the data used to make predictions in production is known as data drift.

b) The shift in the relationship between the input data and the target mapping is known as concept drift.

2) Aids in the analysis and improvement of assessment parameters, such as memory usage, resource usage, latency, and throughput.
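Data drift, mentioned in point 1a, is often quantified by comparing the live feature distribution with the training one. The sketch below uses a crude mean-shift z-test with the standard library only; the threshold is illustrative, and production systems typically use tests such as Kolmogorov-Smirnov or the population stability index.

```python
from statistics import mean, stdev


def drift_detected(train_sample, live_sample, threshold=3.0):
    """Flag drift when the live mean sits more than `threshold`
    standard errors away from the training mean (a crude z-test)."""
    mu, sigma = mean(train_sample), stdev(train_sample)
    standard_error = sigma / (len(live_sample) ** 0.5)
    z = abs(mean(live_sample) - mu) / standard_error
    return z > threshold


train = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.51]
stable = [0.50, 0.49, 0.51, 0.50, 0.52, 0.48, 0.50, 0.51, 0.49, 0.50]
shifted = [0.70, 0.72, 0.69, 0.71, 0.68, 0.70, 0.73, 0.69, 0.71, 0.70]

print(drift_detected(train, stable))   # False: distribution unchanged
print(drift_detected(train, shifted))  # True: live data has drifted
```

When the check fires, the retraining trigger from Step 3 would be the natural downstream action.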

MLOps Production management in GCP

The AI Platform bundle also addresses the proliferation of internal MLOps systems at hyper-scalers such as LinkedIn and Uber. Citing “technology sprawl” and the short shelf-life of production machine learning models, data science vendors such as Cloudera and Anaconda have remarked that managing machine learning models in production has proven difficult.

As a result, MLOps proponents are attempting to integrate continuous model training and monitoring with open source software and application interfaces.

Others, like Algorithmia, provide MLOps suites with controls and tools for monitoring machine learning models in production.

About Myself

Hello, my name is Lavanya, and I’m from Chennai. I am a passionate writer and enthusiastic content maker. The most intractable problems always thrill me. I am currently pursuing my B. Tech in Computer Science Engineering and have a strong interest in the fields of data engineering, machine learning, data science, and artificial intelligence, and I am constantly looking for ways to integrate these fields with other disciplines such as science and chemistry to further my research goals.

End Notes

1) We learned why MLOps Operations is necessary, what MLOps entails, and how MLOps progresses. Terraform, Microsoft Azure Databricks, and MLflow support for model management were used to implement the entire lifecycle.

2) Currently, being a good ML developer isn’t enough to meet industry demands; thus, this tutorial is a great place to start if you want to learn MLOps. Implementing hands-on real-world MLOps projects is the greatest method of gaining a thorough grasp of how a real-world machine learning system operates in production.

I hope you found this article descriptive and interesting!

Thank you for reading.

The media shown in this article is not owned by Analytics Vidhya and is used at the Author’s discretion.

