Data Cleaning Techniques: Learn Simple & Effective Ways To Clean Data

In this article, we will look at the main data cleaning techniques and how to use each of them to clean data effectively.

Top Data Cleaning Techniques to Learn
The following sections walk through the different data cleaning techniques.

Remove Duplicates
Duplicate entries become more likely when data is collected from many sources or scraped. People making mistakes while keying in information or filling out forms are another possible source of duplicates.

All duplicates will inevitably distort your data and make your analysis more difficult. When trying to visualize the data, they can also be distracting, so they should be removed as soon as possible.
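
As a rough illustration of removing duplicates, the sketch below keeps the first occurrence of each record and drops later copies. The function name `remove_duplicates`, the sample customers, and the choice of the email field as the matching key are assumptions made for this example, not part of any particular tool.

```python
def remove_duplicates(records, key_fields):
    """Return the records with later duplicates dropped (first occurrence wins)."""
    seen = set()
    unique = []
    for record in records:
        # Normalize the key so that stray spaces or capitals do not hide a duplicate.
        key = tuple(str(record[field]).strip().lower() for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique


# Hypothetical sample data: the second row repeats the first with different casing.
customers = [
    {"name": "Ann Lee", "email": "ann@example.com"},
    {"name": "Ann Lee", "email": "Ann@Example.com "},
    {"name": "Bo Chen", "email": "bo@example.com"},
]

deduplicated = remove_duplicates(customers, key_fields=["email"])
print(len(deduplicated))  # 2
```

Which fields identify a duplicate depends on the dataset; matching on too few fields can merge records that are genuinely different.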

Remove Irrelevant Data
If you’re trying to analyze something, irrelevant information will slow you down and make things more complicated. Before starting to clean the data, decide what matters for your analysis and what does not. When doing an age demographic study, for instance, it is not necessary to include clients’ email addresses.

There are various other elements that you would want to remove since they add nothing to your data. They include URLs, tracking codes, HTML tags, personally identifiable data, and excessive blank space between text.
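
A minimal sketch of stripping some of these elements from a text field is shown below. The regular expressions are deliberately simple assumptions for illustration; real HTML is better handled by a proper parser, and the name `strip_noise` is hypothetical.

```python
import re

# Simple patterns, assumed adequate for short snippets of text only.
TAG_PATTERN = re.compile(r"<[^>]+>")        # HTML tags such as <p> or </div>
URL_PATTERN = re.compile(r"https?://\S+")   # URLs, including tracking parameters
WHITESPACE_PATTERN = re.compile(r"\s+")     # runs of spaces, tabs, and newlines


def strip_noise(text):
    """Remove HTML tags and URLs, then collapse excessive blank space."""
    text = TAG_PATTERN.sub(" ", text)
    text = URL_PATTERN.sub(" ", text)
    return WHITESPACE_PATTERN.sub(" ", text).strip()


raw = "<p>Great   product!</p> See https://example.com/item?utm_source=mail   for more."
cleaned = strip_noise(raw)
print(cleaned)  # Great product! See for more.
```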

Standardize Capitalization
It is important to maintain uniformity in the text across your data. If capitalization is inconsistent, the same value may be classified as several different ones. Since capitalization might alter the meaning, it could also be problematic when translating before processing.

Text cleaning is an additional step in preparing data for processing by a computer model; this step is much simplified if all of the text is written in lowercase.
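
The effect of inconsistent capitalization can be seen in a small sketch like the one below, where the sample labels are invented for illustration. Before normalization the three spellings of "apple" would be counted as three separate categories.

```python
from collections import Counter

# Hypothetical category labels with inconsistent capitalization and spacing.
labels = ["Apple", "APPLE", "apple ", "Banana"]

# casefold() is a stricter form of lower() intended for case-insensitive matching.
normalized = [label.strip().casefold() for label in labels]

counts = Counter(normalized)
print(counts["apple"], counts["banana"])  # 3 1
```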

Convert Data Types
If you’re cleaning up your data, converting numbers is probably the most common task. Numbers are often stored as text, but computers need numeric data to be represented as numbers.

If numbers are stored as text, your analytical algorithms will be unable to apply mathematical operations to them, because strings are not treated as numbers. Dates saved in a textual format follow the same rules: they need to be converted into a proper date value in one consistent format. For instance, if you have the date written out as January 1, 2022, you should convert it to a standard form such as 01/01/2022.
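
The conversions described above might look like the following sketch. The helper names `to_number` and `to_date` are hypothetical, the thousands separator is assumed to be a comma, and the month name is assumed to be in English (parsing `%B` depends on the active locale).

```python
from datetime import datetime


def to_number(text):
    """Convert a numeric string such as '1,250.50' into a float."""
    return float(text.replace(",", "").strip())


def to_date(text, fmt="%B %d, %Y"):
    """Parse a written-out date such as 'January 1, 2022' into a date object."""
    return datetime.strptime(text, fmt).date()


amount = to_number("1,250.50")
start = to_date("January 1, 2022")

print(amount + 1)                  # arithmetic now works: 1251.5
print(start.strftime("%m/%d/%Y"))  # 01/01/2022
```

Once a date is stored as a date object, it can be printed in whichever format the rest of the dataset uses.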

Clear Formatting
Data that is heavily formatted can be difficult for machine learning algorithms to process. If you are compiling information from several sources, you may encounter a wide variety of file types and formatting styles, which can introduce inconsistencies and errors into your data.

Any pre-existing formatting should be removed before you begin working on your documents. This is typically a straightforward operation; programs like Excel and Google Sheets include an option to clear formatting.

Fix Errors
You’ll want to carefully eliminate mistakes from your data. Even simple errors can cause you to miss important insights hidden in your data. Performing a simple spell check can help catch some of them.

Data like email addresses might be rendered useless if they contain typos or unnecessary punctuation. It may also cause you to send email newsletters to those who have not requested them. Inconsistencies in formatting are another common source of error.

A column containing just US dollar amounts, for instance, would require a conversion of all other currency types into US dollars to maintain a uniform standard currency. This also holds for any other unit of measurement, be it grams, ounces, or anything else.
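
Standardizing units can be sketched as below for weights. Currency conversion works the same way but needs exchange rates, which are not given here, so this example sticks to mass. The table `GRAMS_PER_UNIT` and the function `to_grams` are names assumed for illustration.

```python
# Conversion factors to grams (1 ounce is 28.349523125 grams by definition).
GRAMS_PER_UNIT = {"g": 1.0, "kg": 1000.0, "oz": 28.349523125}


def to_grams(value, unit):
    """Convert a weight in the given unit to grams."""
    return value * GRAMS_PER_UNIT[unit.strip().lower()]


# Hypothetical column with mixed units.
weights = [(500, "g"), (1.2, "kg"), (16, "oz")]
in_grams = [round(to_grams(value, unit), 2) for value, unit in weights]
print(in_grams)
```

An unknown unit raises a `KeyError` here, which is usually preferable to silently passing an unconverted value through.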

Language Translation
You’ll want everything to be written in the same language so that your data is consistent. Also, most data analysis software is limited in its ability to process many languages because of the monolingual nature of the Natural Language Processing (NLP) models upon which it is based. In that case, you’ll have to do a complete translation into a single language.

Handle Missing Values
There are two possible approaches to dealing with missing values. You can either fill in (impute) the missing data or eliminate the observations that contain the missing value. Your decision should be guided by your analysis objectives and your intended use of the data.

If you simply eliminate observations with missing values, you may lose valuable insights; after all, you had a reason for collecting this data in the first place. It may be preferable to fill in the blanks by working out what should be entered into the relevant fields. If you cannot determine the value, you can always substitute “missing.” If it’s a number, you can enter “0” in the blank, keeping in mind that zeros will affect sums and averages. However, if too many values are missing to be useful, the entire section should be eliminated.
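
Both approaches can be sketched in a few lines. The helper names and the sample survey rows below are assumptions for illustration; which values count as "missing" (here `None` and blank strings) will vary by dataset.

```python
def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def fill_missing(rows, field, placeholder):
    """Return copies of the rows with missing values in `field` replaced."""
    return [
        {**row, field: placeholder if is_missing(row.get(field)) else row[field]}
        for row in rows
    ]


def drop_missing(rows, field):
    """Return only the rows that have a value in `field`."""
    return [row for row in rows if not is_missing(row.get(field))]


# Hypothetical survey data with gaps in the "city" field.
survey = [
    {"name": "Ann", "city": "Pune"},
    {"name": "Bo", "city": ""},
    {"name": "Cy", "city": None},
]

filled = fill_missing(survey, "city", "missing")
kept = drop_missing(survey, "city")
print(len(filled), len(kept))  # 3 1
```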

Conclusion
We have reached the end of the article, having discussed eight important data cleaning techniques that professionals should know. These techniques make it easier to work with data by removing the unwanted parts. If data and numbers are where you feel at ease, data science could be a good career path for you.


What is Stress and Strain Curve?

When studying solids and other materials, it is crucial to understand how they react when a force is applied. This helps students determine a material’s strength, deformation, and other parameters by calculating the forces acting on it. Stress and strain are the key quantities for finding these parameters, and this article provides a detailed guide to them.

What is Stress?
Stress is defined as the internal restoring force per unit area that develops in a material when an external force is applied. It can arise from external loads, uneven heating, and similar causes, and it can lead to permanent deformation.

Types of Stress
There are different types of stress that can be applied to a material, such as:

Compressive Stress
When a force acting on a body reduces the volume of that body, resulting in deformation, the stress is referred to as compressive stress.

Tensile Stress
When an external force applied per unit area of a material results in the stretching of that material, it is described as tensile stress.

What is Strain?
When a body is deformed by an external force applied in a particular direction, the amount of deformation relative to the body’s original dimensions is called strain. Strain does not have any dimensions, as it is a ratio that only describes the change in the shape or size of the object.

Types of Strain
Similar to stress, strain is also differentiated into Compressive Strain and Tensile Strain.

Compressive Strain
Compressive strain is defined as the deformation observed in an object when compressive stress acts on it. In this type of strain, the length of the material or object generally decreases.

Tensile Strain
Tensile strain is defined as the deformation observed in a body or material when tensile stress acts on it. In this type of strain, the length of the material increases.
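
The definitions above can be put into numbers with a short sketch. It assumes the usual textbook formulas, stress as force divided by cross-sectional area and strain as change in length divided by original length; the function names and the rod dimensions are made up for illustration.

```python
def stress(force_newtons, area_m2):
    """Stress = force per unit area, in pascals (N/m^2)."""
    return force_newtons / area_m2


def strain(original_length_m, new_length_m):
    """Strain = change in length / original length (dimensionless).

    A positive result is tensile strain; a negative result is compressive strain.
    """
    return (new_length_m - original_length_m) / original_length_m


# Hypothetical rod: 1000 N pulling on a 1 cm^2 cross-section, stretching 2 m to 2.001 m.
rod_stress = stress(1000.0, 0.0001)
rod_strain = strain(2.0, 2.001)

print(f"stress = {rod_stress:.2e} Pa")
print(f"strain = {rod_strain:.1e}")
```

Because strain is a ratio of two lengths, the units cancel, which is why it has no dimensions.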

Stress-Strain Curve
This graph explains how stress and strain act on a body with respect to each other, as well as the different regions formed on the graph.

The OA line represents the Proportional Limit, as it describes the region where the material or body obeys Hooke’s Law. This line can help students calculate Young’s Modulus, using the ratio of stress to strain.
Now, the AB line represents the Elastic Limit of the object, which means that after this point, the body does not regain its original shape or size when the acting force is removed.
The BC line describes the Yield Point. When force is applied to the material beyond this point, there is permanent deformation in the object, which cannot be reversed even if the force is removed.
Point D on the graph is the Ultimate Stress Point, which marks the maximum stress a material can endure. Beyond this point, students can observe the failure of the object.
E is the Fracture or Breaking Point, at which students can observe the complete failure of the object, regardless of whether the force is applied or removed.

Hooke’s Law
From the above sections, we have learned about the types of stress and strain, as well as the graphical representation of stress and strain on objects. Now let us talk about Hooke’s law, which plays an important role in helping us understand how stress and strain relate in an object when force is applied.

According to this principle, within the elastic limit of an object or material, the strain is directly proportional to the applied stress. For a spring, it is represented as,

F = –kx

F = Force (the minus sign shows that it acts opposite to the extension)

x = Extension of Length

k = Spring Constant
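
As a small worked example of the formula, the sketch below computes the restoring force of a spring and, for the stress-strain form of the law, Young’s Modulus as the ratio of stress to strain. The function names and all numeric values are assumptions chosen for illustration.

```python
def spring_force(spring_constant, extension):
    """Hooke's law for a spring: F = -k * x (force opposes the extension)."""
    return -spring_constant * extension


def youngs_modulus(stress_pa, strain_ratio):
    """Young's Modulus = stress / strain, valid only in the proportional region."""
    return stress_pa / strain_ratio


# Hypothetical spring: k = 200 N/m stretched by 0.05 m.
force = spring_force(200.0, 0.05)
print(f"restoring force = {force:.1f} N")

# Hypothetical material sample: 1e7 Pa of stress producing a strain of 5e-5.
modulus = youngs_modulus(1.0e7, 5.0e-5)
print(f"Young's Modulus = {modulus:.1e} Pa")
```

Both relations hold only while the material stays within its proportional limit; beyond that, stress and strain are no longer in a fixed ratio.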

In this article, we have explained the terms stress and strain, how they act, and their types. This will help students solve problems from these chapters and understand related subtopics in later chapters more easily. However, if you are still worried about how to cover a large number of complex topics and chapters in Physics, one solution is to join an online coaching platform such as Tutoroot, which offers cost-effective, interactive online classes.

CVs are outdated, Portfolio is the future

Who made the first CV in the world? Interestingly, it was the same person who conceptualized flying machines: Leonardo da Vinci. He wrote a CV more than 500 years ago to offer his services to the Duke of Milan. The famous painter highlighted his skills in building bridges, trenches, mines, weapons and sculpture in his CV. Then it became a trend. Artists used to prepare CVs to present to the lords of the countries they traveled to.

The Modern CV
In 1937, Napoleon Hill published his steps to success in the book “Think and Grow Rich”. One of the key steps was to prepare a killer resume. That is how modern CVs were formalized. After the advent of computers, the format and structure of CVs also became standardized to a large extent.

Before the liberalization of the Indian economy, people used to follow a “one career, one job” model. Hence there was no need to prepare a CV more than once in a career. After the liberalization in the 1990s, people started changing jobs. The model changed to “one career, multiple jobs”. Each job change required an updated CV. Hence the importance of the CV grew manifold. The entire recruitment business started running on CVs.

Making CV Effective
Apart from the candidate, the recruiter and the hiring manager are the primary consumers of a CV. To be effective, the shape and form of the CV need to be adjusted for these two key stakeholders. Here are some pointers candidates must remember to make their CV more effective.

Prepare CV like an Advertisement
Guess how many CVs come in for a position? On average, each job attracts 250 resumes. Of these, 88% are irrelevant. These resumes are screened within 2 to 5 hours, which at the fast end means each resume gets less than 30 seconds (250 resumes in 2 hours is about 29 seconds each). Hence, a CV must be prepared like a 30-second advertisement for the target audience.

Moderate the Length
When you search for something on Google, how often do you go beyond the first page of results? Research shows that the click-through rate for the second page is less than 1 per cent. That is why it is important to have all the key information of your CV on the first page itself.

Structure as per Eyeballs
Nielsen Norman Group found that when people look at a screen, they read content in an F pattern. The most attention goes to the first line, then people move vertically down and read with less attention, making eye heat maps in the shape of an F. Knowing this, candidates should structure their CVs so that the key highlights fall within the F map.

Customized for a Job
When Leonardo da Vinci wrote his resume for the Duke of Milan, he mentioned only those things which were important to the Duke at that time. He described what he could do for the Duke instead of listing all his achievements and skills. Candidates must customize their resume for the job instead of dumping everything they know about themselves.

The Need for Portfolio
As the world is going digital, some fundamentals are changing. These fundamental changes will pave the path for portfolios. Here are the important drivers for portfolios.

Multiple Jobs, Multiple Careers
Society is moving away from the concept of one career. More so, with the advent of the Gig economy, people have realized that they have multiple talents which they can put to use for making money. TED started promoting this idea. TED asks you to write your skill or expertise in your introduction instead of designation. If you see LinkedIn or Twitter profiles, you see multiple talents like |Author| Speaker| Business Leader| Inventor| Social Media Expert| Dancer| Standup Artist|. Can you afford to have one resume for all the talents?

Support your claims
When CV templates are freely available, people can copy content as well. Anyone can claim to have any skill. This has brought down the credibility of CVs. Hence recruiters need CVs backed by evidence. They prefer software developers to showcase their projects on platforms like GitHub, designers to showcase their portfolios on platforms like Dribbble, and experts to showcase their knowledge on platforms like Quora or to earn certifications by clearing skill assessments on LinkedIn.

Easily Searchable
As the world goes digital, new digital tools are being developed, and search is becoming more and more powerful. It is imperative for each job seeker to manage their own online presence with proper keywords and links to evidence of the claims made on the front page (the CV). One can make this happen by building their own website or web presence. This may include webpages, supporting projects, infographics and videos.

CVs in their current shape and form are going to be outdated. They will re-emerge as portfolios. The first movers will have a definite advantage, as digital inventory in the digital space is limited. This is the time to wake up if you do not want to appear on the second page of a Google search.