What is Data Science?
As people tried to work out what big data is, the term "data science" began to circulate. Terms such as data science, data scientist and big data were not invented out of the blue; they are the result of a natural development, which went roughly as follows:
Technological progress in information systems continued at full speed. Generating data got easier every day. With fiber, network technologies can now move this data from one place to another at very high speeds. The cost of disk space for storing data has fallen to a very low level. The abundance of data-producing sources also deserves a mention. Information systems have become more and more widespread and have entered every part of human life. Data now comes from many sources: customer transactions, banking transactions, e-commerce transactions, product reviews, sensor and RFID data, electronic health records, and insurance reimbursement records. As a result, there is an abundance of data in the world. However, once producing, transmitting, processing and storing data became easy, other problems began to arise. We have a lot of data. So what? What are we going to do with all of it? We can't stash it away forever. Which data are we going to keep, and which are we going to throw away? And the real question is: how do we decide? At this point, the terms data science and data scientist began to emerge alongside big data technologies.
Until very recently, say before the flood of big data, the relational database was the most common type of database and met almost all needs. With that flood, however, relational databases struggled to cope with data that has the characteristics of big data (volume, velocity, variety). For such workloads, NoSQL databases and big data technologies built for horizontal scaling began to take the place of relational databases that cannot scale out. Horizontal scaling lets a cluster of servers grow to hundreds or even thousands of nodes.
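To make the idea of scaling out a little more concrete, here is a minimal sketch of spreading records across nodes by hashing their keys. The function names, the `customer-N` keys and the node counts are all made up for illustration, and this is not how any particular NoSQL product does it.

```python
import hashlib

def node_for_key(key: str, num_nodes: int) -> int:
    """Map a record key to a node index using a stable hash (illustrative scheme)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes

def partition(keys, num_nodes):
    """Group keys by the node that would store them."""
    shards = {node: [] for node in range(num_nodes)}
    for key in keys:
        shards[node_for_key(key, num_nodes)].append(key)
    return shards

# Hypothetical keys; the same data is spread over 4 nodes, then over 8.
keys = [f"customer-{i}" for i in range(1000)]
shards_small = partition(keys, 4)
shards_large = partition(keys, 8)

print({node: len(items) for node, items in shards_small.items()})
print({node: len(items) for node, items in shards_large.items()})
```

Adding nodes spreads the same keys more thinly, which is the point of scaling out. Plain modulo hashing moves most keys when the node count changes, so real systems tend to use more refined schemes such as consistent hashing.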
So what kind of information do we aim to extract from the data?
- We want useful information about the past, the present and the future. In other words, the time frame the information belongs to matters as much as whether the information is useful.
- When we come across observations that resemble our data but that we have never seen before, we want to be able to make sense of them in terms of that data. That is, we want to classify new, unknown observations into the categories we already know.
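Classifying a new observation into known categories can be sketched with a very simple method, a nearest-centroid classifier: average the known examples of each category, then assign a new point to the closest average. The data, the labels and the function names below are invented for illustration; real projects would normally use a dedicated library and a more careful method.

```python
from math import dist
from statistics import mean

def fit_centroids(samples):
    """samples is a list of (features, label); return the mean point per label."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }

def classify(centroids, features):
    """Return the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: dist(centroids[label], features))

# Hypothetical training data: two numeric features per customer.
training = [
    ((1.0, 1.2), "low_spender"),
    ((0.8, 1.0), "low_spender"),
    ((1.2, 0.9), "low_spender"),
    ((5.0, 4.8), "high_spender"),
    ((5.5, 5.1), "high_spender"),
    ((4.7, 5.3), "high_spender"),
]

centroids = fit_centroids(training)
prediction = classify(centroids, (4.9, 5.0))  # an observation not in the training data
print(prediction)
```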
These answers describe the objectives of data science. So how does data science reach these goals, and what does it draw on along the way?
Statistics: The discipline known as statistical learning theory expresses the problems we want to solve as statistical models and solves them by finding the optimal parameters of those models.
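As a small illustration of "finding the optimal parameters", the sketch below fits a straight line y = slope * x + intercept to a handful of points by ordinary least squares, using the standard closed-form solution. The numbers are made up, and a straight line is only one of many possible statistical models.

```python
from statistics import mean

def fit_line(xs, ys):
    """Return the (slope, intercept) that minimise the sum of squared errors."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept

def squared_error(xs, ys, slope, intercept):
    """Sum of squared differences between the observed and predicted values."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))

# Hypothetical observations.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
print(slope, intercept, squared_error(xs, ys, slope, intercept))
```

Any other choice of slope and intercept gives a larger squared error on these points, which is what "optimal" means here.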
Computer Science: Programming is as important a component as statistics. Data science essentially uses programming to describe to computers the algorithms that let them learn without being explicitly programmed. Of course, this is not the only thing we use programming languages for. The purposes for which we use them can be listed as follows:
- Pulling data from repositories, databases or files.
- Manipulating, cleansing and generating data.
- Visualizing the data.
- Extracting descriptive statistics by performing mathematical operations on the data.
- Expressing machine learning methods in code that computers can understand.
- Training our models with data.
- Transferring our models to production systems to serve the outside world and keeping our models alive in the production environment at all times.
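The first few items in this list can be sketched in a few lines: pull data from a source, clean it, and extract descriptive statistics. The CSV content, the column names and the helper functions below are invented for illustration; in practice the data would come from a file or a database rather than a string.

```python
import csv
import io
from statistics import mean, median

# Hypothetical raw data with one missing and one malformed amount.
RAW_CSV = """customer_id,amount
c1,120.50
c2,
c3,80.00
c4,not_a_number
c5,99.50
"""

def load_rows(text):
    """Pull rows from a CSV source (here an in-memory string)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_amounts(rows):
    """Keep only the amounts that parse as numbers; drop the rest."""
    amounts = []
    for row in rows:
        try:
            amounts.append(float(row["amount"]))
        except (TypeError, ValueError):
            continue
    return amounts

def describe(values):
    """Extract a few descriptive statistics."""
    return {
        "count": len(values),
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }

rows = load_rows(RAW_CSV)
amounts = clean_amounts(rows)
summary = describe(amounts)
print(summary)
```

Silently dropping bad rows is only one possible cleaning policy; depending on the field, it may be better to flag or impute them instead.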
Domain Knowledge: If you do not have expertise specific to the field you work in, you are at risk of being deceived by the data. There is no need to put it so dramatically, though. With simple logic, we can think of it like this: domain knowledge not only guides us on which data is useful, but also sheds light on questions of causality. Knowing which factors can lead to which consequences is, I can assure you, valuable beyond any data or method.