Big Data, the opportunity to transform information into knowledge

© Shutterstock
Nowadays we collect huge amounts of data, often called Big Data, a term that has become a trend. However, it is important to understand the context of this term and to be aware of what it signifies. Big Data, also referred to as massive or large-scale data, is difficult to manage with a programme such as Excel or Access. It requires more powerful and specialised software that can manage and exploit the data, such as Oracle, R or SAS, which can handle terabytes and even petabytes of data. These three programmes are abstract and difficult to use, and therefore mathematicians, statisticians and data mining specialists are usually the ones in charge of setting them up.


Big Data is gaining importance because of the huge amount of information being generated and the need to transform that information into knowledge.

In the pharmaceutical industry, most activities in the production process are managed by computer applications and devices with digital output, and the data they produce can reach the terabyte range for a midsize pharmaceutical company. There is no use in producing information if we cannot integrate and connect it. Most of the time, the lack of connection is due to the fact that the technology applications themselves are not able to exchange data. The devices installed in pharmaceutical companies may have the most advanced technological systems, but this does not necessarily mean a competitive advantage. To turn an investment in technology into earnings for the company, an inclusive design is needed, one that does not isolate the process to which the technology is added. In most cases where highly capable equipment is installed, the interaction between the information produced by this equipment and the sub-processes that depend on it, directly or indirectly, is not taken into account.


When this happens, the design cannot integrate and exchange information between the different technologies in use, and two immediate costs are incurred:

  1. The future cost of modifying the established infrastructure in order to complete the design.
  2. The cost of failing to exploit inter-process information. In manufacturing, it is important to study the connections between the different sub-processes in order to gain additional knowledge about the manufactured products. The information therefore needs to be interconnected across the processes.
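
As a minimal sketch of what interconnecting sub-process information could look like in practice, the Python example below links records from two sub-processes through a shared batch identifier. The sub-process names (granulation, compression), the field names and all values are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical records exported by two sub-processes; batch_id is the shared key.
granulation = [
    {"batch_id": "B001", "moisture_pct": 2.1},
    {"batch_id": "B002", "moisture_pct": 3.4},
]
compression = [
    {"batch_id": "B001", "hardness_n": 82},
    {"batch_id": "B002", "hardness_n": 64},
    {"batch_id": "B003", "hardness_n": 79},
]


def link_by_batch(**sources):
    """Merge records from several sub-processes into one view per batch."""
    linked = defaultdict(dict)
    for name, records in sources.items():
        for record in records:
            fields = {k: v for k, v in record.items() if k != "batch_id"}
            linked[record["batch_id"]][name] = fields
    return dict(linked)


batches = link_by_batch(granulation=granulation, compression=compression)

# Batches that lack data from one of the sub-processes expose integration gaps.
incomplete = sorted(b for b, parts in batches.items() if len(parts) < 2)

print(batches["B001"])
print(incomplete)
```

Once the records share a key, questions that span sub-processes (for example, how moisture relates to tablet hardness) can be asked of a single structure, and batches with missing data become visible.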

On the basis of these conclusions, a proper implementation of a knowledge structure based on a Big Data system should take the following points into account:

Having a detailed design of the different components that provide or could provide knowledge. These components can be resources (equipment, staff, materials), intangible assets, departments, standard operating procedures (PNT), manuals, KPIs, CCPs, etc.

Integrating the information generated by each component into a structure that allows quick access through different search criteria.
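
One possible way to picture such a structure is a small in-memory store that indexes every record under several criteria at once. The sketch below is only an illustration under that assumption: the `ComponentIndex` class, the chosen criteria and the sample records are hypothetical, and a real system would normally rely on a database rather than Python dictionaries.

```python
from collections import defaultdict


class ComponentIndex:
    """In-memory store that indexes each record under several search criteria."""

    def __init__(self, criteria):
        self.criteria = list(criteria)
        self.records = []
        # One inverted index per criterion: value -> set of record positions.
        self.indexes = {c: defaultdict(set) for c in self.criteria}

    def add(self, record):
        position = len(self.records)
        self.records.append(record)
        for criterion in self.criteria:
            if criterion in record:
                self.indexes[criterion][record[criterion]].add(position)

    def search(self, **filters):
        """Return the records that match every criterion=value pair given."""
        unknown = set(filters) - set(self.criteria)
        if unknown:
            raise KeyError(f"not indexed: {sorted(unknown)}")
        if not filters:
            return list(self.records)
        matches = [self.indexes[c].get(v, set()) for c, v in filters.items()]
        positions = set.intersection(*matches)
        return [self.records[p] for p in sorted(positions)]


# Hypothetical sample data.
index = ComponentIndex(["type", "department", "batch_id"])
index.add({"type": "equipment", "department": "production",
           "batch_id": "B001", "name": "tablet press"})
index.add({"type": "staff", "department": "production",
           "batch_id": "B001", "name": "operator"})
index.add({"type": "equipment", "department": "quality",
           "batch_id": "B002", "name": "HPLC"})

found = index.search(type="equipment", department="production")
print([record["name"] for record in found])
```

Because each criterion has its own index, a search combines precomputed sets of positions instead of scanning every record, which is what keeps access quick as the number of records grows.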

Large amounts of information are produced not only in the manufacturing process but also in the marketing, sales and distribution departments. Pfizer's subsidiary in Ecuador used Excel to record and analyse activity in these departments, but in the age of Big Data it becomes difficult to bring together every single datum from different sources in Excel. Pfizer therefore approached Noux (a company headquartered in Ecuador that works with SAP) and, with the new system, was able to manage the data overload and collect multiple flows of information in order to monitor its products and make better decisions.

In addition, in October 2014 the National Institutes of Health (NIH) announced that it would invest 32 million dollars to develop new strategies for analysing and exploiting the huge amount of data resulting from intensive biomedical research, an initiative known as Big Data to Knowledge (BD2K). The funds for BD2K come from the 27 NIH Institutes and Centres as well as from the NIH Common Fund. They cover four areas which together aim at the enrichment, accessibility and efficient use of Big Data in biomedical science.

Beyond the pharmaceutical industry and research, many fields and professional activities base a fundamental part of their work on managing large amounts of data, such as engineering, meteorology, police investigation, social networks and the financial system. All of them need to exploit gigabytes or even terabytes of information in short periods of time in order to extract knowledge in real time.


Once this is achieved, it will be possible to create knowledge from information and, with the proper methodology, to transform all this “raw” and “messy” information into intelligent and useful information.