Düsseldorf, Germany

Hello, I'm

Viktor Hudkov

Experienced IT Professional with Knowledge in SQL, Python, Linux and Mentoring.

Certified by Google Cybersecurity Program.

About me

My proficiency in managing projects was demonstrated by providing new user guidelines for over 200 data categories. I conducted educational meetings and presentations on topics of PII Security, data optimization and data querying.

Beyond my technical skills, I'm experienced in self-management, critical thinking and mentorship. I am a swift and curious learner and a highly adaptable, trustworthy and reliable team member.

Berlin Bike Theft

April 2021 – September 2022

Bike thefts

Bike theft is a prevalent issue today. Being a bike rider myself, I've derived thought-provoking insights on bicycle theft in a major city like Berlin.

This analysis serves as a foundation for a more thorough case study, a base from where one can further decide in which direction to dig deeper.

Its goal is to showcase my skills and abilities in the realm of Data Transformation, Data Modeling, Data Analytics and Data Visualization.

The End Result: Visualization of the Data and Deriving Insights

Buro Engineers

Interactive Power BI Report

Once the data had passed all the levels of transformation and formatting, I created visualizations to derive insights from the analysis and to display compelling findings.

The first page displays theft statistics by crime location. The heat-map visual can be filtered by neighborhood and selected areas. A zoom function allows further analysis of a more precise position.

The second page presents theft numbers by the day of the month and the day of the week on which the crime occurred. Bars with values that exceed a specific limit are highlighted in blue, while the others remain gray. This avoids overloading users with information and quickly draws attention to the important numbers.
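The highlighting rule described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the Power BI conditional formatting itself; the function name, the limit and the sample counts are hypothetical.

```python
def bar_colors(values, limit, highlight="blue", muted="gray"):
    """Return one color per bar: highlight only values above the limit."""
    return [highlight if value > limit else muted for value in values]


# Hypothetical theft counts per weekday (Monday to Sunday), not real figures.
thefts_per_weekday = [120, 95, 90, 88, 101, 70, 65]
colors = bar_colors(thefts_per_weekday, limit=100)
```

Only the bars above the limit stand out, which keeps the chart readable at a glance.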

The third page shows the relationship between bike theft and the hour of the day. Additionally, it shows the average time people wait before reporting the loss to the police and the length of time the bike was left unattended.

April 2018 – March 2021

Data Preparation

I obtained the datasets from public and government sources, including:
The official portal for European data
Berlin Open Data

The first major dataset contains police reports on bike thefts. The second describes city area codes and urban development zones.
Together they are used to analyze crime activity by location in Berlin.

Once the data was acquired, I imported the entries into desktop SQL Server Management Studio using a Python connector script that I developed. The script can upload numerous CSV files at once, which automated the import process.
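A minimal sketch of such a bulk import is shown below. It is not the original connector script: to stay self-contained it uses Python's built-in sqlite3 module as a stand-in for SQL Server, and the function name and the one-table-per-file convention are assumptions made for illustration.

```python
import csv
import sqlite3
from pathlib import Path


def import_csv_folder(conn, folder):
    """Load every *.csv file in `folder` into a table named after the file."""
    loaded = []
    for path in sorted(Path(folder).glob("*.csv")):
        with path.open(newline="", encoding="utf-8") as handle:
            reader = csv.reader(handle)
            header = next(reader)
            # Skip malformed rows whose length does not match the header.
            rows = [row for row in reader if len(row) == len(header)]
        table = path.stem
        # Every column is stored as TEXT; typing happens later in SQL.
        columns = ", ".join(f'"{name}" TEXT' for name in header)
        placeholders = ", ".join("?" for _ in header)
        conn.execute(f'DROP TABLE IF EXISTS "{table}"')
        conn.execute(f'CREATE TABLE "{table}" ({columns})')
        conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', rows)
        loaded.append(table)
    conn.commit()
    return loaded
```

Calling import_csv_folder(sqlite3.connect(":memory:"), "some_folder") returns the list of created table names. For a real SQL Server target, the connection object and the column types would need to be adapted.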

Since the data from the official sources was in German, I translated the CSV files using Python scripts and the Google Translate API.
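The translation step could look roughly like the sketch below. The actual Google Translate API call is deliberately left out: the translate argument is any callable that maps a German string to English, and the glossary-based stub here is a hypothetical placeholder used only to make the example runnable offline.

```python
import csv
import io


def translate_csv(text, translate):
    """Translate every cell of CSV text, calling `translate` once per distinct value."""
    cache = {}

    def lookup(cell):
        # Cache results so repeated values do not trigger repeated API calls.
        if cell not in cache:
            cache[cell] = translate(cell)
        return cache[cell]

    output = io.StringIO()
    writer = csv.writer(output, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        writer.writerow([lookup(cell) for cell in row])
    return output.getvalue()


# Hypothetical stand-in for a real translation client.
GLOSSARY = {"Bezirk": "district", "Art": "type", "Herrenfahrrad": "men's bike"}


def stub_translate(cell):
    return GLOSSARY.get(cell, cell)


sample = "Bezirk,Art\nMitte,Herrenfahrrad\n"
translated = translate_csv(sample, stub_translate)
```

Passing the translator in as a parameter keeps the CSV handling testable without network access; a real client would replace stub_translate.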

Data Transformation and Data Modeling

Once all the necessary data was translated and imported into SQL Server Management Studio, I formatted and transformed it for the final analysis. First, I separated the data into two main tables, deciding which columns to keep and which to remove.

I also renamed and formatted particular columns for better readability and analysis.

Two tables are joined via Location ID values that I've configured to be present in both datasets.
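The join on a shared Location ID can be illustrated as follows. This is a sketch under assumptions: the table names, column names and rows are hypothetical, and an in-memory SQLite database stands in for SQL Server so the example runs on its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE theft_stats (theft_id INTEGER, location_id TEXT, bike_type TEXT);
    CREATE TABLE location_description (location_id TEXT, neighborhood TEXT);
    -- Hypothetical sample rows, not taken from the real datasets.
    INSERT INTO theft_stats VALUES
        (1, 'A01', 'Men''s Bike'),
        (2, 'A02', 'Cargo Bike'),
        (3, 'A01', 'Women''s Bike');
    INSERT INTO location_description VALUES
        ('A01', 'Kreuzberg'),
        ('A02', 'Wilmersdorf');
    """
)

# Join both tables on the shared key and count thefts per neighborhood.
thefts_by_neighborhood = conn.execute(
    """
    SELECT l.neighborhood, COUNT(*) AS thefts
    FROM theft_stats AS t
    JOIN location_description AS l ON l.location_id = t.location_id
    GROUP BY l.neighborhood
    ORDER BY thefts DESC, l.neighborhood
    """
).fetchall()
```

Because both tables carry the same location_id values, each theft record picks up its neighborhood name without duplicating location details in the theft table.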

Git codes

Importing the Tables into Power BI

After the initial data modeling and transformation, I had the primary datasets ready to be imported into Power BI for further formatting and analysis.

I added the tables to Power BI and created a simple data model built around the Bike Theft Stats and Location Description tables, related through Berlin city area code IDs.

I applied additional formatting through the Power BI interface to make the entries better suited for analysis and visualization. As a next step, I created additional calculated columns and measures using DAX scripts and Power BI functionality.
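The kind of logic behind such calculated columns can be sketched in Python. This is not the DAX used in the report; the field names and the two sample records are hypothetical and only show how the unattended period, the reporting delay and the hour of day might be derived from timestamps.

```python
from datetime import datetime
from statistics import mean


def add_derived_fields(record):
    """Add unattended duration, reporting delay and hour of day to one record."""
    left_at = datetime.fromisoformat(record["left_at"])
    found_missing_at = datetime.fromisoformat(record["found_missing_at"])
    reported_at = datetime.fromisoformat(record["reported_at"])
    return {
        **record,
        "unattended_hours": (found_missing_at - left_at).total_seconds() / 3600,
        # Whole days between noticing the loss and reporting it.
        "report_delay_days": (reported_at - found_missing_at).days,
        "theft_hour": left_at.hour,
    }


# Hypothetical records, not taken from the police dataset.
records = [
    {"left_at": "2023-01-09T01:00", "found_missing_at": "2023-01-09T08:00",
     "reported_at": "2023-01-12T10:00"},
    {"left_at": "2023-01-10T11:30", "found_missing_at": "2023-01-10T13:30",
     "reported_at": "2023-01-19T09:00"},
]
enriched = [add_derived_fields(record) for record in records]
average_delay = mean(record["report_delay_days"] for record in enriched)
```

An aggregate such as average_delay corresponds to what a measure would compute over the whole table, while the per-record fields play the role of calculated columns.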

Insights:

• Total thefts registered from December 2022 to April 2023: 5447. Total financial damage caused by these crimes: 6.5M EUR.

• One might think that once a bike has been left unattended for a long period of time, it is highly likely to be stolen, yet the data says that it is not the case. The determining factors were rather the hour of the day and the location where bikes were parked.

• The peak hour for bike thefts turned out to be between 1 and 2 a.m. Interestingly, the second most common theft hour is between 11 a.m. and 12 p.m. It could mean that people leave their bikes unsecured and out of sight around lunchtime, which contributes to the midday theft statistics.

• In a few cases the wait before reporting the crime to the police was quite long; some outliers exceeded a year. It is rather strange that people still decide to go to the police after such a long time. The reason could be potential social benefits or insurance payments, yet the circumstances are individual for every bike loss.

• The most common day for a theft was Monday (the night from Sunday to Monday). Not a great surprise, since that night is usually reserved for a good sleep before the start of the work week, leaving fewer people out in the streets.

• The most affected neighborhoods were Wilmersdorf, Kreuzberg, Lichtenberg and Prenzlauer Berg. Overall, these are mostly central areas of the city.

• The number one target for thefts was the Men's Bike. The fewest stolen units were Cargo Bikes (most likely because they are scarcer on the streets than other bike types).

• The average wait before reporting a bike theft was 6 days, which shows that people usually do not rush to the police immediately. The reason could be that they do not have time to start the process right away during a busy work week, or that they first try to recover their bikes themselves.

That sums up my Data Analysis Project on the topic of Bike Theft in Berlin.

Cyber-Security Breaches

April 2021 – September 2022

With the ever-increasing digitalization of our world, cyber-security is rapidly growing in importance.

Hacking methods improve by the hour, while companies devote more resources to winning this technology race and keeping their data intact.

With that in mind, I analyzed foundational metrics on major data breaches from 2004 to 2022 to derive valuable insights and prepare the data for further cyber-security analysis.

The Final Report: Data Visualization and Analysis

Interactive Power BI Report

A simple yet effective dashboard emerged from all the derived data on security breaches. The dashboard is built around three main visualizations:

1. Line chart - dynamics of compromised records and overall breach occurrences throughout the years (from 2004 to 2022),

2. Bar chart - entities that have been affected, sorted in descending order by their compromised record count,

3. Pie chart - distribution of security incidents categorized by the type (e.g. hacker attack, accidental exposure, etc.).

On the top left, two live-refreshing cards display the total number of cases and the sum of compromised records.

The top right section lets the user filter by organization type to narrow down the information (e.g. only governmental data loss, social network organizations, etc.).

April 2018 – March 2021

Data Preparation

I started by obtaining an open dataset on security breaches across major organizations of various types (source link).

Since the original data was not ideally formatted for analysis, I performed a series of tests and data cleansing activities. I also edited the names and order of the columns for better readability and analysis.

The first data cleaning issue to overcome was the NULL values detected in two significant columns: the method (type) of the data breach and the number of records compromised in the leak.

Since the number of compromised records is numeric data, it was acceptable to keep the NULL values as they are. In this particular case it is better not to fill in the missing values artificially from averages or medians, so as not to skew the trends in a misleading direction.

NULL values related to the method of data breach were converted into a string - "unknown", for better readability and further analysis.
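The two NULL-handling decisions above can be sketched in Python. The original work was done in SQL; the column names, function name and sample rows below are hypothetical and only illustrate the rule: a missing method becomes "unknown", while a missing record count stays empty instead of being imputed.

```python
def clean_breach_row(row):
    """Label a missing breach method as 'unknown'; keep missing counts as None."""
    method = (row.get("method") or "").strip()
    records = row.get("records")
    return {
        **row,
        "method": method if method else "unknown",
        # No imputation: a missing count stays None rather than an average.
        "records": int(records) if records not in (None, "") else None,
    }


# Hypothetical rows, not taken from the real dataset.
raw_rows = [
    {"entity": "Org A", "method": "hacked", "records": "1500"},
    {"entity": "Org B", "method": None, "records": None},
    {"entity": "Org C", "method": "  ", "records": ""},
]
cleaned = [clean_breach_row(row) for row in raw_rows]

# Aggregations simply skip the missing counts.
known_total = sum(row["records"] for row in cleaned if row["records"] is not None)
```

Keeping the numeric gaps explicit means totals and trends reflect only reported figures, while the "unknown" label keeps every incident visible in category breakdowns.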

Git codes

Further Data Cleaning and Formatting

The next challenge was to bring a variety of synonymous and duplicate values into the same format to get a more precise analysis of the data.

Using my industry knowledge, I consolidated the breach types and organization types into more concise and accurate forms.

As a result, the number of DISTINCT types of organization went down from 70 to 41. For breach types, the number of DISTINCT values went from 25 to 8. It was a combined effort of SQL scripts and Microsoft Power BI functionality.

Reducing the number of distinct category values in a dataset leads to more precise and efficient analysis and to more accurate insights and aggregations.
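A mapping-based consolidation of this kind might look like the sketch below. The actual cleanup combined SQL scripts and Power BI; the mapping entries and sample values here are hypothetical and do not reproduce the real category lists.

```python
# Hypothetical synonym map: each raw spelling points to one canonical label.
CANONICAL_METHOD = {
    "hacked": "hacked",
    "hacking": "hacked",
    "hacker attack": "hacked",
    "accidentally published": "accidental exposure",
    "accidentally exposed": "accidental exposure",
}


def normalize(value, mapping):
    """Lowercase, collapse whitespace, then map synonyms to one canonical form."""
    key = " ".join(value.lower().split())
    return mapping.get(key, key)


raw_methods = [
    "Hacked",
    "hacking",
    " Hacker  attack",
    "Accidentally published",
    "accidentally exposed",
    "lost device",
]
normalized = [normalize(value, CANONICAL_METHOD) for value in raw_methods]

distinct_before = len(set(raw_methods))
distinct_after = len(set(normalized))
```

Comparing the distinct counts before and after gives a quick check that the mapping actually merged the variants, in the same spirit as the DISTINCT counts reported above.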

Importing the Data into Power BI

Once all the data was prepared in SQL Server Management Studio, I imported the processed entries into Microsoft Power BI.

A few further transformation and formatting steps rounded off the final dataset, which is now ready for analysis and report building.

Insights:

• Most data leaks happen due to hacker attacks (55% of the whole pool). This underlines the importance of implementing up-to-date cyber-security measures for organizations of any type, especially in government and health care. This trend is likely to grow with the further digitalization of our world.

• The top 4 organization types for data breaches are: web services, healthcare institutions, finance and banking firms, and governmental establishments.

• The clear outlier with the highest number of compromised records is the web service Yahoo. Over 3 billion records related to user account data were exposed due to a hacker attack.

• The overall trend for data breaches and cyber-attacks over the years is rather volatile, which reflects an unstable race between the newest technologies used by attackers and the continuous improvement of security measures by companies of all types.


I look forward to telling you more. Contact me below.

Work Process (gallery)