Prajwal S R on LinkedIn: Academy Accreditation - Azure Databricks Platform Architect • Prajwal S R… (2024)

Prajwal S R

Consultant at Capgemini | Azure Databricks | Azure Data Factory | Pyspark | SQL | Cloud Academy certified Azure Databricks Specialist | Microsoft Certified Azure Fundamentals | Ex-LTIMindtree


I am very glad to share that I have completed and received the Azure Databricks Platform Architect accreditation badge from the Databricks Academy. Databricks #azuredatabricks #platformarchitect

Academy Accreditation - Azure Databricks Platform Architect • Prajwal S R • Databricks Badges credentials.databricks.com


Tejaswini Paturi

Senior Manager, Agile Leadership, Product Vision, Strategic Planning, Operational Excellence and Customer Success

6d


Congratulations Prajwal S R


Smrithy C

Azure Developer@DATABEAT || Top DataEngineering Voice || Ex-Mindtree || Ex-Picktail || AZ-900/DP 900 /DP 600 Microsoft Certified

6d


Congrats! Prajwal S R


Padmaja Kuruba

Dr. Padmaja Kuruba

5d


Congrats!



More Relevant Posts

  • Prajwal S R


How can we add a new column to an existing DataFrame and populate it using the data already present in the other columns? For example, suppose we have employee details in a table with the columns F_Name, L_Name, Company and ID, and we want to add an Email column for all the employees. Consider the data below:

| F_Name  | L_Name    | Company | ID |
|---------|-----------|---------|----|
| Sachin  | Tendulkar | ABC     | 10 |
| Rahul   | Dravid    | BAC     | 19 |
| Virat   | Kohli     | XYZ     | 18 |
| Rohit   | Sharma    | ABC     | 45 |
| Jasprit | Bumrah    | BAC     | 93 |

To add a new Email column whose values are derived from the existing columns, we can use the concat function along with the df.withColumn() function available in PySpark. Below is an example code snippet:

from pyspark.sql.functions import lit, concat

df1 = df.withColumn("Email", concat("F_Name", lit("."), "ID", lit("@"), "Company", lit(".com")))

We have to import the lit and concat functions, and with the above command a new column is added to the DataFrame, with the email IDs populated from the data already available in it:

| F_Name  | L_Name    | Company | ID | Email              |
|---------|-----------|---------|----|--------------------|
| Sachin  | Tendulkar | ABC     | 10 | Sachin.10@ABC.com  |
| Rahul   | Dravid    | BAC     | 19 | Rahul.19@BAC.com   |
| Virat   | Kohli     | XYZ     | 18 | Virat.18@XYZ.com   |
| Rohit   | Sharma    | ABC     | 45 | Rohit.45@ABC.com   |
| Jasprit | Bumrah    | BAC     | 93 | Jasprit.93@BAC.com |

Please feel free to add any points about this in the comments, along with the methods you would use. A self-contained sketch follows. #azuredatabricks #dataengineer #databricks
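A minimal, runnable sketch of the approach above, assuming a Databricks/Spark environment; the SparkSession setup and sample data simply mirror the example table in the post.

from pyspark.sql import SparkSession
from pyspark.sql.functions import concat, lit

spark = SparkSession.builder.getOrCreate()

# Sample data matching the table in the post
data = [
    ("Sachin", "Tendulkar", "ABC", 10),
    ("Rahul", "Dravid", "BAC", 19),
    ("Virat", "Kohli", "XYZ", 18),
    ("Rohit", "Sharma", "ABC", 45),
    ("Jasprit", "Bumrah", "BAC", 93),
]
df = spark.createDataFrame(data, ["F_Name", "L_Name", "Company", "ID"])

# Build the Email column as <F_Name>.<ID>@<Company>.com
df1 = df.withColumn(
    "Email",
    concat("F_Name", lit("."), df["ID"].cast("string"), lit("@"), "Company", lit(".com")),
)
df1.show(truncate=False)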


  • Prajwal S R


In my previous post, I explained the different types of private endpoint sub-resources that we can create and their uses. In this post, we will discuss the different types of private endpoints we can create for Azure Databricks workspaces. Based on how the private endpoint is created, there are two types: the frontend endpoint and the backend endpoint. Even though there is no separate page or configuration for each of them, the type depends on which VNet we use to create the endpoint.

Frontend endpoint: This is the endpoint created for connections from users to the control plane. For example, the frontend endpoint ensures that requests from users (the Azure portal page, REST APIs, etc.) are connected securely. When we create a VNet-injected Databricks workspace, two subnets (private and public) are already created. Along with these two, we can create another subnet in the same VNet and use it to create a private endpoint. This is considered the frontend endpoint.

Backend endpoint: This is the endpoint created for connections between the data plane and the control plane, that is, from the workspace to the control plane. All cluster start-up requests, job run requests, etc. go through this private endpoint to connect securely. This endpoint can also be created to provide access to on-premises networks or other networks. For this, along with the VNet used to deploy the Databricks workspace, we can create another VNet, create a subnet in it, and use that subnet to create the endpoint. This is considered the backend endpoint. This VNet can be peered with the on-premises network or with other networks that should be allowed to connect securely.

Please feel free to add any points I may have missed. #azuredatabricks #privateendpoint #dataengineer #networking


  • Prajwal S R


When we get data in raw format, there is often a need to clean it and get it into the desired shape. It is important to find the null values and remove duplicate records, which also reduces the number of records fetched while querying the table. Below are sample queries in SQL and PySpark to find null records and to remove duplicate records.

Finding null values:

SELECT count_if(email IS NULL) FROM users;
SELECT count(*) FROM users WHERE email IS NULL;

from pyspark.sql.functions import col
usersDF = spark.read.table("users")
usersDF.selectExpr("count_if(email IS NULL)")
usersDF.where(col("email").isNull()).count()

Removing duplicate records:

CREATE OR REPLACE TEMP VIEW sample AS
SELECT user_id, timestamp, max(email) AS email_id, max(updated) AS max_updated
FROM users
WHERE user_id IS NOT NULL
GROUP BY user_id, timestamp;

SELECT count(*) FROM sample;

from pyspark.sql.functions import max
sampleDF = (usersDF
    .where(col("user_id").isNotNull())
    .groupBy("user_id", "timestamp")
    .agg(max("email").alias("email_id"),
         max("updated").alias("max_updated")))
sampleDF.count()

Let me know in the comments about the methods you have used to find null and duplicate records. A short, self-contained sketch with sample data follows. #azuredatabricks #sql #pyspark #dataengineer
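A minimal, self-contained sketch of the same checks on a tiny in-memory DataFrame; the sample data and column names are made up for illustration, and dropDuplicates is shown as one more common alternative.

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, max as max_

spark = SparkSession.builder.getOrCreate()

usersDF = spark.createDataFrame(
    [
        (1, "a@x.com", "2024-01-01"),
        (1, "a@x.com", "2024-02-01"),   # duplicate user_id
        (2, None, "2024-01-15"),        # null email
        (None, "c@x.com", "2024-01-20"),
    ],
    ["user_id", "email", "updated"],
)

# Count rows with a null email
print(usersDF.where(col("email").isNull()).count())

# Deduplicate: keep one row per non-null user_id, taking the max email/updated
dedupDF = (usersDF
    .where(col("user_id").isNotNull())
    .groupBy("user_id")
    .agg(max_("email").alias("email_id"), max_("updated").alias("max_updated")))

# Alternative: dropDuplicates keeps an arbitrary row per user_id
altDF = usersDF.where(col("user_id").isNotNull()).dropDuplicates(["user_id"])

dedupDF.show()
altDF.show()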


  • Prajwal S R


Once we start using Unity Catalog in our Azure Databricks account, we come across the different types of tables we can create: managed tables and external tables. It is important to know the difference between these types.

1. Managed tables: These tables are saved in the managed storage location, which is the location we provided while creating the metastore.

2. External tables: These tables are saved in an external location that we have registered. It can be either the exact external location or a nested folder under it.

In both cases, the metadata is stored in Unity Catalog, and there is no difference in the access permissions. When we drop a managed table, both the data and the metadata are deleted. With external tables, only the metadata in the workspace is deleted, and the data remains available in the external location.

Which type of table have you created, and which do you find better? Let me know your thoughts in the comments. A small SQL sketch of both is shown below. #azuredatabricks #databricks #tables #dataengineering
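A small sketch of both table types, assuming a Unity Catalog-enabled Databricks notebook where spark is predefined; the catalog, schema and storage path names (main.demo, the abfss URL) are placeholders, not real resources.

# Managed table: data lives in the metastore's managed storage location.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.employees_managed (
        emp_id INT,
        name   STRING
    )
""")

# External table: data lives in an external location we registered (or a folder under it).
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.demo.employees_external (
        emp_id INT,
        name   STRING
    )
    LOCATION 'abfss://data@mystorageaccount.dfs.core.windows.net/tables/employees'
""")

# Dropping the managed table removes data and metadata; dropping the external table
# removes only the metadata, while the files stay at the LOCATION path.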


  • Prajwal S R


One of the recent features added to the Azure Databricks service is the ability to create a Private Link connection for workspaces, ensuring that the connection is secure and goes only through the approved network. We can also connect securely from our on-premises networks using a transit VNet.

Private Link is a feature where we create a private endpoint for a resource; a private IP is assigned to the endpoint and is used for all connections. We can create private endpoints for a wide range of resources, and the sub-resource types available on the endpoint vary depending on the service we are creating it for. For example, we can create a private endpoint for ADLS with sub-resource types such as dfs, blob and file. Similarly, there are two types we can select for Azure Databricks workspaces: databricks_ui_api and browser_authentication.

1. browser_authentication: This endpoint type can be selected when we have multiple workspaces in the same region, where we can create one endpoint per region. Once it is created and connected to the network, all the authentication (SSO) requests for all the workspaces in that region go through this endpoint.

2. databricks_ui_api: This is the endpoint used to connect to the Databricks control plane, and also for connections to the other Azure resources. Each workspace must have a separate endpoint of this type. The network traffic for a Private Link connection between a transit VNet and the workspace control plane always traverses the Microsoft backbone network.

Please feel free to add any points I may have missed. I will post later about the different types of private endpoints we can create for Azure Databricks workspaces. #azuredatabricks #networking #privatelink


  • Prajwal S R


How can we access an ADLS resource from Azure Databricks workspaces, and which method is better to use?

If we are not using Unity Catalog in our environment, we can still connect to ADLS from the workspace using different methods. First of all, there are three main types of authentication available:

1. Service principal authentication (also called the OAuth method).
2. SAS key method.
3. Account key method.

All of the above methods have their own advantages and disadvantages. Bearing in mind the effort of managing keys and key rotation, many teams go for service principal (SPN) authentication. Even this method involves creating a client secret with an expiry date: when the secret expires, we must generate a new one and update it in the Spark config commands or in the secrets stored in Key Vault.

Once we have decided which authentication method to use, there are two access methods we can use with any of the above three authentication types:

1. Mounting method.
2. Direct access method.

Even though the mounting method is used by most users, it is not the recommended method, as it has been deprecated by the Databricks team. There are several reasons for deprecating it, such as:

A. A mount point created from one cluster can be accessed from any other cluster if the user knows the mount point name.
B. A mount point can also be deleted by any user who knows the mount point name and has access to any cluster.

So it is recommended to use the direct access method, where we avoid creating a mount point. As we will be using Spark config commands to access ADLS, we can use the points below to ensure the credentials are not visible to everyone (a config sketch follows this list):

1. Store the credentials in Key Vault and access them using the dbutils secrets command.
2. Use notebook ACLs to give access to a limited set of people.
3. Pass the Spark configs through the Advanced options tab on the cluster and enable cluster ACLs.

Feel free to add more points on this. #azuredatabricks #dataengineer #adls #spark
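A minimal sketch of the direct access method with a service principal (OAuth), assuming a Databricks notebook where spark and dbutils are predefined; the secret scope name, secret keys, storage account and container names are placeholders.

# Direct access to ADLS Gen2 with a service principal, no mount point.
storage_account = "mystorageaccount"          # placeholder storage account name
tenant_id     = dbutils.secrets.get("adb-secrets", "tenant-id")
client_id     = dbutils.secrets.get("adb-secrets", "sp-client-id")
client_secret = dbutils.secrets.get("adb-secrets", "sp-client-secret")

spark.conf.set(f"fs.azure.account.auth.type.{storage_account}.dfs.core.windows.net", "OAuth")
spark.conf.set(f"fs.azure.account.oauth.provider.type.{storage_account}.dfs.core.windows.net",
               "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider")
spark.conf.set(f"fs.azure.account.oauth2.client.id.{storage_account}.dfs.core.windows.net", client_id)
spark.conf.set(f"fs.azure.account.oauth2.client.secret.{storage_account}.dfs.core.windows.net", client_secret)
spark.conf.set(f"fs.azure.account.oauth2.client.endpoint.{storage_account}.dfs.core.windows.net",
               f"https://login.microsoftonline.com/{tenant_id}/oauth2/token")

# Read directly from the container path instead of a mount point
df = spark.read.parquet(f"abfss://raw@{storage_account}.dfs.core.windows.net/employees/")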


  • Prajwal S R


To calculate the Databricks usage cost, here is the formula:

Total cost for the Databricks service = VM cost + DBU cost
VM cost  = [Total Hours] x [No. of Instances] x [Linux VM Price]
DBU cost = [Total Hours] x [No. of Instances] x [DBUs per node] x [DBU Price per hour - Standard / Premium tier]

Here is an example of how Azure Databricks billing works. Depending on the type of workload your cluster runs, you are charged for either the Jobs Compute or the All-Purpose Compute workload. For example, if the cluster runs workloads triggered by the Databricks jobs scheduler, you are charged for the Jobs Compute workload. If your cluster runs interactive features such as ad-hoc commands, you are billed for the All-Purpose Compute workload.

If you run a Premium tier cluster for 100 hours in East US 2 with 10 DS13v2 instances, the billing for an All-Purpose Compute workload would be:
VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
DBU cost for All-Purpose Compute: 100 hours x 10 instances x 2 DBU per node x $0.55/DBU = $1,100
Total cost: $598 (VM cost) + $1,100 (DBU cost) = $1,698

For the same cluster running a Jobs Compute workload:
VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
DBU cost for Jobs Compute: 100 hours x 10 instances x 2 DBU per node x $0.30/DBU = $600
Total cost: $598 (VM cost) + $600 (DBU cost) = $1,198

For the same cluster running a Jobs Light Compute workload:
VM cost for 10 DS13v2 instances: 100 hours x 10 instances x $0.598/hour = $598
DBU cost for Jobs Light Compute: 100 hours x 10 instances x 2 DBU per node x $0.22/DBU = $440
Total cost: $598 (VM cost) + $440 (DBU cost) = $1,038

In addition to the VM and DBU charges, you may also be charged for bandwidth, managed disks, and storage. A small calculation sketch follows. #databricks #azuredatabricks #dataengineer #cost
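A small sketch of the formula above as a Python helper; the prices and DBU rates are just the example figures from this post, not authoritative pricing.

def databricks_cost(hours, instances, vm_price_per_hour, dbus_per_node, dbu_price):
    """Total cost = VM cost + DBU cost."""
    vm_cost = hours * instances * vm_price_per_hour
    dbu_cost = hours * instances * dbus_per_node * dbu_price
    return vm_cost + dbu_cost

# All-Purpose Compute example: 100 h, 10 x DS13v2, $0.598/h, 2 DBU/node, $0.55/DBU
print(databricks_cost(100, 10, 0.598, 2, 0.55))  # 1698.0
# Jobs Compute example at $0.30/DBU
print(databricks_cost(100, 10, 0.598, 2, 0.30))  # 1198.0
# Jobs Light Compute example at $0.22/DBU
print(databricks_cost(100, 10, 0.598, 2, 0.22))  # 1038.0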


  • Prajwal S R


    I’m happy to share that I’m starting a new position as Consultant at Capgemini!

