About

  • 2020: The first case of COVID-19 is reported in Kenya. Working shorter hours made me consider a career change.
  • 2021: resigned and applied for Master of Science in Data Science and Analytics (MSc DSA) in Strathmore University and got accepted but choose to self-teach. Learnt HTML, CSS, JavaScript, Python, Git and GitHub.
  • 2022: advanced to back-end web development. Learnt NodeJS/Express, MongoDB and PostgreSQL.
  • 2023: my interest in Data Science and Analytics is rekindled. Learnt SQL .Built a Python pharmacy locator web scraper.

To offer you more and better insights on your business's historical position, present performance and its going concern I use SQL, Python, Excel, Tableau and Microsoft Power BI.

Pharmacy Locator
Python Web Scraper

Photo by Gene Gallin on Unsplash

Pharmacy Locator is a Python automated tool that aids in collecting data of the 19,400 independent community pharmacies listed on the National Community Pharmacist Association's website.


Data Exploration of
Northwind Traders
with SQL

Photo by Andy Li on Unsplash

This is an indepth data exploration of the Microsoft's Northwind Traders database with SQL.

Blog

Photo by Gene Gallin on Unsplash

How to Build a Pharmacy Locator Python Web Scraper | Nov 1st, 2023

In year 2021, I embarked on a front-end web development self-taught journey. In 2022 I decided to advance to back-end web development. This is where I was exposed to databases and my interest in data analysis was sparked.On researching the different paths to this new field, I discovered Alex The Analyst on youtube and I 'joined' his Youtube Data Analysis Bootcamp. On completing his web scraping with python 'class', I decided to challenge myself to work on a personal project. I was adventurous enough to check web scraping work posted on UpWork and I found the task below.

I thought wow $100 to scrape a website, this must be easy.But, I was very wrong. Kindly allow me to take your through the following two weeks of my life as I worked on the three 'easy' steps as listed on the task.

Step One: Go to this website: https://ncpa.org/pharmacy-locator

Note that it indicates that their database contains 19,400 pharmacies. I took that information lightly until it was time to implement my project.

Please note that the user is required to input some information specifically City & State OR Zip Code, choose the radius in miles from a dropdown, choose service category from another dropdown and finally click on the search button. This takes us to the next step.

Step Two: Scrape from every zip code

On doing a quick search I quickly discovered that there were 41,704 ZIP codes ranging from 00501 to 99950. Please do not be fooled by the familiar word range as I was. I thought that all I needed to do was to simply find all the individual numbers within the ranges. I was wrong again. So being so very resourceful I looked for a website where I could scrape the ZIP codes, easy right?

I used BeautifulSoup to scrape the ranges from the website and saved the results in an excel worksheet. My intention was to use SQL to find the individual numbers within the ranges. I had previously learnt how to use Microsoft SQL server on Alex's class but on doing a quick search I realized that the function generate_series() would be the most efficient method to get the individual numbers. The generate_series() function in SQL server cannot be used with select clause but PostgreSQL allowed the function to be used with the select clause. So I downloaded PostgreSQL and googled how to use it on the fly. The process was not easy but I learnt the pros and cons of PostgreSQL and had a hands-on experience. I imported the excel worksheet to PostgreSQL which I found to be not as easy as it with SQL Server. The function was actually very easy to use and I got a list of the populated ZIP codes from the ranges. While testing the codes on the zip locator I realized that not all the derived numbers were valid as ZIP codes. On further research I found a ZIP codes database set online.

I decided to use the available database set to scrape pharmacies for every zip code. I quickly realized that since the task required user input BeautifulSoup would not be sufficient. But Selenium proved to be more useful. I used Selenium's documentation, imported the necessary functionalities and that leads us to the next step.

Step Three: Provide the available fields (pharmacy name, address, phone number, website, business hours, and services)

This was the most exciting part. With a background in web development, I found myself stuggling with how to read global variable inside a function while coding with python. How to handle timeout errors and hence preventing program crashes. The use of try except blocks instead of try catch blocks. I used time.sleep() function to automate the process of scraping and recording the pharmacy details to a csv file.

Thank you for reading this far. If you would like to see the source code, have a look here.


Elements

Text

This is bold and this is strong. This is italic and this is emphasized. This is superscript text and this is subscript text. This is underlined and this is code: for (;;) { ... }. Finally, this is a link.


Heading Level 2

Heading Level 3

Heading Level 4

Heading Level 5
Heading Level 6

Blockquote

Fringilla nisl. Donec accumsan interdum nisi, quis tincidunt felis sagittis eget tempus euismod. Vestibulum ante ipsum primis in faucibus vestibulum. Blandit adipiscing eu felis iaculis volutpat ac adipiscing accumsan faucibus. Vestibulum ante ipsum primis in faucibus lorem ipsum dolor sit amet nullam adipiscing eu felis.

Preformatted

i = 0;

while (!deck.isInOrder()) {
    print 'Iteration ' + i;
    deck.shuffle();
    i++;
}

print 'It took ' + i + ' iterations to sort the deck.';

Lists

Unordered

  • Dolor pulvinar etiam.
  • Sagittis adipiscing.
  • Felis enim feugiat.

Alternate

  • Dolor pulvinar etiam.
  • Sagittis adipiscing.
  • Felis enim feugiat.

Ordered

  1. Dolor pulvinar etiam.
  2. Etiam vel felis viverra.
  3. Felis enim feugiat.
  4. Dolor pulvinar etiam.
  5. Etiam vel felis lorem.
  6. Felis enim et feugiat.

Icons

Actions

Table

Default

Name Description Price
Item One Ante turpis integer aliquet porttitor. 29.99
Item Two Vis ac commodo adipiscing arcu aliquet. 19.99
Item Three Morbi faucibus arcu accumsan lorem. 29.99
Item Four Vitae integer tempus condimentum. 19.99
Item Five Ante turpis integer aliquet porttitor. 29.99
100.00

Alternate

Name Description Price
Item One Ante turpis integer aliquet porttitor. 29.99
Item Two Vis ac commodo adipiscing arcu aliquet. 19.99
Item Three Morbi faucibus arcu accumsan lorem. 29.99
Item Four Vitae integer tempus condimentum. 19.99
Item Five Ante turpis integer aliquet porttitor. 29.99
100.00

Buttons

  • Disabled
  • Disabled

Form