How to monitor external information from third-party sites? With website scraping?

MyFirstPage
Joomla! Enthusiast
Posts: 217
Joined: Thu Feb 25, 2010 4:37 pm

Post by MyFirstPage » Tue Nov 12, 2019 8:52 pm

Hi,
A simple question: how does everyone here keep external data up to date?
For example, you write about things that happen in your area, and you constantly have to monitor press articles, articles from the different political parties, the different public transport companies, the road services, ...
How can this be done without an army of people?
Sure, I have seen many YouTube videos about website scraping, and others about how a "researcher" (aka hacker) scraped a news platform, analysed parts of the articles, and showed how "right wing" they are... :pop

I want to do something different: for example, just monitor opening hours, phone numbers, ... and look for interesting YouTube videos that I can embed.
The same goes for articles from political parties. In Vienna we have 23 districts with (maybe, I never counted them) 23 "local" branches of each party. With, let's say, 5 major political parties, that means roughly 115 different Facebook accounts to monitor for the buzzwords and phrases I am interested in.

As I mentioned, I have seen people who have done this before, but they refuse to say what kind of software they used. The only reply I got from the Internet was "learn a scripting language"...
Hopefully nobody here wants to bake a cake or cook a meal; those people would probably reply with "Learn how to become a professional chef"... Helpful? No...

 
pe7er
Joomla! Master
Posts: 22420
Joined: Thu Aug 18, 2005 8:55 pm
Location: Nijmegen, Netherlands

Post by pe7er » Tue Nov 12, 2019 10:02 pm

First I'd look for an API to use.
If there isn't an API available, I would use a scraper script to get the necessary information from the page.
Kind Regards,
Peter Martin, Global Moderator
https://db8.nl - Joomla specialist, Nijmegen, Nederland
Co-developer of d2 Content https://data2site.com/joomla-extensions/d2-content

Post by MyFirstPage » Tue Nov 12, 2019 11:14 pm

The former Austrian state-owned railway has an RSS feed: https://fahrplan.oebb.at/bin/help.exe/e ... _news_oebb& which could be interesting to read out.
In Germany, where trains often break down, there is a way to link some data together to get a database of broken things on their trains.
So I would need to get their data into a DB to work with later. Sadly, I can't find any information on how to scrape it.
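
An RSS feed is plain XML, so it can be read without any scraping. Below is a minimal sketch in Python (standard library only) that parses RSS 2.0 items and stores them in a SQLite database. The sample feed, the table name `feed_items` and the function names are made up for illustration; the real feed URL above is truncated, so a real script would first download the feed (for example with `urllib.request`) and pass the result to `parse_rss`.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be downloaded first.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example disruptions</title>
    <item>
      <title>Line 1: delays</title>
      <link>https://example.org/news/1</link>
      <pubDate>Tue, 12 Nov 2019 20:00:00 +0100</pubDate>
      <description>Signal failure near the main station.</description>
    </item>
    <item>
      <title>Line 2: replacement bus</title>
      <link>https://example.org/news/2</link>
      <pubDate>Tue, 12 Nov 2019 21:30:00 +0100</pubDate>
      <description>Track works until Friday.</description>
    </item>
  </channel>
</rss>"""


def parse_rss(xml_text):
    """Return a list of dicts, one per <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    return [
        {
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),
            "description": item.findtext("description", default=""),
        }
        for item in root.iter("item")
    ]


def store_items(conn, items):
    """Insert items, skipping links already stored. Returns the number of new rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS feed_items ("
        "link TEXT PRIMARY KEY, title TEXT, published TEXT, description TEXT)"
    )
    before = conn.execute("SELECT COUNT(*) FROM feed_items").fetchone()[0]
    conn.executemany(
        "INSERT OR IGNORE INTO feed_items (link, title, published, description) "
        "VALUES (:link, :title, :published, :description)",
        items,
    )
    conn.commit()
    after = conn.execute("SELECT COUNT(*) FROM feed_items").fetchone()[0]
    return after - before


conn = sqlite3.connect(":memory:")  # use a file path to keep the data
items = parse_rss(SAMPLE_RSS)
print(store_items(conn, items), "new items stored")
```

Because the link is the primary key, running the script again (for example from a daily cron job) only adds items that were not seen before.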

Post by pe7er » Wed Nov 13, 2019 12:22 am

MyFirstPage wrote:
Tue Nov 12, 2019 11:14 pm
So I would need to get their data into a DB to work with later. Sadly, I can't find any information on how to scrape it.
I've done a presentation "Scraping your HTML site to Joomla" at JandBeyond 2017.
Maybe you could use my code to create your own scraping script: https://github.com/pe7er/scrape-demo

Post by MyFirstPage » Wed Nov 13, 2019 10:06 am

Thanks. I "know" wget. The problem with it is that I can't, won't, and don't need to download a full site, just the information I am looking for. Or maybe just search for a phrase or keyword on the site.
Do you know https://imacros.net/ ?
When I asked in their forum how to do more sophisticated extraction, they said "JavaScript".
I replied "How do I write such a script?" and they didn't reply to me...

Let's say, for example, I need a script that checks whether a phone number has changed.
If it has changed, the script should write the new number into a local DB and send me an internal e-mail to notify me.
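
The check described above could be sketched roughly like this in Python (standard library only). Everything named here is an assumption for illustration: the phone pattern `PHONE_RE`, the table `phone_numbers`, the e-mail addresses and the example pages. The function receives the page's HTML as a string, so the download step (for example `urllib.request.urlopen`) is left out.

```python
import re
import sqlite3
from email.message import EmailMessage

# Assumed pattern: optional "+", then digits mixed with spaces, slashes or hyphens.
PHONE_RE = re.compile(r"\+?\d[\d\s/-]{5,}\d")


def extract_phone(html):
    """Return the first phone-like string in the page, or None."""
    text = re.sub(r"<[^>]+>", " ", html)  # crude tag stripping, enough for a sketch
    match = PHONE_RE.search(text)
    return " ".join(match.group(0).split()) if match else None


def check_for_change(conn, url, html):
    """Compare the number on the page with the stored one.

    Returns an EmailMessage describing the change (also on the very first
    run, when nothing is stored yet), or None if nothing changed.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS phone_numbers (url TEXT PRIMARY KEY, phone TEXT)"
    )
    new = extract_phone(html)
    row = conn.execute(
        "SELECT phone FROM phone_numbers WHERE url = ?", (url,)
    ).fetchone()
    old = row[0] if row else None
    if new is None or new == old:
        return None
    conn.execute(
        "INSERT OR REPLACE INTO phone_numbers (url, phone) VALUES (?, ?)", (url, new)
    )
    conn.commit()
    msg = EmailMessage()
    msg["Subject"] = f"Phone number changed on {url}"
    msg["From"] = "monitor@example.org"  # placeholder address
    msg["To"] = "me@example.org"         # placeholder address
    msg.set_content(f"Old: {old}\nNew: {new}")
    return msg


conn = sqlite3.connect(":memory:")  # use a file path to keep the data
url = "https://example.org/contact"
check_for_change(conn, url, "<p>Tel: <b>+43 1 234 56 78</b></p>")
msg = check_for_change(conn, url, "<p>Tel: <b>+43 1 999 00 11</b></p>")
print(msg.get_content())
```

The message is only built here, not sent; on a machine with a local mail server it could be handed to `smtplib.SMTP("localhost").send_message(msg)`.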

Another thing: I need to "parse" some open data from "their" formats into my format. In Austria we have 9 federal states that use different formats, so I also need to make their data uniform.
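
One common way to make such data uniform is a small mapping table per source that renames "their" fields to one shared schema. The sketch below is Python (standard library only); the source names `state_a` and `state_b`, the column names and the sample data are invented, since the real open-data formats are not shown here.

```python
import csv
import io
import json

# Hypothetical field mappings: source field name -> common field name.
FIELD_MAPS = {
    "state_a": {"Name": "name", "Telefon": "phone", "Adresse": "address"},
    "state_b": {"bezeichnung": "name", "tel": "phone", "anschrift": "address"},
}


def normalise(record, source):
    """Rename one record's fields to the common schema; unknown fields are dropped."""
    mapping = FIELD_MAPS[source]
    return {
        common: str(record.get(theirs) or "").strip()
        for theirs, common in mapping.items()
    }


def load_csv(text, source, delimiter=";"):
    """Read delimited text with a header row and normalise every row."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [normalise(row, source) for row in reader]


def load_json(text, source):
    """Read a JSON array of objects and normalise every object."""
    return [normalise(rec, source) for rec in json.loads(text)]


csv_text = "Name;Telefon;Adresse\nOffice A; 01 234 ;Main St 1\n"
json_text = '[{"bezeichnung": "Office B", "tel": "05 678", "anschrift": "Side St 2"}]'
rows = load_csv(csv_text, "state_a") + load_json(json_text, "state_b")
for row in rows:
    print(row)
```

Adding a tenth source then only means adding one more entry to `FIELD_MAPS` (and a loader if the file type is new), not a new script.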

Post by pe7er » Wed Nov 13, 2019 10:29 am

MyFirstPage wrote:
Wed Nov 13, 2019 10:06 am
I can't, won't, and don't need to download a full site, just the information I am looking for. Or maybe just search for a phrase or keyword on the site.
You have to look beyond the purpose of the scraper demo script that I shared in my previous post. You don't have to download the whole site, just the one page with the changing information that you need.

I've written a scraper that visits a one-day-sale website every day. It retrieves the daily offers (one page!) and stores each offer as a separate record in my MySQL database. When certain specified keywords appear in the product names, it sends me an email notification with a hyperlink to the original offer.
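
The keyword step can be illustrated with a few lines of Python. This is not the scraper described above, only a sketch of the matching idea; the offer data, the keywords and the function names are invented, and the offers are assumed to be already extracted into dicts with `name` and `url`.

```python
def matching_offers(offers, keywords):
    """Return the offers whose name contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    return [o for o in offers if any(k in o["name"].lower() for k in lowered)]


def build_notification(matches):
    """One line per matching offer: product name plus link to the original page."""
    return "\n".join(f"{o['name']}: {o['url']}" for o in matches)


offers = [
    {"name": "Wireless Headphones", "url": "https://example.org/offer/1"},
    {"name": "Garden Hose 20 m", "url": "https://example.org/offer/2"},
]
hits = matching_offers(offers, ["headphones", "laptop"])
if hits:
    print(build_notification(hits))
```

The same matching works for phrases in articles or posts: replace the product name with the text to search.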

Post by MyFirstPage » Wed Nov 13, 2019 10:54 am

I haven't looked into these things yet. Sadly, I have 100 other things to do, and I still have to watch your YouTube video.
When you say you scrape a webshop, do you also use the scraper to catalogue it? I mean, does it go through all the products and look for new stuff?

The "main" problem for me is to find something that can deal with "new" stuff: for example, when a menu item is added, it needs to be included too. Or highly dynamic sites that work with wrappers, so the divs and other markup change constantly.

I played around with iMacros, but it sucked a lot.
For example, here (if you use Firefox): view-source:http://wr.zahnaerztekammer.at/patientin ... enstsuche/
search for: "<div class="column small-12 medium-12 end">"
A good script could just read out the <tr></tr> rows and write them into a local DB, without traditional scraping. I don't know how to write such a script.
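
Reading out only the <tr></tr> rows is possible with the HTML parser that ships with Python. The sketch below collects the cell text of every table row and does not depend on the surrounding div classes, so it keeps working when the wrappers change. The class name `TableRowParser`, the function `extract_rows` and the sample HTML are invented; the real page is not reproduced here.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row, self._cell = [], None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the pieces and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.rows.append(self._row)
            self._row, self._cell = None, None


def extract_rows(html):
    """Return a list of rows, each a list of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    parser.close()
    return parser.rows


sample = (
    '<div class="column small-12 medium-12 end"><table>'
    "<tr><th>Name</th><th>Phone</th></tr>"
    "<tr><td>Dr. <b>Example</b></td><td> 01 234 </td></tr>"
    "</table></div>"
)
for row in extract_rows(sample):
    print(row)
```

Each row is a plain list of strings, so it can be written to a local database with one INSERT per row.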

Sadly, I could give 100 more examples like this.

 
