Monday, January 09, 2023

The wonder of a Database and SQL.

They say the wonders of the world are African Pyramids and Indian Rivers... But no Miracle was equal to when I discovered SQL...


Suddenly I didnt have to mess with type-defs and linked lists anymore...

Suddenly I didnt have to worry about working-memory space anymore...


SQL allowed me to 

SELECT exactly what I needed 

FROM tables 

WHERE I could JOIN, link or combine the data, 

AND I could filter on conditions, and also immediately

ORDER it BY whatever I fancied.


I could even pre-format it in the select-list. My poor FORTRAN program at the time became a PRO*FORTRAN program and it lost some 50% of its 10.000 lines...(and pre/compiled/linked a lot faster).

We used to open flat-file data-sets with WMO (weather) data....

Suddenly that database did more than half the work for me.


That was a Good Time. It felt a bit as "on credit of the RDBMS", but it was Heaven and it lasted forever. 30 years later, I still use SQL. 

SQL!  





 


Monday, January 02, 2023

Operating and Diagnosing an RDBMS.

 This is one of those "Back to Basic" posts, but one with a good number of years and projects of Experience written into it.


I’ve been playing a little with, Postgres (again) and Yugabyte (link), notably the explain-plan of queries. Of course, I am looking with the biased eyes of an oracle-dba, and that comes with some prior knowledge and possibly a strong personal preference from my education in the big-red environment.


Several findings come to mind, here are the 4 items I think you need to be able to properly operate or diagnose an RDBMS:


  1. Know The Concepts, and have a mental picture of your RDBMS.
  2. SQL-access to ALL relevant information. 
  3. Measure Effort (and time)
  4. Establish “Normality”, your reference, your benchmark.

Anyone working with Databases, RDBMSes, especially as troubleshooter (“Consultant” if you wear a tie), should be aware of these items, and have them Mastered to some degree.



Let me elaborate on those..



re-1, The Concept. You need a mental model, a “concept description” of the underlying layers to be able to understand your RDBMS. For something as simple as :


select ename from emp where id = :emp_id ;


In this case,  you should realise that this query needs to be parsed and executed (different stages) and the execution-phase will traverse various “objects”. The RDBMS needs to use its knowledge of the emp-table, and decide to either scan the table or use an index if available. This Also means the RDBMS will have to read data, from memory or disk. And if a query is too slow, or not doing what you want, you need the background information to diagnose and fix your problem.

Hence you need some conceptual knowledge of your RDBMS.



re-2: SQL-access to all relevant is strongly recommended.  Am I biased in this? It is not just me: re-read the 12-rules of Codd! 

The best examples are the catalog-tables that expose how your RDBMS actually stores its objects. But you can dig much deeper using things like pg_stat_statements or by doing explain-plans of the query.


With the knowledge of the “concepts” (see above), you should be able to search for, and query, the relevant tables and views from the RDBMS to help you diagnose and fix your problem.


As the ex-oracle-DBA, I know I can use :

v$sql : for basic info about the SQL.

v$sql_plan : which steps were included in the explain plan 

v$sql_bind_capture : I can even find recent values of :emp_id.


With so much information available, I can even re-create some of the executions of the statement above (future link to demo...)


I want to stress the point of using the SQL interface here, over, for example, using dtrace or the examining+grepping of log- or tracefiles. In many circumstances, you will not have access to the backend where those files are. So apart from the “Codd-Correctness" of exposing all data via SQL, there is the practical need for an RDBMS to expose its elements and its workings as tables/views/functions.

(Note: this is something that may pose a problem for a multi-node system, a clustered system, or a serverless RDBMS : the work is done in many places, and the SQL-front-end may not have easy access to all the components, Yuga, Cockroach, pay attention here).



re-3 Measure Effort (and time). Don't make the mistake of just keeping a stopwatch next to your system.  Any complex IT system is notorious for inconsistent response times. It is relevant to measure the (elapsed) time, but that time is only really helpful if you also know “what happened” in that time. For example, a select query may be slow on the first retrieval, but faster on the next due to caching. Or an individual full-table-scan query may be fast when testing just because your test-data is cached and the system is idle, whereas under higher load, that response will slow down because the in-efficiency of the full-scan quickly deteriorates your system performance. Your hardware- or cloud-vendor will smile at you, though, as you will need more of his resources.

(this topic is worth a separate blog, sometime...)

(future: link to example...)



re-4 Establish what is "Normal”, of what can you expect from a certain system. This takes some time to get to know your environment, some time for logical-reasoning, and some experience to recognise if your observations are reasonable or not. What was “fast storage” in 1995, is now considered abysmal, and what was considered slow-network in 2005 is now considered Fast due to the system being half a continent away in some cloud-region.

I tend to look for typical elapsed times (my “normal” is : better than 1-10 milliseconds per SQL), and used-effort per sql (cost or block-reads per SQL) and some overall system metrics: percentage idle of the CPU should typically be 30% or better. But for some systems, and idle of near-zero can be the norm and that can be fine as well, as long as the users are happy and no degradation occurs.


The best examples I know for “normality” come from the Oracle-world: I can easily recognise a quiet looking OEM-graph, or a “regular looking” AWR report for databases that I am familiar with. 


And an anecdote about "normality": in my view, 5ms is a perfectly “normal” response-time for an employee lookup (actually a bit slow in 2022, but hey: blame network- and storage-latency for some of it…). Now, If a system has a legitimate need to do lookups of 1000 employee-records per second, then a 5ms lookup will need at least 50+ CPUs to do so. There is no magic bullet to fix this on a 4-cpu system. However, if the 1000-lookups are out of touch with the actual 2 ppl working on it, something else is wrong.



Why all this...


Well, I have some reflexions on this “basic stuff”, and some wishful thinking.


With single-server RDBMS, like conventional ms-sql, mySQL, Oracle or Postgres, we learned how they were built, most experienced DBAs know the concepts of their systems. And we can diagnose them and reason out where errors or bottlenecks are located. In the case of Oracle (which I am most familiar with), we can do that using purely SQL (e.g. I can diagnose a system by using only SQL-queries and the scritps that me and others have developed for the purpose of diagnosing).


Also, for those “known” systems, we have enough experience to say if something is “normal” or not (e.g. 2ms disk-response is probably normal, slower raises questions, faster shows good SSDs or efficient use us Cache).


For the newer systems, and notably the “serverless” systems we are in the process of learning this.

And if I am allowed some pedantic advice to to the builders of those systems: Help us to find the 4 items mentioned. 



Here is what makers/vendors of an RDBMS need to think about:


Help us at item1, Concepts, by explaining your “concepts” clearly and in some detail. Documentation. 


Help us at item2, SQL-access, by making All Information accessible via SQL (yes, I know this is complicated if systems span multiple nodes)


Help us at item3, Measure Effort, by opening up the “effort underneath” via SQL (views). Good example is the Oracle “wait interface” where 99% of “time spent” by the system can be attributed to activity using views on v$system_event and v$session_event.

 

Help us at item4, benchmarking Normality, by allowing easy reporting and possibly adding some statistics-info to the RDBMS.



(Disclaimer: disillusioned-old-oracle-DBA here…)


It took Oracle some 10-15 years (oracle version 6 and 7, around 1994) to stabilise their concept, document it in a “concepts guide” and to seriously open up and publish the v$ views, although the original bstat/estat scripts go back further. I’m expecting the new breed of RDBMSes, especially the ones that are Open Source and claim to be Planet-Scalable, to do that a little speedier.


TBC...


Wednesday, September 28, 2022

Plans for Autumn 2022 - Part 2.

After a Successful Ride-Around to visit POUG, visit an Austria-Meetup, visit Friends all over, and to attend the Massively Large DOAG-Konferenz-und-Ausstellung, here is Part-2 of the Autumn-travel plan:



This time I want to visit :

Lux-OUG conference on 05 Oct.  (agenda)

JavaCro in Rovinj, starting 09 Oct.

HrOUG in Rovinj, starting 11 Oct.

And I am hoping to meet various Friends who are willing to provide coffee on the way.


Here are the Logos + links 

(because organizers love to see their logon on blogs and announcements...)


Luxembourg:



JavaCro:



HrOUG:



Note: The two Croatian Conferences are deliberately held in 1 location and in the same week: some speakers/attendees (and the organizers) will be Very Exhausted after a full week of those Famous Croatian Conferences...

Furthermore, I hope the Weather isn't too bad because I nearly got to DOAG late due to some early snow in Austria...



Later this year, in December, there is also the UK-OUG conference, and I will probably go there by airplane, because "December".

Hope to see (some of you) at one or more of these events! 





Sunday, August 28, 2022

Plans for Autumn 2022

 That season is approaching: Autumn Conferences ! 

And here is The Map with the first half of "The Plan":



Here is my list of Conferences:

POUG, Wroclaw - 09-10 Sept. Always proud to be selected to speak there.

PG-Day Austria, Vienna - 16 Sept. Let's go visit and meet ppl.

DOAG, Nuremberg - 19-24 Sept. The Big Event, comes early this ear.

JavaCro + HROUG - 09-14 Oct. Two conferences in 1 week in Rovindj.

And there will be the UKOUG "Breakthrough" event in Birmingham in December of course. 01-02 Dec.


Map above is the approximate route-plan for the first part: POUG, PGday, and DOAG, with some visits to friends mixed in.

Main reason for this post is of course, as always, that I'm looking out for coffee, wifi, a place to connect, or just interesting places to visit on the road.

(links + logos? Do I really need to post links and banners to make my content more relevant? Do I care ? ... )




Tuesday, April 26, 2022

Plans for Spring 2022

 Because Covid is over and Conferences are re-starting, I can finally plan a Real Trip again.

Via lunch/coffee/visits to friends in Germany, Austria and Slovenia, I will be on my way to the combined conferes of JavaCro and HrOUG from 15-20 May.



Then a short (working) holiday and there is MakeIT in Portoroz.




After that: there are some meetups scheduled, notably RoOUG in Bucharest on 14 June (link soon).

And on the way back I hope to visit several more friends for coffee.



Sunday, August 29, 2021

About ACE program, Conferences, and Dev-Rel

 The oracle “community” has been buzzing about the ACE program.

Firstly I want to Welcome all the new+promoted ACE-es. I know some of you personally, and I am glad to have you on board. Hope to meet in person (again) soon.


But given some of the discussions I’ve seen, I’ll briefly do 4 statements, will elaborate each of them in text below and end with some final thoughts. My aim is to help the discussion, without popping a load of less-relevant text into tweet-threads or slack-channels.


Statement 1: Neutral. 

I want to encourage objective, independent information about IT, tools, products, usage, practices etc…  And I want to stand with the user/developer/sysadmin, not with the vendors. But I am OK to provide feedback and suggestions to vendors, some  of my friends do Good Work in the employment of vendors.


Statement 2: In-Person. 

I want to encourage in-person meetings, of both large and small groups. Communication is simply more complete when done in person. All other networking (online, stackoverflow, youtube, zoom etc) will also just happen and grow. But I dont want to comment too much about "online" now.


Statement 3: No bribes to speakers.

As an independent conference-customer and sometimes speaker, I dont want money from a Vendor, not even “travel and lodging”. As such, I am Happy with oracle cutting the travel-funding for ACE-Ds.


Statement 4: Transparancy. 

A Vendor-sponsored event will be biased, but I may still come to the event. I want organizers to be transparent about their funding. A sponsored-event can still provide good opportunities for in-person meeting and exchange of real information. A clever and wise vendor will understand.


So far the TL;DR part. I will elaborate for the curious in text below.




The Elaborations...


Caveats:

 - Some ppl will disagree. Please Keep exchanging Ideas, politely.

 - I make assumptions, and the usual ass-u-me applies, so be it.

 - I have my own “desired state”, and I probably reason towards that.



Stmnt 1 Neutral.


This one, Neutrality, is the most difficult; What is Objective, neutral information ?


In the IT landscape, “neutral” is probably whatever you + I say is “neutral”. 

But I would add the criterium that the information given is verifiable, demonstrable, or measurable.


Where do you find Neutral information ? 

From users /Developers ?

From testing and benchmarks (can of worms….)?

In University and Scientific institutions ? 

In the market : whichever product survives the longest, or sells the most licences is the bests ? 


This "neutral" item needs expanding, maybe later… Compare for example how the pg-day organisation tries to enforce rules and transparency for conferences under that brand. Or the Oak-Table-Network of “scientists”. I think the oakies had some good criteria, but also a self-proclaimed high consumption of alcohol, and a vetting process for members that could depend on "who pays for drinks tonight”.



Stmnt 2 : in-person meetings.


I specifically want to encourage in-person meetings, because I think the in-person communication is the most complete. As someone said during dinner in Vienna: “We could never have discussed all this over zoom”.

In-person needs encouraging, sponsoring, because all other forms of information-exchange will simple happen anyway. There are very few restrictions on creating online-groups, forums, listings and more. In Database-land, Twitter seems to be the platform of choice, followed by Stack-Overflow etc. The moderations of blogs, aggregators, forums and mailing-lists is a different topic, but I will try to subscribe to “neutral” ones as well as to vendor-driven ones.


In-person or “Physical” conferences and meetup need some sponsoring because venue, food, transport, lodging all require a budget. And this is where Oracle “helped” in the past: they did not only send “employees”, but they used to sponsor some of the other speakers as well, the ACE-Ds. 


My controversial stmnt here is: Only sponsor Organizers, don't Sponsor Speakers. See als stmtn3: don't pay the ppl, pay the event. 


I also think that only “small events” need explicit support. Larger, professional events, (UKOUG, DOAG, Buildstuff) do not need a lot of  artificial “vendor sponsoring”. They are able to draw sufficient crowd and sponsors to run on their own. The smaller events act as proving-ground and training-space for the larger ones. For this reason I want to visit both types of events: small meetup/groups and large conferences. Compare it to a football-fan who likes to see both the local amateurs and the professional teams, and who knows the sport can only exist with both types of clubs.


From this reasoning, it follows that the attendees (delegates?), will have to pay some fee to participate in most events. This means not everyone will be able to visit, but it also means you end up with the more motivated, and spending-power audience. A free event (as in free beer) attracts way too much freeloaders like myself: I only go to conferences for the food and drink.


For small events, it may be as simple as paying for your travel and own food+drink at the venue, but some meetups/user-groups would definitely benefit form some dev-rel-funding or marketing-budget (e.g. the old oracle ACE-funding).


Supporting evidence:

Firstly, the ppl that met in-person before Covid still seem to form a fairly good network, and 

Secondly, the existence of both commercially and voluntarily organised, physical conferences proved there is a market, a demand for them.  My intention was generally to seek-out the more “neutral” and serious conferences and go there to meet ppl and learn information.

In the Postgres world, I know the local meetups in NL were quite successful. But they benefitted from a few large “installed base” customers that made venue and food available for meetings of up to 100 ppl. Those “users” noticed the direct and indirect benefit of keeping a user-community alive, even if this did not bring them immediate measurable revenue (oracle + cstmers take note - I never see a large “financial institution” hosting an OUG of 100 ppl!)



Stmnt 3: No Bribes. Don't pay speakers, pay the (small) events.


When a vendor pays (or reimburses) the speaker and this is not explicitly clear (e.g. ACE-D funding), the message becomes biased. Even if the speaker tries hard to remain neutral. If the Organizer (and thus the audience) pays for the speaker, it is a more transparant and more a system of demand+supply, and therefore more Neutral, IMHO.

I would try to ensure that the whole setup of a conference stems from the demand + supply. The more a vendor pays (visibly or invisibly), the less neutral the event is going to be. I accept that some organizers will pay for “rock star speakers” (Hoogland, McDonald, Lewis etc…), because those speakers draw in the crowds, and that is fine. But if the money comes from the event or from the attendees, there is more room for neutral or critical views, and possibly better information, just saying…



Stmnt 4: Transparency. 


vendor-organised events are biased.

Hence, please make it clear when a vendor pays for the event or for some of the speakers.

This doesn;t need much elaboration. At least be transparant: you want to know who is paying for the event.

But I might still use the opportunity to meet with like-minded ppl, and to gain as much information as I can. There may still be a whole off-stage or informal gathering going on at the same time (e.g. the Tap-n-Spile in Birmingham during UKOUG).



Final thoughts.


Looking back at it all... It comes down to : Honest and Neutral information, and “meeting ppl”.

For “honest information”, it may mostly be a matter of educating IT staff to think critical, ask quesitons, and put the interest of user/customer first.

But Also, to help see through the BS (Who remembers : You probably don't need…. ).

For meeting ppl: You will always depend on funding and motivated ppl. But I am of the opinion that transparency and “serendipity” can only occur if ppl can see each other in the eyes from time to time.


Of course, I also wrote this whole text to improve my hopes/chances of riding around europe by motorcycle, visiting events, meeting ppl…

(insert picture of motor with view over Silezia: Some ppl only go to POUG for the beer… )





And if you are a new or promoted ACE: Welcome again. 

Dont pay too much attn to me or other dinosaurs and Do Whatever YOU Want to do with the ACE-status. 

Hopefully we can meet somewhere over coffee, food or a drink.

Thursday, June 10, 2021

You can Speak at IT conferences

Conferences are Great - you should go (and Speak)



Conferences are for getting out of the (home)office, to discuss topics, meet people, and  learn new things, ideas, concepts, trends. And for the food+drink, although that varies from place to place.

And the best way to get to a conference is ... to do a Presentation.

Especially when I had to "ask my boss", it would always Greatly help if I could say: "I'm Speaking at XYZ, can you please budget the trip+time" (maybe more on that later).


So... What to Speak about? Here is one of my best 3-step tricks to find a topic:

1. Find the biggest Obstacle you had at work in the last 12 months (e.g. learning python, creating K8s pods, Designing your Datamodel, arguing with your architect..). That can be your Topic.

2. Now write down what you want to Tell / Warn / Laugh to others about (The Quirks of Python, Yaml/Ansible-syntax, how to manipulate the architect). This will be your "Message".

3. Turn that into slides (max 25), notably with a clear Conclusion. Give your listeners a "Take Home Lesson" at the end.

There you are!


My reasoning behind this is multiple: 

First, you will be a better speaker if you have a grain of Passion, a Mission. Something you Really Want to Communicate. And even better: something you have Experienced Yourself. 

Secondly, you probably were not the only one with whatever challenge you had. Others will have been in the same situation and will recognise it. Those will be your Audience, and they will spark the discussion afterwards.

Thirdly: Because you are "On a Mission" to convey your learnings, you will be better motivated, and you will more easily overcome any stage-fright your may have.


There is more to speaking, but  you will learn in the process of Doing! 

A good source of information is also the MASH program (link). And there are the practicalities, things like: 

Avoid boring slides (important, but less important than Your Mission)

How to work towards the conclusion (important, but less important than your intrinsic Motivation)

Use of Clip-art and moderate humour (important, not Essential to your Message)

How to determine your Tempo, Timing. Your first presentation will run-over, that is normal, and not a problem: Organizers will keep-time, or not. And getting late to other talks is (partly) the problem of the audience. Truly Motivated listeners may even remain behind, and harass you with Questions. The secret to not run over is.. a) do the presentation a few times, and b) remove irrelevant content (this can be hard - especially if you want to tell a "whole story")

Note: running-over is Totally Impolite because it creates problems for audience, for other speakers, and for organizers. If an experienced speaker goes over time, you can tell him off. But every beginner-speaker should be allowed to run-over (once, just once :-) ). If a "sales-pitch-speaker" runs over time: Ban Him (that is 99% of cases a him) and Shame his Product.


Oh, and about that food+drink: After the event, tell the organizers how good or bad their catering was. Some will learn, some wont.

Now go out and Enjoy!





 

This is the footer...and this should be small text for disclaimers and the like. and some small stuff

Locations of visitors to this page And this text is placed next to the map. we could possibly hide some stuff here too for the benefit of the search-engines and if it is at color ffffff cam we put all sort of rubbish that we do not want readers to see. travel itinirary reservation ticket agent flight plane boarding attendant train connection rail ticket wait time booking flight boardingtime taxi ramp luggage suitcase trolley wheely laptop bagpack corpaorate wifi connection oracle. it will also be interesting to see what happens when this wrap around. or even if we put in spherus and worwood as additional word.