To be and not to be – the uncertainty principle in SAS

If I were to say that we live in uncertain times, that would probably be an understatement. Therefore, I won’t say that. Oops, I already did. Or did I?

For centuries, people around the world have been busy scratching their heads in search of a meaningful answer to Shakespeare’s profoundly elementary question: “To be or not to be?”

Have we succeeded? Sure. And in pursuit of even further greatness, we have progressed beyond the simple binary choice. Thanks to human ingenuity, it is now possible to have it all: to be and not to be.

But doesn’t this contradict human logic? Not at all, according to the Heisenberg uncertainty principle – a cornerstone of quantum mechanics asserting a fundamental limit to the certainty of knowledge.

According to the uncertainty principle, it is not possible to determine both the momentum and position of a particle (boson, electron, quark, etc.) simultaneously with arbitrary precision. Here is the famous formula:

Δx · Δp ≥ h / (4π)

where
Δx = uncertainty in position.
Δp = uncertainty in momentum.
h = Planck’s constant (a rare and precious number equal to 6.62607015×10⁻³⁴ J·s, representing how much the energy of a photon increases when the frequency of its electromagnetic wave increases by 1 Hz).
4π = π π π π (4 pi’s; no mathematical formula of any scientific significance can do without at least one of them!)

In addition, every particle or quantum entity may be defined as either a particle or a wave depending on how you feel about it according to the wave-particle duality principle. But let’s not let the dual meaning inconvenience us. Let’s just call them matters, or things for simplicity.

Then we can formulate the uncertainty principle in plain and clear terms:

Since it is impossible to know whether the position of a thing is X or not X, then that thing can be in position X and not be in position X simultaneously. Thus “to be and not to be”.

Capeesh?

There is an abundance of examples of the uncertainty principle in SAS software. Let’s consider several of them.

History of the present and present of the history

Some of you may remember SAS version 7.0. It is remarkable for being the shortest-lived SAS version, lasting roughly one year: it was released in October 1998 and replaced by SAS 8.0 in November 1999. There were no 7.1 or 7.2 sub-versions, only 7.0.

But (and this is a big BUT), have you noticed that even today the latest SAS products (9.4 and Viya) use the following version 7 file extensions?

  • .sas7bdat – SAS data set
  • .sas7bvew – SAS view
  • .sas7bndx – SAS index
  • .sas7bcat – SAS catalog
  • .sas7bmdb – SAS multi-dimensional database file
  • .sas7bdmd – SAS data mining database file

. . . and this is just a partial list.

When you define a SAS library with the V9 engine:
libname AAA v9 'c:\temp';

the SAS log will indicate:
NOTE: Libref AAA was successfully assigned as follows:
Engine:        V9
Physical Name: c:\temp

Notice how it’s SAS Engine V9, but SAS datasets created with it have .sas7bdat extensions.
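
If you want to see this with your own eyes, here is a minimal sketch. It assumes a Windows SAS session and a writable c:\temp folder; the data set name DEMO is made up for illustration.

```sas
/* Assumption: c:\temp exists and is writable; DEMO is a hypothetical name */
libname AAA v9 'c:\temp';

data AAA.DEMO;
   x = 1;
run;

/* FILEEXIST returns 1 if the physical file exists and 0 otherwise */
%put NOTE: demo.sas7bdat found = %sysfunc(fileexist(c:\temp\demo.sas7bdat));
```

If the library was assigned as above, the log should report that the version 7 file is there, even though the engine is V9.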

Where do you think that digit “7” came from? Obviously, more than two decades after version 7.0’s demise it is still alive and kicking. How can you explain that other than by the uncertainty principle: “it is while it is not”!

Transience and permanence

Let’s take another example. How long have you known that to create a permanent SAS data set you need to specify a two-level name, e.g. LIBREF.DATASETNAME, while for temporary data sets you can specify a one-level name, e.g. DATASETNAME, or a two-level name where the first level is WORK to explicitly signify the temporary library? Now, equipped with that “settled science” knowledge, what do you think the following code will create, a temporary or a permanent data set?

options user='c:\temp';
 
data MYDATA;
   x = 22371;
run;

Just run this code and check your c:\temp folder to make sure that data set MYDATA is permanent. Credit for this shortcut goes to the option user= . Now we can say that to create a permanent data set we can use a two-level name or one-level name, which makes it indistinguishable from temporary data sets.
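
The same uncertainty can be conjured up without the system option, by assigning the reserved libref USER directly. The sketch below assumes the same c:\temp folder; MYTEMP is a made-up data set name used only for illustration.

```sas
/* Assigning the reserved libref USER redirects one-level names, much like OPTIONS USER= */
libname USER 'c:\temp';

data MYDATA;          /* one-level name, written to the USER library (permanent) */
   x = 22371;
run;

data WORK.MYTEMP;     /* explicit WORK prefix still gives a temporary data set */
   x = 22371;
run;

/* PATHNAME shows the physical location behind a libref */
%put NOTE: USER points to %sysfunc(pathname(USER));
%put NOTE: WORK points to %sysfunc(pathname(WORK));
```

Once USER is assigned, the only way to be certain that a data set is temporary is to spell out WORK explicitly.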

To bring this uncertainty to an even higher level, you can drop MYDATA name altogether and still create a permanent data set:

options user='c:\temp';
 
data;
   x = 22371;
run;

SAS Log will show:
NOTE: The data set USER.DATA1 has 1 observations and 1 variables.

Isn’t this the ultimate proof of the “to be and not to be” principle (sponsored by the DATAn naming convention)?

In addition, you can create a data set by defining its physical pathname without even relying on SAS data set names, whether one or two-level:

data "c:\temp\aaa";
   x = 22371;
   format x date9.;
run;

This code runs perfectly fine, creating a SAS data set as a file named aaa.sas7bdat in the c:\temp folder.

And I am not even talking about the NOWORKTERM option (well, I am now), which preserves all the SAS files and the directory of the temporary WORK library at the termination of a SAS session, essentially making temporary SAS files permanent.
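
To find out how your own session is set up, a small sketch like the following can help. It only reads the current settings and changes nothing.

```sas
/* Print the current WORKTERM / NOWORKTERM setting to the log */
proc options option=workterm;
run;

/* Show the physical folder behind the WORK library for this session */
%put NOTE: WORK folder is %sysfunc(pathname(WORK));
```

If the log shows NOWORKTERM, the files in that folder are meant to outlive the session that created them.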

As you can see, even “well settled science” crumbles right in front of your eyes under the certainty of the uncertainty principle.

Uncertainty principle: Final Exam

And now, ladies and gentlemen, you will have to pass your final exam to receive an official April Fools diploma from SAS University.

Problem to solve

You know that every SAS data step creates automatic variables, _N_ and _ERROR_, which are available during the data step execution. Is it possible to save those automatic variables on the output data set?

In other words, will the following code create 3 variables on the output data set ABC?

data ABC (keep=MODEL _N_ _ERROR_);
   set SASHELP.CARS(keep=MODEL);
run;

If you answered “No” you get 1 credit. If you answered “Yes” you get 0 credits. But that’s only if you answered the second question (I assume you noticed that I asked two questions in a row). If your “Yes”/“No” answer relates to the first question, your credits are reversed.

Bonus for creativity

However, if you not only answered “Yes” to the first question, but also provided a “how-to” code example, you get a bonus in the amount of 10 credits. Here is your bonus for creativity:

data BBC;
   set SASHELP.CARS(keep=MODEL);
   x = _n_;
   e = _error_;
   rename x=_n_ e=_error_;
run;

You still have to run this code to make sure it creates data set BBC with 3 variables: MODEL, _N_, and _ERROR_ in order to get your 10 credits vested.
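
One possible way to do that check, assuming BBC was created by the step above, is to list its variables with PROC CONTENTS:

```sas
/* List the variables of BBC in creation order (VARNUM) */
proc contents data=BBC varnum;
run;
```

The variable listing in the output tells you whether your credits are vested.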

Problem solved = problem created

And lastly, the final curiosity test and exercise where you find out about SAS’ no-nonsense solution in the face of uncertainty. What happens in the following data step when the SAS-created automatic data step variables, _N_ and _ERROR_, collide with the same-name variables brought in by the previously created BBC data set?

data CBC;
   set BBC;
run;

After you complete this test/exercise and find out the answer, you can grab your diploma below and proudly brag about it and display it anywhere.

[Image: SAS Institute diploma]

WAIT! Before you leave, please do not forget to provide your answers, questions, code examples, and comments below.

More April Fools’ Day SAS articles

April 1, 2020: Theory of relativity in SAS programming
April 1, 2019: Dividing by zero with SAS
April 1, 2018: SAS discovers a new planet in the Solar System
April 1, 2017: SAS code to prove Fermat's Last Theorem

About Author

Leonid Batkhan

Leonid Batkhan is a long-time SAS consultant and blogger. Currently, he is a Lead Applications Developer at F.N.B. Corporation. He holds a Ph.D. in Computer Science and Automatic Control Systems and has been a SAS user for more than 25 years. From 1995 to 2021 he worked as a Data Management and Business Intelligence consultant at SAS Institute. During his career, Leonid has successfully implemented dozens of SAS applications and projects in various industries.

32 Comments

    • Leonid Batkhan

      Thank you, Anne! While trying to make it funny, I also tried to make it very serious... Yes, there are some Sasensei questions that I used in this article to illustrate my points. Although I am not quite sure where they appeared first 🙂

  1. I agree with Marinela (April 1, 2021 2:11 pm) ... most creative post ever!
    As usual, thank you Leonid!
    You have the best sense of humor - loved the joke!
    Keep these blogs coming! I might learn SAS after all!
    🙂

  2. Thank you, Leonid, for such brilliant insight into uncertainty in SAS realm!

    For me the upshot of this humorous discourse into uncertainty in our lives is this - we may never be able to figure out what is going on with every dataset we deliver, but as data specialists we are still responsible for providing the data in the best cleanest way possible.

    • Leonid Batkhan

      Thank you, Azat, for your insightful comment. It is interesting how you projected this humorous mental exercise into practical positivity. You hit the nail on the head: despite all the uncertainty of the world, SAS software and SAS users invariably deliver quality and reliable results.

  3. Thank you very much, Leonid! And nice comment, Allan! 🙂 I was caught by the last exercise (shame on me, but good surprise!)

  4. One of the best blogs I have come across. I never knew that SAS version 7 was the shortest-lived SAS version and that, to date, the 7 appears in almost every SAS file extension.

    Probably we are going to have the SAS extensions updated to match the SAS version.

    • Leonid Batkhan

      Thank you, Daniel, for your comment. I don't think, though, that we need to match SAS file extensions to the SAS version. Think about it from the position of backward compatibility. If we keep the same file extensions, then new SAS versions will still work fine with older data. Otherwise all the data would have to be migrated to a new version/extension.

  5. Leonid,
    Nice set of blog posts. I especially like the one link at end of the post; dividing by zero. In the 1990's I was using SAS on an IBM370 (or maybe 360) mainframe. A consultant told us that the main problem with run times is to divide by 0.

    My favorite interview question was on using the MIN and/or MAX functions. The MIN function will return the non-missing minimum value. Question I would ask an applicant was, what would you replace the MIN function to get a missing value returned. My solution, still today, is listed below but you need to be careful if you have more than 2 variables to the right of the equal sign:

    A = B >< C;

    if B and/or C are missing, A=.;

    Thanks. I learned a few more functions and syntax today!

    • Leonid Batkhan

      Thank you, Jonas, for your comment. I have to admit that if you were interviewing me, I would not pass the interview as I did not know the min operator ><. I would probably come up with something like this though:

      A = ifn(nmiss(of X1-X10),.,min(of X1-X10));

  6. "NOTE: The variable _ERROR_ exists on an input data set, but was also specified in an I/O statement option. The variable will not be included on any output data set."

    Fantastic article! I'd love to know more about "I/O statement options"!!!

    On the uncertainty of `_n_` and `_error_`. It's also possible to save (and re-instate) values from _previous_ observations, by storing / restoring them using hash tables.

    The following macro illustrates: https://core.sasjs.io/mp__prevobs_8sas.html

  7. Adding to the fun, a not-to-be-named SAS R&D "scribe" of nearly ancient extraction points out the following. The '7' in those file extensions might be interpreted as the seventh (and therefore perfect) evolution of file format internals developed at SAS. The universal total quantity of files having such extensions is now measured between Petabytes and Exabytes.

      • Andrea Zimmerman

        ODS
        Output Delivery System, pretty much my fave SAS coding tool. I think it might have been in v7 as experimental, but it was fully operational in v8. Allowed me to create really great PDF, RTF, and eventually Excel and PPT output directly from SAS. It also gave us ways to grab SAS tables that were in the LST file and input them to more SAS code.
        QED

        • Leonid Batkhan

          Thank you, Andrea, for this history excursus. Your last sentence also suggests that ODS serves as IDS which is in full compliance with uncertainty principle (Output=Input).

  8. Chris Hemedinger

    Before SAS Version 7 we had a long series of Version 6 releases, reaching into the double-digits with 6.10, 6.11, 6.12, and the very targeted 6.14. No 6.13? I guess release managers held some superstitions after all. Or maybe v6.13 does exist, but it's locked in the box with Schrödinger's cat.

    • Leonid Batkhan

      Great history addition Chris! With uncertainty principle the possibilities are endless. Not only v6.13 may exist or be locked in the box with Schrödinger's cat, it may be the Schrödinger's cat itself.

        • Leonid Batkhan

          This reminds me of an old Jewish story:

          Two neighbors were having a financial dispute. They couldn’t reach an agreement, so they took their case to the local rabbi. The rabbi heard the first litigant’s case, nodded his head and said, “You’re right.”
          The second litigant then stated his case. The rabbi heard him out, nodded again and said, “You’re also right.”
          The rabbi’s attendant, a boy who had been standing by this whole time, was justifiably confused. “But, rebbe,” he asked, “how can they both be right?”
          The rabbi thought about this for a moment before responding, “You’re right, too, boy!”
