This will be my final post on this blog (so sad :( ). In this week's entry, I will be going over sorting and the efficiency of algorithms.
Sorting...
If I asked you to sort a list of integers from least to greatest, how would you do it? Of course! You just - uh.... never mind. It's actually quite difficult to come up with an algorithm off the top of your head, because we're accustomed to performing seemingly simple tasks subconsciously.
When I was first asked to come up with my own sorting algorithm, before being introduced to any of the standard ones, it was really difficult. I pulled out a sheet of paper and wrote down an unsorted list of integers. Then I worked by elimination: I found the lowest integer and drew an arrow showing that I had moved it to the beginning of the list. I then ignored the integer I had just moved and went through the same selection process again.
It turns out that the algorithm I came up with was selection sort. Selection sort is just one of many algorithms for sorting a list of numbers.
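Here's a minimal sketch of that idea in Python (the function name and structure are my own; this is just one common way to write selection sort):

```python
def selection_sort(lst):
    """Sort a list of numbers in place by repeatedly selecting
    the smallest remaining item, like the pen-and-paper process."""
    for i in range(len(lst)):
        # Find the index of the smallest item in the unsorted part, lst[i:].
        min_index = i
        for j in range(i + 1, len(lst)):
            if lst[j] < lst[min_index]:
                min_index = j
        # Move it to the front of the unsorted part (the "arrow" step).
        lst[i], lst[min_index] = lst[min_index], lst[i]
```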
Below is a list of a few sorting algorithms, with a short description of each:
- Bubble sort - repeatedly swaps adjacent elements that are out of order until the list is sorted.
- Merge sort - splits the list in half, sorts each half, then merges the two sorted halves.
- Timsort - a hybrid of merge sort and insertion sort; Python's built-in sort uses this algorithm (see the example after this list).
- Insertion sort - grows a sorted portion of the list by inserting each new element into its proper place.
- Quick sort - picks a pivot element, partitions the list around it, then sorts each side.
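Since Python's built-in sort is Timsort, you get it for free with sorted() or list.sort(). A quick sketch (the sample list is made up):

```python
numbers = [5, 2, 9, 1, 7]
print(sorted(numbers))  # [1, 2, 5, 7, 9] -- returns a new sorted list
numbers.sort()          # sorts the list in place, also using Timsort
print(numbers)          # [1, 2, 5, 7, 9]
```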
Efficiency and Time Complexity...
As there are many ways to sort a list of integers, some of the above algorithms can be quite slow, or rather, inefficient. How do computer scientists express how slow or fast an algorithm is? One way is to use big-Oh notation.
If f(x) is some function, then O(f(x)) describes the set of functions that grow no faster than f(x). For example, as n approaches infinity (for natural numbers n), n^3 grows faster than n^2. So we can write this as: n^2 ∈ O(n^3).
In other words, the growth (or order) of n^3 is an upper bound on the growth (or order) of n^2.
How do we come up with these functions for sorting algorithms? One method is to count the number of steps an algorithm takes as a function of the size of the list, ignoring constant factors.
I'll summarize a few common growth rates for the number of steps an algorithm can take (there's a toy Python example of each after the list).
Let c be any positive number, and n be a natural number representing the size of the input.
- Constant - f(n) = c. An algorithm's behavior is constant if it performs the same number of steps regardless of input size. This is undoubtedly the best possible performance for any algorithm. We generally write this as O(1); the particular constant doesn't matter, because big-Oh describes growth, and f(n) = 1 doesn't grow as n approaches infinity any more than any other positive constant does, so something like O(2) is also acceptable.
- Logarithmic - f(n) = c lg n (where lg n means log base 2 of n). You usually see this in "divide and conquer" algorithms, since lg n is the number of times you can divide n in half before reaching 1.
- Linear - f(n) = cn. A single for loop over the input can be represented as O(n). For example, for i in range(1, n + 1) performs n iterations, so it grows linearly.
- Quadratic - f(n) = cn^2. An O(n) for loop nested inside another O(n) for loop can be represented as O(n^2). Even if the inner loop's range depends on the outer loop variable (say, for j in range(i)), the total number of steps is still quadratic.
- Cubic and Exponential - f(n) = cn^3 and f(n) = ca^n, where a is some number greater than 1. Cubic behavior shows up when a linear loop wraps a quadratic pair of loops (three nested loops in total); exponential behavior typically comes from recursion that branches multiple ways on each call rather than from simple loops.
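To make these growth rates concrete, here are small toy functions in Python, one per rate (all names and examples are mine, not from the course):

```python
def first_item(lst):
    # Constant, O(1): one step no matter how long lst is.
    return lst[0]

def binary_search(lst, target):
    # Logarithmic, O(log n): a classic divide-and-conquer search.
    # Each pass halves the search range (assumes lst is already sorted).
    low, high = 0, len(lst) - 1
    while low <= high:
        mid = (low + high) // 2
        if lst[mid] == target:
            return mid
        elif lst[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1

def total(lst):
    # Linear, O(n): the loop body runs once per item.
    result = 0
    for item in lst:
        result += item
    return result

def count_pairs(lst):
    # Quadratic, O(n^2): an O(n) loop nested inside an O(n) loop.
    count = 0
    for i in range(len(lst)):
        for j in range(len(lst)):
            count += 1
    return count
```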
So how do we represent an algorithm's time complexity (how long it takes to execute given some input) using big-Oh? We observe the number of steps the algorithm takes as a function of the size of the input, and use that to construct a step-count function. Eventually, we get some function representing the number of steps that can have one or more terms.
For example, let f(n) = n^3 + n^2 + 2 represent the number of steps for an input of size n. As n -> infinity (and large inputs are what matter when analyzing complexity), the n^2 + 2 part becomes insignificant while n^3 takes precedence, so we say this function is O(n^3).
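A quick sanity check of that claim (the numbers here are just illustrative):

```python
def f(n):
    return n**3 + n**2 + 2  # our step-count function

for n in (10, 100, 1000):
    # The ratio f(n) / n^3 approaches 1 as n grows:
    # 1.102, 1.010002, 1.001000002
    print(n, f(n) / n**3)
```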
Big-Oh is usually associated with the "worst case" scenario, but that isn't always accurate. O(n^4) is also a valid representation of f(n); it's just not the tightest bound, and you can think of O(n^4) as an overestimate of f(n).
Big-Oh simply represents an upper bound on f(n), and not always the worst-case time complexity of an algorithm! Remember that!
From there we could get into big-Omega, which represents a lower bound, and big-Theta, which represents a tight bound on an algorithm's time complexity. However, that's outside the scope of this course, so you shall discover them without me!
It's been fun. I hope you have enjoyed my SLOGs and I wish you well in your computer science careers!
GR